IScIDE 2013, Beijing
Syntactic sensitive complexity for symbol-free sequence
Bo-Shiang Huang, Daw-Ran Liou, Alex A. Simak, Cheng-Yuan Liou
National Taiwan University, Dept. of Computer Science and Information Engineering
Symbols
Piano Sonata No. 16 in C major, K. 545, by Mozart mov 2
Influenza A virus H7N9
MEQEQDTPWTQSTEHINTQKKESGQRTQRLEHPNSIQLMDHYLRTTSRVGMHKRIVYWKQWLSLKNLTQGSLKTRVSKRWKLFSKQEWIN
(A/Shanghai/02/2013(H7N9)), Segment: PB1-F2 protein, Protein ID: AGL44435, Length: 90 AA
Languages
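滾滾長江東逝水 浪花淘盡英雄
(The rolling Yangtze flows east; its waves wash away the heroes.)
是非成敗轉頭空
(Right and wrong, success and failure, vanish in a turn of the head.)
青山依舊在, 幾度夕陽紅
(The green hills remain; how many times has the sunset glowed red?)
白髮漁樵江渚上 慣看秋月春風
(The white-haired fisherman and woodcutter on the river isle are used to watching the autumn moon and the spring breeze.)
一壺濁酒喜相逢
(Over a jug of cloudy wine they meet with joy.)
古今多少事 都付笑談中
(So many things, past and present, all become talk and laughter.)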
Transmission bits
….. 01110010010101…
Time series
A: maximal ˄
V: minimal ˅
U: up ↑
D: down ↓
[Figure: Oil price (Dubai, 52 week records of 2012), each record labeled A, V, U, or D; vertical axis from 0 to 140]
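The four labels above can be sketched as a small symbolization routine. This is only an illustration: the function name `symbolize`, the handling of ties, and the sample values are assumptions, not taken from the slides, and the sample values are made up rather than real oil prices.

```python
def symbolize(series):
    """Label each interior point of a series as A, V, U, or D.

    A: local maximum, V: local minimum, U: rising, D: falling.
    Assumption: ties (equal neighbours) are resolved as U or D by
    comparing the two neighbours; the slides do not say how ties are treated.
    """
    symbols = []
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        if prev < cur and cur > nxt:
            symbols.append("A")   # peak
        elif prev > cur and cur < nxt:
            symbols.append("V")   # valley
        elif nxt >= prev:
            symbols.append("U")   # on the way up
        else:
            symbols.append("D")   # on the way down
    return "".join(symbols)


# Made-up toy values, not real price data.
print(symbolize([1, 2, 3, 2, 1, 2]))
```

The first and last points have only one neighbour, so this sketch leaves them unlabeled.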
Symbols
Bits, characters, words, features, meanings, concepts, ...
Complexity of L-system (2011)
• Introduction and review
• Complexity of symbol sequence
Lindenmayer system (1968)
•Powerful system used to model the growth processes of plants.
Lindenmayer system (1968)
• G = (V, ω, P)
• V: alphabet
• ω: the initial state of the system
• P: parallel rewriting rules; mapping P: V → V*
*variables: A , B
*start: A
*rules: (A → AB), (B → A)
n = 0 : A
n = 1 : AB
n = 2 : ABA
n = 3 : ABAAB
n = 0:        A
            /   \
n = 1:     A     B
          / \    |
n = 2:   A   B   A
        / \  |  / \
n = 3: A  B  A A   B
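The parallel rewriting shown above can be reproduced with a few lines of code. This is a minimal sketch; the function names `rewrite` and `generate` are chosen here for illustration and do not come from the slides.

```python
def rewrite(state, rules):
    """Apply all rules in parallel: every symbol is replaced in one pass.

    Symbols without a rule are copied unchanged.
    """
    return "".join(rules.get(symbol, symbol) for symbol in state)


def generate(start, rules, n):
    """Return the string obtained after n parallel rewriting steps."""
    state = start
    for _ in range(n):
        state = rewrite(state, rules)
    return state


# The two-symbol system from the slide: (A -> AB), (B -> A), start A.
ALGAE = {"A": "AB", "B": "A"}
for n in range(4):
    print(n, generate("A", ALGAE, n))
```

Because every symbol is rewritten in the same step, the string lengths here follow the Fibonacci numbers (1, 2, 3, 5, ...).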
Koch snowflake graph
• Variables: F, +, -
• Start: F--F--F
• Rules: F → F+F--F+F
[Figure: Koch snowflake for n=0, n=1, n=2]
Lindenmayer system
•Context-free grammar can be used to build a tree.
[Diagram: context-free grammar F → F+F--F+F (bracketed strings) → tree]
Lindenmayer system
• Can we deconstruct a tree into a context-free grammar?
[Diagram: tree → context-free grammar?]
Deconstruction of tree
Rewriting rules
P → [-FTL][+FTR]
TR → [-FTRL][+FTRR]
TL → null
TRL → null
TRR → null

      P
     / \
   TL   TR
       /  \
    TRL    TRR
Bracketed strings of tree
        [FP]
       /    \
 [-FTL]      [+FTR]
            /      \
     [-FTRL]        [+FTRR]

[FP[-FTL][+FTR[-FTRL][+FTRR]]]
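As an illustration of how a tree maps to its bracketed string and rewriting rules, here is a minimal sketch. The dictionary layout and the names `bracketed` and `rule` are assumptions made for this example; only the string format ("-" for a left branch, "+" for a right branch, "F" before the node name) is read off the slides.

```python
# Tree from the slide: each node maps to its children (left, right).
# Nodes that do not appear as keys are leaves.
TREE = {"P": ("TL", "TR"), "TR": ("TRL", "TRR")}


def bracketed(tree, node="P", turn=""):
    """Bracketed string of the subtree rooted at `node`.

    `turn` is "" for the root, "-" for a left child, "+" for a right child.
    """
    children = tree.get(node, ())
    inner = "".join(bracketed(tree, c, t) for c, t in zip(children, "-+"))
    return f"[{turn}F{node}{inner}]"


def rule(tree, node):
    """Right-hand side of the rewriting rule for `node` ("null" for a leaf)."""
    children = tree.get(node, ())
    return "".join(f"[{t}F{c}]" for c, t in zip(children, "-+")) or "null"


print(bracketed(TREE))
for name in ("P", "TR", "TL", "TRL", "TRR"):
    print(name, "→", rule(TREE, name))
```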
Context-free grammar
[FP[-FTL][+FTR[-FTRL][+FTRR]]]

P → [-FTL][+FTR]
TR → [-FTRL][+FTRR]
TL → null
TRL → null
TRR → null

        [FP]
       /    \
 [-FTL]      [+FTR]
            /      \
     [-FTRL]        [+FTRR]
•Every non-terminal node can be rewritten as: P→LR
Abbreviation
[FP[-FTL[-FTLL][+FTLR]][+FTR[-FTRL][+FTRR[-FTRRL][+FTRRR[-FTRRRL]]]]]

Rule, followed by its abbreviation:
P → [-FTL][+FTR] → [-F][+F]
TL → [-FTLL][+FTLR] → [-F][+F]
TR → [-FTRL][+FTRR] → [-F][+F]
TRR → [-FTRRL][+FTRRR] → [-F][+F]
TRRR → [-FTRRRL] → [-F]
TLL → null → null
TLR → null → null
TRL → null → null
TRRL → null → null
TRRRL → null → null
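The reverse direction, from a bracketed string to rewriting rules and their abbreviations, can be sketched as follows. The names `rules_from_string` and `abbreviate` are hypothetical, and the regular-expression tokenizer is just one convenient way to read the string; the example uses the smaller tree from the earlier slides.

```python
import re


def rules_from_string(s):
    """Recover one rewriting rule per node from a bracketed string."""
    rules, stack = {}, []
    # Each match is either a node opening "[", optional turn, "F", name,
    # or a closing "]" (for which both groups are empty).
    for turn, name in re.findall(r"\[([-+]?)F(\w+)|\]", s.replace(" ", "")):
        if name:
            if stack:  # record this node in its parent's rule
                rules[stack[-1]].append(f"[{turn}F{name}]")
            rules[name] = []
            stack.append(name)
        else:
            stack.pop()
    return {node: "".join(rhs) or "null" for node, rhs in rules.items()}


def abbreviate(rhs):
    """Drop node names so that only the branching shape remains."""
    return re.sub(r"F\w+", "F", rhs)


rules = rules_from_string("[FP[-FTL][+FTR[-FTRL][+FTRR]]]")
for node, rhs in rules.items():
    print(node, "→", rhs, "→", abbreviate(rhs))
```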
Classification
• Reason
• There are too many rules.
• Some of them are similar to each other.

P → [-FTL][+FTR] → [-F][+F]
TL → [-FTLL][+FTLR] → [-F][+F]
TR → [-FTRL][+FTRR] → [-F][+F]
TRR → [-FTRRL][+FTRRR] → [-F][+F]
TRRR → [-FTRRRL] → [-F]
TLL → null → null
TLR → null → null
TRL → null → null
TRRL → null → null
TRRRL → null → null
Classification method 1
•Homomorphism
Isomorphism
Classification method 2
• Isomorphism
  • Level 0
  • Level 1
  • Level 2
Classification
•Combine homomorphism and isomorphism
Class 3 (4 rules): (1) Class 3 → C3C3
                   (1) Class 3 → C1C1
                   (1) Class 3 → C1C3
                   (1) Class 3 → C1C2
Class 2 (1 rule):  (1) Class 2 → C1
Class 1 (5 rules): (5) Class 1 → null
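One plausible reading of the class table above is a step-by-step refinement: at level 0, rules with the same abbreviated shape (same number of children) share a class, and each further level splits a class when the children's classes differ. The sketch below follows that reading; it is an assumption about the procedure, not the authors' stated algorithm, and the class ids it produces are arbitrary (they need not match the numbering on the slides).

```python
from collections import Counter

# The ten-rule tree from the slides: node -> children ("null" rules are empty).
TREE = {
    "P": ("TL", "TR"), "TL": ("TLL", "TLR"), "TR": ("TRL", "TRR"),
    "TRR": ("TRRL", "TRRR"), "TRRR": ("TRRRL",),
    "TLL": (), "TLR": (), "TRL": (), "TRRL": (), "TRRRL": (),
}


def classify(rules, level=0):
    """Map every node to a class id after `level` refinement steps."""
    # Level 0: the class id is simply the number of children.
    cls = {node: len(kids) for node, kids in rules.items()}
    for _ in range(level):
        # Nodes stay together only if their children's classes agree in order.
        sig = {n: tuple(cls[k] for k in kids) for n, kids in rules.items()}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        cls = {n: ids[sig[n]] for n in rules}
    return cls


def class_rules(rules, cls):
    """Count rules of the form (class of node, classes of its children)."""
    return Counter((cls[n], tuple(cls[k] for k in kids))
                   for n, kids in rules.items())


level0 = classify(TREE, 0)
for (head, body), count in sorted(class_rules(TREE, level0).items()):
    print(f"({count}) class {head} → {body or 'null'}")
```

At level 0 this yields one class with 5 null rules, one class with a single one-child rule, and one class with 4 two-child rules, which matches the counts in the table.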
Complexity formula (2011)
String to context-free grammar
[FP[-FTL][+FTR[-FTRL][+FTRR]]]
V1 → V2V3V4
V2 → V2V3
V3 → V1
V4 → V3V2V3
Deconstruction procedure
Symbol sequence → Tree → Context-free grammar (bracketed strings) → Classification (levels) → Complexity
Psychological complexity
Complexity of Music (2011)
One musical note can be divided into two or three subunits.
A half note can be rewritten into different notes.
Musical tree of Beethoven's Piano Sonata No. 6, mov. 3.
Musical tree of Rachmaninoff's Piano Concerto No. 3, mov.
Bracketed strings for two trees.
Bracketed string of Beethoven's Piano Sonata No. 6, mov. 3.
Bracketed strings for each node of the rhythmic tree in Beethoven's Piano Sonata No. 6, mov. 3 (2 bracketed strings omitted).
Bracketed string of Rachmaninoff's Piano Concerto No. 3, mov. 1.
Mozart's 19 Piano Sonatas, using isomorphic level 1
Mozart's 19 Piano Sonatas, using isomorphic level 2
Mozart's 19 Piano Sonatas, using isomorphic level 3
Beethoven's 32 Piano Sonatas, using isomorphic level 1
Beethoven's 32 Piano Sonatas, using isomorphic level 2
Beethoven's 32 Piano Sonatas, using isomorphic level 3
Complexity of DNA sequence (2013)
Computation procedure
DNA sequence → DNA tree → Context-free grammar → Classification → Complexity
Tree representation
AATTCCGGACTGCAGT ?
Tree representation
A C T G
Building tree
A A T T C C G G A C T G C A G T
A C T G
Classification table

Classification of Rules   Isomorphic Level #0    Isomorphic Level #1
Class #1                  (19) C1 → C1C1         ( 8) C1 → C1C1
                          ( 4) C1 → C1C2         ( 1) C1 → C1C3
                          ( 4) C1 → C2C1         ( 1) C1 → C2C2
                          (20) C1 → C2C2         ( 1) C1 → C2C4
                                                 ( 1) C1 → C3C1
                                                 ( 1) C1 → C3C3
                                                 ( 1) C1 → C4C2
                                                 ( 5) C1 → C4C4
Class #2                  (48) C2 → null         ( 4) C2 → C4C5
Class #3                                         ( 4) C3 → C5C4
Class #4                                         (20) C4 → C5C5
Class #5                                         (48) C5 → null
Classification of Rules   Count   Isomorphic Depth #1
Class #1                  19      ( 8) C1 → C1C1
                                  ( 1) C1 → C1C3
                                  ( 1) C1 → C2C2
                                  ( 1) C1 → C2C4
                                  ( 1) C1 → C3C1
                                  ( 1) C1 → C3C3
                                  ( 1) C1 → C4C2
                                  ( 5) C1 → C4C4
Class #2                  4       ( 4) C2 → C4C5
Class #3                  4       ( 4) C3 → C5C4
Class #4                  20      (20) C4 → C5C5
Class #5                  48      (48) C5 → null
Complexity
\[
\begin{aligned}
V_5(z) &= 1 \quad \text{(definition)} \\
V_4(z) &= z \cdot \frac{20\,V_5(z)\,V_5(z)}{20} = z \\
V_3(z) &= z \cdot \frac{4\,V_5(z)\,V_4(z)}{4} = z^2 \\
V_2(z) &= z \cdot \frac{4\,V_4(z)\,V_5(z)}{4} = z^2 \\
V_1(z) &= \frac{z}{19}\,\bigl(\, 8\,V_1(z)\,V_1(z) + V_1(z)\,V_3(z) + V_2(z)\,V_2(z) + V_2(z)\,V_4(z) \\
       &\qquad\quad +\, V_3(z)\,V_1(z) + V_3(z)\,V_3(z) + V_4(z)\,V_2(z) + 5\,V_4(z)\,V_4(z) \,\bigr)
\end{aligned}
\]
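The equations above define V1(z) in terms of itself, so one way to get numbers out of them is to iterate. The sketch below evaluates the system at a fixed z by repeated substitution, starting from 1 for every class. The starting value, the choice z = 0.5, the iteration count, and the name `evaluate` are assumptions for illustration; the slides do not state how the final complexity value is extracted from V1(z).

```python
from math import prod

# Level-1 rule table from the slide: class -> [(count, child classes), ...].
TABLE = {
    1: [(8, (1, 1)), (1, (1, 3)), (1, (2, 2)), (1, (2, 4)),
        (1, (3, 1)), (1, (3, 3)), (1, (4, 2)), (5, (4, 4))],
    2: [(4, (4, 5))],
    3: [(4, (5, 4))],
    4: [(20, (5, 5))],
    5: [(48, ())],          # null rules
}


def evaluate(table, z, iterations=200):
    """Iterate V_i(z) = z * sum(n * V_p * V_q) / sum(n) for every class."""
    v = {c: 1.0 for c in table}          # assumed starting point
    for _ in range(iterations):
        new = {}
        for c, rules in table.items():
            if not any(kids for _, kids in rules):
                new[c] = 1.0             # V = 1 by definition for null classes
                continue
            total = sum(n for n, _ in rules)
            weighted = sum(n * prod(v[k] for k in kids) for n, kids in rules)
            new[c] = z * weighted / total
        v = new
    return v


values = evaluate(TABLE, 0.5)
print(values)
```

At z = 0.5 the closed forms give V4 = 0.5 and V2 = V3 = 0.25, which the iteration reproduces, and V1 settles at the smaller root of its quadratic equation.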
Ebola virus
[Figure: complexity of Ebola virus; series: Iso 2, frag 64 and Iso 2, frag 32; vertical axis from 0 to 1.6]
Complexity of H7N9 PB1-F2
[Figure: complexity of H7N9 PB1-F2; series: 32AA and 64AA; horizontal axis from 1 to 6, vertical axis from 0.95 to 0.959]
Complexity of text sequence
Use the codes 1 to 27 (5 bits each) to represent the letters of the alphabet plus the space character. (BIN)
Construct a binary tree.
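The 5-bit (BIN) encoding can be sketched as follows. The exact code assignment is an assumption: here a to z take 1 to 26 and the space takes 27, upper case is folded to lower case, and any other character is dropped. The slides give only the code range, not the mapping.

```python
import string

# Assumed mapping: a..z -> 1..26, space -> 27.
CODES = {ch: i for i, ch in enumerate(string.ascii_lowercase + " ", start=1)}


def to_bits(text):
    """Encode text as a bit string, 5 bits per letter or space."""
    return "".join(format(CODES[ch], "05b")
                   for ch in text.lower() if ch in CODES)


print(to_bits("a b"))
```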
Building tree for text sequence
00 00 10 10 01 01 11 11 00 01 10 11 01 00 11 10
00 01 10 11
Procedure
Text sequence → Tree structure → Rewriting rules → Classification → Complexity
Complexity of “Declaration of Independence”
Calculated every 256 bits. (July 4, 1776)
[Figure: complexity per 256-bit section, BIN encoding; horizontal axis from 1 to 153, vertical axis from 0.99785 to 0.99815]
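Computing one complexity value "every 256 bits" implies cutting the bit stream into fixed-size sections first. A minimal sketch, assuming consecutive non-overlapping sections and that a trailing partial section is discarded (neither detail is stated on the slides; `blocks` is a name chosen here):

```python
def blocks(bits, size):
    """Split a bit string into consecutive, non-overlapping sections."""
    usable = len(bits) - len(bits) % size   # assumed: drop the partial tail
    return [bits[i:i + size] for i in range(0, usable, size)]


stream = "01" * 600                          # 1200 placeholder bits
for size in (256, 512, 1024):
    print(size, len(blocks(stream, size)))
```

Each section would then go through the tree, rule, and classification steps on its own, giving one point per section in the plots.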
Complexity of “Declaration of Independence”
Calculated every 512 bits. (July 4, 1776)
[Figure: complexity per 512-bit section, BIN encoding; horizontal axis from 1 to 77, vertical axis from 0.99785 to 0.99815]
Complexity of “Declaration of Independence”
Calculated every 1024 bits. (July 4, 1776)
[Figure: complexity per 1024-bit section, BIN encoding; horizontal axis from 1 to 38, vertical axis from 0.999485 to 0.999525]
紅樓夢 (Dream of the Red Chamber), 1754?
Unicode + ASCII
32 bits for each character and punctuation mark
Complexity (tree) for each 1024 bits
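A fixed 32 bits per character can be obtained by zero-padding each Unicode code point, as in the sketch below. This is an assumption about the encoding (it is equivalent to big-endian UTF-32); the slides say only "Unicode + ASCII" with 32 bits per character. The name `to_bits32` is hypothetical.

```python
def to_bits32(text):
    """Encode every character, including punctuation, as 32 bits."""
    return "".join(format(ord(ch), "032b") for ch in text)


bits = to_bits32("紅樓夢")
print(len(bits))          # 3 characters, 32 bits each
print(1024 // 32)         # characters covered by one 1024-bit section
```

Under this encoding, one 1024-bit section spans 32 characters.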
Complexity of 紅樓夢 (Dream of the Red Chamber, 1754, Unicode), chapters 1~10
[Figure: complexity per section for 紅樓夢, chapters 1~10; horizontal axis from 2 to 1986, vertical axis from 0.991 to 0.996]
Complexity of 紅樓夢 (Dream of the Red Chamber, 1754), chapters 11~20
[Figure: complexity per section for 紅樓夢, chapters 11~20; horizontal axis from 2 to 1862, vertical axis from 0.991 to 0.996]
Complexity of 紅樓夢 (Dream of the Red Chamber), chapters 21~30
[Figure: complexity per section for 紅樓夢, chapters 21~30; horizontal axis from 2 to 2342, vertical axis from 0.991 to 0.996]
Complexity of 紅樓夢 (Dream of the Red Chamber), chapters 31~40
[Figure: complexity per section for 紅樓夢, chapters 31~40; horizontal axis from 2 to 2162, vertical axis from 0.991 to 0.996]
Complexity of 紅樓夢 (Dream of the Red Chamber), chapters 41~50
[Figure: complexity per section for 紅樓夢, chapters 41~50; horizontal axis from 2 to 2172, vertical axis from 0.991 to 0.996]
Complexity of 紅樓夢 (Dream of the Red Chamber), chapters 51~60
[Figure: complexity per section for 紅樓夢, chapters 51~60; horizontal axis from 2 to 2327, vertical axis from 0.991 to 0.996]
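Low complexity sections in 紅樓夢 (Dream of the Red Chamber)
Chapter 1: 便是『了』,『了』便是『好』;若不『了』便不『好』;若要『好』, (wordplay that keeps repeating 好 "good" and 了 "ended")
Chapter 5: 「癡情司」,「結怨司」,「朝啼司」,「暮哭司」,「春感司」,「 (a list of department names, each ending in 司)
Chapter 13: 、賈敕、賈效、賈敦、賈赦、賈政、賈琮、賈㻞、賈珩、賈珖、賈琛、賈 (a list of Jia family names)
Chapter 40: 梅花式的,也有荷葉式的,也有葵花式的,也有方的,也有圓的,其式不 (a list of shapes: plum blossom, lotus leaf, sunflower, square, round)
Chapter 54 (lowest complexity): 、太婆婆、媳婦、孫子媳婦、重孫子媳婦、親孫子媳婦、姪孫子、重孫子 (a list of kinship terms)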
Quasi-regular structure
To our knowledge, no other method can pick out such quasi-regular sections in art, music, DNA, literature, and transmission bits.
Complexity of 三國演義 (Romance of the Three Kingdoms), chapters 1~10
[Figure: complexity per section for 三國演義, chapters 1~10; horizontal axis from 1 to 1441, vertical axis from 0.9925 to 0.996]
Complexity of 三國演義 (Romance of the Three Kingdoms), chapters 11~20
[Figure: complexity per section for 三國演義, chapters 11~20; horizontal axis from 2 to 1682, vertical axis from 0.9925 to 0.996]
Complexity of 三國演義 (Romance of the Three Kingdoms), chapters 21~30
[Figure: complexity per section for 三國演義, chapters 21~30; horizontal axis from 2 to 1502, vertical axis from 0.9925 to 0.996]
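Low complexity sections in 三國演義 (Romance of the Three Kingdoms)
Chapter 20: 劉昂。昂生漳侯劉祿。祿生沂水侯劉戀。戀生欽陽侯劉英。英生安國侯劉 (a genealogy in which each line has the form "X begat Y")
Chapter 22: 常之人,然後有非常之事;有非常之事,然後立非常之功。夫非常者,固 (parallel clauses that repeat 非常 "extraordinary")
Chapter 23: 也;不讀詩書,是口濁也;不納忠言,是耳濁也;不通古今,是身濁也; (parallel clauses of the form 不…,是…濁也)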
Summary
• Representation is not unique.
• Study of ancient languages.
• Transmission anomaly.
• Different from Kullback-Leibler divergence.
• Measure of structural complexity.
Thanks for listening.