IScIDE 2013, Beijing

Syntactic sensitive complexity for symbol-free sequence

Bo-Shiang Huang, Daw-Ran Liou, Alex A. Simak, Cheng-Yuan Liou

National Taiwan University, Dept. of Computer Science and Information Engineering

Symbols

Piano Sonata No. 16 in C major, K. 545, by Mozart, mov. 2

Influenza A virus H7N9

MEQEQDTPWTQSTEHINTQKKESGQRTQRLEHPNSIQLMDHYLRTTSRVGMHKRIVYWKQWLSLKNLTQGSLKTRVSKRWKLFSKQEWIN

(A/Shanghai/02/2013(H7N9)), Segment: PB1-F2 protein, Protein ID: AGL44435, Length: 90 AA

Languages

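滾滾長江東逝水 浪花淘盡英雄
是非成敗轉頭空
青山依舊在, 幾度夕陽紅
白髮漁樵江渚上 慣看秋月春風
一壺濁酒喜相逢
古今多少事 都付笑談中

(Opening poem of Romance of the Three Kingdoms. Roughly, in English: The rolling Yangtze flows east; its waves have washed away the heroes. Right and wrong, success and failure, all turn to nothing. The green hills remain; how many times has the sun set red? White-haired fishermen and woodcutters on the river islet are used to watching the autumn moon and the spring breeze. Over a jug of rough wine they meet with joy; so many affairs, old and new, all end up in their laughing talk.)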

Transmission bits

….. 01110010010101…

Time series

A: maximal ˄

V: minimal ˅

U: up ↑

D: down ↓

[Figure: oil price (Dubai, 52 week records of 2012); vertical axis from 0 to 140; points labeled with the symbols A, V, U, D.]
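The four symbols defined above can be sketched as a small encoder. This is only an illustration under stated assumptions: the function name `encode_avud`, the handling of ties, and the choice to skip the two end points are not from the slides, and the sample values are invented, not the Dubai oil prices.

```python
def encode_avud(series):
    """Map each interior point of a numeric series to A, V, U or D."""
    symbols = []
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        if cur > prev and cur > nxt:
            symbols.append("A")   # local maximum
        elif cur < prev and cur < nxt:
            symbols.append("V")   # local minimum
        elif nxt > prev:
            symbols.append("U")   # on a rising stretch
        else:
            symbols.append("D")   # on a falling stretch (ties land here, by assumption)
    return "".join(symbols)


# Invented sample values, for illustration only.
sample = [100, 105, 103, 101, 104, 108, 107]
encoded = encode_avud(sample)
print(encoded)
```

The resulting string is an ordinary symbol sequence, so it can be fed to the same tree and grammar steps as notes, bases, or characters.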

Symbols

Bits, Characters, Words, Features, Meanings, Concepts, ……

• Introduction and review: Complexity of L-system (2011)

• Complexity of symbol sequence

Lindenmayer system (1968)

•Powerful system used to model the growth processes of plants.

Lindenmayer system (1968)

• G = (V, ω, P)
• V: alphabet
• ω: the initial state of the system
• P: parallel rewriting rules; mapping P: V → V*.

*variables: A , B

*start: A

*rules: (A → AB), (B → A)

n = 0 : A

n = 1 : AB

n = 2 : ABA

n = 3 : ABAAB

            A
          /   \
        A       B
       / \      |
      A   B     A
     / \  |    / \
    A   B A   A   B
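The derivation above can be reproduced with a few lines of parallel rewriting. This is a minimal sketch; the names `rewrite`, `generations`, and `ALGAE_RULES` are illustrative and not taken from the slides.

```python
def rewrite(state, rules):
    """Apply all rewriting rules in parallel to every symbol of the state."""
    return "".join(rules.get(symbol, symbol) for symbol in state)


def generations(axiom, rules, n):
    """Return the states for n = 0 .. n, starting from the axiom."""
    states = [axiom]
    for _ in range(n):
        states.append(rewrite(states[-1], rules))
    return states


ALGAE_RULES = {"A": "AB", "B": "A"}   # rules: (A -> AB), (B -> A)
states = generations("A", ALGAE_RULES, 3)
for n, state in enumerate(states):
    print(f"n = {n} : {state}")
```

Symbols without a rule are copied unchanged, which is what lets the same function handle constants such as + and - in other L-systems.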

Koch snowflake graph

• Variables: F, +, -
• Start: F--F--F
• Rules: F→F+F--F+F

[Figure: Koch snowflake at n=0, n=1, n=2]
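A short sketch of how the Koch rule expands and how the resulting string can be traced as a curve. The turning angle of 60 degrees and the unit step length are assumptions (the slide does not state them), and the names `expand` and `trace` are illustrative.

```python
import math


def expand(axiom, rules, n):
    """Rewrite the axiom n times, applying the rules to all symbols in parallel."""
    state = axiom
    for _ in range(n):
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state


def trace(commands, angle_deg=60.0):
    """Turtle interpretation: F draws one unit, + turns left, - turns right."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for command in commands:
        if command == "F":
            x += math.cos(heading)
            y += math.sin(heading)
            points.append((x, y))
        elif command == "+":
            heading += math.radians(angle_deg)
        elif command == "-":
            heading -= math.radians(angle_deg)
    return points


KOCH_RULES = {"F": "F+F--F+F"}
curve = expand("F--F--F", KOCH_RULES, 2)
points = trace(curve)
print(curve.count("F"), "segments; end point:", points[-1])
```

With these assumptions the curve closes on itself at every n, and the number of segments grows as 3 times 4 to the power n.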

Lindenmayer system

• A context-free grammar can be used to build a tree.

[Diagram: context-free grammar F→F+F--F+F (bracketed strings) → tree]

Lindenmayer system

• Can we deconstruct a tree into a context-free grammar?

[Diagram: tree → context-free grammar?]

Deconstruction of tree

Rewriting rules

P → [-FTL][+FTR]
TR → [-FTRL][+FTRR]
TL → null
TRL → null
TRR → null

Tree nodes, by level:

P
TL TR
TRL TRR
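The deconstruction above can be sketched in code. The tree encoding is an assumption made for illustration: each node is a `(left, right)` tuple, a missing child is `None`, and the function name `rewriting_rules` is hypothetical. Node names follow the slide's convention (P for the root, then T plus the path of L and R turns).

```python
LEAF = (None, None)

# The tree from the slide: P has children TL and TR; TR has children TRL and TRR.
TREE = (LEAF, (LEAF, LEAF))


def rewriting_rules(root):
    """Return one rewriting rule per node, keyed by node name."""
    rules = {}

    def visit(node, name):
        left, right = node
        prefix = "T" if name == "P" else name   # children of P are TL and TR
        parts = []
        if left is not None:
            parts.append("[-F" + prefix + "L]")
        if right is not None:
            parts.append("[+F" + prefix + "R]")
        rules[name] = "".join(parts) or "null"  # a node without children rewrites to null
        if left is not None:
            visit(left, prefix + "L")
        if right is not None:
            visit(right, prefix + "R")

    visit(root, "P")
    return rules


rules = rewriting_rules(TREE)
for name, rhs in rules.items():
    print(name, "→", rhs)
```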

Bracketed strings of tree

Bracketed string of each node, by level:

[FP]
[-FTL] [+FTR]
[-FTRL] [+FTRR]

Bracketed string of the whole tree:

[FP[-FTL][+FTR[-FTRL][+FTRR]]]
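Building the bracketed string of a whole tree is a simple recursion over the nodes. As before, this is a sketch under assumptions: nodes are `(left, right)` tuples with `None` for a missing child, and the function name `bracketed` is illustrative.

```python
LEAF = (None, None)
TREE = (LEAF, (LEAF, LEAF))   # P -> TL, TR; TR -> TRL, TRR


def bracketed(node, name="P", turn=""):
    """Return the bracketed string of the subtree rooted at `node`.

    `turn` is "-" for a left branch, "+" for a right branch, "" for the root.
    """
    left, right = node
    prefix = "T" if name == "P" else name
    inner = ""
    if left is not None:
        inner += bracketed(left, prefix + "L", "-")
    if right is not None:
        inner += bracketed(right, prefix + "R", "+")
    return "[" + turn + "F" + name + inner + "]"


whole_tree = bracketed(TREE)
print(whole_tree)
```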

Context-free grammar

[FP[-FTL][+FTR[-FTRL][+FTRR]]]

P → [-FTL][+FTR]
TR → [-FTRL][+FTRR]
TL → null
TRL → null
TRR → null

• Every non-terminal node can be rewritten as: P→LR

Abbreviation

[FP[-FTL[-FTLL][+FTLR]][+FTR[-FTRL][+FTRR[-FTRRL][+FTRRR[-FTRRRL]]]]]

Rule → abbreviated rule:

P → [-FTL][+FTR] → [-F][+F]
TL → [-FTLL][+FTLR] → [-F][+F]
TR → [-FTRL][+FTRR] → [-F][+F]
TRR → [-FTRRL][+FTRRR] → [-F][+F]
TRRR → [-FTRRRL] → [-F]
TLL → null → null
TLR → null → null
TRL → null → null
TRRL → null → null
TRRRL → null → null
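The abbreviation step amounts to dropping the node names from each right-hand side, so that rules with the same shape become identical. A minimal sketch, assuming node names always have the form T followed by L and R letters; the name `abbreviate` is illustrative.

```python
import re
from collections import Counter

RULES = {
    "P": "[-FTL][+FTR]",
    "TL": "[-FTLL][+FTLR]",
    "TR": "[-FTRL][+FTRR]",
    "TRR": "[-FTRRL][+FTRRR]",
    "TRRR": "[-FTRRRL]",
    "TLL": "null",
    "TLR": "null",
    "TRL": "null",
    "TRRL": "null",
    "TRRRL": "null",
}


def abbreviate(rhs):
    """Remove node names (T followed by L/R letters) from a rule's right-hand side."""
    return re.sub(r"T[LR]+", "", rhs)


abbreviated = {name: abbreviate(rhs) for name, rhs in RULES.items()}
shape_counts = Counter(abbreviated.values())
print(shape_counts)
```

Counting the abbreviated forms shows how ten rules collapse into three shapes.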

Classification

• Reason
• There are too many rules.
• Some of them are similar to each other.

P → [-FTL][+FTR] → [-F][+F]
TL → [-FTLL][+FTLR] → [-F][+F]
TR → [-FTRL][+FTRR] → [-F][+F]
TRR → [-FTRRL][+FTRRR] → [-F][+F]
TRRR → [-FTRRRL] → [-F]
TLL → null → null
TLR → null → null
TRL → null → null
TRRL → null → null
TRRRL → null → null

Classification method 1

•Homomorphism

Isomorphism 

Classification method 2

• Isomorphism
• Level 0
• Level 1
• Level 2

Classification

•Combine homomorphism and isomorphism

Rule → abbreviated rule → class rule (count in parentheses):

P → [-FTL][+FTR] → [-F][+F]         (1) Class 3 → C3C3
TL → [-FTLL][+FTLR] → [-F][+F]      (1) Class 3 → C1C1
TR → [-FTRL][+FTRR] → [-F][+F]      (1) Class 3 → C1C3
TRR → [-FTRRL][+FTRRR] → [-F][+F]   (1) Class 3 → C1C2
TRRR → [-FTRRRL] → [-F]             (1) Class 2 → C1
TLL, TLR, TRL, TRRL, TRRRL → null   (5) Class 1 → null

Class 3 has 4 rules in total, Class 2 has 1, and Class 1 has 5.
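The class rules and their counts can be reproduced with a small sketch. The definition of the levels used here is an assumption, chosen because it matches the counts on the slides: at level 0 two nodes are in the same class when their rules have the same shape, and at level k their children must also agree at level k-1. The tree encoding (`(left, right)` tuples, `None` for a missing child), the class numbering, and the function names are illustrative.

```python
from collections import Counter

LEAF = (None, None)
TRRR = (LEAF, None)      # TRRR -> [-FTRRRL]
TRR = (LEAF, TRRR)
TR = (LEAF, TRR)
TL = (LEAF, LEAF)
P = (TL, TR)


def signature(node, level):
    """Shape of the subtree below `node`, looked at `level` steps deeper than its own rule."""
    if node is None:
        return None
    left, right = node
    if level == 0:
        return (left is not None, right is not None)
    return (signature(left, level - 1), signature(right, level - 1))


def nodes(node):
    """Yield every node of the tree, parents before children."""
    if node is None:
        return
    yield node
    for child in node:
        yield from nodes(child)


def classify(root, level):
    """Count class rules (class, child classes) at the given level."""
    signatures = sorted({signature(n, level) for n in nodes(root)}, key=repr)
    class_of = {sig: index + 1 for index, sig in enumerate(signatures)}
    counts = Counter()
    for n in nodes(root):
        lhs = class_of[signature(n, level)]
        rhs = tuple(class_of[signature(c, level)] for c in n if c is not None)
        counts[(lhs, rhs)] += 1
    return counts


level0 = classify(P, 0)
for (lhs, rhs), count in sorted(level0.items()):
    body = "".join(f"C{c}" for c in rhs) or "null"
    print(f"({count}) Class {lhs} → {body}")
```

Raising `level` splits the classes further, which is how the finer classifications in the later tables arise under this reading.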

Complexity formula (2011)

String to context-free grammar

[FP[-FTL][+FTR[-FTRL][+FTRR]]]

V1 → V2V3V4

V2 → V2V3

V3 → V1

V4 → V3V2V3

Deconstruction procedure

Symbol sequence → Tree → Context-free grammar (bracketed strings) → Classification (levels) → Complexity

Psychological complexity

Complexity of Music (2011)

One musical note can be divided into two or three sub-units.

A half note can be rewritten into different notes.
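As a small illustration of the subdivision idea, the sketch below splits a note's duration into two or three equal sub-units. Representing durations as fractions of a whole note, and the name `subdivide`, are assumptions made for this example only.

```python
from fractions import Fraction


def subdivide(duration, parts):
    """Split one note duration into 2 or 3 equal sub-units."""
    if parts not in (2, 3):
        raise ValueError("a note is divided into two or three sub-units")
    return [duration / parts] * parts


HALF_NOTE = Fraction(1, 2)               # durations as fractions of a whole note
two_quarters = subdivide(HALF_NOTE, 2)   # half note -> two quarter notes
triplet = subdivide(HALF_NOTE, 3)        # half note -> three equal notes
# Rewrite only the first quarter note again: eighth, eighth, quarter.
mixed = subdivide(two_quarters[0], 2) + two_quarters[1:]
print(two_quarters, triplet, mixed)
```

Each rewriting keeps the total duration unchanged, so repeated subdivision naturally yields a rhythmic tree.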

Musical tree of Beethoven's Piano Sonata No. 6, Mov. 3.

Music tree of Rachmaninoff's Piano Concerto No. 3, Mov.

Bracketed strings for two trees.

Bracketed string of Beethoven's Piano Sonata No. 6, Mov. 3.

Bracketed strings for each node of the rhythmic tree in Beethoven's Piano Sonata No. 6, Mov. 3. (2 bracketed strings omitted)

Bracketed string of Rachmaninoff's Piano Concerto No. 3, Mov. 1.

Mozart's 19 Piano Sonatas, using isomorphic level 1

Mozart's 19 Piano Sonatas, using isomorphic level 2

Mozart's 19 Piano Sonatas, using isomorphic level 3

Beethoven's 32 Piano Sonatas, using isomorphic level 1

Beethoven's 32 Piano Sonatas, using isomorphic level 2

Beethoven's 32 Piano Sonatas, using isomorphic level 3

Complexity of DNA sequence

(2013)

Computation procedure

DNA sequence → DNA tree → Context-free grammar → Classification → Complexity

Tree representation

AATTCCGGACTGCAGT ?

Tree representation

A C T G

Building tree

A A T T C C G G A C T G C A G T

A C T G

Classification table

Classification of Rules   Isomorphic Level #0   Isomorphic Level #1
Class #1                  (19) C1 → C1C1        ( 8) C1 → C1C1
                          ( 4) C1 → C1C2        ( 1) C1 → C1C3
                          ( 4) C1 → C2C1        ( 1) C1 → C2C2
                          (20) C1 → C2C2        ( 1) C1 → C2C4
                                                ( 1) C1 → C3C1
                                                ( 1) C1 → C3C3
                                                ( 1) C1 → C4C2
                                                ( 5) C1 → C4C4
Class #2                  (48) C2 → null        ( 4) C2 → C4C5
Class #3                                        ( 4) C3 → C5C4
Class #4                                        (20) C4 → C5C5
Class #5                                        (48) C5 → null

Classification of Rules   Count   Isomorphic Depth #1
Class #1                  19      ( 8) C1 → C1C1
                                  ( 1) C1 → C1C3
                                  ( 1) C1 → C2C2
                                  ( 1) C1 → C2C4
                                  ( 1) C1 → C3C1
                                  ( 1) C1 → C3C3
                                  ( 1) C1 → C4C2
                                  ( 5) C1 → C4C4
Class #2                  4       ( 4) C2 → C4C5
Class #3                  4       ( 4) C3 → C5C4
Class #4                  20      (20) C4 → C5C5
Class #5                  48      (48) C5 → null

Complexity

V5(z) = 1 (definition)

V4(z) = (z × (20 × V5(z) × V5(z))) / 20 = z

V3(z) = (z × ( 4 × V5(z) × V4(z))) / 4 = z^2

V2(z) = (z × ( 4 × V4(z) × V5(z))) / 4 = z^2

V1(z) = (z × (( 8 × V1(z) × V1(z)) +
             ( 1 × V1(z) × V3(z)) +
             ( 1 × V2(z) × V2(z)) +
             ( 1 × V2(z) × V4(z)) +
             ( 1 × V3(z) × V1(z)) +
             ( 1 × V3(z) × V3(z)) +
             ( 1 × V4(z) × V2(z)) +
             ( 5 × V4(z) × V4(z)))) / 19
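The equations above can be evaluated numerically for a given z. The sketch below uses the rule counts from the table and a plain fixed-point iteration; the starting values (all 1), the iteration count, and the names `RULES` and `evaluate` are assumptions. How the final complexity value is obtained from V1(z) is not stated in the transcript, so the sketch stops at evaluating the V functions.

```python
# class -> list of (count, (left child class, right child class))
RULES = {
    1: [(8, (1, 1)), (1, (1, 3)), (1, (2, 2)), (1, (2, 4)),
        (1, (3, 1)), (1, (3, 3)), (1, (4, 2)), (5, (4, 4))],
    2: [(4, (4, 5))],
    3: [(4, (5, 4))],
    4: [(20, (5, 5))],
}
NULL_CLASS = 5   # C5 -> null, with V5(z) = 1 by definition


def evaluate(z, iterations=200):
    """Iterate Vi(z) = z * sum(n * Va(z) * Vb(z)) / (rules in class i)."""
    values = {c: 1.0 for c in RULES}
    values[NULL_CLASS] = 1.0
    for _ in range(iterations):
        updated = {NULL_CLASS: 1.0}
        for cls, rules in RULES.items():
            total = sum(count for count, _ in rules)
            weighted = sum(count * values[a] * values[b] for count, (a, b) in rules)
            updated[cls] = z * weighted / total
        values = updated
    return values


values = evaluate(0.5)
for cls in sorted(values):
    print(f"V{cls}(0.5) = {values[cls]:.6f}")
```

For z = 0.5 the iteration settles quickly, and V4, V3, and V2 agree with the closed forms z and z^2 given above.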

Ebola virus

[Figure: Ebola virus; plotted series: Iso 2, frag 64 and Iso 2, frag 32.]

Complexity of H7N9 PB1-F2

[Figure: horizontal axis from 1 to 6; vertical axis from 0.95 to 0.959; series: 32AA, 64AA.]

Complexity of text sequence

Letters plus the space character are represented by the numbers 1 to 27, using 5 bits each. (BIN)

Constructing binary tree.
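The 5-bit coding described above can be sketched as follows. The exact assignment is an assumption (a to z mapped to 1 to 26 and the space to 27), as are the choices to ignore case and to skip all other characters; the names `to_bits` and `segments` are illustrative.

```python
import string

# Assumption: a..z -> 1..26 and space -> 27.
ALPHABET = string.ascii_lowercase + " "


def to_bits(text):
    """Encode text as a bit string, 5 bits per letter or space."""
    codes = []
    for ch in text.lower():
        if ch in ALPHABET:   # other characters are skipped (assumption)
            codes.append(format(ALPHABET.index(ch) + 1, "05b"))
    return "".join(codes)


def segments(bits, size):
    """Split a bit string into consecutive full segments of `size` bits."""
    return [bits[i:i + size] for i in range(0, len(bits) - size + 1, size)]


bits = to_bits("When in the Course of human events")
print(len(bits), bits[:20])
```

Each segment returned by `segments` is then a candidate input for the tree-building step, for example with sizes of 256, 512, or 1024 bits.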

Building tree for text sequence

00 00 10 10 01 01 11 11 00 01 10 11 01 00 11 10

00 01 10 11

Procedure

Text sequence → Tree structure → Rewriting rules → Classification → Complexity

Complexity of “Declaration of Independence”

Calculated every 256 bits. (July 4, 1776)

[Figure: complexity of each 256-bit segment; horizontal axis from 1 to 153; vertical axis from 0.99785 to 0.99815; series: BIN.]

Complexity of “Declaration of Independence”

Calculated every 512 bits. (July 4, 1776)

[Figure: complexity of each 512-bit segment; horizontal axis from 1 to 77; vertical axis from 0.99785 to 0.99815; series: BIN.]

Complexity of “Declaration of Independence”

Calculated every 1024 bits. (July 4, 1776)

[Figure: complexity of each 1024-bit segment; horizontal axis from 1 to 38; vertical axis from 0.999485 to 0.999525; series: BIN.]

Dream of the Red Chamber (紅樓夢), 1754?

Unicode + ASCII

32 bits for each character and punctuation mark

Complexity (tree) for each 1024 bits
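A sketch of how text can be turned into 32 bits per character and cut into 1024-bit segments. The slides only say "Unicode + ASCII" with 32 bits per character, so the use of big-endian UTF-32 here is an assumption, and the function names are illustrative.

```python
def to_bits32(text):
    """Encode each character as 32 bits (big-endian UTF-32, no byte-order mark)."""
    data = text.encode("utf-32-be")
    return "".join(format(byte, "08b") for byte in data)


def segments(bits, size=1024):
    """Split a bit string into consecutive full segments of `size` bits."""
    return [bits[i:i + size] for i in range(0, len(bits) - size + 1, size)]


title_bits = to_bits32("紅樓夢")
print(len(title_bits), title_bits[:32])
```

With 32 bits per character, one 1024-bit segment covers 32 characters, so each point in the plots that follow would summarize a short passage of text.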

Complexity of “Dream of the Red Chamber (紅樓夢), chapters 1~10”

Dream of the Red Chamber, 1754, Unicode

[Figure: complexity plot; horizontal axis up to 1986; vertical axis from 0.991 to 0.996; series: 紅樓夢.]

Complexity of “Dream of the Red Chamber (紅樓夢), chapters 11~20”

Dream of the Red Chamber, 1754

[Figure: complexity plot; horizontal axis up to 1862; vertical axis from 0.991 to 0.996; series: 紅樓夢, chapters 11~20.]

Complexity of “Dream of the Red Chamber (紅樓夢), chapters 21~30”

[Figure: complexity plot; horizontal axis up to 2342; vertical axis from 0.991 to 0.996; series: 紅樓夢.]

Complexity of “Dream of the Red Chamber (紅樓夢), chapters 31~40”

[Figure: complexity plot; horizontal axis up to 2162; vertical axis from 0.991 to 0.996; series: 紅樓夢.]

Complexity of “Dream of the Red Chamber (紅樓夢), chapters 41~50”

[Figure: complexity plot; horizontal axis up to 2172; vertical axis from 0.991 to 0.996; series: 紅樓夢.]

Complexity of “Dream of the Red Chamber (紅樓夢), chapters 51~60”

[Figure: complexity plot; horizontal axis up to 2327; vertical axis from 0.991 to 0.996; series: 紅樓夢.]

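Low complexity sections in Dream of the Red Chamber (紅樓夢)

Chapter 1: 便是『了』,『了』便是『好』;若不『了』便不『好』;若要『好』, (wordplay that keeps repeating 了, "done", and 好, "good")

Chapter 5: 「癡情司」,「結怨司」,「朝啼司」,「暮哭司」,「春感司」,「 (a list of department names, each ending in 司)

Chapter 13: 、賈敕、賈效、賈敦、賈赦、賈政、賈琮、賈㻞、賈珩、賈珖、賈琛、賈 (a list of names from the Jia family)

Chapter 40: 梅花式的,也有荷葉式的,也有葵花式的,也有方的,也有圓的,其式不 (a list of shapes: plum blossom, lotus leaf, sunflower, square, round)

Chapter 54 (lowest complexity): 、太婆婆、媳婦、孫子媳婦、重孫子媳婦、親孫子媳婦、姪孫子、重孫子 (a list of kinship terms)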

Quasi-regular structure

To our knowledge, no other method can pick out such quasi-regular sections in art, music, DNA, literature, and transmission bits ...

Complexity of “Romance of the Three Kingdoms (三國演義), chapters 1~10”

[Figure: complexity plot; horizontal axis up to 1441; vertical axis from 0.9925 to 0.996; series: 三國演義.]

Complexity of “Romance of the Three Kingdoms (三國演義), chapters 11~20”

[Figure: complexity plot; horizontal axis up to 1682; vertical axis from 0.9925 to 0.996; series: 三國演義.]

Complexity of “Romance of the Three Kingdoms (三國演義), chapters 21~30”

[Figure: complexity plot; horizontal axis up to 1502; vertical axis from 0.9925 to 0.996; series: 三國演義.]

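Low complexity sections in Romance of the Three Kingdoms (三國演義)

Chapter 20: 劉昂。昂生漳侯劉祿。祿生沂水侯劉戀。戀生欽陽侯劉英。英生安國侯劉 (a genealogy in the repeated form "X begat Y")

Chapter 22: 常之人,然後有非常之事;有非常之事,然後立非常之功。夫非常者,固 (parallel clauses repeating 非常, "extraordinary")

Chapter 23: 也;不讀詩書,是口濁也;不納忠言,是耳濁也;不通古今,是身濁也; (parallel clauses of the form "not doing X is a foul Y")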

Summary

Representation is not unique.

Study of ancient languages.

Transmission anomaly

Different from Kullback-Leibler divergence

Measure of structural complexity.

Thanks for listening.
