Finite State Automata and Tries
Sambhav Jain, IIIT Hyderabad
Think!
• How do we store a dictionary in a computer?
• How do we search for an entry in that dictionary?
– Suppose every word is exactly 10 characters long and each character can be any letter from 'a' to 'z'
E.g. aaaaaaaaaa, abcdefghij, etc.; the language is given by the regular expression [a-z]{10}
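Membership in this language can be tested by matching a word against the regular expression itself. A minimal sketch using Python's `re.fullmatch`, which anchors the pattern at both ends; the helper name `in_language` is ours, for illustration only.

```python
import re

# The language [a-z]{10}: exactly ten lowercase letters from a to z.
LANGUAGE = re.compile(r"[a-z]{10}")

def in_language(word: str) -> bool:
    # fullmatch must consume the whole string, so 9 or 11 letters fail
    return LANGUAGE.fullmatch(word) is not None

print(in_language("abcdefghij"))  # True
print(in_language("abcdefghi"))   # False (only 9 characters)
print(in_language("Abcdefghij"))  # False (uppercase A)
```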
A Simple Way
• aaaaaaaaaa
• aaaaaaaaab
• aaaaaaaaac
• …
• zzzzzzzzzz
A Linear Sorted List of Entries
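The slide does not say how the sorted list is searched; binary search is the natural choice and is assumed here. Over 26^10 entries it needs about 10 · log2(26) ≈ 47 comparisons, but the whole list must still be stored. A minimal sketch with Python's `bisect` module; the function name `contains` and the four-word list are illustrative stand-ins.

```python
from bisect import bisect_left

def contains(sorted_words: list[str], word: str) -> bool:
    """Binary search in a sorted list: O(log n) comparisons."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

# Tiny stand-in for the full list (the real one would hold 26**10 entries).
words = sorted(["aaaaaaaaaa", "aaaaaaaaab", "aaaaaaaaac", "zzzzzzzzzz"])
print(contains(words, "aaaaaaaaab"))  # True
print(contains(words, "abcdefghij"))  # False
```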
Words to be stored = 26^10 ≈ 1.41167096 × 10^14
Each word has 10 characters and each character takes 1 byte
Total ≈ 10 × 1.41 × 10^14 bytes ≈ 1.41 × 10^15 bytes, i.e. about 1.4 PB
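The storage estimate can be checked directly. This sketch counts the words in [a-z]{10} and the bytes needed at one byte per character, ten characters per word; the constant names are ours.

```python
WORD_LENGTH = 10
ALPHABET_SIZE = 26
BYTES_PER_CHAR = 1

num_words = ALPHABET_SIZE ** WORD_LENGTH           # every string in [a-z]{10}
total_bytes = num_words * WORD_LENGTH * BYTES_PER_CHAR

print(num_words)                           # 141167095653376
print(total_bytes)                         # 1411670956533760
print(f"{total_bytes / 10**15:.2f} PB")    # 1.41 PB
```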
Smart Way!
[Figure: one level of nodes per character position, each level holding the letters a, b, c, d, …, w, x, y, z]
• Total storage = 26 × 10 = 260 bytes
• A lookup traverses only 10 nodes
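A sketch of this idea, assuming the figure shows one level per character position with the 26 letters stored once per level. It works only because every combination of letters is a valid word in [a-z]{10}; a real dictionary needs the branching trie from the following slides. The names `levels` and `contains` are ours.

```python
from string import ascii_lowercase

WORD_LENGTH = 10

# One level per character position; each level stores the 26 letters once.
levels = [frozenset(ascii_lowercase) for _ in range(WORD_LENGTH)]

def contains(word: str) -> bool:
    if len(word) != len(levels):
        return False
    # Exactly one level is visited per character: 10 steps for any query.
    return all(ch in level for ch, level in zip(word, levels))

print(sum(len(level) for level in levels))  # 260
print(contains("abcdefghij"))  # True
print(contains("abcdefghi1"))  # False
```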
Does it work for natural language?
• Oxford Advanced English Learner 20th Edition: a quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED
• What about inflections? – eat, eats, eaten, eating, …
• And what about multiple affixes? – beauty, beautiful, beautifully, …
Example (Store & Search)
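[Figure: a character trie storing eat, eats, eaten and eating; the prefix e-a-t is stored once and branches into s, e-n and i-n-g]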
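A minimal character trie, assuming the figure stores eat, eats, eaten and eating: the shared prefix e-a-t is stored once, and a search walks one node per character. The class and method names are illustrative, not taken from the slides.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a stored word ends at this node

class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Shared prefixes (e.g. "eat") reuse the existing nodes.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def node_count(self) -> int:
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count

trie = Trie()
for w in ["eat", "eats", "eaten", "eating"]:
    trie.insert(w)

print(trie.search("eaten"))  # True
print(trie.search("eate"))   # False: a prefix, not a stored word
print(trie.node_count())     # 10: the root plus 9 letter nodes
```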
Example
[Figure: the same trie, with the letter b added]
Example
[Figure: the trie with more letters added: b, f, s, a]
Example
[Figure: the trie with more letters added: b, f, s, a, t, e, i, r, w]
Inflectional morphology
• Deals with the word forms of a root that do not change its lexical category.
• Each word form carries different values of features such as gender, number, person, etc.
Paradigm
• For a given root, there are many word forms with different features.
• E.g. the forms of the Hindi root laDakA (boy):
            Direct    Oblique
  Singular  laDakA    laDake
  Plural    laDake    laDakoM
Paradigm
- 'laDakoM' is plural in the oblique case, given by the feature structure {num=pl, case=obl}
- 'laDake' stands for two feature structures:
  + singular oblique {num=sg, case=obl} (e.g. laDake ne kahA ...), where oblique means that 'laDake' is followed by a postposition
  + plural direct {num=pl, case=dir} (e.g. laDake Aye)
Paradigm
o Paradigms
  - Describe what operation is applied to a root to obtain each word form
  - Modelled using pairs (delete string, add string), with 0 meaning the empty string:
          direct    oblique
    sg    (0,0)     (A,e)
    pl    (A,e)     (A,oM)
o List the roots with the paradigm each follows:
  - ghoDA follows paradigm laDakA
  - charkhA follows paradigm laDakA
  - laDakA follows paradigm laDakA
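The (delete string, add string) model can be sketched as a table lookup. Everything here is a hypothetical encoding: `PARADIGMS`, `ROOT_PARADIGM`, `generate` and `analyze` are our names, and the empty string stands for the "no change" cell of the singular direct form. Analysis is done naively by generating every form and comparing, which is enough to show that laDake receives two feature structures.

```python
# Hypothetical encoding of the laDakA paradigm: (number, case) -> (delete, add).
PARADIGMS = {
    "laDakA": {
        ("sg", "dir"): ("", ""),
        ("sg", "obl"): ("A", "e"),
        ("pl", "dir"): ("A", "e"),
        ("pl", "obl"): ("A", "oM"),
    }
}

# Roots listed with the paradigm they follow.
ROOT_PARADIGM = {"laDakA": "laDakA", "ghoDA": "laDakA", "charkhA": "laDakA"}

def generate(root: str) -> dict[tuple[str, str], str]:
    """Apply each (delete, add) pair of the root's paradigm to the root."""
    forms = {}
    for features, (delete, add) in PARADIGMS[ROOT_PARADIGM[root]].items():
        if not root.endswith(delete):
            raise ValueError(f"{root} does not end in {delete!r}")
        stem = root[: len(root) - len(delete)]  # safe when delete == ""
        forms[features] = stem + add
    return forms

def analyze(word: str) -> list[tuple[str, tuple[str, str]]]:
    """Invert generation: every (root, features) pair that yields `word`."""
    return [
        (root, features)
        for root in ROOT_PARADIGM
        for features, form in generate(root).items()
        if form == word
    ]

print(generate("ghoDA"))
# {('sg', 'dir'): 'ghoDA', ('sg', 'obl'): 'ghoDe', ('pl', 'dir'): 'ghoDe', ('pl', 'obl'): 'ghoDoM'}
print(analyze("laDake"))
# [('laDakA', ('sg', 'obl')), ('laDakA', ('pl', 'dir'))]
```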
[Figure: character tries for words starting l-a-D-a-k-… and k-a-p-a-D-…, with each inflected ending (A, e, oM, I, …) stored as a separate branch]
Abstracting out suffixes
[Figure: the same tries with the endings abstracted out: the stems k-a-p-a-D and l-a-D-a-k end in the paradigm label #1 (l-a-D-a-k also continues to I)]
#1: corresponds to the paradigm of 'laDakA'
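A sketch of the abstracted trie, assuming stems are stored without their endings and marked with a paradigm label, while the suffixes of each paradigm (A, e, oM for #1) live in a separate table. The stems ghoD and charkh come from the roots listed as following paradigm laDakA; the data layout and function names are our own choices.

```python
# Suffix table for paradigm #1: ending -> feature structures it expresses.
SUFFIXES = {
    "#1": {"A": [("sg", "dir")], "e": [("sg", "obl"), ("pl", "dir")], "oM": [("pl", "obl")]},
}

def build_stem_trie(stems: dict[str, str]) -> dict:
    """stems maps a stem (root minus its ending) to its paradigm label."""
    trie: dict = {}
    for stem, label in stems.items():
        node = trie
        for ch in stem:
            node = node.setdefault(ch, {})
        node["#"] = label  # the "#" key marks the end of a stem
    return trie

def analyze(trie: dict, word: str) -> list[tuple[str, str, tuple[str, str]]]:
    """Walk the stem trie; wherever a stem ends, try that paradigm's endings."""
    results = []
    node = trie
    for i in range(len(word) + 1):
        label = node.get("#")
        if label is not None:
            ending = word[i:]
            for features in SUFFIXES[label].get(ending, []):
                results.append((word[:i], ending, features))
        if i == len(word):
            break
        node = node.get(word[i])
        if node is None:
            break
    return results

trie = build_stem_trie({"laDak": "#1", "ghoD": "#1", "charkh": "#1"})
print(analyze(trie, "ghoDoM"))  # [('ghoD', 'oM', ('pl', 'obl'))]
print(analyze(trie, "laDake"))  # [('laDak', 'e', ('sg', 'obl')), ('laDak', 'e', ('pl', 'dir'))]
```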
- Suffix trie (forward) for #1: from #1 the trie branches to e, o and A, and o continues to M, storing the suffixes e, oM and A
• Can we optimize our search further?
  - Use knowledge of paradigms
  - Use a suffix tree
• Store the suffix tree in main memory
• Store the rest (the roots, categorized by paradigm) on the hard disk
• Search the suffix tree backward, starting from the end of the word
• Identify the paradigm
• Search only within that paradigm's set of roots
• E.g. if the word ends in '-ing', you never search among words like home, cat, god, …
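A minimal sketch of the backward search under stated assumptions: the paradigm labels `N1` and `V1`, their suffix lists and the tiny root sets are hypothetical, and an in-memory dict stands in for the per-paradigm files on disk. The word is scanned from its last character through a trie of reversed suffixes; each suffix ending at a node names the paradigms whose root sets are then consulted. Because the empty suffix belongs to both paradigms, the bare word is still checked against each set, but an -ing ending only ever reaches V1's roots.

```python
# Hypothetical paradigms: each label lists the suffixes it allows.
PARADIGM_SUFFIXES = {
    "N1": ["", "s"],               # nouns such as home, cat, god
    "V1": ["", "s", "ed", "ing"],  # verbs such as walk, play
}
# In-memory stand-in for the per-paradigm root sets kept on disk.
STEMS_BY_PARADIGM = {
    "N1": {"home", "cat", "god"},
    "V1": {"walk", "play"},
}

def build_reversed_suffix_trie(paradigms: dict[str, list[str]]) -> dict:
    """Trie over reversed suffixes; the "$" key lists the paradigms ending there."""
    trie: dict = {}
    for label, suffixes in paradigms.items():
        for suffix in suffixes:
            node = trie
            for ch in reversed(suffix):
                node = node.setdefault(ch, {})
            node.setdefault("$", []).append(label)
    return trie

def analyze(word: str, trie: dict, stems_by_paradigm: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    results = []
    node = trie
    i = len(word)  # backward search: start after the last character
    while node is not None:
        # For each paradigm with a suffix ending here, consult only its root set.
        for label in node.get("$", []):
            stem = word[:i]
            if stem in stems_by_paradigm[label]:
                results.append((stem, word[i:], label))
        if i == 0:
            break
        i -= 1
        node = node.get(word[i])
    return results

trie = build_reversed_suffix_trie(PARADIGM_SUFFIXES)
print(analyze("walking", trie, STEMS_BY_PARADIGM))  # [('walk', 'ing', 'V1')]
print(analyze("cats", trie, STEMS_BY_PARADIGM))     # [('cat', 's', 'N1')]
```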
Finite State Automata
• A trie is a data structure
• An FSA is the corresponding computational approach
• The representations differ slightly: an FSA puts the characters on edges rather than on nodes
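[Figure: the same data as an FSA with characters on the edges: from the start state, the paths l-a-D-a-k and k-a-p-a-D each reach an edge labelled 0 leading to a shared state, which branches on e, o and A into final states marked (+); o continues with M]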
FSA
o A deterministic finite-state machine is formally defined by:
  - Q: a finite set of states (e.g. {q0, q1, q2})
  - Σ: a finite input alphabet (e.g. {a, b, c})
  - q0: the start state, a state in Q from which the machine starts
  - F: a set of accepting states, F ⊆ Q (e.g. {q2})
  - δ(q, i): a transition function (or transition matrix), where q ∈ Q, i ∈ Σ and δ(q, i) ∈ Q
Thus δ: Q × Σ → Q
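The formal definition maps almost one-to-one onto a small Python class. The transition table below is a hypothetical example over Q = {q0, q1, q2}, Σ = {a, b, c}, F = {q2}; it accepts ab followed by any number of c's. Missing table entries are treated as moves to an implicit dead (rejecting) state, a common shorthand for a total δ.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: frozenset[str]             # Q
    alphabet: frozenset[str]           # Sigma
    start: str                         # q0, the start state
    accepting: frozenset[str]          # F, a subset of Q
    delta: dict[tuple[str, str], str]  # delta: Q x Sigma -> Q

    def accepts(self, text: str) -> bool:
        """Run the machine; accept iff it ends in a state of F."""
        state = self.start
        for symbol in text:
            if symbol not in self.alphabet:
                return False
            nxt = self.delta.get((state, symbol))
            if nxt is None:  # missing entry: implicit dead state
                return False
            state = nxt
        return state in self.accepting

# Hypothetical example machine: accepts "ab" followed by any number of "c".
m = DFA(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"a", "b", "c"}),
    start="q0",
    accepting=frozenset({"q2"}),
    delta={("q0", "a"): "q1", ("q1", "b"): "q2", ("q2", "c"): "q2"},
)
print(m.accepts("ab"))    # True
print(m.accepts("abcc"))  # True
print(m.accepts("ba"))    # False
```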
RECOGNITION Problem
• So far we have handled only the RECOGNITION problem
• If the FSA is in a final state at the end of the input string, the word EXISTS
• Otherwise it does NOT
• But we want an analyzed output
• We want the machine to tell us the
  – root
  – gender
  – number
  – person
  – case
  – etc.
Finite State Transducer
An FST is like the finite state automaton defined earlier, except that each arc is labelled by a pair of symbols i:o, where
  i: the symbol in the input string
  o: the symbol output by the FST when the arc is taken
+ E.g. the FST arc corresponding to 'e' in 'laDake':
  q1 --- e : ((+sg, -dir), (+pl, +dir)) ---> q2
  Here the pair i:o has i = 'e' and o = '((+sg, -dir), (+pl, +dir))', i.e. singular oblique or plural direct
+ E.g. in a morph analyzer: match the input against i; if it matches, take the arc and produce o in the output
o Formally, a finite state transducer consists of:
  - Q: a finite set of states q0, ..., qN
  - Σ_in: a finite set of input symbols
  - Σ_out: a finite set of output symbols
  - q0: the start state (q0 ∈ Q)
  - F: the set of final (accepting) states (F ⊆ Q)
  - δ(q, i:o): for every state q, the set of states that can be reached from q by an arc labelled i:o, with i ∈ Σ_in and o ∈ Σ_out
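A minimal sketch of a morphological analyzer built as an FST, assuming the laDakA paradigm from the earlier slides. States are strings, each arc carries an input symbol and an output string, and "" on the output side emits nothing (an ε output). Instead of one arc with a two-part output for 'e', this sketch uses two arcs with the same input, so δ(q, i:o) naturally yields a set of states and the ambiguity shows up as two paths; all names are ours.

```python
from collections import defaultdict

Arc = tuple[str, str, str]  # (input symbol, output string, next state)

def build_analyzer() -> tuple[dict[str, list[Arc]], str, set[str]]:
    """Hypothetical FST for the stem laDak followed by paradigm #1 endings."""
    arcs: dict[str, list[Arc]] = defaultdict(list)
    stem = "laDak"
    # Stem: each arc copies its input character to the output (c:c).
    for k, ch in enumerate(stem):
        arcs[f"s{k}"].append((ch, ch, f"s{k + 1}"))
    last = f"s{len(stem)}"
    # Endings: consume the suffix, output the root's final A plus features.
    arcs[last].append(("A", "A+sg+dir", "end"))
    arcs[last].append(("e", "A+sg+obl", "end"))  # 'e' is ambiguous, so two
    arcs[last].append(("e", "A+pl+dir", "end"))  # arcs share the input symbol
    arcs[last].append(("o", "", "o1"))           # o:epsilon ...
    arcs["o1"].append(("M", "A+pl+obl", "end"))  # ... then M:features
    return arcs, "s0", {"end"}

def transduce(arcs: dict[str, list[Arc]], start: str, finals: set[str], word: str) -> list[str]:
    """Follow every path whose input labels spell `word`; collect the outputs."""
    results = []
    stack = [(start, 0, "")]  # (state, position in word, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word):
            if state in finals:
                results.append(out)
            continue
        for i, o, nxt in arcs.get(state, []):
            if i == word[pos]:
                stack.append((nxt, pos + 1, out + o))
    return sorted(results)

arcs, start, finals = build_analyzer()
print(transduce(arcs, start, finals, "laDake"))   # ['laDakA+pl+dir', 'laDakA+sg+obl']
print(transduce(arcs, start, finals, "laDakoM"))  # ['laDakA+pl+obl']
```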
Example
• on board
Tools for FSA
• Lex
• OpenFST (www.openfst.org/)
• AT&T FSM Toolkit (http://www2.research.att.com/~fsmtools/fsm/)