Order Independent Incremental Evolving Fuzzy Grammar Fragment Learner
ISDA 09
ORDER INDEPENDENT INCREMENTAL EVOLVING FUZZY GRAMMAR FRAGMENT LEARNER
•Nurfadhlina Mohd Sharef, Department of Computer Science, Faculty of Computer Science and Information Technology, University Putra Malaysia, [email protected], [email protected]
•Trevor Martin, •Yun Shen, Artificial Intelligence Group, Intelligent System Lab, University of Bristol, Bristol, United Kingdom, [email protected], [email protected]
OUTLINE
•Introduction
•Text Fragment Learning
•Fuzzy Grammar Fragment
•Order Independent
** Source from World Incidents Tracking System
Humans are able to recognize a class without following a strict pattern
SUMMARY EXAMPLES
“On 27 May 2004, in Nadapuram, Kerala, India, a bomb exploded at a store, causing moderate damage to the establishment but no casualties. No group claimed responsibility.”
“On 27 May 2004, at about 7:30 AM, in the Tanahu District, Nepal, a press vehicle ran over a landmine, killing the driver and injuring two passengers. No group claimed responsibility, although it is widely believed the Communist Party of Nepal (Maoist)/United People's Front was responsible.”
“On 27 May 2004, at night, in Pulwama, Jammu and Kashmir, India, armed assailants fired upon and killed a former Special Police Officer in his home. No group claimed responsibility.”
** Source from World Incidents Tracking System
[Annotations highlighted in the examples: date, time, victim]
TEXT FRAGMENT LEARNING
Text Fragment Examples | XML Tag: Event Type
Bomb exploded | Bombing
Explosion occurred | Bombing
detonated a bomb | Bombing
detonated a timed improvised explosive device | Bombing
Attacks to occur | Armed Attack
Attackers threw a grenade | Armed Attack
Assailants attacked a security vehicle | Armed Attack
Gunmen killed a member | Armed Attack
Goal: learn the underlying grammar patterns of similar texts, exploiting both syntactic and semantic properties
BOMBING TEXT FRAGMENT
“On 1 January 2004, in Srinagar, Jammu and Kashmir, India, an assailant on a bicycle was carrying a bomb, which prematurely exploded, injuring six civilians. No group claimed responsibility.”
“On 27 May 2004, in Nadapuram, Kerala, India, a bomb exploded at a store, causing moderate damage to the establishment but no casualties. No group claimed responsibility.”
“On 2 February 2004, in Khowst, Nader Shah Kot District, Afghanistan, an explosion occurred at a concert, frightening the crowd, but causing no injuries. No group claimed responsibility.”
“On 1 March 2004, in Burgos, Italy, an attacker detonated a bomb outside the apartment home of the mayor of Burgos, killing the official's father. The mayor had been a target of several attacks and threats, including a recent bombing at his mother's grave. No group claimed responsibility.”
** Source from World Incidents Tracking System
Learning Fuzzy Grammar Fragments
“To investigate incremental evolving fuzzy grammar fragment learning with an order-independent feature”
FUZZY GRAMMAR FRAGMENT
Text Fragment | Grammar Fragment
•Bomb exploded | BombType-BombAction
•Explosion occurred | BombAction
•detonated a bomb | BombAction-Article-BombType
•a timed improvised explosive device detonated | Article-BombType-BombAction
•detonated explosives | BombAction-BombType
•bombers threw grenades | CriminalList-anyword-BombType
** Grammar derivation is done using a set of predefined terminal sets of type regular expression, enumeration and compound
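The derivation step noted above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the terminal set names (BombType, BombAction, Article, CriminalList, anyword) come from the slide, but their member words, the Number regular expression, and the function names classify and derive_grammar are assumptions made for the example. Compound terminals are not modelled.

```python
import re

# Hypothetical terminal sets. Names echo the slide; contents are invented.
REGEX_TERMINALS = {"Number": re.compile(r"\d+")}
ENUM_TERMINALS = {
    "Article": {"a", "an", "the"},
    "BombType": {"bomb", "explosives", "grenades"},
    "BombAction": {"exploded", "detonated"},
    "CriminalList": {"bombers", "assailants", "gunmen", "attackers"},
}


def classify(word):
    """Map one word to a terminal name, or to 'anyword' if nothing matches."""
    for name, pattern in REGEX_TERMINALS.items():
        if pattern.fullmatch(word):
            return name
    for name, members in ENUM_TERMINALS.items():
        if word in members:
            return name
    return "anyword"


def derive_grammar(fragment):
    """Turn a text fragment into a hyphen-joined sequence of terminal names."""
    return "-".join(classify(word) for word in fragment.lower().split())


print(derive_grammar("Bomb exploded"))
print(derive_grammar("bombers threw grenades"))
```

Words that belong to no terminal set fall back to anyword, which is how the last slide example ends up with a wildcard in the middle of its grammar.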
•Learn the underlying structure of the data
•Convert the texts into a more structured form
[Diagram: Grammar + Grammar Class]
EVOLVING GRAMMAR ISSUES
[Diagram: text fragments of one Text Class are turned into grammars by Automatic Generation]
Text Fragment 1: Word1 Word2 … Wordn → Grammar 1: Term1 Term2 … Termn
Text Fragment 2: Word1 Word2 … Wordn → Grammar 2: Term1 Term2 … Termn
•How to recognize the fragment?
•How to represent the grammar?
•How to generalize the grammar?
•How to tell whether one grammar is better than another?
GRAMMAR SIMILARITY
Table 1: Example of string edit distance operation (*I: Insert, D: Delete, S: Substitute)
Source string: W E D N E S D A Y
Target string: T U E S D A Y
Edit distance*: S=1 S=1 D=1 D=1 = = = = =

Table 2: Example of grammar edit distance operation (*I: Insert, D: Delete, S: Substitute)
Source grammar, sg: Number Word Word Streetending Placename
Target grammar, tg: Number Placename Streetending Placename Countyname
Edit distance*: = S=1 D=1 = = I=1

Cost(sg,tg) = <I D S Rs Rt> = <1 1 1 null null>
I: Insert; D: Delete; S: Substitute; Rs: remaining in Source; Rt: remaining in Target
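The two tables can be reproduced with an ordinary edit-distance computation that also counts each operation type. The sketch below is an assumption-laden simplification: it uses plain Levenshtein alignment with exact term matching, so it ignores fuzzy membership and the Rs/Rt components, and the function name grammar_edit_cost is hypothetical.

```python
def grammar_edit_cost(source, target):
    """Return (inserts, deletes, substitutes) for turning source into target.

    source and target are sequences of terms (or characters of a string).
    """
    n, m = len(source), len(target)
    # best[i][j] = (total, I, D, S) for transforming source[:i] into target[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)          # delete every source term
    for j in range(1, m + 1):
        best[0][j] = (j, j, 0, 0)          # insert every target term
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, ins, dele, sub = best[i - 1][j - 1]
            if source[i - 1] == target[j - 1]:
                diagonal = (t, ins, dele, sub)              # terms match, no cost
            else:
                diagonal = (t + 1, ins, dele, sub + 1)      # substitute
            t, ins, dele, sub = best[i - 1][j]
            delete = (t + 1, ins, dele + 1, sub)
            t, ins, dele, sub = best[i][j - 1]
            insert = (t + 1, ins + 1, dele, sub)
            best[i][j] = min(diagonal, delete, insert)      # lowest total wins
    return best[n][m][1:]


sg = "Number Word Word Streetending Placename".split()
tg = "Number Placename Streetending Placename Countyname".split()
print(grammar_edit_cost(sg, tg))                 # (1, 1, 1), as in Table 2
print(grammar_edit_cost("WEDNESDAY", "TUESDAY"))  # (0, 2, 2), as in Table 1
```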
GRAMMAR COMBINATION
[Flowchart, reconstructed as steps]
Start
1. s = new string; maxMem = membership(s, TG)
2. maxMem < 1? N: End. Y: continue.
3. sg = deriveGrammar(s); tg = target grammar with maxMem
4. costST = grammarSimilarity(sg, tg); costTS = grammarSimilarity(tg, sg)
5. costST = costTS = 1? Y: gx = Combine(sg, tg)
6. N: costST = 1 && costTS = 0? Y: gx = sg
7. N: costST = 0 && costTS = 1? Y: gx = tg
8. N: gx = sg
9. Update TG: GT_j = {GT_{j-1} \ gt_i} ∪ {gx}
End
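One pass of the flowchart can be sketched as a single function. The names membership, deriveGrammar, grammarSimilarity and Combine are taken from the flowchart and passed in as callables; everything else is an assumption. In particular, seeding an empty TG with the first derived grammar and keeping tg when no branch applies are guesses, and the toy stand-ins treat a grammar simply as the set of strings it parses.

```python
def update_grammars(s, grammars, membership, derive, cost, combine):
    """Return the updated grammar list TG after seeing one new string s."""
    if not grammars:                      # assumption: first string seeds TG
        return [derive(s)]
    scores = [membership(s, g) for g in grammars]
    k = max(range(len(grammars)), key=scores.__getitem__)
    if scores[k] >= 1:                    # s is already fully parsed
        return list(grammars)
    sg, tg = derive(s), grammars[k]
    cost_st, cost_ts = cost(sg, tg), cost(tg, sg)
    rest = grammars[:k] + grammars[k + 1:]    # TG without tg
    if cost_st == 1 and cost_ts == 1:
        return rest + [combine(sg, tg)]
    if cost_st == 1 and cost_ts == 0:         # sg is more general
        return rest + [sg]
    if cost_st == 0 and cost_ts == 1:         # tg is more general
        return rest + [tg]
    return list(grammars) + [sg]              # assumption: keep tg, add sg


# Toy stand-ins (hypothetical): a grammar is the frozenset of strings it parses.
def toy_membership(s, g):
    return 1.0 if s in g else 0.0

def toy_derive(s):
    return frozenset([s])

def toy_cost(a, b):
    return 0 if a <= b else 1             # zero cost when b covers a

def toy_combine(a, b):
    return a | b

def learn(strings):
    tg = []
    for s in strings:
        tg = update_grammars(s, tg, toy_membership, toy_derive,
                             toy_cost, toy_combine)
    return tg


print(learn(["bomb exploded", "explosion occurred"]))
```

With these stand-ins the learned coverage is the same whichever string arrives first, which is the property the later slides set out to prove for the real grammars.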
MINIMAL COMBINATION RULES
Source Grammar | Target Grammar | Cost(Source,Target) | Cost(Target,Source) | Combination Operation | Combined grammar
a-b-c | a-b | 0 1 0 - - | 1 0 0 - - | Insert | a-b-[c]
a-b | a-b-c | 1 0 0 - - | 0 1 0 - - | Insert | a-b-[c]
a-b-c | a-B-c | 0 0 0 - - | 0 0 1 - - | Merge | a-B-c, B>b
a-B-c | a-b-c | 0 0 1 - - | 0 0 0 - - | Merge | a-B-c, B>b
a-F-c | a-G-c | 0 0 1 - - | 0 0 1 - - | Create | a-X-c, X:=F||G, GF ≠
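The five rows can be mimicked with a small function over term lists. This is only a sketch under stated assumptions: the function name combine and the more_general parameter (a set of (general, specific) pairs standing in for B>b) are hypothetical, the Insert rule is handled only for one extra trailing term, and the new term X is written literally as the two alternatives joined by "||".

```python
def combine(source, target, more_general=frozenset()):
    """Combine two grammars given as lists of terms; None if no rule applies."""
    if len(source) != len(target):
        longer, shorter = ((source, target) if len(source) > len(target)
                           else (target, source))
        # Insert: the extra trailing term becomes optional, written [term].
        if len(longer) == len(shorter) + 1 and longer[:-1] == shorter:
            return shorter + ["[" + longer[-1] + "]"]
        return None
    combined = []
    for s, t in zip(source, target):
        if s == t:
            combined.append(s)
        elif (s, t) in more_general:      # Merge: keep the more general term
            combined.append(s)
        elif (t, s) in more_general:
            combined.append(t)
        else:                             # Create: new term covering both
            combined.append("||".join(sorted((s, t))))
    return combined


print(combine(["a", "b", "c"], ["a", "b"]))
print(combine(["a", "b", "c"], ["a", "B", "c"], {("B", "b")}))
print(combine(["a", "F", "c"], ["a", "G", "c"]))
```

Sorting the two alternatives in the Create case is a deliberate choice: it makes the combined grammar the same whichever of the two grammars is treated as the source.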
ORDER INDEPENDENT IEFG
Let α be a triplet α = <S, GS, GT> where
•S is a finite permutation of strings, i.e. a sequence in which each string appears exactly once.
•GS is the set of derived grammars that represents (can parse) S.
•GT is the set of combined grammars that represents (can parse) S.
In order to show GS = GT we note that
GS ≤ GT ↔ Ext(GS) ⊆ Ext(GT)
GT ≤ GS ↔ Ext(GT) ⊆ Ext(GS)
where
Ext(GS) is the set of strings parseable by GS
Ext(GT) is the set of strings parseable by GT
Hence it suffices to show that Ext(GS) = Ext(GT)
THEOREM 1: For α = <S, GS, GT>, Ext(GS) = Ext(GT)
Proof by induction
Basis: n = 1
Clearly GS = {gs_1} = GT, so Ext(GS) = Ext(GT)
Inductive step
We assume that Ext(GS_{j-1}) = Ext(GT_{i-1}) for some arbitrary value i = j and j > 1, and show that Ext(GS_j) = Ext(GT_i)
Note that Ext(GS_j) = Ext(GS_{j-1}) ∪ Ext(gs_j)
Let: gx = Combine(gs_j, gt_i), so that Ext(gx) = Ext(gs_j) ∪ Ext(gt_i)
Case 1: Combine(gs_j, gt_i) if Cost(gs_j, gt_i) = Cost(gt_i, gs_j) = 1
In this case, Ext(GT_i) = Ext(GT_{i-1} \ {gt_i}) ∪ Ext(gx)
Hence Ext(GT_i) = (Ext(GT_{i-1}) \ Ext(gt_i)) ∪ Ext(gx) = Ext(GT_{i-1}) ∪ Ext(gs_j)
Case 2: Combine(gs_j, gt_i) if gs_j is more general than gt_i, i.e. Ext(gt_i) ⊆ Ext(gs_j)
In this case, Ext(GT_i) = Ext(GT_{i-1}) ∪ Ext(gs_j)
Case 3: Combine(gs_j, gt_i) if gt_i is more general than gs_j, i.e. Ext(gs_j) ⊆ Ext(gt_i)
In this case, Ext(GT_i) = Ext(GT_{i-1}), which equals Ext(GT_{i-1}) ∪ Ext(gs_j) because Ext(gs_j) is already contained in Ext(GT_{i-1})
Therefore Ext(GS_{j-1}) = Ext(GT_{i-1}) implies Ext(GS_j) = Ext(GT_i)
Thus in all cases the inductive step holds and Ext(GS_j) = Ext(GT_i) ■
LEMMA 2.1
For any two permutations S and S*, giving derived grammars GS and GS*,
Ext(GS_j) = Ext(GS*_j)
Proof
Each example string s_j leads to a derived grammar gs_j. Clearly from the definition
Ext(GS) = Ext(gs_1) ∪ Ext(gs_2) ∪ …
This is independent of the order in which the example strings are presented.
To show that the IEFG process is independent of the order in which examples are presented, we consider a different permutation S* leading to α* = <S*, GS*, GT*> and show that Ext(GT_i) = Ext(GT*_i)
THEOREM 2: Given α = <S, GS, GT> and α* = <S*, GS*, GT*>, then Ext(GT_i) = Ext(GT*_i)
Proof
By Lemma 2.1, Ext(GS_j) = Ext(GS*_j)
By Theorem 1, Ext(GS_j) = Ext(GT_i) and Ext(GS*_j) = Ext(GT*_i)
Hence Ext(GT_i) = Ext(GT*_i)
Corollary 2: Cost(GT, GT*) = Cost(GT*, GT) = <0 0 0 null null>
This ends the proof of Theorem 2 ■
EXAMPLE
The generated grammars may not be syntactically identical, but they yield the same (approximate) parsing coverage even when the pattern instances are presented in different orders.
CONCLUSION
•An algorithm whose result is independent of the training order helps ensure robust results.
•This paper presents an order-independent fuzzy grammar fragment learning method, implemented with an incremental evolving approach.
•The formal theory is supported by empirical evidence: the generated grammars have equal (approximate) parsing coverage of the training dataset regardless of the order of presentation.