incremental evolving grammar fragments

Incremental Evolutionary Grammar Fragments Nurfadhlina Mohd Sharef, Trevor Martin, Yun Shen Artificial Intelligence Group, University of Bristol, BS8 1TR UK [email protected] , [email protected] , [email protected]

Upload: nurfadhlina-mohd-sharef

Post on 18-Dec-2014


DESCRIPTION

Incremental Evolving Grammar Fragments, UK Computational Intelligence Workshop 2008, Leicester UK

TRANSCRIPT

Page 1: Incremental Evolving Grammar Fragments

Incremental Evolutionary Grammar Fragments

Nurfadhlina Mohd Sharef, Trevor Martin, Yun Shen, Artificial Intelligence Group, University of Bristol, BS8 1TR UK

[email protected], [email protected], [email protected]

Page 2: Incremental Evolving Grammar Fragments

Outline

• Background of problem

• Literature review

• Shortcomings of the evolutionary approach

• Fuzzy text pattern learning

• Grammar Approximation

• Conclusion

Page 3: Incremental Evolving Grammar Fragments

Digital Obesity

[Figure: report, website, newspaper, TV, news, pamphlet, SMS/MMS, comic books, brochures, meeting]

Page 4: Incremental Evolving Grammar Fragments

Information Overload?

Page 5: Incremental Evolving Grammar Fragments

Text Structure

• Grammar: word order governs the message a sentence delivers

• Short vs. long texts

• A full language model (such as the subject-verb-object approach) is difficult to specify, complex to process, and dependent on the problem domain.

Page 6: Incremental Evolving Grammar Fragments

Learning Text Fragments

• Shorter sentences

• Less structured

• Multiple patterns

• Do not follow formal grammar rules

• No need for a complete language model

• Examples: dates and times; names of products; names of people; simple sentence forms such as questions, complaints, and news.

Page 7: Incremental Evolving Grammar Fragments

Grammars for Postal Addresses

• ‘21 London Rd Ipswich Suffolk IP1 2EZ’: number, street name, town, postCode

• And others:

• ‘29 Meredith Rd Ipswich’: number, street name, town

• ‘Belfairs Hotel 33 Graham Rd Ipswich’: word, business, number, street name, town

• The number of pattern variations will probably grow as more data samples are encountered.

Address A: 29 Meredith Rd Ipswich

Address B: Future House, 31, Mars Ave, Mars

A is an address, but is B a valid address?

A and B are both valid addresses!
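To make the pattern variation concrete, here is a minimal Python sketch that turns each address into a sequence of grammar elements. It assumes a tiny hypothetical lexicon (`STREET_ENDINGS`, `PLACE_NAMES`) and hypothetical element names; the slides do not say how tokens are tagged.

```python
# Hypothetical lexicon: illustrative only, not taken from the slides.
STREET_ENDINGS = {"rd", "road", "ave", "avenue", "ln", "lane", "st", "street"}
PLACE_NAMES = {"ipswich", "suffolk", "mars"}


def tag(token):
    """Map a single token to an (assumed) grammar element name."""
    t = token.lower()
    if t.isdigit():
        return "number"
    if t in STREET_ENDINGS:
        return "streetend"
    if t in PLACE_NAMES:
        return "placeName"
    return "anyWord"


def pattern(address):
    """Turn an address into its sequence of grammar elements."""
    return [tag(tok) for tok in address.replace(",", " ").split()]


print(pattern("29 Meredith Rd Ipswich"))
# ['number', 'anyWord', 'streetend', 'placeName']
print(pattern("Future House, 31, Mars Ave, Mars"))
# ['anyWord', 'anyWord', 'number', 'placeName', 'streetend', 'placeName']
```

Both addresses are valid, yet they yield different element sequences, so a single fixed pattern cannot cover them. The "Mars" in "Mars Ave" is also tagged as a place name, a small example of why crisp tagging alone is brittle.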

Page 8: Incremental Evolving Grammar Fragments

Existing Approaches

• Tagging-based information extraction

• Document distributions and statistical models

• Evolutionary genetic algorithms

• Semantic nets

• Fuzzy methods

These approaches aim to generate grammars that parse a fully defined dataset, and cannot easily cope with the addition of a new training example.

Figure 1: Example of information tagging

Page 9: Incremental Evolving Grammar Fragments

Genetic Algorithm for Grammar Parsing

• Goal: generate a grammar that covers both past and new examples

• Approach: binary trees of non-terminal nodes; left branch: T := {word, number, street ending, …}; right branch: T ∪ {AND, OR, OPTIONAL}

• Population setting: groups of grammar files with varying numbers of grammar definitions

• Mating selection (elitist): among grammar files within and between groups, and among grammar elements within and between groups

• Genetic operators: crossover and mutation

• Fitness function: measures the ability of the grammar to parse test strings

Figure 2: Address Grammar Fragments Binary Tree (example string: ‘25 acacia avenue’)
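The binary-tree grammars and their fitness can be sketched as follows. This is a minimal illustration, not the authors' implementation: it puts the operator (AND, OR, OPTIONAL) on each internal node instead of using the slide's left/right-branch encoding, it matches pre-tagged element sequences rather than raw text, and it assumes fitness is simply the fraction of test strings a grammar parses in full. Crossover and mutation are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Term:
    element: str  # a grammar element such as "number" or "streetend"


@dataclass(frozen=True)
class Node:
    op: str                          # "AND", "OR" or "OPTIONAL"
    left: Grammar
    right: Optional[Grammar] = None  # unused by OPTIONAL


Grammar = Union[Term, Node]


def ends(g, tags, start):
    """All positions at which grammar g can finish matching tags from start."""
    if isinstance(g, Term):
        ok = start < len(tags) and tags[start] == g.element
        return {start + 1} if ok else set()
    if g.op == "AND":  # left, then right
        return {e for mid in ends(g.left, tags, start) for e in ends(g.right, tags, mid)}
    if g.op == "OR":  # either branch
        return ends(g.left, tags, start) | ends(g.right, tags, start)
    if g.op == "OPTIONAL":  # skip the child, or match it once
        return {start} | ends(g.left, tags, start)
    raise ValueError(f"unknown operator: {g.op}")


def parses(g, tags):
    """True if g consumes the whole tag sequence."""
    return len(tags) in ends(g, tags, 0)


def fitness(g, test_strings):
    """Assumed fitness: fraction of tagged test strings parsed in full."""
    return sum(parses(g, t) for t in test_strings) / len(test_strings)


# number AND (anyWord AND (streetend AND OPTIONAL(placeName)))
ADDR = Node("AND", Term("number"),
            Node("AND", Term("anyWord"),
                 Node("AND", Term("streetend"),
                      Node("OPTIONAL", Term("placeName")))))

tests = [
    ["number", "anyWord", "streetend"],               # e.g. 25 acacia avenue
    ["number", "anyWord", "streetend", "placeName"],  # e.g. 29 Meredith Rd Ipswich
    ["anyWord", "anyWord", "number", "placeName", "streetend", "placeName"],
]
print(fitness(ADDR, tests))  # prints 0.6666666666666666
```

A grammar evolved only on the first two kinds of example scores 2/3 here; covering the third would require evolving a new tree, which mirrors the retraining problem reported on the next slide.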

Page 10: Incremental Evolving Grammar Fragments

[Figure: Gen 0 groups total fitness; x-axis: group; y-axis: grammar files score (0 to 0.35)]

[Figure: Gen 32 groups total fitness; x-axis: group; y-axis: grammar files score (0 to 0.45)]

Result:

1. Fitness is low even though all grammars have converged (average highest score = 0.388, highest score = 0.6).

2. The approach is effective for grammar building, but requires complete retraining if the initial set of examples is not general enough to create a good classifier.

Figure 3: parsing score of generated grammar groups in generation 0

Figure 4: parsing score of generated grammar groups in generation 32

Page 11: Incremental Evolving Grammar Fragments

Fuzzy Approach for Text Pattern Learning

• Describes the relation between a text and a grammar fragment

• Represents the membership degree with which the text belongs to the grammar

• Grammar elements can be terminal symbols as well as fuzzy sets

[Figure: partial order of grammar elements: ALPHANUMERIC; NUMBER, ALPHABETIC; ANYWORD; PC1, PC2; PLACENAME, BUSINESS TYPE, ETC; CITY NAME]

Figure 5: Partial Order Table for UK Address
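A minimal sketch of grammar elements as fuzzy sets arranged in a partial order. The parent links in `PARENT` are only a guess at part of Figure 5 (PC1, PC2 and ETC are left out), and the membership degrees in `FUZZY` are invented for illustration.

```python
# Hypothetical partial order (child -> more general parent); a guess at a
# subset of Figure 5, not the authors' table.
PARENT = {
    "NUMBER": "ALPHANUMERIC",
    "ALPHABETIC": "ALPHANUMERIC",
    "ANYWORD": "ALPHABETIC",
    "PLACENAME": "ANYWORD",
    "BUSINESS TYPE": "ANYWORD",
    "CITY NAME": "PLACENAME",
}

# Hypothetical fuzzy sets: degree to which a word belongs to an element.
FUZZY = {
    "PLACENAME": {"ipswich": 1.0, "nacton": 0.8, "hatfield": 0.5},
    "CITY NAME": {"ipswich": 1.0},
    "BUSINESS TYPE": {"hotel": 1.0, "club": 0.9},
}


def ancestors(element):
    """Every element strictly more general than `element`."""
    found = []
    while element in PARENT:
        element = PARENT[element]
        found.append(element)
    return found


def more_general(a, b):
    """True if a lies above b in the partial order."""
    return a in ancestors(b)


def membership(word, element):
    """Membership degree of a word in a grammar element."""
    w = word.lower()
    if element == "ALPHANUMERIC":
        return 1.0 if w.isalnum() else 0.0
    if element == "NUMBER":
        return 1.0 if w.isdigit() else 0.0
    if element in ("ALPHABETIC", "ANYWORD"):
        return 1.0 if w.isalpha() else 0.0
    return FUZZY.get(element, {}).get(w, 0.0)  # graded, specific elements


print(membership("Nacton", "PLACENAME"), more_general("ANYWORD", "CITY NAME"))  # 0.8 True
```

The design choice this illustrates: "nacton" belongs to PLACENAME only to degree 0.8 but fully to the more general ANYWORD, so moving up the order trades precision for coverage.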

Page 12: Incremental Evolving Grammar Fragments

Grammar Similarity

• Fuzzy grammar and fuzzy membership

• Loosely inspired by the Levenshtein edit distance

Source string:   W    E    D    N    E  S  D  A  Y
Target string:   T    U    -    -    E  S  D  A  Y
Edit distance*:  S=1  S=1  D=1  D=1  =  =  =  =  =

Table 1: Example of string edit distance operation (*I: insert, D: delete, S: substitute)

Source grammar:  Number  Word       Word  Streetending  Placename  -
Target grammar:  Number  Placename  -     Streetending  Placename  Countyname
Edit distance*:  =       S=1        D=1   =             =          I=1

Table 2: Example of grammar edit distance operation (*I: insert, D: delete, S: substitute)
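Both tables follow the standard Levenshtein edit distance with unit costs. The sketch below is the textbook dynamic-programming algorithm, not the authors' fuzzy variant; it works on characters and on grammar-element sequences alike, and also counts the insertions, deletions and substitutions along one optimal alignment. For both table examples every optimal alignment has the same counts, so the output matches the tables.

```python
def edit_ops(source, target):
    """Levenshtein distance and the (I, D, S) counts of one optimal alignment.

    Works on strings (characters) or on lists of grammar elements.
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # delete
                             dist[i][j - 1] + 1,        # insert
                             dist[i - 1][j - 1] + sub)  # match / substitute
    # Walk back from the last cell, counting each kind of operation.
    i, j, ins, dels, subs = n, m, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                subs += sub
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return dist[n][m], ins, dels, subs


print(edit_ops("WEDNESDAY", "TUESDAY"))  # (4, 0, 2, 2)
print(edit_ops(["Number", "Word", "Word", "Streetending", "Placename"],
               ["Number", "Placename", "Streetending", "Placename", "Countyname"]))  # (3, 1, 1, 1)
```

The result is (distance, I, D, S): two substitutions and two deletions for Table 1, and one of each operation for Table 2.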

Page 13: Incremental Evolving Grammar Fragments

Fuzzy Parsing

• Fuzzy membership: measures the degree to which a grammar parses a string

• Fuzzy overlap: CostGG(G_S, G_T) estimates the cost of changing a string parsed by grammar G_S into one parsed by grammar G_T.

… (Eq. 1)

… (Eq. 2)

… (Eq. 3)

I: insertion; D: deletion; S: substitution; Rs: remainder in the source; Rt: remainder in the target

Page 14: Incremental Evolving Grammar Fragments

Equations (I)

S, T: sequences of grammar elements; s, t: terminal symbols; TS_i and TS_j: (fuzzy) sets of terminal symbols; X: any single grammar element; H_s, H_t: tags.

Page 15: Incremental Evolving Grammar Fragments

Equations (II)

S, T: sequences of grammar elements; s, t: terminal symbols; TS_i and TS_j: (fuzzy) sets of terminal symbols; X: any single grammar element; H_s, H_t: tags.

Page 16: Incremental Evolving Grammar Fragments

Equations (III)

S, T: sequences of grammar elements; s, t: terminal symbols; TS_i and TS_j: (fuzzy) sets of terminal symbols; X: any single grammar element; H_s, H_t: tags.

Page 17: Incremental Evolving Grammar Fragments

Incremental Evolution Strategy

Suppose we have a set of positive examples P. For an example S_p from P, we find the grammar fragment H_max that parses S_p with maximum membership.

• If CostGG(S_p, H_max) ≤ CostGG(S_p, H_i) for every other fragment H_i,

• then we incrementally alter H_max or create a new grammar.

CostGG(S_p, H_max) ≥ max(CostGG(H_i, H_max))
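The strategy can be sketched as a loop over incoming examples. This is a hypothetical simplification: plain Levenshtein distance between tag sequences stands in for CostGG, the lowest-cost fragment stands in for the one with maximum membership, and the choice between altering H_max and creating a new grammar is reduced to an assumed fixed `threshold`. "Altering" is shown only as recording the example against H_max; the approximation operators would then rewrite the fragment.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (stand-in for CostGG)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match / substitute
        prev = cur
    return prev[-1]


def incremental_learn(examples, threshold=2):
    """Hypothetical loop: attach each example to its closest fragment or start a new one."""
    fragments = []  # list of (fragment, examples attached to it)
    for sp in map(tuple, examples):
        if fragments:
            h_max = min(fragments, key=lambda f: levenshtein(sp, f[0]))
            if levenshtein(sp, h_max[0]) <= threshold:
                h_max[1].append(sp)  # candidate for altering H_max
                continue
        fragments.append((sp, [sp]))  # new rule H_new ::= S_p
    return fragments


examples = [
    ["number", "placeName", "streetend", "placeName", "postCode"],
    ["number", "anyWord", "streetend", "placeName"],
    ["anyWord", "anyWord", "anyWord", "anyWord", "streetend", "placeName"],
]
for fragment, absorbed in incremental_learn(examples):
    print("-".join(fragment), len(absorbed))
```

With `threshold=2`, the second tagged address (cost 2 from the first) joins the first fragment, while the third (cost 5) starts a new one. Past examples are never discarded, which is the property the incremental approach is after.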

Page 18: Incremental Evolving Grammar Fragments

Grammar Approximation Operators

• Create a new rule H_new ::= S_p, where an appropriate substring can be tagged, restricted so that a single optional element is maintained:

H_final = [H_i] G H_{i+1}

• Merge duplicate grammar definitions that can be generalized, and replace them with a more general fuzzy superset grammar:

H_i := {g_i, g_{i+1}, …, g_n}, g_i = moreGeneral(g_S, g_T)

• Replace contiguous optional grammar elements with an optional fuzzy grammar:

[H_new] = {g_i, g_{i+1}, …, g_n}
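The merge and optional-grouping operators can be sketched by aligning two element sequences and generalizing where they differ. In this minimal sketch, `PARENT` is a hypothetical partial order, `more_general` returns the least general element covering both inputs (a stand-in for moreGeneral(g_S, g_T)), and each run of unmatched elements becomes one optional group. With the tie-breaking used here (match, then insertion, then deletion, then substitution) it reproduces the generalized approximation result on the following slide; other tie-breaking rules could pick a different, equally cheap alignment.

```python
# Hypothetical partial order of grammar elements: child -> more general parent.
PARENT = {
    "placeName": "anyWord",
    "streetend": "anyWord",
    "anyWord": "alphanumeric",
    "number": "alphanumeric",
    "postCode": "alphanumeric",
}


def chain(e):
    """The element followed by all of its ancestors, most specific first."""
    out = [e]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out


def more_general(a, b):
    """Least general element that covers both a and b."""
    up_a = chain(a)
    for e in chain(b):
        if e in up_a:
            return e
    return "alphanumeric"  # assumed top element


def align(src, tgt):
    """Edit-distance alignment as ("keep", elem) / ("opt", elem) steps."""
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]))
    steps, i, j = [], n, m
    # Tie-break: match, insertion, deletion, substitution, so unmatched
    # elements become optional instead of being forced together.
    while i > 0 or j > 0:
        if i and j and src[i - 1] == tgt[j - 1] and d[i][j] == d[i - 1][j - 1]:
            steps.append(("keep", src[i - 1]))
            i, j = i - 1, j - 1
        elif j and d[i][j] == d[i][j - 1] + 1:
            steps.append(("opt", tgt[j - 1]))
            j -= 1
        elif i and d[i][j] == d[i - 1][j] + 1:
            steps.append(("opt", src[i - 1]))
            i -= 1
        else:  # substitution: generalize the two elements
            steps.append(("keep", more_general(src[i - 1], tgt[j - 1])))
            i, j = i - 1, j - 1
    return steps[::-1]


def merge(src, tgt):
    """Generalize two element sequences into one fragment with optional groups."""
    out, pending = [], []
    for kind, elem in align(src, tgt) + [("end", None)]:
        if kind == "opt":
            pending.append(elem)  # collect a contiguous run of unmatched elements
            continue
        if pending:
            out.append("[" + "-".join(pending) + "]")
            pending = []
        if kind == "keep":
            out.append(elem)
    return "-".join(out)


print(merge(["number", "placeName", "streetend"],
            ["number", "anyWord", "streetend", "placeName", "postCode"]))
# number-anyWord-streetend-[placeName-postCode]
```

Merging the first two tagged addresses of the approximation example in the same way gives `number-anyWord-streetend-placeName-[postCode]`, a flattened form in which the alternative placeName/anyWord slot is generalized to anyWord.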

Page 19: Incremental Evolving Grammar Fragments

Fuzzy Grammar Overlap

Source grammar in rows, target grammar in columns; each cell lists the I D S costs.

                  Col 0   Col 1      Col 2      Col 3      Col 4      Col 5
                  Null    number     anyWord    streetend  placename  postcode
Row 0  Null       0 0 0   1 0 0      2 0 0      3 0 0      4 0 0      5 0 0
Row 1  number     0 1 0   0 0 0 (E)  1 0 0      2 0 0      3 0 0      5 0 0
Row 2  placename  0 2 0   0 1 0      0 0 1 (D)  1 0 1      2 0 1      4 0 1
Row 3  streetend  0 3 0   0 2 0      0 1 1      0 0 1 (C)  1 0 1 (B)  3 0 1 (A)

Figure 6: Overlap Matrix

Approximation result: X := placeName; X := anyWord; G1 := placeName-postCode; Addr := number-X-streetend-[G1]

Approximation result (generalized): G1 := placeName-postCode; Addr := number-anyWord-streetend-[G1]

Final cost: I=0, D=0, S=0 / I=0, D=0, S=1

Marked cells (I D S): 3 0 1 (A); 1 0 1 (B); 0 0 1 (C); 0 0 1 (D); 0 0 0 (E)

I: insertion + remainder of target; D: deletion + remainder of source; S: substitution

Page 20: Incremental Evolving Grammar Fragments

Grammar Approximation

Address: 107 hatfield rd ipswich ip3 9ag
Grammar derived from address: number-placeName-streetend-placeName-postCode
Approximated grammar: ADDR := number-placeName-streetend-placeName-postCode

Address: 121 sidegate ln ipswich
Grammar derived from address: number-anyWord-streetend-placeName
Approximated grammar: G1 := streetend-placeName-[postCode]; G2 := anyWord; G2 := placeName; ADDR := number-G2-G1

Address: alnesbourne priory club nacton rd ipswich
Grammar derived from address: anyWord-anyWord-anyWord-anyWord-streetend-placeName
Approximated grammar: G1 := postCode; G2 := streetend-placeName-[G1]; G3 := anyWord-anyWord; G4 := anyWord; G4 := number; ADDR := G4-anyWord-[G3]-G2

Figure 7: Grammar Approximation Example

Page 21: Incremental Evolving Grammar Fragments

Conclusion and Future Work

• The fuzzy method outperforms standard genetic techniques for creating fuzzy grammars

• Highlight: the ability to learn new text patterns without sacrificing past data

• Approximation operators: a move away from the common genetic operators

• Future work: refine the approximation method and test it on other data with softer structure