Classifying Sentences using Induced Structure

Menno Van Zaanen

Luiz Augusto Pizzato

Diego Mollá-Aliod

[email protected]

Centre for Language Technology

Macquarie University

Sydney, Australia

Van Zaanen, Pizzato, Molla; SPIRE-2005. Buenos Aires, 2-4 November 2005.(2/23)

Overview

• Sentence Classification Problem

• Induced Structure Approach
  – Alignment-Based Learning
  – Trie-Based Classifier

• Results

• Concluding Remarks

• Future Work

Sentence Classification

• Assists several NLP tasks: document summarisation, information extraction and question answering, among others.

• Question Classification:
  • Definition: What is a golden parachute?
  • List: Name two brands of shaving cream.
  • Factoid questions:
    – HUM:IND: Who discovered penicillin?
    – LOC:CITY: What is the capital of Australia?
    – FOOD, PLANT, ANIMAL: What do bats eat?

Van Zaanen, Pizzato, Molla; SPIRE-2005. Buenos Aires, 2-4 November 2005.(4/23)

Current approaches

• Handcrafted regular expressions:
  – Pros: Rules are understandable. A few rules cover a large proportion of the questions (Zipf's Law).
  – Cons: Difficult to construct. Limited performance.

• Machine Learning:
  – Pros: The computer finds the "rules" automatically.
  – Cons: The generated rules and knowledge are not human-readable.
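To make the trade-off concrete, here is a minimal sketch of the handcrafted approach in Python. The rules and labels are hypothetical, written only to cover the example questions from the previous slide; they are not taken from the talk. Note that rule order matters: the generic "what is" rule would also claim "What is the capital of Australia?" if it were tried first.

```python
import re

# Hypothetical handcrafted rules, tried in order; the first matching rule wins.
# The capital-city rule must precede the generic "what is" rule, which would
# otherwise also claim "What is the capital of Australia?".
RULES = [
    (re.compile(r"^what is the capital of ", re.IGNORECASE), "LOC:CITY"),
    (re.compile(r"^who ", re.IGNORECASE), "HUM:IND"),
    (re.compile(r"^name (two|three|\d+) ", re.IGNORECASE), "LIST"),
    (re.compile(r"^what (is|are) ", re.IGNORECASE), "DEFINITION"),
]


def classify(question):
    """Return the label of the first rule that matches, or None."""
    for pattern, label in RULES:
        if pattern.search(question):
            return label
    return None  # no rule fires: a coverage gap


for q in ["What is a golden parachute?", "Name two brands of shaving cream.",
          "Who discovered penicillin?", "What is the capital of Australia?",
          "What do bats eat?"]:
    print(q, "->", classify(q))
```

The last question falls through every rule, which illustrates the "limited performance" drawback: each new question form needs another hand-written rule.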

Classifying by Induced Structure

• The process sits between machine learning (ML) and regular expressions (RE):
  – Learn patterns from sentences;
  – Use these patterns in the classification phase.

[Figure: Training Data → Extract Structure → Structure; Sentence + Structure → Classifier → Class]

Classifying by Induced Structure

• We propose two distinct approaches:
  – Alignment-Based Learning Classifier (ABL)
    • ABL is a generic grammatical inference framework that learns structure from plain text.
  – Trie-Based Classifier
    • Classifies sentences based on partial matches in a trie structure.

Alignment-Based Learning Classifier (ABL)

• Based on the idea that constituents in sentences are interchangeable:
  – The book is on the table.
  – The car is on the driveway.

Alignment-Based Learning Classifier (ABL)

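• Based on the idea that constituents in sentences are interchangeable:
  – The (book) is on the (table).
  – The (car) is on the (driveway).

[Figure: the two sentences aligned word by word; the shared words "the", "is", "on the" line up, and the differing parts "book"/"car" and "table"/"driveway" form the hypothesised constituents.]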
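The alignment idea can be illustrated with Python's standard `difflib`: aligning two tokenised sentences yields the shared words and the unequal parts, and the unequal parts are bracketed as hypothesised constituents. This is only a sketch; using `difflib.SequenceMatcher` as the alignment algorithm is our assumption and not necessarily the alignment method used inside ABL. The function name `hypothesise_constituents` is hypothetical.

```python
from difflib import SequenceMatcher


def hypothesise_constituents(sent_a, sent_b):
    """Align two sentences word by word and bracket the parts that differ."""
    a, b = sent_a.split(), sent_b.split()
    out_a, out_b = [], []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared words are copied unchanged.
            out_a += a[i1:i2]
            out_b += b[j1:j2]
        else:
            # Unequal parts are hypothesised to be interchangeable constituents.
            if i1 < i2:
                out_a.append("(" + " ".join(a[i1:i2]) + ")")
            if j1 < j2:
                out_b.append("(" + " ".join(b[j1:j2]) + ")")
    return " ".join(out_a), " ".join(out_b)


first, second = hypothesise_constituents("The book is on the table",
                                         "The car is on the driveway")
print(first)   # The (book) is on the (table)
print(second)  # The (car) is on the (driveway)
```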

Alignment-Based Learning Classifier (ABL)

EAT    Questions
DESC   (What) (is (caffeine))
DESC   (What) (is (Teflon))
LOC    (Where) is (Milan)
LOC    What (are the twin cities)

unhypo (pattern, EAT, frequency)
What is .*            DESC   2
What .*               DESC   2
.* is caffeine        DESC   1
.* is Teflon          DESC   1
Where is .*           LOC    1
.* is Milan           LOC    1
What .*               LOC    1

hypo (constituent, EAT, frequency)
caffeine              DESC   1
is caffeine           DESC   1
What                  DESC   2
Teflon                DESC   1
is Teflon             DESC   1
Milan                 LOC    1
Where                 LOC    1
are the twin cities   LOC    1
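Both tables can be reproduced from the structured questions in a few lines of Python. This is a sketch reconstructed from the example, not the authors' code: we assume that each bracketed constituent yields one hypo entry (the constituent itself) and one unhypo entry (the question with that constituent replaced by the wildcard `.*`), counted per EAT. The helper names `parse_constituents` and `induce_tables` are hypothetical.

```python
from collections import Counter


def parse_constituents(bracketed):
    """Split a bracketed question into its words and the (start, end)
    word spans of its bracketed constituents."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    words, spans, open_at = [], [], []
    for tok in tokens:
        if tok == "(":
            open_at.append(len(words))  # constituent starts at the next word
        elif tok == ")":
            spans.append((open_at.pop(), len(words)))
        else:
            words.append(tok)
    return words, spans


def induce_tables(corpus):
    """Count hypo (constituent, EAT) and unhypo (pattern, EAT) pairs."""
    hypo, unhypo = Counter(), Counter()
    for eat, bracketed in corpus:
        words, spans = parse_constituents(bracketed)
        for start, end in spans:
            hypo[(" ".join(words[start:end]), eat)] += 1
            pattern = words[:start] + [".*"] + words[end:]
            unhypo[(" ".join(pattern), eat)] += 1
    return hypo, unhypo


TRAIN = [
    ("DESC", "(What) (is (caffeine))"),
    ("DESC", "(What) (is (Teflon))"),
    ("LOC", "(Where) is (Milan)"),
    ("LOC", "What (are the twin cities)"),
]

hypo, unhypo = induce_tables(TRAIN)
for (pattern, eat), freq in unhypo.items():
    print(pattern, eat, freq)
```

Running it prints the same seven unhypo entries as the slide, though not in the slide's order; `hypo` holds the eight constituent entries.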

Trie-Based Classifier

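• $T(S) = \{T(S/a_1), T(S/a_2), \ldots, T(S/a_r)\}$

  – where $S$ is the set of sentences, $a_1, \ldots, a_r$ are the elements that can start a sentence in $S$, and $S/a_i$ is the set of sentences in $S$ that start with $a_i$, stripped of that initial element.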

[Figure: a character trie in which each node branches on the letters a|b|…|z; following the branches c, a, r spells "car", and z, e, b, r, a spells "zebra".]
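The recursive definition translates almost directly into code. In this minimal sketch a trie is a nested dictionary mapping each initial element $a_i$ to $T(S/a_i)$. The function name `build_trie` is hypothetical, and the end marker `$` appended to each word is our addition: it keeps a word that is a prefix of another (here "car" and "cart") from disappearing.

```python
def build_trie(sentences):
    """T(S) = {T(S/a_1), ..., T(S/a_r)}: group S by initial element a_i,
    strip that element, and recurse on each group S/a_i."""
    groups = {}
    for s in sentences:
        if s:  # an empty sequence has no initial element to branch on
            groups.setdefault(s[0], []).append(s[1:])
    return {a: build_trie(rest) for a, rest in groups.items()}


# Characters as elements, as in the figure.
trie = build_trie(["car$", "cart$", "zebra$"])
print(trie)

# The same function works on word sequences.
word_trie = build_trie([["who", "is", "$"], ["where", "is", "$"]])
print(word_trie)
```

Because `s[0]` and `s[1:]` work on both strings and lists, the same definition covers the character trie in the figure and the word tries used by the classifier.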

Trie-Based Classifier

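[Figure: word trie built from the training questions, with numbered nodes; ^ (boq) marks the beginning and $ (eoq) the end of a question.
^ (1) → where (2) → is (3) → Chile (4) → $ (5)
^ (1) → who (6) → is (7) → the (8) → dean (9) → of (10) → ICS (11) → $ (12)
^ (1) → who (6) → is (7) → J. (13) → Smith (14) → $ (15)
^ (1) → who (6) → is (7) → J. (13) → Smith (14) → of (16) → ICS (17) → $ (18)
^ (1) → how (19) → far (20) → is (21) → Athens (22) → $ (23)
^ (1) → how (19) → tall (24) → is (25) → Sting (26) → $ (27)
FreqEAT at node 7: HUM:IND 1, HUM:DESC 2. FreqEAT at node 18: HUM:DESC 1.]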
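A simplified sketch of the word-level trie classifier follows, assuming that every node keeps a FreqEAT table and that classification returns the most frequent EAT at the deepest node the question reaches. The names `TrieNode`, `insert` and `classify` are ours; the labels attached to the three training questions are hypothetical, chosen so that node 7 ("who is") gets the counts shown in the figure. The look-ahead process and the strict/flex variants are not modelled.

```python
from collections import Counter

EOQ = "$"  # end-of-question marker; the root node plays the role of ^ (boq)


class TrieNode:
    def __init__(self):
        self.children = {}         # next word -> TrieNode
        self.freq_eat = Counter()  # FreqEAT: EATs of training questions through this node


def insert(root, question, eat):
    """Add a training question, updating FreqEAT along its path."""
    node = root
    for word in question.split() + [EOQ]:
        node = node.children.setdefault(word, TrieNode())
        node.freq_eat[eat] += 1


def classify(root, question):
    """Follow the question as far as the trie allows and return the most
    frequent EAT at the deepest node reached (no look-ahead)."""
    node = root
    for word in question.split() + [EOQ]:
        if word not in node.children:
            break
        node = node.children[word]
    if not node.freq_eat:
        return None  # not even the first word matched
    return node.freq_eat.most_common(1)[0][0]


# Hypothetical labels for three of the questions in the figure.
root = TrieNode()
insert(root, "who is the dean of ICS", "HUM:IND")
insert(root, "who is J. Smith", "HUM:DESC")
insert(root, "who is J. Smith of ICS", "HUM:DESC")

print(root.children["who"].children["is"].freq_eat)
print(classify(root, "who is prime minister of Australia"))  # stops after "is"
```

Without look-ahead, the unseen question stops at node 7 and simply inherits the majority EAT stored there.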

Trie-Based Classifier

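• Look-ahead process:

[Figure: the question "who is prime minister of Australia", delimited by ^ (boq) and $ (eoq), is matched against the trie path ^ (1) → who (6) → is (7) → the (8) → dean (9) → of (10) → ICS (11) → $ (12). After "who is", the question continues with "prime" while the trie continues with "the" (node 8); question marks flag this mismatch.]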

Implementations

• ABL
  – Hypo / Unhypo
  – Words / POS
  – default / prior

• Trie-based
  – Strict / Flex
  – Words / POS

Implementations: ABL, Hypo / Unhypo

unhypo (pattern, EAT, frequency)
What is .*            DESC   2
What .*               DESC   2
.* is caffeine        DESC   1
.* is Teflon          DESC   1
Where is .*           LOC    1
.* is Milan           LOC    1
What .*               LOC    1

hypo (constituent, EAT, frequency)
caffeine              DESC   1
is caffeine           DESC   1
What                  DESC   2
Teflon                DESC   1
is Teflon             DESC   1
Milan                 LOC    1
Where                 LOC    1
are the twin cities   LOC    1

Implementations: ABL, Words / POS

EAT    Questions (words with POS tags)
DESC   (What/WP) (is/VBZ (caffeine/NN))
DESC   (What/WP) (is/VBZ (Teflon/NNP))
LOC    (Where/WRB) is/VBZ (Milan/NNP)
LOC    What/WP (are/VBP the/DT twin/JJ cities/NNS)

EAT    Questions (words only)
DESC   (What) (is (caffeine))
DESC   (What) (is (Teflon))
LOC    (Where) is (Milan)
LOC    What (are the twin cities)

Implementations: ABL, default / prior

unhypo (pattern, EAT, frequency)
What is .*            DESC   2
What .*               DESC   2
.* is caffeine        DESC   1
.* is Teflon          DESC   1
Where is .*           LOC    1
.* is Milan           LOC    1
What .*               LOC    1

Test question: What is a mobile phone?
default: DESC 4, LOC 1
prior: DESC 2, LOC 1
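The default scores can be reproduced by matching every unhypo pattern against the test question. We assume here that the default method adds up the frequencies of all patterns that match the whole question; this assumption reproduces the default scores shown on the slide (DESC 4, LOC 1). The prior variant is not modelled, and the name `default_scores` is hypothetical.

```python
import re
from collections import Counter

# unhypo patterns from the slide: (regular expression, EAT, frequency)
UNHYPO = [
    ("What is .*", "DESC", 2),
    ("What .*", "DESC", 2),
    (".* is caffeine", "DESC", 1),
    (".* is Teflon", "DESC", 1),
    ("Where is .*", "LOC", 1),
    (".* is Milan", "LOC", 1),
    ("What .*", "LOC", 1),
]


def default_scores(question):
    """Assumed 'default' scoring: sum, per EAT, the frequencies of every
    pattern that matches the whole question."""
    scores = Counter()
    for pattern, eat, freq in UNHYPO:
        if re.fullmatch(pattern, question):
            scores[eat] += freq
    return scores


scores = default_scores("What is a mobile phone?")
print(scores.most_common())  # [('DESC', 4), ('LOC', 1)]
```

The patterns are used as raw regular expressions; patterns induced from words containing regex metacharacters (such as "J.") would need `re.escape` applied to their literal parts.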

Implementations: Trie-based, Strict / Flex

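[Figure: the look-ahead example: the question "^ who is prime minister of Australia $" against the trie path ^ (1) → who (6) → is (7) → the (8) → dean (9) → of (10) → ICS (11) → $ (12), with question marks flagging the mismatch between "prime" and "the".]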

Implementations: Trie-based, Words / POS

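[Figure: the look-ahead example with each token paired with its POS tag. Trie path: ^ (1) → who/WP (6) → is/VBZ (7) → the/DT (8) → dean/NN (9) → of/IN (10) → ICS/NNP (11) → $ (12). Question: ^ who/WP is/VBZ prime/JJ minister/NN of/IN Australia/NNP $, with question marks flagging the mismatch at the/DT.]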

Results

                         coarse           fine
                         words   POS      words   POS
Baseline                 0.188   0.188    0.110   0.110
ABL   hypo    default    0.516   0.682    0.336   0.628
ABL   hypo    prior      0.554   0.624    0.238   0.472
ABL   unhypo  default    0.652   0.638    0.572   0.558
ABL   unhypo  prior      0.580   0.594    0.520   0.432
Trie  strict             0.844   0.812    0.738   0.710
Trie  flex               0.850   0.792    0.742   0.692

Concluding Remarks

• Numeric results are not better than those of machine-learning approaches.

• Induced structure can obtain good results without using complex linguistic features.

• These approaches produce rules in the form of regular expressions that can be adjusted manually to better fit the problem.

Future Work

• Regular expressions can be improved:
  – Hand-tuning the unique REs found by ABL;
  – Increasing the complexity of the REs by incorporating extra information.

• Wildcard matches:
  – The words matched by wildcards tend to be semantically related;
  – They seem to be the focus words of the questions.

Review

• Sentence Classification Problem

• Induced Structure Approach
  – Alignment-Based Learning
  – Trie-Based Classifier

• Results

• Concluding Remarks

• Future Work
