
Learning To Understand Web Site Update Requests

William W. Cohen, Einat Minkov, Anthony Tomasic

2

3

• Directory Info

• Courses

• Projects

• Publications

• Sponsors

• Events

• …

4

Web site update

Database

Users are not interested in interacting directly with the database (and had better not). They dislike active effort, and are happy with natural language.

5

Web site update

Database


Natural language

Formal language

6

Web site update

Database


a BOTTLENECK

Natural language

Formal language

7

Web site update

Database


a BOTTLENECK

The Problem(s):

• Natural Language Processing

• Email text

• The domain is fixed; however, the database schema changes over time.

Can we automate the webmaster?

Natural language

Formal language

8

Framework and motivation

• Assume DB-backed website, where schema changes over time

• Address changes in the factual content of the website

• Assume each request updates one tuple

• The system should interact with the user, where the goal is to avoid errors in the website update:

– User requests some change, via natural-language email
– System analyzes the request
– System presents a preview page and an editable form version of the request
– User can verify correctness (vs. the case for DB queries, Q/A, ...) => a source of training data
– Partial correctness is useful.

A learning system. Message understanding is decomposed into entity recognition and classification tasks.

SCOPE

THE HUMAN FACTOR

FRAMEWORK


9

[System architecture diagram: an email message passes through Shallow NLP / Feature Building (POS tags, NP chunks, words, ...) and Information Extraction (entity1, entity2, ...); Classification then assigns requestType, targetRelation, and targetAttrib, plus entity roles (newEntity, oldEntity, keyEntity, otherEntity). Update Request Construction combines these with the database and web page templates to produce a preview page and a user-editable form version of the request, which the user confirms. Confirmed requests become training data for an offline LEARNER.]

10

Related work

• “Mr. Web” [Lockerd et al., CHI-03]
• NL interfaces to DBs, to facilitate queries
• Learning is commonly used as a tool to develop non-adaptive NLP components. Semantics-learning systems: CHILL [Zelle & Mooney, 96], [Miller et al., 96]
• Many non-learning NLP systems incorporate domain knowledge (hand-tuned).

What’s new:
– Understanding update requests, rather than queries or statements: a partially correct analysis is still useful here
– Deep analysis in a limited but evolving domain
– Email text


11

Outline

• The experimental corpus

• Request decomposition

• Sub-task evaluation

• End-to-end evaluation

• Conclusions

• Future directions

12

Experimental corpus

13

Experimental corpus

User1

Mike Roborts should be Micheal Roberts in the staff listing, pls fix it. Thanks - W

14

Experimental corpus

User1

User2

User3

....

Mike Roborts should be Micheal Roberts in the staff listing, pls fix it. Thanks - W

On the staff page, change Mike to Michael in the listing for “Mike Roberts”.

15

Experimental corpus

User1

User2

User3

....

Add this as Greg Johnson’s phone number: 412 281 2000

Please add “412-281-2000” to greg johnson’s listing on the staff page.

16

Experimental corpus

User1

User2

User3

....

Add this as Greg Johnson’s phone number: 412 281 2000

Please add “412-281-2000” to greg johnson’s listing on the staff page.

617 examples: ~20 subjects × ~30 tasks

17

Preprocessing – entity names are made distinct

User1

User2

User3

....

Add this as Greg Johnson’s phone number: 412 281 2000

Please add “543-341-8999” to fred flintstone’s listing on the staff page.

Modification: to make entity extraction reasonable, remove duplicate entities by replacing them with alternatives (preserving case, typos, etc.)
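The preprocessing step above can be sketched as a small helper that swaps each mention of a duplicated entity for an alternative value while copying over the case pattern of the original mention, so that typos and casing quirks in the messages are preserved. The function names and case-copying heuristic are illustrative, not the authors' actual tooling.

```python
import re

def match_case(template: str, replacement: str) -> str:
    """Copy the per-character case pattern of `template` onto `replacement`."""
    out = []
    for i, ch in enumerate(replacement):
        if i < len(template) and template[i].isupper():
            out.append(ch.upper())
        else:
            out.append(ch.lower())
    return "".join(out)

def replace_entity(text: str, old: str, new: str) -> str:
    """Replace every case-insensitive mention of `old` with `new`,
    preserving the case pattern of each individual mention."""
    pattern = re.compile(re.escape(old), re.IGNORECASE)
    return pattern.sub(lambda m: match_case(m.group(0), new), text)
```

For example, a message mentioning "greg johnson" in lowercase keeps a lowercase replacement, while "Greg Johnson" gets a capitalized one.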

18

Experimental corpus

• 617 examples

• Factual updates (with some exceptions)

• Each update request refers to a single tuple in the database.

• In the underlying database, a relation does not contain two or more attributes of the same “type” (entity).

• Text is ungrammatical and noisy!

19

Outline

• The experimental corpus

• Request decomposition

• Sub-task evaluation

• End-to-end evaluation

• Conclusions

• Future directions

20

[Request decomposition diagram: entity recognition, entity role classification, request type, target relation, target attribute.]

21

[Pipeline diagram (repeated): email msg → Shallow NLP / Feature Building → Information Extraction → Classification (requestType, targetRelation, targetAttrib; entity roles).]

22

Outline

• The experimental corpus

• Request decomposition

• Sub-task evaluation

• End-to-end evaluation

• Conclusions

• Future directions

23

[Pipeline diagram (repeated), with an annotated example message:]

hi webmaster- on the Education Technology Personnel page under "staff", please change [Robert Paul]PERSON's Phone number from "x[365]PHONE" to "x[3655]PHONE". Thanks, [Martha]PERSON

24

1. Entity recognition

• ‘Offline’ training data.

• Experiments:
– Hand-coded rules (cascaded FSTs in the “Mixup” language)
– Used a CRF for learning
– A standard feature set vs. a “tuned” feature set
– Results are in entity-level F1 (harmonic average of recall and precision)

• Good performance, even on a small dataset. Users tend to use the terminology and formats of the website, resulting in reduced variability.
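The entity-level F1 used here can be sketched as exact-match precision and recall over predicted vs. gold entities, combined by their harmonic mean; representing an entity as a (start, end, type) triple is an assumption for illustration.

```python
# Entity-level F1: exact match on (start, end, type) triples, with
# precision and recall combined by the harmonic mean.
def entity_f1(predicted, gold):
    """F1 over exact-match entity triples; 0.0 on empty inputs."""
    if not predicted or not gold:
        return 0.0
    tp = len(set(predicted) & set(gold))  # true positives: exact matches
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Note that a boundary error (e.g. predicting one token of a two-token phone number) counts as both a false positive and a false negative under this exact-match scoring.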

25

Robustness Evaluation

System robustness: how robust is the learned model to new user styles or new requests?

[Diagram: messages arranged in a users × requests grid; one block of the grid is held out for testing while the remaining messages are used for training, so test messages come from an unseen user or an unseen request.]

26

Robustness Evaluation

[Diagram: the same users × requests message grid, with a different block held out for testing.]
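The two robustness settings (5CV-USR, 5CV-REQ) amount to grouped cross-validation: all messages sharing one user (or one request) are held out together, so the model is always tested on an unseen user or request. A minimal sketch, with illustrative data:

```python
# Grouped cross-validation: one split per distinct group, where the group
# is the user (5CV-USR style) or the request (5CV-REQ style).
def grouped_splits(messages, key):
    """Yield (train, test) pairs, one per distinct value of key(message)."""
    groups = {}
    for m in messages:
        groups.setdefault(key(m), []).append(m)
    for held_out in groups:
        train = [m for m in messages if key(m) != held_out]
        yield train, groups[held_out]

# Illustrative corpus: (user, request) pairs.
msgs = [("u1", "r1"), ("u2", "r1"), ("u1", "r2"), ("u2", "r2")]
by_user = list(grouped_splits(msgs, key=lambda m: m[0]))     # 5CV-USR style
by_request = list(grouped_splits(msgs, key=lambda m: m[1]))  # 5CV-REQ style
```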

27

Entity recognition

[Bar chart: entity-level F1 (roughly 75–100) for Time, Date, Amount, Phone#, Room#, and Person entities, under three evaluation settings: 5CV, 5CV-USR, and 5CV-REQ.]

28

[Pipeline diagram (repeated), with an example message annotated with entity roles:]

hi webmaster- on the Education Technology Personnel page under "staff", please change [Robert Paul]KEY's Phone number from "x[365]OLD" to "x[3655]NEW". Thanks, [Martha]

29

2. Role-based entity classification

• Entity “roles”:
– keyEntity: value used to retrieve the tuple that will be updated (“delete greg’s phone number”)
– newEntity: value to be added to the database (“William’s new office # is 5307 WH”)
– oldEntity: value to be overwritten or deleted (“change mike to Michael in the listing for ...”)
– irrelevantEntity: not needed to build the request (“please add .... – thanks, William”)

Features:
• closest preceding “action verb” (add, change, delete, remove, ...)
• closest preceding preposition
• is the entity part of a determined NP
• presence of a possession marker

.. change [Robert Paul] 's ..
.. change .. from “x[365]” ..
.. change .. to “x[3655]” ..
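The four features listed above can be sketched with plain token matching. The ACTION_VERBS, PREPOSITIONS, and DETERMINERS lists are illustrative assumptions, not the paper's exact lexicons.

```python
ACTION_VERBS = {"add", "change", "delete", "remove", "update", "replace"}
PREPOSITIONS = {"from", "to", "in", "on", "for", "of"}
DETERMINERS = {"the", "a", "an", "this", "that"}

def role_features(tokens, start, end):
    """Role-classification features for the entity at tokens[start:end]."""
    prefix = [t.lower() for t in tokens[:start]]
    return {
        # closest preceding action verb (e.g. "change" before an oldEntity)
        "prev_action_verb": next(
            (t for t in reversed(prefix) if t in ACTION_VERBS), None),
        # closest preceding preposition ("from" suggests old, "to" suggests new)
        "prev_preposition": next(
            (t for t in reversed(prefix) if t in PREPOSITIONS), None),
        # crude proxy for "entity is part of a determined NP"
        "in_determined_np": bool(prefix) and prefix[-1] in DETERMINERS,
        # possession marker right after the entity ("Robert Paul 's phone")
        "has_possessive": end < len(tokens) and tokens[end] in {"'s", "'"},
    }
```

On "please change Robert Paul 's phone from x365 to x3655", the phone entity after "from" picks up `prev_preposition="from"` (an oldEntity cue), while the one after "to" picks up `prev_preposition="to"` (a newEntity cue).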

30

Role-based classification results

• A task of a semantic nature

• The text is semi-ungrammatical.

• However, good results with a small, simple set of features

• A semi-finite set of language patterns: ‘change’, ‘correction’, ‘update’, ‘replace’, ‘should be’, ‘wrong’, ‘needs to be’, ...

[Bar chart: F1 (roughly 70–100) for keyEntity, newEntity, and oldEntity, under 5CV, 5CV-USR, and 5CV-REQ.]

31

[Pipeline diagram (repeated): email msg → Shallow NLP / Feature Building → Information Extraction → Classification (requestType, targetRelation, targetAttrib; entity roles).]

32

3. Target relation classification

Good results with “bag of words” features.

Even better results when given the types of the entities in the message (e.g., a phone number suggests the People relation).

70

75

80

85

90

95

100

People Budget Events Sponsors

F1

5CV

5CV - USR

5CV - REQ
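The feature representation described above — bag of words plus one indicator per entity type present in the message — can be sketched as follows; the `word=` / `has_entity=` naming scheme is an assumption.

```python
from collections import Counter

def relation_features(tokens, entity_types):
    """Bag-of-words counts plus entity-type indicator features
    for target-relation classification."""
    feats = Counter(f"word={t.lower()}" for t in tokens)
    # One indicator per distinct entity type (e.g. a phone number is
    # strong evidence for the People relation).
    feats.update(f"has_entity={et}" for et in set(entity_types))
    return feats
```

The resulting sparse feature vector would feed any standard linear classifier over the relation labels (People, Budget, Events, Sponsors).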

33

[Pipeline diagram (repeated): email msg → Shallow NLP / Feature Building → Information Extraction → Classification (requestType, targetRelation, targetAttrib; entity roles).]

34

4. Request type classification

Can be determined from the entity roles and the action verb, except for distinguishing deleteTuple from deleteValue:

“Delete the phone number for Scott”
“Pls delete the whole entry for Scott”

Features:
• counts of entity role types
• action verbs
• nouns in NPs which are (probably) objects of the action verb
• a small set of nouns, tagged with a dictionary

Performance is much better with ~12 words of schema-specific knowledge: a dictionary of terms like phone, extension, room, office, ... This can be learned from data.

[Bar chart: F1 (roughly 50–100) for the deleteTuple and deleteValue request types, under 5CV, 5CV-USR, and 5CV-REQ.]
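The deleteTuple/deleteValue disambiguation can be sketched with the small noun dictionary the slide mentions: if the nouns near the action verb name an attribute, the request deletes a value; if they name the whole entry, it deletes a tuple. The dictionary contents and the default fallback are assumptions.

```python
# Illustrative ~12-word schema dictionary of attribute nouns, plus a small
# set of "whole entry" nouns; both lists are assumptions.
ATTRIBUTE_NOUNS = {"phone", "number", "extension", "room", "office",
                   "email", "address", "title", "photo", "cv", "fax", "url"}
TUPLE_NOUNS = {"entry", "listing", "record", "tuple", "row"}

def delete_request_type(tokens):
    """Classify a delete request as deleteValue or deleteTuple."""
    words = {t.lower().strip('.,"') for t in tokens}
    if words & ATTRIBUTE_NOUNS:
        return "deleteValue"   # an attribute noun names what to remove
    if words & TUPLE_NOUNS:
        return "deleteTuple"   # the whole entry is named
    return "deleteTuple"       # assumed default: bare "delete X" drops the entry
```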

35

[Pipeline diagram (repeated): email msg → Shallow NLP / Feature Building → Information Extraction → Classification (requestType, targetRelation, targetAttrib; entity roles).]

36

5. Target attribute classification
– “Delete the phone number for Scott” → phone
– “Delete Scott’s extension” → phone

Additional features:
– small dictionaries (bag of words), e.g., phone, extension, line
– can be learned from data, due to redundancy

• The vocabularies used in the corpus are small.

• Conjecture: a tendency to use the terminology of the website. Also, the vocabularies are naturally small.

[Bar chart: F1 (0–100) for the Personal name, Phone#, Room#, Publication, Photo, CV, and Amount attributes.]

37

Outline

• The experimental corpus

• Request Decomposition

• Sub-task evaluation

• End-to-end evaluation

• Conclusions

• Future directions

38

End-to-end performance

• Tasks are not independent

• Consider noisy inputs

• Evaluation criterion: % of completely correct messages.

Results in ~40% of perfectly processed messages.

Pipeline: Entity Recognition → Entity Role Classification → Request Type Classification → Relation Classification → Target Attr. Classification

39

End-to-end: Individual tasks

[Diagram: the pipeline stages (Entity Recognition, Entity Role Classification, Request Type Classification, Relation Classification, Target Attr. Classification), annotated with per-task results on noisy inputs: 85%, 99%, 67%, 80% of perfectly processed messages.]
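As a rough sanity check on these figures: if the stages were independent, whole-pipeline accuracy would simply be the product of the per-stage numbers. The naive product of the four figures shown above (0.85 × 0.99 × 0.67 × 0.80) comes to about 45%, in the same ballpark as the end-to-end rate reported in this talk; the gap reflects that the tasks are, as noted, not independent.

```python
from functools import reduce

def pipeline_accuracy(stage_accuracies):
    """Whole-pipeline accuracy under a naive independence assumption:
    the probability that every stage handles the message correctly."""
    return reduce(lambda acc, p: acc * p, stage_accuracies, 1.0)

# Per-stage figures as displayed on this slide; the independence
# assumption is ours, not the authors'.
naive = pipeline_accuracy([0.85, 0.99, 0.67, 0.80])  # ~0.451
```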

40

End-to-end: Composite tasks

• The user would get the correct form in 79.2% of messages.

[Pipeline diagram with per-task results: 85%, 99%, 67%, 80%.]

41

End-to-end: Composite tasks

• The user would get the correct form in 79.2% of messages.

• The correct form, with all entities extracted correctly: 53.4% of messages.

[Pipeline diagram with per-task results: 85%, 99%, 67%, 80%.]

42

End-to-end: Composite tasks

• The user would get the correct form in 79.2% of messages.

• The correct form, with all entities extracted correctly: 53.4% of messages.

• Fully correctly processed messages: 39.2%

[Pipeline diagram with per-task results: 85%, 99%, 67%, 80%.]

43

Conclusions

• A promising rate of ~40% of messages processed correctly. This rate is expected to improve as data accumulates.

• The system architecture means all schema-dependent knowledge can be learned:
– Potential to adapt to changes in the schema
– Data needed for learning can be collected from the user
– Learning appears to be possible on reasonable time-scales (10s or 100s of relevant examples, not thousands)

• The noisy, informal email text can be successfully processed by applying a learning approach, using small sets of syntactic features.

• The system is robust to user style and request variation.

• Human-subject experiments show partially correct results to be useful.

Thus, a realistic adaptive, automatic webmaster assistant.

44

Future directions

• Relax the restriction that each request concerns the update of one tuple per email.

• Evaluate more complex entity types for the entity recognition component (coverage).

• Entity recognition may be improved by database lookups.

• Collective classification to improve on message-based classification performance (e.g., entity roles) as well as pipeline processing.
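The database-lookup idea above can be sketched with stdlib fuzzy matching: a candidate string extracted from the email is validated, and possibly corrected, against the known values in the relevant database column. The value list and use of `difflib`'s default similarity cutoff are assumptions.

```python
import difflib

def db_lookup(candidate, known_values, cutoff=0.6):
    """Return the database value closest to `candidate`, or None if no
    value is similar enough (cutoff is difflib's ratio threshold)."""
    matches = difflib.get_close_matches(candidate, known_values,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For instance, a misspelled name from a request ("Mike Roborts") could be resolved to the stored "Michael Roberts" before the update is constructed.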

45

Thank You.

Questions?