Using Information Extraction for Question Answering
Done by
Rani Qumsiyeh
Problem
More information is added to the web every day. Search engines exist, but they have a problem.
This calls for a different kind of search engine.
History of QA
QA can be dated back to the 1960s.
Two common approaches to designing QA systems:
- Information Extraction
- Information Retrieval
Two conferences that evaluate QA systems:
- TREC (Text REtrieval Conference)
- MUC (Message Understanding Conference)
Common Issues with QA systems
Information retrieval deals with keywords.
Information extraction learns the question.
The question could have multiple variations, which means:
- Easier for IR, but broader results
- Harder for IE, but more EXACT results
Message Understanding Conference (MUC)
Sponsored by the Defense Advanced Research Projects Agency (DARPA), 1991-1998.
Developed methods for formal evaluation of IE systems, in the form of a competition where participants compare their results with each other and against human annotators' key templates.
Short system preparation time to stimulate portability to new extraction problems: only 1 month to adapt the system to the new scenario before the formal run.
Evaluation Metrics
Precision and recall:
- Precision: correct answers / answers produced
- Recall: correct answers / total possible answers
F-measure: F = ((β² + 1) · P · R) / (β² · P + R), where β is a parameter representing the relative importance of P and R.
E.g., β = 1 gives P and R equal weight; β = 0 gives only P.
Current state of the art: the F = .60 barrier
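The metrics above can be sketched in a few lines; the worked numbers below are invented for illustration:

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure combining precision P and recall R.

    beta weights recall relative to precision: beta=1 weights them
    equally, and beta=0 reduces the measure to precision alone.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

# Hypothetical system: 80 answers produced, 60 correct,
# 100 possible answers in the key.
p = 60 / 80   # precision = 0.75
r = 60 / 100  # recall    = 0.60
print(round(f_measure(p, r), 3))          # prints 0.667 (balanced F1)
print(round(f_measure(p, r, beta=0), 3))  # prints 0.75 (beta=0 -> precision)
```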
MUC Extraction Tasks
- Named Entity task (NE)
- Template Element task (TE)
- Template Relation task (TR)
- Scenario Template task (ST)
- Coreference task (CO)
Named Entity Task (NE)
Mark in the text each string that represents a person, organization, or location name, or a date or time, or a currency or percentage figure.
Template Element Task (TE)
Extract basic information related to organization, person, and artifact entities, drawing evidence from everywhere in the text.
Template Relation task (TR)
Extract relational information on relations such as employee_of, manufacture_of, and location_of (TR expresses domain-independent relationships between entities identified by TE).
Scenario Template task (ST)
Extract prespecified event information and relate it to particular organization, person, or artifact entities (ST identifies domain- and task-specific entities and relations).
Coreference task (CO)
Capture information on coreferring expressions, i.e., all mentions of a given entity, including those marked in NE and TE (nouns, noun phrases, pronouns).
An Example
The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc.
- NE: entities are rocket, Tuesday, Dr. Head, and We Build Rockets
- CO: "it" refers to the rocket; Dr. Head and Dr. Big Head are the same
- TE: the rocket is shiny red and Head's brainchild
- TR: Dr. Head works for We Build Rockets Inc.
- ST: a rocket-launching event occurred with the various participants.
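The outputs of these tasks can be pictured as filled templates. A hypothetical sketch for the sentence above (the slot names are invented for illustration; MUC defines its own template schemas):

```python
# Illustrative slot-filled templates for the rocket example.
# Slot names are invented; they are not the official MUC schemas.
ne = {"entities": ["rocket", "Tuesday", "Dr. Head", "We Build Rockets Inc."]}

co = {  # coreference chains: all mentions of the same entity
    "rocket": ["The shiny red rocket", "It"],
    "Dr. Head": ["Dr. Big Head", "Dr. Head"],
}

te = {  # basic attributes of one entity
    "entity": "rocket",
    "attributes": ["shiny", "red"],
    "brainchild_of": "Dr. Head",
}

tr = {"relation": "employee_of",       # domain-independent relation
      "person": "Dr. Head",
      "organization": "We Build Rockets Inc."}

st = {"event": "rocket_launching",     # domain-specific scenario
      "instrument": "rocket",
      "time": "Tuesday"}
```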
Scoring templates
Templates are compared on a slot-by-slot basis:
- Correct: response = key
- Partial: response ≈ key
- Incorrect: response != key
- Spurious: key is blank (overgeneration = spurious / actual)
- Missing: response is blank
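The slot-by-slot comparison can be sketched as below. This is a simplified reading of the scoring scheme (partial string matching is omitted; it is not the official MUC scorer):

```python
def score_slot(response, key):
    """Classify one template slot: simplified MUC-style scoring.

    None stands for a blank fill. Partial matching is omitted here.
    """
    if key is None and response is None:
        return "noncommittal"  # both blank: not scored
    if key is None:
        return "spurious"      # system produced a fill where the key is blank
    if response is None:
        return "missing"       # key has a fill the system omitted
    return "correct" if response == key else "incorrect"

def overgeneration(responses, keys):
    """overgeneration = spurious / actual (actual = fills the system produced)."""
    labels = [score_slot(r, k) for r, k in zip(responses, keys)]
    actual = sum(r is not None for r in responses)
    return labels.count("spurious") / actual if actual else 0.0

# Invented response/key templates with one slot of each kind.
resp = ["DARPA", None, "1992", "rocket"]
key  = ["DARPA", "MUC", "1991", None]
print([score_slot(r, k) for r, k in zip(resp, key)])
# prints ['correct', 'missing', 'incorrect', 'spurious']
```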
Maximum Results Reported
KnowItAll, TextRunner, KnowItNow
These differ in implementation, but do the same thing.
Using them as QA systems
Able to handle questions that produce one relation:
- "Who is the president of the US?" (can handle)
- "Who was the president of the US in 1998?" (fails)
They produce a huge number of facts that the user still has to go through.
Textract
Aims at solving ambiguity in text by introducing more named entities.
"What is Julian Werver Hill's wife's telephone number?" is equivalent to: "What is Polly's telephone number?"
"Where is Werver Hill's affiliated company located?" is equivalent to: "Where is Microsoft located?"
Proposed System
- Determine what named entity we are looking for using Textract.
- Use part-of-speech tagging.
- Use TextRunner as the basis for search.
- Use WordNet to find synonyms.
- Use extra entities in the text as "constraints".
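These steps could compose into a pipeline roughly as sketched below. Every function, rule, and fact here is a hypothetical stand-in: the real system calls Textract, TextRunner, and WordNet, none of which is reimplemented.

```python
# Hypothetical pipeline sketch; Textract, TextRunner, and WordNet
# are stubbed out with toy data structures.

WH_TO_ENTITY = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}

def target_entity_type(question):
    """Step 1 (stand-in for Textract): pick the named-entity type
    the question is asking for, from its wh-word."""
    wh = question.lower().split()[0]
    return WH_TO_ENTITY.get(wh, "UNKNOWN")

def answer(question, fact_store):
    """Steps 2-5, greatly simplified: match question words against a
    store of (argument, predicate, value, value_type) facts and keep
    only values of the entity type the question asks for."""
    etype = target_entity_type(question)
    words = set(question.lower().strip("?").split())
    return [value for (arg, pred, value, vtype) in fact_store
            if arg in words and pred in words and vtype == etype]

# Invented facts in a TextRunner-like (argument, predicate, value) form.
facts = [
    ("land", "moon", "Neil Armstrong", "PERSON"),
    ("land", "moon", "1969", "DATE"),
]
print(answer("Who was the first man to land on the moon?", facts))
# prints ['Neil Armstrong']
```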
Example
(WP who) (VBD was) (DT the) (JJ first) (NN man) (TO to) (VB land) (IN on) (DT the) (NN moon)
- The verb (VB) is treated as the argument.
- The noun (NN) is treated as the predicate.
- We make sure that position is maintained.
- We keep prepositions if they have two nouns: (president of the US).
- Other non-stop words are constraints, e.g., "first".
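These rules can be sketched over the tagged question above. This is a toy reading of the rules, assuming the tagger's `(TAG word)` output format; the real tagger and extraction logic are not shown:

```python
import re

# POS-tagged question in the "(TAG word)" format from the slide.
TAGGED = ("(WP who) (VBD was) (DT the) (JJ first) (NN man) "
          "(TO to) (VB land) (IN on) (DT the) (NN moon)")

# Parse the "(TAG word)" pairs into (tag, word) tuples, in order.
tokens = re.findall(r"\((\w+) (\w+)\)", TAGGED)

# Toy application of the slide's rules:
argument = [w for t, w in tokens if t == "VB"]           # base verb -> argument
predicates = [w for t, w in tokens if t.startswith("NN")]  # nouns -> predicates
constraints = [w for t, w in tokens if t == "JJ"]        # other content words

print(argument, predicates, constraints)
# prints ['land'] ['man', 'moon'] ['first']
```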
Anaphora Resolution
Use anaphora resolution to resolve pronouns, e.g., to determine that the entity is associated with "wrote" rather than "landed".
Use Synonyms
We use WordNet to find possible synonyms for verbs and nouns to produce more facts.
We only consider 3 synonyms, since each additional synonym requires more fact retrievals and more time.
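The capped expansion can be sketched as below; the synonym table is a hypothetical stand-in for WordNet lookups:

```python
# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS = {
    "land": ["touch_down", "set_down", "alight", "descend"],
    "man": ["male", "person", "human", "gentleman"],
}

MAX_SYNONYMS = 3  # cap: each extra synonym means another round of fact retrieval

def expand(word):
    """Return the word plus at most MAX_SYNONYMS synonyms to query."""
    return [word] + SYNONYMS.get(word, [])[:MAX_SYNONYMS]

print(expand("land"))
# prints ['land', 'touch_down', 'set_down', 'alight']
```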
Using constraints
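The slide's figure is not transcribed. As a sketch of the idea (assumed from the earlier slides, where "first" is a constraint), constraints can filter candidate answers by requiring each constraint word to appear in the sentence a fact was extracted from; the candidates below are invented:

```python
# Hypothetical candidate answers paired with their source sentences.
candidates = [
    ("Buzz Aldrin", "Buzz Aldrin landed on the moon in 1969."),
    ("Neil Armstrong", "Neil Armstrong was the first man to land on the moon."),
]

def apply_constraints(candidates, constraints):
    """Keep candidates whose source sentence mentions every constraint."""
    return [answer for answer, sentence in candidates
            if all(c in sentence.lower() for c in constraints)]

print(apply_constraints(candidates, ["first"]))
# prints ['Neil Armstrong']
```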
Delimitations
Works well with Who, When, and Where questions, as the named entity is easily determined: achieves about 90% accuracy on all of them.
Works less well with What and How questions: achieves about 70% accuracy.
Takes about 13 seconds to answer a question.
Future Work
Build an ontology to determine the named entity and parse the question (faster).
Handle combinations of questions: "When and where did the Holocaust happen?"