
Is Question Answering an Acquired Skill?
Soumen Chakrabarti, IIT Bombay
With Ganesh Ramakrishnan, Deepa Paranjpe, Vijay Krishnan, Arnab Nandi

Slide 3 (QAChakrabarti): Web search and QA
- Information need: words relating "things" + "thing" aliases = telegraphic Web queries
  - Cheapest laptop with wireless → best price laptop 802.11
  - Why is the sky blue? → sky blue reason
  - When was the Space Needle built? → Space Needle history
- Entity + relation extraction technology better than ever (SemTag, KnowItAll, Biotext)
  - Ontology extension (e.g., is a kind of)
  - List extraction (e.g., is an instance of)
  - Slot-filling (author X wrote book Y)

Slide 4: Factoid QA
- Specialize given domain to a token related to ground constants in the query
- What animal is Winnie the Pooh?
  - hyponym("animal") NEAR "Winnie the Pooh"
- When was television invented?
  - instance-of("time") NEAR "television" NEAR synonym("invented")
- FIND x NEAR GroundConstants(question) WHERE x IS-A Atype(question)
  - Ground constants: Winnie the Pooh, television
  - Atypes: animal, time

Slide 5: A relational view of QA
- Entity class or atype may be expressed by
  - A finite IS-A hierarchy (e.g. WordNet, TAP)
  - A surface pattern matching infinitely many strings (e.g. "digit+", "Xx+", "preceded by a preposition")
- Match selectors, specialize atype to answer tokens
- [Diagram labels: Question; Atype clues; Selectors; Answer passage; Question words; Answer zone; Direct syntactic match; Entity class IS-A; Limit search to certain rows; Locate which column to read; Attribute or column name]

Slide 6: Benefits of the relational view
- Scaling up by dumbing down
  - Next stop after vector-space
  - Far short of real knowledge representation and inference
  - Barely getting practical at (near) Web scale
- Can set up as a learning problem: train with questions (query logs) and answers in context
- Transparent, self-tuning, easy to deploy
  - Feature extractors used in entity taggers
  - Relational/graphical learning on features

Slide 7: What TREC QA feels like
- How to assemble chunker, parser, POS and NE tagger, WordNet, WSD, into a QA system?
- Experts get much insight from old QA pairs
  - "Matching an upper-cased term adds a 60% bonus for multi-word terms and 30% for single words"
  - "Matching a WordNet synonym discounts by 10% (lower case) and 50% (upper case)"
  - "Lower-case term matches after Porter stemming are discounted 30%; upper-case matches 70%"

Slide 8: Talk outline
- Relational interpretation of QA
- Motivation for a "clean-room" IE+ML system
- Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
  - Can handle prominent finite set of atypes: person, place, time, measurements, ...
- Extending to arbitrary atype specializations
  - Required for what and which questions
- Ongoing work and concluding remarks

Slide 9: Feature + Soft match
- FIND x NEAR GroundConstants(question) WHERE x IS-A Atype(question)
- No fixed question or answer type system
- Convert "x IS-A Atype(question)" to a soft match DoesAtypeMatch(x, question)
- [Diagram labels: Question; Answer tokens; Passage; IE-style surface feature extractors; WordNet hypernym feature extractors; Question feature vector; Snippet feature vector; Learn joint distribution]
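The FIND ... NEAR ... WHERE ... IS-A query above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `TOY_IS_A` table stands in for a real hierarchy such as WordNet, NEAR is approximated by a fixed token window, and the names `is_a`, `find_answers`, and `window` are hypothetical, not taken from the talk.

```python
# Toy IS-A hierarchy (child -> parent). Hypothetical data, not WordNet.
TOY_IS_A = {
    "bear": "animal",
    "animal": "entity",
    "forest": "location",
    "location": "entity",
}


def is_a(token, atype):
    """True if token equals atype or has atype as an ancestor in TOY_IS_A."""
    node = token
    while node is not None:
        if node == atype:
            return True
        node = TOY_IS_A.get(node)  # None once we run off the top of the hierarchy
    return False


def find_answers(tokens, ground_constants, atype, window=5):
    """FIND x NEAR ground_constants WHERE x IS-A atype, over one token list.

    NEAR is approximated (an assumption) as: within `window` tokens of any
    token that matches a ground constant.
    """
    lowered = [t.lower() for t in tokens]
    anchors = [i for i, t in enumerate(lowered) if t in ground_constants]
    hits = []
    for i, tok in enumerate(lowered):
        if tok in ground_constants:
            continue  # the ground constants themselves are never answers
        near = any(abs(i - j) <= window for j in anchors)
        if near and is_a(tok, atype):
            hits.append(tok)
    return hits


passage = "Winnie the Pooh is a bear who lives in a forest".split()
print(find_answers(passage, {"winnie", "pooh"}, "animal"))
```

A hard IS-A test like this is what the soft match DoesAtypeMatch(x, question) replaces: the learned model scores how well a token's features fit the question instead of returning a yes/no answer.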
Slide 10: Feature extraction: Intuition
- [Figure labels, question n-grams: how, who; fast, many, far, rich; wrote, first]
- How fast can a cheetah run?
  - A cheetah can chase its prey at up to 90 km/h
- How fast does light travel?
  - Nothing moves faster than 186,000 miles per hour, the speed of light
- [Figure labels, passage features: rate#n#2, abstraction#n#6, NNS; rate#n#2, magnitude_relation#n#1; mile#n#3, linear_unit#n#1, measure#n#3, definite_quantity#n#1; paper_money#n#1, currency#n#1; writer, composer, artist, musician; NNP, person; explorer]

Slide 11: Feature extractors
- Question features: 1, 2, 3-token sequences starting with standard wh-words
- Passage surface features: hasCap, hasXx, isAbbrev, hasDigit, isAllDigit, lpos, rpos, ...
- Passage WordNet features: all noun hypernym ancestors of all senses of token
- Get top 300 passages from IR engine
  - For each token invoke feature extractors
  - Label = 1 if token is in answer span, 0 otherwise
- Question vector x_q, passage vector x_p

Slide 12: Preliminary likelihood ratio tests
- [Figure panels: Surface patterns; WordNet hypernyms]

Slide 13: A simple, flat conditional model
- Let x = x_q ⊗ x_p (pairwise product of elements)
- Model Pr(Y=1|x) = exp(w·x) / (1 + exp(w·x))
- For every question-feature, passage-feature pair, w has a parameter
  - Expect to perform better than linear model x = (x_p, x_q)
  - Can discount for redundancy in pair info
- If x_q (x_p) is fixed, what x_p (x_q) will yield the largest Pr(Y=1|x)? (linear iceberg query)
- [Figure labels: how_far, when, what_city; region#n#3, entity#n#1]

Slide 14: Classification accuracy
- Pairing more accurate than linear model
- Steep learning curve; linear never gets it beyond "prior" atypes like proper nouns (common in TREC)
- Are the estimated w parameters meaningful?
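The flat conditional model compared above can be sketched as follows. The sparse-dictionary representation, the function names, and the weight values are assumptions made for illustration; in the talk the weights w are learned from labeled tokens, not set by hand.

```python
import math


def pair_features(x_q, x_p):
    """Pairwise product x_q (x) x_p: one feature per (question, passage) feature pair."""
    return {(q, p): vq * vp for q, vq in x_q.items() for p, vp in x_p.items()}


def prob_answer(w, x_q, x_p):
    """Pr(Y=1|x) = exp(w.x) / (1 + exp(w.x)), with x the pairwise product."""
    x = pair_features(x_q, x_p)
    score = sum(w.get(pair, 0.0) * value for pair, value in x.items())
    # Algebraically identical to exp(score) / (1 + exp(score)).
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical hand-set weights, for illustration only.
w = {
    ("how_far", "hasDigit"): 2.0,
    ("how_far", "region#n#3"): -1.5,
}
x_q = {"how_far": 1.0}

print(prob_answer(w, x_q, {"hasDigit": 1.0}))    # numeric token: above 0.5
print(prob_answer(w, x_q, {"region#n#3": 1.0}))  # region token: below 0.5
```

Because w is indexed by (question feature, passage feature) pairs, the model can prefer digits for how_far and penalize them for other question types, which a linear model over the concatenation (x_p, x_q) cannot express.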
Slide 15: Parameter anecdotes
- Surface and WordNet features complement each other
- General concepts get negative params: use in predictive annotation
- Learning is symmetric (Q ↔ A)

Slide 16: Query-driven information extraction
- Basis of atypes A; a ∈ A could be a synset, a surface pattern, a feature of a parse tree
- Question q is projected to a vector (w_a : a ∈ A) in atype space via learning the conditional model
  - E.g. if q is "when" or "how long", w_hasDigit and w_time_period#n#1 are large, w_region#n#1 is small
- Each corpus token t has an associated indicator feature for every a ∈ A
  - E.g. hasDigit(3,000) = is-a(region#n#1)(Japan) = 1
  - Can also learn [0,1] value of is-a proximity

Slide 17: Single token scoring
- A token t is a candidate answer if [condition not preserved in the slide text]
- H_q(t): Reward tokens appearing near selectors matched from question
  - 0/1: appears within fixed window with selector/s
  - Activation in linear token sequence model
  - Proximity in chunk sequences, parse trees, ...
- Order tokens by decreasing [score expression not preserved; annotated "Atype indicator features of the token" and "Projection of question to atype space"]
- Example passage: the armadillo, found in Texas, is covered with strong horny plates

Slide 18: Mean reciprocal rank (MRR)
- n_q = smallest rank among answer passages
- MRR = (1/|Q|) Σ_{q∈Q} (1/n_q)
  - Dropping passage from #1 to #2 as bad as dropping it from #2 to ∞
- TREC requires MRR5: round up n_q > 5 to ∞
  - Improving rank from 20 to 6 as useless as improving it from 20 to 15
- Aggregate score influenced by many complex subsystems
  - Complete description rarely available

Slide 19: Effect of eliminating non-answers
- 300 top IR score hits
- If Pr(Y=1|token) < threshold, reject token
- If all tokens are rejected, reject the passage
- Present survivors in IR order

Slide 20: Drill-down and ablation studies
- Scale average MRR improvement to 1
- What, Which < average; Who [comparison symbol not preserved] average
- Atype of what and which not captured well by 3-grams starting at wh-words
- Atype ranges over essentially infinite set with relatively little training data
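The MRR and MRR5 definitions used in the evaluation above can be written out directly. This is a minimal sketch: the function name, the `cutoff` parameter, and the convention of passing `None` for a question with no answer passage are assumptions made here, not part of any TREC tooling.

```python
def mean_reciprocal_rank(first_answer_ranks, cutoff=None):
    """MRR = (1/|Q|) * sum over q of 1/n_q.

    first_answer_ranks: one entry per question, the 1-based rank n_q of the
    first answer passage, or None if no answer passage was returned.
    cutoff: if set (5 for MRR5), ranks beyond it count as infinity, i.e.
    they contribute 0 to the sum.
    """
    if not first_answer_ranks:
        raise ValueError("need at least one question")
    total = 0.0
    for n_q in first_answer_ranks:
        if n_q is None:
            continue  # no answer found: reciprocal rank is 0
        if cutoff is not None and n_q > cutoff:
            continue  # rounded up to infinity under MRR5
        total += 1.0 / n_q
    return total / len(first_answer_ranks)


ranks = [1, 2, 6, None]
print(mean_reciprocal_rank(ranks))            # plain MRR
print(mean_reciprocal_rank(ranks, cutoff=5))  # MRR5: the rank-6 hit counts for nothing
```

With `cutoff=5`, a question whose first answer moves from rank 20 to rank 6 contributes 0 both before and after, which is the insensitivity the slide points out.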
Slide 21: Talk outline
- Relational interpretation of QA
- Motivation for a "clean-room" IE+ML system
- Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
  - Can handle prominent finite set of atypes: person, place, time, measurements, ...
- Extending to arbitrary atype specializations
  - Required for what and which questions
- Ongoing work and concluding remarks

Slide 22: What, which, name atype clues
- Assumption: Question sentence has a wh-word and a main/auxiliary verb
- Observation: Atype clues are embedded in a noun phrase (NP) adjoining the main or auxiliary verb
- Heuristic: Atype clue = head of this NP
  - Use a shallow parser and apply rule
- Head can have attributes
  - Which (American (general)) is buried in Salzburg?
  - Name (Saturn's (largest (moon)))

Slide 23: Atype clue extraction stats
- Simple heuristic quite effective
- If successful, extracted atype is mapped to WordNet synset (moon → celestial body etc.)
- If no atype of this form is available, try the "self-evident" atypes (who, when, where, how_X etc.)
- New boolean feature for candidate token: is token hyponym of atype synset?

Slide 24: The last piece: Learning selectors
- Which question words are likely to appear (almost) unchanged in an answer passage?
  - Constants in select-clauses of SQL queries
  - Guides backoff policy for keyword query
- Local and global features
  - POS of word, POS of adjacent words, case info, proximity to wh-word
  - Suppose word is associated with synset set S
  - NumSense: size of S (how polysemous is the word?)
  - NumLemma: average #lemmas describing s ∈ S
- [Figure labels: POS@0, POS@1, POS@-1]

Slide 25: Selector results
- Global features (IDF, NumSense, NumLemma) essential for accuracy
  - Best F1 accuracy with local features alone: 71-73%
  - With local and global features: 81%
- Decision trees better than logistic regression
  - F1 = 81% as against LR F1 = 75%
  - Intuitive decision branches
  - But logistic regression gives scores for query backoff

Slide 26: Putting together a QA system
- [Diagram labels: QA System; WordNet; POS Tagger; Training Corpus; Shallow parser; Learning tools; N-E Tagger]

Slide 27: Putting together a QA system
- [Pipeline diagram labels: Question; Passage Index; Corpus; Sentence splitter; Passage indexer; Candidate passage; Keyword query; Keyword query generator; Shallow Parser; Noun and verb markers; Atype Extractor; Atype clues; Tokenizer; POS Tagger; Tagged question; Entity Extractor; Tagged passage; Selector Learner; Is QA pair?; Logistic Regression; Reranked passages]
- Learning to rerank passages. Sample features:
  - Do selectors match? How many?
  - Is some non-selector passage token a specialization of the question's atype clue?
  - Min, avg, linear token distance between candidate token and matched selectors
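The selector-distance features listed for reranking can be sketched as below. The function name, the return convention, and the whitespace tokenization with punctuation dropped are assumptions for illustration; the passage fragment is the Ushuaia example used in the talk.

```python
def selector_distance_features(tokens, selectors, candidate_index):
    """Return (num_matched, min_distance, avg_distance) for one candidate token.

    tokens: passage tokens; selectors: lower-cased question words expected to
    appear unchanged in the passage; candidate_index: position of the
    candidate answer token. Distances are linear token offsets. When no
    selector matches, the distances are reported as None.
    """
    positions = [
        i for i, tok in enumerate(tokens)
        if i != candidate_index and tok.lower() in selectors
    ]
    if not positions:
        return 0, None, None
    distances = [abs(candidate_index - i) for i in positions]
    return len(positions), min(distances), sum(distances) / len(distances)


# Passage fragment from the talk, split on whitespace with punctuation dropped.
tokens = "Ushuaia a port of about 30,000 dwellers".split()
candidate = tokens.index("30,000")
print(selector_distance_features(tokens, {"ushuaia"}, candidate))  # (1, 5, 5.0)
```

These values would be fed to the logistic regression reranker alongside the atype-match and original-rank features.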
Slide 28: Learning to re-rank passages
- Remove passage tokens matching selectors
  - User already knows these are in passage
- Find passage token/s specializing atype
- For each candidate token collect
  - Atype of question, original rank of passage
  - Min, avg linear distances to matched selectors
  - POS and entity tag of token if available
- Example question: How many inhabitants live in the town of Ushuaia
- Example passage: Ushuaia, a port of about 30,000 dwellers set between the Beagle Channel and ...
- [Figure annotations: selector match; surface pattern hasDigits; WordNet match; 5 tokens apart]

Slide 29: Re-ranking results
- Categorical and numeric attributes
- Logistic regression
- Good precision, poor recall
- Use logit score to re-rank passages
- Rank of first correct passage shifts substantially

Slide 30: MRR gains from what, which, name
- Substantial gain in MRR
- What/which now show above-average MRR gains
- TREC 2000 top MRRs: 0.76, 0.71, 0.46, 0.46, 0.31

Slide 31: Generalization across corpora
- Across-year numbers close to train/test split on a single year
- Features and model seem to capture corpus-independent linguistic Q+A artifacts

Slide 32: Conclusion
- Clean-room QA = feature extraction + learning
  - Recover structure info from question
  - Learn correlations between question structure and passage features
- Competitive accuracy with negligible domain expertise or manual intervention
- Ongoing work
  - Model how selector and atype are related
  - Model coefficients to predictive annotation
  - Combine token scores to better passage scores
  - Treat all question types uniformly
  - Use redundancy available from the Web
