1
Question Answering Techniques and Systems
Mihai Surdeanu (TALP)
Marius Paşca (Google Research)*
TALP Research Center
Dep. Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
[email protected]
*The work by Marius Pasca (currently [email protected]) was performed as part of his PhD work at Southern Methodist University in Dallas, Texas.
2
Overview
What is Question Answering?
A “traditional” system
Other relevant approaches
Distributed Question Answering
3
Problem of Question Answering
What is the nationality of Pope John Paul II?
… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…
When was the San Francisco fire?
… were driven over it. After the ceremonial tie was removed - it burned in the San Francisco fire of 1906 – historians believe an unknown Chinese worker probably drove the last steel spike into a wooden tie. If so, it was only…
Where is the Taj Mahal?
… list of more than 360 cities around the world includes the Great Reef in Australia, the Taj Mahal in India, Chartre’s Cathedral in France, and Serengeti National Park in Tanzania. The four sites Japan has listed include…
4
Problem of Question Answering
What is the nationality of Pope John Paul II?
… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…
Natural language question, not keyword queries
Short text fragment, not URL list
5
Compare with…
Where is Naxos? What continent is Taormina in? What is the highest volcano in Europe?
Searching for: Naxos / Searching for: Etna / Searching for: Taormina
Document collection:
From the Caledonian Star in the Mediterranean – September 23, 1990 (www.expeditions.com):
On a beautiful early morning the Caledonian Star approaches Naxos, situated on the east coast of Sicily. As we anchored and put the Zodiacs into the sea we enjoyed the great scenery. Under Mount Etna, the highest volcano in Europe, perches the fabulous town of Taormina. This is the goal for our morning. After a short Zodiac ride we embarked our buses with local guides and went up into the hills to reach the town of Taormina. Naxos was the first Greek settlement at Sicily. Soon a harbor was established but the town was later destroyed by invaders. [...]
6
Beyond Document Retrieval
Document Retrieval
Users submit queries corresponding to their information needs.
System returns a (voluminous) list of full-length documents.
It is the responsibility of the users to find the information of interest within the returned documents.
Open-Domain Question Answering (QA)
Users ask questions in natural language. What is the highest volcano in Europe?
System returns a list of short answers. … Under Mount Etna, the highest volcano in Europe, perches the fabulous town …
Often more useful for specific information needs.
7
Evaluating QA Systems
The National Institute of Standards and Technology (NIST) organizes yearly the Text REtrieval Conference (TREC), which has had a QA track for the past 5 years: from TREC-8 in 1999 to TREC-12 in 2003.
The document set
Newswire textual documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now.
Well-formed lexically, syntactically, and semantically (reviewed by professional editors).
The questions
Hundreds of new questions every year; the total is close to 2,000 for all TRECs.
Task
Initially extract at most 5 answers: long (250 bytes) and short (50 bytes). Now extract only one exact answer. Several other sub-tasks were added later: definition, list, context.
Metrics
Mean Reciprocal Rank (MRR): each question is assigned the reciprocal rank of the first correct answer. If the correct answer is at position k, the score is 1/k.
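The MRR computation can be illustrated with a minimal Python sketch (not part of the original slides), assuming one ranked list of candidate answers per question and a correctness test supplied by the evaluator:

```python
def mean_reciprocal_rank(ranked_answers, is_correct):
    """MRR over a question set.

    ranked_answers: one list of candidate answers per question, best first.
    is_correct: callable (question_index, answer) -> bool.
    """
    total = 0.0
    for qi, answers in enumerate(ranked_answers):
        for rank, answer in enumerate(answers, start=1):
            if is_correct(qi, answer):
                total += 1.0 / rank   # reciprocal rank of the first correct answer
                break                 # later correct answers do not add to the score
    return total / len(ranked_answers)
```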
8
Overview
What is Question Answering?
A “traditional” system
SMU ranked first at TREC-8 and TREC-9
The foundation of LCC’s PowerAnswer system (http://www.languagecomputer.com)
Other relevant approaches
Distributed Question Answering
9
QA Block Architecture
[Diagram: Question (Q) → Question Processing → Passage Retrieval → Answer Extraction → Answer (A); Document Retrieval feeds Passage Retrieval; shared resources: WordNet, NER, Parser]
Question Processing: captures the semantics of the question; selects keywords for Passage Retrieval.
Passage Retrieval: extracts and ranks passages using surface-text techniques.
Answer Extraction: extracts and ranks answers using NL techniques.
10
Question Processing Flow
[Diagram: Q → Question parsing → Construction of the question representation → Answer type detection and Keyword selection, producing the question semantic representation, the AT category, and the keywords]
11
Lexical Terms Examples
Questions approximated by sets of unrelated words (lexical terms)
Similar to bag-of-words IR models

Question (from TREC QA track) → Lexical terms
Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer
12
Question Stems and Answer Type Examples
Identify the semantic category of expected answers.

Question → Question stem → Answer type
Q555: What was the name of Titanic’s captain? → What → Person
Q654: What U.S. Government agency registers trademarks? → What → Organization
Q162: What is the capital of Kosovo? → What → City
Q661: How much does one ton of cement cost? → How much → Quantity

Other question stems: Who, Which, Name, How hot...
Other answer types: Country, Number, Product...
13
Building the Question Representation
Built from the question parse tree, through a bottom-up traversal with a set of propagation rules.
Q006: Why did David Koresh ask the FBI for a word processor?
[Parse tree of the question, with POS tags (WRB, VBD, NNP, VB, DT, IN, NN) and constituents (WHADVP, NP, PP, VP, SQ, SBARQ)]
- assign labels to non-skip leaf nodes
- propagate the label of the head child node to the parent node
- link the head child node to the other children nodes
[published in COLING 2000]
14
Building the Question Representation
Q006: Why did David Koresh ask the FBI for a word processor?
[The propagation rules collapse the parse tree into a question representation centered on the verb “ask”, which is linked to “David Koresh”, “FBI”, “word processor”, and the expected answer type REASON]
15
Detecting the Expected Answer Type
In some cases, the question stem is sufficient to indicate the answer type (AT): Why → REASON; When → DATE
In many cases, the question stem is ambiguous. Examples:
What was the name of Titanic’s captain?
What U.S. Government agency registers trademarks?
What is the capital of Kosovo?
Solution: select additional question concepts (AT words) that help disambiguate the expected answer type. Examples: captain, agency, capital
16
AT Detection Algorithm
Select the answer type word from the question representation:
Select the word(s) connected to the question. Some content-free words are skipped (e.g. “name”).
From the previous set, select the word with the highest connectivity in the question representation.
Map the AT word into a previously built AT hierarchy:
The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. “writer” → PERSON.
Select the AT(s) from the first hypernym(s) associated with a semantic category.
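A minimal sketch of the hypernym walk, assuming NLTK’s WordNet interface and a tiny hand-built synset-to-category mapping (the real AT hierarchy described in the slides is far larger):

```python
from nltk.corpus import wordnet as wn

# Hypothetical seed mapping: WordNet synset name -> semantic category.
SYNSET_TO_CATEGORY = {
    "person.n.01": "PERSON",
    "organization.n.01": "ORGANIZATION",
    "location.n.01": "LOCATION",
    "quantity.n.01": "QUANTITY",
}

def answer_type(at_word):
    """Map an AT word (e.g. 'oceanographer') to a category by walking up
    its hypernym chain until a mapped synset is reached."""
    for synset in wn.synsets(at_word, pos=wn.NOUN):
        frontier = [synset]
        while frontier:
            current = frontier.pop(0)
            if current.name() in SYNSET_TO_CATEGORY:
                return SYNSET_TO_CATEGORY[current.name()]
            frontier.extend(current.hypernyms())   # breadth-first up the hierarchy
    return "UNKNOWN"

# answer_type("oceanographer") -> "PERSON"
```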
17
Answer Type Hierarchy
[Fragment of the PERSON subhierarchy: scientist/man of science → chemist, researcher, oceanographer; inhabitant/dweller/denizen → westerner, American, islander/island-dweller; performer/performing artist → actor, actress, dancer (ballet dancer), tragedian]
What researcher discovered the vaccine against Hepatitis-B?
[Question representation: researcher (AT word, mapped to PERSON) → discovered → Hepatitis-B vaccine]
What is the name of the French oceanographer who owned Calypso?
[Question representation: oceanographer (AT word, mapped to PERSON) → owned → Calypso; modifiers: name, French]
18
Evaluation of Answer Type Hierarchy
Controlled variation of the number of WordNet synsets included in the answer type hierarchy. Test on 800 TREC questions.

Hierarchy coverage → Precision score (50-byte answers)
0% → 0.296
3% → 0.404
10% → 0.437
25% → 0.451
50% → 0.461

The derivation of the answer type is the main source of unrecoverable errors in the QA system.
19
Keyword Selection
The AT indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection.
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
20
Keyword Selection Algorithm
1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)
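A minimal sketch of these ordered heuristics, assuming each token is a dict of hypothetical flags produced by earlier question processing (POS tag, named-entity membership, quotation membership, complex-nominal and AT-word markers):

```python
def select_keywords(tokens):
    heuristics = [
        lambda t: t["quoted"] and not t["stopword"],                # 1. non-stop words in quotations
        lambda t: t["pos"] == "NNP" and t["in_ne"],                 # 2. NNP words in named entities
        lambda t: t["complex_nominal"] and t["adj_modified"],       # 3. complex nominals + adjectives
        lambda t: t["complex_nominal"],                             # 4. other complex nominals
        lambda t: t["pos"].startswith("NN") and t["adj_modified"],  # 5. nouns with adjectives
        lambda t: t["pos"].startswith("NN"),                        # 6. other nouns
        lambda t: t["pos"].startswith("VB"),                        # 7. verbs
        lambda t: t["is_at_word"],                                  # 8. the AT word itself
    ]
    selected, seen = [], set()
    for rule in heuristics:                 # keywords keep the priority of the rule
        for tok in tokens:                  # that selected them
            if rule(tok) and tok["text"] not in seen:
                selected.append(tok["text"])
                seen.add(tok["text"])
    return selected
```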
21
Keyword Selection Examples
What researcher discovered the vaccine against Hepatitis-B? → Hepatitis-B, vaccine, discover, researcher
What is the name of the French oceanographer who owned Calypso? → Calypso, French, own, oceanographer
What U.S. government agency registers trademarks? → U.S., government, trademarks, register, agency
What is the capital of Kosovo? → Kosovo, capital
22
Passage Retrieval
[Same block architecture diagram as slide 9, with the Passage Retrieval module highlighted: it extracts and ranks passages using surface-text techniques, taking the keywords from Question Processing and the documents from Document Retrieval]
23
Passage Retrieval Architecture
[Diagram: Keywords → Document Retrieval → Documents → Passage Extraction → Passage Quality check; if the quality is insufficient (No), Keyword Adjustment loops back to Document Retrieval; otherwise (Yes) Passage Scoring → Passage Ordering → Ranked Passages]
24
Passage Extraction Loop
Passage Extraction Component
Extracts passages that contain all selected keywords
Passage size is dynamic
Start position is dynamic
Passage quality and keyword adjustment
In the first iteration, use the first 6 keyword selection heuristics
If the number of passages is lower than a threshold → the query is too strict → drop a keyword
If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
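A minimal sketch of this keyword-adjustment loop, assuming a retrieve(keywords) function that returns the passages containing all current keywords; the thresholds and the iteration cap are illustrative, not from the slides:

```python
MIN_PASSAGES, MAX_PASSAGES = 10, 500

def passage_retrieval_loop(prioritized_keywords, retrieve, max_iterations=10):
    """prioritized_keywords: (keyword, heuristic_number) pairs, ordered by priority."""
    active = [k for k, h in prioritized_keywords if h <= 6]   # first 6 heuristics
    reserve = [k for k, h in prioritized_keywords if h > 6]   # verbs, then the AT word
    passages = retrieve(active)
    for _ in range(max_iterations):
        if len(passages) < MIN_PASSAGES and len(active) > 1:
            active.pop()                    # too strict: drop the lowest-priority keyword
        elif len(passages) > MAX_PASSAGES and reserve:
            active.append(reserve.pop(0))   # too relaxed: add the next keyword
        else:
            break
        passages = retrieve(active)
    return passages
```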
25
Passage Scoring (1/2)
Passages are scored based on keyword windows. For example, if a question has the set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, four windows are built, one for each combination of the matched occurrences of k1 and k2 together with the single occurrence of k3.
[Figure: the passage "... k1 ... k2 ... k3 ... k2 ... k1 ..." with Windows 1-4 spanning the different pairs of k1/k2 occurrences]
26
Passage Scoring (2/2)
Passage ordering is performed using a radix sort that involves three scores: largest SameWordSequenceScore, largest DistanceScore, smallest MissingKeywordScore.
SameWordSequenceScore: the number of words from the question that are recognized in the same sequence in the window.
DistanceScore: the number of words that separate the most distant keywords in the window.
MissingKeywordScore: the number of unmatched keywords in the window.
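A minimal sketch of the three-key ordering, assuming each window already carries the three scores defined above; Python’s tuple sort stands in for the radix sort on the slide:

```python
def order_windows(windows):
    """windows: list of dicts holding the three scores per keyword window."""
    return sorted(
        windows,
        key=lambda w: (
            -w["same_word_sequence_score"],   # largest first
            -w["distance_score"],             # largest first (as stated above)
            w["missing_keyword_score"],       # smallest first
        ),
    )
```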
27
Answer Extraction
[Same block architecture diagram as slide 9, with the Answer Extraction module highlighted: it extracts and ranks answers using NL techniques, starting from the passages produced by Passage Retrieval]
28
Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
Answer type: Person
Text passage: “Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”
Best candidate answer: Christa McAuliffe
Answer ranking scheme: ranking features
29
Features for Answer Ranking
relNMW – number of question terms matched in the answer passage
relSP – number of question terms matched in the same phrase as the candidate answer
relSS – number of question terms matched in the same sentence as the candidate answer
relFP – flag set to 1 if the candidate answer is followed by a punctuation sign
relOCTW – number of question terms matched, separated from the candidate answer by at most three words and one comma
relSWS – number of terms occurring in the same order in the answer passage as in the question
relDTW – average distance from candidate answer to question term matches
Robust heuristics that work on unrestricted text!
30
Answer Ranking based on Machine Learning
A relative relevance score is computed for each pair of candidates (answer windows):
relPAIR = wSWS·relSWS + wFP·relFP + wOCTW·relOCTW + wSP·relSP + wSS·relSS + wNMW·relNMW + wDTW·relDTW + threshold
If relPAIR is positive, then the first candidate from the pair is more relevant.
A perceptron model is used to learn the weights [published in SIGIR 2001].
Scores in the 50% MRR range for short answers, in the 60% MRR range for long answers.
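A minimal sketch of the pairwise perceptron, assuming the rel* values of a pair are the differences between the two candidates’ individual feature values (the exact feature encoding is not spelled out in the slides):

```python
FEATURES = ["relSWS", "relFP", "relOCTW", "relSP", "relSS", "relNMW", "relDTW"]

def rel_pair(weights, threshold, cand_a, cand_b):
    """relPAIR > 0 means candidate a is judged more relevant than candidate b."""
    return sum(weights[f] * (cand_a[f] - cand_b[f]) for f in FEATURES) + threshold

def train_perceptron(pairs, epochs=20, lr=0.1):
    """pairs: (better_candidate, worse_candidate) feature dicts from annotated data."""
    weights = {f: 0.0 for f in FEATURES}
    threshold = 0.0
    for _ in range(epochs):
        for better, worse in pairs:
            if rel_pair(weights, threshold, better, worse) <= 0:   # misordered pair
                for f in FEATURES:
                    weights[f] += lr * (better[f] - worse[f])      # perceptron update
                threshold += lr
    return weights, threshold
```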
31
Evaluation on the Web
Test on 350 questions from TREC (Q250-Q600); extract 250-byte answers.
Precision score: Google 0.29; answer extraction from Google 0.44; AltaVista 0.15; answer extraction from AltaVista 0.37
Questions with a correct answer among the top 5 returned answers: Google 0.44; answer extraction from Google 0.57; AltaVista 0.27; answer extraction from AltaVista 0.45
32
System Extension: Answer Justification
Experiments with Open-Domain Textual Question Answering. Sanda Harabagiu, Marius Paşca and Steve Maiorano.
Answer justification using unnamed relations extracted from the question representation and the answer representation (constructed through a similar process).
33
System Extension: Definition Questions
Definition questions ask about the definition or description of a concept:
Who is John Galt? What is anorexia nervosa?
Many “information nuggets” are acceptable answers:
Who is George W. Bush?
… George W. Bush, the 43rd President of the United States…
… George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas…
Scoring: any information nugget is acceptable; precision score over all information nuggets.
34
Answer Detection with Pattern Matching (for Definition questions)
Question patterns:
What <be> a <QP>?
Who <be> <QP>? (example: “Who is Zebulon Pike?”)
Answer patterns:
<QP>, the <AP>
<QP> (a <AP>)
<AP HumanConcept> <QP> (example: “explorer Zebulon Pike”)
Examples:
Q386: What is anorexia nervosa? → cause of anorexia nervosa, an eating disorder...
Q358: What is a meerkat? → the meerkat, a type of mongoose, thrives in...
Q340: Who is Zebulon Pike? → in 1806, explorer Zebulon Pike sighted the...
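A minimal sketch of two of the answer patterns above, assuming the question phrase (QP) has already been isolated by question processing; the <AP HumanConcept> <QP> pattern would additionally need a concept lexicon and is omitted here:

```python
import re

def definition_candidates(question_phrase, passage):
    qp = re.escape(question_phrase)
    answer_patterns = [
        rf"{qp},\s+(?:the|a|an)\s+([^,.;]+)",   # "<QP>, the <AP>"
        rf"{qp}\s+\(\s*a\s+([^)]+)\)",          # "<QP> (a <AP>)"
    ]
    candidates = []
    for pattern in answer_patterns:
        candidates += re.findall(pattern, passage, flags=re.IGNORECASE)
    return candidates

# definition_candidates("meerkat", "the meerkat, a type of mongoose, thrives in...")
# returns ["type of mongoose"]
```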
35
Answer Detection with Concept Expansion
Enhancement for Definition questions
Identify terms that are semantically related to the phrase to define: WordNet hypernyms (more general concepts)

Question → WordNet hypernym → Detected answer candidate
What is a shaman? → {priest, non-Christian priest} → Mathews is the priest or shaman
What is a nematode? → {worm} → nematodes, tiny worms in soil
What is anise? → {herb, herbaceous plant} → anise, rhubarb and other herbs

[published in AAAI Spring Symposium 2002]
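A minimal sketch of this concept expansion, assuming NLTK’s WordNet interface: the hypernyms of the phrase to define supply extra terms to look for in candidate passages.

```python
from nltk.corpus import wordnet as wn

def hypernym_terms(phrase, max_levels=2):
    """Collect lemma names from the first few hypernym levels of the phrase."""
    terms = set()
    for synset in wn.synsets(phrase.replace(" ", "_"), pos=wn.NOUN):
        frontier = synset.hypernyms()
        for _ in range(max_levels):
            next_frontier = []
            for hyper in frontier:
                terms.update(name.replace("_", " ") for name in hyper.lemma_names())
                next_frontier.extend(hyper.hypernyms())
            frontier = next_frontier
    return terms

# hypernym_terms("nematode") includes "worm", so a passage such as
# "nematodes, tiny worms in soil" becomes a plausible answer candidate.
```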
36
Evaluation on Definition Questions
Determine the impact of answer type detection with pattern matching and concept expansion:
test on the Definition questions from TREC-9 and TREC-10 (approx. 200 questions)
extract 50-byte answers
Results:
precision score: 0.56
questions with a correct answer among the top 5 returned answers: 0.67
37
References
Marius Paşca. High-Performance, Open-Domain Question Answering from Large Text Collections. Ph.D. Thesis, Computer Science and Engineering Department, Southern Methodist University, Dallas, Texas. Defended September 2001.
Marius Paşca. Open-Domain Question Answering from Large Text Collections. Center for the Study of Language and Information (CSLI Publications, series: Studies in Computational Linguistics), Stanford, California. Distributed by the University of Chicago Press. ISBN (Paperback): 1575864282, ISBN (Cloth): 1575864274. 2003.
38
Overview
What is Question Answering?
A “traditional” system
Other relevant approaches
LCC’s PowerAnswer + COGEX
IBM’s PIQUANT
CMU’s Javelin
ISI’s TextMap
BBN’s AQUA
Distributed Question Answering
39
PowerAnswer + COGEX (1/2)
Automated reasoning for QA: A → Q, using a logic prover. Facilitates both answer validation and answer extraction.
Both the question and the answer(s) are transformed into logic forms. Example:
Heavy selling of Standard & Poor’s 500-stock index futures in Chicago relentlessly beat stocks downwards.
Heavy_JJ(x1) & selling_NN(x1) & of_IN(x1,x6) & Standard_NN(x2) & &_CC(x13,x2,x3) & Poor(x3) & ‘s_POS(x6,x13) & 500-stock_JJ(x6) & index_NN(x4) & futures(x5) & nn_NNC(x6,x4,x5) & in_IN(x1,x8) & Chicago_NNP(x8) & relentlessly_RB(e12) & beat_VB(e12,x1,x9) & stocks_NN(x9) & downward_RB(e12)
40
PowerAnswer + COGEX (2/2)
World knowledge from:
WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project (http://www.utdallas.edu/~moldovan)
Lexical chains: game:n#3 HYPERNYM recreation:n#1 HYPONYM sport:n#1; Argentine:a#1 GLOSS Argentina:n#1
NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
Named-entity recognizer: John Galt → HUMAN
A relaxation mechanism is used to iteratively uncouple predicates and remove terms from the logic forms. Proofs are penalized based on the amount of relaxation involved.
41
IBM’s PIQUANT
Question processing is conceptually similar to SMU’s, but a series of different strategies (“agents”) is available for answer extraction. For each question type, multiple agents might run in parallel.
A reasoning engine and a general-purpose ontology from Cyc are used as a sanity checker.
Answer resolution: the remaining answers are normalized and a voting strategy is used to select the “correct” (meaning most redundant) answer.
42
PIQUANT QA Agents
Predictive Annotation Agent: “predictive annotation” is the technique of indexing named entities and other NL constructs along with lexical terms (Lemur has built-in support for this now). General-purpose agent, used for almost all question types.
Statistical Query Agent: derived from a probabilistic IR model, also developed at IBM. Also general-purpose.
Description Query Agent: generic descriptions, i.e. appositions and parenthetical expressions. Applied mostly to definition questions.
Structured Knowledge Agent: answers from WordNet/Cyc. Applied whenever possible.
Pattern-Based Agent: looks for specific syntactic patterns based on the question form. Applied when the answer is expected in a well-structured form.
Dossier Agent: for “Who is X?” questions. A dynamic set of factual questions is used to learn “information nuggets” about persons.
43
Pattern-Based Agent
Motivation: some questions (with or without an AT) indicate that the answer might be in a structured form.
What does Knight Ridder publish? → transitive verb with a missing object → Knight Ridder publishes X.
Patterns are generated:
from a static pattern repository, e.g. recognition of birth and death dates
dynamically, from the question structure
Matching of the expected answer pattern against the actual answer text is not done at the word level, but at a higher linguistic level based on full parse trees (see the IE lecture).
44
Dossier Agent
Addresses “Who is X?” questions.
Initially generates a series of generic questions: When was X born? What was X’s profession?
Subsequent iterations are dynamically decided based on the previous answers. If X’s profession is “writer”, the next question is: What did X write?
A static ontology of biographical questions is used.
45
Cyc Sanity Checker
Post-processing component that:
Rejects insane answers: “How much does a grey wolf weigh?” → “300 tons”. A grey wolf IS-A wolf, and the weight of a wolf is known in Cyc. Cyc returns SANE, INSANE, or DON’T KNOW.
Boosts answer confidence when the answer is SANE.
Typically called for numerical answer types: What is the population of Maryland? How much does a grey wolf weigh? How high is Mt. Hood?
46
Answer Resolution
Called when multiple agents are applied to the same question. Distribution of agents: the predictive-annotation and the statistical agents are by far the most common.
Each agent provides a canonical answer (e.g. a normalized named entity) and a confidence score.
The final confidence for each candidate answer is computed using an ML model with SVMs.
47
CMU’s Javelin
The architecture combines SMU’s and IBM’s approaches:
Question processing close to SMU’s approach.
Passage retrieval loop conceptually similar to SMU’s, but with an elegant implementation.
Multiple answer strategies similar to IBM’s system. All of them are based on ML models (k-nearest neighbors, decision trees) that use shallow-text features (close to SMU’s).
Answer voting, similar to IBM’s, used to exploit answer redundancy.
48
Javelin’s Retrieval Strategist
Implements passage retrieval, including the passage retrieval loop.
Uses the InQuery IR system (probably Lemur by now).
The retrieval loop initially requires all keywords in close proximity of each other (stricter than SMU).
Subsequent iterations relax the following query terms:
proximity for all question keywords: 20, 100, 250, AND
phrase proximity for phrase operators: less than 3 words, or PHRASE
phrase proximity for named entities: less than 3 words, or PHRASE
inclusion/exclusion of the AT word
Accuracy on TREC-11 queries (how many questions had at least one correct document in the top N documents):
Top 30 docs: 80%; Top 60 docs: 85%; Top 120 docs: 86%
49
ISI’s TextMap: Pattern-Based QA
Examples:
Who invented the cotton gin?
<who> invented the cotton gin
<who>'s invention of the cotton gin
<who> received a patent for the cotton gin
How did Mahatma Gandhi die?
Mahatma Gandhi died <how>
Mahatma Gandhi drowned
<who> assassinated Mahatma Gandhi
Patterns are generated from the question form (similar to IBM), learned using a pattern discovery mechanism, or added manually to a pattern repository.
The pattern discovery mechanism performs a series of generalizations from annotated examples:
Babe Ruth was born in Baltimore, on February 6, 1895. → PERSON was born *g* in DATE
50
TextMap: QA as Machine Translation
In machine translation, one collects translation pairs (s, d) and learns a model of how to transform the source s into the destination d.
QA is redefined in a similar way: collect question-answer pairs (q, a) and learn a model that computes the probability that the question is generated from a given answer: p(q | parsetree(a)). The correct answer maximizes this probability.
Only the subsets of the answer parse trees where the answer lies are used for training (not the whole sentence).
An off-the-shelf machine translation package (GIZA) is used to train the model.
51
TextMap: Exploiting Data Redundancy
Additional knowledge resources are used whenever applicable:
WordNet glosses: What is a meerkat?
www.acronymfinder.com: What is ARDA?
Etc.
The “known” answers are then simply searched for in the document collection together with the question keywords.
Google is used for answer redundancy: TREC and the Web (through Google) are searched in parallel.
The final answer is selected using a maximum entropy ML model.
IBM introduced redundancy across QA agents; ISI uses data redundancy.
52
BBN’s AQUA
Factual system: converts both the question and the answer to a semantic form (close to SMU’s). Machine learning is used to measure the similarity of the two representations.
Was ranked best at the TREC definition pilot organized before TREC-12.
Definition system: conceptually close to SMU’s.
Includes pronominal and nominal coreference resolution.
Uses a (probably) better parser (Charniak).
Post-ranking of candidate answers using a tf*idf model.
53
Overview
What is Question Answering?
A “traditional” system
Other relevant approaches
Distributed Question Answering
54
Sequential Q/A Architecture
[Diagram: Question → Question Processing → Keywords → Paragraph Retrieval → Paragraphs → Paragraph Scoring → Paragraph Ordering → Accepted Paragraphs → Answer Processing → Answers]
55
Sequential Architecture Analysis
Module timing analysis:

Module   % of Task Time   Iterative Task?   Granularity
QP       1.2%             No                -
PR       26.5%            Yes               Collection
PS       2.2%             Yes               Paragraph
PO       0.1%             No                -
AP       69.7%            Yes               Paragraph

Analysis conclusions:
Performance bottleneck modules have well-specified resource requirements → fit for DLB (dynamic load balancing)
Iterative tasks → fit for partitioning
Reduced inter-module communication → effective module migration/partitioning
56
Inter-Question Parallelism (1)
[Diagram: questions arrive from the Internet/DNS; each node (Node 1 ... Node N) runs a Q/A Task, a Question Dispatcher, and a Load Monitor, connected through a local interconnection network]
57
Inter-Question Parallelism (2)
Question dispatcher:
Improves upon the DNS “blind” allocation.
Allocates a new question to the processor p best fit for the average question. Processor p minimizes:
load_QA(p) = W_CPU^QA * load_CPU(p) + W_DISK^QA * load_DISK(p)
Recovers from failed questions.
Load monitor:
Updates and broadcasts the local load.
Receives remote load information.
Detects system configuration changes.
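A minimal sketch of the dispatcher’s processor choice, assuming each node reports its CPU and disk load through the load monitor; the weights below are illustrative values for the “average question”, not from the slides:

```python
W_QA_CPU, W_QA_DISK = 0.7, 0.3   # illustrative resource profile of the average question

def qa_load(node):
    return W_QA_CPU * node["cpu_load"] + W_QA_DISK * node["disk_load"]

def dispatch(question, nodes, send_to):
    """Send the question to the node that minimizes the combined QA load."""
    best = min(nodes, key=qa_load)
    return send_to(best, question)
```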
58
Intra-Question Parallelism (1)
[Diagram: Question Processing produces keywords; a Paragraph Retrieval Dispatcher (guided by the Load Monitor) fans the work out to Paragraph Retrieval (1..k) and Paragraph Scoring (1..k) instances running in parallel; their outputs are combined by Paragraph Merging into a single list of paragraphs]
59
Intra-Question Parallelism (2)
[Diagram: the merged paragraphs pass through Paragraph Ordering; the Accepted Paragraphs are fanned out by an Answer Processing Dispatcher (guided by the Load Monitor) to Answer Processing (1..n) instances; Answer Merging collects the unranked answers and Answer Sorting produces the final answers]
60
Meta-Scheduling Algorithm
metaScheduler(task, loadFunction, underloadCondition):
  select all processors p with underloadCondition(p) true
  if none selected, then select the processor p with the smallest value for loadFunction(p)   [migration]
  assign to each selected processor p a weight wp based on its current load
  assign to each selected processor p a fraction wp of the global task   [partitioning]
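A minimal Python sketch of this meta-scheduler, assuming load(p) returns the current load of processor p and that weighting by inverse load is one reasonable reading of “a weight wp based on its current load”:

```python
def meta_scheduler(task, processors, load, underload_condition, assign):
    selected = [p for p in processors if underload_condition(p)]
    if not selected:                               # migration case:
        selected = [min(processors, key=load)]     # pick the least loaded processor
    spare = {p: 1.0 / (load(p) + 1e-9) for p in selected}
    total = sum(spare.values())
    for p in selected:                             # partitioning case:
        assign(p, task, spare[p] / total)          # give p its fraction w_p of the task
```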
62
Partitioning Example
[Timeline over processors P1 ... Pn: QP runs on a single processor; PR1/PS1 ... PRn/PSn and AP1 ... APn are partitioned across the processors; PO runs on a single processor]
63
Inter-Question Parallelism: System Throughput
[Chart: throughput (questions/minute) vs. number of processors (4, 8, 12) for the DNS, INTER, and DQA protocols]
64
Intra-Question Parallelism
[Chart: question response time (seconds) vs. number of processors (1, 4, 8, 12), broken down into QP, PR + PS, PO, AP, and overhead]