Trains of Thought: Generating Information Maps
Dafna Shahaf, Carlos Guestrin and Eric Horvitz
“The abundance of books is a distraction.”
Lucius Annaeus Seneca (4 BC – 65 AD)
So, you want to understand a complex topic…
Now what?
Search Engines are Great
• But they do not show how it all fits together
Timeline Systems
Real Stories are not Linear
Metro Map
• A set of lines
• Each line follows a coherent narrative thread
• Structure + multiple aspects
[Figure: example metro map with labels austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]
Map Definition
• A map $M$ is a pair $(G, P)$ where
  – $G = (V, E)$ is a directed graph
  – $P$ is a set of paths in $G$ (metro lines)
  – Each $e \in E$ must belong to at least one metro line
[Figure: the example map, with labels austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
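The definition above maps directly onto a small data structure. The Python sketch below is illustrative only: the class name `MetroMap`, the document ids `d1` to `d4`, and storing edges as `(u, v)` pairs are assumptions, not the authors' implementation. `is_valid` checks that every metro line follows edges of G and that every edge lies on at least one line.

```python
from dataclasses import dataclass

@dataclass
class MetroMap:
    """A map M = (G, P): a directed graph G = (V, E) and a set P of paths (metro lines)."""
    vertices: set
    edges: set   # directed edges as (u, v) pairs
    lines: list  # each metro line is a list of vertices

    def is_valid(self):
        # E may only connect vertices in V.
        if any(u not in self.vertices or v not in self.vertices for u, v in self.edges):
            return False
        used = set()
        for line in self.lines:
            steps = set(zip(line, line[1:]))
            # Each metro line must follow edges of G.
            if not steps <= self.edges:
                return False
            used |= steps
        # Every edge must belong to at least one metro line.
        return used == self.edges

# Hypothetical example: two lines sharing the edge d1 -> d2.
valid = MetroMap({"d1", "d2", "d3", "d4"},
                 {("d1", "d2"), ("d2", "d3"), ("d2", "d4")},
                 [["d1", "d2", "d3"], ["d1", "d2", "d4"]])
# Adding an edge that no line uses breaks the definition.
invalid = MetroMap(valid.vertices, valid.edges | {("d3", "d4")}, valid.lines)
print(valid.is_valid(), invalid.is_valid())  # True False
```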
Game Plan
Objective · Algorithm · Does it work?
Properties of a Good Map
1. Coherence
???
Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]
Coherence is not a property of local interactions:
[Figure: a chain of articles 1–5 annotated with the words Greece, Europe, Italy, Republican, Protest, Debt default]
Incoherent: each pair shares different words
Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]
A more coherent chain:
[Figure: a chain of articles 1–5 annotated with the words Greece, Austerity, Italy, Republican, Protest, Debt default]
Coherent: a small number of words captures the story
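To make the weakest-link intuition concrete, here is a deliberately simplified proxy in Python; it is not the LP-based coherence measure of [KDD’10]. Articles are sets of words, the same small set of at most k words must explain every transition, and the chain is only as strong as its weakest transition. The function name and toy articles are hypothetical, and the brute-force search over word sets is only practical for tiny vocabularies.

```python
from itertools import combinations

def chain_coherence(chain, k):
    """Simplified weakest-link coherence of a chain of articles (sets of words).

    Picks the best set W of at most k words; each transition scores the number
    of words in W that both consecutive articles contain, and the chain scores
    its weakest transition.
    """
    if len(chain) < 2:
        raise ValueError("need at least two articles")
    vocab = sorted(set().union(*chain))
    best = 0
    for size in range(1, k + 1):
        for words in combinations(vocab, size):
            w = set(words)
            weakest = min(len(w & a & b) for a, b in zip(chain, chain[1:]))
            best = max(best, weakest)
    return best

# Hypothetical toy articles, loosely following the slides' example words.
incoherent = [{"greece", "europe"}, {"europe", "italy"}, {"italy", "republican"},
              {"republican", "protest"}, {"protest", "debt"}]
coherent = [{"greece", "austerity"}, {"greece", "debt"}, {"greece", "protest"},
            {"greece", "austerity"}, {"greece", "debt"}]

print(chain_coherence(incoherent, 1))  # 0: no single word links every transition
print(chain_coherence(coherent, 1))    # 1: "greece" carries the whole story
```

With k = 4 the incoherent chain also reaches a score of 1, but only by spending a different word on each transition, which is the pattern the slide calls incoherent.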
Properties of a Good Map
1. Coherence
Is it enough?
Max-coherence Map (Query: Clinton)
• Clinton visits Belfast
• Clinton set for Dublin
• High hopes for Clinton visit
• Clinton, Religious Leaders Share Thoughts
• Church Leaders Praise Clinton's 'Spirituality'
• Religion Leaders Divided on Clinton Moral Issue
• Clinton Should Resign, 2 Religious Leaders Say
Properties of a Good Map
1. Coherence
2. Coverage
Should cover diverse topics important to the user
Coverage
• Select a small set of diverse articles that cover the most important stories
Turning Down the Noise [El-Arini, Veda, S, Guestrin, KDD’09]
[Figure: example for January 17, 2009]
Coverage: The Idea
• Documents cover concepts
[Figure: corpus and coverage]
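One common way to formalize "documents cover concepts", assumed here rather than taken from the slides, is probabilistic coverage: each document d covers a concept c with probability cover_d(c), a set of documents covers c with probability 1 − ∏(1 − cover_d(c)), and the objective is a weighted sum over concepts. A minimal sketch with hypothetical documents, concepts and weights:

```python
import math

def set_coverage(docs, concept):
    """Probability that at least one document in `docs` covers `concept`."""
    return 1.0 - math.prod(1.0 - d.get(concept, 0.0) for d in docs)

def coverage(docs, weights):
    """Weighted coverage of a document set.

    docs: list of dicts mapping concept -> cover_d(concept) in [0, 1]
    weights: dict mapping concept -> importance weight
    """
    return sum(w * set_coverage(docs, c) for c, w in weights.items())

# Hypothetical documents and concept weights.
d1 = {"austerity": 0.8, "strike": 0.5}
d2 = {"austerity": 0.8}          # largely redundant with d1
d3 = {"imf": 0.9}                # covers something new
weights = {"austerity": 1.0, "strike": 1.0, "imf": 1.0}

print(round(coverage([d1, d2], weights), 2))  # 1.46
print(round(coverage([d1, d3], weights), 2))  # 2.2
```

The diverse pair scores higher than the redundant one. The function also has diminishing returns: adding d2 to an empty set gains 0.8, but adding it after d1 gains only 0.16. This submodularity is what makes the orienteering reformulation later in the talk applicable.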
High-coverage, Coherent Map
• Greek Civil Servants Strike over Austerity Measures
• Greece Paralyzed by New Strike
• Greeks Take to the Streets, but Lacking Earlier Zeal
• Infighting Adds to Merkel’s Woes
• It’s Germany that Matters
• UK Backs Germany’s Effort
• Germany Says the IMF Should Rescue Greece
• IMF More Likely to Lead Efforts
• IMF Is Urged to Move Forward
Properties of a Good Map
1. Coherence
2. Coverage
3. Connectivity
Definition: Connectivity
• Experimented with several formulations
• Users do not care about the connection type
• Encourage connections between pairs of lines
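Assuming connectivity is measured as the number of pairs of metro lines that intersect (share at least one document), which matches "encourage connections between pairs of lines" but is an assumption here, a minimal sketch with a hypothetical map:

```python
from itertools import combinations

def connectivity(lines):
    """Number of pairs of metro lines that share at least one document."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))

lines = [["d1", "d2", "d3"], ["d3", "d4"], ["d5", "d6"]]  # hypothetical map
print(connectivity(lines))  # 1: only the first two lines intersect (at d3)
```

Because only the existence of an intersection counts, this measure ignores the connection type, in line with the observation that users do not care about it.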
Tying it all Together: Map Objective
• Coherence
  – Either coherent or not: constraint
• Coverage
  – Must have!
• Connectivity
  – Nice to have
Consider all coherent maps with the maximum possible coverage, and find the most connected one.
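The objective is lexicographic: coherence is a hard filter, coverage is maximized first, and connectivity only breaks ties. The brute-force sketch below makes that ordering explicit over an explicit list of candidate maps. It is only feasible for toy inputs (the talk's algorithm does not enumerate maps), the scoring functions are passed in as parameters, and `best_map` and the candidate tuples are hypothetical.

```python
import math

def best_map(candidates, is_coherent, coverage, connectivity):
    """Among coherent candidates, keep those with maximum coverage and
    return the most connected one (None if no candidate is coherent)."""
    feasible = [m for m in candidates if is_coherent(m)]   # coherence: hard constraint
    if not feasible:
        return None
    top = max(coverage(m) for m in feasible)               # coverage: maximized first
    # Coverage is real-valued, so ties are detected with a tolerance.
    tied = [m for m in feasible if math.isclose(coverage(m), top)]
    return max(tied, key=connectivity)                     # connectivity: tie-breaker

# Hypothetical candidates: (name, coherent?, coverage, connectivity)
candidates = [("A", True, 3.0, 1), ("B", True, 3.0, 2),
              ("C", False, 5.0, 3), ("D", True, 2.0, 4)]
best = best_map(candidates, lambda m: m[1], lambda m: m[2], lambda m: m[3])
print(best[0])  # B
```

C is excluded despite its high coverage because it is incoherent; D is the most connected but loses on coverage; A and B tie on coverage, and B wins on connectivity.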
Game Plan
Objective · Algorithm · Does it work?
Approach Overview
Documents D
…
1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase connectivity
Coherence Graph: Main Idea
• Vertices correspond to short coherent chains
• Directed edges connect chains that can be conjoined and remain coherent
[Figure: short chains 1 2 3, 4 5 6 and 5 8 9 as vertices; the longer chain 1 2 3 5 8 9]
Finding Vertices
• Vertices are short, coherent chains
• Can use [KDD’10]
  – Expensive
  – Solving many LPs
• Take advantage of the simplicity of short stories
  – No topic drift
  – Sampling-based (fast) algorithm
Finding Edges
• Problem: combining several strong chains may result in a much weaker chain
[Figure: discontinuity, i.e. a change of focus]
m-Coherence
• Control discontinuity points: a chain is m-coherent if each sub-chain $(d_i, \dots, d_{i+m})$ is coherent
• m: size of the user's ‘history window’
  – m = length(chain): standard coherence
  – m = 1: optimize transitions without context
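A minimal sketch of the m-coherence check, under two stated assumptions: a sub-chain is taken to be a window of m consecutive documents (the convention under which the (m-1)-overlap observation on the following slides holds), and the coherence test for a window is a toy stand-in (all of its documents share a word), not the measure from [KDD’10]. Function names and documents are hypothetical.

```python
def shares_a_word(window):
    """Toy coherence test (an assumption): all documents in the window share a word."""
    return bool(set.intersection(*window))

def is_m_coherent(chain, m, is_coherent=shares_a_word):
    """True if every window of m consecutive documents in the chain is coherent (m >= 1)."""
    m = min(m, len(chain))  # m >= length(chain): plain coherence of the whole chain
    return all(is_coherent(chain[i:i + m]) for i in range(len(chain) - m + 1))

chain = [{"greece", "imf"}, {"greece", "strike"}, {"strike", "union"}, {"union", "merkel"}]
print(is_m_coherent(chain, 2))  # True: each neighboring pair shares a word
print(is_m_coherent(chain, 3))  # False: the topic drifts within a 3-document window
```

A small history window tolerates gradual topic drift, while a window as long as the chain demands a single shared theme throughout.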
Observation
• If two chains are m-coherent and overlap in m-1 documents, the conjoined chain is m-coherent
Using the Observation
[Figure: chain vertices 1 2 3, 2 3 4 and 2 3 5; conjoining 1 2 3 with 2 3 5 gives 1 2 3 5]
• Useful for divide and conquer:
  – Add an edge if two chains overlap in m-1 documents
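The divide-and-conquer step can be sketched as follows: vertices are short chains (tuples of document ids), an edge a → b is added when a's last m-1 documents are b's first m-1, and following an edge conjoins the two chains. By the observation, if both chains are m-coherent the result is too, so coherence need not be rechecked. Function names are hypothetical; the example reuses the slide's chains with m = 3.

```python
from itertools import permutations

def overlaps(a, b, m):
    """True if the last m-1 documents of chain a are the first m-1 of chain b."""
    k = m - 1
    return tuple(a[len(a) - k:]) == tuple(b[:k])

def coherence_graph_edges(chains, m):
    """Directed edges a -> b between chain vertices with an (m-1)-document overlap."""
    return {(a, b) for a, b in permutations(chains, 2) if overlaps(a, b, m)}

def conjoin(a, b, m):
    """Concatenate two overlapping chains, keeping the shared documents once."""
    if not overlaps(a, b, m):
        raise ValueError("chains do not overlap in m-1 documents")
    return tuple(a) + tuple(b[m - 1:])

chains = [(1, 2, 3), (2, 3, 4), (2, 3, 5)]   # the slide's example, with m = 3
print(sorted(coherence_graph_edges(chains, 3)))
# [((1, 2, 3), (2, 3, 4)), ((1, 2, 3), (2, 3, 5))]
print(conjoin((1, 2, 3), (2, 3, 5), 3))  # (1, 2, 3, 5)
```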
Finding High-Coverage Chains
• Paths correspond to coherent chains
• Problem: find a path of length K maximizing coverage of the underlying articles
[Figure: chain vertices 1 2 3, 2 3 4 and 2 3 5; is Cover(1 2 3 5) > Cover(1 2 3 4)?]
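Each path through the coherence graph spells out a chain, and it is the chain's articles whose coverage is scored. The sketch below assumes, hypothetically, that every vertex is a chain of m documents and consecutive vertices overlap in m-1 documents, so each step appends exactly one new document; coverage is simplified to the number of distinct concepts in the chain's documents, a stand-in for the coverage function f. The concept sets are made up to illustrate the slide's question.

```python
def path_to_chain(path):
    """Spell out the chain encoded by a path of overlapping chain vertices."""
    chain = list(path[0])
    for vertex in path[1:]:
        # Consecutive vertices overlap in all but one document,
        # so each step contributes only its last document.
        chain.append(vertex[-1])
    return tuple(chain)

def cover(chain, concepts):
    """Simplified coverage: number of distinct concepts in the chain's documents."""
    return len(set().union(*(concepts[d] for d in chain)))

# Hypothetical concept sets for documents 1 to 5.
concepts = {1: {"greece"}, 2: {"greece", "imf"}, 3: {"imf"}, 4: {"imf"}, 5: {"strike"}}
chain_a = path_to_chain([(1, 2, 3), (2, 3, 5)])
chain_b = path_to_chain([(1, 2, 3), (2, 3, 4)])
print(chain_a, cover(chain_a, concepts))  # (1, 2, 3, 5) 3
print(chain_b, cover(chain_b, concepts))  # (1, 2, 3, 4) 2
```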
Reformulation
• Paths correspond to coherent chains
• Problem: find a path of length K maximizing coverage of the underlying articles
• Submodular orienteering
  – [Chekuri and Pal, 2005]
  – Quasipolynomial-time recursive greedy
  – O(log OPT) approximation
[Figure: orienteering, where the reward is a function of the nodes visited]
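Implementing the Chekuri and Pal recursive greedy is beyond a slide-sized example. For contrast, here is a much simpler, plainly named substitute: a one-step greedy heuristic that grows a path by always following the out-edge whose endpoint adds the most value. It carries no approximation guarantee and is not the algorithm used in the talk; `greedy_path`, the graph and the concepts are hypothetical, and K is taken to be the number of nodes on the path.

```python
def greedy_path(graph, K, gain):
    """Plain greedy path construction (a simple heuristic, NOT the
    Chekuri-Pal recursive greedy, and without its guarantee)."""
    # Start from the node with the best stand-alone value.
    path = [max(graph, key=lambda v: gain([v]))]
    while len(path) < K:
        # Only follow out-edges to nodes not yet on the path.
        options = [v for v in graph[path[-1]] if v not in path]
        if not options:
            break
        path.append(max(options, key=lambda v: gain(path + [v])))
    return path

# Hypothetical example: value = number of distinct concepts visited.
concepts = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"z"}}
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def concepts_covered(nodes):
    return len(set().union(*(concepts[v] for v in nodes)))

print(greedy_path(graph, 3, concepts_covered))  # ['a', 'c', 'd']
```

Plain greedy can be myopic: a low-value step that leads to a high-value region of the graph is never taken, which is one reason to prefer a method with guarantees such as the recursive greedy.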
Approach Overview: Recap
Documents D
…
1. Coherence graph G: encodes all m-coherent chains as graph paths
2. Coverage function f: submodular orienteering [Chekuri & Pal, 2005]
  – Quasipolynomial-time recursive greedy
  – O(log OPT) approximation
3. Increase connectivity
Example Map: Greece Debt
Game Plan
Objective · Algorithm · Does it work?
Evaluation
• User study
  – Document selection: capturing important content?
  – Micro-knowledge: question answering
  – Macro-knowledge: high-level summaries
  – Effect of structure
• New York Times (2008–2010)
  – 18K+ articles
  – Chile, Haiti, Greece
Document Selection
• Experts compose a list of important events
• Subtopic recall (% of events in the map)
[Figure: subtopic recall vs. number of lines]
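Subtopic recall is simple arithmetic once each expert event has been matched (or not) to the map. The sketch assumes that matching has already been done and that events are represented by hypothetical string labels; how events were matched to map documents in the study is not specified here.

```python
def subtopic_recall(expert_events, map_events):
    """Fraction of the experts' important events that appear in the map."""
    expert = set(expert_events)
    if not expert:
        raise ValueError("the expert list is empty")
    return len(expert & set(map_events)) / len(expert)

experts = ["bailout", "strike", "junk status", "imf loan"]   # hypothetical events
in_map = ["bailout", "strike", "protests"]
print(subtopic_recall(experts, in_map))  # 0.5
```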
Micro-Knowledge (Question Answering)
• Mechanical Turk
• Competitors:
  – Google News
  – Event threading (TDT) [Nallapati et al, 04]
  – Structureless maps
• Results: minor gains (map structure helps)
Question 2: How many miners were trapped?
Macro-Knowledge (High-Level Summaries)
• Summarize a complex story in a paragraph
  – Maps vs. Google News
  – ~15 paragraphs per task
• MTurk workers evaluate the paragraphs:
  – Which paragraph provided a more complete and coherent picture of the story?
  – Justification: Paragraph A is more…
  – ~300 evaluations per task
Macro-Knowledge: Results
• Greece: 72% prefer maps
  – Justifications:
• Haiti: 59% prefer maps
  – Map users mostly summarized one story line
[Figure: Maps vs. Google News]
Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline
Conclusions
• Formulated metrics characterizing good maps
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization
Thank you!
Finding Coherent Chains
• Goal: represent all coherent chains
• Problem: intractable
• Divide and conquer:
  – Find short coherent chains
  – Concatenate to form longer coherent chains
Website