Amplifying Community Content Creation with Mixed-Initiative Information Extraction
![Page 1: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/1.jpg)
Amplifying Community Content Creation with Mixed-Initiative Information Extraction
Raphael Hoffmann, Saleema Amershi, Kayur Patel, Fei Wu, James Fogarty, Daniel S. Weld
![Page 2: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/2.jpg)
“What Russian-born writers publish in the U.S.?”
![Page 3: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/3.jpg)
Advanced Interfaces Leverage Structure of Content
Huynh et al., UIST’06
Hoffmann et al., UIST’07
Toomim et al., CHI’09
Dontcheva et al., UIST’06, UIST’07
![Page 4: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/4.jpg)
How can we obtain the necessary structure on Web scale?
• Community Content Creation
• Information Extraction
![Page 5: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/5.jpg)
Community Content Creation
![Page 6: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/6.jpg)
Community Content Creation
Requires
• Critical mass
• Incentives
![Page 7: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/7.jpg)
Information Extraction
![Page 8: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/8.jpg)
Information Extraction
• Training data expensive
• Error-prone
![Page 9: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/9.jpg)
Our Goal: Synergistic Pairing
![Page 10: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/10.jpg)
More user contributions
![Page 11: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/11.jpg)
More precise extractors
![Page 12: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/12.jpg)
What this work is about
• Synergistic method for amplifying Community Content Creation and Information Extraction
• Use of search advertising for evaluation
![Page 13: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/13.jpg)
Outline
• Motivation
• Case Study: Intelligence in Wikipedia
• Designing for the Wikipedia Community
• Search Advertising Deployment Study
• Conclusion
![Page 14: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/14.jpg)
Case Study: Intelligence in Wikipedia
What Russian-born writers publish in the U.S.?
Search
![Page 15: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/15.jpg)
Some Structured Content in Wikipedia
<Ayn Rand, birthdate, February 2, 1905>
<Ayn Rand, birthplace, Saint Petersburg>
<Ayn Rand, occupation, writer>
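A minimal sketch of how triples like these could answer part of the motivating query. The three triples come from the slide; the city-to-country lookup and the function names are assumptions added for illustration, and the "publish in the U.S." part of the query is not modeled.

```python
from collections import defaultdict

# Triples as shown on the slide: (subject, attribute, value).
TRIPLES = [
    ("Ayn Rand", "birthdate", "February 2, 1905"),
    ("Ayn Rand", "birthplace", "Saint Petersburg"),
    ("Ayn Rand", "occupation", "writer"),
]

# Assumed background knowledge, not part of the slide.
CITY_COUNTRY = {"Saint Petersburg": "Russia"}


def index_by_subject(triples):
    """Group triples into one attribute dictionary per subject."""
    index = defaultdict(dict)
    for subject, attribute, value in triples:
        index[subject][attribute] = value
    return dict(index)


def born_in_with_occupation(triples, country, occupation):
    """Return subjects born in `country` (via the lookup) with `occupation`."""
    matches = []
    for subject, attrs in index_by_subject(triples).items():
        born = CITY_COUNTRY.get(attrs.get("birthplace"))
        if born == country and attrs.get("occupation") == occupation:
            matches.append(subject)
    return sorted(matches)
```

Calling `born_in_with_occupation(TRIPLES, "Russia", "writer")` selects Ayn Rand, which is only possible because the birthplace and occupation are available as structured values rather than free text.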
![Page 16: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/16.jpg)
Lack of Structured Content in Wikipedia
![Page 17: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/17.jpg)
Previous Work: Learning from Existing Infoboxes [Wu et al., CIKM’07]
<Ben, birthplace, Paris>
Ben is living in Paris.
Extractor (~60-90% precision)
![Page 18: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/18.jpg)
Community-based Validation of Extractions
“We think Ayn Rand’s birthplace is Saint Petersburg. Is this correct?”
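The validation prompt can be sketched as a template over an extracted triple. This is an illustration only: the question wording follows the slide, while the function names and the shape of the stored label are hypothetical.

```python
def validation_question(subject, attribute, value):
    """Phrase one extracted triple as a yes/no question, using the slide's wording."""
    return f"We think {subject}\u2019s {attribute} is {value}. Is this correct?"


def labeled_example(triple, answered_yes):
    """Pair a triple with a visitor's answer (hypothetical record format)."""
    return {"triple": triple, "label": "correct" if answered_yes else "incorrect"}
```

A confirmed or rejected extraction stored this way could later serve as a training label for the extractor, which is the pairing the talk argues for.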
![Page 19: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/19.jpg)
Outline
• Motivation
• Case Study: Intelligence in Wikipedia
• Designing for the Wikipedia Community
• Search Advertising Deployment Study
• Conclusion
![Page 20: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/20.jpg)
Method
Design
• Interviews with Wikipedians
• Design of 3 interfaces
• Talk-aloud studies with 9 participants
Evaluation
• Search advertising study with 2473 visitors
![Page 21: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/21.jpg)
Incentivizing Contribution
Audience
• Target experienced Wikipedians (power law)
• Target newcomers
Motivation
• Coercion (unacceptable to Wikipedia)
• Using information extraction to make the ability to contribute visible and easy
![Page 22: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/22.jpg)
Contribution as a Non-Primary Task
• We want to solicit contributions from people pursuing some other task (the information need that brought them to this article)
• Using information extraction to ease contribution, we explore a tradeoff between intrusiveness and contribution rate (Popup, Highlight, and Icon designs)
![Page 23: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/23.jpg)
Designed Three Interfaces
• Popup (immediate interruption strategy)
• Highlight (negotiated interruption strategy)
• Icon (negotiated interruption strategy)
![Page 24: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/24.jpg)
Popup Interface
![Page 25: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/25.jpg)
Highlight Interface
hover
![Page 26: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/26.jpg)
Highlight Interface
![Page 27: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/27.jpg)
Highlight Interface
hover
![Page 28: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/28.jpg)
Highlight Interface
![Page 29: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/29.jpg)
Icon Interface
hover
![Page 30: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/30.jpg)
Icon Interface
![Page 31: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/31.jpg)
Icon Interface
hover
![Page 32: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/32.jpg)
Icon Interface
![Page 33: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/33.jpg)
Outline
• Motivation
• Case Study: Intelligence in Wikipedia
• Designing for the Wikipedia Community
• Search Advertising Deployment Study
• Conclusion
![Page 34: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/34.jpg)
How do you evaluate this?
Contribution as a non-primary task
Can a lab study show whether interfaces increase spontaneous contributions?
![Page 35: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/35.jpg)
Search Advertising Study
• Deployed interfaces on Wikipedia proxy
• 2000 articles
• One ad per article
“ray bradbury”
![Page 36: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/36.jpg)
Search Advertising Study
• Select interface round-robin
• Track session ID, time, all interactions
• Questionnaire pops up 60 sec after page loads
[Diagram labels: proxy, baseline, popup, highlight, icon, logs]
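The assignment and logging steps on this slide can be sketched as follows. This is a hypothetical illustration, not the deployed proxy: the class name, the ordering of the interfaces, and the log record fields are all assumptions.

```python
import itertools
import time
import uuid

# The four conditions named on the slide; the ordering here is an assumption.
INTERFACES = ["baseline", "popup", "highlight", "icon"]


class Proxy:
    """Toy stand-in for a proxy that assigns interfaces and logs events."""

    def __init__(self, interfaces=INTERFACES, clock=time.time):
        self._cycle = itertools.cycle(interfaces)  # round-robin selection
        self._clock = clock
        self.log = []

    def new_session(self, article):
        """Start a session: pick the next interface and log the page load."""
        session_id = uuid.uuid4().hex
        interface = next(self._cycle)
        self.record(session_id, "page_load", article=article, interface=interface)
        return session_id, interface

    def record(self, session_id, event, **details):
        """Append one interaction with its session ID and timestamp."""
        entry = {"session": session_id, "time": self._clock(), "event": event}
        entry.update(details)
        self.log.append(entry)
```

Keeping the session ID and timestamp on every record is what would let contributions and questionnaire answers be attributed to an interface condition afterwards.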
![Page 37: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/37.jpg)
Baseline Interface
![Page 38: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/38.jpg)
Search Advertising Study
• Used Yahoo and Google
• 2473 visitors
• Deployment for ~7 days
• ~1M impressions
• Estimated cost: $1500 (generous support from Yahoo)
![Page 39: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/39.jpg)
An Early Observation
“We think Ray Bradbury’s nationality is American. Is this correct?”
“Please check with the Britannica!”
“If I knew would I really need to look”
“We think the summary should say Ray Bradbury’s nationality is American. Is this what the article says?”
![Page 40: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/40.jpg)
| | Baseline | Icon | Highlight | Popup |
|---|---|---|---|---|
| Visitors | 476 | 869 | 563 | 565 |
| Distinct Contributors | 0 | 26 | 42 | 44 |
| Contribution Likelihood | 0% | 3.0% | 7.5% | 7.8% |
| Number of Contributions | 0 | 58 | 88 | 78 |
| Contributions per Visit | 0 | 0.07 | 0.16 | 0.14 |
| Survey Responses | 12 | 24 | 25 | 18 |
| Saw I Could Help Improve | 11/33 (33%) | 30/73 (41%) | 23/58 (40%) | 24/52 (46%) |
| Intrusiveness (1: not – 5: very) | 3.0 | 3.3 | 3.5 | 3.5 |
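The two derived rows of the table follow from the raw counts, as this short sketch shows. The counts are copied from the table; the reading that contribution likelihood is distinct contributors divided by visitors, and contributions per visit is contributions divided by visitors, is an assumption that happens to reproduce the reported figures.

```python
# Raw counts copied from the results table.
RESULTS = {
    "Baseline": {"visitors": 476, "contributors": 0, "contributions": 0},
    "Icon": {"visitors": 869, "contributors": 26, "contributions": 58},
    "Highlight": {"visitors": 563, "contributors": 42, "contributions": 88},
    "Popup": {"visitors": 565, "contributors": 44, "contributions": 78},
}


def contribution_likelihood(row):
    """Percentage of visitors who made at least one contribution."""
    return 100.0 * row["contributors"] / row["visitors"]


def contributions_per_visit(row):
    """Average number of contributions per visitor."""
    return row["contributions"] / row["visitors"]


def summarize(results):
    """Map each interface to (likelihood in %, contributions per visit), rounded."""
    return {
        name: (round(contribution_likelihood(row), 1),
               round(contributions_per_visit(row), 2))
        for name, row in results.items()
    }
```

The visitor counts also sum to the 2473 visitors reported for the study.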
![Page 41: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/41.jpg)
![Page 42: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/42.jpg)
![Page 43: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/43.jpg)
More user contributions
![Page 44: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/44.jpg)
More precise extractors
![Page 45: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/45.jpg)
Users are conservative
• Of extractions that visitors marked as correct, 90.4% were indeed valid
• Of extractions that visitors marked as incorrect, 57.9% were indeed incorrect
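Both percentages are agreement rates between visitor judgments and ground truth. A minimal sketch of that computation, assuming each judgment is recorded as a pair of booleans; the function name and data layout are hypothetical, and the slides do not give the underlying counts.

```python
def label_agreement(judgments):
    """Compute how often visitor marks agree with ground truth.

    `judgments` is a sequence of (visitor_says_correct, extraction_is_valid)
    boolean pairs. Returns the share of "correct" marks that were valid and
    the share of "incorrect" marks that were truly incorrect (None if a
    group is empty).
    """
    judgments = list(judgments)
    marked_correct = [valid for says_correct, valid in judgments if says_correct]
    marked_incorrect = [valid for says_correct, valid in judgments if not says_correct]

    def share(hits, total):
        return hits / total if total else None

    return {
        "correct_marks_valid": share(sum(marked_correct), len(marked_correct)),
        "incorrect_marks_invalid": share(
            sum(not valid for valid in marked_incorrect), len(marked_incorrect)
        ),
    }
```

Under this reading, a "correct" mark is a fairly reliable positive label, while an "incorrect" mark is a much noisier negative label.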
![Page 46: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/46.jpg)
Area under Precision/Recall curve with only existing infoboxes
[Bar chart: area under P/R curve (axis from 0 to .12) for birth_date, birth_place, death_date, nationality, occupation. Using 5 existing infoboxes per attribute.]
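For readers unfamiliar with the metric, here is one way an area under a precision/recall curve could be computed from ranked extractions. The slides do not say how the area was measured; the trapezoidal rule, the starting point of the curve, and the function names are all assumptions.

```python
def pr_points(scored, total_relevant):
    """Return (recall, precision) after each extraction, ranked by confidence.

    `scored` is a list of (confidence, is_correct) pairs; `total_relevant`
    is the number of true facts that could have been extracted.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    points = []
    hits = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        hits += bool(is_correct)
        points.append((hits / total_relevant, hits / rank))
    return points


def area_under_pr(points):
    """Integrate precision over recall with the trapezoidal rule."""
    if not points:
        return 0.0
    area = 0.0
    # Assumed convention: start at recall 0 with the first observed precision.
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area
```

A larger area means the extractor keeps its precision higher as it is pushed toward higher recall, which is why the charts use it as a single summary number per attribute.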
![Page 47: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/47.jpg)
Area under Precision/Recall curve after adding user contributions
[Bar chart: area under P/R curve (axis from 0 to .12) for birth_date, birth_place, death_date, nationality, occupation. Using 5 existing infoboxes per attribute.]
![Page 48: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/48.jpg)
Improvements and Number of Existing Infoboxes
• Improvements larger if few existing infoboxes
– significant improvements for 5, 10, 25, 50, 100 existing infoboxes
• Most infobox classes have few instances
– 72% of classes have 100 or fewer instances
– 40% of classes have 10 or fewer instances
![Page 49: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/49.jpg)
Synergy
![Page 50: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/50.jpg)
Going Beyond Wikipedia
• Research on contribution to communities shows parallels between Wikipedia and others
• Wikipedians may not be typical, but our contributions were solicited from people using search to complete their everyday tasks
• Goal: Hooks to platforms like MediaWiki
![Page 51: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/51.jpg)
Conclusions
• Synergistic method for amplifying Community Content Creation and Information Extraction
– Significantly increased likelihood of contribution
– Significantly improved quality of extraction
• Demonstrated use of search advertising in evaluating interfaces as a non-primary task
![Page 52: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/52.jpg)
Raphael Hoffmann, Saleema Amershi, Kayur Patel, Fei Wu, James Fogarty, Daniel S. Weld
{raphaelh,samershi,kayur,wufei,jfogarty,weld}@cs.washington.edu
University of Washington
This work was supported by Office of Naval Research grant N00014-06-1-0147, CALO grant 03-000225, NSF grant IIS-0812590, the WRF / TJ Cable Professorship, a UW CSE Microsoft Endowed Fellowship, an NDSEG Fellowship, a Web-advertising donation by Yahoo, and an equipment donation from Intel’s Higher Education Program.
Thank You!
![Page 53: Amplifying Community Content Creation with Mixed-Initiative Information Extraction](https://reader036.vdocuments.mx/reader036/viewer/2022062501/56816766550346895ddc4a55/html5/thumbnails/53.jpg)
Related Work
• Snow, O’Connor, Jurafsky, Ng. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, EMNLP’08
• DeRose, Chai, Gao, Shen, Doan, Bohannon, Zhu. Building Community Wikipedias: A Human-Machine Approach, ICDE’08
• Ahn, Dabbish. Labeling Images with a Computer Game, CHI’04
• Mankoff, Hudson, Abowd. Interaction Techniques for Ambiguity Resolution in Recognition-Based Interfaces, UIST’00
• Culotta, Kristjansson, McCallum, Viola. Corrective Feedback and Persistent Learning for Information Extraction, Artificial Intelligence 170(14)
• Cosley, Frankowski, Terveen, Riedl. SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia, IUI’07