![Page 1: Recruiting Online Volunteers for Linguistic Knowledge Acquisition Ed Kenschaft job talk May 13, 2008 45 minutes](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649dbb5503460f94aaca1f/html5/thumbnails/1.jpg)
Recruiting Online Volunteers for Linguistic Knowledge Acquisition
Ed Kenschaft
job talk, May 13, 2008 (45 minutes)
www.kenschaft.org/papers/linguistathome.html
Outline
The internet is essentially unregulated, enormous, and growing exponentially.
Terrorist groups use the internet for recruiting and training.
Computational linguistics subdisciplines such as opinion detection can be used to identify terrorist websites.
Most such systems require training data that is not readily available.
Other research projects have had success recruiting internet volunteers for comparably difficult tasks.
I propose to do the same with opinion labeling, and then extend to other related areas.
Challenge: Internet
1.36 billion users (Q1 2008) across the entire populated world and diverse language groups
20.7% annual growth (December 2006 to December 2007)
103,160,364 active domains (May 3, 2008)
332,840,730 deleted domains
new domains in the past 24 hours: 648,853 (May 3, 2008); 619,939 (May 12, 2008)
Source (user info): www.internetworldstats.com/ Copyright © 2008, Miniwatts Marketing Group
Source (domain info): www.domaintools.com/internet-statistics/
Internet Users
most users are in Asia, Europe, and North America
fastest growth is in the Middle East, Africa, and Latin America
Primary Languages of Internet Users
largely European, East Asian, and Arabic languages; 206 million users of other languages
fastest growth in Arabic
Internet Summary
the internet is vast, with tremendously fast growth
most content and users are from developed nations and well-studied languages
fastest growth is in developing nations and less-studied languages
analysts and technology need to keep up
Challenge: Use of Internet for Global Terrorism
several thousand terrorist websites, growing exponentially
purposes:
propaganda, worldwide and anonymous:
"The Global Islamic Call to Resistance", 1,600 pages, a call for self-starting terrorist cells
"Questions and Uncertainties Concerning the Mujahideen and their Operations", doctrinal justifications
news bulletins
videos of American soldiers being blown up
video statements on recent events
video game, "Night of Bush Capturing"
...
Challenge: Use of Internet for Global Terrorism
purposes [continued]:
training manuals, e.g. for assassination and for manufacturing poisons/explosives
"Encyclopedia of Preparation", a huge and growing online manual
coordinating attacks between individuals or groups
internet jihadist Irhabi007 helped plan attacks by two men from Atlanta, GA, on Washington, DC, targets
"... networks within networks, connections within connections and links between individuals that cross local, national and international boundaries."
Peter Clarke, head of the counter-terrorism branch of London's Metropolitan Police
Internet Terrorism Summary
"The radicalisation process is occurring more quickly, more widely and more anonymously in the internet age, raising the likelihood of surprise attacks by unknown groups whose members and supporters may be difficult to pinpoint."
National Intelligence Estimate, USA, 2006
"We have to find a way to stanch the flow. The internet creates a constant reservoir of radicalised people which terrorist groups and networks can draw upon."
Professor Bruce Hoffman, terrorism expert, Georgetown University
How can we identify terrorist websites?
Digression: Humans and Computers are Different
computers can do many things that humans can't do (well)
humans can do many things that computers can't do (well)
Examples of Differences
computers only: find new prime numbers; scan the entire web for "Osama bin Laden"
humans only: recognize emotions from facial expressions; solve CAPTCHAs
both: play chess
Crossover
humans can impersonate computers: long division, finding new prime numbers
computers can impersonate humans: Eliza (requires clever rules, limited domain); machine learning (requires lots of data)
Opinion Detection
identify opinions and attitudes in texts (more generally, modalities)
humans are very good at it; computers are not
Opinion Detection (Examples)
"America is a mistake, admittedly a gigantic mistake, but a mistake nevertheless." (Sigmund Freud)
SPEAKER DISLIKES America
"The United States of America is a threat to world peace." (Nelson Mandela)
SPEAKER DISLIKES United States of America
Opinion Detection (continued)
"Mr. McGee, don't make me angry. You wouldn't like me when I'm angry." (David Banner)
Mr. McGee SHOULDN'T make me angry
Mr. McGee DISLIKES me when I'm angry
"All I want for Christmas is my two front teeth." (personal communication)
SPEAKER WANTS my two front teeth
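Each example above reduces a sentence to a holder, an attitude label in capitals, and a target. A minimal sketch of how one such annotation might be stored as a training example; the class name `OpinionAnnotation` and its fields are hypothetical, not a schema from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionAnnotation:
    """One labeled attitude in the HOLDER MODALITY TARGET form (hypothetical schema)."""
    sentence: str
    holder: str    # who holds the attitude, e.g. "SPEAKER" or "Mr. McGee"
    modality: str  # attitude label, e.g. "DISLIKES", "WANTS", "SHOULDN'T"
    target: str    # what the attitude is about

    def as_triple(self) -> str:
        """Render the annotation the way the slides write it."""
        return f"{self.holder} {self.modality} {self.target}"


freud = OpinionAnnotation(
    sentence="America is a mistake, admittedly a gigantic mistake, but a mistake nevertheless.",
    holder="SPEAKER",
    modality="DISLIKES",
    target="America",
)
print(freud.as_triple())  # SPEAKER DISLIKES America
```

Making the record frozen (hashable) lets annotations from different volunteers be compared or counted directly.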
Opinion Detection Resources
humans can do this task well, but not fast enough
computers are moderately successful in limited domains, e.g. TREC 2006 & 2007
accuracy of computers depends on availability of training data
TREC 2006 Blog (Opinion Retrieval) Track
given a blog entry and a topic, identify whether:
the entry is relevant to that topic
the entry expresses an opinion on the topic
the opinion is positive, negative, or mixed
no training data provided
CMU used ~10,000 training examples from movie and product reviews (Yang et al. 2006)
TREC 2006 Examples
Opinionated: "Skype 2.0 eats its young"
The elaborate press release and WSJ review while impressive don’t help mask the fact that, Skype is short on new ground breaking ideas. Personalization via avatars and ring-tones... big new idea? Not really. Phil Wolff over on Skype Journal puts it nicely when he writes, “If you’ve been using Skype, the Beta version of Skype 2.0 for Windows won’t give you a new Wow! experience.” ...
Non-opinionated: "Skype Launches Skype 2.0 Features Skype Video"
Skype released the beta version of Skype 2.0, the newest version of its software that allows anyone with an Internet connection to make free Internet calls. The software is designed for greater ease of use, integrated video calling, and ...
TREC 2006 Results (MAP)

| Task | Best | Median |
|---|---|---|
| Topic relevance | 42.29% | 16.99% |
| Opinion finding | 30.04% | 10.59% |
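The scores above are mean average precision (MAP). A minimal sketch of the standard uninterpolated definition, to make the numbers concrete; it is not the official `trec_eval` tool, and the document IDs and relevance sets are made up.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision of one ranked list of document IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Dividing by all relevant documents means unretrieved ones contribute 0.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: the mean of per-topic average precision."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


topic_1 = (["d1", "d2", "d3", "d4"], {"d1", "d3"})  # AP = (1/1 + 2/3) / 2
topic_2 = (["d5", "d6", "d7"], {"d7", "d9"})        # AP = (1/3) / 2; d9 never retrieved
print(round(mean_average_precision([topic_1, topic_2]), 4))  # 0.5
```

Because missed relevant documents count as zero, MAP rewards both ranking relevant entries early and finding all of them, which is why median scores sit so far below the best runs.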
Where can we get training data?
Volunteer Projects
Enlist online volunteers
Provide minimal training
Optionally, frame as a competitive game
"The easiest part is getting the public involved. Most volunteer-computing projects can draw on tens of thousands of people with practically no advertising, relying on word of mouth. The problem is usually keeping these eager amateurs busy."
("Spreading the load", The Economist)
Non-computational Projects
amateur bird-watchers track bird migrations
amateur astronomers spot new comets
Galaxy Zoo
roughly a million galaxies from the Sloan Digital Sky Survey
classify each as elliptical, clockwise spiral, anticlockwise spiral, or unclear
identify interactions between galaxies, real or illusory
Galaxy Zoo Volunteers
100,000+ volunteers within a few months
30 volunteers classify each galaxy
peak load: 70,000 classifications per hour
final datasets: 34,617,406 analyses by 82,931 users
unreliable volunteers filtered out using known test cases
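Redundant classification plus filtering against known test cases can be sketched as a two-step vote. This is an illustration under assumptions, not Galaxy Zoo's actual weighting: the 0.8 accuracy threshold, the plain majority vote, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Vote = Tuple[str, str, str]  # (volunteer, galaxy, label)


def reliable_volunteers(votes: List[Vote], gold: Dict[str, str],
                        min_accuracy: float = 0.8) -> Set[str]:
    """Volunteers whose accuracy on known test cases meets the (hypothetical) threshold.

    Volunteers who saw no test cases are excluded, a conservative choice.
    """
    seen, correct = Counter(), Counter()
    for volunteer, galaxy, label in votes:
        if galaxy in gold:
            seen[volunteer] += 1
            correct[volunteer] += int(label == gold[galaxy])
    return {v for v in seen if correct[v] / seen[v] >= min_accuracy}


def consensus(votes: List[Vote], gold: Dict[str, str],
              min_accuracy: float = 0.8) -> Dict[str, str]:
    """Majority label per non-test galaxy, counting only reliable volunteers."""
    trusted = reliable_volunteers(votes, gold, min_accuracy)
    tallies: Dict[str, Counter] = defaultdict(Counter)
    for volunteer, galaxy, label in votes:
        if volunteer in trusted and galaxy not in gold:
            tallies[galaxy][label] += 1
    # most_common(1) breaks ties by first-seen label; a real pipeline would flag ties.
    return {galaxy: counts.most_common(1)[0][0] for galaxy, counts in tallies.items()}


votes = [
    ("ann", "g0", "elliptical"), ("bob", "g0", "elliptical"), ("cat", "g0", "clockwise"),
    ("ann", "g1", "anticlockwise"), ("bob", "g1", "anticlockwise"), ("cat", "g1", "clockwise"),
]
print(consensus(votes, gold={"g0": "elliptical"}))  # {'g1': 'anticlockwise'}
```

Here "g0" plays the role of a known test case: "cat" misclassifies it and is dropped before the vote on "g1".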
Galaxy Zoo Results
unexpected source of error: users are biased toward anticlockwise spirals
2 papers submitted for publication
currently over 20 projects underway using the resulting data
future work:
phase two: more detailed questions
phase three: more image sources
www.galaxyzoo.org/
Stardust@home
Problem: aerogel sent seven years and 3 billion km through space; identify tracks of microparticles in the gel
Volunteers: 24,000 participants, 40 million searches in under a year
Results: 50 candidate dust particles, each identified by hundreds of participants; featured in seven conference papers
stardustathome.ssl.berkeley.edu/
Herbaria@home
thousands of 19th-century plant specimens with handwritten notes
read notes and enter information into a database
Herbaria@home Volunteers
162 volunteers, Zipfian distribution:
68 volunteers transcribed 10 or more entries
24 volunteers transcribed 100 or more entries
7 volunteers transcribed 1,000 or more entries
Herbaria@home Results
22,702 specimens documented (May 5, 2008)
no redundancy
herbariaunited.org/atHome/
Open Mind Word Expert
word sense disambiguation, e.g. "plane" in:
He boarded the plane from gate 53.
The ball is not in play until it crosses the plane.
systems need training data
Open Mind Word Expert Results
90,000 sense taggings over four months
240 words, 87 examples each on average
inter-annotator agreement: 66.56%
66.23% precision, vs. 63.32% baseline
best precision for words with the most training examples
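The slide does not say how the 66.56% agreement figure was computed. Simple observed agreement over the items two annotators both tagged is one common definition; the sketch below assumes it, and the sense labels are made up.

```python
from typing import Dict


def percent_agreement(tags_a: Dict[str, str], tags_b: Dict[str, str]) -> float:
    """Fraction of items tagged by both annotators that received the same sense."""
    shared = tags_a.keys() & tags_b.keys()
    if not shared:
        raise ValueError("no items tagged by both annotators")
    same = sum(tags_a[item] == tags_b[item] for item in shared)
    return same / len(shared)


# Hypothetical sense tags for three occurrences of "plane".
annotator_1 = {"s1": "aircraft", "s2": "flat_surface", "s3": "aircraft"}
annotator_2 = {"s1": "aircraft", "s2": "aircraft", "s3": "aircraft"}
print(percent_agreement(annotator_1, annotator_2))
```

Observed agreement does not correct for chance; with fine-grained sense inventories that is one reason agreement and system precision can both land in the mid-60s.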
Volunteer Projects Summary
projects attract anywhere from 100+ to 100,000+ volunteers
Zipfian distribution of contributions by volunteers
What makes for a successful project?
Games
"In every job that must be done, there is an element of fun. You find the fun, and – snap! – the job's a game."
(Mary Poppins)
9 billion human-hours of solitaire were played in 2003
7 million human-hours to build the Empire State Building, or 6.8 hours of 2003's solitaire playing
20 million human-hours to build the Panama Canal, or about one day of 2003's solitaire playing
(Luis von Ahn, "Human Computation")
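The conversions on this slide can be checked by spreading the 9 billion solitaire hours evenly over the 8,760 hours of 2003; uniform play through the year is an assumption of this quick check, and the function name is illustrative.

```python
HOURS_IN_2003 = 365 * 24          # 2003 was not a leap year: 8,760 hours
SOLITAIRE_HOURS_2003 = 9e9        # human-hours of solitaire (from the slide)


def solitaire_equivalent_hours(project_human_hours: float) -> float:
    """Clock hours of 2003 solitaire whose combined human-hours equal the project's."""
    solitaire_per_clock_hour = SOLITAIRE_HOURS_2003 / HOURS_IN_2003  # about 1.03 million
    return project_human_hours / solitaire_per_clock_hour


print(round(solitaire_equivalent_hours(7e6), 1))   # 6.8   Empire State Building
print(round(solitaire_equivalent_hours(20e6), 1))  # 19.5  Panama Canal
```

The Panama Canal figure comes out at about 19.5 hours, which the slide rounds to one day.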
ESP Game
Problem: label images with words/captions
Purposes: index images for search; provide captions for the visually impaired
ESP Game Setup
two players, strangers to each other
each types whatever they think the other player is typing
both get points whenever they agree
timed
a label is stored as a solution only once n pairs have agreed on it
taboo words from previous solutions
random test images to catch cheaters
symmetric verification game: both players get the same input and give the same output; each player verifies the other
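The agreement and taboo rules above can be sketched as bookkeeping for a single image. This is an illustration, not the ESP Game's code: the class name and the value of n are hypothetical, matching is simplified to player A's typing order, and the random test images used to catch cheaters are left out.

```python
from collections import Counter


class EspImageLabels:
    """Label bookkeeping for one image in an ESP-style game (illustrative sketch).

    A word becomes an accepted label once `n_required` pairs have agreed on it;
    accepted labels then become taboo for later pairs.
    """

    def __init__(self, n_required: int):
        self.n_required = n_required  # the slide's "n"; its value here is an assumption
        self.agreements = Counter()   # word -> number of pairs that agreed on it
        self.taboo = set()            # accepted labels, no longer playable

    def play_round(self, guesses_a, guesses_b):
        """Return the first non-taboo word typed by both players, or None."""
        typed_by_b = {g.strip().lower() for g in guesses_b}
        for word in (g.strip().lower() for g in guesses_a):
            if word in typed_by_b and word not in self.taboo:
                self._record(word)
                return word
        return None

    def _record(self, word):
        self.agreements[word] += 1
        if self.agreements[word] >= self.n_required:
            self.taboo.add(word)  # store as a solution and make it taboo


labels = EspImageLabels(n_required=2)
print(labels.play_round(["dog", "grass"], ["Dog", "park"]))  # dog
print(labels.play_round(["dog"], ["dog"]))                   # dog (now accepted)
print(labels.play_round(["dog", "ball"], ["ball", "dog"]))   # ball
```

Making accepted labels taboo pushes later pairs toward new, less obvious words, which is what makes the collected labels increasingly complete.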
ESP Game Results
75,000 players (after one year)
many people play over 20 hours per week
15 million agreements
labels are highly accurate and highly complete
a large part of the appeal is the relationship with an anonymous partner
www.espgame.org/
Peekaboom
Problem: images with object labels (e.g. output of the ESP Game) need the objects located within the images
purpose: training data for computer vision
Peekaboom Setup
player A sees the image
player B has to guess the object in the image
player A clicks on the image, revealing a small area to player B
asymmetric verification game: player A gets the input, which player B has to guess; player B verifies player A's analysis
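One simple way to turn player A's verified clicks into an object location is a bounding box around the revealed discs. This is a sketch under assumptions, not necessarily how Peekaboom aggregates its data: the reveal radius, the disc shape, and the names `Reveal` and `object_box` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Reveal:
    """One click by player A, uncovering a disc of the image for player B."""
    x: int
    y: int
    radius: int = 20  # hypothetical reveal radius in pixels


def object_box(reveals: Iterable[Reveal], width: int,
               height: int) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (left, top, right, bottom) around all revealed discs, clipped to the image.

    Pass only reveals from rounds where player B guessed the word correctly;
    that correct guess is what verifies player A's clicks.
    """
    reveals = list(reveals)
    if not reveals:
        return None
    left = max(0, min(r.x - r.radius for r in reveals))
    top = max(0, min(r.y - r.radius for r in reveals))
    right = min(width, max(r.x + r.radius for r in reveals))
    bottom = min(height, max(r.y + r.radius for r in reveals))
    return (left, top, right, bottom)


print(object_box([Reveal(50, 60), Reveal(100, 80)], width=400, height=300))
# (30, 40, 120, 100)
```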
Peekaboom Results
27,000 players in the first four months
2,100,000 object locations
many people averaged over 12 hours per day for the first 10 days
www.peekaboom.org/
Verbosity (proposed)
Problem: input common sense facts, e.g. "cereal is eaten with milk"
Game:
player A sees a word; player B has to guess the word
player A gets to fill in various templates, e.g. "object is typically near ____"
asymmetric verification game
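The asymmetric verification step can be sketched as: player A's filled-in templates become candidate facts, and they are kept only if player B's guess matches the secret word. The `{word}`/`{fill}` template syntax and the function name are hypothetical, chosen for this illustration.

```python
from typing import List, Tuple


def verbosity_facts(secret_word: str, clues: List[Tuple[str, str]], guess: str) -> List[str]:
    """Turn player A's filled templates into facts, but only if player B guessed the word.

    clues: (template, filler) pairs; templates use hypothetical {word}/{fill}
    placeholders, e.g. "{word} is eaten with {fill}".
    """
    if guess.strip().lower() != secret_word.strip().lower():
        return []  # unverified clues are discarded
    return [template.format(word=secret_word, fill=filler) for template, filler in clues]


clues = [("{word} is eaten with {fill}", "milk"), ("{word} is typically near {fill}", "bowl")]
print(verbosity_facts("cereal", clues, guess=" Cereal"))
print(verbosity_facts("cereal", clues, guess="soup"))
```

B's correct guess serves as evidence that A's clues actually describe the word, so the facts are stored only when verification succeeds.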
Toolkits
Amazon Mechanical Turk: paid service
requester posts a task online, along with instructions and pay rate
worker views available tasks and selects those of interest
examples: examine an image and click on specified objects, $0.05 per object; evaluate relevance of search results, $0.02 per evaluation
www.mturk.com/
Toolkits (continued)
Bossa: open source, Linux
developer provides task-specific PHP scripts
system rates volunteer skill and evaluates agreement among volunteers
pointer to Bolt, an open-source tutorial builder
boinc.berkeley.edu/trac/wiki/BossaIntro/
Facebook: install customized apps; take advantage of social networks
www.facebook.com/
Linguist@home (a.k.a. That's Your Opinion)
annotate sentences with opinions
make it fun
resources: customer data, server, expert consultant(s)
tools: Bossa (PHP) or Java; Facebook and standalone
Linguist@home 1-player game
display sentence
display list of templates, determined by expert consultant
highlight eligible participants: entities and events
allow multiple answers
10 points for the first answer, 20 for the second, 30 for the third, etc.
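The escalating rule gives 10k points for the k-th answer, so m answers earn 10(1 + 2 + ... + m) = 5m(m + 1) points. A minimal sketch; ignoring repeated answers is an assumption added here, not something the slide specifies.

```python
from typing import Hashable, Iterable


def answer_points(k: int) -> int:
    """Points for the k-th answer (1-based): 10, 20, 30, ..."""
    if k < 1:
        raise ValueError("answers are numbered from 1")
    return 10 * k


def round_score(answers: Iterable[Hashable]) -> int:
    """Score one sentence's answers; repeats are ignored (an assumption)."""
    distinct = list(dict.fromkeys(answers))  # drop repeats, keep first-seen order
    return sum(answer_points(k) for k in range(1, len(distinct) + 1))


print(round_score(["SPEAKER DISLIKES America", "SPEAKER WANTS peace", "SPEAKER LIKES Freud"]))
```

The growing reward encourages players to keep annotating a sentence, but with only one player nothing checks that the extra answers are valid.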
How can we assure that answers are valid?
Linguist@home 2-player game
symmetric verification: 2 players each play the same game
points for matched answers
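Symmetric verification reduces to an intersection: an annotation counts only if both players produced it independently. A minimal sketch, assuming answers are (holder, modality, target) tuples compared case- and whitespace-insensitively; the 10 points per match is a hypothetical value.

```python
from typing import Iterable, Set, Tuple

Answer = Tuple[str, str, str]  # (holder, modality, target)


def matched_answers(answers_a: Iterable[Answer], answers_b: Iterable[Answer]) -> Set[Answer]:
    """Answers both players gave for the same sentence, after normalization."""
    def norm(answer: Answer) -> Answer:
        holder, modality, target = (part.strip().lower() for part in answer)
        return (holder, modality, target)

    return {norm(a) for a in answers_a} & {norm(b) for b in answers_b}


def score_pair(answers_a, answers_b, points_per_match: int = 10):
    """Both players earn the same points: one award per matched answer."""
    matches = matched_answers(answers_a, answers_b)
    return len(matches) * points_per_match, matches


player_a = [("SPEAKER", "DISLIKES", "America"), ("SPEAKER", "WANTS", "peace")]
player_b = [("speaker", "dislikes", "America "), ("SPEAKER", "LIKES", "Freud")]
print(score_pair(player_a, player_b))
```

Only matched answers are stored, so a player who enters junk earns nothing unless a stranger independently enters the same junk.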
Future Work
extend to other linguistic subdisciplines, e.g. topic classification
extend to other widely used and studied languages, e.g. German, Chinese
extend to the fastest-growing languages, e.g. Arabic (sociopolitical factors)
References
------. Playing or processing. The Economist. Dec 6, 2007.
------. Spreading the load. The Economist. Dec 6, 2007.
------. A world wide web of terror. The Economist. July 12, 2007.
Amir Alexander. Aerogel: The "Frozen Smoke" that Made Stardust Possible. The Planetary Society. November 8, 2006.
Nathaniel Ayewah, Rada Mihalcea, and Vivi Nastase. Building Multilingual Semantic Networks with Non-Expert Contributions over the Web. Proceedings of the KCAP 2003 Workshop on Distributed and Collaborative Knowledge Capture. Sanibel Island, Florida, November 2003.
Timothy Chklovski. Designing interfaces for guided collection of knowledge about everyday objects from volunteers. Proceedings of the 10th International Conference on Intelligent User Interfaces (IUI '05), San Diego, California, USA, January 10-13, 2005. ACM, New York, NY, pp. 311-313.
Timothy Chklovski. Using Analogy to Acquire Commonsense Knowledge from Human Contributors. MIT Artificial Intelligence Laboratory technical report AITR-2003-002, February 2003.
Timothy Chklovski and Rada Mihalcea. Exploiting Agreement and Disagreement of Human Annotators for Word Sense Disambiguation. Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2003). Borovetz, Bulgaria, September 2003.
Kate Land, Anze Slosar, Chris Lintott, Dan Andreescu, Steven Bamford, Phil Murray, Robert Nichol, M. Jordan Raddick, Kevin Schawinski, Alex Szalay, Daniel Thomas, Jan Van den Berg. Galaxy Zoo: The large-scale spin statistics of spiral galaxies in the Sloan Digital Sky Survey. Submitted March 22, 2008.
Chris J. Lintott, Kevin Schawinski, Anze Slosar, Kate Land, Steven Bamford, Daniel Thomas, M. Jordan Raddick, Robert C. Nichol, Alex Szalay, Dan Andreescu, Phil Murray, Jan van den Berg. Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Submitted to MNRAS, April 29, 2008.
References (continued)
Rada Mihalcea and Timothy Chklovski. Open Mind Word Expert: Creating Large Annotated Data Collections with Web Users' Help. Proceedings of the EACL 2003 Workshop on Linguistically Annotated Corpora (LINC 2003). Budapest, April 2003.
Iadh Ounis, Maarten de Rijke, Craig Macdonald, Gilad Mishne, Ian Soboroff. Overview of the TREC-2006 Blog Track. TREC 2006.
Luis von Ahn. Games With a Purpose. IEEE Computer Magazine, vol. 39, no. 6, pp. 92-94, June 2006.
Luis von Ahn. Human Computation. Google Tech Talks. July 26, 2006.
Luis von Ahn, Ruoran Liu and Manuel Blum. Peekaboom: A Game for Locating Objects in Images. ACM CHI 2006.
Luis von Ahn, S. Ginosar, M. Kedia, R. Liu and M. Blum. Improving Accessibility of the Web with a Computer Game. ACM CHI 2006.
Luis von Ahn, Mihir Kedia and Manuel Blum. Verbosity: A Game for Collecting Common-Sense Facts. ACM CHI 2006.
A. J. Westphal, C. C. Allen, R. Bastien, J. Borg, F. Brenker, J. C. Bridges, D. E. Brownlee, A. L. Butterworth, C. Floss, G. J. Flynn, D. Frank, Z. Gainsforth, E. Gruen, P. Hoppe, A. T. Kearsley, H. Leroux, L. R. Nittler, S. A. Sandford, A. Simionovici, F. J. Stadermann, R. M. Stroud, P. Tsou, T. Tyliszczak, J. Warren, M. E. Zolensky. Preliminary Examination of the Interstellar Collector of Stardust. 39th Lunar and Planetary Science Conference (2008), Abstract #1855.
Nicholos Wethington. Galaxy Zoo Gets a Makeover. Universe Today. April 23, 2008.
Nicholos Wethington. Galaxy Zoo Results Show that the Universe Isn't 'Lopsided'. Universe Today. March 28, 2008.
Hui Yang, Luo Si, Jamie Callan. Knowledge Transfer and Opinion Detection in the TREC2006 Blog Track. TREC 2006.