Technical paper: "A novel approach to big data veracity using crowd-sourcing techniques," presented at BMS Institute of Technology, Bangalore.
BIG DATA and VERACITY: A novel approach to data veracity using crowd-sourcing techniques
Samarth Bhargav, Bhoomika Agarwal, Abhiram Ravikumar and Vrishabh DN
April 18, 2014
Presented at BMS Institute of Technology, Bangalore
Introduction
Big Data
● What is Big Data?
● The 3 traditional V’s
  o Volume
  o Velocity
  o Variety
● A fourth V: Veracity
● Crowdsourcing
Figure: The 4 Vs of Big Data (Volume, Velocity, Variety, Veracity)
Source: http://well-managed-business-intelligence.blogspot.in/2012/06/big-data-fourth.html
Crowdsourcing - Models in place
GOOGLE MAPS
WIKIPEDIA
DUOLINGO
RECAPTCHA
AMAZON TURK
reCAPTCHA
● Digitizing one word at a time
● Puts the roughly 10 seconds humans spend solving a CAPTCHA to productive use
● Digitizing old books is a herculean task for computers
● An efficient complement to OCR for words it cannot read
● Workflow: entry, multiple checks, verification, upload
● 20 years of the daily New York Times was digitized in just a couple of months
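The "multiple checks, verify" steps of the reCAPTCHA workflow amount to accepting a transcription only once enough people independently agree on it. The Python sketch below illustrates that idea under stated assumptions: it uses a plain vote count with a hypothetical agreement threshold `MIN_AGREEMENT = 3`, and the names `verify_word` and `digitize` are illustrative. It is not reCAPTCHA's actual implementation.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical threshold: how many matching human answers are needed before
# a transcription is trusted (an assumption, not reCAPTCHA's real value).
MIN_AGREEMENT = 3


def normalize(answer: str) -> str:
    """Treat answers that differ only in case or surrounding spaces as equal."""
    return answer.strip().lower()


def verify_word(answers: List[str], min_agreement: int = MIN_AGREEMENT) -> Optional[str]:
    """Multiple checks + verify: return the most common normalized answer if it
    was given at least `min_agreement` times, otherwise None (needs more input)."""
    if not answers:
        return None
    counts = Counter(normalize(a) for a in answers)
    best, votes = counts.most_common(1)[0]
    return best if votes >= min_agreement else None


def digitize(word_answers: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Entry -> verify -> 'upload': map each scanned word image id to its
    accepted text, or None if it is still unresolved."""
    return {word_id: verify_word(answers) for word_id, answers in word_answers.items()}


if __name__ == "__main__":
    collected = {
        "img_001": ["Herculean", "herculean", "herculean ", "hercu1ean"],
        "img_002": ["morning", "mourning"],
    }
    print(digitize(collected))
```

Here `img_001` is accepted as "herculean" because three of the four answers agree, while `img_002` stays unresolved until more people have typed it. Raising the threshold trades throughput for confidence in each digitized word.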
Google Maps
● “Enrich Google Maps with your local knowledge”
● The Google Map Maker project
● Data used by Google Maps and Google Earth
● Projects like Photo Sphere and Street View use large contributions from the public
● Workflow
  ○ add/edit places
  ○ verified by a moderator
  ○ cross-referenced and updated
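The Map Maker workflow (add/edit, moderator verification, cross-referencing) is essentially a review queue in front of a shared database. A minimal Python sketch follows, assuming a simple in-memory store; the classes `PlaceEdit` and `MapDatabase` and their fields are hypothetical and do not reflect Google's actual systems or APIs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class PlaceEdit:
    """A user-submitted addition or change to a place (hypothetical schema)."""
    place_id: str
    values: Dict[str, str]
    author: str
    status: Status = Status.PENDING


@dataclass
class MapDatabase:
    """Toy store of published places plus a queue of edits awaiting review."""
    places: Dict[str, Dict[str, str]] = field(default_factory=dict)
    queue: List[PlaceEdit] = field(default_factory=list)

    def submit(self, edit: PlaceEdit) -> None:
        """Add/edit places: the edit is queued, not yet published."""
        self.queue.append(edit)

    def moderate(self, edit: PlaceEdit, approve: bool) -> Dict[str, str]:
        """Verified by a moderator, then cross-referenced and updated.

        Returns the previous values of any fields the approved edit
        overwrote, so conflicting information can be flagged for review.
        """
        self.queue.remove(edit)
        edit.status = Status.APPROVED if approve else Status.REJECTED
        overwritten: Dict[str, str] = {}
        if approve:
            current = self.places.setdefault(edit.place_id, {})
            for key, value in edit.values.items():
                # Cross-reference: compare against what is already on the map.
                if key in current and current[key] != value:
                    overwritten[key] = current[key]
                current[key] = value
        return overwritten
```

Rejected edits never reach `places`, and approved edits that contradict existing data are reported rather than silently applied, which is one simple way a crowdsourced map can protect veracity.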
WIKIPEDIA
● Termed the “mother of all encyclopedias”
● Hosts an immense, multilingual pool of data and is entirely community-driven
● Runs on donations from all over the world (crowdfunding)
● Dynamic and constantly updated, so it scores big over traditional encyclopedias
● Aims for unbiased, high-quality information
● Data verification and validation done continuously by both experts and the general public
DUOLINGO
● Learn a language and translate the Web
● Entirely free and crowd-driven
● Co-founded by Luis von Ahn, also behind the ESP Game and reCAPTCHA
● Workflow
  o website to be translated is uploaded
  o broken into parts and given to students
  o students translate the document as part of learning
  o translated document returned to owner
● Win-win situation for both students and corporations
● Popular on both web and mobile platforms
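The Duolingo workflow (upload, split into parts, hand parts to students, return the translated document) is a split/distribute/reassemble pipeline. The Python sketch below illustrates it under loud assumptions: documents are split naively on "." into groups of sentences, parts are assigned round-robin, and the `translate(student, part)` callback is a placeholder for the human translation step. The function names are hypothetical and this is not how Duolingo actually implemented it.

```python
from typing import Callable, Dict, List


def split_document(text: str, max_sentences: int = 2) -> List[str]:
    """Break the uploaded document into parts of at most `max_sentences`
    sentences (naively splitting on '.')."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]


def assign(parts: List[str], students: List[str]) -> Dict[int, str]:
    """Give part i to a student, round-robin."""
    if not students:
        raise ValueError("need at least one student")
    return {i: students[i % len(students)] for i in range(len(parts))}


def crowd_translate(text: str, students: List[str],
                    translate: Callable[[str, str], str],
                    max_sentences: int = 2) -> str:
    """Split, distribute, translate, and reassemble the document for its owner.

    `translate(student, part)` stands in for the human translation step.
    """
    parts = split_document(text, max_sentences)
    owners = assign(parts, students)
    translated = [translate(owners[i], part) for i, part in enumerate(parts)]
    return " ".join(translated)


if __name__ == "__main__":
    doc = "Hello world. How are you. See you soon."
    # Placeholder "translation": tag each part with the student who handled it.
    result = crowd_translate(doc, ["asha", "ravi"],
                             lambda student, part: f"[{student}] {part}")
    print(result)
```

Keeping each part small means a learner can translate it within a lesson, and because parts are reassembled in their original order, the owner gets back a single coherent document.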
Amazon Mechanical Turk
● “Artificial artificial intelligence”: humans perform tasks that computers cannot yet do well
● HITs (Human Intelligence Tasks), e.g. labeling data for machine learning
● Workflow
  o Requester places a task on the site or through the API
  o Worker (provider) picks a suitable task
  o Payments made through Amazon gift certificates
● Advantages include
  o Quality assurance
  o Scalability options
  o Lower cost
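The requester/worker workflow on the Mechanical Turk slide can be modelled as a small task marketplace. The sketch below is a toy, in-memory Python model written for illustration only: the classes `HIT` and `Marketplace`, the `reward_cents` field, and the approve-then-pay rule are assumptions, and the code neither uses nor mirrors Amazon's actual MTurk API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HIT:
    """A Human Intelligence Task posted by a requester (hypothetical model)."""
    hit_id: int
    requester: str
    description: str
    reward_cents: int
    worker: Optional[str] = None
    answer: Optional[str] = None
    approved: bool = False


class Marketplace:
    """Toy in-memory task marketplace; NOT the Amazon Mechanical Turk API."""

    def __init__(self) -> None:
        self.hits: List[HIT] = []
        self.balances: Dict[str, int] = {}

    def post(self, requester: str, description: str, reward_cents: int) -> HIT:
        """Requester places a task."""
        hit = HIT(len(self.hits) + 1, requester, description, reward_cents)
        self.hits.append(hit)
        return hit

    def available(self, keyword: str) -> List[HIT]:
        """Open tasks whose description mentions `keyword`, for a worker to pick."""
        return [h for h in self.hits if h.worker is None and keyword in h.description]

    def accept(self, hit: HIT, worker: str) -> None:
        """Worker picks a suitable task; each task goes to one worker."""
        if hit.worker is not None:
            raise ValueError("HIT already taken")
        hit.worker = worker

    def submit(self, hit: HIT, worker: str, answer: str) -> None:
        """Only the worker who accepted the task may submit an answer."""
        if hit.worker != worker:
            raise ValueError("only the assigned worker may submit")
        hit.answer = answer

    def approve(self, hit: HIT) -> None:
        """Requester reviews the answer; on approval the worker is paid once."""
        if hit.worker is None or hit.answer is None or hit.approved:
            raise ValueError("nothing to approve")
        hit.approved = True
        self.balances[hit.worker] = self.balances.get(hit.worker, 0) + hit.reward_cents
```

Paying only after the requester approves the answer is one simple lever for the quality assurance listed above; posting the same task to several workers and comparing their answers is another.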
Analysis
● Handling data IS important
● Google Flu Trends
● Kickstarter and CosmoQuest
● A lot of scope and wide opportunities
Repercussions
● Senator Kennedy’s story
● FCRA (Fair Credit Reporting Act)
● Crowds unaware that their data is being acquired
● Confidential data and security leaks must be addressed with care
Conclusion
Crowdsourcing model   Volume      Velocity    Variety     Veracity
Google Maps           terabytes   high        low         medium
Duolingo              terabytes   medium      high        high
reCAPTCHA             petabytes   very high   very high   very high
Amazon Turk           petabytes   medium      very high   high
Wikipedia             petabytes   medium      high        very high
Questions & Answers time! :-)
Source: http://2.bp.blogspot.com/
Thank you, UTSAHA 2k’14.