a novel approach to big data veracity using crowd-sourcing techniques

BIG DATA and VERACITY: A novel approach to data veracity using crowd-sourcing techniques. Samarth Bhargav, Bhoomika Agarwal, Abhiram Ravikumar and Vrishabh DN. April 18, 2014. Presented at BMS Institute of Technology, Bangalore.

Upload: abhiram-ravikumar

Post on 26-Jan-2015


DESCRIPTION

Technical paper on "A novel approach to big data veracity using crowd-sourcing techniques", presented at BMS Institute of Technology, Bangalore.

TRANSCRIPT

Page 1: A novel approach to big data veracity using crowd-sourcing techniques

BIG DATA and VERACITY: A novel approach to data veracity using crowd-sourcing techniques

Samarth Bhargav, Bhoomika Agarwal, Abhiram Ravikumar and Vrishabh DN

April 18, 2014
Presented at BMS Institute of Technology, Bangalore

Page 2: A novel approach to big data veracity using crowd-sourcing techniques

Introduction

Big Data

● What is Big Data?
● The 3 traditional V’s
  o Volume
  o Velocity
  o Variety
● Fourth V: Veracity
● Crowdsourcing

[Diagram: Volume, Velocity, Variety, Veracity]

Page 3: A novel approach to big data veracity using crowd-sourcing techniques

The 4 Vs of Big Data

Source: http://well-managed-business-intelligence.blogspot.in/2012/06/big-data-fourth.html

Page 4: A novel approach to big data veracity using crowd-sourcing techniques

Crowdsourcing - Models in place

GOOGLE MAPS

WIKIPEDIA

DUOLINGO

RECAPTCHA

AMAZON TURK

Page 5: A novel approach to big data veracity using crowd-sourcing techniques

reCAPTCHA

● Digitizing books one word at a time
● Productively uses the roughly 10 seconds a person spends solving a CAPTCHA
● Digitizing old books is a herculean task for computers
● An efficient alternative to OCR
● Workflow: entry, multiple checks, verify, upload
● 20 years of The New York Times daily editions were digitized in just a couple of months
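The reCAPTCHA workflow (entry, multiple checks, verify, upload) can be pictured as a simple agreement vote across independent entries. The sketch below is only an illustration under stated assumptions, not reCAPTCHA's actual algorithm: the function name `verify_word` and the thresholds `min_votes` and `min_agreement` are hypothetical.

```python
from collections import Counter


def verify_word(entries, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription, or None if the crowd has not converged.

    entries: transcriptions typed by different people for the same word image.
    min_votes and min_agreement are illustrative thresholds (assumptions).
    """
    # Normalize so that case and stray whitespace do not split the vote.
    cleaned = [e.strip().lower() for e in entries if e.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough independent checks yet
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return word  # verified: ready to upload
    return None  # disagreement: keep showing the word to more people


if __name__ == "__main__":
    print(verify_word(["morning", "Morning", "morning ", "mourning"]))
    print(verify_word(["upon", "up0n"]))
```

A word that fails verification would simply be shown to more people until enough matching entries accumulate.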

Page 6: A novel approach to big data veracity using crowd-sourcing techniques

Google Maps

● “Enrich Google Maps with your local knowledge”
● The Google Map Maker project
● Data used by Google Maps and Google Earth
● Projects like PhotoSphere and StreetView use huge contributions from the masses
● Workflow
  ○ add/edit places
  ○ verified by a moderator
  ○ cross-referenced and updated
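The Google Maps workflow (add/edit, moderator verification, cross-reference and update) can be read as a small state machine over each submitted edit. The following is a minimal sketch under stated assumptions: `PlaceEdit`, `moderate`, `publish`, and the status names are hypothetical and do not describe Google's real moderation system.

```python
from dataclasses import dataclass, field


@dataclass
class PlaceEdit:
    place: str
    change: str
    status: str = "submitted"  # submitted -> approved/rejected -> published
    history: list = field(default_factory=list)


def moderate(edit, approved):
    """Moderator step: approve or reject a submitted edit."""
    if edit.status != "submitted":
        raise ValueError("only submitted edits can be moderated")
    edit.status = "approved" if approved else "rejected"
    edit.history.append(edit.status)
    return edit


def publish(edit, confirming_sources):
    """Cross-reference step: publish only if some other source confirms the edit.

    confirming_sources is a hypothetical list of source names (an assumption).
    """
    if edit.status != "approved":
        raise ValueError("only approved edits can be published")
    edit.status = "published" if confirming_sources else "needs_review"
    edit.history.append(edit.status)
    return edit


if __name__ == "__main__":
    edit = PlaceEdit(place="Cafe X", change="add")
    moderate(edit, approved=True)
    publish(edit, confirming_sources=["street survey"])
    print(edit.status, edit.history)
```

Keeping a history on each edit makes it possible to audit who approved what, which is one way a crowdsourced map could protect veracity.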

Page 7: A novel approach to big data veracity using crowd-sourcing techniques

WIKIPEDIA

● Termed the “mother of all encyclopedias”
● Hosts an immense pool of data, multilingual in nature and entirely community driven
● Run by donations from all over the world (crowdfunding)
● Dynamic and constantly updated, thus scores big over traditional encyclopedias
● Unbiased and high-quality information
● Data verification and validation done instantly by both experts and the general public

Page 8: A novel approach to big data veracity using crowd-sourcing techniques

DUOLINGO

● Learn a language and translate the Web
● Entirely free and crowd-driven
● Luis von Ahn - ESP Game and reCAPTCHA
● Workflow
  o website to be translated is uploaded
  o broken into parts and given to students
  o students translate the document as part of the learning process
  o translated document returned to owner
● Win-win situation for both students and corporates
● Popular on both web and mobile platforms
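The Duolingo workflow (upload, break into parts, translate, return) can be sketched as split, assign, and reassemble steps. This is a toy illustration under stated assumptions, not Duolingo's implementation: the function names, the naive sentence splitting, the round-robin assignment, and the `fake` lookup standing in for student work are all hypothetical.

```python
def split_document(text):
    """Break a document into sentence-sized parts (naive split on full stops)."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def assign(parts, students):
    """Hand out parts to students round-robin; returns {part_index: student}."""
    return {i: students[i % len(students)] for i in range(len(parts))}


def reassemble(translations, total):
    """Join translated parts in original order; fail if any part is missing."""
    missing = [i for i in range(total) if i not in translations]
    if missing:
        raise ValueError(f"untranslated parts: {missing}")
    return " ".join(translations[i] for i in range(total))


if __name__ == "__main__":
    parts = split_document("Hello world. Good morning.")
    owners = assign(parts, ["asha", "ben"])
    # Stand-in for student work: a hypothetical lookup, not real translation.
    fake = {"Hello world.": "Hola mundo.", "Good morning.": "Buenos días."}
    done = {i: fake[p] for i, p in enumerate(parts)}
    print(owners)
    print(reassemble(done, len(parts)))
```

In a real system each part would likely go to several students, with the translations compared before one is accepted.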

Page 9: A novel approach to big data veracity using crowd-sourcing techniques

Amazon Mechanical Turk

● Use of artificial intelligence to run businesses
● HITs (Human Intelligence Tasks) enable machine learning concepts
● Workflow
  o Requester places task on the site or through the API
  o Provider picks a suitable task
  o Payments made through Amazon gift certificates
● Advantages include
  o Quality assurance
  o Scalability options
  o Lower cost

Page 10: A novel approach to big data veracity using crowd-sourcing techniques

Analysis

● Handling data IS important
● Google Flu Trends (flu tracker)
● Kickstarter and CosmoQuest
● Lots of scope and wide opportunities

Page 11: A novel approach to big data veracity using crowd-sourcing techniques

Repercussions

● Senator Kennedy’s story
● FCRA (Fair Credit Reporting Act)
● Crowds unaware of data acquisition
● Confidential data and security leaks to be addressed with care

Page 12: A novel approach to big data veracity using crowd-sourcing techniques

Conclusion

Crowdsourcing model   Volume      Velocity    Variety     Veracity
Google Maps           terabytes   high        low         medium
Duolingo              terabytes   medium      high        high
reCAPTCHA             petabytes   very high   very high   very high
Amazon Turk           petabytes   medium      very high   high
Wikipedia             petabytes   medium      high        very high


Page 14: A novel approach to big data veracity using crowd-sourcing techniques

Questions & Answers time! :-)

Source: http://2.bp.blogspot.com/

Thank you, UTSAHA 2k’14.