How the Public Sector Is Using Mechanical Turk

Post on 15-Jan-2015


DESCRIPTION

AWS Worldwide Public Sector Symposium, session 2: how Mechanical Turk is being used to transform business processes in the public sector

TRANSCRIPT

AWS Government, Education, and Nonprofits Symposium Washington, DC | June 24, 2014 - June 26, 2014

Transformational Impact of Cloud Labor

John Hoskins & Daniel Gray

jhoskins@amazon.com

djgray@amazon.com

How is Mechanical Turk impacting business?

The Forestry Service wants to provide real-time online campsite booking

• 350,000 individual campsites whose exact locations are unknown

• Thousands of campgrounds with little or no point-of-interest (POI) data (bathroom? shower? boat ramp?)

• No concierge to resolve a double booking

The US Copyright Office would like to provide internet access to CR data

• Current data is contained exclusively on cards and microfilm

• Scanning project is underway

• No taxonomy for discovery

“What would the internet be without a search engine?”

Business Need

The FDA wants to provide instant access to product and drug recall and interaction information to better protect consumers.

Why

• Over 2 million serious adverse drug reactions (ADRs) yearly

• 100,000 deaths yearly

• ADRs are the 4th leading cause of death, ahead of pulmonary disease, diabetes, AIDS, pneumonia, accidents, and automobile deaths

Business Problem

Reports of interactions arrive unpredictably, and the current process for extracting data from thousands of forms significantly delays its availability.

Challenge

• Data can be received in multiple formats: forms (handwritten and typed), email, electronic, and so on

• Data is subject to HIPAA privacy regulations

• Accuracy and response time are critical, and the budget is clearly constrained

Solution

• Technology can shred the form down to the field level or below

• OCR makes a first pass at recognizing the data

• Workers correct the OCR output

• Data from workers is reconstructed into digital input for the database

• Data is made available through the openFDA API
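The shred, OCR, correct, and reconstruct steps on this slide can be sketched roughly as follows. This is only an illustration of the flow, not the FDA's actual system: `FieldSnippet`, `shred_form`, `apply_worker_correction`, `reconstruct`, and the sample field names are all hypothetical, and the OCR and worker steps are stubbed with hard-coded values.

```python
from dataclasses import dataclass


@dataclass
class FieldSnippet:
    form_id: str
    field_name: str
    ocr_text: str             # OCR's first-pass guess (stubbed here)
    corrected_text: str = ""  # filled in once a worker has reviewed it


def shred_form(form_id, ocr_by_field):
    """Split one form into independent field-level snippets."""
    return [FieldSnippet(form_id, name, text) for name, text in ocr_by_field.items()]


def apply_worker_correction(snippet, worker_answer):
    """Record the worker's correction; a blank answer means the OCR text was right."""
    snippet.corrected_text = worker_answer.strip() or snippet.ocr_text
    return snippet


def reconstruct(snippets):
    """Reassemble corrected snippets into one database-ready record per form."""
    records = {}
    for s in snippets:
        records.setdefault(s.form_id, {})[s.field_name] = s.corrected_text
    return records


# Hypothetical sample data: OCR misread one field, the worker fixed it.
ocr_output = {"drug_name": "asp1rin", "reaction": "nausea"}
worker_answers = {"drug_name": "aspirin", "reaction": ""}

snippets = shred_form("form-001", ocr_output)
corrected = [apply_worker_correction(s, worker_answers[s.field_name]) for s in snippets]
records = reconstruct(corrected)
print(records)  # {'form-001': {'drug_name': 'aspirin', 'reaction': 'nausea'}}
```

In a real deployment each snippet would presumably be posted as its own task, so that corrections can run in parallel across many workers.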

Business Need

A government defense contractor needs to update its natural language processing (NLP) system to accommodate “internet speak”.

Business Problem

Comments from the internet, in the form of posts and tweets, more closely resemble spoken language, while NLP is predicated on written language.

Challenge

• NLP is part of a mission-critical defense system and is missing significant data due to inaccuracies

• Cross-referencing spoken language to written language in Arabic is uniquely complex

• Training requires millions of ground-truth data points

Solution

• An internet crawler scrapes posts containing interesting keywords and phrases

• Each phrase is translated by 5 different native Arabic speakers (5 dialects) with English as their second language

• Each of the 5 translations is corrected by English grammar experts

• The 5 corrected translations are voted on by a panel of 5 additional workers

• The best translation (highest score with the fewest corrections) is sent to 5 native English speakers with Arabic as their second language for translation

• Each result is corrected by Arabic grammar experts and then voted on

• The best result is fed into the NLP system with the original phrase for learning
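The "highest score with least corrections" selection step can be sketched as below. The slide does not say how the two criteria are combined, so this sketch assumes votes come first and the correction count only breaks ties. `Candidate`, `best_candidate`, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    votes: int        # votes received from the 5-worker panel
    corrections: int  # number of edits the grammar expert had to make


def best_candidate(candidates):
    """Assumed rule: most votes wins; fewer corrections breaks ties."""
    return max(candidates, key=lambda c: (c.votes, -c.corrections))


# Hypothetical panel result: 5 corrected translations, 5 votes cast in total.
candidates = [
    Candidate("translation A", votes=2, corrections=1),
    Candidate("translation B", votes=2, corrections=0),
    Candidate("translation C", votes=1, corrections=0),
    Candidate("translation D", votes=0, corrections=3),
    Candidate("translation E", votes=0, corrections=2),
]

winner = best_candidate(candidates)
print(winner.text)  # translation B
```

The same function would apply unchanged to the second round, where the candidates are the results corrected by Arabic grammar experts.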

Business Need

Army Research Labs needed to annotate verbs across many permutations against actual human actions to train robots to recognize

Business Problem

The volume of data required caused significant delays on the project, yet accuracy was paramount to the results.

Challenge

• The sample consisted of 100 different samples of 10 permutations of 35 verbs: 350,000 videos

• At 20 seconds each, that’s almost 2,000 hours, or a person-year

• The project needed to be completed within 60 days
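The viewing-time estimate on this slide can be checked with a few lines of arithmetic. The video count, clip length, and deadline are taken as stated on the slide; the 2,000 working hours per person-year (40 hours a week for 50 weeks) is an assumption added here for the conversion.

```python
VIDEOS = 350_000             # as stated on the slide
SECONDS_PER_VIDEO = 20       # as stated on the slide
DEADLINE_DAYS = 60           # as stated on the slide
WORK_HOURS_PER_YEAR = 2_000  # assumption: 40 h/week x 50 weeks

total_hours = VIDEOS * SECONDS_PER_VIDEO / 3600
person_years = total_hours / WORK_HOURS_PER_YEAR
hours_per_day_needed = total_hours / DEADLINE_DAYS

print(f"{total_hours:.0f} hours")                           # 1944 hours
print(f"{person_years:.2f} person-years")                   # 0.97 person-years
print(f"{hours_per_day_needed:.1f} viewing hours per day")  # 32.4 viewing hours per day
```

A single pass over the videos therefore needs about 32 hours of viewing per calendar day to meet the deadline, more than one person can supply, and that is before any video is reviewed by a second worker.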

Solution

• Workers were given 50 videos per task and asked whether each video represented a given verb permutation

• Gold-standard videos were included in each batch of 50

• A vote required agreement between 2 workers who each had 100% gold-standard accuracy
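A minimal sketch of the gold-standard filter and two-worker agreement rule described on this slide follows. It assumes yes/no answers stored per worker and that a worker who misses any gold-standard video has the whole batch discarded. `passes_gold`, `accepted_labels`, and the sample data are hypothetical, not the project's actual code.

```python
def passes_gold(answers, gold):
    """True only if the worker answered every gold-standard video correctly."""
    return all(answers.get(video) == truth for video, truth in gold.items())


def accepted_labels(submissions, gold, required_agreement=2):
    """Keep a label only when enough gold-qualified workers gave the same answer."""
    tallies = {}  # video -> {answer: count}
    for answers in submissions.values():
        if not passes_gold(answers, gold):
            continue  # assumed policy: discard this worker's whole batch
        for video, answer in answers.items():
            if video in gold:
                continue  # gold videos are checks, not data
            counts = tallies.setdefault(video, {})
            counts[answer] = counts.get(answer, 0) + 1

    result = {}
    for video, counts in tallies.items():
        winners = [a for a, n in counts.items() if n >= required_agreement]
        if len(winners) == 1:  # skip videos with no agreement or a split decision
            result[video] = winners[0]
    return result


# Hypothetical batch: g1 and g2 are gold-standard videos with known answers.
gold = {"g1": True, "g2": False}
submissions = {
    "w1": {"g1": True, "g2": False, "v1": True, "v2": False},
    "w2": {"g1": True, "g2": False, "v1": True, "v2": True},
    "w3": {"g1": False, "g2": False, "v1": False, "v2": True},  # fails gold
}

labels = accepted_labels(submissions, gold)
print(labels)  # {'v1': True}
```

Here `v2` stays unlabeled because the two qualified workers disagree; such a video would presumably be sent out again until two qualified workers agree.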

Thank You

http://www.mturk.com

John Hoskins, Amazon Mechanical Turk

hoskins@amazon.com
