Page 1: Industry-Scale Crowdsourcing of Data and Terminology, CHAT2013

Industry-Scale Crowdsourcing of Data & Terminology

Rahzeb Choudhury, TAUS

Page 2

TAUS Mission

Our mission is to increase the size and significance of the translation industry to help the world communicate better.

Sharing Data & Knowledge… on an industry level in an open and transparent landscape brings us all to a higher level of competence.

Page 3

Where We Stand

Together We Know More

We Know Better

Page 4

Four Focus Areas

This slide may not be used or copied without permission from TAUS

Translation as a Utility

Data Technology

Interoperability

Metrics

Page 7

Members

Page 8

Global Members

Page 9

Academic, NGO & Government Members

Page 10

Large Corporate Members

Page 11

Small Corporate Members

Page 12

Agency Members

Page 13

Terminology

Page 14

Importance of Terminology Work

Very important: 43.5%

Quite important: 39.9%

Less important: 14.8%

Not important: 1.8%

Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)

Page 15

Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)

Information Sources

Page 20

Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)

Main Problems

[Pie chart. Segment values: 20.6%, 12.2%, 11.5%, 10.3%, 36.0%, 9.4%. Legend: Lack of resources/Insufficient terminology management; Poor quality/Up-to-dateness; Lack of information; Lack of convincing verification/Misleading information online; Rest.]

Page 21

Too many sources. Takes too much time. Effort is duplicated.

Results questionable.

Page 22

…Centralization…

Page 24

Owned

Shared

Web

Page 25

Machine Translation

Page 26

Data and Quality

[Chart with axes "Amount of Data" and "MT Quality"; labels: "More data", "Algorithms", "In-domain Data"]

Page 27

Owned

Shared

Web

Page 28

Lack of access. Copyright.

Takes too much time. Effort is duplicated.

Quality questionable.

Page 29

…Centralization…

Page 30

Central Source of In-domain Data

Owned

Shared

Web – to come in 2014

Page 32

Terminology and Machine Translation

Page 33

Data and Quality

[Chart with axes "Amount of Data" and "MT Quality"; labels: "More data", "Algorithms", "In-domain Data"]

Usage/Feedback Data… Terminology!

Page 34

…Centralization…

Page 35

TAUS Mission

Our mission is to increase the size and significance of the translation industry to help the world communicate better.

Sharing Data & Knowledge… on an industry level in an open and transparent landscape brings us all to a higher level of competence.

Page 36

Central Sources of Data and Terminology

Own Data – Private Vault

Shared Data – In-domain data

Web Data – Data Collector

Own Terms – Build Own Collections

Shared Terms – In-domain terms

Web Terms – Term Collector

But what about the crowd?

For language workers, CAT Tools & MT Systems

Page 37

Source: TaaS User Needs Survey, 2012. 1735 responses (approx. 40% technical writers, 30% translators, plus others)

Main Problems

[Pie chart. Segment values: 20.6%, 12.2%, 11.5%, 10.3%, 36.0%, 9.4%. Legend: Lack of resources/Insufficient terminology management; Poor quality/Up-to-dateness; Lack of information; Lack of convincing verification/Misleading information online; Rest.]

Page 38

Central Sourcing of Data and Terminology

The crowd must verify!

Web Data – Data Collector

Web Terms – Term Collector

But what about the crowd?

The crowd must source!

Page 39

Unless the crowd helps to source and verify…

Too many sources. Takes time.

Effort is duplicated. Results questionable.

We maintain the status quo…

Page 40

Register and engage: demo.taas-project.eu

Page 41


Thank you.

Contact: [email protected]
