Global User-Generated Content: The Final Localization Frontier

Merle Tenney

Agenda

• Dimensions of Content

• Dimensions of Translation

• Global Language Tools

• UGC Translation Practices

• Best Current Practices

• Global UGC Desiderata

• Call to Action

Dimensions of Content

UGC Pre–Web 2.0

• 1965 Mainframe-based email, instant messaging

• 1969 ARPANET

• 1978–79 Bulletin board systems, discussion forums (Usenet)

• 1983 Internet (TCP/IP)

• 1991 World Wide Web (HTTP)

• 1993–2003 Blogging, social network services, user classifieds, user auctions, wikis, social bookmarking, photo sharing

UGC Post–Web 2.0

• 2004 Tim O’Reilly, John Battelle, Dale Dougherty define Web 2.0

• Leaders: Yahoo! Groups (1998), MySpace (2003), LinkedIn (2003)

• 2004 Facebook

• 2005 YouTube

• 2006 Twitter

• 2009 Foursquare

Content Types

• Managed content (MC)

• Semi-managed content (SMC)

• User-generated content (UGC)

  • Individual content

  • Community content

• Computer-mediated communication (CMC)

Managed Content

• Authors: professional communicators (information developers)

• Examples: user interfaces, user assistance, technical documentation; marcom materials, newsletters; web pages, institutional blogs

• Requirements: institutional voice, subject matter expertise, polished writing

• Tools: content management systems, publishing systems, office applications, blogging software

Semi-Managed Content

• Authors: information workers

• Examples: technical reports, design documents; technotes, knowledge base articles, technical blogs, industry discussion lists

• Requirements: technical expertise, effectual writing

• Tools: content management systems, office applications, social network services, blogging software

User-Generated Content

• Authors: users and communities

• Examples: user profiles, blogs, discussion lists, wikis, reviews, ratings, tags, classifieds, auction listings, user documents, user multimedia

• Requirements: informed opinion, interesting content, effectual writing

• Tools: office applications, wiki software, social network services, blogging software, classified ad systems, customer feedback forums

Computer-Mediated Communication

• Authors: everyone

• Examples: emails, microblogs (tweets), direct messages, status updates, SMS messages (texting), instant messages, chat sessions

• Requirements: interesting message, succinct, comprehensible writing

• Tools: email, blogging, microblogging, instant messaging, chat rooms, social network services, e-commerce, virtual worlds, online games

Content Pyramid

Content Structure

• Structured content

• Semi-structured content

• Unstructured content

Structured Content

• Description: content taken from a closed set of values specified by developers, such as list values, numbers, and related data types

• Examples: numerical data, structured keywords, taxonomies, values, lists (ratings, dates, gender, marital status, language, country, etc.)

• Translation: no translation per se; language-neutral data, multilingual textual expressions of underlying data handled by UI localization or locale-based data formatting
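As a minimal sketch of the point above: structured values are stored once as language-neutral data and only rendered per language or locale. The label tables and date patterns below are hypothetical stand-ins for a product's UI resources and locale data, not part of the original slides.

```python
from datetime import date

# Hypothetical UI-localization resources: one label table per language.
RATING_LABELS = {
    "en": {1: "Poor", 3: "Average", 5: "Excellent"},
    "de": {1: "Schlecht", 3: "Durchschnittlich", 5: "Ausgezeichnet"},
}

# Hypothetical locale data: one date pattern per locale.
DATE_PATTERNS = {
    "en-US": "{m:02d}/{d:02d}/{y}",
    "de-DE": "{d:02d}.{m:02d}.{y}",
}


def render_rating(value: int, lang: str) -> str:
    """The stored rating is a number; only its label depends on language."""
    return RATING_LABELS[lang][value]


def render_date(value: date, locale_id: str) -> str:
    """The stored date is language-neutral; only its format depends on locale."""
    return DATE_PATTERNS[locale_id].format(d=value.day, m=value.month, y=value.year)
```

The stored rating and date never change; nothing is translated, only displayed differently.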

Semi-Structured Content

• Description: content taken from a constrained and self-organizing but not closed set of values developed by users

• Examples: user classifications, common search terms, user keywords, tag clouds, folksonomies

• Translation: specialized bilingual terminology, with fallback to machine translation as needed
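The terminology-first, MT-fallback rule for semi-structured content might look roughly like this. The terminology table and the machine_translate callable are assumptions made for illustration; no particular MT service is implied.

```python
from typing import Callable, Dict, Tuple

# Hypothetical bilingual tag terminology, keyed by (source, target) language.
TAG_TERMINOLOGY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "de"): {"photography": "Fotografie", "travel": "Reisen"},
}


def translate_tag(
    tag: str,
    source_lang: str,
    target_lang: str,
    machine_translate: Callable[[str, str, str], str],
) -> Tuple[str, str]:
    """Return (translation, method): curated terminology first, MT as fallback."""
    terms = TAG_TERMINOLOGY.get((source_lang, target_lang), {})
    key = tag.strip().lower()
    if key in terms:
        return terms[key], "terminology"
    # No curated term exists for this tag: fall back to machine translation.
    return machine_translate(tag, source_lang, target_lang), "mt"
```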

Unstructured Content

• Description: open, unconstrained user text

• Examples: wikis, articles, blogs, discussions, reviews, chats, instant messages, emails

• Translation: machine translation in pull contexts, including cross-language search; computer-aided translation in push contexts

Content Forms

• Text

• Graphics

• Audio

• Video

• Virtual reality

• Location-based services

Nontextual Content Forms

• Integrated text

  • Titles, legends, labels, callouts, subtitles, transcriptions, text layers, text tracks

• Associated text

  • Metadata, tags, comments

• Accessibility text

  • alt, longdesc attributes

Dimensions of Translation

Global Content Creation

• Zero translation (ZT)

• Machine translation (MT)

• Human translation (HT)

• Transcreation (TC)

• Original content (OC)

Translation Modes

• Machine translation

  • Unedited MT

  • Translation wiki

• Human translation

  • Volunteer translators

    • Users & Friends

    • Community

  • Paid translators

    • Semi-professional

    • Professional

Translation Cost

Translation Mode | Free | Perks | Discounted | Market

Machine translation

Unedited MT X

Translation wiki X X

Human translation

Volunteer translators

Users & Friends X

Community X X

Paid translators

Semi-professional X

Professional X X

Individual UGC Translation Modes

• MT – Unedited

• MT – Translation Wiki

• HT – Users & Friends

• HT – Community

• HT – Semi-professionals

• HT – Professionals

Community UGC Translation Modes

• MT – Unedited

• MT – Translation Wiki

• HT – Users & Friends

• HT – Community

• HT – Semi-professionals

• HT – Professionals

Composite Content Translation Modes

Content types:

• Managed content

• Semi-managed content

• Individual UGC

• Community UGC

• Computer-mediated communications

Translation modes:

• MT – Unedited

• MT – Translation Wiki

• HT – Users & Friends

• HT – Community

• HT – Semi-professionals

• HT – Professionals

Push and Pull Translation Frameworks

• Differences in applications and translation requirements

• Push mode content translation

  • Proactive, for anticipated demand

  • Reactive, for attested demand

Push & Pull Translation Comparison

Framework | Initiator | Time Frame | Price | Translation Tool

Push Translation | author | in advance, 1 time | market rate | computer-aided translation

Pull Translation | reader | on demand, n (0) times | free | machine translation

10/15/2008 Web 2.0 Globalization – Merle Tenney 26

Push & Pull Translation Comparison

Framework | Permanence | Popularity | Purpose | Quality

Push Translation | evergreen content | short head | information dissemination | publication

Pull Translation | deciduous content | long tail | information assimilation | gist
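One way to read the two comparison tables is as a routing rule: evergreen, short-head content goes to push translation, and everything else is left to pull translation. The sketch below encodes that reading. The choose_framework function and its view threshold are illustrative assumptions, not figures from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationFramework:
    name: str
    initiator: str
    price: str
    tool: str
    quality: str


# Values taken from the comparison tables.
PUSH = TranslationFramework(
    "push", "author", "market rate", "computer-aided translation", "publication"
)
PULL = TranslationFramework("pull", "reader", "free", "machine translation", "gist")


def choose_framework(
    evergreen: bool, monthly_views: int, short_head_threshold: int = 1000
) -> TranslationFramework:
    """Push for evergreen, popular content; pull for everything else.

    short_head_threshold is a hypothetical cutoff between short head and long tail.
    """
    if evergreen and monthly_views >= short_head_threshold:
        return PUSH
    return PULL
```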


Global Language Tools

Global Content Creation and Translation

• Authoring and editing

• Automatic translation

• Computer-aided translation

Authoring and Editing

• Spelling checkers

• Style and grammar checkers

• Language compliance checkers

• Intelligent content reuse/authoring memory

• Electronic references

• Explanatory dictionaries

• Thesauri

• Bilingual dictionaries

• Style guides

Automatic Translation (AT)

• AT > MT (machine translation)

• AT ≥ MTM (machine translation + translation memory)

• Translation pre-editing tools (language compliance checker + authoring memory)

• Automatic text categorization (for selection of terminologies and translation memories)

• Translation memory (TM)

• Machine translation (MT)
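A rough sketch of the "machine translation + translation memory" combination: look for a close enough match in the translation memory, and fall back to MT otherwise. The fuzzy-match threshold and the machine_translate callable are assumptions. Python's difflib stands in for a real TM matcher here.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple


def best_tm_match(source: str, memory: Dict[str, str]) -> Tuple[float, Optional[str]]:
    """Return (similarity, translation) for the closest TM entry, or (0.0, None)."""
    best: Tuple[float, Optional[str]] = (0.0, None)
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best[0]:
            best = (score, tm_target)
    return best


def translate(
    source: str,
    memory: Dict[str, str],
    machine_translate: Callable[[str], str],
    threshold: float = 0.85,  # hypothetical fuzzy-match cutoff
) -> Tuple[str, str]:
    """Reuse a TM translation when the match is close enough; otherwise use MT."""
    score, target = best_tm_match(source, memory)
    if target is not None and score >= threshold:
        return target, "TM"
    return machine_translate(source), "MT"
```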

Computer-Aided Translation (CAT)

• SL & TL text fields

• Translation tools

  • Machine translation

  • Translation memory

  • Translation search

  • Terminology access

• TL authoring and editing tools

  • General authoring and editing tools

  • Translation QA and translation post-editing tools

• Translation leveraging updates

  • Terminology updates

  • Translation memory updates

UGC Translation Challenges

Problems with UGC — Low Quality

• Terse, ungrammatical constructions

• nonstandard CAPITALIZATION

• Missing, creative punctuation

• Accidental, intentional misspellings

• Nonstandard diction—colloquial abbreviations & acronyms, leetspeak, emoticons

Problems with UGC — Intrinsic Characteristics

• Cryptic, clipped style (chats, IMs, tweets)

• Conversational style

• Diverse term variants

• Wide range of lexicon

• Frequent neologisms

Solutions for Problematic UGC — Low Quality

• Better writing, self-editing

• Editing by others (designated content agents)

• Authoring and editing tools

• Translation pre-editing tools

• Dialect translation tools
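A translation pre-editing tool for the low-quality problems listed earlier could start as simply as the sketch below: collapse repeated punctuation, lowercase all-caps words, and expand colloquial abbreviations before the text reaches MT. The abbreviation table is a hypothetical sample, and the all-caps rule is deliberately naive (it would also lowercase legitimate acronyms).

```python
import re

# Hypothetical sample of colloquial abbreviations; a real tool would need far more.
ABBREVIATIONS = {"u": "you", "gr8": "great", "thx": "thanks", "pls": "please"}


def pre_edit(text: str) -> str:
    """Normalize noisy user text before sending it to machine translation."""
    # Collapse runs of repeated punctuation ("!!!" becomes "!").
    text = re.sub(r"([!?.])\1+", r"\1", text)
    words = []
    for token in text.split():
        # Split each token into its core and any trailing punctuation.
        match = re.match(r"^(.*?)([!?.,;:]*)$", token)
        core, trail = match.group(1), match.group(2)
        # Naive rule: lowercase all-caps words (this also hits real acronyms).
        if core.isupper() and len(core) > 1:
            core = core.lower()
        # Expand known abbreviations.
        core = ABBREVIATIONS.get(core.lower(), core)
        words.append(core + trail)
    result = " ".join(words)
    # Restore sentence-initial capitalization.
    return result[:1].upper() + result[1:]
```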

Solutions for Problematic UGC — Intrinsic Characteristics

• MT based on leveraged resources produced as by-product of CAT translations

• Terminologies and translation memories based on automatic text categorization

• Continued improvement in pull (MT) translation environments dependent on quantity and quality of effort in related push (CAT) translation environments

• Ergo, need to support push translation environments and aggregated, quality-controlled user, community, and professional translation resources

Best Current Practices

UGC Translation

• Push translation implementations

  • Google Translator Toolkit

• Pull translation implementations

  • Outlook email translation (PROMT)

  • Mojofiti blog translation (Google)

  • eBay listing translation (SYSTRAN)

• Translation viewers

  • Unedited MT (Microsoft)

  • Translation Wiki (Microsoft)

Google Translator Toolkit

Outlook Email Translation

Mojofiti Blog Translation

eBay Listing Translation

Translation Viewers

• Bilingual text display in web browser or document editor

• Translation views: different strokes for different folks

  • Single-language view (SL or TL)

    • Original or translated content, with rollover display of corresponding sentence from translated or original content

  • Dual-language view (SL and TL)

    • Side-by-side or over-and-under display of original and translated content, with synchronized scrolling and sentence highlighting

Global UGC Infrastructure

Bing Translator Source Text Rollover Mode

Global UGC Infrastructure

Bing Translator Target Text Rollover Mode

Global UGC Infrastructure

Bing Translator Side-by-Side Mode

Global UGC Infrastructure

Bing Translator Over-and-Under Mode

Bing Translation Wiki

Global UGC Desiderata

Basic UGC Implementations

• Social media sites

• Groups, discussions, blogs

• Social networking services

• Integration of social functionality and UGC in host sites

Language Support Mechanisms

• User (reader) language preferences

  • User interface preferences

  • Content preferences

• Content language tagging

  • Default language tagging

  • Automatic language identification

  • Explicit language tagging

• Global language tools

  • Authoring and editing

  • Automatic translation

  • Computer-aided translation
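The three content language tagging mechanisms can be combined into a single resolution order. The sketch below assumes that an explicit tag wins, then automatic identification, then the default. That precedence is an assumption, and the stopword-based detector is only a toy stand-in for a real language identifier.

```python
from typing import Callable, Optional, Tuple

# Toy stopword lists for the hypothetical detector below.
STOPWORDS = {"en": {"the", "and", "is"}, "de": {"der", "und", "ist"}}


def detect_by_stopwords(text: str) -> Optional[str]:
    """Toy language identifier: pick the language with the most stopword hits."""
    words = set(text.lower().split())
    scores = {lang: len(words & stop) for lang, stop in STOPWORDS.items()}
    lang, hits = max(scores.items(), key=lambda item: item[1])
    return lang if hits > 0 else None


def resolve_language(
    explicit_tag: Optional[str],
    text: str,
    detect: Callable[[str], Optional[str]],
    default_language: str,
) -> Tuple[str, str]:
    """Return (language, mechanism) using explicit > automatic > default."""
    if explicit_tag:
        return explicit_tag, "explicit"
    detected = detect(text)
    if detected:
        return detected, "automatic"
    return default_language, "default"
```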

Service Provider Infrastructure

• Designated content agents (DCAs)

  • Based on SL & TL roles and permissions

  • DCA access limited by content selection, language version, and time period

• Content service exchange

  • DCA listings searchable by services, availability, bona fides

  • Service job workflows and dashboards

  • Payment processing
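The DCA access rule (limited by content selection, language version, and time period) maps naturally onto a small grant record and a check. The AccessGrant type and its field names are hypothetical, sketched only to make the three limits concrete.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of what one designated content agent may touch."""
    agent_id: str
    content_ids: FrozenSet[str]  # content selection
    language: str                # language version (SL or TL)
    starts: datetime             # start of time period
    ends: datetime               # end of time period


def may_edit(
    grant: AccessGrant, agent_id: str, content_id: str, language: str, when: datetime
) -> bool:
    """True only if agent, content, language version, and time all match the grant."""
    return (
        grant.agent_id == agent_id
        and content_id in grant.content_ids
        and grant.language == language
        and grant.starts <= when <= grant.ends
    )
```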

SL Designated Content Agents

• Authors—ghostwriters/contract writers & attributed contributors

• Editors & proofreaders

• Search engine optimization (SEO) & social media optimization (SMO) experts

• Personal branding & career management advisors

TL Designated Content Agents

• Translators

• Transcreators & original content developers

• Terminologists

• Editors & proofreaders

• SEO & SMO experts

• Personal branding & career management advisors

Content Service Exchange

• DCA service profile with services, language, terms, prices

• DCA current availability with time zones and turnarounds

• Traditional DCA bona fides—degrees, associations, certifications

• Exchange-based DCA bona fides—time on exchange, clients/projects served, satisfaction ratings, client testimonials

Cross-Language Content Discovery

• Directory navigation

• Tags and links

• Cross-language search

Cross-Language Search

• Based on work in cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR)

• Five steps of cross-language search

  • Translation of search terms

  • Search on translated search terms

  • Aggregation, filtering, ranking of search results

  • Translation of search engine results page (SERP)

  • Translation of selected page/profile/content
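The five steps can be sketched as a small pipeline. The translate, search, and fetch callables are placeholders for whatever MT service, search index, and content store a site actually uses. The hit fields (url, title, score) are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]


def cross_language_search(
    query: str,
    reader_lang: str,
    content_langs: List[str],
    translate: Callable[[str, str, str], str],
    search: Callable[[str, str], List[Hit]],
) -> List[Hit]:
    hits: List[Hit] = []
    for lang in content_langs:
        # Step 1: translate the search terms into each content language.
        terms = query if lang == reader_lang else translate(query, reader_lang, lang)
        # Step 2: search on the translated terms.
        for hit in search(terms, lang):
            hits.append({**hit, "lang": lang})

    # Step 3: aggregate, filter out duplicate URLs, and rank by score.
    ranked: List[Hit] = []
    seen = set()
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["url"] not in seen:
            seen.add(hit["url"])
            ranked.append(hit)

    # Step 4: translate the results page (here, only the titles).
    for hit in ranked:
        if hit["lang"] != reader_lang:
            hit["display_title"] = translate(hit["title"], hit["lang"], reader_lang)
        else:
            hit["display_title"] = hit["title"]
    return ranked


def open_result(hit: Hit, reader_lang: str, translate, fetch) -> str:
    # Step 5: translate the selected page/profile/content on demand.
    body = fetch(hit["url"])
    if hit["lang"] == reader_lang:
        return body
    return translate(body, hit["lang"], reader_lang)
```

Steps 1 to 4 run on every query, while step 5 runs only for the result the reader actually opens, which matches the pull (on-demand) model.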

Call to Action

Social Network Service Developers

• Provide multilingual user and content identification

• Support push and pull translation frameworks

• Provide service framework for content agents

UGC Framework Developers

• Integrate automatic translation services

Writing and Translation Tool Developers

• Develop tools tuned to UGC

Professional Writers and Translators

• Get ready for the new opportunities in the global UGC ecosystem

Thank you!

Merle Tenney

Language Technology Consultant


Bringing the world to great products

Taking great products to the world

Since You Asked

Defining User-Generated Content

• Author

  • Users acting individually or in a community

• Initiator

  • Users (self-initiated), professional (employee, service provider), community (social obligation)

• Medium

  • Short text: text message, instant message, chat, microblog, status update

  • Long text: profile, post, listing, review, email, blog, web page, document