Global User-Generated Content: The Final Localization Frontier – Merle Tenney
Post on 27-Dec-2015
Agenda
• Dimensions of Content
• Dimensions of Translation
• Global Language Tools
• UGC Translation Practices
• Best Current Practices
• Global UGC Desiderata
• Call to Action
UGC Pre–Web 2.0
• 1965 Mainframe-based email, instant messaging
• 1969 ARPANET
• 1978–79 Bulletin board systems, discussion forums (Usenet)
• 1983 Internet (TCP/IP)
• 1991 World Wide Web (HTTP)
• 1993–2003 Blogging, social network services, user classifieds, user auctions, wikis, social bookmarking, photo sharing
UGC Post–Web 2.0
• 2004 Tim O’Reilly, John Battelle, Dale Dougherty define Web 2.0
• Leaders: Yahoo! Groups (1998), MySpace (2003), LinkedIn (2003)
• 2004 Facebook
• 2005 YouTube
• 2006 Twitter
• 2009 Foursquare
Content Types
• Managed content (MC)
• Semi-managed content (SMC)
• User-generated content (UGC)
  • Individual content
  • Community content
• Computer-mediated communication (CMC)
Managed Content
• Authors: professional communicators (information developers)
• Examples: user interfaces, user assistance, technical documentation; marcom materials, newsletters; web pages, institutional blogs
• Requirements: institutional voice, subject matter expertise, polished writing
• Tools: content management systems, publishing systems, office applications, blogging software
Semi-Managed Content
• Authors: information workers
• Examples: technical reports, design documents; technotes, knowledge base articles, technical blogs, industry discussion lists
• Requirements: technical expertise, effectual writing
• Tools: content management systems, office applications, social network services, blogging software
User-Generated Content
• Authors: users and communities
• Examples: user profiles, blogs, discussion lists, wikis, reviews, ratings, tags, classifieds, auction listings, user documents, user multimedia
• Requirements: informed opinion, interesting content, effectual writing
• Tools: office applications, wiki software, social network services, blogging software, classified ad systems, customer feedback forums
Computer-Mediated Communication
• Authors: everyone
• Examples: emails, microblogs (tweets), direct messages, status updates, SMS messages (texting), instant messages, chat sessions
• Requirements: interesting message, succinct, comprehensible writing
• Tools: email, blogging, microblogging, instant messaging, chat rooms, social network services, e-commerce, virtual worlds, online games
Structured Content
• Description: content taken from a closed set of values specified by developers, such as list values, numbers, and related data types
• Examples: numerical data, structured keywords, taxonomies, values, lists (ratings, dates, gender, marital status, language, country, etc.)
• Translation: no translation per se; language-neutral data, multilingual textual expressions of underlying data handled by UI localization or locale-based data formatting
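The handling of structured content described above can be sketched as follows. This is a minimal illustration, assuming that values are stored in a language-neutral form and turned into text only at display time. The label tables, date patterns, and function names are hypothetical and stand in for a product's real localization resources.

```python
from datetime import date

# Hypothetical UI string tables; a real product would load these from its
# own localization resources.
LABELS = {
    "en": {"marital_status.single": "Single", "marital_status.married": "Married"},
    "de": {"marital_status.single": "Ledig", "marital_status.married": "Verheiratet"},
}

# Hypothetical per-locale date patterns (strftime syntax, numeric fields only,
# so the output does not depend on the operating system's locale settings).
DATE_PATTERNS = {"en": "%m/%d/%Y", "de": "%d.%m.%Y"}


def render_value(field: str, value: str, locale: str) -> str:
    """Return the localized label for a stored, language-neutral list value."""
    return LABELS[locale][f"{field}.{value}"]


def render_date(value: date, locale: str) -> str:
    """Format a stored date with the pattern of the reader's locale."""
    return value.strftime(DATE_PATTERNS[locale])


# The stored profile contains no translatable text at all.
profile = {"marital_status": "married", "joined": date(2008, 10, 15)}

for locale in ("en", "de"):
    print(
        locale,
        render_value("marital_status", profile["marital_status"], locale),
        render_date(profile["joined"], locale),
    )
```

Because the profile stores only codes and dates, adding a language means adding label and pattern entries, not translating user data.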
Semi-Structured Content
• Description: content taken from a constrained and self-organizing but not closed set of values developed by users
• Examples: user classifications, common search terms, user keywords, tag clouds, folksonomies
• Translation: specialized bilingual terminology, with fallback to machine translation as needed
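The terminology-first approach above might look like the sketch below. It assumes a small bilingual term list for tags and a machine translation function supplied by the caller. The sample terms, the `fake_mt` stand-in, and all names are hypothetical.

```python
from typing import Callable

# Hypothetical English-to-Spanish tag terminology, e.g. maintained by a community.
TAG_TERMS = {"photo sharing": "compartir fotos", "wiki": "wiki"}


def translate_tag(
    tag: str, terms: dict[str, str], mt: Callable[[str], str]
) -> tuple[str, str]:
    """Translate a user tag, preferring the terminology over machine translation.

    Returns the translation and the resource that produced it.
    """
    key = " ".join(tag.lower().split())  # normalize case and whitespace
    if key in terms:
        return terms[key], "terminology"
    return mt(tag), "mt"


def fake_mt(text: str) -> str:
    """Stand-in for a real machine translation call (assumption)."""
    return f"[mt:{text}]"


print(translate_tag("Photo  Sharing", TAG_TERMS, fake_mt))
print(translate_tag("leetspeak", TAG_TERMS, fake_mt))
```

Returning the resource name alongside the translation lets a site show readers whether a tag came from vetted terminology or from unedited MT.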
Unstructured Content
• Description: open, unconstrained user text
• Examples: wikis, articles, blogs, discussions, reviews, chats, instant messages, emails
• Translation: machine translation in pull contexts, including cross-language search; computer-aided translation in push contexts
Nontextual Content Forms
• Integrated text
  • Titles, legends, labels, callouts, subtitles, transcriptions, text layers, text tracks
• Associated text
  • Metadata, tags, comments
• Accessibility text
  • alt, longdesc attributes
Global Content Creation
• Zero translation (ZT)
• Machine translation (MT)
• Human translation (HT)
• Transcreation (TC)
• Original content (OC)
Translation Modes
• Machine translation
  • Unedited MT
  • Translation wiki
• Human translation
  • Volunteer translators
    • Users & Friends
    • Community
  • Paid translators
    • Semi-professional
    • Professional
Translation Cost
Translation Mode   Free   Perks   Discounted   Market
Machine translation
Unedited MT X
Translation wiki X X
Human translation
Volunteer translators
Users & Friends X
Community X X
Paid translators
Semi-professional X
Professional X X
Individual UGC: Translation Modes
• MT – Unedited
• MT – Translation Wiki
• HT – Users & Friends
• HT – Community
• HT – Semi-professionals
• HT – Professionals
Community UGC: Translation Modes
• MT – Unedited
• MT – Translation Wiki
• HT – Users & Friends
• HT – Community
• HT – Semi-professionals
• HT – Professionals
Composite Content: Translation Modes
• Content types
  • Managed content
  • Semi-managed content
  • Individual UGC
  • Community UGC
  • Computer-mediated communication
• Translation modes
  • MT – Unedited
  • MT – Translation Wiki
  • HT – Users & Friends
  • HT – Community
  • HT – Semi-professionals
  • HT – Professionals
Push and Pull Translation Frameworks
• Differences in applications and translation requirements
• Push mode content translation
• Proactive, for anticipated demand
• Reactive, for attested demand
Push & Pull Translation Comparison
                   Initiator   Time Frame               Price         Translation Tool
Push Translation   author      in advance, 1 time       market rate   computer-aided translation
Pull Translation   reader      on demand, n (0) times   free          machine translation
10/15/2008 Web 2.0 Globalization – Merle Tenney 26
Push & Pull Translation Comparison
                   Permanence          Popularity   Purpose                     Quality
Push Translation   evergreen content   short head   information dissemination   publication
Pull Translation   deciduous content   long tail    information assimilation    gist
Global Content Creation and Translation
• Authoring and editing
• Automatic translation
• Computer-aided translation
Authoring and Editing
• Spelling checkers
• Style and grammar checkers
• Language compliance checkers
• Intelligent content reuse/authoring memory
• Electronic references
• Explanatory dictionaries
• Thesauri
• Bilingual dictionaries
• Style guides
Automatic Translation (AT)
• AT > MT (machine translation)
• AT ≥ MTM (machine translation + translation memory)
• Translation pre-editing tools (language compliance checker + authoring memory)
• Automatic text categorization (for selection of terminologies and translation memories)
• Translation memory (TM)
• Machine translation (MT)
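The MTM idea above (machine translation plus translation memory) can be sketched as a lookup that tries the memory first and falls back to MT. This is an illustration under assumptions, not a description of any particular product: the 0.85 fuzzy-match threshold, the use of Python's `difflib` for similarity scoring, and all names are choices made for the example.

```python
from difflib import SequenceMatcher
from typing import Callable


def translate_segment(
    source: str,
    memory: dict[str, str],
    mt: Callable[[str], str],
    threshold: float = 0.85,
) -> tuple[str, str]:
    """Translate one segment: translation memory first, machine translation second."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory.items():
        # Case-insensitive similarity between the new segment and a stored one.
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is not None and best_score >= threshold:
        origin = "tm-exact" if best_score == 1.0 else "tm-fuzzy"
        return best_target, origin
    return mt(source), "mt"


def fake_mt(text: str) -> str:
    """Stand-in for a real machine translation call (assumption)."""
    return f"[mt:{text}]"


memory = {"Save your changes.": "Enregistrez vos modifications."}
print(translate_segment("Save your changes.", memory, fake_mt))
print(translate_segment("Save your changes", memory, fake_mt))
print(translate_segment("Delete the account.", memory, fake_mt))
```

In a fully automatic setting a fuzzy match is returned unedited, so the threshold trades reuse against the risk of a near-miss translation. A CAT tool would instead show the fuzzy match to a translator for correction.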
Computer-Aided Translation (CAT)
• SL & TL text fields
• Translation tools
  • Machine translation
  • Translation memory
  • Translation search
  • Terminology access
• TL authoring and editing tools
  • General authoring and editing tools
  • Translation QA and translation post-editing tools
• Translation leveraging updates
  • Terminology updates
  • Translation memory updates
Problems with UGC — Low Quality
• Terse, ungrammatical constructions
• Nonstandard CAPITALIZATION
• Missing, creative punctuation
• Accidental, intentional misspellings
• Nonstandard diction—colloquial abbreviations & acronyms, leetspeak, emoticons
Problems with UGC — Intrinsic Characteristics
• Cryptic, clipped style (chats, IMs, tweets)
• Conversational style
• Diverse term variants
• Wide range of lexicon
• Frequent neologisms
Solutions for Problematic UGC — Low Quality
• Better writing, self-editing
• Editing by others (designated content agents)
• Authoring and editing tools
• Translation pre-editing tools
• Dialect translation tools
Solutions for Problematic UGC — Intrinsic Characteristics
• MT based on leveraged resources produced as by-product of CAT translations
• Terminologies and translation memories based on automatic text categorization
• Continued improvement in pull (MT) translation environments dependent on quantity and quality of effort in related push (CAT) translation environments
• Ergo, need to support push translation environments and aggregated, quality-controlled user, community, and professional translation resources
UGC Translation
• Push translation implementations
  • Google Translator Toolkit
• Pull translation implementations
  • Outlook email translation (PROMT)
  • Mojofiti blog translation (Google)
  • eBay listing translation (SYSTRAN)
• Translation viewers
  • Unedited MT (Microsoft)
  • Translation Wiki (Microsoft)
Translation Viewers
• Bilingual text display in web browser or document editor
• Translation views: different strokes for different folks
  • Single-language view (SL or TL)
    • Original or translated content, with rollover display of corresponding sentence from translated or original content
  • Dual-language view (SL and TL)
    • Side-by-side or over-and-under display of original and translated content, with synchronized scrolling and sentence highlighting
Basic UGC Implementations
• Social media sites
• Groups, discussions, blogs
• Social networking services
• Integration of social functionality and UGC in host sites
Language Support Mechanisms
• User (reader) language preferences
  • User interface preferences
  • Content preferences
• Content language tagging
  • Default language tagging
  • Automatic language identification
  • Explicit language tagging
• Global language tools
  • Authoring and editing
  • Automatic translation
  • Computer-aided translation
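The three content language tagging mechanisms above could be combined as in the following sketch. The precedence shown (explicit tag, then automatic identification, then the default) is an assumption made for the example; the slides list the mechanisms without ranking them. The stopword-based identifier is a toy stand-in for a real language identification component, and all names are hypothetical.

```python
from typing import Callable, Optional

# Tiny hypothetical stopword lists; a real identifier would use a trained model.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "es": {"el", "la", "y", "es", "de"},
}


def identify_by_stopwords(text: str) -> Optional[str]:
    """Guess a language by counting known stopwords; None if nothing matches."""
    words = set(text.lower().split())
    scores = {lang: len(words & stop) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def resolve_content_language(
    explicit_tag: Optional[str],
    text: str,
    identify: Callable[[str], Optional[str]],
    default_language: str,
) -> tuple[str, str]:
    """Return (language, how it was determined) for one piece of content."""
    if explicit_tag:
        return explicit_tag, "explicit"
    guess = identify(text)
    if guess:
        return guess, "identified"
    return default_language, "default"


print(resolve_content_language("fr", "the cat", identify_by_stopwords, "en"))
print(resolve_content_language(None, "la casa es grande", identify_by_stopwords, "en"))
print(resolve_content_language(None, "lol :-)", identify_by_stopwords, "en"))
```

The last call shows why a default is still needed: short, cryptic messages often give an identifier nothing to work with.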
Service Provider Infrastructure
• Designated content agents (DCAs)
• Based on SL & TL roles and permissions
• DCA access limited by content selection, language version, and time period
• Content service exchange
• DCA listings searchable by services, availability, bona fides
• Service job workflows and dashboards
• Payment processing
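The access limits for designated content agents listed above (role, content selection, language version, time period) can be modeled as a grant that is checked before each edit. This is a minimal sketch; the `AccessGrant` fields and the `may_edit` function are hypothetical names, and the end of the time period is treated as exclusive by assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessGrant:
    """What a user has delegated to one designated content agent (DCA)."""

    agent_id: str
    role: str                 # e.g. "translator" or "editor"
    content_ids: frozenset    # content selection the agent may touch
    language: str             # language version the grant applies to
    starts: datetime          # beginning of the access period (inclusive)
    ends: datetime            # end of the access period (exclusive)


def may_edit(
    grant: AccessGrant, agent_id: str, content_id: str, language: str, now: datetime
) -> bool:
    """Check agent, content selection, language version, and time period."""
    return (
        grant.agent_id == agent_id
        and content_id in grant.content_ids
        and grant.language == language
        and grant.starts <= now < grant.ends
    )


grant = AccessGrant(
    agent_id="dca-7",
    role="translator",
    content_ids=frozenset({"post-1", "post-2"}),
    language="de",
    starts=datetime(2008, 10, 1),
    ends=datetime(2008, 11, 1),
)
print(may_edit(grant, "dca-7", "post-1", "de", datetime(2008, 10, 15)))
print(may_edit(grant, "dca-7", "post-1", "fr", datetime(2008, 10, 15)))
```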
SL Designated Content Agents
• Authors—ghostwriters/contract writers & attributed contributors
• Editors & proofreaders
• Search engine optimization (SEO) & social media optimization (SMO) experts
• Personal branding & career management advisors
TL Designated Content Agents
• Translators
• Transcreators & original content developers
• Terminologists
• Editors & proofreaders
• SEO & SMO experts
• Personal branding & career management advisors
Content Service Exchange
• DCA service profile with services, language, terms, prices
• DCA current availability with time zones and turnarounds
• Traditional DCA bona fides—degrees, associations, certifications
• Exchange-based DCA bona fides—time on exchange, clients/projects served, satisfaction ratings, client testimonials
Cross-Language Search
• Based on work in cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR)
• Five steps of cross-language search
  • Translation of search terms
  • Search on translated search terms
  • Aggregation, filtering, ranking of search results
  • Translation of search engine results page (SERP)
  • Translation of selected page/profile/content
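The five steps above can be sketched as a small pipeline. Everything here is a toy under stated assumptions: the in-memory per-language indexes, the word-count scoring, the word-by-word `toy_translate` lexicon, and all function names are hypothetical and stand in for a real search engine and a real MT service.

```python
from typing import Callable

Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang)

# Hypothetical word-by-word lexicon standing in for machine translation.
LEXICON = {
    ("en", "es"): {"red": "rojo", "car": "coche"},
    ("es", "en"): {"rojo": "red", "coche": "car", "vendo": "selling"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    table = LEXICON.get((src, tgt), {})
    return " ".join(table.get(word, word) for word in text.lower().split())


def cross_language_search(
    query: str, query_lang: str, indexes: dict[str, dict[str, str]], translate: Translate
) -> list[dict]:
    hits = []
    for lang, documents in indexes.items():
        # Step 1: translate the search terms into the index language.
        terms = query if lang == query_lang else translate(query, query_lang, lang)
        # Step 2: search on the translated terms (naive word-count scoring).
        wanted = terms.lower().split()
        for doc_id, text in documents.items():
            words = text.lower().split()
            score = sum(words.count(word) for word in wanted)
            if score > 0:
                hits.append({"id": doc_id, "lang": lang, "score": score, "text": text})
    # Step 3: aggregate and rank the results from all languages together.
    hits.sort(key=lambda hit: (-hit["score"], hit["id"]))
    # Step 4: translate the result page entries into the reader's language.
    for hit in hits:
        if hit["lang"] == query_lang:
            hit["snippet"] = hit["text"]
        else:
            hit["snippet"] = translate(hit["text"][:80], hit["lang"], query_lang)
    return hits


def open_result(hit: dict, reader_lang: str, translate: Translate) -> str:
    # Step 5: translate the selected content in full, on demand.
    if hit["lang"] == reader_lang:
        return hit["text"]
    return translate(hit["text"], hit["lang"], reader_lang)


indexes = {"en": {"e1": "Red car for sale"}, "es": {"s1": "Vendo coche rojo"}}
results = cross_language_search("red car", "en", indexes, toy_translate)
for hit in results:
    print(hit["id"], hit["lang"], hit["snippet"])
```

Steps 4 and 5 are pull translations in the sense used earlier in the deck: they run only when a reader asks for them, and gist quality is acceptable.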
Social Network Service Developers
• Provide multilingual user and content identification
• Support push and pull translation frameworks
• Provide service framework for content agents
Professional Writers and Translators
• Get ready for the new opportunities in the global UGC ecosystem
Merle Tenney
Language Technology Consultant
Merle@MerleTenney.com
www.MerleTenney.com
Bringing the world to great products
Taking great products to the world
Defining User-Generated Content
• Author
  • Users acting individually or in a community
• Initiator
  • Users (self-initiated), professional (employee, service provider), community (social obligation)
• Medium
  • Short text: text message, instant message, chat, microblog, status update
  • Long text: profile, post, listing, review, email, blog, web page, document