How STE and Analytical Tools Enabled Intuit's MT Program
Render Chiu (Intuit), Group Manager, Global Content & Localization
Alex Yanishevsky (Welocalize), Senior Product Architect
TAUS, October 14-15, 2013
• $4.15 billion revenue in 2012
• Flagship products: QuickBooks, TurboTax, and Quicken
• New: Mint.com, Intuit Money Manager
• Markets: North America, Europe, Singapore, Australia, India
The World is a big place…
• 29 million SMBs • 500,000 accountants • 50 million employees
• 600 million SMBs • 2.4 million accountants • 1 billion employees
…and we’ve barely opened the door…
…but we have a clear vision…
…to be the World’s SMB Operating System!
…Then the Business Tells Us to Support 10 Languages in 3 Months
Challenges (or Reality Check)
How do you go global ASAP when you start from ground zero?
Requirement / Status:
• Bilingual translations: none, except for FR-CA
• In-house MT expertise: none
• MT engine/technology: none
• TMS + MT connector: none
• Structured content: the one major plus we had going for us: STE
What Were Our Options?
Extreme options aside, we chose collaboration:
• Lower cost by spreading the risk
• Speed with immediate expertise
• Scalability via a deep supply chain
Four Questions on STE
• Q1. When is English not really English?
• Q2. Does our choice of words hinder translation?
• Q3. Where does simplified English fit in?
• Q4. Does size really matter?
Q1. When is English not really English?
A. When the same words mean something different to different people:
• Clear (v)
• Disable (v)
• Table (v)
B. When we put words together in unexpected ways:
• One-trick pony
• Shoebox accounting
C. When we use slang or make up words:
• Huh, Oops, Whoopsie, Psst, Hmmm
Q2. Does our choice of words hinder translation?
A. Yes, when we use a word to mean more than one thing and don't provide context.
• "First": does it mean first name (Prénom)? The first day of the month (Premier)? The first line (Première)? First, run the export (D'abord)?
• "Tax": is this sales tax? Income tax? Or payroll tax?
B. Yes, when we use two words with very similar meanings.
• "Refund" and "rebate": in French, they're the same word.
– English: Enter a refund or rebate you receive
– French: Entrer un remboursement ou un remboursement que vous recevez.
Q2. Does our choice of words hinder translation?
C. Yes, when we don’t provide an English glossary of product-specific terms.
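A check of this kind is easy to automate during pre-translation editing. As an illustration, here is a minimal sketch using a watch list of polysemous terms; the list and terms are hypothetical, not Intuit's actual glossary:

```python
# Hypothetical watch list of ambiguous terms (a stand-in for a real
# product glossary), mapping each term to the question a writer should answer.
AMBIGUOUS = {
    "first": "first name? first of the month? first step?",
    "tax": "sales tax? income tax? payroll tax?",
    "clear": "clear (verb, remove)? clear (adjective)?",
}

def flag_ambiguous(sentence: str) -> list[str]:
    """Return the watch-list terms found in the sentence (case-insensitive)."""
    words = {w.strip(".,?!").lower() for w in sentence.split()}
    return sorted(w for w in words if w in AMBIGUOUS)
```

Writers can then add context or substitute an unambiguous alternative before the string reaches the MT engine.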
Q3. Where does simplified English fit in?
There are two basic types of controlled language:
1. Simplified or technical, used to improve documentation.
2. Logic-based, used for software specifications, queries, and proving theorems.
Simplified English belongs to the first category.
Q4. Does size really matter? YES!
• Size of words. Use shorter words when possible:
  • Help, not facilitate
  • Tell, not communicate or notify
  • Show, not indicate
• Size of sentences. Limit to 25 words (descriptions), 20 words (tasks).
• Size of paragraphs. Limit to 6 sentences.
• Size of titles, labels, and buttons. Content expands in other languages (buttons, field labels, etc.): FR +30%; DE +50%.
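The sentence- and paragraph-size rules above are straightforward to automate. A minimal sketch, with the limits as parameters matching the slide's defaults for descriptions (the sentence splitter here is a simple punctuation heuristic, not a full tokenizer):

```python
import re

def check_sizes(paragraph: str, max_words: int = 25, max_sents: int = 6) -> list[str]:
    """Flag STE size violations: long sentences and long paragraphs."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    issues = []
    if len(sentences) > max_sents:
        issues.append(f"paragraph has {len(sentences)} sentences (limit {max_sents})")
    for i, s in enumerate(sentences, 1):
        n = len(s.split())
        if n > max_words:
            issues.append(f"sentence {i} has {n} words (limit {max_words})")
    return issues
```

For task content, pass `max_words=20` per the slide's rule.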
Comprehensive MT Approach
Welocalize has a multi-tiered approach to machine translation (MT) implementation:
1) Evaluate content for MT readiness
   • Source content audit
   • Pre-translation editing
   • Style and glossary verification
2) Assist in selection and integration of one or more MT engines into the localization technology ecosystem
3) Perform MT post-editing services
   • Evaluation of MT output quality via workbench
   • Human assessment and automated scoring
   • Engine training feedback / engine improvement
4) Support the transition from a SaaS/hosted "black box" model to a hosted "glass box" or in-house model
Predictable, Controllable, Progressive Quality
Does Simplified English Work?
Proof based on three measures:
• "Stinker" scores based on POS
• Perplexity scores based on an LM of "bad" text
• Tag density
Candidate Scorer: How good is candidate text?
• Take historically "bad" text
• Run a POS tagger on the "bad" text and the candidate text
• Use an exclude list to reduce false positives
• A lower score is better, i.e. the candidate text does NOT match the "bad" text
RESULT: On a sample of 10K sentences, Intuit's "stinker" score was 1.5-2 times LOWER than 3 other companies'!
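A toy version of such a scorer can be sketched as follows. The tiny lexicon stands in for a real POS tagger (the deck doesn't name one, so this is an assumption), and the score is the share of the candidate's POS bigrams that also occur in the "bad" text after applying an exclude list; lower is better:

```python
# Toy word-to-tag lexicon: a stand-in for a real POS tagger.
LEXICON = {"the": "DET", "a": "DET", "click": "VERB", "clear": "VERB",
           "button": "NOUN", "table": "NOUN", "data": "NOUN", "old": "ADJ"}

def tags(text: str) -> list[str]:
    """Assign a POS tag per word; unknown words get 'X'."""
    return [LEXICON.get(w.lower(), "X") for w in text.split()]

def bigrams(seq):
    return set(zip(seq, seq[1:]))

def stinker_score(candidate: str, bad_corpus: list[str],
                  exclude: frozenset = frozenset()) -> float:
    """Fraction of the candidate's POS bigrams seen in 'bad' text (lower = better)."""
    bad = set()
    for sent in bad_corpus:
        bad |= bigrams(tags(sent))
    cand = bigrams(tags(candidate))
    if not cand:
        return 0.0
    matches = {b for b in cand if b in bad and b not in exclude}
    return len(matches) / len(cand)
```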
Capturing Perplexity for LMs
• Build a language model (LM) of "bad" text
• Is the candidate text close to the "bad" LM or not?
• A higher score is better (further from "bad" text)
RESULT: On a sample of Intuit's content (10K sentences), the perplexity score was 2 times HIGHER, i.e. further away from "bad" text, than 3 other companies'!
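The measure can be sketched with a minimal add-one-smoothed bigram LM trained on the "bad" corpus; the deck doesn't specify the LM type or order, so this is illustrative. A higher perplexity means the candidate looks less like the "bad" text:

```python
import math
from collections import Counter

def train_bad_lm(corpus: list[str]):
    """Count unigrams and bigrams (with sentence boundary markers) over 'bad' text."""
    uni, bi, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent.lower().split() + ["</s>"]
        vocab.update(toks)
        uni.update(toks[:-1])                 # history counts
        bi.update(zip(toks, toks[1:]))
    return uni, bi, len(vocab)

def perplexity(sentence: str, lm) -> float:
    """Perplexity of the sentence under the add-one-smoothed bigram LM."""
    uni, bi, v = lm
    toks = ["<s>"] + sentence.lower().split() + ["</s>"]
    logp = 0.0
    for a, b in zip(toks, toks[1:]):
        logp += math.log((bi[(a, b)] + 1) / (uni[a] + v))
    return math.exp(-logp / (len(toks) - 1))
```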
Tag Density
• More tags mean more post-editing and reduced efficiency
• The fewer tags, the better
RESULT: On a sample of 10K sentences, Intuit's tag density was 3-4 times LOWER than 3 other companies'!
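Tag density itself is just a ratio of inline tags to words. A minimal sketch; the tag pattern below covers two common shapes ({0}-style placeholders and <b>-style markup) and is an assumption, since the deck doesn't specify the tag format:

```python
import re

# Matches {0}-style placeholders and simple <b>/</b>-style tags.
TAG_RE = re.compile(r"\{\d+\}|</?\w+>")

def tag_density(segment: str) -> float:
    """Inline tags per word; lower values mean less post-editing overhead."""
    tags = TAG_RE.findall(segment)
    words = [w for w in TAG_RE.sub(" ", segment).split() if w]
    return len(tags) / max(len(words), 1)
```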
How good is the MS Hub engine?
• Trained the engine on 4,500 TUs and 3,000 glossary terms
• Automatic scoring
• Human evaluations for adequacy and fluency
GOING LIVE!
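The automatic scoring in the next slides is BLEU-based. As a sketch, sentence-level BLEU (uniform 1-4-gram weights, brevity penalty, no smoothing) can be computed from scratch like this; production scoring would use an established tool rather than this illustration:

```python
import math
from collections import Counter

def ngrams(toks: list[str], n: int) -> Counter:
    """Counter of n-grams in the token list."""
    return Counter(zip(*[toks[i:] for i in range(n)]))

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Sentence BLEU with uniform weights and brevity penalty (no smoothing)."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((h & r).values())        # clipped n-gram matches
        total = max(sum(h.values()), 1)
        if overlap == 0:
            return 0.0                         # unsmoothed: any zero precision -> 0
        log_precisions.append(math.log(overlap / total))
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```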
Results – Auto scoring 1
[Chart: BLEU scores (0-80) for Bing vs MS Hub, for PT-BR, ES, DA, NL]
Results – Auto scoring 2
[Chart: BLEU scores (0-80) for Bing vs MS Hub, for ID, ZH-CN, IT, DE]
Results – Human evals 1
[Chart: adequacy and fluency scores (0-5) for Bing vs MS Hub, for ID, ZH-CN, IT, DE]
Results – Human evals 2
[Chart: adequacy and fluency scores (0-5) for Bing vs MS Hub, for PT-BR, ES, DA, NL]
Lessons Learned
• Good wine comes from great grapes
• You can hire a professional tennis player to play for you
• You need a great team and a great partner
Contact us
Welocalize, www.welocalize.com
241 East 4th St., Suite 207, Frederick, Maryland 21701 USA
[t] +1.301.668.0330
[t] +1.800.370.9515 Toll Free
[f] +1.301.668.0335
[e] [email protected]