turning nmt research into commercial products · • team members have published +100 on smt and...

Post on 26-May-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

TurningNMTresearchintocommercialproducts

DragosMunteanuandAdriàdeGispert

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 166

•  Foundedin1992•  3800+Employees•  56Offices•  38Countries•  400Partners•  1500Enterprisecustomers

helpingbigbrandsgoglobal

marketingcampaigns eCommerce documentation web,socialmedia analytics

78ofthetop100globalcompaniesworkwithSDL

+10BILLIONwordstranslatedmonthly

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 167

SDLResearch–alonghistoryinMT

•  ResearchlabsinLosAngeles(USA)andCambridge(UK)•  Teammembershavepublished+100onSMTandrelatedtech

–  BillByrne,AbdessamadEchihabi,DragosMunteanu,GonzaloIglesias,EvaHasler,AdriàdeGispert,SteveDeNeefe,JonathanGraehl,WesFeely,LingTsou...

•  FormerlyLanguageWeaver–  15yearsofleadingexpertiseinSMT–  majorcontributions(papers/patents)inphrase-basedandstring-to-treeMT,

automata-basedhierarchicalMT,qualityestimation,tuning,evaluation...

•  Stronglinkswithacademia(UniversityofCambridge)

•  Summerinternships,industrialpost-docs

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 168

Ourmission:BringMTresearchresultstoproducts

Approachesthatworkformanylanguagepairs

Connectors,plug-ins…

Translationspeed

Privacy!Top-qualityMTonpremiseandinprivatecloud

Hightranslationquality

Customization/Personalization

Respectfileformatsandtags

Controllablememoryanddiskfootprint

Robustnesstomis-spellings

○  Westrivetoprovideourcustomers:

Abilitytolearnovertime(AdaptiveMT)

Terminologyanddictionaries Consistency

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 169

SDLSecureEnterpriseTranslationServer

ü 45NMTenginescurrentlyavailable

DataSecurity-  Onpremises/privatecloud-  Usedbygov’tfor15years

Quality/Customization-  NeuralMT-  CustomMTout-of-the-box

Cost-effectivescalability-  Elastic,optimizedfootprint-  Commodityhardware

EaseofUse/Integration-  deploysInhours-  MSplug-in&RESTAPI

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 170

Neural Machine Translation

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 171

SMT•  Symbolicmodels•  Independenceassumption

(separatesub-problems)•  Maximum-likelihood

estimation•  CPU-orientedtraining•  Source-side-guideddecoding•  Largedatabases

NeuralMT•  Continuous-spacemodels•  Singleend-to-endmodel•  Discriminativetraining•  RelianceonGPUs•  Target-side-guideddecoding•  Smallercompactmodels

Aparadigmshift

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 172

Bettertranslationmodels

[Zhouetal.’16]

[Gehringetal.’17][Vaswanietal.’17]

[Sutskeveretal.’14][Bahdanauetal.’15]

Proceedings of AMTA 2018, vol. 2: MT Users' Track Boston, March 17 - 21, 2018 | Page 173

BetterBLEUscores

Rules-Based 1970

Statistical MT

2002

Neural MT

2016

WAT Jpn-Eng Eng-Jpn

2014 23.8 35.0

2015 25.4 35.8

2016 27.6 36.2

2017 28.4 41.5

+4.4 !! +6.5 !!

COMPARINGOUTPUTS

©2017SDLPlc

SMT

NMT

Observablequalityimprovement

国連難民高等弁務官事務所(UNHCR)は、内戦状態にあるシリアから逃れた難民の数が5百万人を超えたと発表した。

Office of the United Nations High Commissioner for Refugees (UNHCR) is in a state of civil war when the number of refugees who have escaped from Syria have exceeded 5 million people.

The United Nations High Commissioner for Refugees (UNHCR) announced that the number of refugees escaped from Syria in the civil war was over five million people.

ü  30%improvementoverSMTacrossallourproductizedengines

But…isitALLthatgood?

TherearesituationsinwhichNMTfails

•  Whenitfails,itfailsspectacularly –  unrelatedfluenttext–  repetitions,neurobabble…

•  MTuser/customerexpectations–  “MTisnotsupposedtodothis”!!!?!–  “CanitsupportthefeaturesIneed”???

Over-generationand‘neurobabble’

There was no clear correlation between the measured mass density and the measured mass density, and neither experiment A or B.

The company will pay approximately EUR 600 million in fines, and the U.S. Department of Justice (SEC) to pay for approximately EUR 600 million, and the U.S. Department of Justice and the Justice Department of Justice (SEC) to reduce the amount of internal control of the board of directors of the board of directors of the board of directors…

Over-generationand‘neurobabble’

There was no clear correlation between the measured mass density and the measured mass density, and neither experiment A or B.

The company will pay approximately EUR 600 million in fines, and the U.S. Department of Justice (SEC) to pay for approximately EUR 600 million, and the U.S. Department of Justice and the Justice Department of Justice (SEC) to reduce the amount of internal control of the board of directors of the board of directors of the board of directors…

DataisEVENMOREimportant

•  NewNMTmodelsarebetterlearners– Abetterfittothetrainingdata– Relevanttrainingdataiskey– Avoidbabbleandgethugegains!

•  Domainadaptation/dataselection[FreitagandAl-Onaizan’16][Chenetal.’17][Britzetal.’17][Farajianetal.’17][VanderWeesetal.’17][Wangetal.’17]…

Adaptingneuralmodels

Majorimprovements!Challenge:§  Adapttocustomer

domain/datawithminimalre-training

§  Maintainhighqualityacrossdomains

Jpn-Engcorpus #words

Generic >300M

Automotive <1M

Lexicalselection

•  NMTmodelshavefreedomtoproduceanytargetword–  Guidedbysource,notconstrained

•  SMTenginesweregoodatlexicalselection–canweleverage?–  T-table,n-gramandphraseprobabilities,memory-augmentedmodels/search

[Arthuretal.EMNLP’16][Stahlbergetal.EACL’17][Wangetal.;Dahlmannetal.;Fengetal.EMNLP’17][Zhangetal.IJCNLP’17]…

[Arthuretal.EMNLP’16]

[Wangetal.EMNLP’17]

NMTcanuseN-gramposteriorprobabilities

Stahlbergetal.(EACL’17):“NeuralMachineTranslationbyMinimisingtheBayes-riskwithRespecttoSyntacticTranslationLattices”

But…arethereguarantees?

•  Controlisamustforcommercialsuccess•  Oneverybadsentencecanputoffacustomer

–  Back-offifneeded

•  Customers/Usersexpectcertain‘features’–  Decodingspeed,dictionarysupport,formattingconstraints,AdaptiveMT,…

Dictionarysupport

•  Translationoutputmusttranslatedictionarymatchesexactly–constrainedsearch

•  EasyforSMTdecoders•  NMTbeamdecoderdoesnotkeepanalignmentbetween

sourceandtargetwords

“Zimra Games continues to innovate with the release next month of Coke Assault 3, which will satisfy the most demanding gamers.”

English German

ZimraGames ZimraGamesGmbH

CokeAssault3 CokeAssaultIII

… …

[Andersonetal.EMNLP’17][Hokamp&LiuACL’17][Chatterjeeetal.WMT’17]

Dictionarysupport

Constrainedsearch•  Buildafinite-stateacceptorwith

thetarget-sideconstraints•  Keeponeseparatestackper

eachacceptorstate•  Outputonlyhypothesesfrom

thefinalacceptorstateØ Constraintscanbewordsor

phrases[Andersonetal.EMNLP’17]

Dictionarysupport

Challenges•  Computationalcomplexity

growsexponentiallywiththenumberofconstraints–  orderisunknown

•  Nothingpreventsrepeateddecoding:

“Zimra Games GmbH setzt mit dem Veröffentlichung auf Coke Assault III im nächsten Monat der Angriff …

Entityconstraints

•  Decodermustalsorespectmeta-tags–  KeytosupportfileformatsusedbyMTusers

•  NMTmodelshouldnotbreaksequentialhistory•  Solutionsrequiremodelspecializationand/ordecodingrestrictions

“<B>Zimra Games</B> continues to innovate with the release <I>next month</I> of <B>Coke Assault <c=red>3</c></B>, which will satisfy the most demanding gamers.”

Decodingspeed

•  MTusersareexpectedtocertaintranslationspeeds–  Targetspeedvaries,butwellaboveresearchengines

•  Goalistoprovidebestqualityatdesiredspeed–  Speedvsqualitytrade-off

•  NMTdeploymentscenarios–  CPUonly–hand-helddevices,…–  GPU

•  NMTtrainingspeedalsorelevant

Decodingspeedvsqualitytrade-off(1)

•  Modelarchitecture–  recurrent,convolutional,attentional…

–  numberofparameters,layerprecomputations…

–  Unfoldingandshrinkingensembles

[StahlbergandByrne,EMNLP’17]

StahlbergandByrne(EMNLP’17):“UnfoldingandShrinkingNeuralMachineTranslationEnsembles”

Decodingspeedvsqualitytrade-off(2)

•  HardwareandLinearAlgebralibrary–  TypeofGPUcard–  CPU-GPUcommunication–  GPUusage

•  Batching–  standardintraining

Decodingspeedvsqualitytrade-off(3)

•  Decodingparameters–  beamsize,earlystopping…

•  Reducedvocabularysoftmax(CPU)•  Weightclippingintraining

–  Low-precisioninference[Wuetal.’16][Devlin,EMNLP’17]…

Copyright©2008-2017SDLplc.Allrightsreserved.Allcompanynames,brandnames,trademarks,servicemarks,imagesandlogosarethepropertyoftheirrespectiveowners.ThispresentationanditscontentareSDLconfidentialunlessotherwisespecified,andmaynotbecopied,usedordistributedexceptasauthorisedbySDL.

SoftwareandServicesforHumanUnderstanding

top related