An introduction to machine translation: What, when, why and how? WHITE PAPER Capita Translation and interpreting

Post on 26-Jun-2020


Contents

Introduction

What is machine translation (MT)?
- How does it work?

When is MT appropriate?
- What quality can you expect from MT?
- Will MT make human translators redundant?

What are the stages of MT?

What are the benefits of using MT?

What’s next for MT?

MT in practice
- Case study: Global manufacturing client
- Case study: Scarab Sweepers

7 tips to create “machine translation friendly content”


Introduction

We’re living in an increasingly connected world: markets, countries and populations previously considered remote or inaccessible are now potential users and customers. Translation connects people and businesses across the globe and allows them to communicate, no matter what the language.

However, content is growing at a rapid pace, and by the year 2020, approximately 1.7 megabytes of new information will be created every second for practically every human being on the planet. What’s more, there’s now a demand for immediacy, as users expect content to be available in their language instantly. With the expectation that mountains of content can be translated within limited budgets, translations are needed quicker than ever before.

Translating vast amounts of content could bring complications around cost, quality and time to market. Technology is becoming increasingly commonplace within translation, in order to remedy some of these potential issues.

Advanced translation technology, housed securely and maintained by professional language service providers, is changing the face of translation – large amounts of content can be translated quickly, at a reduced cost, whilst still maintaining the quality of the final content.


What is machine translation?

Machine translation (MT) is the use of software to translate text from one language to another. The term spans a variety of tools, with differing levels of maturity - from free, online translation tools to custom-built, industry-specific translation engines.

Whilst free machine translation tools may seem appealing or useful to infer the general ‘gist’ of text, they aren’t appropriate for professional businesses due to issues surrounding quality and security. Once text is input into a free, online translation engine, the information is no longer secure or confidential. On the other end of the spectrum, professional Language Service Providers (LSPs) build and maintain secure translation engines and linguistic assets, which remain the intellectual property of the client, not the translation provider.

Securely hosted and custom-built machine translation engines can help to create a tailored output, reduce the cost and turnaround time of translations for clients, and maintain quality and consistency across languages.

How does it work?

All machine translation engines use software to carry out substitutions of words in one language for words in another language. Simple, free translation engines will stop there, producing a low-quality translation which could be inaccurate or sound unnatural to a native speaker.

Advanced machine translation engines are built from client- or domain-specific data, which produces more focused and usable output. These data sets are stored in a translation memory (a database which stores previously translated text) and a terminology database. As more and more content is translated, these linguistic assets can be updated and integrated to further improve the performance of the MT systems.

Translate: Files are run through the machine translation engine. The engine analyses the input and, using statistical models and AI, produces an automated output.

Post-edit: The automatically generated translation is populated in the target files. A professional translator can check and amend the output, where required.

Deliver: The translated document is delivered to the client. The translation memory is updated with newly translated terms.
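The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real LSP system: the `engine` and `post_edit` callables, the dictionary-backed translation memory and the exact-match reuse rule are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List


def run_mt_workflow(
    segments: List[str],
    translation_memory: Dict[str, str],
    engine: Callable[[str], str],
    post_edit: Callable[[str, str], str],
) -> List[str]:
    """Translate, post-edit and deliver a list of source segments."""
    delivered = []
    for source in segments:
        if source in translation_memory:
            # Assumption: an exact match in the memory is reused as-is.
            target = translation_memory[source]
        else:
            # Translate: the engine produces an automated draft.
            draft = engine(source)
            # Post-edit: a linguist checks and amends the draft.
            target = post_edit(source, draft)
            # Deliver: the memory is updated with the new translation.
            translation_memory[source] = target
        delivered.append(target)
    return delivered


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_engine(source: str) -> str:
    return f"[draft] {source}"


def toy_post_edit(source: str, draft: str) -> str:
    return draft.replace("[draft] ", "[approved] ")


memory = {"Stop the engine.": "Detenga el motor."}
result = run_mt_workflow(
    ["Stop the engine.", "Check the oil level."],
    memory,
    toy_engine,
    toy_post_edit,
)
```

In this sketch the first segment is served from the translation memory and the second goes through the engine and the post-edit step before being stored for reuse.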


When is MT appropriate?

The use of MT depends on the original content, subject matter, purpose and intended audience. Machine translation works best on structured content, such as technical subject matter, user manuals, or manufacturing related material.

MT is also appropriate when looking to understand the ‘gist’ or to get the general meaning of a document – a technique which can be applied across a variety of sectors, for example, in the legal sector for multilingual document reviews, or during the eDiscovery process.

Marketing, advertising or creative content isn’t appropriate for machine translation, as the engines are not able to convey the meaning of idioms, colloquialisms or slang, as a human translator can.

Before any new MT project, an ‘MT evaluation’ stage will be carried out. Project/Account Managers will call upon their knowledge, expertise and industry tools to assess whether MT is the best solution. This stage also includes conversations around turnaround time, audience and target languages, to ensure that client expectations are managed correctly.

What quality can you expect from MT?

There are 3 key areas which will determine the quality of any machine translated content: data, engine building and evaluation.

1. Data

As the age-old saying goes: “you get out what you put in”, and machine translation is no exception. Data is the foundation of MT, and therefore collecting, preparing and cleaning data is key to a high-quality output. Before the translation process can start, it’s important to ensure that your data and terminology are aligned and prepared in line with the translation engine.

2. Engine building

MT engines can be tailored to specific industries, making the content output more focused. These engines can be customised further, using client-specific data (such as glossaries and style guides), and they can be integrated into your standard localisation process as a means of enhancing productivity – whether that be in the LSP’s translation management system (TMS) or your external systems.

When LSPs build machine translation engines, various models are built from different samples of data, and the outputs are scored and compared to select the best performing engine. In advanced, mature language technology teams, engine building is a continuous process – small tweaks and changes are made to engines based on feedback and quality scores.
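The selection step described above can be illustrated with a short sketch. It assumes each candidate engine has already been scored on the same test segments and that a higher score is better. The engine names and score values are invented for illustration, and the scoring metric itself is left unspecified.

```python
from typing import Dict, List


def select_best_engine(scores: Dict[str, List[float]]) -> str:
    """Return the name of the engine with the highest mean score."""
    means = {
        name: sum(values) / len(values)
        for name, values in scores.items()
        if values  # skip engines that have no scores yet
    }
    if not means:
        raise ValueError("no scored engines to compare")
    return max(means, key=means.get)


# Hypothetical quality scores per candidate engine (higher is better).
candidate_scores = {
    "engine_a": [0.62, 0.70, 0.66],
    "engine_b": [0.71, 0.69, 0.74],
    "engine_c": [0.58, 0.80, 0.61],
}
best = select_best_engine(candidate_scores)
```

Comparing mean scores is only one possible rule. A team could equally compare medians or weight some content types more heavily.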


3. Evaluation

One of the biggest questions surrounding quality in the localisation industry is evaluation, especially as a ‘one size fits all’ approach isn’t applicable. Assessing the quality of a translation depends on the content type, intended audience and purpose.

With so many variables and subjectivity, it can be difficult to pinpoint what high-quality looks like. Independent organisations within the industry have developed tools, best practices and metrics, such as TAUS DQF, to help evaluate translation quality. As with engine building, evaluation is a continuous task – the outputs of a machine translation engine are still measured once the engine has been deployed.

As with data, assessing the quality of the source files at the beginning of the process is just as important. Some experienced LSPs have developed their own in-house quality checkers, which can identify any potential issues before the translation stage, such as spelling mistakes, line breaks and unmatched brackets.
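As a rough idea of what such a pre-translation check might look like, the sketch below flags unmatched brackets and lines that break off without closing punctuation. It is a simplified, hypothetical checker rather than any LSP's actual tool. Spell checking is left out because it needs a dictionary resource.

```python
from typing import List

# Map each closing bracket to its opening partner.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def find_source_issues(text: str) -> List[str]:
    """Return human-readable warnings about a source text."""
    issues = []
    stack = []
    for pos, char in enumerate(text):
        if char in OPENERS:
            stack.append((char, pos))
        elif char in PAIRS:
            if stack and stack[-1][0] == PAIRS[char]:
                stack.pop()
            else:
                issues.append(f"unmatched '{char}' at position {pos}")
    # Any opener still on the stack was never closed.
    for char, pos in stack:
        issues.append(f"unmatched '{char}' at position {pos}")
    # A line that ends without punctuation may be a stray line break.
    lines = text.split("\n")
    for number, line in enumerate(lines[:-1], start=1):
        stripped = line.rstrip()
        if stripped and stripped[-1] not in ".!?:":
            issues.append(f"line {number} ends without closing punctuation")
    return issues


clean = find_source_issues("Tighten the bolt.\nStart the engine.")
broken = find_source_issues("Open the valve (see page 4.\nTighten the bolt\nbefore use.")
```

Here `clean` is an empty list, while `broken` reports the unclosed bracket and the line break inside the second sentence.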

Will MT make human translators redundant?

Whilst technology in the translation industry continues to evolve, translation and language are still subjective. Not all content is created equal - advertising, slogans or marketing content will always require input from a native-speaking linguist.

Post-editing is a key stage in machine translation, which involves input from a human translator, to ensure accuracy and a high-quality output. This level of input can vary, from ‘light’ to ‘full’.

Examples of ‘light’ post-edit tasks

• Correct any obvious issues

• Check that key terminology is consistent

• Amend any confusing sentences

Examples of ‘full’ post-edit tasks

• All tasks included in a ‘light’ post-edit

• Improve the ‘flow’ or ‘style’ of a text

• Check terminology matches approved linguistic assets, such as style guide and/or glossary

• Adapt cultural references to relevant target language/market

• Apply correct formatting


What are the stages of MT?

Most experienced LSPs will have developed a checklist or process map for new MT projects. The following stages are usually covered:

1. Consultation & evaluation

Assessment of whether machine translation is right for the source document, and which of your business challenges can be solved using this technology.

2. Content & data mining

Includes collection, preparation, cleaning and normalising of terminology, using your brand and language assets.

3. Processing & integration

Content is translated by the engine and then post-edited by a human linguist. Linguistic assets can be managed with tools such as translation memory, which helps to lower costs over time. By integrating these assets within the localisation workflow, productivity gains can be massive, as higher volumes of work can be translated in shorter time frames.

4. Quality evaluation

This stage involves measuring the quality of the translated content and a continual assessment of how you can get the most out of your MT engines, with a regular report of findings.

5. Maintenance

In order to guarantee machine translation that is not only accurate, but also reflects the client’s brand and tone of voice, the engine will be continually updated with the latest customer data, including glossaries.


What are the benefits of using MT?

Improved productivity

Machine translation can improve the capacity of translation, allowing for more content to be translated, with reduced costs and within a smaller time frame. What’s more, machine translation can produce gist level translations too, so content that was not previously translated as part of a standard translation process, due to higher cost and turnaround time, can now be processed via machine translation.

Lower translation costs

Effective deployment of these tools can see translation costs and turnaround times reduced by up to 30%, without any sacrifice in quality. Applying technology obviously speeds up the process, so greater volumes of work can be translated in a much shorter period of time.

Reduced time to market

Automating part of the translation process and reducing the human involvement inevitably saves time. Studies indicate that a translator carrying out post-editing can produce 5,000 to 10,000 words per day, in comparison to 2,000 through human-only translation.
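To make the throughput figures above concrete, the sketch below converts them into working days for a single translator. The 100,000-word job size is a hypothetical figure chosen for illustration; the daily rates are the ones quoted above.

```python
import math


def working_days(word_count: int, words_per_day: int) -> int:
    """Whole working days one translator needs for a job."""
    return math.ceil(word_count / words_per_day)


job_size = 100_000  # hypothetical job, in words

human_only_days = working_days(job_size, 2_000)        # 50 days
post_edit_slow_days = working_days(job_size, 5_000)    # 20 days
post_edit_fast_days = working_days(job_size, 10_000)   # 10 days
```

On these figures, post-editing would cut a 50-day job to between 10 and 20 days.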

Secure environment

Translating your files using free, online translation tools puts your intellectual property and the security of your files at considerable risk: once you upload content to such a tool, it can no longer be treated as secure or confidential.

Using a professional LSP ensures that all client information is stored within a secure data centre, and all translation projects are managed and performed within a central ecosystem, rather than files being transferred via unsecure email networks or stored on linguists’ personal computers.


What does the future hold for MT?

Machine translation has made huge technological progress in the past decade, and developments in this area look set to increase further, with technology at the forefront of many LSPs’ agendas.

What’s more, with the right approach and investment into machine translation, the challenge of balancing time, cost and quality can be overcome. This realisation has led to teams within large businesses proactively requesting the use of technology in the translation process, in order to better manage their multilingual content budget.

Machine translation is likely to become a key focus for translation buyers, as the recent technological advancements are quickly leaving behind the old perception that MT only provided poor quality translations.

Some of the technology heavyweights such as Google, Microsoft and Facebook are making news headlines with their own developments in translation technology and natural language processing, as they strive to ensure that their products, websites, apps and services are available and understandable to global users, no matter the language.

The move towards neural machine translation is also noticeable amongst some of the large technology companies and LSPs. Neural-based engines are able to take into account the context and structure of the source sentence in order to create a more ‘natural’ sounding translation.

Many LSPs are continually testing, experimenting, deploying and incorporating new models, including neural-based, into their machine translation engines, to create the optimum combination.

Machine translation is expected to become a more integral part of the translation process in future years, and to develop further, with more intelligent and sophisticated processing of language data. As with translation in general, a ‘one size fits all’ approach isn’t appropriate for MT, but that’s not to say that it’s an exclusive process for a specific subject matter. LSPs can work with linguists and customers to build and develop customised machine translation engines, to deliver high-quality multilingual content on time, and within budget.


MT in practice

Case study: Global manufacturing client

We have been providing technical translations to a global leader in the plant and construction industry for a number of years, covering content types such as manuals and service magazines.

In one year, we received 127 million English words for translation from the client, across 52 different language combinations. Through the use of translation memories, the figure of 127 million was reduced to 10 million words requiring new translation.

Extensive translation memories were used to build customised MT engines, increasing the level of usable translated output created by the engines.

With the use of MT engines, 50% of the words received for translation were processed by MT engines, enabling the customer to significantly reduce their translation costs, with a saving of 30% on the new word rate.
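The translation memory figures quoted in this case study imply the following share of words that needed no new translation. This is a simple worked calculation using only the two word counts given above.

```python
received_words = 127_000_000  # English words received in one year
new_words = 10_000_000        # words still requiring new translation

# Words covered by existing translation memory content.
covered_by_tm = received_words - new_words

# Share of the received volume covered by translation memory.
tm_share_percent = round(covered_by_tm / received_words * 100, 1)
```

That is, 117 million words, or roughly 92% of the volume received, were covered by translation memories.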

Case study: Scarab Sweepers

A global leader in the design and manufacture of road sweepers, Scarab Sweepers, currently operates in over 25 countries around the world. In order to successfully operate in so many different countries, operating manuals must be available in a variety of languages. This requires a high volume of technical translation, localised for the different countries and languages they are required in.

With the use of our machine translation engines, we were able to deliver considerable cost savings during the translation process. As we have worked with Scarab Sweepers previously, we were able to further build on a good working relationship, and fully understand their objectives.

“It would be very difficult to improve on the current service I receive from Capita TI. I am extremely pleased with everything that Capita TI has done for Scarab Sweepers in translating our manuals into other languages.”

Andre Ray, Technical Publication, Scarab Sweepers


7 tips to create “machine translation friendly content”

In order to achieve maximum results from machine translation, both in terms of quality and cost, it is essential to write your documentation in a clear, coherent, concise and structurally correct format.

1. Spell check

This might sound like a basic rule, but a machine translation engine cannot accurately identify and translate a word that has been spelled incorrectly. Even a human translator would still question the source content, leading to longer turnaround times. Make sure that you proofread your content before sending it for translation.

2. Recycle sentences and be consistent

A translation memory can recognise repeated phrases throughout a document, and can store and update their multilingual version in a consistent manner. By re-using and repeating phrases and terms throughout your content, costs and turnaround times will be reduced.

Good example: It is important to eliminate any errors in a document. Proofreading is crucial because proofreading eliminates errors.

Bad example: It is important to remove any mistakes in a document. Proofreading is crucial because proof-reading eliminates errors.

3. Keep your sentences short, with a simple grammatical structure

Although advanced technology can handle lengthy sentences, try to keep to the notion that one idea equals one sentence. Where possible, break long sentences into two shorter ones. Keep your sentences between 5 and 25 words, as these are the easiest sentences for a machine to translate. Sentences of fewer than 5 words, though, can prove problematic, as they can be vague or ambiguous.

Do not over-complicate the structure of your sentences. Ensure that each phrase is complete (begins with a capital letter, has one main clause, and ends with the correct punctuation).

Good example: Machine translation can play a vital role in your localisation strategy.

Bad example: Your translated copy, as part of your localisation strategy, can be assisted by a machine that plays a vital role; that which we call machine translation.
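The 5 to 25 word guideline in this tip is easy to check automatically before sending content for translation. The sketch below is a minimal, hypothetical checker: it splits sentences naively on end punctuation, so abbreviations such as “e.g.” would be mis-split.

```python
import re
from typing import List, Tuple

MIN_WORDS = 5
MAX_WORDS = 25


def flag_sentence_lengths(text: str) -> List[Tuple[str, int]]:
    """Return (sentence, word_count) pairs outside the 5-25 word range."""
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count < MIN_WORDS or count > MAX_WORDS:
            flagged.append((sentence, count))
    return flagged


sample = (
    "Build the engine. "
    "Machine translation can play a vital role in your localisation strategy."
)
flagged = flag_sentence_lengths(sample)
```

In this sample only the three-word sentence is flagged, and the second sentence falls inside the recommended range.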


4. Remove unnecessary words

Get rid of words that do not contribute to the meaning of a sentence, or words that over-complicate the structure.

Good example: He works on marketing projects.

Bad example: He is the man who works on marketing projects.

5. Avoid ambiguity; use the active voice

The active voice is a style of writing that cuts out vagueness and ambiguity. Again, if a human is unsure of the exact meaning of a phrase, then a machine translation engine is going to struggle, especially if your sentence has a double meaning.

Good example: I will always remember my first time using a machine translation engine.

Bad example: My first time using a machine translation engine will always be remembered.

The bad example is vague because it is unclear who will always remember using the machine translation engine; it could be you, someone else, or the world in general.

6. Use the definite article, even when you don’t want to

Try to specify nouns using “the”, as a machine translation engine can struggle to distinguish between verbs and nouns. A lot of short nouns can also be verbs, for example ‘skip’, ‘bank’, ‘lodge’ – these can cause further confusion if used without a definite article. Instructions and user manuals often omit the definite article.

Good example: Build the engine. Train the engine. Use the engine.

Bad example: Build engine. Train engine. Use engine.

7. Avoid idioms/clichés/slang/colloquialisms/abbreviations

A machine translation engine may not convey the correct meaning of colloquial or idiomatic phrases and the translation may not make sense to international users.

Good example: She didn’t come into the office as she was not feeling well.

Bad example: She didn’t come into the office as she was under the weather.


Find out more about how partnering with Capita gives you the assurance of quality, global reach and trusted delivery on time, every time by visiting:

https://www.capitatranslationinterpreting.com

Or for account queries please contact us at:

Email: [email protected] (UK): +44 (0)845 367 7000