2013 GALA Miami: Breaking into Latin American Markets on a Small Budget

Post on 18-May-2015

Category:

Technology

DESCRIPTION

The Latin American market comprises a mix of Spanish dialects. A company that really wants to reach a specific audience in Latin America must use the right dialect. But how can marketing materials be translated into four or five Spanish dialects without dramatically increasing costs? This session discusses how a joint effort to create an MT engine that translates international Spanish into specific Latin American dialects (Spanish for Argentina, Chile, Colombia, Mexico, and Puerto Rico) made this challenge feasible, economical, and replicable.

TRANSCRIPT

An MT Case Study:

Breaking into Latin American Markets

on a Small Budget

María Azqueta (SeproTec) & Diego Bartolomé (tauyou)

Spanish Worldwide

Spanish Language:

• Also known as Castellano.

• Latin-derived Romance language.

• Spanish is one of the six official languages of the United Nations and an official language of the European Union.

Spanish Worldwide

[Bar chart: native speakers in millions, axis from 0 to 1200, for Mandarin Chinese, Spanish, English, and Hindi/Urdu. Data labels: 955 million, 407 million, 360 million, and 311 million.]

Second most spoken language by number of native speakers

Spanish Worldwide

• For demographic reasons, the percentage of the world’s population that speaks Spanish as a native language is increasing, while the percentage of Chinese and English speakers is decreasing.

• Within three or four generations, […]% of the world’s population will communicate in Spanish.

• In 2050, the United States will be the world’s foremost Spanish-speaking country.

Spanish on the Internet

• Spanish is the third most widely used language on the Net.

• The use of Spanish on the Net grew by 807.4% between 2000 and 2011.

• Spain and Mexico are among the 20 countries with the highest number of internet users.

• The demand for documents in Spanish is the fourth largest among the world’s languages.

Spanish Worldwide and its Differences

High demand for translations into Spanish.

But… is the same Spanish spoken everywhere?

Spanish Worldwide and its Differences

RAE (Royal Spanish Academy):

– Created in the 18th century, it is widely seen as the arbiter of what is considered standard Spanish.

– It produces authoritative dictionaries and grammar guides.

– Although its decisions are not formally binding, they are widely followed in both Spain and Latin America.

Spanish Worldwide and its Differences

Different dialects and many differences:

Lexical variations

Grammatical differences

Idioms

Spanish Worldwide and its Differences

‘Neutral’ or ‘International’ Spanish

Latin American Spanish & European Spanish

Market Trend:

Why Adapt to the Local Spanish of Each Country?

To reach different markets

People are most likely to buy when a product is advertised in their dialect

Why Adapt to the Local Spanish of Each Country?

EN: Take a card from the deck

ES: Coge una carta de la baraja

Client A (Gaming Industry)

Why Adapt to the Local Spanish of Each Country?

ES: Coge una carta de la baraja

AR: Agarrá una carta del mazo

CL: Toma una carta del naipe

CO: Coge una carta de la baraja

MX: Saca una carta de la baraja

PR: Coge una carta de la baraja

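Coger (32 entries) http://rae.es/rae.html

1. tr. Asir, agarrar o tomar. U. t. c. prnl. (To seize, grab, or take; also used pronominally.)

31. intr. vulg. Am. Realizar el acto sexual. (Vulgar, Americas: to perform the sexual act.)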

Why Adapt to the Local Spanish of Each Country?

Advise Clients

If you really want to break into a specific market, you must decide which country you want to target and localize your material for the Spanish dialect spoken in that country.

The Main Problems Clients Face

Is there a cost-efficient solution on the market?

tauyou MT Solution at SeproTec

Hybrid machine translation since January 2011

Languages: EN, ES, PT, GA, FR, IT…

Domains: Legal, Technical…

Glossaries and forbidden words lists

Average translated words per month: 700,000
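As an illustration of how a forbidden-words list could be applied, here is a minimal sketch. It is not the tauyou implementation: the locale codes, the `FORBIDDEN_FORMS` table, and the `find_forbidden` helper are all hypothetical, and the word forms come from the "coger" example in this deck. A real system would need morphological analysis rather than a fixed list of surface forms.

```python
import re

# Hypothetical forbidden surface forms per target locale (illustrative only).
FORBIDDEN_FORMS = {
    "es-AR": {"coger", "coge", "coja"},
    "es-MX": {"coger", "coge", "coja"},
}


def find_forbidden(text, locale):
    """Return the sorted forbidden forms for `locale` that occur in `text`."""
    words = set(re.findall(r"\w+", text.lower()))
    return sorted(words & FORBIDDEN_FORMS.get(locale, set()))


print(find_forbidden("Coge una carta de la baraja", "es-MX"))
print(find_forbidden("Saca una carta de la baraja", "es-MX"))
```

A check like this could flag segments for a post-editor instead of rewriting them automatically.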

Initial Brainstorming

MT from EN > different ES dialects

Extensive post-editing would be required

Final Scope of the Project

Human translation + revision

English > Spanish (Spain)

MT of Spanish (Spain) into Spanish from:

• Argentina

• Chile

• Colombia

• Mexico

• Puerto Rico

Initial Approach for Latin American MT

Traditional Workflow

1. Gather translation memories (EN → ES-XX)

2. Add generic material

3. Develop engine

4. Add linguistic pre- and post-processing

5. Improve quality over time

Drawbacks

Varying MT Quality

Depending on the domain and dialect

Initial Inconsistencies among Dialects

Handled with glossaries

Medium Post-Editing Effort

Could be improved over time

New Approach

Translate EN to Standard ES

Via standard high-quality human translation

Convert Standard ES to Latin American Variants

From Spanish to Spanish

Better final quality is achieved

Specifications

Countries

Argentina, Chile, Colombia, Mexico, Puerto Rico

Internal Glossaries to Handle Lexical Variations

It corrects discordance

Idioms

Grammatical Differences

It adapts verb tenses
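The glossary-driven part of the Spanish-to-Spanish conversion can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual engine: the `GLOSSARIES` table and the `localize` function are hypothetical, the entries are taken from the card-game example earlier in the deck, and agreement and verb-tense adaptation are not modeled.

```python
import re

# Hypothetical per-locale glossaries built from the deck's card-game example.
GLOSSARIES = {
    "es-AR": {"coge": "agarrá", "de la baraja": "del mazo"},
    "es-CL": {"coge": "toma", "de la baraja": "del naipe"},
    "es-CO": {},
    "es-MX": {"coge": "saca"},
    "es-PR": {},
}


def localize(text, locale):
    """Replace glossary terms in standard Spanish `text` for `locale`."""
    glossary = GLOSSARIES[locale]
    if not glossary:
        return text
    # Longest entries first so multi-word phrases win over single words.
    keys = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        source = match.group(0)
        target = glossary[source.lower()]
        # Keep an initial capital if the source term had one.
        if source[0].isupper():
            return target[0].upper() + target[1:]
        return target

    return pattern.sub(swap, text)


for code in GLOSSARIES:
    print(code, localize("Coge una carta de la baraja", code))
```

Working from a single reviewed Spanish source means each locale only needs a glossary of differences, which is what keeps the per-dialect cost low in this approach.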

Testing the Prototype Engine

Extraction of several texts (fashion, real-estate, human resources, automobile)

Sent to linguists and/or translators in each target country for localization

Performance of the same localizations by the engine

Comparison and contrasting of human and machine localization results
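One way to compare a human localization with the engine output is a token-level mismatch rate. The deck does not say how its error rates were measured, so the metric below is an assumption for illustration only; `token_mismatch_rate` is a hypothetical helper built on the standard-library `difflib` module.

```python
from difflib import SequenceMatcher


def token_mismatch_rate(human, machine):
    """Share of human-reference tokens that the machine output did not reproduce."""
    ref = human.split()
    hyp = machine.split()
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # Sum the lengths of all aligned token runs shared by both versions.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1 - matched / len(ref)


human = "Agarrá una carta del mazo"
machine = "Agarrá una carta de la baraja"
print(f"{token_mismatch_rate(human, machine):.2%}")
```

A mismatch does not say which side is wrong: as the bug reports in this deck show, the discrepancy can be a human omission as well as an MT error.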

First Bug Report

Not all terms were localized

Concordance issues

(masc./fem.; sing./pl.)

Verbal tenses for Argentina

Human vs. machine localization: 7.78% error rate

First Bug Report

Some terms were changed/localized by the engine, but not by the humans.

(example)

Human error or MT error?

Testing the Prototype Engine

A glossary was created by extracting the terms localized by the linguists/translators.

This glossary was then sent to the same people who localized the texts to verify that all the terms were correctly localized and nothing was missing.

Testing the Prototype Engine

The glossary grew by 36.91%!
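A growth figure like this is a simple relative change in the number of glossary entries. The deck does not give the underlying counts, so the numbers in the sketch below are made up purely to show the arithmetic, and `percent_growth` is a hypothetical helper.

```python
def percent_growth(before, after):
    """Relative growth from `before` to `after` entries, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100


# Made-up counts for illustration; the real glossary sizes are not in the deck.
entries_before_review = 200
entries_after_review = 250
print(f"{percent_growth(entries_before_review, entries_after_review):.2f}%")
```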

Testing the Prototype Engine

People can miss things.

Although many different variants of Spanish exist, Spanish speakers understand many terms that are foreign to their own dialect when they read them in context, sometimes to the point of accepting them as their own. I believe that this may be due to the phenomenon of globalization and the internet.

Latest Bug Report

MT: 1.21% error rate

Achievements

Very little post-editing needed

Reduced error rate

Shortened deadlines

Significant cost reduction

Conclusions

Human localization is not perfect.

MT is not perfect either.

Combining human and machine translation helps achieve high quality and reduce cost.

Further Work

Improving Glossaries

Through a simple web interface for PE

Extending Spanish Language Coverage

More dialects

Traductor.cervantes.es

Incorporating more languages

English, French and Portuguese

Bibliography

Yule, G. (2006). The Study of Language (3rd ed.). New York: Cambridge University Press.

RAE

Instituto Cervantes

http://www.linguapress.com

THANK YOU FOR YOUR TIME!

María Azqueta

mazqueta@seprotec.com

Diego Bartolomé

diego.bartolome@tauyou.com
