UTX Introduction (ISO-TC37 Matsue) -20150709-Yamamoto (distributable)


Page 1

UTX, a simple glossary format

Yamamoto Yuji, JISC member

UTX Team Leader, AAMT

Representative, CosmosHouse

(Originally delivered on June 25, 2015; modified on July 9, 2015)

Page 2

Less is more

Simplification is a sophisticated process

Page 3

AAMT (Asia-Pacific Association for MT)

AAMT brings together MT users, MT researchers, and MT manufacturers.

http://www.aamt.info/

IAMT (International Association for MT) comprises three regional associations:

• EAMT (Europe)

• AAMT (Asia-Pacific)

• AMTA (Americas)

Page 4

Language/translation consultant

UTX team leader

SDL Trados official instructor (fully SDL Certified)

YAMAMOTO Yuji (Representative, CosmosHouse)

Page 5

What is UTX?

UTX stands for Universal Terminological eXchange

Developed by AAMT (Asia-Pacific Association for MT)

Simple glossary format for terminology tools and MT

For creating, sharing, and reusing glossary data

Page 6

4 merits of the UTX glossary format

• Simple: tab-delimited; easy to create and manage in a spreadsheet

• Reliable: quality control via term status

• Convertible to other formats: for MT and termbase tools

• Bidirectional, bi/multilingual: manage bilingual glossaries in both directions
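Because every entry is one tab-delimited line holding a source and a target term, the same file can be read in both directions. The minimal Python sketch below illustrates that; it assumes that lines starting with "#" are header lines and that the first two columns are the source and target terms. The function name build_lookups and the sample text are hypothetical, and the sketch ignores the direction rules the UTX specification itself may define.

```python
# Sketch: forward (source -> target) and reverse (target -> source) lookups
# from one UTX-style glossary. Assumption: lines starting with "#" are header
# lines, and columns 1 and 2 of every entry are the source and target terms.
SAMPLE = (
    "#UTX 1.11; en-US/zh-CN; 2014-09-25\n"
    "#src\ttgt\tsrc:pos\tterm status\n"
    "merge\t合并\tverb\tapproved\n"
    "glossary\t词汇表\tnoun\n"
)


def build_lookups(text):
    """Return (forward, reverse) dictionaries built from tab-delimited entry lines."""
    forward, reverse = {}, {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        src, tgt = line.split("\t")[:2]
        forward[src] = tgt
        reverse[tgt] = src
    return forward, reverse


forward, reverse = build_lookups(SAMPLE)
print(forward["merge"])    # 合并
print(reverse["词汇表"])    # glossary
```

Splitting on the tab character is all the parsing needed here, which is the point of the "simple" merit: no quoting rules or nested structure.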

Page 7

UTX is non-expert friendly

• A systematic approach to translation is not yet fully developed in Japan.

– There is no translation major at universities.

• There are many individuals and small LSPs who could benefit from standardized glossary formats.

• UTX is especially easy to use when you are just starting to create a glossary.

Page 8

UTX as a step towards a more complicated format (TBX)

• No glossary → TBX

• No glossary → UTX → TBX

This is the hard part!
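To illustrate how UTX rows can seed a TBX glossary, here is a minimal Python sketch that turns (source term, target term, part of speech) tuples into TBX-Basic-style XML using the element names of the 2008 edition of TBX (martif, termEntry, langSet, tig, term, termNote). It is not validated against the TBX-Basic schema, carries over none of the UTX header metadata, and the function name utx_entries_to_tbx is hypothetical. The newer TBX v3 renames several of these elements (for example conceptEntry, langSec, termSec), so a real converter would need to target a specific TBX version.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: (source term, target term, source part of speech)
# tuples, as they might be read from a UTX glossary.
ENTRIES = [("merge", "合并", "verb"), ("glossary", "词汇表", "noun")]


def utx_entries_to_tbx(entries, src_lang="en-US", tgt_lang="zh-CN"):
    """Build a TBX-Basic-style XML string (2008-edition element names, unvalidated)."""
    martif = ET.Element("martif", {"type": "TBX-Basic", "xml:lang": src_lang})
    # Minimal header; a real conversion would map the UTX header fields here.
    header = ET.SubElement(martif, "martifHeader")
    source_desc = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from a UTX glossary"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for i, (src, tgt, pos) in enumerate(entries, start=1):
        # One termEntry (concept) per UTX row, with one langSet per language.
        entry = ET.SubElement(body, "termEntry", {"id": f"c{i}"})
        for is_source, lang, term in ((True, src_lang, src), (False, tgt_lang, tgt)):
            lang_set = ET.SubElement(entry, "langSet", {"xml:lang": lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
            if is_source and pos:
                # UTX's src:pos column describes the source term only.
                ET.SubElement(tig, "termNote", {"type": "partOfSpeech"}).text = pos
    return ET.tostring(martif, encoding="unicode")


print(utx_entries_to_tbx(ENTRIES))
```

The mapping is one UTX row to one concept entry; richer TBX features such as definitions or multiple synonyms per concept would have to be added by hand afterwards.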

Page 9

Edit UTX glossaries in Excel

Standardized term status

Page 10

UTX facilitates sharing and reusing glossaries

UTX glossaries can be shared among:

• Translation client

• Language service provider

• Translator A

• Translator B

Page 11

UTX is friendly to rule-based MT (RbMT)

• Statistical MT (SMT) is less accurate with Japanese.

– This is due to differences in language structure.

• In Japan, RbMT packages (Toshiba, Fujitsu, Cross Language, Kodensha, and more) are available.

• Some SMT (and hybrid MT) systems can also use UTX.

Page 12

UTX can handle conversions to simple formats

• Some information might be lost, but the result is still useful.

• Some users/tools don’t need the awesome power of TBX.
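As an example of a lossy conversion to a simple format, the Python sketch below writes only the source and target terms to a two-column CSV, dropping the header line, part of speech, and term status. The target format and the function name utx_to_two_column_csv are assumptions chosen for illustration, not part of any UTX tool.

```python
import csv
import io


def utx_to_two_column_csv(utx_text):
    """Convert UTX text to a two-column CSV (source, target).

    Deliberately lossy: the UTX header line, part of speech, and term status
    are dropped, which is enough for tools that only need term pairs.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in utx_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and UTX header lines
        writer.writerow(line.split("\t")[:2])
    return out.getvalue()


UTX_TEXT = (
    "#UTX 1.11; en-US/zh-CN; 2014-09-25\n"
    "#src\ttgt\tsrc:pos\tterm status\n"
    "domain\t领域\tnoun\n"
    "merge\t合并\tverb\tapproved\n"
)
print(utx_to_two_column_csv(UTX_TEXT), end="")
# domain,领域
# merge,合并
```

Using the csv module rather than joining with commas means a term that itself contains a comma is quoted correctly in the output.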

Page 13

Conversion to/from UTX

UTX Converter converts UTX glossaries to and from:

• MT dictionaries

• Termbases

• Excel

• TBX
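For the direction from a spreadsheet into UTX, a minimal Python sketch might prepend the two UTX header lines to exported rows and join the cells with tabs. The header layout is copied from the sample header shown in this deck ("#UTX 1.11; en-US/zh-CN; date; copyright: ...; license: ..."); the function name rows_to_utx is hypothetical, and padding short rows with empty cells (so an entry without a term status ends in a tab) is a choice of this sketch, not something the deck specifies.

```python
import datetime

COLUMNS = ["src", "tgt", "src:pos", "term status"]


def rows_to_utx(rows, src_lang, tgt_lang, date=None, copyright_=None, license_=None):
    """Prepend UTX header lines to spreadsheet rows and join cells with tabs."""
    fields = ["#UTX 1.11", f"{src_lang}/{tgt_lang}",
              date or datetime.date.today().isoformat()]
    if copyright_:
        fields.append(f"copyright: {copyright_}")
    if license_:
        fields.append(f"license: {license_}")
    lines = ["; ".join(fields), "#" + "\t".join(COLUMNS)]
    for row in rows:
        # Assumption: pad missing trailing cells (e.g. no term status) with "".
        cells = list(row) + [""] * (len(COLUMNS) - len(row))
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Rows as they might come from a tab-separated spreadsheet export.
export = "merge\t合并\tverb\tapproved\ndomain\t领域\tnoun"
rows = [line.split("\t") for line in export.splitlines()]
print(rows_to_utx(rows, "en-US", "zh-CN", date="2014-09-25",
                  copyright_="AAMT (2012)", license_="CC-by 3.0"))
```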

Page 14

UTX glossary sample

#UTX 1.11; en-US/zh-CN; 2014-09-25; copyright: AAMT (2012); license: CC-by 3.0
#src	tgt	src:pos	term status
Asia-Pacific Association for Machine Translation	亚洲太平洋机器翻译协会	properNoun	approved
dictionary administrator	字典管理员	noun	approved
contributor	用语提交者	noun	provisional
domain	领域	noun
glossary	词汇表	noun
bidirectional	双向	adjective	approved
merge	合并	verb	approved

• Header line: information about the glossary (creation date, license, etc.)

• Columns: source term (American English), target term (Chinese), part of speech, term status

• Manage essential glossary data in a standardized format.

• Term status provides reliability.
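The sample above can be read with a few lines of Python. This sketch splits the "#UTX" header line into metadata, uses the "#src ..." line as column names, and then keeps only entries whose term status is "approved", which is one way the term status column supports quality control. It assumes the header is always line 1 and the column line is always line 2; the function name parse_utx is hypothetical, and the UTX specification defines more header fields and rules than this sketch handles.

```python
def parse_utx(text):
    """Split UTX text into header metadata and a list of entry dictionaries.

    Assumes line 1 is the "#UTX ..." header and line 2 the "#src ..." column line.
    """
    lines = text.splitlines()
    fields = [f.strip() for f in lines[0].lstrip("#").split(";")]
    meta = {"version": fields[0], "languages": fields[1], "date": fields[2]}
    for field in fields[3:]:  # optional "key: value" fields such as license
        key, _, value = field.partition(":")
        meta[key.strip()] = value.strip()
    columns = lines[1].lstrip("#").split("\t")
    entries = []
    for line in lines[2:]:
        if not line or line.startswith("#"):
            continue
        cells = line.split("\t")
        cells += [""] * (len(columns) - len(cells))  # e.g. empty term status
        entries.append(dict(zip(columns, cells)))
    return meta, entries


SAMPLE = (
    "#UTX 1.11; en-US/zh-CN; 2014-09-25; copyright: AAMT (2012); license: CC-by 3.0\n"
    "#src\ttgt\tsrc:pos\tterm status\n"
    "dictionary administrator\t字典管理员\tnoun\tapproved\n"
    "contributor\t用语提交者\tnoun\tprovisional\n"
    "domain\t领域\tnoun\n"
    "merge\t合并\tverb\tapproved\n"
)
meta, entries = parse_utx(SAMPLE)
approved = [e["src"] for e in entries if e["term status"] == "approved"]
print(meta["license"])  # CC-by 3.0
print(approved)         # ['dictionary administrator', 'merge']
```

Entries with an empty status, such as "domain", are neither approved nor rejected here; a stricter workflow might route them to review instead.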

Page 15

Use case examples

• The Japan Patent Office has created a Chinese-Japanese glossary with 2.2 million entries (as of 2015).

• Glossary data for MT and interpretation in tourism.

Page 16

Conclusion

1. UTX can help non-experts.

2. A UTX glossary serves as a basis for a TBX glossary.

3. UTX addresses the needs of MT for non-European languages (Japanese, Korean, etc.)

Page 17

More info

• Visit http://www.aamt.info/english/utx/ for the specification and samples (free)

• Or search for “UTX glossary”

• We welcome your feedback!