
From: Michel Suignard, Microsoft Corporation

Date: February 01 2008

Subject: Research on IDN cc TLD names

Version: 1.0

Summary:

This document investigates country names which could be used in short forms to populate the Internet root zone in the form of new IDN TLDs. Unlike the two-letter codes used for the current ccTLD names, these native IDN TLDs have no authoritative source, so it is necessary to investigate various sources to research the matter and come up with names.

Disclaimer: The author is by no means implying that the names mentioned below are what the appropriate constituencies should use. It is up to members of these constituencies to determine the final names when they become available. This document is simply research and input on what those names could be.

Context

The context of the research was as follows:

- Use the shortest name which is still deemed acceptable by the community.
- Recognize that some countries and territories are multilingual by nature (India is probably the extreme case).
- Determine the Unicode encoding (the same encoding as ISO/IEC 10646) for these names.

Classifications

There are clearly two categories of results:

1. Countries and territories that use only Latin-script names to designate themselves (for example France, Belgium, and most European countries). It is not obvious that users in these countries will see a benefit in transitioning to a scheme using longer names. Typically the current TLD name is well accepted and perceived as reasonably user friendly. A list of native names has been created for reference, but is unlikely to be used.

2. Countries and territories that use only non-Latin names, or a mix of Latin and non-Latin names, to designate themselves. It is likely that the Latin names will not be used, but there is a good chance that the entity name expressed in other scripts will be used. For example, the name of the country Azerbaijan can be written as ‘Azərbaycan’, ‘Азәрбајҹан’, and ‘آذربايجان’. Should IDN TLDs be created for the ‘az’ TLD, it is likely that only ‘Азәрбајҹан’ and ‘آذربايجان’ would be considered. Users processing Latin-based content would probably prefer the ‘az’ term over the longer ‘Azərbaycan’.

Issues/Concerns:

Multiple names

Countries and territories designate themselves with text sequences which vary greatly in length. Typically the longer ones contain a reference to a political system, such as ‘Democratic Republic’. For IDN usage it seems preferable to use the shorter version, and this has been the trend in the few existing experiments. For example, the official full name of the People's Republic of China is ‘中华人民共和国’, but the name used in some IDN TLD experiments has been ‘中国’. Furthermore, many of the long names contain space characters, which is problematic in the context of IDN. Ultimately the decision is the full prerogative of the owner of the country or territorial entity.

Some countries and territories have alternate names of the same length which are used in different contexts. For example, Taiwan uses both ‘台灣’ and ‘臺灣’.

Usage of space

Many country and territory names contain space characters in all of their designations, and no shorter single-word designator seems to exist. Unless new input can be provided, it seems prudent either to replace the space character with ‘-’ or to concatenate the words. For example, ‘Bosnia and Herzegovina’ can be hyphenated as ‘Босна-и-Херцеговина’. Western Sahara may have its Arabic name concatenated, as in ‘الصحراءالغربيّة’, because it remains legible.
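The two options described above (hyphenating or concatenating) can be sketched in a few lines of Python. This is only an illustration and not part of the original research: the helper name `join_words` is hypothetical, and the Arabic string is built from the Appendix A code points for Western Sahara.

```python
def join_words(name: str, joiner: str = "-") -> str:
    """Replace runs of spaces in a name with `joiner` (use "" to concatenate)."""
    return joiner.join(name.split())


# Hyphenation, as suggested for the Cyrillic name of Bosnia and Herzegovina.
bosnia = join_words("Босна и Херцеговина")

# Concatenation, as suggested for the Arabic name of Western Sahara
# (code points taken from Appendix A).
western_sahara = join_words(
    "\u0627\u0644\u0635\u062d\u0631\u0627\u0621 "
    "\u0627\u0644\u063a\u0631\u0628\u064a\u0651\u0629",
    joiner="",
)

print(bosnia)
print(western_sahara)
```

Whether a hyphen or plain concatenation reads better depends on the script, so the choice is left as a parameter rather than fixed.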

Usage of symbols and punctuation characters

Some Latin-script native names use symbols and punctuation characters such as the grave accent (non-combining) and the apostrophe. This does not seem to occur in other scripts.

Usage of strings ending with a combining character in right-to-left text

There is at least one case where the IDNA rule that prevents a combining character from being the last character of a right-to-left label is problematic. That rule prevents the proper representation of ‘Maldives’ in Dhivehi in IDN.
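The Maldives case can be checked with Python's standard library, as a rough sketch. The Dhivehi string comes from the Appendix A code points; `encodings.idna.nameprep` is the standard library's IDNA2003 Nameprep function, and the three helper names are hypothetical.

```python
import unicodedata
from encodings.idna import nameprep

# "Maldives" in Dhivehi (Thaana script), code points from Appendix A.
DHIVEHI = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"


def is_right_to_left(label: str) -> bool:
    """True if the label contains any character with bidi class R or AL."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in label)


def ends_with_combining_mark(label: str) -> bool:
    """True if the last character has a general category starting with M."""
    return unicodedata.category(label[-1]).startswith("M")


def nameprep_accepts(label: str) -> bool:
    """True if IDNA2003 Nameprep processes the label without raising."""
    try:
        nameprep(label)
    except UnicodeError:
        return False
    return True


print(is_right_to_left(DHIVEHI))
print(ends_with_combining_mark(DHIVEHI))
print(nameprep_accepts(DHIVEHI))
```

The label is rejected because its last character is a vowel sign (a non-spacing mark) rather than a right-to-left letter.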

Zero Width Joiner (ZWJ) and Zero Width Non Joiner (ZWNJ)

These two characters were initially intended to create variant renderings of semantically identical text strings. As such, they are removed from the input string by the IDNA Nameprep process. However, they influence the display logic. Consider for example ‘Sri Lanka’ written in Sinhala, ‘ශ්‍රී ලංකා’, which uses both a space character (prohibited by IDN) and a Zero Width Joiner (ZWJ) (mapped to nothing by IDN). Removing the space gives ‘ශ්‍රීලංකා’, which is still readable, but removing the ZWJ completely modifies the appearance of the ‘Sri’ cluster and gives the following text: ‘ශ්රී ලංකා’.
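The "map to nothing" step can be reproduced with Python's `stringprep` module, whose `in_table_b1` function tests membership in the RFC 3454 table of characters commonly mapped to nothing. The sketch below is illustrative only; the helper name `map_to_nothing` is hypothetical, and the Sinhala string is built from the Appendix A code points.

```python
import stringprep

ZWJ = "\u200d"

# "Sri Lanka" in Sinhala, code points from Appendix A (contains ZWJ and a space).
sri_lanka = "\u0dc1\u0dca\u200d\u0dbb\u0dd3\u0020\u0dbd\u0d82\u0d9a\u0dcf"


def map_to_nothing(text: str) -> str:
    """Drop every character listed in stringprep table B.1."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))


without_space = sri_lanka.replace(" ", "")  # the space is not allowed in a label
mapped = map_to_nothing(without_space)      # the ZWJ silently disappears

print([f"U+{ord(ch):04X}" for ch in without_space])
print([f"U+{ord(ch):04X}" for ch in mapped])
```

The two printed lists differ only by U+200D, which is the character that controls how the ‘Sri’ cluster is rendered.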

Note: Similarly, Myanmar uses a ZWNJ in its name as encoded with Unicode 3.2, the version used by IDN. Again, removing the ZWNJ would significantly alter the appearance (by hiding a virama that should be visible), as in ‘မ္ရန္မာ’. However, a new character is being added in Unicode 5.1, U+103A MYANMAR SIGN ASAT, which removes the need for that sequence.

In some languages, such as Persian, there are words with completely different meanings whose only difference is the placement of a single ZWNJ. That means the ZWNJ cannot be ignored wherever it carries a semantic difference (for example ‘نامه‌ای’ and ‘نام‌های’, the first meaning ‘a letter’ and the second meaning ‘names of’). Fortunately this does not occur in country names.
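The Persian case can be illustrated as follows. This is a sketch under assumptions: the two strings are my spelling of the two words glossed above (‘a letter’ and ‘names of’), written with explicit code points, and `map_to_nothing` is a hypothetical helper built on the standard `stringprep` module.

```python
import stringprep

ZWNJ = "\u200c"

# Assumed spellings of the two Persian words; they differ only in ZWNJ position.
a_letter = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06cc"
names_of = "\u0646\u0627\u0645" + ZWNJ + "\u0647\u0627\u06cc"


def map_to_nothing(text: str) -> str:
    """Drop every character listed in stringprep table B.1 (includes ZWNJ)."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))


print(a_letter != names_of)
print(map_to_nothing(a_letter) == map_to_nothing(names_of))
```

Two distinct words collapse into one string once the ZWNJ is mapped to nothing, which is why the character cannot simply be ignored in such languages.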

Usage of in-script confusable characters

There is at least one country whose name can be represented by two different sequences of code points which are visually confusable. Iran can be written in Persian as ‘ایران’ or in Arabic as ‘ايران’. They look identical, but the first name uses the Arabic Letter Farsi Yeh (U+06CC ی), while the second name uses the Arabic Letter Yeh (U+064A ي). The two letters are identical in their initial and medial forms, but differ in their other forms (for example in the isolated forms shown in parentheses above).
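Because the two spellings cannot be told apart by eye, a code point comparison is the only reliable check. The sketch below uses the Appendix A code points for Iran; the helper name `differing_code_points` is hypothetical.

```python
import unicodedata

# Both spellings of "Iran", code points from Appendix A.
with_farsi_yeh = "\u0627\u06cc\u0631\u0627\u0646"
with_arabic_yeh = "\u0627\u064a\u0631\u0627\u0646"


def differing_code_points(a: str, b: str):
    """List (index, code point in a, code point in b) where the strings differ."""
    return [
        (i, f"U+{ord(x):04X}", f"U+{ord(y):04X}")
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


for index, left, right in differing_code_points(with_farsi_yeh, with_arabic_yeh):
    print(index, left, right)
print(unicodedata.name("\u06cc"))
print(unicodedata.name("\u064a"))
```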


Remapped characters

Some countries or territories may use in their name a character which is remapped to a different character beyond the upper-to-lower case folding. For example, because the final sigma ς cannot be used in IDN, Cyprus has to be represented as ‘κύπροσ’, transforming the final sigma into a regular sigma σ.
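The final sigma remapping can be observed with the IDNA2003 Nameprep function that ships in Python's standard library (`encodings.idna.nameprep`). This is a small sketch, with the Greek name taken from the Appendix A code points for Cyprus.

```python
from encodings.idna import nameprep

# "Cyprus" in Greek, code points from Appendix A (ends with final sigma U+03C2).
cyprus_input = "\u039a\u03cd\u03c0\u03c1\u03bf\u03c2"

# Nameprep lowercases the capital kappa and remaps the final sigma.
cyprus_output = nameprep(cyprus_input)

print(f"U+{ord(cyprus_input[-1]):04X} -> U+{ord(cyprus_output[-1]):04X}")
```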

Input/Output

The tables below show input and output columns for native names. The input name is a string value accepted by the IDNA Nameprep process and may contain characters, such as capital letters, which do not end up in the registry (remember that a Punycode transform is applied in all cases). The output name is the outcome of the Nameprep process (without the Punycode transformation). A possible update to IDN is being considered that would support only the output name and greatly simplify the pre-processing at the protocol level. These considerations are beyond the scope of this document.
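The relation between the input name, the output name, and the Punycode form can be sketched with Python's built-in `idna` codec, which implements the IDNA2003 rules. The example uses the Bulgarian name from the table; the variable names are illustrative only.

```python
from encodings.idna import nameprep

input_name = "България"              # input column: may contain capital letters
output_name = nameprep(input_name)   # output column: result of Nameprep

# The registry stores the ASCII-compatible (Punycode) form of the output name.
ace_label = input_name.encode("idna")

print(output_name)
print(ace_label)
print(ace_label.decode("idna"))
```

Decoding the ASCII-compatible label yields the output name, not the original input, which is why the capitalized form never reaches the registry.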

Table

The following table shows IDN names for ccTLDs that have at least one non-Latin-script native name. It is assumed that the native name using the Latin script would not be used. For example, for Cyprus, the ccTLD name in Turkish, English, or other Latin-based contexts should probably be ‘cy’ (instead of ‘Kıbrıs’, ‘Cyprus’, etc.), while in a Greek context it would be ‘κύπροσ’. The Latin names are provided in the more detailed appendices.

| CC | English name | Native name (Input) | Native name (Output) | Language |
|---|---|---|---|---|
| .ae | United Arab Emirates | الامارات | الامارات | Arabic |
| .af | Afghanistan | افغانستان | افغانستان | Arabic |
| .am | Armenia | Հայաստան | հայաստան | Armenian |
| .az | Azerbaijan | Азәрбајҹан | азәрбајҹан | Cyrillic |
| | | آذربايجان | آذربايجان | Arabic |
| .ba | Bosnia and Herzegovina | Босна-и-Херцеговина | босна-и-херцеговина | Cyrillic |
| .bd | Bangladesh | বাংলাদেশ | বাংলাদেশ | Bengali |
| .bg | Bulgaria | България | българия | Cyrillic |
| .bh | Bahrain | البحرين | البحرين | Arabic |
| .bn | Brunei Darussalam | بروني | بروني | Arabic |
| .bt | Bhutan | འབྲུག་ཡུལ | འབྲུག་ཡུལ | Tibetan |
| .by | Belarus | Беларусь | беларусь | Cyrillic |
| .ca | Canada | ᑲᓇᑕ | ᑲᓇᑕ | Inuktitut (Syllabic) |
| .cn | China | 中国 | 中国 | China (Mandarin) |
| .cy | Cyprus | Κύπρος | κύπροσ | Greek, Turkish uses ‘cy’ |
| .dj | Djibouti | جيبوتي | جيبوتي | Arabic |
| .dz | Algeria | الجزائر | الجزائر | Arabic |
| .eg | Egypt | مصر | مصر | Arabic |
| .eh | Western Sahara | الصحراءالغربيّة | الصحراءالغربيّة | Arabic |
| .er | Eritrea | ኤርትራ | ኤርትራ | Amharic (Ethiopic) |
| | | إريتريا | إريتريا | Arabic |
| .et | Ethiopia | ኢትዮጵያ | ኢትዮጵያ | Amharic (Ethiopic) |
| .ge | Georgia | საქართველო | საქართველო | Georgian |
| .gr | Greece | Ελλάδα | ελλάδα | Greek |
| .hk | Hong Kong | 香港 | 香港 | Chinese |
| .il | Israel | ישראל | ישראל | Hebrew |
| | | إسرائيل | إسرائيل | Arabic |
| .in | India | भारत | भारत | Hindi, Marathi, Konkani (Devanagari) |
| | | ভারত | ভারত | Bengali |
| | | ਭਾਰਤ | ਭਾਰਤ | Punjabi (Gurmukhi) |
| | | ભારત | ભારત | Gujarati |
| | | இந்தியா | இந்தியா | Tamil |
| | | భారత | భారత | Telugu |
| | | ಭಾರತ | ಭಾರತ | Kannada |
| | | ഭാരത | ഭാരത | Malayalam |
| | | भारतम् | भारतम् | Sanskrit (Devanagari) |
| | | ଭାରତ | ଭାରତ | Oriya |
| | | ভাৰত | ভাৰত | Assamese (Bengali) |
| | | بهارت | بهارت | Sindhi, Urdu (Arabic) |
| .iq | Iraq | العراق | العراق | Arabic |
| .ir | Iran, Islamic Republic of | ایران | ایران | Persian (Arabic with Farsi Yeh) |
| | | ايران | ايران | Persian (Arabic with Arabic Yeh) |
| .jo | Jordan | الأردن | الأردن | Arabic |
| .jp | Japan | 日本 | 日本 | Japanese (Han) |
| .kg | Kyrgyzstan | Кыргызстан | кыргызстан | Cyrillic |
| .kh | Cambodia | កម្ពុជា | កម្ពុជា | Khmer |
| .km | Comoros | القمر | القمر | Arabic |
| .kp | Korea, DPR | 조선 | 조선 | Hangul |
| .kr | Korea, ROK | 한국 | 한국 | Hangul |
| .kw | Kuwait | الكويت | الكويت | Arabic |
| .kz | Kazakhstan | Қазақстан | қазақстан | Kazakh (Cyrillic) |
| .la | Lao PDR | ລາວ | ລາວ | Lao |
| .lb | Lebanon | لبنان | لبنان | Arabic |
| .lk | Sri Lanka | ශ්‍රී ලංකා | ශ්රීලංකා (as of IDNA2003) | Sinhala |
| | | இலங்கை | இலங்கை | Tamil |
| .ly | Libyan Arab Jamahiriya | ليبيا | ليبيا | Arabic |
| .ma | Morocco | المغرب | المغرب | Arabic |
| .me | Montenegro | Црна-Гора | црна-гора | Cyrillic |
| .mk | Macedonia, FYROM | Македонија | македонија | Cyrillic |
| .mm | Myanmar | မ္ရန္‌မာ | မ္ရန္မာ (as of IDNA2003) | Myanmar |
| .mn | Mongolia | Монгол-улс | монгол-улс | Cyrillic (hyphen in name) |
| | | ᠮᠣᠨᠭᠣᠯ | ᠮᠣᠨᠭᠣᠯ | Mongolian |
| .mo | Macau | 澳門 | 澳門 | Chinese |
| .mr | Mauritania | موريتانية | موريتانية | Arabic |
| .mv | Maldives | ދިވެހި | N/A (as of IDNA2003) | Divehi (Thaana) |
| .np | Nepal | नेपाल | नेपाल | Nepali (Devanagari) |
| .om | Oman | عمان | عمان | Arabic |
| .pk | Pakistan | پاکستان | پاکستان | Urdu (Arabic) |
| .ps | Palestinian Territories, Occupied | فلسطين | فلسطين | Arabic |
| .qa | Qatar | قطر | قطر | Arabic |
| .rs | Serbia | Србија | србија | Cyrillic |
| .ru | Russian Federation | Россия | россия | Cyrillic |
| .sa | Saudi Arabia | السعودية | السعودية | Arabic |
| .sd | Sudan | السودان | السودان | Arabic |
| .sg | Singapore | 新加坡 | 新加坡 | Chinese |
| | | சிங்கப்பூர் | சிங்கப்பூர் | Tamil |
| .sy | Syrian Arab Republic | سوريا | سوريا | Arabic |
| .td | Chad | تشاد | تشاد | Arabic |
| .th | Thailand | ประเทศไทย | ประเทศไทย | Thai |
| .tj | Tajikistan | Тоҷикистон | тоҷикистон | Tajik (Cyrillic) |
| .tm | Turkmenistan | Туркменистан | туркменистан | Cyrillic |
| .tn | Tunisia | تونس | تونس | Arabic |
| .tw | Taiwan | 台灣 | 台灣 | Modern Mandarin |
| | | 臺灣 | 臺灣 | Traditional Mandarin |
| .ua | Ukraine | Україна | україна | Ukrainian (Cyrillic) |
| | | Украина | украина | Russian (Cyrillic) |
| .uz | Uzbekistan | Ўзбекистон | ўзбекистон | Cyrillic |
| .ye | Yemen | اليمن | اليمن | Arabic |

Appendices A and B contain information for all ccTLDs to date, including Unicode encoding, Latin names, long names, and alternate names. The choice of the alternate name selected in the previous table is only tentative.

Finally, because the document uses real fonts to display text sequences, there are a few cases where the shaping is not done correctly (such as Myanmar). However, the Unicode code sequences are always provided (except for common ASCII text).
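The appendices write each code point as `\x` followed by four hexadecimal digits, mixed with plain ASCII where applicable. A short Python helper can turn that notation back into text; the function name `decode_appendix_notation` is hypothetical and the sketch assumes exactly four hex digits per code point, as used throughout the appendices.

```python
import re

# One code point in the appendix notation: backslash, "x", four hex digits.
_CODE_POINT = re.compile(r"\\x([0-9A-Fa-f]{4})")


def decode_appendix_notation(text: str) -> str:
    """Replace every \\xHHHH sequence with the character it denotes."""
    return _CODE_POINT.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_appendix_notation(r"\x4E2D\x56FD"))      # short name of China
print(decode_appendix_notation(r"Az\x0259rbaycan"))   # ASCII mixed with one code point
```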

The names of countries and territories are based on the IANA database entries referenced below.

References:

http://en.wikipedia.org/wiki/List_of_countries_by_native_names

http://www.omniglot.com/countries/

http://www.iana.org/cctld/cctld-whois.htm


Appendix A: Countries and Territories with at least one non-Latin name

.ae United Arab

Emirates الامارات

الإمارات العربية المتحدة

\x0627\x0644\x0627\x0645\x0627\x0631\x0627\x062A

\x0627\x0644\x0625\x0645\x0627\x0631\x0627\x062a\x0020

\x0627\x0644\x0639\x0631\x0628\x064a\x0629\x0020

\x0627\x0644\x0645\x062a\x062d\x062f\x0629

Arabic (short name)

Arabic (long name with spaces)

.af Afghanistan افغانستان \x0627\x0641\x063a\x0627\x0646\x0633\x062a\x0627\x0646 Arabic (Same name in Pashto and Dari)

.am Armenia Հայաստան \x0540\x0561\x0575\x0561\x057d\x057f\x0561\x0576 Armenian

.az Azerbaijan Azərbaycan

Азəрбајҹан

آذربايجان

Az\x0259rbaycan

\x0410\x0437\x04d9\x0440\x0431\x0430\x0458\x04b9\x0430\x043d

\x0622\x0630\x0631\x0628\x0627\x064A\x062C\x0627\x0646

Azeri (Latin)

Cyrillic

Arabic

.ba Bosnia and

Herzegovina Bosna I Hercegovina

Босна и Херцеговина

\x0411\x043e\x0441\x043d\x0430\x0020\x0438\x0020

\x0425\x0435\x0440\x0446\x0435\x0433\x043e\x0432

\x0438\x043d\x0430

Latin (Croatian, Bosnian, Serbian), (space in name)

Cyrillic (Bosnian, Serbian), (space in name)

.bd Bangladesh বাংলাদেশ

বাংলা \x09AC\x09BE\x0982\x09B2\x09BE\x09A6\x09C7\x09B6

\x09ac\x09be\x0982\x09b2\x09be

Bengali

Bengali (alternate)

.bg Bulgaria България \x0411\x044a\x043b\x0433\x0430\x0440\x0438\x044f Cyrillic

.bh Bahrain البحرين \x0627\x0644\x0628\x062d\x0631\x064a\x0646 Arabic

.bn Brunei

Darussalam Brunei

Brunei Darussalam

بروني

بروني دارالسلام

\x0628\x0631\x0648\x0646\x064A

\x0628\x0631\x0648\x0646\x064A\x0020\x062F\x0627

\x0631\x0627\x0644\x0633\x0644\x0627\x0645

Latin

Latin (long name with space)

Arabic

Arabic (long name with space)

.bt Bhutan འབྲུག་ཡུལ།

འབྲུག་ཡུལ

\x0F60\x0F56\x0FB2\x0F74\x0F42\x0F0B\x0F61\x0F74\x0F63\x0F0D

\x0F60\x0F56\x0FB2\x0F74\x0F42\x0F0B\x0F61\x0F74\x0F63

Tibetan (with ending ‘mark shad’)

Tibetan (without the ‘mark shad’)

.by Belarus Беларусь

Biełaruś

\x0411\x0435\x043b\x0430\x0440\x0443\x0441\x044c

Bie\x0142aru\x015B

Cyrillic

Latin


.ca Canada Canada

Kanata

ᑲᓇᑕ

\x1472\x14c7\x1455

English/French (Latin)

Inuktitut (Latin)

Inuktitut (Syllabic)

.cn China 中国

中华人民共和国

\x4E2D\x56FD

\x4e2d\x534e\x4eba\x6c11\x5171\x548c\x56fd

China (Mandarin)

People Republic of China (Mandarin)

.cy Cyprus Κύπρος

Kıbrıs

\x039A\x03CD\x03C0\x03C1\x03BF\x03C2

K\x0131br\x0131s

Greek (Greek, also w/o accent)

Turkish (Latin)

.dj Djibouti Djibouti

جيبوتي

\x062C\x064A\x0628\x0648\x062A\x064A

Latin

Arabic

.dz Algeria الجزائر

Djazaïr

\x0627\x0644\x062c\x0632\x0627\x0626\x0631

Djaza\x00EFr

Arabic

Tamazight (Latin)

.eg Egypt مصر \x0645\x0635\x0631 Arabic

.eh Western Sahara الصحراء الغربيّة

\x0627\x0644\x0635\x062D\x0631\x0627\x0621\x0020\x0627\x0644\x063A\x0631\x0628\x064A\x0651\x0629

Arabic (space in name)

.er Eritrea Ertra

ኤርትራ

إريتريا

\x12A4\x122D\x1275\x122B

\x0625\x0631\x064A\x062A\x0631\x064A\x0627

Tigrinya (Latin)

Amharic (Ethiopic)

Arabic

.et Ethiopia ኢትዮጵያ \x12A2\x1275\x12EE\x1335\x12EB Amharic (Ethiopic)

.ge Georgia საქართველო \x10e1\x10d0\x10e5\x10d0\x10e0\x10d7\x10d5\x10d4\x10da\x10dd Georgian

.gr Greece Ελλάδα

Έλλας

\x0395\x03bb\x03bb\x03ac\x03b4\x03b1

\x0388\x03BB\x03BB\x03B1\x03C2

Greek

Greek (alternate)

.hk Hong Kong 香港

香港特別行政區

\x9999\x6e2f

\x9999\x6e2f\x7279\x5225\x884c\x653f\x5340

Chinese

Chinese (full name)

.il Israel ישראל

إسرائيل

\x05d9\x05e9\x05e8\x05d0\x05dc

\x0625\x0633\x0631\x0627\x0626\x064A\x0644

Hebrew

Arabic


.in India भारत

ভারত ਭਾਰਤ ભારત இந்தியா భారత భారత దేశం ಭಾರತ ഭാരത ഭാരതം भारतम् India

ଭାରତ ভাৰত بهارت

\x092d\x093e\x0930\x0924

\x09ad\x09be\x09b0\x09a4

\x0a2d\x0a3e\x0a30\x0a24

\x0aad\x0abe\x0ab0\x0aa4

\x0b87\x0ba8\x0bcd\x0ba4\x0bbf\x0baf\x0bbe

\x0C2D\x0C3E\x0C30\x0C24

\x0c2d\x0c3e\x0c30\x0c24\x0020\x0c26\x0c47\x0c36\x0c02

\x0cad\x0cbe\x0cb0\x0ca4

\x0d2d\x0d3e\x0d30\x0d24

\x0d2d\x0d3e\x0d30\x0d24\x0d02

\x092d\x093e\x0930\x0924\x092e\x094d

\x0b2d\x0b3e\x0b30\x0b24

\x09ad\x09be\x09f0\x09a4

\x0628\x0647\x0627\x0631\x062A

Hindi, Marathi, Konkani (Devanagari)

Bengali

Punjabi (Gurmukhi)

Gujarati

Tamil

Telugu

Telugu (alternate with space)

Kannada

Malayalam

Malayalam (alternate with anusvara sign)

Sanskrit (Devanagari)

English

Oriya

Assamese (Bengali)

Sindhi, Urdu (Arabic)

.iq Iraq العراق \x0627\x0644\x0639\x0631\x0627\x0642 Arabic

.ir Iran, Islamic

Republic of ایران

ايران

\x0627\x06cc\x0631\x0627\x0646

\x0627\x064a\x0631\x0627\x0646

Persian (Arabic with the Farsi Yeh)

Persian (Arabic with the Arabic Yeh)

.jo Jordan الأردن \x0627\x0644\x0623\x0631\x062f\x0646 Arabic

.jp Japan 日本 \x65e5\x672c Japanese (Han)

.kg Kyrgyzstan Кыргызстан \x041a\x044b\x0440\x0433\x044b\x0437\x0441\x0442\x0430\x043d Cyrillic

.kh Cambodia កម្ពុជា Kambucā

Kâmpŭchea

\x1780\x1798\x17D2\x1796\x17BB\x1787\x17B6

Kambuc\x0101

K\x00E2mp\x016Dchea

Khmer

Latin

Latin (alternate)

.km Comoros Comores

Komori

القمر

\x0627\x0644\x0642\x0645\x0631

French (Latin)

Comoro (Latin)

Arabic


.kp Korea,

Democratic

People’s

Republic

조선 \xC870\xC120 Hangul

.kr Korea,

Republic of 한국 대한민국

\xD55C\xAD6D

\xb300\xd55c\xbbfc\xad6d

Hangul

Hangul (longer name)

.kw Kuwait الكويت \x0627\x0644\x0643\x0648\x064a\x062a Arabic

.kz Kazakhstan Қазақстан \x049a\x0430\x0437\x0430\x049b\x0441\x0442\x0430\x043d Kazakh (Cyrillic)

.la Lao People’s

Democratic

Republic

ລາວ ສ.ປ.ປ. ລາວ ນລາວ

\x0EA5\x0EB2\x0EA7

\x0eaa\x002e\x0e9b\x002e\x0e9b\x002e\x0020\x0ea5\x0eb2\x0ea7

\x0E99\x0EA5\x0EB2\x0EA7

Lao

Lao (space and dots in name)

Lao (alternate from Wikipedia)

.lb Lebanon لبنان \x0644\x0628\x0646\x0627\x0646 Arabic

.lk Sri Lanka ශ්‍රී ලංකා இலங்கை

\x0dc1\x0dca\x200d\x0dbb\x0dd3\x0020\x0dbd\x0d82\x0d9a\x0dcf

\x0B87\x0BB2\x0B99\x0BCD\x0B95\x0BC8

Sinhala (ZWJ, space in name)

Tamil

.ly Libyan Arab

Jamahiriya ليبيا \x0644\x064a\x0628\x064a\x0627 Arabic

.ma Morocco المغرب

المملكة المغربية \x0627\x0644\x0645\x063A\x0631\x0628

\x0627\x0644\x0645\x0645\x0644\x0643\x0629\x0020

\x0627\x0644\x0645\x063a\x0631\x0628\x064a\x0629

Arabic

Arabic (alternate with space in name)

.me Montenegro Crna Gora

Црна Гора

Crna\x0020Gora

\x0426\x0440\x043d\x0430\x0020\x0413\x043e\x0440\x0430

Latin (spaces in name)

Cyrillic (spaces in name)

.mk Macedonia,

The Former

Yugoslav

Republic of

Македонија \x041c\x0430\x043a\x0435\x0434\x043e\x043d\x0438\x0458\x0430 Cyrillic

.mm Myanmar မ္ရန္‌မာ

\x1019\x1039\x101B\x1014\x1039\x200C\x1019\x102C (Unicode 3.2, with ZWNJ)

\x1019\x1039\x101B\x1014\x103A\x1019\x102C (Unicode 5.1, ZWNJ no longer needed)

Myanmar

.mn Mongolia Монгол улс

ᠮᠣᠨᠭᠣᠯ \x041c\x043e\x043d\x0433\x043e\x043b\x0020\x0443\x043b\x0441

\x182E\x1823\x1828\x182D\x1823\x182F

Cyrillic (space in name)

Mongolian


.mo Macau 澳門 澳門特別行政區 Macau

\x6fb3\x9580

\x6fb3\x9580\x7279\x5225\x884c\x653f\x5340

Chinese

Chinese (alternate)

Latin

.mr Mauritania Mauritania

موريتانية

\x0645\x0648\x0631\x064A\x062A\x0627\x0646\x064A\x0629

Latin

Arabic

.mv Maldives ދިވެހި

ދިވެހި ރާއްޖެ

\x078B\x07A8\x0788\x07AC\x0780\x07A8

\x078b\x07a8\x0788\x07ac\x0780\x07a8\x0020

\x0783\x07a7\x0787\x07b0\x0796\x07ac

Divehi (Thaana)

Divehi (Thaana with space in name)

.np Nepal नेपाल \x0928\x0947\x092a\x093e\x0932 Nepali (Devanagari)

.om Oman عمان \x0639\x0645\x0627\x0646 Arabic

.pk Pakistan پاکستان \x067e\x0627\x06a9\x0633\x062a\x0627\x0646 Urdu (Arabic)

.ps Palestinian

Territories فلسطين \x0641\x0644\x0633\x0637\x064A\x0646 Arabic

.qa Qatar قطر \x0642\x0637\x0631 Arabic

.rs Serbia Srbija

Србија

Srbija

\x0421\x0440\x0431\x0438\x0458\x0430

Latin

Cyrillic

.ru Russian

Federation Россия \x0420\x043e\x0441\x0441\x0438\x044f Cyrillic

.sa Saudi Arabia السعودية

المملكة العربية السعودية

\x0627\x0644\x0633\x0639\x0648\x062F\x064A\x0629

\x0627\x0644\x0645\x0645\x0644\x0643\x0629\x0020\x0627\x0644

\x0639\x0631\x0628\x064a\x0629\x0020\x0627\x0644\x0633\x0639

\x0648\x062f\x064a\x0629

Arabic (short name)

Arabic (long name with spaces)

.sd Sudan السودان \x0627\x0644\x0633\x0648\x062F\x0627\x0646 Arabic


.sg Singapore 新加坡 Singapore

Singapura

சிங்கப்பூர்

\x65b0\x52a0\x5761

\x0B9A\x0BBF\x0B99\x0BCD\x0B95\x0BAA\x0BCD\x0BAA

\x0BC2\x0BB0\x0BCD

Chinese

English (Latin)

Malay (Latin)

Tamil

.sy Syrian Arab

Republic سوريا \x0633\x0648\x0631\x064a\x0627 Arabic

.td Chad Tchad

تشاد

\x062A\x0634\x0627\x062F

Latin

Arabic

.th Thailand ประเทศไทย ไทย

\x0E1B\x0E23\x0E30\x0E40\x0E17\x0E28\x0E44\x0E17\x0E22

\x0e44\x0e17\x0e22

Thai

Thai (alternate shorter form)

.tj Tajikistan Тоҷикистон \x0422\x043e\x04b7\x0438\x043a\x0438\x0441\x0442\x043e\x043d Tajik (Cyrillic)

.tm Turkmenistan Türkmenistan

Туркменистан

T\x00Fcrkmenistan

\x0422\x0443\x0440\x043A\x043C\x0435\x043D\x0438\x0441

\x0442\x0430\x043D

Latin

Cyrillic

.tn Tunisia تونس \x062a\x0648\x0646\x0633 Arabic

.tw Taiwan 台灣 臺灣

\x53f0\x7063

\x81FA\x7063

Modern Mandarin

Traditional Mandarin

.ua Ukraine Україна

Украина

\x0423\x043a\x0440\x0430\x0457\x043d\x0430

\x0423\x043A\x0440\x0430\x0438\x043D\x0430

Ukrainian (Cyrillic)

Russian (Cyrillic)


.uz Uzbekistan O`zbekiston

O’zbekiston

Özbekistân

Ŭzbekiston

U’zbekiston Respublikasi

Ўзбекистон

Ўзбекистон Республикаси

O\x0060zbekiston

O\x0027zbekiston

\x00D6zbekistân

\x016Czbekiston

U\x0027zbekiston Respublikasi

\x040e\x0437\x0431\x0435\x043a\x0438\x0441\x0442\x043e\x043d

\x040e\x0437\x0431\x0435\x043a\x0438\x0441\x0442\x043e

\x043d\x0020\x0420\x0435\x0441\x043f\x0443\x0431\x043b

\x0438\x043a\x0430\x0441\x0438

Latin (grave accent in name)

Latin (apostrophe in name)

Latin

Latin

Latin (space, apostrophe in name)

Cyrillic

Cyrillic (space in name)

.ye Yemen اليمن \x0627\x0644\x064a\x0645\x0646 Arabic

Appendix B: Countries and Territories using only Latin-based names

(Note that two names listed here, ‘su’ and ‘yu’, are being phased out, although they could have had non-Latin names.)

.ac Ascension

Island Ascension Island

.ad Andorra Andorra

.ag Antigua and

Barbuda Antigua and

Barbuda

.ai Anguilla Anguilla

.al Albania Shqipëria Shqip\x00Ebria

.an Netherlands

Antilles Netherlands Antilles

Nederlandse Antillen

English

Dutch

.ao Angola Angola

.aq Antarctica Antarctica

.ar Argentina Argentina

.as American

Samoa American

Samoa


.at Austria Österreich \x00D6sterreich

.au Australia Australia

.aw Aruba Aruba

.ax Aland Islands Åland

Ahvenanmaa

\x00C5land Swedish

Finnish

.bb Barbados Barbados

.be Belgium Belgique

België

Belgien

Belgi\x00EB

French

Dutch

German

.bf Burkina Faso Burkina Faso

.bi Burundi Burundi

Uburundi

English

Kirundi

.bj Benin Bénin B\x00E9nin

.bl Saint

Barthelemy Saint Barthélemy

Saint Barthelemy

French

English

.bm Bermuda Bermuda

.bo Bolivia Bolivia Bolivia Suyu in Quechua

.br Brazil Brasil

.bs Bahamas Bahamas

.bv Bouvet Island Bouvet Island

Bouvetøya

Bouvet\x00F8ya

English

Norwegian

.bw Botswana Botswana

.bz Belize Belize

.cc Cocos

(Keeling)

Islands

Cocos (Keeling) Islands

.cd Congo, The

Democratic

Republic of the

Congo Kinshasa (ex Zaire)


.cf Central

African

Republic

République Centrafricaine

Centrafrique

Bê-Afrîka

Ködörösêse tî Bêafrîka

B\x00EA\x002DAfr\x00EEka

Sango

.cg Congo,

Republic of Congo Brazzaville

.ch Switzerland Schweiz

Suisse

Svizzera

Svizra

German

French

Italian

Romansh

.ci Cote d'Ivoire Côte d'Ivoire C\x00F4te\x0020d\x0027Ivoire

.ck Cook Islands Cook Islands

.cl Chile Chile

.cm Cameroon Cameroun

Cameroon

.co Colombia Colombia

.cr Costa Rica Costa Rica Space in name

.cu Cuba Cuba

.cv Cape Verde Cabo Verde

.cx Christmas

Island Christmas Island

.cz Czech

Republic Česká republika \x010Cesk\x00E1\x0020republika Space in name

.de Germany Deutschland

Němska

Nimska

N\x011Bmska

German

Upper Sorbian

Lower Sorbian

.dk Denmark Danmark

.dm Dominica Dominica

.do Dominican

Republic República

Dominicana


.ec Ecuador Ecuador Ecuador Suyu in Quechua

.ee Estonia Eesti

.es Spain España

Espainia

Espanya

Espa\x00F1a Spanish/Galician

Basque

Catalan

.eu European

Union European Union

.fi Finland Suomi

Finland

Suopma

Lää´ddjânnam

Suomâ

L\x00E4\x00E4\x00B4ddj\x00E2nnam

Suom\x00E2

Finnish

Swedish

Sami (Northern)

Sami (Skolt), (spacing acute in name)

Sami (Inari)

.fj Fiji Viti

Fiji

Fijian

English

.fk Falkland

Islands

(Malvinas)

Falkland Islands

.fm Micronesia,

Federal State

of

Federated States of Micronesia

.fo Faroe Islands Føroyar

Færøerne

F\x00F8royar

F\x00E6r\x00F8erne

Faroe

Danish

.fr France France

Frañs

França

Frànkrisch

Fra\x00F1s

Fran\x00E7a

Frànkrisch

Français

Breton

Occitan

Alsatian

.ga Gabon Gabon

.gb United

Kingdom United Kingdom Reserved (not used)


.gd Grenada Grenada

.gf French Guiana Guyane

.gg Guernsey Guernsey

Dgèrnésy

Dg\x00E8rn\x00E9sy

.gh Ghana Ghana

.gi Gibraltar Gibraltar

.gl Greenland Grønland

Kalaallit Nunaat

Gr\x00F8nland Danish

Space in name

.gm Gambia Gambia

.gn Guinea Guinée Guin\x00E9e

.gp Guadeloupe Guadeloupe

.gq Equatorial

Guinea Guinée Équatoriale

Guinea Ecuatorial

Guin\x00E9e\x0020\x00C9quatoriale Space in name

.gs South Georgia

and the South

Sandwich

Islands

South Georgia and the South Sandwich Islands

.gt Guatemala Guatemala

.gu Guam Guåhan

Guam

Gu\x00E5han Chamorro

English

.gw Guinea-Bissau Guiné-Bissau Guin\x00E9\x002DBissau

.gy Guyana Guyana

.hm Heard and

McDonald

Islands

Heard and McDonald Islands

.hn Honduras Honduras

.hr Croatia/Hrvatska Hrvatska

.ht Haiti Haïti

Ayiti

Ha\x00EFti

.hu Hungary Magyarország Magyarorsz\x00E1g


.id Indonesia Indonesia

.ie Ireland Eire

Éire

Éirinn

\x00C9ire

\x00C9irinn

English

Gaelic

.im Isle of Man Mannin

Ellan Vannin

.io British Indian

Ocean

Territory

British Indian Ocean Territory Diego Garcia

.is Iceland Ísland \x00CDsland

.it Italy Italia

.je Jersey Jersey

Jèrri

J\x00E8rri

.jm Jamaica Jamaica

.ke Kenya Kenya

.ki Kiribati Kiribati

.kn Saint Kitts and

Nevis Saint Kitts

and Nevis

.ky Cayman

Islands Cayman Islands

.lc Saint Lucia Saint Lucia

.li Liechtenstein Liechtenstein

.lr Liberia Liberia

.ls Lesotho Lesotho

.lt Lithuania Lietuva


.lu Luxembourg Luxembourg

Luxemburg

Lëtzebuerg

L\x00EBtzebuerg

French, Luxembourgish

German

Luxembourgian

.lv Latvia Latvija

.mc Monaco Principauté de Monaco

Monaco

Principaut\x00E9\x0020de\x0020Monaco (space in name)

.md Moldova, Republic of Moldova

.mf Saint Martin (French part) Saint Martin Space in name

.mg Madagascar Madagasikara

Madagascar

Malagasy

French

.mh Marshall Islands Marshall Islands

Aelōn̄ in M̧ajeļ

Ael\x014Dn\x0304 in M\x0327aje\x013C

.ml Mali Mali

.mp Northern Mariana Islands Northern Mariana Islands

.mq Martinique Martinique

.ms Montserrat Montserrat

.mt Malta Malta

.mu Mauritius Mauritius

Maurice

English

French

.mw Malawi Malawi

Malaŵi

Mala\x0175i

English

Chewa

.mx Mexico México M\x00E9xico

.my Malaysia Malaysia

.mz Mozambique Moçambique Mo\x00E7ambique


.na Namibia Namibia

.nc New Caledonia Nouvelle-Calédonie Nouvelle\x0020Cal\x00E9donie Space in name

.ne Niger Niger

.nf Norfolk Island Norfolk Island Space in name

.ng Nigeria Nigeria

.ni Nicaragua Nicaragua

.nl Netherlands Nederland

Nederlân

Nederl\x00E2n

Dutch

Frisian

.no Norway Norge

Noreg

Norga

Vuodna

Nöörje

N\x00F6\x00F6rje

Bokmål

Nynorsk

Sami (Northern)

Sami (Lule)

Sami (Southern)

.nr Nauru Naoero

Nauru

Nauruan

English

.nu Niue Niue

.nz New Zealand New Zealand

Aotearoa

English

Maori

.pa Panama Panamá Panam\x00E1

.pe Peru Perú Per\x00FA Peru Suyu in Quechua

.pf French Polynesia Polynésie Française Polyn\x00E9sie\x0020Fran\x00E7aise

.pg Papua New Guinea Papua New Guinea

Papua Niugini

.ph Philippines Pilipinas

Philippines

Filipino

English

.pl Poland Polska

.pm Saint Pierre and Miquelon Saint Pierre et Miquelon


.pn Pitcairn Island Pitcairn Island

.pr Puerto Rico Puerto Rico

.pt Portugal Portugal

.pw Palau Palau

Belau

English

Palauan

.py Paraguay Paraguay

Paraguái

Paragu\x00E1i

Spanish, English

Guarani

.re Reunion Island Île de la Réunion \x00CEle\x0020de\x0020la\x0020R\x00E9union

.ro Romania România Rom\x00E2nia

.rw Rwanda Rwanda

.sb Solomon Islands Solomon Islands

.sc Seychelles Sesel

Seychelles

Seselwa

English, French

.se Sweden Sverige

Ruoŧŧa

Svierik

Sveerje

Ruo\x0167\x0167a

Swedish

Sami (Northern)

Sami (Lule)

Sami (Southern)

.sh Saint Helena Saint Helena

.si Slovenia Slovenija

.sj Svalbard and Jan Mayen Islands Svalbard Jan Mayen

.sk Slovak Republic Slovenská republika

Slovensko

Slovensk\x00E1\x0020republika another source omits the accent on the a (space in name)

.sl Sierra Leone Sierra Leone

.sm San Marino San Marino


.sn Senegal Sénégal

Senegal

Sounougal

S\x00E9n\x00E9gal

French

English

.so Somalia Soomaaliya

.sr Suriname Suriname

.st Sao Tome and Principe São Tomé e Príncipe S\x00E3o\x0020Tom\x00E9\x0020e\x0020Pr\x00EDncipe

.su Soviet Union Soviet Union (phased out)

.sv El Salvador El Salvador (space in name)

.sz Swaziland Swaziland

eSwatini

kaNgwane

English

Swazi

.tc Turks and Caicos Islands Turks and Caicos Islands

.tf French Southern Territories Terres Australes et Antarctiques Françaises

.tg Togo Togo

.tk Tokelau Tokelau

.tl Timor-Leste Timor-Leste

Timor Lorosa'e

Timor Lorosa\x0027e

Portuguese (Latin Apostrophe in name)

.to Tonga Tonga

.tp East Timor East Timor Same as Timor-Leste

.tr Turkey Türkiye T\x00FCrkiye

.tt Trinidad and Tobago Trinidad y Tobago

Trinidad and Tobago

English (space in name)

.tv Tuvalu Tuvalu

.tz Tanzania Tanzania

.ug Uganda Uganda


.uk United Kingdom United Kingdom

y Deyrnas Unedig

Teyrnas Unedig

Unitit Kinrick

An Rywvaneth Unys

An Rìoghachd Aonaichte

Ríocht Aontaithe

English (space in name)

Welsh (spaces in name)

Welsh (spaces in name, less common)

Scots

Cornish

Scottish Gaelic

Irish

.um United States Minor Outlying Islands United States Minor Outlying Islands

.us United States United States

Estados Unidos

English (space in name)

Spanish (space in name)

.uy Uruguay Uruguay

.va Holy See (Vatican City State) Vaticanum

Civitas Vaticana

Città del Vaticano

Latin

Latin

Italian

.vc Saint Vincent and the Grenadines Saint Vincent and the Grenadines

.ve Venezuela Republica Bolivariana de Venezuela

Venezuela

.vg Virgin Islands, British British Virgin Islands

.vi Virgin Islands, U.S. US Virgin Islands

.vn Vietnam Viêt Nam

Việt Nam

Vi\x00ea\x0323t\x0020Nam

Vi\x1EC7t\x0020Nam

(space in name), ệ should be written using \x1ec7

Correct spelling (Wikipedia)

.vu Vanuatu Vanuatu


.wf Wallis and Futuna Islands Wallis et Futuna

.ws Samoa Samoa

.yt Mayotte Mayotte

.yu Yugoslavia Yugoslavia (phased out)

.za South Africa South Africa

Aforika Borwa

uMzantsi Afrika

iNingizimu Afrika

Suid Afrika

Afrika Borwa

English

Tswana

Xhosa

Zulu

Afrikaans

Northern Sotho

.zm Zambia Zambia

.zw Zimbabwe Zimbabwe
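
The encoding column in the table above uses a \xNNNN notation (four hexadecimal digits per code point), and the comments flag names containing spaces or apostrophes. The Python sketch below shows one possible way to turn that notation back into text, normalize it to NFC (which composes the two-code-point form given for Việt Nam into \x1EC7), and reproduce the "space in name" style of check. The function names decode_name, label_issues and to_a_label are hypothetical and introduced only for illustration. to_a_label merely lowercases the name and applies Python's built-in Punycode codec; it is not full IDNA processing and is not the procedure used to produce the table.

```python
import re
import unicodedata

# Matches the table's escape notation: a backslash, "x", and four hex digits.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{4})")


def decode_name(escaped):
    """Expand \\xNNNN escapes and normalize the result to NFC."""
    text = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    return unicodedata.normalize("NFC", text)


def label_issues(name):
    """Return the kinds of remarks the comment column makes about a name."""
    issues = []
    if " " in name:
        issues.append("space in name")
    if "'" in name:
        issues.append("apostrophe in name")
    return issues


def to_a_label(name):
    """Hypothetical helper: lowercase, then Punycode-encode non-ASCII names.

    Assumption: this skips the mapping and validity checks that real IDNA
    processing performs, so treat the output as illustrative only.
    """
    lowered = name.lower()
    if lowered.isascii():
        return lowered
    return "xn--" + lowered.encode("punycode").decode("ascii")


if __name__ == "__main__":
    samples = (r"Gu\x00E5han", r"Vi\x00ea\x0323t\x0020Nam", r"Timor Lorosa\x0027e")
    for escaped in samples:
        name = decode_name(escaped)
        print(name, label_issues(name))
```

Names for which label_issues returns a non-empty list are the ones the table marks as problematic; how such names would be shortened or joined is left to the relevant constituencies, as stated in the disclaimer.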