Software Internationalisation —Single-Byte Scripts
Guy Lacoursière, Software Globalisation Consultant
Agenda
Deliverables
Definitions
Scripts: Latin scripts, Greek, Hebrew
Cumulative testing
Sorting (optional)
References
Deliverables: English Internationalized Products
We currently support Latin 1 and Asian character sets:
  ISO-8859-1: Afrikaans, Basque, Catalan, Danish, Dutch, English, Faroese, Finnish, French, German, Icelandic, Indonesian, Italian, Norwegian, Portuguese, Spanish, Swedish
  Multibyte character sets: Japanese, traditional Chinese, simplified Chinese, Korean
Newly supported character sets:
  ISO-8859-2: Albanian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian
  ISO-8859-7/8/9: Greek, Hebrew, Turkish
Complex languages are not supported: Thai, Indic languages, Arabic
Goal: Unicode
Definitions
Script
System of characters composed of:
  Letters, syllables or ideographs (with one or more possible directions)
  Punctuation symbols
  Numbers ( 0 1 2 3 4 5 6 7 8 9 ¼ ½ ¾ )
  Other symbols ( ® $ # % & ± ° _ @ )
A language can use several scripts, and a script can serve several languages.
Character set (or code page, or coded character set)
Ordered group of characters assigned to code points.
Encoding
System defining the storage mechanism for a given character set.
Single-Byte Character Sets
Each character is stored in a single 8-bit byte.
The character set does not exceed 256 code points.
The encoding is simply the code point order: the byte value is the code point.
A given code point may stand for a different character depending on the character set.
The first 128 code points (the ASCII range) are the same in all the character sets discussed here.
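The last two points can be checked with a small sketch. It assumes Python's standard codec names for the ISO 8859 parts; the byte 0xE6 and the helper `decode_byte` are chosen only for illustration.

```python
# One byte value, several characters: decode 0xE6 under four ISO 8859 parts.
# The codec names are Python's standard aliases for these character sets.
CHARSETS = ["iso8859_1", "iso8859_2", "iso8859_7", "iso8859_8"]

def decode_byte(value, charset):
    """Return the character that a single byte value stands for in charset."""
    return bytes([value]).decode(charset)

for name in CHARSETS:
    print(name, decode_byte(0xE6, name))

# The first 128 code points decode identically in each set (the ASCII range).
ascii_range = bytes(range(128))
same_low_half = all(
    ascii_range.decode(name) == ascii_range.decode("ascii") for name in CHARSETS
)
print(same_low_half)
```

The same stored byte reads as a Latin, Greek or Hebrew letter depending only on which character set the reader assumes, which is why text must travel with its character set label.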
Latin Scripts: Latin 1 Character Set (ISO 8859-1)
Languages covered
Afrikaans, Albanian, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Scottish, Spanish, Swahili, Swedish
Notes
Uppercase and lowercase letters have separate code points even though they are two forms of the same letter.
Some letters have no uppercase.
The base characters are the same for all Latin character sets.
Base characters
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9
! " ' ( ) , . : ; ? [ ] ^ { | } ~
# $ % & ÷ × + - * / = \ < > _
Extended characters
àÀ áÁ â ãà äÄ åÅ æÆ çÇ èÈ éÉ êÊ ëË ðÐ íÍ îÎ ïÏ ñÑ òÒ óÓ ôÔ õÕ öÖ øØ ß ùÙ úÚ ûÛ üÜ ýÝ ÿ þÞ
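As an illustration of letters with no uppercase inside the set, the sketch below uppercases three Latin 1 letters and checks whether the result can still be stored in Latin 1. The helper `upper_in_latin1` is hypothetical, and Python's Unicode case mapping stands in for whatever case-conversion routine a product actually uses.

```python
def upper_in_latin1(letter):
    """Uppercase a letter and report whether the result still fits in Latin 1."""
    upper = letter.upper()
    try:
        upper.encode("iso8859_1")
        return upper, True
    except UnicodeEncodeError:
        return upper, False

for letter in ["é", "ß", "ÿ"]:
    print(letter, upper_in_latin1(letter))
```

Here ß expands to two letters and the uppercase of ÿ falls outside Latin 1, so case conversion can change a string's length or produce a character the set cannot store.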
Latin Scripts: ISO 8859-1 vs. Windows 1252
Microsoft Windows' Latin 1 character set (code page 1252) is different from ISO 8859-1.
It contains 27 extra characters in the range 0x80 to 0x9F, which ISO 8859-1 reserves for control codes. Among others:
  The euro symbol (€)
  The English curly quotes ( “ ” )
  The ellipsis (…)
  The German opening quotes ( „ )
  The bullet ( • )
  The en dash (–)
  The em dash (—)
  The French uppercase and lowercase oe ligatures (œ Œ)
  The English trademark symbol (™)
These may not display correctly on non-Latin 1 systems.
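The difference can be listed programmatically. This is a minimal sketch that relies on Python's `cp1252` and `iso8859_1` codecs; the helper name `cp1252_extras` is an assumption made for the example.

```python
def cp1252_extras():
    """Map each byte in 0x80-0x9F to the character Windows 1252 assigns to it."""
    extras = {}
    for value in range(0x80, 0xA0):
        try:
            extras[value] = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # Python's cp1252 codec treats this byte as unassigned
    return extras

extras = cp1252_extras()
print(len(extras))
print(extras[0x80], extras[0x85], extras[0x99])

# ISO 8859-1 decodes the same bytes to invisible C1 control characters.
print(repr(bytes([0x80]).decode("iso8859_1")))
```

Text typed on Windows and labelled as ISO 8859-1 is a common source of missing quotes and euro signs for exactly this reason.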
[Figure: Latin 1 (ISO 8859-1) and Windows code page 1252]
Latin Scripts: Latin 2 Character Set (ISO 8859-2)
Languages covered
Czech, Hungarian, Polish, Romanian, Croatian, Slovak, Slovenian, Sorbian
Notes
Some characters are duplicates from the Latin 1 character set.
The caron diacritic has two forms: “ ˇ ” (as in č) and “ ’ ” (as in ď).
The T with cedilla has a glyph variant (T with comma) for Romanian.
Latin 2 characters common to Latin 1 use identical code points.
Extended characters
ąĄ áÁ â ăĂ äÄ ćĆ çÇ čČ ďĎ éÉ ęĘ ëË ěĚ đĐ íÍ îÎ łŁ ľĽ ĺĹ ńŃ ňŇ óÓ ôÔ őŐ öÖ ŕŔ řŘ śŚ šŠ şŞ § ß ťŤ ţŢ ůŮ úÚ űŰ üÜ ýÝ źŹ žŽ żŻ
ISO 8859-1 vs. ISO 8859-2
[Figure: Latin 1 (ISO 8859-1) and Latin 2 (ISO 8859-2)]
All common characters have the same code points.
Characters that differ belong to separate language families (mostly West European vs. East European).
This allows a certain level of flexibility between languages.
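The claim about common characters can be verified with a short sketch. It assumes Python's codecs as the reference tables; `upper_half` and `compare` are hypothetical helpers written for this example.

```python
def upper_half(charset):
    """Map each character in 0xA0-0xFF of a charset to its code point."""
    table = {}
    for value in range(0xA0, 0x100):
        try:
            table[bytes([value]).decode(charset)] = value
        except UnicodeDecodeError:
            pass  # unassigned in this charset
    return table

def compare(charset_a, charset_b):
    """Split the characters shared by two charsets into same/moved lists."""
    a, b = upper_half(charset_a), upper_half(charset_b)
    shared = sorted(set(a) & set(b))
    same = [ch for ch in shared if a[ch] == b[ch]]
    moved = [ch for ch in shared if a[ch] != b[ch]]
    return same, moved

same, moved = compare("iso8859_1", "iso8859_2")
print(len(same), len(moved))
```

An empty `moved` list means a German or French string made only of shared characters reads the same under either character set. The same helper can be pointed at any other pair of single-byte sets.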
Latin Scripts: Latin 3 Character Set (ISO 8859-3)
Languages covered
Esperanto, Maltese
Notes
Covered Turkish before the introduction of Latin 5 in 1988.
Not supported.
Extended characters
àÀ áÁ â äÄ ċĊ ĉĈ çÇ èÈ éÉ êÊ ëË ğĞ ħĦ ĥĤ ıI iİ ìÌ íÍ îÎ ïÏ ĵĴ ñÑ òÒ óÓ ôÔ öÖ şŞ ŝŜ § ß ùÙ úÚ ûÛ üÜ ŭŬ żŻ £ ¤
Latin Scripts: Latin 4 Character Set (ISO 8859-4)
Languages covered
Estonian, Latvian, Lithuanian, Greenlandic, Lappish
Notes
Not supported.
Extended characters
ąĄ āĀ áÁ â ãà äÄ åÅ æÆ čČ ēĒ éÉ ęĘ ëË ėĖ đĐ ģĢ ĸ ķĶ ĩĨ íÍ îÎ īĪ įĮ ļĻ ņŅ ŋŊ ōŌ ôÔ õÕ öÖ øØ ŗŖ šŠ ß ŧŦ ųŲ úÚ ûÛ üÜ ũŨ ūŪ ¤ ÷
Latin Scripts: Latin 5 Character Set (ISO 8859-9)
Languages covered
Turkish
Notes
Very similar to Latin 1.
The letters ð, ý and þ from Latin 1 are replaced with Turkish letters.
Latin 5 characters common to Latin 1 use identical code points.
Issue: with Turkish case conversion, *.ini = *.İNİ and *.ını = *.INI, so *.ini ≠ *.INI and *.ını ≠ *.İNİ.
Extended characters
àÀ áÁ â ãà äÄ åÅ æÆ çÇ èÈ éÉ êÊ ëË íÍ îÎ ïÏ ñÑ òÒ óÓ ôÔ õÕ öÖ øØ ß ùÙ úÚ ûÛ üÜ ÿ
Replaced letters: ðÐ ---> ğĞ, ýÝ ---> ıİ, þÞ ---> şŞ
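The INI issue can be reproduced in a few lines. This is a sketch only: Python's built-in case conversion is locale-independent, so the Turkish rules are imitated with two hypothetical helpers (`turkish_upper`, `turkish_lower`), and `has_ini_extension` shows one possible fix, not necessarily the one used in any product.

```python
def turkish_upper(text):
    """Uppercase with Turkish rules: i -> İ and ı -> I (hypothetical helper)."""
    return text.replace("i", "İ").replace("ı", "I").upper()

def turkish_lower(text):
    """Lowercase with Turkish rules: İ -> i and I -> ı (hypothetical helper)."""
    return text.replace("İ", "i").replace("I", "ı").lower()

def has_ini_extension(filename):
    """Compare the extension with locale-independent case conversion."""
    return filename.lower().endswith(".ini")

print(turkish_upper("setup.ini"))
print(turkish_lower("SETUP.INI"))

# A comparison done after Turkish lowercasing misses the file...
print(turkish_lower("SETUP.INI").endswith(".ini"))
# ...while a locale-independent comparison finds it.
print(has_ini_extension("SETUP.INI"))
```

The general lesson is that file names, keywords and other identifiers should be compared with locale-independent case rules, and only user-visible text with the user's locale.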
Latin Scripts: Latin 6 Character Set (ISO 8859-10)
Languages covered
Nordic area: Inuit (Greenlandic Eskimo), non-Skolt Sami (Lappish), Icelandic
Notes
Similar characters to Latin 4, but with extra letters for the Nordic languages.
Latin 6 characters common to Latin 4 use different code points.
Not supported at all.
Extended characters
ąĄ āĀ áÁ â ãà äÄ åÅ æÆ čČ ēĒ éÉ ęĘ ëË ėĖ ðÐ ģĢ ĸ ķĶ ĩĨ íÍ îÎ īĪ įĮ ļĻ ņŅ ŋŊ ōŌ ôÔ õÕ öÖ øØ ŗŖ šŠ ß ŧŦ ųŲ úÚ ûÛ üÜ ũŨ ūŪ ¤ ÷
Latin Scripts: Latin 7 & 8 Character Sets (ISO 8859-13 & 14)
Languages covered
Latin 7: Baltic languages
Latin 8: Celtic languages
Notes
Similar characters to Latin 4 and 6, but with extra letters for the Nordic languages.
Latin 7 characters common to Latin 4 and 6 use different code points.
Latin 8 characters common to Latin 1 use identical code points.
Not supported.
Latin Scripts: Latin 9 Character Set (ISO 8859-15)
Languages covered
Same as Latin 1.
Notes
Eight Latin 1 code points are reassigned in Latin 9; every other character keeps its Latin 1 code point.
Less used characters are replaced:
¨ ---> š   ¦ ---> Š   ¸ ---> ž   ´ ---> Ž   ½ ---> œ   ¼ ---> Œ   ¾ ---> Ÿ   ¤ ---> €
Extended characters
àÀ áÁ â ãà äÄ åÅ æÆ çÇ èÈ éÉ êÊ ëË íÍ îÎ ïÏ ðÐ ñÑ òÒ óÓ ôÔ õÕ öÖ øØ œŒ šŠ ß ùÙ úÚ ûÛ üÜ ýÝ ÿŸ žŽ þÞ
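The replacement list can be regenerated by comparing the two sets byte by byte. A minimal sketch, assuming Python's `iso8859_1` and `iso8859_15` codecs as the reference tables; `differing_code_points` is a name invented for the example.

```python
def differing_code_points(charset_a, charset_b):
    """List (code point, char in a, char in b) wherever two charsets disagree."""
    diffs = []
    for value in range(0x100):
        a = bytes([value]).decode(charset_a)
        b = bytes([value]).decode(charset_b)
        if a != b:
            diffs.append((value, a, b))
    return diffs

latin9_changes = differing_code_points("iso8859_1", "iso8859_15")
for value, latin1, latin9 in latin9_changes:
    print(f"0x{value:02X}: {latin1} ---> {latin9}")
```

A file that uses none of these eight code points is byte-for-byte valid in both character sets.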
ISO 8859-15 vs. Windows 1252
[Figure: Latin 9 (ISO 8859-15) and Windows 1252]
Latin Scripts in... Non-Latin Character Sets!
Languages
Traditional Chinese
Simplified Chinese
Japanese (romaji or romanji)
Vietnamese
Notes
Chinese, Japanese and Korean use Latin letters for transliteration (sometimes with tone accents) and numbers.
Vietnamese uses Latin characters with diacritics.
Latin characters are also used in the transliteration of Greek, Hebrew, Russian, etc.
Some Vietnamese extended characters
đĐ ăĂ âÂ êÊ ôÔ
…with tones
Languages Covered by Latin Character Sets
Language | Character set (Latin-n)
Czech | 2
Danish | 1 4 5 6 7 8 9
Dutch | 1 5 9
English | 1 2 3 4 5 6 7 8 9
Finnish | 1 2 3 4 5 6 7 8 9
French | 1 3 5 8 9
German | 1 2 3 4 5 6 7 8 9
Hungarian | 2
Italian | 1 3 5 8 9
Norwegian | 1 2 3 4 5 6 7 8 9
Polish | 2 7
Portuguese | 1 3 5 8 9
Romanian | 2
Spanish | 1 8 9
Swedish | 1 4 5 6 7 8 9
Turkish | 3 5
Greek Script: Greek Character Set
One script, one character set, one language.
Contains modern monotonic uppercase and lowercase Greek letters, punctuation and a few accented Greek letters.
The rest is almost identical to Latin 1!
Latin 1 characters missing from the Greek set:
  Latin punctuation: ¡ ¿
  Currency symbols: ¢ ¤ ¥
  Other symbols: ® ª º × ÷ µ ¶
  Diacritics: ¸
  Numbers: ¹ ¼ ¾
Extended characters
αβγδεζηικλμν… ΑΒΓΔΖΗΘΙΚΛΝΞ…
The rest...
² ³ ½ £ ¦ § © ¬ ¯ ° ± « » · ¨
Hebrew Script: Hebrew Character Set
One script, one character set; languages: Hebrew, Yiddish
Directionality of text:
  Hebrew letters are written from right to left (RTL).
  Numbers (Arabic numerals) are written from left to right (LTR).
  Latin characters are written from left to right (LTR).
  Order of the text depends on the predominant language.
  Order of mirrored characters depends on neighboring characters.
Differences from Latin 1:
  Latin punctuation: ¡ ¿ are missing
  Currency symbol: ₪ (new sheqel) is absent
  Other symbols: ª º are missing; × ÷ have different code points
Extended characters
תשרקעסליטחזוהדגבא
Final & nominal forms (final form first):
ך - כ
ן - נ
ם - מ
ף - פ
ץ - צ
Hebrew User Interface
There are two types of Hebrew support:
  Hebrew-enabled product (supporting Hebrew characters)
  Hebrew product (translated into Hebrew)
Both types must support RTL display.
Text alignment may differ for characters, strings and documents.
Normally, the logical order (or storage order or file order) is the same as the reading order.
The display order is bi-directional and does not follow the logical order.
Hebrew User Interface: Logical vs. Visual
Input string: "Hebrew text : טסקט ילגנא "
How should it be displayed?
In a LTR document: Hebrew text : אנגלי טקסט
In a RTL document: אנגלי Hebrew text : טקסט
You get different displays depending on the main direction (script) of the document or the string.
Notice the direction of the colon.
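Why the colon moves can be seen from the directional class of each character. The sketch below uses the Unicode bidirectional classes exposed by Python's `unicodedata` module, which is an assumption about tooling: the slides concern single-byte Hebrew, but the classification of letters, digits and punctuation is the same idea. The sample string is made up for the example.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

sample = "Hebrew text : טקסט 123"
for ch, cls in bidi_classes(sample):
    print(repr(ch), cls)
```

Latin letters are strongly LTR (L) and Hebrew letters strongly RTL (R), while the colon (CS) and the space (WS) have no direction of their own and follow their neighbours or the document direction.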
Hebrew User Interface: Issues
Display of improper characters.
Display in improper order.
Display in correct order; cursor in logical position.
Mix of Hebrew and Latin text.
Alignment inside an input field.
Copy and paste.
Carriage returns inside a Hebrew or mixed string.
Cumulative Testing
Premises:
  Testing in French or German includes English issues.
  Testing of Greek includes non-Latin 1 character and font issues.
Special cases:
  Cursory testing of character and font issues per character set.
  Sorting and comparison per language.
  Hebrew: bi-directionality
  Turkish: INI files and anything related to case conversion
Total 50% Increase for ALL Languages
French or German: 100%
Greek: 15%
Hebrew: 15%
Turkish: 5%
Czech or Polish: 5%
Cursory testing: 10%
English: 0%
English coverage: 100%
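The 50% figure is the sum of the increments listed after the French or German baseline. The sketch below spells out that arithmetic; reading each percentage as a share of the baseline effort is an interpretation of the slide, not something it states.

```python
# Extra effort per item, read as a percentage of the French/German baseline
# (an interpretation of the slide).
extra_effort = {
    "Greek": 15,
    "Hebrew": 15,
    "Turkish": 5,
    "Czech or Polish": 5,
    "Cursory testing": 10,
}
baseline = 100  # French or German, which also covers English

total_increase = sum(extra_effort.values())
print(total_increase)
print(baseline + total_increase)
```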
Sorting (1)
Swedish | Danish | Trad. Spanish | Modern Spanish | French
(a, à, â, ã) | (a, à, â, ã) | (a, à, â, ä, ã, å) | (a, à, â, ä, ã, å) | (a, à, â, ä, ã, å)
(æ, ae) | | (æ, ae) | (æ, ae) | (æ, ae)
b | b | b | b | b
(c, ç) | (c, ç) | (c, ç) | (c, ç) | (c, ç)
cg < ch < ci | cg < ch < ci | ch (cz < ch < da) | cg < ch < ci | cg < ch < ci
d | d | d | d | d
(e, è, é, ê, ë) | (e, è, é, ê, ë) | (e, è, é, ê, ë) | (e, è, é, ê, ë) | (e, è, é, ê, ë)
f…h | f…h | f…h | f…h | f…h
(i, ì, í, î, ï) | (i, ì, í, î, ï) | (i, ì, í, î, ï) | (i, ì, í, î, ï) | (i, ì, í, î, ï)
j…l | j…l | j…l | j…l | j…l
lk < ll < lm | lk < ll < lm | ll (lz < ll < ma) | lk < ll < lm | lk < ll < lm
m | m | m | m | m
(n, ñ) | (n, ñ) | n | n | (n, ñ)
| | ñ | ñ |
(o, ó, ò, ô, õ) | (o, ó, ò, ô, õ) | (o, ó, ò, ô, ö, õ, ø) | (o, ó, ò, ô, ö, õ, ø) | (o, ó, ò, ô, ö, õ, ø)
p…r | p…r | p…r | p…r | p…r
(ß, ss) | (ß, ss) | (ß, ss) | (ß, ss) | (ß, ss)
t | t | t | t | t
(u, ù, ú, û, ü) | (u, ù, ú, û, ü) | (u, ù, ú, û, ü) | (u, ù, ú, û, ü) | (u, ù, ú, û, ü)
v…x | v…x | v…x | v…x | v…x
(y, ý, ÿ) | (y, ý, ÿ) | (y, ý, ÿ) | (y, ý, ÿ) | (y, ý, ÿ)
z | z | z | z | z
å | (æ, ae, ä) | | |
ä | (ø, ö) | | |
(ö, ø) | (å, aa) | | |
Sorting (2)
Sort order
The system generates a sort key based on locale-specific rules.
A sort key consists of several weighted components that represent a character’s script, diacritics, case, etc.
Simple example of French sorting:
Sorting:
  élémentaire
  élève
  Élève
  élevé
  Élevé
  élever
  e-lever
Rules:
  1) Alphanumeric base
  2) Diacritics
  3) Case
  4) Non-alphanumeric data
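A sort key of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the algorithm of any particular system: `french_sort_key` is a hypothetical helper, diacritics are compared from the end of the word (the traditional French convention, which is what places élève before élevé), lowercase sorts before uppercase, and the raw string serves as the last tie-breaker. The hyphenated word from the slide is left out of the sample.

```python
import unicodedata

def french_sort_key(word):
    """Build a multi-level key: base letters, diacritics, case, raw text."""
    base, marks, case = [], [], []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        letter = decomposed[0]
        if not letter.isalnum():
            continue  # punctuation only matters at the last level
        base.append(letter.lower())    # level 1: alphanumeric base
        marks.append(decomposed[1:])   # level 2: combining diacritics
        case.append(letter.isupper())  # level 3: lowercase before uppercase
    # Diacritics are compared starting from the end of the word.
    return ("".join(base), tuple(reversed(marks)), tuple(case), word)

words = ["élever", "Élevé", "élève", "élémentaire", "élevé", "Élève"]
ordered = sorted(words, key=french_sort_key)
print(ordered)
```

Because each level is a separate component of the key, a difference in diacritics only matters when the base letters are identical, and a difference in case only matters when the diacritics are identical too.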
References
The ISO 8859 Alphabet Soup by Roman Czyborra. An absolute classic... http://czyborra.com/charsets/iso8859.html
Character table: http://www.microsoft.com/globaldev/reference/sbcs/1250.htm
Some Internet Explorer limitations: http://sizif.mf.uni-lj.si/linux/cee/app/ie30.html#http
More of the same: http://sizif.mf.uni-lj.si/linux/cee/charset.html
On fonts (a bit specialized): http://studweb.euv-frankfurt-o.de/twardoch/f/en/index.html
ISO 8859-2 vs. Windows Central European code page (1250): http://titus.uni-frankfurt.de/unicode/iso8859/iso8859b.htm#start