This talk lasts 三十分钟
Localisation is easy
Administrative Notes
• @pilif on twitter
• pilif on github
• working at Sensational AG
• warming up to shirts
Thanks Richard for the Recording
About that 💩
Maybe ES6…?
My host name is a horrible spoiler if you're into JRPGs. Disregard
however…
close enough.
Back to the topic at hand
Let’s talk terms
• Language is a language as it is spoken or written
• Locale is the name given to a set of parameters that define how things should be done for users speaking a certain language in a certain place
• There are many more locales than countries
Locale
• Locales consist of a language…
• … and a country
• … and sometimes specific variants
Specifying locales
• IETF BCP-47 document
• See RFC 5646 and RFC 4647
• Use language-script-territory@modifier
• POSIX uses language_territory.encoding@modifier
fr-Latn-CH
fr-CH
fr_CH.utf-8
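A small sketch of how such tags can be inspected in JavaScript, assuming a runtime that ships `Intl.Locale` (which is newer than this talk). The helper `posixToBcp47` is hypothetical and only handles the simple `language_territory.encoding@modifier` shape; it drops encoding and modifier instead of mapping them.

```javascript
// Parse a BCP 47 tag into its subtags.
const tag = new Intl.Locale('fr-Latn-CH');
console.log(tag.language, tag.script, tag.region); // fr Latn CH

// Hypothetical helper: convert a simple POSIX locale name to a BCP 47 tag.
// Encoding (".utf-8") and modifier ("@...") are simply dropped in this sketch.
function posixToBcp47(posixName) {
  const base = posixName.split('@')[0].split('.')[0]; // "fr_CH"
  return new Intl.Locale(base.replace(/_/g, '-')).toString();
}

console.log(posixToBcp47('fr_CH.utf-8')); // fr-CH
```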
The Locale affects many things
Number formatting
• Probably the most obvious of the bunch.
• Decimal separator
• Thousands separator
• Sign
• Also: Currency information
Some Samples
                     de-CH  de-DE  en-US
decimal separator    .      ,      .
thousands separator  '      .      ,
12,435
en-US twelve thousand four hundred and thirty five
de-DE twelve comma four three five
de-CH error
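To see these separators without hard-coding them, one option is to ask the runtime. This is a sketch that assumes `Intl.NumberFormat` with `formatToParts` is available; `separators` is a made-up helper name. The exact character used for de-CH grouping depends on the locale data the runtime ships.

```javascript
// Extract the separators a locale uses by formatting a sample number
// and reading the typed parts instead of guessing from the string.
function separators(locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(12345.6);
  const find = (type) => (parts.find((p) => p.type === type) || {}).value;
  return { group: find('group'), decimal: find('decimal') };
}

for (const locale of ['de-CH', 'de-DE', 'en-US']) {
  console.log(locale, separators(locale), new Intl.NumberFormat(locale).format(12435));
}
```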
Date Formatting
• Obviously names of months and weekdays
• Order of distinct parts
• Separator character
• Commonly used formats in different contexts
Date Formatting
• Libraries usually provide a generic short/medium/long format
• Libraries also provide templates
• If your library’s template language has any characters that are not for replacement, they are doing it wrong
• Apple does it right since 10.11 and iOS9
2015-07-18 17:47
       Long                                  Medium                     Short
en-US  July 18, 2015 at 4:58:00 PM CEST      Jul 18, 2015, 4:58:00 PM   7/18/15, 4:58 PM
fr-CA  18 juillet 2015 16:58:00 UTC+2        18 juil. 2015 16:58:00     15-07-18 16:58
fr-CH  18 juillet 2015 16:58:00 UTC+2        18 juil. 2015 16:58:00     18.07.15 16:58
fr-FR  18 juillet 2015 16:58:00 UTC+2        18 juil. 2015 16:58:00     18/07/2015 16:58
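A comparable table can be produced with the generic long/medium/short styles of `Intl.DateTimeFormat`. This is only a sketch: it assumes a runtime that supports the `dateStyle`/`timeStyle` options, and the exact strings depend on the locale data version, so they may differ from the table above. `shortDateOrder` is a hypothetical helper.

```javascript
// Fixed instant (16:58 in Zurich) so the output does not depend on when this runs.
const when = new Date(Date.UTC(2015, 6, 18, 14, 58, 0));

for (const locale of ['en-US', 'fr-CA', 'fr-CH', 'fr-FR']) {
  for (const style of ['long', 'medium', 'short']) {
    const fmt = new Intl.DateTimeFormat(locale, {
      dateStyle: style,
      timeStyle: style,
      timeZone: 'Europe/Zurich',
    });
    console.log(locale, style, fmt.format(when));
  }
}

// The order of the parts is locale data too, not something to hard-code.
function shortDateOrder(locale) {
  return new Intl.DateTimeFormat(locale, { dateStyle: 'short', timeZone: 'UTC' })
    .formatToParts(when)
    .filter((p) => p.type !== 'literal')
    .map((p) => p.type);
}

console.log(shortDateOrder('en-US'), shortDateOrder('fr-CH'));
```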
Choice of calendar
• Most of the world is using the Gregorian calendar
• The Julian calendar uses the same month names but is off by 13 days (they have July 5th right now)
• Other calendars use different month names
• Might affect holiday calculations
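As an illustration of different calendars producing different month names for the same instant, here is a sketch using the `-u-ca-` locale extension of `Intl.DateTimeFormat`. It assumes a runtime with full locale data; `Intl` works with a fixed set of calendars, so the sketch uses the Hebrew and Persian ones rather than the Julian calendar mentioned above. `formatterFor` is a made-up helper name.

```javascript
const day = new Date(Date.UTC(2015, 6, 18, 12, 0, 0));

// Build a formatter for a given calendar via the "-u-ca-" locale extension.
function formatterFor(calendar) {
  return new Intl.DateTimeFormat(`en-US-u-ca-${calendar}`, {
    year: 'numeric',
    month: 'long',
    day: 'numeric',
    timeZone: 'UTC',
  });
}

for (const calendar of ['gregory', 'hebrew', 'persian']) {
  const fmt = formatterFor(calendar);
  // resolvedOptions() tells you which calendar the runtime actually used.
  console.log(fmt.resolvedOptions().calendar, fmt.format(day));
}
```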
Collation order
• How to compare two strings. Which one is first?
• Where to put the characters with pesky accents?
• How to deal with case differences?
• What about non-latin scripts?
Collation fun*
• Phonebook German vs. ordinary German vs. Austrian German (dealing with umlauts)
• Contractions (Spanish ch counts as one letter, ch in Czech sorts after h, but c after b, etc)
• Handling of accents is language-dependent
• Case insensitive is a mess
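To make the language dependence concrete, here is a sketch with `Intl.Collator`, assuming a runtime with full locale data. The word list is made up for illustration. The phonebook variant is requested through the `co` Unicode extension; whether it is honoured can be checked with `resolvedOptions()`.

```javascript
const words = ['Zebra', 'Äpfel', 'Apfel', 'Ofen', 'Öl'];

// Same words, different locales: German sorts ä next to a, Swedish puts ä after z.
console.log([...words].sort(new Intl.Collator('de').compare));
console.log([...words].sort(new Intl.Collator('sv').compare));

// German phonebook order is requested through the "co" Unicode extension.
const phonebook = new Intl.Collator('de-u-co-phonebk');
console.log(phonebook.resolvedOptions().collation);
```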
Case folding
• Some languages don’t differentiate between upper- and lowercase
• Inconsistent mapping between upper- and lowercase (ß => SS, the reverse is not always true)
• Uppercasing accented characters is language (and sometimes locale) dependent. French characters often lose accents when uppercasing
• Inconsistent uppercasing for some languages (uppercase Turkish i is İ. Lowercase Turkish I is ı)
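The ß and Turkish i cases can be tried directly with the built-in string methods. This is a sketch that assumes the runtime carries Turkish locale data; `roundTrips` is a hypothetical helper that checks whether upper- then lowercasing gets you back where you started.

```javascript
console.log('ß'.toUpperCase());           // "SS"
console.log('SS'.toLowerCase());          // "ss", not "ß"
console.log('i'.toUpperCase());           // "I"
console.log('i'.toLocaleUpperCase('tr')); // "İ" (dotted capital I)
console.log('I'.toLocaleLowerCase('tr')); // "ı" (dotless small i)

// Hypothetical helper: does uppercasing and lowercasing again preserve the text?
function roundTrips(text) {
  return text.toUpperCase().toLowerCase() === text;
}

console.log(roundTrips('strasse'), roundTrips('straße'));
```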
Double the fun
• Collation and Case-Folding provide an interesting team
• Depending on locale, upper- and lowercase should be sorted together or apart
• In some locales, case doesn’t matter at all when sorting
• In some locales, case always matters when sorting
• Depends on the use-case
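How case interacts with sorting can be chosen per use-case. The sketch below uses the `caseFirst` and `sensitivity` options of `Intl.Collator`; the sample data is made up, and the runtime is assumed to support both options.

```javascript
const names = ['b', 'A', 'a', 'B'];

// Case matters, uppercase variant first.
const upperFirst = new Intl.Collator('en', { caseFirst: 'upper' });
// Case matters, lowercase variant first.
const lowerFirst = new Intl.Collator('en', { caseFirst: 'lower' });
// Case does not matter at all: "a" and "A" compare as equal.
const ignoreCase = new Intl.Collator('en', { sensitivity: 'accent' });

console.log([...names].sort(upperFirst.compare));
console.log([...names].sort(lowerFirst.compare));
console.log(ignoreCase.compare('a', 'A'));
```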
Collation strength
• ICU created the concept of “collation strength”
• strength 1 is the most lenient
• strength 5 is the most exact
• Example: Strength 1 ignores accents, unless the language treats the accented character as a letter of its own (Danish å, for example)
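In JavaScript the collator does not take a numeric strength. The closest knob is the `sensitivity` option of `Intl.Collator`, which covers a similar range from lenient to exact. A sketch, with `equalAt` as a made-up helper; the Danish line is printed without a stated result because it depends on the locale data shipped with the runtime.

```javascript
// Do two strings compare as equal at a given sensitivity?
function equalAt(sensitivity, a, b, locale = 'en') {
  return new Intl.Collator(locale, { sensitivity }).compare(a, b) === 0;
}

console.log(equalAt('base', 'a', 'á'));    // true: only base letters count
console.log(equalAt('accent', 'a', 'á'));  // false: accents count
console.log(equalAt('accent', 'a', 'A'));  // true: case still ignored
console.log(equalAt('variant', 'a', 'A')); // false: case counts too

// Language-specific tailoring: try the same lenient comparison in Danish.
console.log(equalAt('base', 'a', 'å', 'da'));
```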
‘nough said
RTL
Perspectives matter
Context matters
• “This slide lasts one minute”
• “This talk lasts 30 minutes”
• “Lunch lasted 1:30 hours”
• “Tomorrow I’ll sleep in”
• “August, 1th is a national holiday”
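Getting "1 minute" vs. "30 minutes" or "1st" vs. "1th" right is a matter of plural and ordinal rules. A sketch using `Intl.PluralRules`, assuming the runtime provides it; the suffix and word tables are English-only and written by hand for this example.

```javascript
// Ordinal categories for English map to the familiar suffixes.
const ordinalRules = new Intl.PluralRules('en', { type: 'ordinal' });
const suffixes = { one: 'st', two: 'nd', few: 'rd', other: 'th' };

function ordinal(n) {
  return n + suffixes[ordinalRules.select(n)];
}

// Cardinal categories pick between singular and plural.
const cardinalRules = new Intl.PluralRules('en');
const minuteForms = { one: 'minute', other: 'minutes' };

function minutes(n) {
  return `${n} ${minuteForms[cardinalRules.select(n)]}`;
}

console.log(`August ${ordinal(1)} is a national holiday`);
console.log(`This talk lasts ${minutes(30)}`);
```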
Let’s get practical
Locale handling is like escaping
• Always store raw unformatted data
• Format near the end of the chain
• Just before you escape
• Parse user input as early as possible
• Use native data types
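The rules above could look roughly like this in code. It is a sketch, not a prescription: the record shape and the `parseRecord` and `render` helpers are invented for the example.

```javascript
// Stored / transported form: raw values, no locale-specific formatting.
const record = { amount: '1956.3334', currency: 'EUR', createdAt: '2015-07-18T14:58:00Z' };

// Parse early: turn input into native types as soon as it arrives.
function parseRecord(raw) {
  return {
    amount: Number(raw.amount),
    currency: raw.currency,
    createdAt: new Date(raw.createdAt),
  };
}

// Format late: only the output layer needs to know the user's locale.
function render(rec, locale) {
  const money = new Intl.NumberFormat(locale, { style: 'currency', currency: rec.currency });
  const date = new Intl.DateTimeFormat(locale, {
    year: 'numeric',
    month: 'short',
    day: 'numeric',
    timeZone: 'UTC',
  });
  return `${money.format(rec.amount)} (${date.format(rec.createdAt)})`;
}

const parsed = parseRecord(record);
console.log(render(parsed, 'de-CH'));
console.log(render(parsed, 'en-US'));
```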
UI Language is not locale
• Users might prefer to use the OS in a different language than what’s inferred from their locale
• Just because I’m in de_CH doesn’t mean I want your software to speak German to me
• UI language is completely different from the user’s locale
Avoid this mess
Mixing Locales
• Forming sentences in UI language with locale formatted data is… challenging
• Be mindful that language might influence some locale formatting.
• “This talk lasts 三十分钟”
• or rather “This talk lasts 30 minutes”
• It depends. Does the locale also use hours and minutes?
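One way to keep the two concerns apart is to carry two separate settings. The sketch below assumes a runtime that supports `style: 'unit'` in `Intl.NumberFormat`; the `settings` object and both helpers are hypothetical.

```javascript
// Hypothetical user settings: UI language and formatting locale are separate.
const settings = { uiLanguage: 'en', formatLocale: 'de-CH' };

// The unit word is part of the sentence, so it follows the UI language.
function duration(minutes, uiLanguage) {
  return new Intl.NumberFormat(uiLanguage, {
    style: 'unit',
    unit: 'minute',
    unitDisplay: 'long',
  }).format(minutes);
}

// Plain numbers follow the formatting locale.
function amount(value, formatLocale) {
  return new Intl.NumberFormat(formatLocale).format(value);
}

console.log(`This talk lasts ${duration(30, settings.uiLanguage)}`);
console.log(`Attendees: ${amount(12435, settings.formatLocale)}`);
```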
Never be helpful* and translate units
1 kg in de_CH is not 1 lb in en_US
Btw: Apple’s APIs are really good at this
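If a unit really has to change, the value has to be converted along with the label. A sketch, assuming `style: 'unit'` support in `Intl.NumberFormat`; `formatMass` and the constant name are made up for the example.

```javascript
const LB_PER_KG = 2.20462262185; // one kilogram expressed in pounds

// Stored values are always kilograms; convert only when pounds are requested.
function formatMass(kilograms, locale, unit) {
  const value = unit === 'pound' ? kilograms * LB_PER_KG : kilograms;
  return new Intl.NumberFormat(locale, {
    style: 'unit',
    unit,
    maximumFractionDigits: 1,
  }).format(value);
}

console.log(formatMass(1, 'de-CH', 'kilogram'));
console.log(formatMass(1, 'en-US', 'pound'));
```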
What about web sites?
• Never, ever infer UI language by IP Geolocation.
• Ever. Ever. EVER.
• Promise!
• You may infer Locale from IP Geolocation though
People from Google: This slide is for you!
Rely on HTTP
• Trust Accept-Language: by now browsers set it correctly
• Use the header to determine UI language
• Use the header to determine default locale
• But ask the user
• Same goes for time zones
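A minimal sketch of reading the header on the server side. Both functions are hypothetical helpers, not part of any framework; they handle the common `tag;q=value` form and ignore everything else. The result is only a default, and the user's explicit choice should still win.

```javascript
// Parse "de-CH,de;q=0.9,en;q=0.8" into tags sorted by preference.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((item) => {
      const [tag, ...params] = item.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      return { tag: tag.trim(), q: q ? parseFloat(q.slice(2)) : 1 };
    })
    .filter((entry) => entry.tag && entry.tag !== '*' && entry.q > 0)
    .sort((a, b) => b.q - a.q)
    .map((entry) => entry.tag);
}

// Pick the first preferred tag whose language we support; fall back otherwise.
function pickUiLanguage(header, supported, fallback) {
  for (const tag of parseAcceptLanguage(header || '')) {
    const language = tag.split('-')[0].toLowerCase();
    if (supported.includes(language)) return language;
  }
  return fallback;
}

console.log(parseAcceptLanguage('de-CH,de;q=0.9,en;q=0.8'));
console.log(pickUiLanguage('fr-CH, fr;q=0.9, en;q=0.8', ['en', 'de'], 'en'));
```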
SHOW ME SOME CODE ALREADY!!!
The past
• There has always been date formatting (Date.toLocaleString). Mostly useless
• People were self-nebling (search YouTube for “ich neble selber”), for example in date pickers and libraries
• Hint: applying substr() to Date.toDateString() is not a correct solution.
• Same goes for using replace('.', ',') on a number
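Why the replace() shortcut falls short is easy to show. The sketch compares it with `Intl.NumberFormat` for de-DE; the sample value is arbitrary.

```javascript
const value = 1234567.891;

// The shortcut: swaps the decimal point but knows nothing about grouping.
const naive = String(value).replace('.', ',');

// The library: decimal separator and grouping both come from locale data.
const proper = new Intl.NumberFormat('de-DE').format(value);

console.log(naive);  // 1234567,891
console.log(proper); // 1.234.567,891
```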
The present
• Microsoft has donated a huge chunk of localisation code to the jQuery project.
• It’s not integrated into jQuery, but maintained by the jQuery project
• Check out https://github.com/jquery/globalize
• Doesn’t support collation
• The library is big
• But most of it is data and this problem can only be solved with a huge database of special cases
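// Assumes the CLDR data for fr-CH has already been loaded into Globalize.
Globalize.locale("fr-CH");

// Dates: a predefined style, then a skeleton (year plus full month name)
console.log(Globalize.formatDate(new Date(), { datetime: "medium" }));
console.log(Globalize.formatDate(new Date(), { skeleton: "yMMMM" }));

// Numbers, currency amounts and relative time
console.log(Globalize.formatNumber(12345.6789));
console.log(Globalize.formatCurrency(1956.3334, "EUR"));
console.log(Globalize.formatRelativeTime(-35, "second"));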
The future
• ECMA-402 from 2012
• Yes. Specs from 2012 are “the future” in JS land
• Provides the global Intl object
• Date, Number formatting and Collation
• see: http://www.ecma-international.org/ecma-402/1.0/
Could be worse
Node.js is still bikeshedding because of ICU
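Since support differs between runtimes, it can be worth checking what is actually available before relying on it. A sketch using the standard `supportedLocalesOf` methods; `intlSupport` is a made-up helper.

```javascript
// Report which of the requested locales each Intl service can handle.
function intlSupport(locales) {
  if (typeof Intl !== 'object') {
    return { available: false, dates: [], numbers: [], collation: [] };
  }
  return {
    available: true,
    dates: Intl.DateTimeFormat.supportedLocalesOf(locales),
    numbers: Intl.NumberFormat.supportedLocalesOf(locales),
    collation: Intl.Collator.supportedLocalesOf(locales),
  };
}

console.log(intlSupport(['de-CH', 'fr-CH', 'en-US']));
```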
// Long date with weekday, day, month and year spelled out
var f = new Intl.DateTimeFormat('de-CH', {
  weekday: 'long',
  year: 'numeric',
  month: 'long',
  day: 'numeric'
});
console.log(f.format(new Date()));

// Plain number with at least two fraction digits
var n = new Intl.NumberFormat('de-CH', { style: "decimal", minimumFractionDigits: 2 });
console.log(n.format(1234.5));

// Currency amount
var currency = new Intl.NumberFormat('de-CH', { style: "currency", currency: 'EUR' });
console.log(currency.format(1234.5));

// Locale-aware sorting: sort() needs the collator's compare function,
// not the collator object itself
var comp = new Intl.Collator('de-CH');
var words = [ "Swissjs", "swissjs", "is", "loads", "of", "fun" ];
console.log(words.sort(comp.compare));
Conclusion
• Proper localisation is part of our job to make the web useful for everybody
• Use the libraries provided
• Whenever you think you know better than the library: No. You don’t.
• Remember that UI language and Locale are not always connected
• Don’t do IP geolocation for language choice
• When in doubt: Ask the user. She’ll know for sure.
Before I leave
"👨‍❤️‍💋‍👨".length
[..."👨‍❤️‍💋‍👨"].length
In case you answered 11 and 8, I salute you
Thanks everyone and enjoy your evening
• U+1F468 (MAN) 👨
• U+200D (ZERO WIDTH JOINER)
• U+2764 (HEAVY BLACK HEART) ❤
• U+FE0F (VARIATION SELECTOR-16)
• U+200D (ZERO WIDTH JOINER)
• U+1F48B (KISS MARK) 💋
• U+200D (ZERO WIDTH JOINER)
• U+1F468 (MAN) 👨
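The two numbers can be reproduced by building the string from exactly the code points listed above. The last part is an extra that assumes `Intl.Segmenter` exists in the runtime, which is why it is guarded; its result is printed rather than stated here.

```javascript
// MAN, ZWJ, HEAVY BLACK HEART, VS-16, ZWJ, KISS MARK, ZWJ, MAN
const kiss = '\u{1F468}\u200D\u2764\uFE0F\u200D\u{1F48B}\u200D\u{1F468}';

console.log(kiss.length);      // 11: UTF-16 code units
console.log([...kiss].length); // 8: code points

// What a user would call "one character" is a grapheme cluster.
if (typeof Intl.Segmenter === 'function') {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  console.log([...segmenter.segment(kiss)].length);
}
```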