IBM i globalization v3.11
CEC2011 – IBM i [email protected]
keywords
• I(nternationalizatio)n (i18n)
– Process of producing a product (design and code) independent of a language, script, culture or character set: neutral
+
• L(ocalizatio)n (l10n)
– Process of adapting an internationalized product to specific languages, scripts, cultures and character sets: customize, extend
=
• G(lobalizatio)n (g11n)
– Proper design and execution so that one instance of software, executing on a single machine, can process multilingual data and present it in a culturally correct way in a multicultural environment
– G11N = I18N + L10N + Multilingual Support
Character representation
• Some characters from Italy, Germany, France, China, Greece, Sweden, Japan…
• CS – Character set
– A collection of elements used to represent textual information (e.g. 0-9, a-z, A-Z, .,;:!? … )
– A Character Set generally supports more than one language
Character Set: a subset of chars
• CS 695 – Euro Country Extended Code Page
• CS 925 – Greece
• CS 1172 – Japanese alpha and Katakana
• CS 1150 – Cyrillic Russian
• CS 1174 – People’s Republic of China
Code Page
• Code Page (CP)
– Defines a subset of characters from a Character Set
– Each character in a character set is assigned a numerical representation (Hex Code)
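As a rough illustration of a code page assigning a hex code to each character, the sketch below uses Python's built-in `cp037` codec as a stand-in for code page 37. The helper name `hex_code` is hypothetical and not part of any IBM tooling.

```python
def hex_code(char: str, codec: str) -> str:
    """Return the hex code that a code page assigns to one character."""
    return char.encode(codec).hex().upper()

# Assumption: Python's cp037 codec approximates IBM code page 37 (USA/Canada).
for ch in "Aa0":
    print(ch, "x'" + hex_code(ch, "cp037") + "'")
```

The same character set can be laid out differently by another code page, which is why the code page has to be known before stored bytes can be interpreted.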
CCSID
• A unique number (0-65535) used by IBM to uniquely identify a Character Set and a Code Page
• Defines an ENCODING Scheme
Encoding Scheme
ES    Encoding Scheme
1100  EBCDIC, single-byte, no code extension is allowed
1301  EBCDIC, mixed single-byte and double-byte, using shift-in (SI) and shift-out (SO) code extension method
4100  ISO 8, single-byte, no code extension is allowed
7200  UCS-2, no code extension is allowed
7808  UTF-8, no code extension is allowed
Encoding Scheme
• EBCDIC – SBCS (1 Byte/Char)
• EBCDIC – DBCS (2 Byte/Char)
• ASCII (1 Byte/Char)
• UNICODE (………)
CCSID - Attributes
CCSID  Character Set  Code Page  Encoding Scheme  Description
37     697            37         1100             USA
273    697            273        1100             Germany
280    697            280        1100             Italy
1025   1150           1025       1100             Cyrillic Russian
1388   1174           836        1301             Simplified Chinese

Character Set
697   Latin 1
1150  Cyrillic Multilingual
1174  Simplified Chinese Ext (EBCDIC/PC Common)

Encoding Scheme
1100  EBCDIC, single-byte, no code extension is allowed. Number of states = 1.
1301  EBCDIC, mixed single-byte and double-byte, using shift-in (SI) and shift-out (SO) code extension method

Code Page
37    USA/Canada - CECP
273   Germany F.R./Austria - CECP
280   Italy - CECP
836   Simplified Chinese Extended
1025  Cyrillic multilingual
CCSID
• Same CS (697 Latin-1), different CP: different CCSID, different character positions
– 1140: USA, 1144: ITA
Fixed/Variant Code Points
VARIANT Code Points
• Characters that do change hex values (position): §, £, #, $, @, !
FIXED Code Points
• Characters that do NOT change hex values: A-Z, a-z, 0-9, ()/+-_*%.;:,
Hint: Avoid using characters that are not in the invariant character set for names and literals in programs.
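One way to see the difference between fixed and variant code points is to compare the bytes a character gets in two code pages. This is only a sketch: it assumes Python's `cp037` and `cp273` codecs are acceptable stand-ins for code pages 37 and 273, and the helper `is_invariant` is hypothetical. Because only two code pages are compared, a character that varies elsewhere can still look fixed here.

```python
import string

# Assumption: these Python codecs approximate IBM code pages 37 (USA) and 273 (Germany).
CODE_PAGES = ("cp037", "cp273")

def is_invariant(char: str) -> bool:
    """True when the character has the same byte value in every listed code page."""
    return len({char.encode(cp) for cp in CODE_PAGES}) == 1

candidates = string.ascii_letters + string.digits + "§£#$@!()/+-_*%.;:,"
variant = [c for c in candidates if not is_invariant(c)]
print("differs between the two code pages:", " ".join(variant))
```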
SBCS-DBCS
• SBCS
– EBCDIC
– Each CCSID can store up to 256 chars (x’00’ to x’FF’)
• DBCS
– EBCDIC
– Each CCSID can store up to 65536 chars (x’0000’ to x’FFFF’)
– APAC only: Chinese (Simplified and Traditional), Japanese, Korean
Data Integrity
• If characters are in both CCSIDs – OK, match!
• Else
– Roundtrip: ITA è → USA } → ITA è
– Substitution char: in some cases (e.g. FTP) the substitution char is x’3F’
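The substitution case can be sketched as follows. This is an illustration only, not how the system converter is implemented: the function `convert` is hypothetical, Python's `cp037` codec stands in for the USA code page, and x'3F' is used as the substitution byte because that is the value named above.

```python
SUB = b"\x3f"  # substitution byte named in the text above

def convert(text: str, target: str) -> bytes:
    """Encode char by char, emitting SUB for characters the target code page lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            out += SUB
    return bytes(out)

kept = convert("è", "cp037")   # è exists in the target code page
lost = convert("Ж", "cp037")   # Cyrillic letter does not exist in the target
print(kept.decode("cp037") == "è")  # the character survives the conversion
print(lost == SUB)                  # the original character cannot be recovered
```

Once a character has been replaced by the substitution byte, converting back cannot restore it, which is the data integrity risk.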
!
• Never use CCSID 65535 in a multilingual environment
• 65535 means NO TRANSLATE
– turns off automatic conversion
– maintains the same code point across different code pages
• 65535 is OK in a single-language environment
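A small sketch of why keeping the same code point across code pages is a problem in a multilingual environment. It assumes Python's `cp037` and `cp273` codecs as stand-ins for a USA job and a German job; the variable names are illustrative.

```python
raw = "@".encode("cp037")  # bytes as written by a job using code page 37

# With conversion: decode with the source code page, re-encode for the target.
converted = raw.decode("cp037").encode("cp273")
seen_with_conversion = converted.decode("cp273")

# Without conversion (the 65535 behaviour): the target job reads the bytes as-is.
seen_without_conversion = raw.decode("cp273")

print(repr(seen_with_conversion), repr(seen_without_conversion))
```

With conversion the German job still sees `@`; without it, the unchanged byte is displayed as whatever character occupies that position in the German code page.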
CCSID
PF-SRC
PF-DTA
Numeric columns: NO CCSID
CCSID - escalation
• Job CCSID, if set, is used.
• If the Job CCSID is set to *USRPRF then the user profile is checked.
• If the user profile CCSID is set then it is used.
• If the user profile value is set to *SYSVAL then the system value is checked.
• If the system value is set to 65535 then the language id is checked.
• If the language id value is set then the QTQ_DEFAULT_CCSID is used, else the language id is converted to a CCSID.
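The escalation chain can be sketched as a simple fallback function. This is a simplified model of the steps listed on this slide, not the operating system's actual logic: `None` stands for "defer to the next level" (*USRPRF or *SYSVAL), and `language_default` is a hypothetical parameter standing for whatever CCSID the language id resolves to.

```python
from typing import Optional

NO_CONVERSION = 65535

def resolve_ccsid(job: Optional[int], user_profile: Optional[int],
                  system_value: int, language_default: int) -> int:
    """Walk job -> user profile -> system value -> language default."""
    if job is not None:
        return job                    # job CCSID is set explicitly
    if user_profile is not None:
        return user_profile           # job said *USRPRF, profile has a value
    if system_value != NO_CONVERSION:
        return system_value           # profile said *SYSVAL
    return language_default           # system value is 65535: fall back to language

print(resolve_ccsid(None, None, NO_CONVERSION, 280))
```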
iSeries Access for Windows
• Not UNICODE compliant
• Needs NL installation
• Depends on client (Win) code page
Language        CCSID  Client Code Page
German          273    850
Italian         280    850
Russian         1025   866
Simpl. Chinese  1388   936
Limits: 1 CCSID/Job
National Language
• Primary and secondary Language
About CP, CS, CCSID
http://www-01.ibm.com/software/globalization/g11n-res.html
Limits
• SBCS/DBCS
• Limit: one CCSID (language) per work session
• Limit: one CCSID (language) per DB column
• Limit: more code (SBCS/DBCS)
Unicode
• Single Character Set
– Contains all current and past languages
– A unique number for every character
– Different ways to store data (not only 16 bit)
– Has mapping to all CharSets
• Now
– Hundreds of CCSIDs: one for each language (SBCS/DBCS)
– There is a code page for every language, each character being represented by a number
• Unicode
– One encoding system includes all language characters
Unicode - Endian
• Little Endian (Intel)
• Big Endian (i5)
• NO Endian
UTF16 BE
UTF16 LE
Unicode - Encodings
• First version of Unicode: 2 bytes/char, 65536 characters
• Version 2: multibyte, more than 1 million characters
• Unicode supports three widely accepted schemes, or Unicode transformation formats (UTFs)
– UTF-8: no endian, web, multibyte
– UTF-16 (default): little/big endian (little: Intel), host languages on i5 (RPG/CBL)
– UTF-32: no support on i5
UTF-8: 8-bit blocks
ABC x’414243’
UTF-16: 16-bit blocks
ABC x’004100420043’
UTF-32: 32-bit blocks
ABC x’000000410000004200000043’
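The three encodings of "ABC" can be reproduced with Python's standard codecs. The big-endian variants are chosen here as an assumption so that no byte order mark is emitted and the bytes read left to right.

```python
text = "ABC"

# Hex representation of the same three characters in each encoding form.
encodings = {name: text.encode(name).hex() for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, hex_bytes in encodings.items():
    print(f"{name:10} x'{hex_bytes}'")
```

Each character takes one 8-bit unit in UTF-8, one 16-bit unit in UTF-16 and one 32-bit unit in UTF-32, so the same text costs 3, 6 and 12 bytes respectively.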
Unicode - Multibyte
UTF8 (example): the length depends on the first bits…
Unicode – Multibyte - example
UTF8: 11100100-10001000-10101101
UTF16 BE
UTF16 LE
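The three-byte UTF-8 sequence from the example can be decoded by hand and then re-encoded as UTF-16 in both byte orders. A minimal sketch using only the Python standard library:

```python
utf8 = bytes([0b11100100, 0b10001000, 0b10101101])

# Lead byte 1110xxxx announces a three-byte sequence; continuation bytes are 10xxxxxx.
# Strip the marker bits and concatenate the remaining 4 + 6 + 6 payload bits.
code_point = ((utf8[0] & 0x0F) << 12) | ((utf8[1] & 0x3F) << 6) | (utf8[2] & 0x3F)

char = utf8.decode("utf-8")
utf16_be = char.encode("utf-16-be").hex()
utf16_le = char.encode("utf-16-le").hex()

print(f"code point U+{code_point:04X}")
print("UTF16 BE", utf16_be)
print("UTF16 LE", utf16_le)
```

The two UTF-16 results contain the same two bytes in opposite order, which is the endian difference discussed earlier.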
Unicode - CCSID
Encoding  CCSID  Note                   Char Unit
UTF-8     1208   from 5.3               8 bit
UTF-16    1200   from 5.3               16 bit
UTF-32    NA                            32 bit
UCS-2     13488  superseded --> UTF-16  16 bit
UCS-4     NA                            32 bit

UTF-8 (Unicode Transformation Format) is a mapping algorithm: 1 char = 1-n octets. Memory usage depends on the language, e.g.
English                        1 Byte/Char
Greek/Russian/Arabic/Hebrew    1.7 Byte/Char
Other European languages       1.1 Byte/Char
Chinese/Japanese/Hindi/Korean  3 Byte/Char

UTF-16: 1 char = 1-n 16-bit groups. UTF-16 is the standard for Unicode.
UCS-2 (Universal Multiple-Octet Coded Character Set): superseded by UTF-16

UTF-8: CCSID 1208, data type CHAR
UTF-16: CCSID 1200 (or 13488), data type GRAPHIC
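The per-language memory usage of UTF-8 can be measured directly. The sample strings below are arbitrary choices for illustration, and `bytes_per_char` is a hypothetical helper; real averages depend on the text, for instance on how many spaces and digits it contains.

```python
def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes used per character of the given text."""
    return len(text.encode("utf-8")) / len(text)

samples = {
    "English": "hello world",
    "Greek": "καλημέρα κόσμε",
    "Chinese": "你好世界",
}

for language, sample in samples.items():
    print(f"{language:8} {bytes_per_char(sample):.2f} bytes/char")
```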
Unicode
Remember… 5250 screen: 1 CS – NO UNICODE allowed
But…
Unicode – i access for WEB
Russian, English, Chinese
iSeries Navigator and Unicode
Unicode - enabled software
• Unicode-enabled software: WebSphere, Lotus Domino, DB2 UDB, IFS, Web browsers, XML, Java
• i5/OS components not Unicode enabled: QSYS library system, OS/400 message files, Personal Communications
USER Interface
• DDS-5250
• JDBC-ODBC-WEB
– Rewrite apps
RPG and Unicode
Default: CCSID 13488
If you need CCSID 1200
Very easy! Remember: Char and Unicode have different weight
CCSID to CCSID
• LF support
• iconv()
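Conceptually, a CCSID-to-CCSID conversion decodes bytes with the source code page and re-encodes them for the target. The sketch below shows that idea in Python rather than calling the actual iconv() API; `convert_ccsid` is a hypothetical helper and the codec names are Python's, not IBM CCSID numbers.

```python
def convert_ccsid(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source code page and re-encode them for the target."""
    return data.decode(source).encode(target)

ebcdic = b"\xc1\xc2\xc3"  # "ABC" as stored in code page 37
utf8 = convert_ccsid(ebcdic, "cp037", "utf-8")
print(utf8)
```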
Something about IFS
• Table fields have a CCSID Tag
• Stream File in IFS has CCSID Tag
• Stream File in other system doesn’t
How to translate correctly?
UTF16 BE
UTF16 LE
BOM – Byte order mark: the first bytes of a stream file
UTF16 BE
UTF16 LE
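A byte order mark can be checked by looking at the first two bytes of the file. The sketch below uses the BOM constants from Python's `codecs` module; the function name `utf16_byte_order` is hypothetical, and files without a BOM (or with a UTF-32 BOM) are not handled.

```python
import codecs

def utf16_byte_order(first_bytes: bytes) -> str:
    """Report the UTF-16 byte order announced by a leading BOM, if any."""
    if first_bytes.startswith(codecs.BOM_UTF16_BE):   # x'FEFF'
        return "UTF16 BE"
    if first_bytes.startswith(codecs.BOM_UTF16_LE):   # x'FFFE'
        return "UTF16 LE"
    return "unknown"

print(utf16_byte_order(b"\xfe\xff\x00A"))
print(utf16_byte_order(b"\xff\xfeA\x00"))
```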
iconv()
• Table fields have a CCSID Tag
• Stream Files in IFS have CCSID Tag
• Stream Files in other system don’t
• Stream files have BOM
• Table columns don’t
php
Means: php does not fully support UTF-16
php – setup UTF8
Column DESCR CCSID 1208/13488/1200
• Reads correctly from 1208, 1200, 13488
• Writes correctly from php vars to 1208
Globalization guidelines
• User interface
– messages, dialog boxes, online manuals, audio output, animations, windows, help text, tutorials, diagnostics, clip art, icons, and any presentation control that is necessary to convey information to users
• Culture and conventions
– Date and time, address, numeric shapes, numeric values
• Product structure
User Interface
Variable Order
Icons
• Avoid text in icons.
• Avoid internationally recognized symbols in icons (e.g. six-pointed star, cross/plus sign).
• Avoid the use of national flags in icons.
Line break rules
• You cannot use Latin script-based text formatting algorithms for Chinese/Japanese.
Culture and conventions
Calendar
• Allow the user to select the calendar and calendar format.
• Be prepared to adapt to other calendar requirements.
Date and Time
Country           Format
Russia            08 sen. 1994 g.
The Netherlands   08 september 1994
Bulgaria          1994-IX-08
Arabic countries  08/09/94
Germany           8.9.1994
Iran              1373/6/17
Islamic lunar     1415/4/2
Israel            3 Trishrey 5755

Country           Format
Canada            2.00 p.
Canada (Québec)   14 h
Italy             14.00
Sweden            kl 14.00
USA               2.00 p.
Timezones
• Time zones and daylight saving time (DST) affect time stamps.
• There are some third-party products (e.g. TZN/400).
• i5 system values don’t support different time zones.
• LPAR can be a solution.
• You can write your own routine: the offset can depend on the user, the InfoSystem… (Before trigger)
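A home-grown offset routine could look roughly like the sketch below. It assumes time stamps are stored in UTC and that each user has a fixed offset in hours; both are assumptions for illustration, `to_user_time` is a hypothetical helper, and daylight saving time is deliberately not handled.

```python
from datetime import datetime, timedelta, timezone

def to_user_time(stamp_utc: datetime, offset_hours: float) -> datetime:
    """Shift a UTC time stamp to a user's fixed offset (DST is not handled)."""
    return stamp_utc.astimezone(timezone(timedelta(hours=offset_hours)))

stored = datetime(2011, 6, 1, 12, 0, tzinfo=timezone.utc)
print(to_user_time(stored, 2).isoformat())    # user two hours ahead of UTC
print(to_user_time(stored, -5).isoformat())   # user five hours behind UTC
```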
Paper Sizes
• Letter, A4…
Cardinal number shape
Numeric Values
• Negative number format
• Decimal and thousands separators
Monetary Amount
Country   Format
US        $12,345.67
US        USD 12,345.67
Denmark   kr 12.345,67
France    12 345,67 €
Portugal  12.345$67 €
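The separator differences in the table can be handled by passing the separators in as data instead of hard-coding them. A minimal sketch, with a hypothetical helper `format_amount`; currency symbol placement is left to the caller.

```python
def format_amount(value: float, thousands: str, decimal: str) -> str:
    """Format a number with two decimals using the given separators."""
    whole, frac = f"{value:,.2f}".split(".")
    return whole.replace(",", thousands) + decimal + frac

print("$" + format_amount(12345.67, ",", "."))     # US style
print("kr " + format_amount(12345.67, ".", ","))   # Danish style
print(format_amount(12345.67, " ", ",") + " €")    # French style
```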
Measurement system
• Miles, inches, km, °C, °F…
First day of week
Address
• Fields, labels, presentation order
Telephone formats
• + - . numbers
Product structure
Isolating culture- and language-sensitive parts
• easy to change
Write one set of application source code that will work correctly, without modification, in each of the required countries or regions.
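One common way to isolate language-sensitive parts is a message catalog kept apart from the program logic. The sketch below is illustrative only: the catalog contents, the `translate` helper and the fallback to English are all assumptions. Named placeholders let each translation put the variables in whatever order the language requires.

```python
# Hypothetical message catalog: in a real product this would live outside the code.
MESSAGES = {
    "en": {"greeting": "Welcome, {name}. You have {count} new messages."},
    "it": {"greeting": "Benvenuto, {name}. Hai {count} nuovi messaggi."},
}

def translate(lang: str, key: str, **values) -> str:
    """Look up a message for the language, falling back to English."""
    catalog = MESSAGES.get(lang, MESSAGES["en"])
    template = catalog.get(key, MESSAGES["en"][key])
    return template.format(**values)

print(translate("it", "greeting", name="Anna", count=3))
print(translate("de", "greeting", name="Anna", count=3))  # no German catalog: English
```

Adding a language then means adding a catalog entry, with no change to the application source.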
TNX