Unicode Overview

IBM Global Services Unicode Overview | 12.01 March-2005 © 2005 IBM Corporation Unicode Overview

Describe what Unicode is
Identify the problems faced before Unicode was available
What is Unicode?
Unicode is a character encoding standard containing (almost) all characters used worldwide.
It is a single code page incorporating all contemporary written languages and relevant technical disciplines (as well as many classical or historical texts of written languages) – 95,000+ characters to date.
Code pages map text elements (letters, ideographs, symbols, dingbats, etc.) to the numeric values with which they are stored. The best-known code page is US 7-bit ASCII, which is also incorporated in the 8-bit ISO-8859-1 (Latin 1) code page.
But there are many code pages, because there are many written languages, as well as many operating systems, data entry and display devices, and printers.
Each Unicode character has a unique number (called a “code point”) and name:
U+0041 A LATIN CAPITAL LETTER A
U+05D0 HEBREW LETTER ALEF
U+2F22 KANGXI RADICAL GO SLOWLY
The proper notation for a Unicode character is U+xxxx, where xxxx is a hexadecimal number.
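The code point and name pairs above can be reproduced with a short sketch. It uses Python's standard `unicodedata` module; the helper name `describe` is an illustrative choice and not part of the presentation.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    # ord() gives the code point; unicodedata.name() gives the official name.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The three characters listed on the slide, given as escapes.
for ch in ("A", "\u05D0", "\u2F22"):
    print(describe(ch))
```

The hexadecimal part is zero-padded to at least four digits, which matches the U+xxxx notation described above.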
[Figure: languages grouped into code-page ellipses. Labels: Norwegian, Portuguese, Spanish, Swedish, Russian, Ukrainian, Greek, Hebrew, Thai, Korean, English, Japanese, Chinese, Taiwanese]
An SAP system can run only one of the ellipses (code pages) at a time.
In a standard setup, it is not possible to set up multiple code pages (multiple ellipses).
English in this picture means US7ASCII.
E.g. there is no support for the British £ sign except in the Western European code page.
For R/3, SAP delivered a workaround solution called MDMP (Multi-Display, Multi-Processing):
Combination of code pages (ellipses)
The code page is derived from the logon language
When a user wants to work with Japanese data, he has to sign on with JA
When a user wants to work with Russian data, he has to sign on with RU
When a user wants to work simultaneously with Japanese and Chinese data, this is impossible
Other solutions (BW, CRM, APO etc.) are not supported under MDMP.
SAP no longer supports setting up new MDMP systems.
SAP has implemented Unicode, which contains all the characters.
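The logon-language restriction can be illustrated with a minimal sketch. The mapping `LOGON_CODEC` and the function `can_store` are hypothetical and use Python codec names as stand-ins for code pages; they are assumptions for illustration, not SAP's actual code page configuration.

```python
# Hypothetical mapping: logon language -> a single legacy code page
# (Python codec names used as stand-ins, not SAP code page numbers).
LOGON_CODEC = {
    "JA": "shift_jis",
    "ZH": "gb2312",
    "RU": "iso8859_5",
}


def can_store(text: str, logon_language: str) -> bool:
    """Return True if every character fits the code page of the logon language."""
    try:
        text.encode(LOGON_CODEC[logon_language])
        return True
    except UnicodeEncodeError:
        return False


japanese = "日本語"
chinese = "简体中文"
mixed = japanese + "と" + chinese

print(can_store(japanese, "JA"), can_store(chinese, "ZH"))
print(can_store(mixed, "JA"), can_store(mixed, "ZH"))
# A Unicode encoding accepts the mixed text without any code page switch.
print(len(mixed.encode("utf-8")) > 0)
```

Each text fits its own code page, but the mixed Japanese and Chinese string fits neither, which is the situation the slide calls impossible.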
Why Unicode? (Contd.)
The single code page and MDMP approaches both have limitations and complications in terms of:
Number of different languages supported
Data integration
Handling of texts
Consolidation of data
A better solution is Unicode. Almost all languages can be supported, and with a single system running on Unicode, the problems of data integration and consolidation are reduced.
Special characters
'Ø' = x'D8' (in the ISO-8859-1 code page)
D8 is the code point; in Unicode notation it is written U+00D8
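As a small sketch of why a byte value alone is ambiguous, the same byte x'D8' can be decoded under two different code pages. The function name `decode_byte` is an illustrative assumption; the codec names are those of Python's standard library.

```python
def decode_byte(value: int, code_page: str) -> str:
    """Interpret a single byte value under the given code page."""
    return bytes([value]).decode(code_page)


# The same stored byte yields different characters per code page.
print(decode_byte(0xD8, "latin-1"))    # Western European (ISO-8859-1)
print(decode_byte(0xD8, "iso8859_5"))  # Cyrillic (ISO-8859-5)

# In Unicode, each of these characters has its own code point.
print(f"U+{ord(decode_byte(0xD8, 'latin-1')):04X}")
print(f"U+{ord(decode_byte(0xD8, 'iso8859_5')):04X}")
```

Under ISO-8859-1 the byte is 'Ø' (U+00D8), while under ISO-8859-5 it is the Cyrillic letter 'и' (U+0438).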
Character Representation in a Unicode codepage:
Unicode is encoded in three different Unicode Transformation Formats:
UTF-8: variable length (1 to 4 bytes per character), popular for HTML and similar protocols
UTF-16: variable length (2 or 4 bytes per character), popular for storage on most application servers
UTF-32: fixed length (4 bytes per character), too memory-intensive for most uses
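The size differences between the three formats can be checked with a minimal sketch. The helper `encoded_sizes` is a hypothetical name; the big-endian codec variants are chosen only so that no byte order mark is added to the counts.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in each UTF."""
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# One ASCII letter, one Hebrew letter, one Kangxi radical, and one
# character outside the Basic Multilingual Plane (U+1F600).
for ch in ("A", "\u05D0", "\u2F22", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Characters above U+FFFF need two 16-bit units in UTF-16, which is why UTF-16 is variable length; only UTF-32 uses the same size for every character.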
SAP uses 390 code pages to support 41 languages and 22,378 characters. Code page definitions are in table TCP00. Approaches based on multiple code pages all have limitations and complications in terms of the number of languages supported, data integration, and what text can be entered and displayed correctly. A better solution is Unicode.
Upgrade to a Unicode system
An SAP system must first be upgraded as a non-Unicode system before it is converted to Unicode.
Moving from a non-Unicode system to a Unicode system involves changes in:
Database conversion
ABAP changes
Memory and Disk space
In a multiple system landscape, SAP recommends converting one system at a time.
In a Unicode System (US), character data types (C, N, D, T, STRING) are automatically treated as Unicode (X and XSTRING are not character types).
Accordingly, SAP did not create any Unicode data type.
Hence, a non-Unicode system (NUS) and a Unicode system (US) share a single source code (the kernels are different, but the source code is the same).
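The distinction between character types and byte types matters because character count and byte count stop being equal under Unicode. The following sketch shows the effect in Python rather than ABAP; the function `char_and_byte_length` is hypothetical, and the choice of UTF-16 as the comparison encoding is an assumption based on the UTF-16 remark earlier in this overview.

```python
def char_and_byte_length(text: str, encoding: str) -> tuple:
    """Return (number of characters, number of bytes) for the given encoding."""
    return len(text), len(text.encode(encoding))


word = "Øre"

# In a single-byte code page, one character is one byte.
print(char_and_byte_length(word, "latin-1"))

# In UTF-16, the same three characters occupy six bytes, so any logic
# that assumes "1 character = 1 byte" no longer holds.
print(char_and_byte_length(word, "utf-16-le"))
```

This is also one reason why memory and disk space requirements change when a system is converted.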
Unicode: General Information
In a Unicode system, the “Unicode checks active” flag must be set in the attributes of ABAP programs.
In a Unicode system, a program whose “Unicode checks active” flag is not set produces a syntax error.
SAP provided tools
SAP provides transactions to check ABAP source code for Unicode compliance:
Transaction UCCHECK – reports the static errors in the ABAP source code.
Transaction SCOV – runtime coverage analyzer.
Useful standard programs:
SWO_SET_UC_FLAG – program to set the Unicode flag for BOR programs.
Note that these transactions and programs are never an exhaustive check of the Unicode compliance of an ABAP program.
The compatibility of an ABAP program can be ensured only after it has been tested completely and successfully in a Unicode system.
Note: View maintenance dialogs that were generated in a non-Unicode system may not be Unicode-compatible. These maintenance dialogs can be regenerated using the program RSVIMT_UC_VIEW_MAINT_GEN.
Matchcode IDs are not allowed in a Unicode system. The program TWTOOL01 checks whether any matchcodes exist in the system.
The Unicode check flag in the program attributes of business object programs can be set using the program SWO_SET_UC_FLAG.
The Transaction UCCHECK can also be used to set the Unicode flags in the attribute of ABAP programs
After transaction UCCHECK has been executed for a program and the result is displayed, select the program and click the ‘Set Unicode Attribute’ button. When several programs are checked through UCCHECK, selecting more than one of them sets the Unicode flag for all selected programs at once.
Additional Information
The mapping of Unicode values for SAP characters can be seen from the table TCP01.
Detailed information about the Unicode standard and related announcements is available from the Unicode Consortium: http://www.unicode.org . SAP is a member of this consortium.
Summary
Unicode is a character encoding standard containing (almost) all characters used worldwide
In a Unicode system, the ‘Unicode checks active’ checkbox must be set for ABAP programs
UCCHECK reports the static Unicode errors in an ABAP program
Why do we need Unicode?
What are the basic utility programs (and transactions) provided by SAP for a Unicode system?