Unicode System: Outside Communication for ABAP Programmers

Dr. Christian Hansen, Server Technology Internationalization, SAP AG

Upload: rafael-riso

Post on 18-Dec-2015



  • 2003 SAP AG, Unicode Outside Communication, Christian Hansen 2

    Contents

    Introduction
      - About code pages
      - Communication: the ideal picture
      - Communication: the reality

    Part I: RFC
      - Unicode ↔ Unicode
      - Unicode ↔ non-Unicode single code page system
      - Unicode ↔ non-Unicode MDMP system

    Part II: File transfer
      - Writing and reading files on the application server
      - Writing and reading files on the front end

    Part III: Common mistakes

    Exercises

  • About Code Pages: Conventional Code Pages

    Disadvantages of the old standard code pages:
      - Each covers only a subset of all characters in use
      - Different code pages are incompatible with each other
      - Only restricted data exchange is possible
      - There are too many of them

    [Figure: a tangle of conventional code pages from different vendors and standards: Canon, Kyocera, Apple, HP, IBM (EBCDIC, 697/0277, 697/0500), Microsoft (1250, 1251, 1252, 1254, 1256, 1257), ASCII, ISO 8859-1 to 8859-9, Big5 and SJIS]

    SAP: 41 languages, 22,378 characters, 390 code pages
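To make the incompatibility concrete, the following sketch decodes one and the same byte with three ISO 8859 code pages and checks whether a character exists in a given code page. Python is used only for illustration (it is not ABAP), and the codec names are Python's, not SAP code page numbers; the helper names are hypothetical.

```python
def decode_byte(value: int, codec: str) -> str:
    """Interpret a single byte according to the given code page."""
    return bytes([value]).decode(codec)


def representable(text: str, codec: str) -> bool:
    """True if every character of text exists in the code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The same byte 0xE4 means three different characters:
for codec in ("latin_1", "iso8859_5", "iso8859_7"):
    print(codec, decode_byte(0xE4, codec))  # ä, ф, δ

# Each code page covers only a subset of all characters:
print(representable("\u00e4", "latin_1"))    # True: ä is in ISO 8859-1
print(representable("\u00e4", "iso8859_5"))  # False: not in ISO 8859-5 (Cyrillic)
```

A byte stream without code page information is therefore ambiguous, which is exactly what makes exchange between conventional code pages fragile.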

  • Solution: Unicode, one Code Page for all Scripts

    [Figure: languages covered by one Unicode code page: English, German, Turkish, Danish, Dutch, Finnish, French, Italian, Norwegian, Portuguese, Spanish, Swedish, Icelandic, Croatian, Czech, Hungarian, Polish, Romanian, Slovak, Slovene, Russian, Ukrainian, Greek, Hebrew, Thai, Korean, Japanese, Chinese, Taiwanese]

    More languages can easily be supported without the need for new code pages or other new methods.

  • Solution: Unicode characters

    [Figure: the Unicode code space: ASCII, general scripts, symbols, CJK ideographs, Hangul, compatibility characters and the surrogate area]

    About 65,000 characters in the 16-bit range, plus about 1,000,000 additional characters reachable through the surrogate area

  • Representation of Unicode Characters

    Character        | Unicode scalar value | UTF-16 big endian | UTF-16 little endian | UTF-8
    a                | U+0061               | 00 61             | 61 00                | 61
    ä                | U+00E4               | 00 E4             | E4 00                | C3 A4
    α                | U+03B1               | 03 B1             | B1 03                | CE B1
    (CJK ideograph)  | U+3479               | 34 79             | 79 34                | E3 91 B9

    UTF-16 (Unicode Transformation Format, 16-bit encoding):
      - Fixed length: 1 character = 2 bytes (surrogate pairs = 2 + 2 bytes)
      - Platform-dependent byte order (big/little endian)
      - 2-byte alignment restriction

    UTF-8 (Unicode Transformation Format, 8-bit encoding):
      - Variable length: 1 character = 1 to 4 bytes
      - Platform independent, no alignment restriction
      - Compatible with 7-bit US-ASCII
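The byte sequences in the table can be reproduced with a few lines of Python (used here purely as an illustration). The sketch also adds one character that is not on the slide, U+1D160 (a musical symbol outside the 16-bit range), to show a UTF-16 surrogate pair and a 4-byte UTF-8 sequence; the helper names are hypothetical.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def representations(ch: str) -> dict:
    """Encode one character in the encoding forms compared in the table."""
    return {
        "scalar": f"U+{ord(ch):04X}",
        "utf-16-be": hex_bytes(ch.encode("utf-16-be")),
        "utf-16-le": hex_bytes(ch.encode("utf-16-le")),
        "utf-8": hex_bytes(ch.encode("utf-8")),
    }


# The four characters from the table, plus U+1D160 to show a surrogate pair.
for ch in ("a", "\u00e4", "\u03b1", "\u3479", "\U0001D160"):
    print(representations(ch))
```

For U+1D160, UTF-16 big endian yields the surrogate pair D8 34 DD 60 (2 + 2 bytes) and UTF-8 yields the 4 bytes F0 9D 85 A0.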

  • Communication: The Ideal Picture

    The ideal picture: only Unicode components

    [Figure: R/3 Enterprise, mySAP BW, third-party systems, files, the Internet, ..., all Unicode-based]

      - Conversions are done algorithmically (1:1 relation)
      - No data misinterpretation
      - No data loss: all business-relevant characters are available at the same time

  • Communication: Reality

    The reality: Unicode and non-Unicode components

    [Figure: R/3 4.6C, R/3 Enterprise, mySAP BW (ISO 8859-1), third-party systems (EBCDIC), files (ISO 8859-1, SJIS), Internet pages (charset=iso-8859-1, windows-1257, utf-8, Shift_JIS), ..., each using its own code page, e.g. Big5, SJIS, ISO 8859-x, 1251, 1252, 697/0277, 697/0500]

      - Conversions between incompatible code pages everywhere
      - Only the common subset can be exchanged
      - Special rules have to be obeyed to make communication possible

  • Part I: RFC

  • RFC Unicode ↔ Unicode

    [Figure: R/3 Enterprise (Unicode) ↔ R/3 Enterprise (Unicode)]

    For a Unicode ↔ Unicode combination, RFC passes all character data without code page conversion, at most adapting the byte order (endianness).

      - UTF-16 big endian = SAP code page 4102
      - UTF-16 little endian = SAP code page 4103

    The character width in the target system is maintained for the destination in SM59 under Special Options (Character Width in Target System):

      - 1 byte = non-Unicode
      - 2 bytes = Unicode
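Because both sides use UTF-16, the only possible difference is byte order. The Python sketch below (an illustration, not SAP's RFC implementation) swaps the two bytes of every 16-bit code unit to turn SAP code page 4102 data into 4103 data. The mapping of those code page numbers to Python codec names follows the definitions above; the dictionary and function names are hypothetical.

```python
# SAP code page numbers for UTF-16 (as listed above), mapped to Python codecs.
SAP_UTF16_CODEPAGES = {"4102": "utf-16-be", "4103": "utf-16-le"}


def adapt_endianness(data: bytes) -> bytes:
    """Swap the two bytes of every UTF-16 code unit (big <-> little endian)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # low byte moves to the front
    swapped[1::2] = data[0::2]  # high byte moves to the back
    return bytes(swapped)


text = "a\u00e4\u03b1\u3479"
big_endian = text.encode(SAP_UTF16_CODEPAGES["4102"])
little_endian = adapt_endianness(big_endian)
print(little_endian.decode(SAP_UTF16_CODEPAGES["4103"]) == text)  # True
```

Since the swap works on code units, surrogate pairs survive it unchanged: each half is simply byte-swapped.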

  • RFC Unicode ↔ non-Unicode single code page

    [Figure: R/3 Enterprise (Unicode) ↔ R/3 4.6C (ISO 8859-1)]

    For a Unicode ↔ non-Unicode single code page combination, RFC converts all character data between Unicode and the old code page.

    Because Unicode is a true superset of every old standard code page, not all Unicode characters can be transferred to the non-Unicode system; characters missing from the target code page arrive as #:

    [Figure: characters outside ISO 8859-1 arrive as # # # #]
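The effect can be simulated with a short Python sketch. It assumes, as the figure shows, that characters missing from the target code page are replaced by #; Python's built-in errors="replace" would produce ? instead, so the replacement is done explicitly. The function name is hypothetical.

```python
def to_single_codepage(text: str, codec: str, replacement: str = "#") -> bytes:
    """Convert Unicode text to a non-Unicode code page, replacing every
    character the code page cannot represent with the replacement character."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out += replacement.encode(codec)
    return bytes(out)


received = to_single_codepage("Gr\u00fc\u00dfe \u65e5\u672c\u8a9e", "latin_1")
print(received.decode("latin_1"))  # Grüße ###
```

German umlauts survive because they exist in ISO 8859-1; the three Japanese characters do not, and they are lost for good.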

  • RFC Unicode ↔ non-Unicode MDMP

    [Figure: R/3 Enterprise (Unicode) ↔ R/3 4.6C MDMP system (ISO 8859-1 and SJIS)]

    For a Unicode ↔ non-Unicode MDMP combination, RFC converts all character data between Unicode and the various old code pages.

    Which of the MDMP code pages is chosen depends on the language:

    [Figure: German (DE) and Japanese (JA) data arrive intact when the matching code page is used; otherwise Japanese characters arrive as #]

  • RFC Unicode ↔ non-Unicode MDMP

    Excursion: Difference between flat and deep data types

    Flat: C, N, D, T, X, I, F, P and any structure consisting only of these types

    Deep: STRING, XSTRING, table types, object references and any structure containing one of these types

    Deep data types are transferred in a UTF-8 encoded XML format (XRFC).

  • RFC Unicode ↔ non-Unicode MDMP

    Excursion: Difference between flat and deep data types

    Detailed conversion paths:

    Unicode system → non-Unicode system:
      - Deep data: Unicode → UTF-8 based XML → target code page
      - Flat data: Unicode → target code page

    Non-Unicode system → Unicode system:
      - Deep data: source code page → UTF-8 based XML → Unicode
      - Flat data: source code page → Unicode

    On the non-Unicode side, the source and target code pages are non-Unicode compatible code pages.

  • RFC Unicode ↔ non-Unicode MDMP

    Deriving code pages a): data without a language key

    Example: flat data, logon language German

    [Figure: logon language DE on both sides; characters outside the German code page arrive as #]

    Source system | Data type | Source code page                | Intermediate format* | Target code page
    Unicode       | Flat      | Unicode                         | -                    | Logon language of source system**
    Unicode       | Deep      | Unicode                         | UTF-8 based XML      | Logon language of target system
    Non-Unicode   | Flat      | Logon language of source system | -                    | Unicode
    Non-Unicode   | Deep      | Logon language of source system | UTF-8 based XML      | Unicode

    * XML or non-Unicode compatible code page
    ** You may switch to the logon language of the target system using RFC bit option 0x200 (SM59, Special Options, RFC Bit Options).

    Logon language of the source system = SY-LANGU of the source system.

  • RFC Unicode ↔ non-Unicode MDMP

    Deriving code pages b): flat data with a language key

    Flat structures that contain a language key (domain SPRAS, DDIC data type LANG) and have the text language flag maintained are handled specially:

    During RFC, the code page is assigned automatically from the language key for each row, independent of the logon language.

    This makes it possible to send tables to and receive tables from MDMP systems (a different code page for each row):

    [Figure: logon language DE; a row with language key DE and a row with language key JA, each converted with its own code page]

      - Maintain the language/code page assignment in SM59
      - Maintain the text language flag in SE11
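A Python sketch of the difference between deriving the code page from the logon language and deriving it per row from the language key. The language/code page table (DE to ISO 8859-1, JA to Shift JIS) mirrors the MDMP example above but is an illustrative assumption, as are the function names; this is not SAP's implementation.

```python
# Hypothetical language -> code page assignment (cf. the SM59 MDMP settings).
LANGUAGE_CODEPAGE = {"DE": "latin_1", "JA": "shift_jis"}


def convert(text, codec):
    """Simulate the round trip through a non-Unicode code page:
    characters the code page lacks become '#'."""
    result = []
    for ch in text:
        try:
            ch.encode(codec)
            result.append(ch)
        except UnicodeEncodeError:
            result.append("#")
    return "".join(result)


def send_rows(rows, logon_language, use_language_key):
    """Convert each (language key, text) row; the code page comes either
    from the row's language key or from the logon language."""
    sent = []
    for lang_key, text in rows:
        language = lang_key if use_language_key else logon_language
        sent.append((lang_key, convert(text, LANGUAGE_CODEPAGE[language])))
    return sent


rows = [("DE", "Gr\u00fc\u00dfe"), ("JA", "\u65e5\u672c\u8a9e")]
print(send_rows(rows, "DE", use_language_key=False))  # JA row becomes ###
print(send_rows(rows, "DE", use_language_key=True))   # both rows intact
```

With the logon language alone, one code page serves the whole table, so the Japanese row is destroyed; with the language key, each row is converted with its own code page.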

  • Maintain RFC destination SM59: MDMP settings

  • SE11: Maintain text language

  • Part II: File transfer

  • File transfer: Application server

    Pattern for writing/reading files on the application server:

    OPEN DATASET ... IN <mode>
    TRANSFER ... / READ DATASET ...
    CLOSE DATASET ...

    where <mode> is one of:

      - BINARY MODE: an uninterpreted sequence of bytes.
      - TEXT MODE ENCODING UTF-8 / NON-UNICODE / DEFAULT: pure unstructured text data. DEFAULT means UTF-8 in Unicode systems and NON-UNICODE in non-Unicode systems.
      - LEGACY TEXT/BINARY MODE: produces a format compatible with non-Unicode systems. Text data is always written in NON-UNICODE format. Structures that are not purely character-like are allowed. The only difference between TEXT and BINARY is that in TEXT mode an EOF (end of file) marker is added.

  • File transfer: Application server

    Code page selection for NON-UNICODE:

    If a Unicode ↔ non-Unicode conversion is necessary during data transfer, the non-Unicode code page is derived from the current system language SY-LANGU, which can be changed with SET LOCALE LANGUAGE.

    Advantages and disadvantages for data exchange:

      - BINARY MODE: not a good exchange format in itself. Use it to write/read prepared data of a well-known format (e.g. UTF-8 XML as XSTRING) or to write/read on the same application server.
      - TEXT MODE: UTF-8 is a good exchange format. Structures cannot be transferred as a whole, only single fields.
      - LEGACY MODES: only for reading or writing non-Unicode data. Structure and code page information is taken into account.
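The difference between the modes has a rough counterpart in ordinary file APIs. The Python sketch below is an analogy, not ABAP: it models NON-UNICODE as ISO 8859-1 (the code page a German SY-LANGU would imply), and Python's replacement character ? stands in for SAP's #. The helper names are hypothetical.

```python
import os
import tempfile


def write_text(path, text, encoding):
    # Text mode with an explicit encoding; characters the encoding cannot
    # represent are replaced by '?'.
    with open(path, "w", encoding=encoding, errors="replace", newline="") as f:
        f.write(text + "\n")


def read_text(path, encoding):
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read().rstrip("\n")


def write_binary(path, data):
    # Binary mode: bytes are written uninterpreted.
    with open(path, "wb") as f:
        f.write(data)


def read_binary(path):
    with open(path, "rb") as f:
        return f.read()


text = "Gr\u00fc\u00dfe \u65e5\u672c\u8a9e"
with tempfile.TemporaryDirectory() as tmp:
    utf8_file = os.path.join(tmp, "utf8.txt")
    legacy_file = os.path.join(tmp, "latin1.txt")
    bin_file = os.path.join(tmp, "data.bin")
    write_text(utf8_file, text, "utf-8")          # like TEXT MODE ENCODING UTF-8
    write_text(legacy_file, text, "latin_1")      # like NON-UNICODE, German SY-LANGU
    write_binary(bin_file, text.encode("utf-8"))  # like BINARY MODE, prepared bytes
    utf8_back = read_text(utf8_file, "utf-8")
    legacy_back = read_text(legacy_file, "latin_1")
    binary_back = read_binary(bin_file)

print(utf8_back == text)  # True: full character set preserved
print(legacy_back)        # Grüße ???: Japanese lost in the legacy code page
```

Binary mode only works as an exchange format here because both sides agree in advance that the bytes are UTF-8; the file itself carries no such information.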

  • File transfer: Application server

    Example 1: BINARY MODE

    [Figure: data exchanged between R/3 Enterprise (Unicode) and an R/3 system with ISO 8859-1 and SJIS (SAP code pages 1100 and 8000) using BINARY MODE and LEGACY BINARY MODE; the non-Unicode code page follows SY-LANGU]

  • File transfer: Application server

    Example 2: TEXT MODE UTF-8

    [Figure: R/3 Enterprise (Unicode) and a non-Unicode R/3 system (ISO 8859-1 / SJIS, selected via SY-LANGU) both write and read with TEXT MODE UTF-8]

    Full character set supported (no data loss in the file). Structured data cannot be written as a whole: write it field by field.

  • File transfer: Application server

    Example 3: TEXT MODE NON-UNICODE

    [Figure: R/3 Enterprise (Unicode) and a non-Unicode R/3 system both write and read with TEXT MODE NON-UNICODE; the file code page (1100 or 8000) is selected via SY-LANGU]

    The file holds only characters of the code page selected via SY-LANGU. Structured data cannot be written as a whole: write it field by field.

  • File transfer: Application server

    Example 4: TEXT MODE DEFAULT

    [Figure: TEXT MODE DEFAULT behaves like TEXT MODE UTF-8 in R/3 Enterprise (Unicode) and like TEXT MODE NON-UNICODE in the non-Unicode R/3 system (ISO 8859-1 / SJIS, code pages 1100 / 8000, selected via SY-LANGU)]

  • File transfer: Application server

    Example 5: LEGACY TEXT/BINARY MODE

    [Figure: R/3 Enterprise (Unicode) and a non-Unicode R/3 system both use LEGACY TEXT/BINARY MODE; the file is written in the non-Unicode code page (1100 or 8000) selected via SY-LANGU]

    Structured data can be written as a whole, but only the characters of that code page are supported.

  • File transfer: Using XML

    Using XML as the transport format

    Use CALL TRANSFORMATION with target data type XSTRING to create a UTF-8 based XML representation of your data:

      - Structure information is preserved (no layout/alignment problems)
      - UTF-8 based (no data loss)
      - Transported in binary form
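CALL TRANSFORMATION itself is ABAP-specific. As a language-neutral illustration, this Python sketch uses the standard xml.etree.ElementTree module to turn a record into UTF-8 encoded XML bytes (playing the role of the XSTRING) and back. The element names and the restriction to flat records of strings are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET


def to_xml_bytes(record: dict) -> bytes:
    """Serialize a flat record to UTF-8 encoded XML (analogous to an XSTRING)."""
    root = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="utf-8")


def from_xml_bytes(data: bytes) -> dict:
    """Parse the UTF-8 XML back; element names carry the structure."""
    root = ET.fromstring(data)
    return {child.tag: child.text for child in root}


record = {"NAME": "M\u00fcller", "CITY": "\u6771\u4eac"}
payload = to_xml_bytes(record)           # write these bytes with a binary file API
print(from_xml_bytes(payload) == record)  # True
```

Because the field names travel with the data and the text is UTF-8, neither the memory layout nor the code page of the receiving system matters.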

  • File transfer: Application server

    Example 6: UTF-8 based XML + BINARY MODE

    [Figure: both R/3 Enterprise (Unicode) and the non-Unicode R/3 system (ISO 8859-1 / SJIS) write with CALL TRANSFORMATION + BINARY MODE and read with BINARY MODE + CALL TRANSFORMATION, independent of SY-LANGU]

    Full character set supported (no data loss in the file); structured data is transferred as a whole.

  • File transfer: Frontend

    File transfer at the front end with GUI_UPLOAD/GUI_DOWNLOAD

    The function modules GUI_UPLOAD and GUI_DOWNLOAD convert data into a textual representation. Structures are allowed.

    Determination of the outside code page:

      - By default, the front-end code page that matches the current system code page (SY-LANGU, SET LOCALE LANGUAGE)
      - Declared explicitly with the optional parameter CODEPAGE (starting with release 6.20 SP 21)

    It is planned to offer, in cl_gui_frontend_services=>file_open_dialog and file_save_dialog, a choice among different front-end code pages (e.g. in a Unicode system you could select an old standard code page instead of the standard front-end code page UTF-8, or later UTF-16).

  • Overview: RFC and File transfer

    RFC and file transfer from a Unicode system's perspective

  • Part III: Common mistakes

  • Common mistakes: overview

    Things you should never do:

      - Type hiding
      - Missing language key
      - Wrong length assumptions
      - Sending data that is not in the receiver's code page
      - ...

  • Common mistakes: Type hiding: binary data

    Don't hide types (1): If you conceal the true types from the system, the system cannot do anything for you. As a consequence, data may, for example, be subject to unwanted code page conversions.

    Example: transporting binary data in character containers
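A Python sketch of what goes wrong, under an assumed scenario: a non-Unicode ISO 8859-1 sender hides four bytes in a character field and sends them as character data to a Unicode system. Modeling the Unicode receiver's character field as UTF-16 little endian is an assumption made for illustration.

```python
# Hypothetical binary value (e.g. a color or a hash) hidden in a character field.
binary_value = bytes([0x00, 0xFF, 0x00, 0xC4])

# Sender (ISO 8859-1 system): each byte is silently read as a Latin-1 character.
container = binary_value.decode("latin_1")

# RFC treats the container as text and converts it; the Unicode receiver
# stores characters as UTF-16, so the raw bytes of the field change.
received_bytes = container.encode("utf-16-le")

print(binary_value.hex())            # 00ff00c4
print(received_bytes.hex())          # 0000ff000000c400
print(received_bytes == binary_value)  # False: the binary value did not survive
```

Declared as type X (or XSTRING), the same four bytes would be passed through unchanged, because the system would know they are not text.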

  • Common mistakes: Type hiding: character-like data

    Don't hide types (2)

    Even sending a purely character-like structure in a character container conceals important information from the system: the field boundaries.

    Example: transporting character-like data in character containers

  • Common mistakes: Type hiding: character-like data

    Workaround if the container approach cannot be changed

    Use CL_NLS_STRUC_CONTAINER to correct the implicit layout:

    [Figure: a structure with the fields NAME and RGB value (00FF00) is converted into a data container with struc_to_cont, sent by RFC between a Unicode and a non-Unicode system, and converted back with cont_to_struc on the receiving side, in either direction]

  • Common mistakes: Missing language key

    Always use language keys: in principle, you must not send any data without a language key if it contains characters outside 7-bit ASCII. Otherwise the data will be corrupted.

    Example: sending non-Latin-1 data without a language key by RFC with a German logon

  • Common mistakes: Wrong length assumptions

    Problems with length assumptions: string lengths are not invariant under code page conversions. This may lead to different problems:

    In a Unicode system, a character field of a given length can hold more characters than a field of the same length in a non-Unicode system with a multi-byte code page such as SJIS, where the length counts bytes. Sending such data results in data loss.
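A small Python sketch of the first problem, assuming the non-Unicode receiver uses Shift JIS, where a field length counts bytes and each Japanese character takes two of them. The helper names are hypothetical.

```python
def fits(text: str, field_length: int, codec: str) -> bool:
    """Does text fit into a field whose length counts bytes in this code page?"""
    return len(text.encode(codec)) <= field_length


def truncate_to_bytes(text: str, field_length: int, codec: str) -> str:
    """Keep as many whole characters as fit into field_length bytes."""
    kept = ""
    for ch in text:
        if len((kept + ch).encode(codec)) > field_length:
            break
        kept += ch
    return kept


text = "日本語のテキスト"                           # 8 characters
print(len(text) <= 10)                              # True: fits C(10) in Unicode
print(fits(text, 10, "shift_jis"))                  # False: 16 bytes in SJIS
print(truncate_to_bytes(text, 10, "shift_jis"))     # 日本語のテ
```

The Unicode sender sees a field with room to spare; the receiver can keep only the first five characters.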

  • Common mistakes: Wrong length assumptions

    Problems with length assumptions (continued)

    Breaking a string into a table of fixed line size and sending the table from a non-Unicode system to a Unicode system does not work: the information about the occupied length of each line is lost, and reassembling the lines into a string afterwards inserts unwanted spaces.

  • Common mistakes: data not in the receiver's code page

    Data not in the receiver's code page

    In general, you must not send data from a source system to a target system if the characters sent are not in the target system's code page. In particular, don't send characters that exist only in Unicode to an old non-Unicode system:

    Try sending a white smiling face (☺, U+263A), a black smiling face (☻, U+263B) or beamed eighth notes (♫, U+266B): the receiver gets # instead.
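One defensive pattern is to check, before sending, which characters the receiver's code page cannot represent. A Python sketch, assuming the receiver uses ISO 8859-1; the function name is hypothetical.

```python
def unsendable_characters(text, codec):
    """List the characters of text that the receiver's code page lacks."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


message = "Gr\u00fc\u00dfe \u263a \u263b \u266b"
print(unsendable_characters(message, "latin_1"))  # ['☺', '☻', '♫']
print(unsendable_characters(message, "utf-8"))    # []
```

If the list is not empty, the sender can reject the data or choose another channel (for example, UTF-8 based XML) instead of letting the characters silently turn into #.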

  • Exercises

    Send single code page and MDMP data via RFC:
      - Type hiding and missing language keys: TECHED_UNICODE_EXERCISE_11/12/13/14 and 15
      - Wrong length assumptions: TECHED_UNICODE_EXERCISE_16/18
      - Data not in the receiver's code page: TECHED_UNICODE_EXERCISE_17

    Transfer data via file on the application server:
      - Writing files: TECHED_UNICODE_EXERCISE_19
      - Reading files: TECHED_UNICODE_EXERCISE_20

    Transfer data via file on the front end:
      - Writing files: TECHED_UNICODE_EXERCISE_21
      - Reading files: TECHED_UNICODE_EXERCISE_22

  • Further Information

    Service Marketplace:
      - Technical information: http://service.sap.com/Unicode@SAP
      - Customer contact: mail [email protected]

    Further presentations (http://service.sap.com/Unicode@SAP, Unicode Technology Media Library):
      - Unicode Enabling ABAP Programs or ABAP Conversion SAP Tutor
      - Unicode Support in SAP Web Application Server

  • Q&A

    Questions?


    Copyright 2003 SAP AG. All Rights Reserved