Post on 01-Apr-2015

ObjectStudio for Unicode

Alexander Augustin

Getting ready for global markets

Overview

Problem description

History of character sets and Encoding

Goals and approach

Features and technologies

Limitations

Conclusions

ObjectStudio 6.9.1

ObjectStudio is an integrated Smalltalk environment for the Windows platform

Access to most common Windows services and database systems, like DLL functions, COM, ODBC, Oracle …

It’s Smalltalk – so almost anything is possible – except easy localization and processing multilingual data.

ObjectStudio 6.9.1 in a Unicode World

ObjectStudio (ANSI/OEM)

Operating System (Unicode)

Other programs (Unicode)

Data sources (Unicode)??

Go Multilingual!

Applications in a global market must represent texts and names of Eastern Europe and Asia.

User interfaces must be localizable

Applications must be able to handle multilingual data

Must be supported by the runtime environment and the development system

Screenshot: Japanese Version of Microsoft Word

ObjectStudio 6.9.1

Supports:

ANSI (CP1252) and OEM (CP850)

8 Bit characters

Adequate for:

Writing source code

Creating English UIs

Processing English text files

Accessing databases with English texts

Screenshot: ObjectStudio 6.9.1 Environment

The history of character sets

Punch card – late 18th century

Enhanced by Hollerith (patented 1890)

5 channel punch tape – 19th century

2^5 = 32 codes, not enough for 26 letters + 10 digits

Solution: shift codes that act as a prefix and switch between the letter state and the figure state

8 channel punch tape – mid 20th century

7 bit US-ASCII + parity

No support for umlauts

VT220 terminal (1983) introduces the DEC Multinational Character Set, the basis of ISO 8859-1 (Latin-1)

Similar to Microsoft codepage 1252

Many character encodings for many languages

EBCDIC, KOI8, ShiftJIS, …

Unicode

Unicode - a standard defined by the Unicode consortium.

Unicode assigns a unique number (code point) to each character

Version 4.0.0 reserves more than 1,000,000 code points

Several transformation formats for binary representation of Unicode code points

UCS-2 (2 bytes/char), UTF-8 (1-4 bytes/char), UTF-16 (2 or 4 bytes/char)
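The byte counts quoted for the transformation formats can be checked with a short sketch. Python is used here purely for illustration and is not part of ObjectStudio; the sample characters and the helper name encoded_sizes are assumptions made for this example. Python has no separate UCS-2 codec, so UTF-16 stands in for it: the two are identical for characters inside the Basic Multilingual Plane.

```python
# Count the bytes each transformation format needs per character.
SAMPLES = {
    "LATIN CAPITAL A": "\u0041",          # ASCII range
    "O WITH DIAERESIS": "\u00d6",         # Latin-1 range
    "EURO SIGN": "\u20ac",                # still inside the BMP
    "MUSICAL G CLEF": "\U0001d11e",       # outside the BMP
}


def encoded_sizes(ch):
    """Return the encoded length of one character in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # Little-endian variant without a byte order mark.
        "utf-16": len(ch.encode("utf-16-le")),
    }


sizes = {name: encoded_sizes(ch) for name, ch in SAMPLES.items()}

for name, size in sizes.items():
    print(f"{name:18} UTF-8: {size['utf-8']} bytes, UTF-16: {size['utf-16']} bytes")
```

The last sample needs a surrogate pair (4 bytes) in UTF-16, which is exactly the case a fixed 2-byte UCS-2 string cannot represent.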

Unicode

World-wide unification effort for all characters of the world

Supported by all major vendors!

The solution for ObjectStudio!

Encoding

Diagram: Character → Code → Binary representation

Transforming characters into their binary representation in another encoding

One main problem when accessing external data sources

Distinguish between specialized encodings and Unicode

Byte Encodings

Differ in the value that represents a character in the encoding

Do not differ in the binary format of the code ( always 1 Byte)

Table: decimal value / hexadecimal binary representation of a character in each encoding

Encoding      Ö         €
CP1252        214/D6    128/80
CP852         153/99    --
ISO 8859-15   214/D6    164/A4

Diagram: Character → Code → Binary representation
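The values in the table can be reproduced with a few lines of code. This is a sketch in Python, not ObjectStudio code; the codec names are Python's own spellings of the code pages, and byte_value is a helper name invented for this example.

```python
# Look up the single-byte value of a character in several byte encodings.
CODEPAGES = ["cp1252", "cp852", "iso8859_15"]


def byte_value(ch, codepage):
    """Return the byte value of ch in codepage, or None if it has no mapping."""
    try:
        encoded = ch.encode(codepage)
    except UnicodeEncodeError:
        return None  # the code page has no slot for this character
    return encoded[0]


# Each entry holds (value of Ö, value of €) for one code page.
table = {
    cp: (byte_value("\u00d6", cp), byte_value("\u20ac", cp))
    for cp in CODEPAGES
}

for cp, (o_umlaut, euro) in table.items():
    print(cp, o_umlaut, euro)
```

The same character gets a different value in each code page, and CP852 has no euro sign at all, so the byte alone never tells which character was meant.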

Unicode Encodings

Do not differ in the value (Code Point) that is assigned to a character

Differ in the binary format of the value

Diagram: Character → Code Point → Binary representation

Table: hexadecimal binary representation

UTF                     Ö (Code Point 214)   € (Code Point 8364)
UCS-2 (little-endian)   D6 00                AC 20
UTF-8                   C3 96                E2 82 AC
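The byte sequences above can be verified with a short sketch. Python is used only as a convenient calculator here; hex_bytes is a hypothetical helper, and Python's "utf-16-le" codec stands in for little-endian UCS-2, which is identical for these two characters.

```python
# One code point per character, but a different byte layout per format.
def hex_bytes(ch, encoding):
    """Return the encoded bytes of ch as space-separated uppercase hex."""
    return ch.encode(encoding).hex(" ").upper()


O_UMLAUT = "\u00d6"   # code point 214
EURO = "\u20ac"       # code point 8364

code_points = (ord(O_UMLAUT), ord(EURO))

layout = {
    "ucs-2-le": (hex_bytes(O_UMLAUT, "utf-16-le"), hex_bytes(EURO, "utf-16-le")),
    "utf-8": (hex_bytes(O_UMLAUT, "utf-8"), hex_bytes(EURO, "utf-8")),
}

print(code_points)
print(layout)
```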

Goals

1. Enable Unicode!
   Extend encoding capabilities
   Provide native multilingual IO support

2. Extend external access features
   Add Unicode file access
   Add Unicode database access

Changes

Create a Unicode VM
   Make ObjectStudio a native Windows Unicode application

Adapted class library
   Make Smalltalk String/Symbol objects 16-bit Unicode strings (UCS-2)
   Add encodings

External interfaces and resources
   C calls
   Unicode file access
   Database access (ODBC, OCI)

Stream Encoding

Ported from VisualWorks

Use StreamEncoders and CharacterEncoders that "know" the encoding

Can be applied to any kind of stream with a byte-like buffer to encode or decode data
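The idea of layering an encoder over any byte-like stream can be sketched outside Smalltalk as well. The following is a Python analogy, not the ObjectStudio or VisualWorks API: io.TextIOWrapper plays the role of the EncodedStream and io.BytesIO the role of the underlying byte stream. The function names are assumptions made for this example.

```python
import io


def decode_stream(byte_stream, encoding):
    """Read all characters from a byte stream through a decoding layer."""
    reader = io.TextIOWrapper(byte_stream, encoding=encoding, newline="")
    return reader.read()


def encode_to_bytes(text, encoding):
    """Write characters through an encoding layer and return the raw bytes."""
    buffer = io.BytesIO()
    writer = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    writer.write(text)
    writer.flush()            # push pending characters into the byte buffer
    data = buffer.getvalue()
    writer.detach()           # release the buffer without closing it
    return data


# The same two bytes decode to different text depending on the encoder.
as_ansi = decode_stream(io.BytesIO(b"\xd6\x80"), "cp1252")
as_utf8 = encode_to_bytes("\u00d6\u20ac", "utf-8")
print(as_ansi, as_utf8)
```

The stream itself stays unaware of characters; only the wrapping layer knows the encoding, which is why the same wrapper works for files, sockets, and in-memory buffers.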

Diagram components: EncodedStream, Stream, Buffer, StreamEncoder, CharacterEncoder

Stream Encoding

Diagram components: EncodedStream, Stream, Buffer; Character → Code → Binary representation

StreamEncoding use cases

Accessing external services and storages without UCS-2 support (e.g. ANSI C calls)

Examples

Access to databases without UCS-2 support

Calling ANSI DLL functions without UCS-2 support

String transfer via TCP/IP

Access to text files with foreign encodings
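Two of these use cases can be sketched briefly. The code below is a Python illustration under stated assumptions, not ObjectStudio code: the 4-byte length prefix is one possible framing for sending a string over TCP/IP, and to_ansi shows what happens when text is handed to a byte-only (ANSI) interface. All function names are hypothetical.

```python
import struct


def pack_text(text, encoding="utf-8"):
    """Frame a string for transfer: 4-byte big-endian length, then the bytes."""
    payload = text.encode(encoding)
    return struct.pack("!I", len(payload)) + payload


def unpack_text(frame, encoding="utf-8"):
    """Reverse pack_text; both sides must agree on the encoding."""
    (length,) = struct.unpack("!I", frame[:4])
    payload = frame[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return payload.decode(encoding)


def to_ansi(text, codepage="cp1252"):
    """Encode for a byte-only interface; unmappable characters become '?'."""
    return text.encode(codepage, errors="replace")


frame = pack_text("\u00d6\u20ac")
restored = unpack_text(frame)
lossy = to_ansi("Gr\u00fc\u00dfe \u65e5\u672c")
print(frame, restored, lossy)
```

The framed transfer is lossless because UTF-8 covers every code point, while the ANSI call silently loses the two Japanese characters. That loss is the reason an explicit encoder is needed at every boundary without Unicode support.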

Text file access

Read/write access to any kind of text file

   UTF8, UTF16, UCS-2 little-endian, …

   CP1252 (Windows ANSI)

   CP850 (Windows OEM)

   And many more

Using EncodedStreams and NewFileStreams

Example: read UTF-8 encoded file

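    | fileStream encoder encodedStream result |
    "Open the file in binary mode so the raw bytes reach the encoder unchanged."
    fileStream := NewFileStream
        file: 'example.txt'
        mode: #binary
        onError: [ self error: 'could not open file' ].
    "Wrap the byte stream in a UTF-8 decoding layer."
    encoder := StreamEncoder new: #utf8.
    encodedStream := EncodedStream on: fileStream encodedBy: encoder.
    "Read the whole file as a Unicode string, then close the stream."
    result := encodedStream upToEnd.
    encodedStream close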
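For comparison, the same read pattern and a conversion between two of the listed encodings can be sketched in Python. This is an analogy rather than ObjectStudio code; the file names, the sample text, and the function names are assumptions made for this example.

```python
import os
import tempfile


def read_text_file(path, encoding):
    """Read a whole text file, decoding its bytes with the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


def transcode_file(src, src_encoding, dst, dst_encoding):
    """Copy a text file while converting it from one encoding to another."""
    text = read_text_file(src, src_encoding)
    with open(dst, "w", encoding=dst_encoding, newline="") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    oem_path = os.path.join(tmp, "example_oem.txt")
    utf8_path = os.path.join(tmp, "example_utf8.txt")

    # Byte 0x99 is "Ö" in CP850 (Windows OEM).
    with open(oem_path, "wb") as f:
        f.write(b"\x99l")

    transcode_file(oem_path, "cp850", utf8_path, "utf-8")

    with open(utf8_path, "rb") as f:
        utf8_bytes = f.read()
    roundtrip = read_text_file(utf8_path, "utf-8")

print(utf8_bytes, roundtrip)
```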

External Database Access

Supported Unicode database interfaces

ODBC

OCI (ORACLE Call Interface)

Features

Native access to Unicode data sources

No application modifications needed

Requirements

ODBC: Version 3.5

OCI: OCI Client Version 9.0.1 (9i) or higher

Limitations

Source files continue to be OEM encoded

Store Unicode text data in text files or external databases

UI sources can't contain Unicode strings

Use external files/databases to store Unicode data for localizing UIs

Some localization support is planned

Implicit conversions between Strings and ByteArrays cannot be supported

Use encoded streams or #asByteArrayEncoding:
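Why implicit conversion has to go can be shown with a small sketch. It is written in Python, whose separate str and bytes types behave analogously to Unicode Strings and ByteArrays; it does not show the ObjectStudio API, and the function names are invented for this example.

```python
def as_byte_array(text, encoding):
    """Explicit conversion: the caller must name the encoding."""
    return text.encode(encoding)


def as_string(data, encoding):
    """Explicit conversion in the other direction."""
    return data.decode(encoding)


def implicit_mix_fails():
    """Return True if combining text and bytes without an encoding is rejected."""
    try:
        "abc" + b"def"
    except TypeError:
        return True
    return False


# The same string yields different bytes per encoding, so no single
# implicit conversion could be correct for every caller.
ansi = as_byte_array("\u00d6", "cp1252")
utf8 = as_byte_array("\u00d6", "utf-8")
print(ansi, utf8, implicit_mix_fails())
```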

Limitations

Image files are not compatible

Compile class files and create new images

Conclusion

ObjectStudio (Unicode)

Operating System (Unicode)

Other programs (Unicode)

Data sources (Unicode)

Availability

ObjectStudio 7.0 for Unicode is available on the new Cincom Smalltalk CD together with VisualWorks 7.3

Contact Information

Email: Alexander.Augustin@heeg.de

We provide project support to internationalize your ObjectStudio application

Georg Heeg eK
Baroper Str. 337
D-44227 Dortmund
Tel: +49-231-97599-0
Fax: +49-231-97599-20

Georg Heeg AG
Seestr. 131
CH-8027 Zürich
Tel: +41-848-433424

Georg Heeg eK
Mühlenstr. 19
D-06366 Köthen
Tel: +49-3496-214 328
Fax: +49-3496-214 712

Email: info@heeg.de
http://www.heeg.de

© 2004 Cincom Systems, Inc. All Rights Reserved

Developed in the U.S.A.

CINCOM and The World’s Most Experienced Software Company are trademarks or registered trademarks of Cincom Systems, Inc.

All other trademarks belong to their respective companies.
