ObjectStudio for Unicode
Alexander Augustin
Getting ready for global markets
Overview
Problem description
History of character sets and Encoding
Goals and approach
Features and technologies
Limitations
Conclusions
ObjectStudio 6.9.1
ObjectStudio is an integrated Smalltalk environment for the Windows platform
Access to most common Windows services and database systems, like DLL functions, COM, ODBC, Oracle …
It’s Smalltalk – so almost anything is possible – except easy localization and processing multilingual data.
ObjectStudio 6.9.1 in a Unicode World
ObjectStudio (ANSI/OEM)
Operating System (Unicode)
Other programs (Unicode)
Data sources (Unicode) ??
Go Multilingual!
Applications in a global market must represent texts and names of Eastern Europe and Asia.
User interfaces must be localizable
Applications must be able to handle multilingual data
Both must be supported by the runtime environment and the development system
Screenshot: Japanese Version of Microsoft Word
ObjectStudio 6.9.1
Supports:
ANSI (CP1252) and OEM (CP850)
8-bit characters
Adequate for:
Writing source code
Creating English UIs
Processing English text files
Accessing databases with English texts
Screenshot: ObjectStudio 6.9.1 Environment
The history of character sets
Punch card – late 18th century
Enhanced by Hollerith (patented 1889, used for the 1890 US census)
5 channel punch tape – 19th century
2^5 = 32 codes, not enough for 26 letters + 10 digits
Solution: shift codes that switch between a letter state and a figure state
8 channel punch tape – mid 20th century
7 bit US-ASCII + parity
No support for umlauts
DEC VT220 terminal (1983) introduces the DEC Multinational Character Set, the basis for ISO 8859-1 (Latin-1), published 1987
Similar to Microsoft codepage 1252
Many character encodings for many languages
EBCDIC, KOI8, ShiftJIS, …
Unicode
Unicode - a standard defined by the Unicode consortium.
Unicode assigns a unique number (code point) to each character, independent of the glyph used to display it
Version 4.0.0 provides a code space of more than 1,000,000 code points (1,114,112 in total)
Several transformation formats for binary representation of Unicode code points
UCS-2 (2 bytes/char), UTF-8 (1-4 bytes/char), UTF-16 (2 or 4 bytes/char)
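The byte counts per character can be checked with a short sketch. Python is used here only as a neutral illustration and is not part of ObjectStudio; the sample characters and the helper name `byte_lengths` are assumptions made for this example. Python has no separate UCS-2 codec, but for characters of the Basic Multilingual Plane the UCS-2 bytes are the same as the UTF-16 bytes, and characters outside that plane cannot be represented in UCS-2 at all.

```python
# Count the bytes each Unicode transformation format needs per character.
samples = {
    "A": "A",                # U+0041, ASCII range
    "O_umlaut": "\u00d6",    # U+00D6
    "euro": "\u20ac",        # U+20AC
    "g_clef": "\U0001d11e",  # U+1D11E, outside the Basic Multilingual Plane
}

def byte_lengths(ch: str) -> dict:
    """Return the encoded length of one character in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # explicit byte order, no BOM
    }

lengths = {name: byte_lengths(ch) for name, ch in samples.items()}
for name, sizes in lengths.items():
    print(name, sizes)
```

The last sample needs 4 bytes in both formats, which is the case a fixed 2-byte UCS-2 string cannot cover.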
Unicode
World-wide unification effort for all characters of the world
Supported by all major vendors!
The solution for ObjectStudio!
Encoding
Character -> Code -> Binary representation
Transforming characters into their binary representation in another encoding
One main problem when accessing external data sources
Distinguish between specialized encodings and Unicode
Byte Encodings
Differ in the value that represents a character in the encoding
Do not differ in the binary format of the code (always 1 byte)
Values shown as decimal/hexadecimal
Encoding               Ö        €
CP1252                 214/D6   128/80
CP852                  153/99   --
ISO 8859-15 (Latin-9)  214/D6   164/A4
Character -> Code -> Binary representation
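The values in the table can be reproduced with a few lines. This is a sketch in Python, used only for illustration and not part of ObjectStudio; the helper name `byte_value` is an assumption, while the codec names `cp1252`, `cp852` and `iso8859_15` are standard Python codecs.

```python
# Look up the single byte that each 8-bit encoding assigns to a character.
O_UMLAUT = "\u00d6"  # Ö
EURO = "\u20ac"      # €

def byte_value(ch: str, codec: str):
    """Return the byte value of ch in codec, or None if codec lacks ch."""
    try:
        return ch.encode(codec)[0]
    except UnicodeEncodeError:
        return None  # the code page has no slot for this character

table = {
    codec: {ch: byte_value(ch, codec) for ch in (O_UMLAUT, EURO)}
    for codec in ("cp1252", "cp852", "iso8859_15")
}

for codec, row in table.items():
    cells = ["--" if v is None else f"{v}/{v:02X}" for v in row.values()]
    print(codec, *cells)
```

The same character maps to different byte values depending on the code page, and some code pages cannot represent it at all, which is why the encoding of an external data source has to be known.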
Unicode Encodings
Do not differ in the value (Code Point) that is assigned to a character
Differ in the binary format of the value
Character -> Code Point -> Binary representation
Values shown as hexadecimal bytes
Encoding               Ö (code point 214)   € (code point 8364)
UCS-2 (little-endian)  D6 00                AC 20
UTF-8                  C3 96                E2 82 AC
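A short sketch makes the distinction concrete: the code point stays the same, only the byte layout changes. Python is used for illustration only; the helper name `hex_bytes` is an assumption, and `utf-16-le` stands in for UCS-2 little-endian, which yields the same bytes for these two characters.

```python
# One code point, several binary representations.
O_UMLAUT = "\u00d6"  # Ö
EURO = "\u20ac"      # €

def hex_bytes(ch: str, codec: str) -> str:
    """Return the encoded bytes of ch as space-separated hexadecimal."""
    return " ".join(f"{b:02X}" for b in ch.encode(codec))

code_points = {ch: ord(ch) for ch in (O_UMLAUT, EURO)}
representations = {
    codec: {ch: hex_bytes(ch, codec) for ch in (O_UMLAUT, EURO)}
    for codec in ("utf-16-le", "utf-8")
}

print(code_points)
for codec, row in representations.items():
    print(codec, row)
```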
Goals
1. Enable Unicode!
   Extend encoding capabilities
   Provide native multilingual IO support
2. Extend external access features
   Add Unicode file access
   Add Unicode database access
Changes
Create a Unicode VM
   Make ObjectStudio a native Windows Unicode application
Adapt the class library
   Make Smalltalk String/Symbol objects 16-bit Unicode strings (UCS-2)
   Add encodings
External interfaces and resources
   C calls
   Unicode file access
   Database access (ODBC, OCI)
Stream Encoding
Ported from VisualWorks
Use StreamEncoders and CharacterEncoders that "know" the encoding
Can be applied to any kind of stream with a byte-like buffer to encode or decode data
Diagram: EncodedStream, Stream (with its Buffer), StreamEncoder, CharacterEncoder
Character -> Code -> Binary representation
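The idea of putting an encoder between a byte-like buffer and the program can be sketched outside ObjectStudio as well. The following example uses Python's standard `io.TextIOWrapper` as a loose analogue of an EncodedStream; it is not the ObjectStudio or VisualWorks API, and the function name `decode_stream` and the sample bytes are assumptions for illustration.

```python
import io

def decode_stream(byte_stream, encoding):
    """Wrap a binary stream so that reads return decoded text."""
    return io.TextIOWrapper(byte_stream, encoding=encoding, newline="")

# Source: a byte buffer that holds CP850 (Windows OEM) data.
raw = io.BytesIO(b"K\x94then")          # 0x94 is "ö" in CP850
text = decode_stream(raw, "cp850").read()

# Sink: the same text written through a UTF-8 encoding wrapper.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="utf-8", newline="")
writer.write(text)
writer.flush()                          # push the encoded bytes into the buffer
encoded = sink.getvalue()
writer.detach()                         # leave the underlying buffer open

print(text, encoded)
```

The wrapper does not care whether the bytes come from a file, a socket or memory, which matches the claim that stream encoding can be applied to any stream with a byte-like buffer.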
StreamEncoding use cases
Accessing external services and storage without UCS-2 support (e.g. ANSI C calls)
Examples
Access to databases without UCS-2 support
Calling ANSI DLL functions without UCS-2 support
String transfer via TCP/IP
Access to text files with foreign encodings
Text file access
Read/write access to any kind of text file
   UTF-8, UTF-16, UCS-2 little-endian, …
   CP1252 (Windows ANSI)
   CP850 (Windows OEM)
   And many more
Using EncodedStreams and NewFileStreams
Example: read UTF-8 encoded file
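    | fileStream encoder encodedStream result |
    "Open the file in binary mode so that the bytes reach the encoder unchanged."
    fileStream := NewFileStream
        file: 'example.txt'
        mode: #binary
        onError: [ self error: 'could not open file' ].
    "Create an encoder that knows UTF-8 and wrap the file stream with it."
    encoder := StreamEncoder new: #utf8.
    encodedStream := EncodedStream on: fileStream encodedBy: encoder.
    "Read and decode the whole file, then close the stream."
    result := encodedStream upToEnd.
    encodedStream close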
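For comparison, the same read can be sketched in Python, where the encoding is named when the file is opened. This is an illustrative analogue only and not ObjectStudio code; the helper names `write_text_file` and `read_text_file`, the sample text and the temporary file are assumptions. The sketch also shows what happens when the file is decoded with the wrong encoding.

```python
import os
import tempfile

def write_text_file(path, text, encoding):
    """Encode text with the given encoding and write it to path."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)

def read_text_file(path, encoding):
    """Read path and decode its bytes with the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    write_text_file(path, "Z\u00fcrich 5 \u20ac", "utf-8")

    result = read_text_file(path, "utf-8")    # correct decoding
    misread = read_text_file(path, "cp1252")  # wrong encoding, garbled text

    with open(path, "rb") as f:
        raw_bytes = f.read()                  # the bytes actually stored

print(result)
print(misread)
print(raw_bytes)
```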
External Database Access
Supported Unicode database interfaces
ODBC
OCI (ORACLE Call Interface)
Features
Native access to Unicode data sources
No application modifications needed
Requirements
ODBC: Version 3.5
OCI: OCI Client Version 9.0.1 (9i) or higher
Limitations
Source files continue to be OEM encoded
   Store Unicode text data in text files or external databases
UI sources can't contain Unicode strings
   Use external files/databases to store Unicode data for localizing UIs
   Some localization support is planned
Implicit conversions between Strings and ByteArrays cannot be supported
   Use encoded streams or #asByteArrayEncoding:
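The reason implicit conversions have to go is that a Unicode string has no single byte representation; the bytes depend on the encoding chosen. Python 3 has the same restriction, so it serves as an illustration here. The helpers `to_byte_array` and `from_byte_array` are hypothetical and only loosely mirror the idea of naming the encoding at the conversion point; they are not the ObjectStudio method.

```python
def to_byte_array(text: str, encoding: str) -> bytearray:
    """Convert text to bytes, with the encoding stated explicitly."""
    return bytearray(text.encode(encoding))

def from_byte_array(data: bytearray, encoding: str) -> str:
    """Convert bytes back to text, with the encoding stated explicitly."""
    return bytes(data).decode(encoding)

name = "\u00d6l"  # "Öl"

# Mixing text and bytes without an encoding is rejected.
try:
    name + b"!"
    implicit_allowed = True
except TypeError:
    implicit_allowed = False

ansi = to_byte_array(name, "cp1252")     # 1 byte per character
wide = to_byte_array(name, "utf-16-le")  # 2 bytes per character

print(implicit_allowed, bytes(ansi), bytes(wide))
```

The two byte arrays differ although the string is the same, so code that relied on an implicit conversion needs to state which of them it means.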
Limitations
Image files are not compatible
   Compile class files and create new images
Conclusion
ObjectStudio (Unicode)
Operating System (Unicode)
Other programs (Unicode)
Data sources (Unicode)
Availability
ObjectStudio 7.0 for Unicode is available on the new CINCOM Smalltalk CD together with VisualWorks 7.3
Contact Information
Email: [email protected]
We provide project support to internationalize your ObjectStudio application
Georg Heeg eK
Baroper Str. 337
D-44227 Dortmund
Tel: +49-231-97599-0
Fax: +49-231-97599-20

Georg Heeg AG
Seestr. 131
CH-8027 Zürich
Tel: +41-848-433424

Georg Heeg eK
Mühlenstr. 19
D-06366 Köthen
Tel: +49-3496-214 328
Fax: +49-3496-214 712

Email: [email protected]
http://www.heeg.de
© 2004 Cincom Systems, Inc. All Rights Reserved
Developed in the U.S.A.
CINCOM and The World's Most Experienced Software Company are trademarks or registered trademarks of Cincom Systems, Inc.
All other trademarks belong to their respective companies.