MINPURAA: A PANDITHAM BASED MULTILINGUAL MAIL SERVER AND CLIENT. J. Ramesh, 98MX17. Guided by: Dr. P. Navaneethan

Page 1: 98MX17

1

MINPURAA: A PANDITHAM BASED

MULTILINGUAL MAIL SERVER AND CLIENT

J. Ramesh

98MX17

Guided by: Dr. P. Navaneethan

Page 2: 98MX17

2

System Configuration

Hardware:

Pentium II Processor (333 MHz)
128 MB RAM
2.1 GB HDD

Software:

Platform: Microsoft Windows 2000 Professional
JAVA 2
Kawa 3.22 (IDE for JAVA)
Macromedia Fontographer 4.1
High Logic Font Creator Programmer 2.1

Page 3: 98MX17

3

Motivation

Existing Multilingual E-mail facilities send E-mails in non-character oriented formats.

Most Multilingual products rely on fonts and not on the language.

User name and password are still provided only in English.

E-mail usage is therefore limited to people who know English.

‘To take IT to rural areas’ is the primary motivation.

Page 4: 98MX17

4

Requisites

A character encoding scheme for processing Multilingual strings; PANDITHAM is used.

A protocol for sending mails; SMTP (RFC 821) is used.

A protocol for Message formats; Internet Message Formats (RFC 822) is used.

Region based 16-bit character oriented Fonts.

Page 5: 98MX17

5

PANDITHAM 4.0 - An overview

Introduction:

PANDITHAM is an acronym for the Protocol for ApplicatioNs Development In THAmizh and Multilingual computing.

It was developed, and is being improved, by Dr. P. Navaneethan.

It gives life to individual characters in each language by assigning unique values (in that language) to them.

Page 6: 98MX17

6

PANDITHAM Principle …

Page 7: 98MX17

7

I Want ARUN! (In English)

Drop letters in 1, 18, 21, 14

[Figure: the letters A, R, U and N drop into slots 1, 18, 21 and 14 of the English table (ENG, slots 1 to 26), shown next to the Thamizh table (TAM, slots 1 to 247).]

Page 8: 98MX17

8

Now I Want `R]f (In Tamil)

Drop letters in 1, 161, 91

[Figure: the three letters drop into slots 1, 161 and 91 of the Thamizh table (TAM, slots 1 to 247), shown next to the English table (ENG, slots 1 to 26).]

Page 9: 98MX17

9

Why Protocol?

PANDITHAM packs the 247 Thamizh characters into an 8 bit space as follows:

Character: `    ~    ;    :    ...  Oq   #    k    ka   ...  ekq  kf   g    ...  W    ...  e[q  [f
Value:     9    10   11   12   ...  20   21   22   23   ...  33   34   35   ...  65   ...  254  255

The above lexical order follows Thirukkural Thelivurai by Dr. Mu. Varadarasanar.

The value 65 stands for ‘A’ in ASCII and ‘W’ in Thamizh. To resolve this ambiguity the machine needs a set of RULES. Hence the Protocol.

The remaining nine values (0-8) are used as PANDITHAM Control and Punctuation characters.
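The packing can be sketched in code. This is a minimal sketch, assuming the values between the ones listed on the slide are filled in regular order: 12 vowels at 9 to 20, one character at 21, then 13 consecutive slots per consonant starting at 22 (the bare consonant, its other 11 vowel forms, and the pure-consonant form). The romanised labels and the traditional consonant order are assumptions made for illustration; they are not taken from the PANDITHAM specification.

```python
# Sketch of the 8-bit Thamizh packing, inferred from the values on the slide.
# Romanised labels and the consonant order are illustrative assumptions.
VOWELS = ["a", "aa", "i", "ii", "u", "uu", "e", "ee", "ai", "o", "oo", "au"]
CONSONANTS = ["k", "ng", "c", "nj", "T", "N", "t", "n", "p", "m",
              "y", "r", "l", "v", "zh", "L", "R", "n2"]
# Forms of one consonant in slot order: the 12 vowel forms (the first is
# the bare consonant), then the pure consonant ("pulli").
FORMS = VOWELS + ["pulli"]

FIRST_VOWEL = 9        # values 0-8 are control and punctuation characters
AYTHAM = 21            # the single character between vowels and consonants
FIRST_CONSONANT = 22


def vowel_value(vowel: str) -> int:
    """Value of a stand-alone vowel (9..20)."""
    return FIRST_VOWEL + VOWELS.index(vowel)


def syllable_value(consonant: str, form: str = "a") -> int:
    """Value of a consonant in the given form (22..255)."""
    return (FIRST_CONSONANT
            + len(FORMS) * CONSONANTS.index(consonant)
            + FORMS.index(form))


def total_characters() -> int:
    """Vowels + aytham + every form of every consonant."""
    return len(VOWELS) + 1 + len(CONSONANTS) * len(FORMS)
```

Under these assumptions the sketch reproduces the values printed on the slide (22, 23, 33, 34, 35, 254, 255) and the total of 247 characters.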

Page 10: 98MX17

10

The Rule

The escape character DLC (Default Language Code) is the first byte of a PANDITHAM string and is followed by the code that is assigned to that language.

This is followed by a monolingual string which, if need be, can switch to a different language.

Language switching can happen in 2 different ways:

Way 1: There are at least 2 characters in the new language, in which case the escape character DLC is used once again.

Way 2: There is exactly one character in the new language, in which case the escape character MLC (Momentary Language Code) is used. This escape character conveys that the language switch is momentary in nature.

Language Codes:

ASCII - 05h
THAMIZH1 - 08h
THAMIZH2 - 09h
TELUGU - 0Ah
KANNADA - 0Bh
MALAYALAM - 0Ch
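A decoder for this rule can be sketched as a small state machine. The language codes below are the ones listed on the slide; the byte values chosen for NULL, DLC and MLC are hypothetical, since the slides only say that the control characters live in the range 0-8. The sketch handles single-byte languages only.

```python
from typing import List, Tuple

# Hypothetical control byte values (the slides do not give them).
NULL, DLC, MLC = 0x00, 0x01, 0x02

# Language codes as listed on the slide.
LANGUAGE_NAMES = {0x05: "ASCII", 0x08: "THAMIZH1", 0x09: "THAMIZH2",
                  0x0A: "TELUGU", 0x0B: "KANNADA", 0x0C: "MALAYALAM"}


def decode(data: bytes) -> List[Tuple[str, int]]:
    """Return (language, value) pairs for a single-byte PANDITHAM string."""
    out: List[Tuple[str, int]] = []
    default = None                      # language set by the latest DLC
    i = 0
    while i < len(data) and data[i] != NULL:
        byte = data[i]
        if byte == DLC:                 # DLC <lang>: switch the default
            default = LANGUAGE_NAMES[data[i + 1]]
            i += 2
        elif byte == MLC:               # MLC <lang> <char>: one character only
            out.append((LANGUAGE_NAMES[data[i + 1]], data[i + 2]))
            i += 3
        else:                           # ordinary character in the default
            if default is None:
                raise ValueError("string must start with DLC")
            out.append((default, byte))
            i += 1
    return out
```

The point of the design is that a momentary switch never changes the default language, so the characters after an MLC sequence are read in the language that was active before it.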

Page 11: 98MX17

11

Example:

String = maEtsfvr[f R

DLC TM1 ma Et MLC TM2 sf v r [f SP MLC ASC R

DLC TM1 8C 6B MLC TM2 E4 BF A5 FF SP MLC ASC 52 NULL

Length of the string : 15 bytes
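The example above can be reproduced with a small encoder. This is a sketch under stated assumptions: the byte values of NULL, DLC, MLC and SP are hypothetical (the slides do not give them), punctuation such as SP is treated as language-neutral, and only single-byte languages are handled. The character values 8C, 6B, E4, BF, A5, FF and 52 are the ones shown in the example.

```python
from typing import List, Optional, Tuple

NULL, DLC, MLC, SP = 0x00, 0x01, 0x02, 0x03   # hypothetical byte values
LANGUAGE_CODES = {"ASCII": 0x05, "THAMIZH1": 0x08, "THAMIZH2": 0x09}

Item = Tuple[Optional[str], int]   # (language or None for punctuation, value)


def encode(items: List[Item]) -> bytes:
    """Encode (language, value) items using the DLC/MLC switching rule."""
    first = next((lang for lang, _ in items if lang is not None), None)
    if first is None:
        raise ValueError("need at least one letter to pick a default language")
    out = bytearray([DLC, LANGUAGE_CODES[first]])
    default = first
    for i, (lang, value) in enumerate(items):
        if lang is None or lang == default:
            out.append(value)
            continue
        # Count how many consecutive characters use the new language.
        run = 1
        while i + run < len(items) and items[i + run][0] == lang:
            run += 1
        if run == 1:                      # Way 2: momentary switch
            out += bytes([MLC, LANGUAGE_CODES[lang], value])
        else:                             # Way 1: new default language
            out += bytes([DLC, LANGUAGE_CODES[lang], value])
            default = lang
    out.append(NULL)
    return bytes(out)


example = [("THAMIZH1", 0x8C), ("THAMIZH1", 0x6B), ("THAMIZH2", 0xE4),
           ("THAMIZH1", 0xBF), ("THAMIZH1", 0xA5), ("THAMIZH1", 0xFF),
           (None, SP), ("ASCII", 0x52)]
encoded = encode(example)
```

With these inputs `encoded` has the same layout as the byte sequence on the slide and is 15 bytes long.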

Page 12: 98MX17

12

The Rule (contd.)

The protocol rules are modified slightly to accommodate Telugu, Kannada, etc. in a better way, i.e., by making use of a scheme similar to the one followed for Japanese, namely DBCS (Double Byte Coding Scheme).

DLC TEL U L U L … DLC TM1 B B B … NULL

Here U and L are the upper and lower bytes of a double-byte letter, and B is a single-byte letter.

Value of a Telugu letter will be U * 256 + L
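The double-byte value can be computed and split back as follows. This is a minimal sketch of the formula U * 256 + L only; the function names are hypothetical.

```python
from typing import Tuple


def dbcs_value(upper: int, lower: int) -> int:
    """Combine the two bytes of a DBCS letter into one value."""
    if not (0 <= upper <= 0xFF and 0 <= lower <= 0xFF):
        raise ValueError("U and L must each fit in one byte")
    return upper * 256 + lower


def dbcs_bytes(value: int) -> Tuple[int, int]:
    """Split a DBCS letter value back into its (U, L) bytes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in two bytes")
    return divmod(value, 256)
```

For example, the bytes U = 01h and L = 2Ch give the value 300, and 300 splits back into (1, 44).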

Page 13: 98MX17

13

Design of Region and Language Databases

[E-R diagram: a Region has N Fonts (1:N); a Region has N Languages (1:N).]

Page 14: 98MX17

14

Structures of Database

Type Region
    regionCode As Byte       /* Unique Region code */
    regionName As String     /* Name of the Region */
    defaultFont As int       /* Default font code of the region */
End Type

Type Region_Fonts
    fontCode As int          /* Unique Font Code */
    fontName As String       /* Name of the Font */
    regionCode As Byte       /* Font Association to a Region */
End Type

Page 15: 98MX17

15

Structures of Database (Contd.)

Type Language                /* Language Record Structure */
    langCode As Byte         /* Unique Language Code */
    langName As String       /* Language Name */
    regionCode As Byte       /* Unique Region Code */
    DBCS As Boolean          /* Double Byte Coding Scheme */
    fontLocation As Char     /* Location in 16 Bit Font Table, sizeof (Char) is 2 Bytes */
    fontUnits As Byte        /* No. Of Font Units, 1 Unit = 256 Slots */
    weight As Byte           /* Language Weight for sorting */
End Type

Page 16: 98MX17

16

Slots allotted in 16-bit Font table

0000H - 7FFFH    Universal
8000H - 80FFH    THAMIZH1 (Pure Thamizh)
8100H - 81FFH    THAMIZH2 (Grantha)
8200H - 8AFFH    Telugu
8BFFH - FFFFH    Other Languages in this region (Kannada, Malayalam, etc.)
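The slot allotment can be tied back to the Language record as a sketch. It assumes that a character's glyph slot is simply fontLocation plus the character's PANDITHAM value, which the slides do not state explicitly. The fontLocation values and unit counts are read off the slot ranges above; the Python names are hypothetical.

```python
from dataclasses import dataclass

SLOTS_PER_UNIT = 256   # "1 Unit = 256 Slots" in the Language record


@dataclass(frozen=True)
class Language:
    lang_code: int       # langCode
    lang_name: str       # langName
    dbcs: bool           # DBCS flag
    font_location: int   # fontLocation in the 16-bit font table
    font_units: int      # fontUnits


THAMIZH1 = Language(0x08, "THAMIZH1", False, 0x8000, 1)
THAMIZH2 = Language(0x09, "THAMIZH2", False, 0x8100, 1)
TELUGU = Language(0x0A, "TELUGU", True, 0x8200, 9)   # 8200H to 8AFFH


def glyph_slot(language: Language, value: int) -> int:
    """Map a character value to its slot in the 16-bit font table.

    Assumption: slot = fontLocation + value.
    """
    if not 0 <= value < language.font_units * SLOTS_PER_UNIT:
        raise ValueError("value lies outside the slots of this language")
    return language.font_location + value
```

Under this assumption the THAMIZH1 value 8Ch lands in slot 808Ch, and the last Telugu slot is 8AFFh.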

Page 17: 98MX17

17

Merits

Language based ordering is feasible
Network Congestion is low
Lexical Order Sorting is easy
No Mis-scripting (e.g. eci)
No Kerning Problem (e.g. ]f)
Ease of Speech Synthesis

Page 18: 98MX17

18

PANDITHAM as applied to other languages

Name : Krishna Reddy
PANDITHAM rep.: DLC ASC K r i s h n a SP R e d d y NULL
Length : 16 bytes

Name : kiRxf]a erdfF
PANDITHAM rep.: DLC TM1 ki R MLC TM2 xf ]a SP er df F NULL
Length : 13 bytes

Name : h.v{^w¨ UvxD«
PANDITHAM rep.: DLC KAN .v{ ^w¨ Uvx D« NULL
Length : 12 bytes

On average, representing Kannada or Telugu strings in PANDITHAM may require about 2 bytes per character.

Page 19: 98MX17

19

Storage Requirements

Monolingual (e.g. tiRkfKbqf, any English text): 1 Byte / Character

Bilingual (Thamizh & Grantha):
Best case: most of the characters belong to the same language (e.g. tiRkfKbqf): 1 Byte / Character
Worst case: alternate letters switch between two different languages (e.g. haihErxf): 2 Bytes / Character
Likely case: 1.1 Bytes / Character

Multilingual (in the worst case it is multilingual to the core; i.e., alternate letters switch between different languages):
e.g. ½ ai h Uv n (22 bytes)
DLC HIN ½ MLC TM1 ai MLC TM2 h MLC KAN Uv MLC ENG n NULL

The average will depend on the languages present.
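The bilingual figures can be checked with a short calculation. This sketch counts only the bytes spent on the characters themselves, including any switch sequence (escape character plus language code), and leaves out the leading DLC pair and the trailing NULL. It assumes single-byte languages; the function name is hypothetical.

```python
from typing import List


def payload_bytes(languages: List[str]) -> int:
    """Bytes needed for a string, given the language of each character.

    Excludes the leading DLC + language code and the trailing NULL.
    """
    if not languages:
        return 0
    default = languages[0]
    total = 0
    for i, lang in enumerate(languages):
        if lang == default:
            total += 1                 # plain character
            continue
        run = 1
        while i + run < len(languages) and languages[i + run] == lang:
            run += 1
        total += 3                     # escape + language code + character
        if run > 1:                    # DLC: the new language becomes default
            default = lang
    return total


best = payload_bytes(["TM1"] * 10) / 10           # 1.0 byte per character
worst = payload_bytes(["TM1", "TM2"] * 4) / 8     # 2.0 bytes per character
mixed = payload_bytes(["TM1"] * 19 + ["TM2"]) / 20
```

As an illustration, one Grantha letter in every twenty characters gives 22 bytes for 20 characters, i.e. 1.1 bytes per character; the slides do not say how their likely-case figure was obtained.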

Page 20: 98MX17

20

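Performance of various schemes - A Comparison

Schemes compared: 7 bit ASCII; 8 bit ASCII (Glyph Based); Unicode (ISCII Based); Unicode (Char Based); PANDITHAM

Lingual:
7 bit ASCII: Mono (English)
8 bit ASCII (Glyph Based): Bi
Unicode (ISCII Based): Multi
Unicode (Char Based): Multi
PANDITHAM: Multi

Storage per char:
7 bit ASCII: 1 Byte
8 bit ASCII (Glyph Based): 1 - 3 Bytes
Unicode (ISCII Based): 3.5 Bytes for Thamizh
Unicode (Char Based): 2 Bytes
PANDITHAM: Best Case 1 Byte, Worst Case 2 Bytes, Likely Case 1.1 Bytes

Network Congestion:
7 bit ASCII: Very low
8 bit ASCII (Glyph Based): Very High
Unicode (ISCII Based): Extremely High
Unicode (Char Based): Very High
PANDITHAM: Low

Lexical Order Sorting:
7 bit ASCII: Simple
8 bit ASCII (Glyph Based): Complex and Parsing Required
Unicode (ISCII Based): Complex
Unicode (Char Based): Simple
PANDITHAM: Simple

Flexibility in Language Ordering?
7 bit ASCII: N.A.
8 bit ASCII (Glyph Based): No
Unicode (ISCII Based): No
Unicode (Char Based): No
PANDITHAM: Yes

Speech Synthesis:
7 bit ASCII: Difficult
8 bit ASCII (Glyph Based): Complex, Parsing Req. and Discontinuous
Unicode (ISCII Based): Complex
Unicode (Char Based): Simple
PANDITHAM: Simple

Random Processing of letters?
7 bit ASCII: Yes
8 bit ASCII (Glyph Based): No
Unicode (ISCII Based): No
Unicode (Char Based): Yes
PANDITHAM: Simple for Monolingual (Pure Thamizh)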

Page 21: 98MX17

21

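Features of various schemes

Schemes compared: 7 bit ASCII; 8 bit ASCII (Glyph Based); Unicode (ISCII Based); Unicode (Char. Based); PANDITHAM

Lingual:
7 bit ASCII: Mono (English)
8 bit ASCII (Glyph Based): Bilingual
Unicode (ISCII Based): Indian
Unicode (Char. Based): Multi
PANDITHAM: Multi

Character rendering:
7 bit ASCII: Simple
8 bit ASCII (Glyph Based): Simple, but not always, and time consuming
Unicode (ISCII Based): Parsing required, time consuming
Unicode (Char. Based): Simple
PANDITHAM: Simple

Eliminates Kerning Problem?
7 bit ASCII: Yes
8 bit ASCII (Glyph Based): No (e.g. ]f)
Unicode (ISCII Based): May be
Unicode (Char. Based): Yes
PANDITHAM: Yes

Eliminates Mis-Script Problem?
7 bit ASCII: Yes
8 bit ASCII (Glyph Based): No (e.g. eci)
Unicode (ISCII Based): Yes
Unicode (Char. Based): Yes
PANDITHAM: Yes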

Page 22: 98MX17

22

An Overview of E-mail

Mail Transfer Agents (MTAs)

Permanent programs that run on the hosts
Listen for E-mail
Save the E-mail for the local users
Host computers that run MTAs are also known as Mail Servers

Mail User Agents (MUAs)

Run by the user to send (or) receive E-mails
An interface to view the E-mail
Facilitate communication with the MTA

Page 23: 98MX17

23

Study of Existing Multilingual E-mail Providers

IndoMail (By Lastech Systems):

It is a Mail User Agent (Client Program)
Supports 12 Indian languages including Thamizh
Supports various keyboard layouts including the Tamil99 keyboard
The mail message is dispatched in Rich Text Format, which is very costly
Fonts are attached with the mail
Glyph based editing (glyphs ‘N’ and ‘ÿ’ make ‘Nÿ’)
Mis-Scripting (e.g. Eci) and Kerning (e.g. XI, ]f) problems are encountered

Page 24: 98MX17

24

www.bharatmail.com:

It is a web-based mail service
Multilingual E-mail
No standard keyboard layout is used
The mail message is converted to an image format (.gif) and sent to the destination
Mis-Scripting and Kerning problems are identified
The end user must know English to send E-mail in Thamizh, since the service uses transliteration, i.e., ‘ka’ becomes ‘k’ and ‘amma’ becomes ‘`mfma’

Page 25: 98MX17

25

An Overview of SMTP

SMTP is the protocol used for communication between MTAs, and between MUAs and MTAs

Its objective is to transfer mail reliably and efficiently

Clients use 4-letter commands to communicate with the server

The server responds with a 3-digit numeric code

SMTP servers usually listen on port 25

Page 26: 98MX17

26

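The SMTP Model

[Figure: the User and a File System feed the Sender-SMTP; SMTP Commands/Replies and Mail flow between the Sender-SMTP and the Receiver-SMTP; the Receiver-SMTP writes to its own File System.]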

Page 27: 98MX17

27

Opening and Closing Connection

Sender-SMTP is the MUA; Receiver-SMTP is the MTA.

MTA: 220 Ready
MUA: HELO <panditham>
MTA: 250 PMS-Ok
MUA: QUIT
MTA: 221 PMS-service closing transmission channel

Page 28: 98MX17

28

Sending Mail

Sender-SMTP is the MUA; Receiver-SMTP is the MTA.

MUA: MAIL FROM:<panditham>
MTA: 250 PMS-Ok
MUA: RCPT TO:<ØÊå@Nt>
MTA: 250 PMS-Ok
MUA: DATA
MTA: 354 PMS-Start mail input
MUA: <Happy Thamizh new year é×Ìʵ Og×ÏZNt>
MUA: <CRLF>.<CRLF>
MTA: 250 PMS-Ok
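The exchange shown on these two slides can be written down as a script of commands and the reply codes the client expects. This is a sketch only, with hypothetical function names; it builds the text a client would send and does not open a network connection. The dot-doubling of body lines that start with "." is the transparency rule from RFC 821, not something shown on the slides.

```python
from typing import List, Optional, Tuple

CRLF = "\r\n"


def reply_code(line: str) -> int:
    """Return the 3-digit numeric code that starts a server reply."""
    code = line[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError("not an SMTP reply: %r" % line)
    return int(code)


def data_block(body_lines: List[str]) -> str:
    """Body followed by the <CRLF>.<CRLF> terminator."""
    # RFC 821 transparency: a leading "." in a body line is doubled.
    stuffed = ["." + line if line.startswith(".") else line
               for line in body_lines]
    return CRLF.join(stuffed) + CRLF + "." + CRLF


def session_script(client_host: str, sender: str, recipient: str,
                   body_lines: List[str]) -> List[Tuple[Optional[str], int]]:
    """(text to send, expected reply code) pairs for one mail."""
    return [
        (None, 220),                                  # server greeting
        ("HELO " + client_host + CRLF, 250),
        ("MAIL FROM:<" + sender + ">" + CRLF, 250),
        ("RCPT TO:<" + recipient + ">" + CRLF, 250),
        ("DATA" + CRLF, 354),
        (data_block(body_lines), 250),
        ("QUIT" + CRLF, 221),
    ]
```

A client would send each entry in turn and compare `reply_code` of the server's answer with the expected value before moving on.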

Page 29: 98MX17

29

E-R Diagram

[E-R diagram: User has Message, with cardinality N:M.]

Page 30: 98MX17

30

Table Design

Table: USER

Name               Type       Size (in bytes)
Record_size        Short      2
User_id_no         Integer    4
Password           Byte       Var
User_id            Byte       Var
User_name          Byte       Var
Sex                Byte       1
Alternate_email    Byte       Var
Contact_phone      Byte       Var
MailDayBox         Byte       1

Page 31: 98MX17

31

Table: USER_MESSAGE_<0-6>

Name               Type       Size (in bytes)
Record_size        Short      2
User_id_no         Integer    4
Message_id         Long       8
IsRead             Boolean    1
IsUrgent           Boolean    1
Date               Byte       Var
Sender             Byte       Var
Subject            Byte       Var

Table: MESSAGE

Name               Type       Size (in bytes)
Record_size        Long       8
Message_id         Long       8
Message_data       Byte       Var

Page 32: 98MX17

32

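Usage of MailDayBox Field

[Figure: the one-byte MailDayBox field drawn as bits 7 to 0, with sample bit values; individual bits are linked to the tables USER_MESSAGE_0 through USER_MESSAGE_6.]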
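One plausible reading of the figure is that bit d of MailDayBox is set when the table USER_MESSAGE_d holds mail for the user, with d being the day number used in the Day: header (0 for Sunday). That interpretation is an assumption, as are the function names; the sketch below only shows the bit handling it would need.

```python
from typing import List

DAYS = 7   # tables USER_MESSAGE_0 .. USER_MESSAGE_6


def _check_day(day: int) -> None:
    if not 0 <= day < DAYS:
        raise ValueError("day must be in the range 0..6")


def mark_day(box: int, day: int) -> int:
    """Set the bit for a day (assumed meaning: that table holds mail)."""
    _check_day(day)
    return box | (1 << day)


def clear_day(box: int, day: int) -> int:
    """Clear the bit for a day, keeping the result within one byte."""
    _check_day(day)
    return box & ~(1 << day) & 0xFF


def days_with_mail(box: int) -> List[int]:
    """List the day numbers whose bit is set."""
    return [day for day in range(DAYS) if box & (1 << day)]
```

With this layout the server could decide which of the seven tables to open for a user by reading a single byte from the USER record.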

Page 33: 98MX17

33

Format of E-mail message

An E-mail consists of a header part and a message part

Headers are terminated by a null (empty) line

The message is terminated by <CRLF>.<CRLF>

Headers Used

To:
Cc:
From:
Subject:
Day:
Date:
Urgent:
ImmDel:
Delivery-Date:
Received:
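Splitting a message along these lines can be sketched as follows. The sketch assumes CRLF line endings and one header per line (no folded headers); the function name is hypothetical and this is not the parser used by the PANDITHAM Mail Server.

```python
from typing import Dict, Tuple

CRLF = "\r\n"
END_OF_MESSAGE = CRLF + "." + CRLF


def split_message(raw: str) -> Tuple[Dict[str, str], str]:
    """Split a raw message into a header dictionary and the body text."""
    # The first empty line ends the headers.
    head, _, body = raw.partition(CRLF + CRLF)
    headers: Dict[str, str] = {}
    for line in head.split(CRLF):
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("malformed header line: %r" % line)
        headers[name.strip()] = value.strip()
    # Drop the <CRLF>.<CRLF> terminator if it is present.
    if body.endswith(END_OF_MESSAGE):
        body = body[:-len(END_OF_MESSAGE)]
    return headers, body
```

Because only the first colon on a line is treated as the separator, a value such as the time in a Delivery-Date: header is kept intact.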

Page 34: 98MX17

34

A sample mail message

To: 5ì×
Cc:
From: "Administrator" <admin>
Subject: åʳËþ/Panditham
Day: 3 /* 0-Sunday, 1-Monday and so on */
Date: 06/12/2000
Urgent: 0 /* 0-Not Urgent, and 1-Urgent */
Immdel: 0 /* 0-Do Nothing, and 1-Delete on Read */
Delivery-Date: Wed Dec 06 11:31:15 GMT+05:30 2000
Received: from panditham1/164.16.18.181

A½é¹*@/Hello,

åʳËþ ô½±°hM oVZON åÍR n3ËúZR ôZN ؽw. ËgNt I'gNlZR "åʳËþ" G½u öNN6ZR ô½±°hM AµñåRþ.

Thank you for registering with Panditham Mail Service. Mail to "admin" for any clarification and/or help.

ؽw/Regards,
åʳËþ R_/The Panditham Team.

Page 35: 98MX17

35

Supporting tools

Thamizh keyboard driver in the form of a component

Tool for Language Database maintenance

Multilingual Text components

Multilingual password component

Multilingual Message Box

Tool for creating interface data

16-bit fonts: Muhil, Aruvi, and Thamizhan have been developed

Page 36: 98MX17

36

Features of PANDITHAM Mail Server

Understands Multilingual strings

Uses the character oriented protocol PANDITHAM for processing multilingual strings

Registration of a new user in multilingual form, i.e., 4ù×_J is a valid user-id

Handles multiple users at the same time

Tools provided to monitor the clients

Supports unique features such as Urgent mail and Delete on read mail

Page 37: 98MX17

37

Features of PANDITHAM Mail Client

Provides Multilingual User Interface

Character oriented data entry

No Kerning Problem (e.g. XI, ]f)

No Mis-Scripting (e.g. Eci)

Optimized data transfer for Multilingual strings

Uses Tamil99 Keyboard Layout (Phonetic) for Thamizh

English mails can be sent to other mail servers

Also features a POP3 client facility, so that mails can be read from other servers that support POP3

Page 38: 98MX17

38

Conclusion

Multilingual handling is highly reliable, as the core engine of the system is based on PANDITHAM.

The system can easily accommodate new languages when the appropriate keyboard drivers are provided in the form of components.

The system can be further improved by incorporating a POP3 Server feature in the PANDITHAM Mail Server.

Page 39: 98MX17

39

Page 40: 98MX17

40

ؽw

Visit: www.psgtech.ac.in/panditham/

Mail: [email protected]