On Mathematical Expression Analysis in Arabic Handwriting Elena Smirnova and Stephen Watt ORCCA, UWO, Feb 2007

On Mathematical

Expression Analysis in Arabic

Handwriting

Elena Smirnova and Stephen Watt

ORCCA, UWO,

Feb 2007

Categories of Math Notations

• Writing direction
  – Math flows against text
  – Math is written in the same direction as text (right to left)

• Use of alphanumerics and math symbols
  – Variables
    • Use of Latin and Greek alphabet
    • Use of Arabic alphabet
  – Numerals
    • Use of Western Arabic notation for numbers
    • Use of Arabic-Indic or Eastern Arabic-Indic numbers
  – Math operators and function names
    • Western notation
    • Mirrored glyphs
    • Special Arabic glyphs

Directions in Arabic Math

• Dual direction (Persian and Moroccan Styles)

<text 2> math <text 1>

• Single direction (Maghreb and Machrek Styles)

<text 2> math <text 1>

٠ < ۱+ ا ، ب( ا – ب)٢

Numerals in Arabic Notations

• Western Arabic (Europe)

0 1 2 3 4 5 6 7 8 9

• Arabic-Indic (most Arabic countries)

٩ ٨ ٧ ٦ ٥ ٤ ٣ ٢ ١ ٠

• Eastern Arabic-Indic (Iran, Urdu)

۹ ۸ ۷ ۶ ۵ ۴ ۳ ۲ ۱ ۰
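
As an illustration of the three numeral sets above, here is a minimal Python sketch that maps any of them onto Western digits and reports which set a character belongs to. The function names `to_western_digits` and `numeral_system` are hypothetical and not part of any recognizer described in these slides; the Unicode ranges U+0660–U+0669 (Arabic-Indic) and U+06F0–U+06F9 (Eastern Arabic-Indic) are standard.

```python
import unicodedata


def to_western_digits(text: str) -> str:
    """Replace every Unicode decimal digit in text with its ASCII digit."""
    out = []
    for ch in text:
        # unicodedata.decimal returns the digit value, or the default (None)
        # when the character is not a decimal digit.
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)


def numeral_system(ch: str) -> str:
    """Classify a single character by the numeral set it belongs to."""
    code = ord(ch)
    if 0x0030 <= code <= 0x0039:
        return "western-arabic"
    if 0x0660 <= code <= 0x0669:
        return "arabic-indic"
    if 0x06F0 <= code <= 0x06F9:
        return "eastern-arabic-indic"
    return "not-a-digit"


# "\u0662\u0660\u0660\u0667" is 2007 written with Arabic-Indic digits.
year = to_western_digits("\u0662\u0660\u0660\u0667")
```

A recognizer could apply such a normalization after glyph classification, so that later stages only deal with one numeral set.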

Math Variables

• Latin and Greek alphabets

• Arabic alphabet

Math Operators and Functions

• European Notation – Persian Style

• Mirrored glyphs – Maghreb “Western” style

• Arabic glyphs – Machrek “Eastern” style

Typeset Arabic Math

• Related projects in rendering typeset math:

– DADTeX, a TeX environment supporting Arabic

– Dadzilla, a MathML browser supporting Arabic

– Arabic Unicode, with respect to directionality

New Challenges in HWR

• Stroke segmentation in text fragments

• If Moroccan or Persian notations are used, the structure recognizer has to handle bidirectional input.

• In Maghreb notation, the recognizer has to handle mirrored glyphs.

• In Machrek notation a special recognition technique is needed for handling ligatures.

Influence on Expression Analysis

• Arabic notation affects methods not only for analyzing the structure, but also for interpreting the results of recognition

• Special attention must be paid to
  – Implicit directionality
  – Mirrored expressions
  – Special container glyphs
  – Stretched ligatures

Implicit Directionality

Statement “A² > 0 if A > 0” written in Farsi

• The recognizer determined the glyphs as { A,>, اگر ,٠ ,  A, ٠, <,٢ }.

• In Persian notation, mathematical content flows from left to right, while the surrounding text flows from right to left

• A naïve structure analyzer, reading everything from left to right, may translate this to

A > 0 if A² > 0 (A > اگر ٠   A ٠ < ٢ )

WRONG!!!
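
One way to picture the fix is a reordering pass over already-segmented input. The sketch below is only an illustration under stated assumptions: it assumes the recognizer has already split the line into math and text segments, listed in visual left-to-right order, and the name `reading_order` and the `(kind, tokens)` segment format are hypothetical. In a dual-direction notation the segments are read right to left, while the tokens inside each math segment keep their left-to-right order.

```python
from typing import List, Tuple

# A segment is (kind, tokens); kind is "math" or "text".
# Tokens are listed in visual left-to-right order, as seen on the page.
Segment = Tuple[str, List[str]]


def reading_order(visual: List[Segment],
                  text_rtl: bool = True,
                  math_rtl: bool = False) -> List[str]:
    """Flatten segments into logical reading order.

    text_rtl: the sentence as a whole runs right to left.
    math_rtl: math tokens also run right to left (single-direction styles).
    """
    segments = reversed(visual) if text_rtl else visual
    ordered: List[str] = []
    for kind, tokens in segments:
        rtl = math_rtl if kind == "math" else text_rtl
        ordered.extend(reversed(tokens) if rtl else tokens)
    return ordered


IF_FA = "\u0627\u06af\u0631"  # the Farsi word for "if"

# Visual order on the page: "A > 0", then the word "if", then "A^2 > 0".
visual = [
    ("math", ["A", ">", "0"]),
    ("text", [IF_FA]),
    ("math", ["A", "^2", ">", "0"]),
]

# Dual-direction (Persian-style) reading: A^2 > 0 if A > 0.
logical = reading_order(visual)
```

Setting `math_rtl=True` would model a single-direction notation, where the math tokens are reversed as well.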

Careful Mirroring

• Every asymmetric operator is assigned its mirrored glyph: “(” ↔ “)”, “>” ↔ “<”, etc.

٠< ۱+ ا ، ب( ا – ب)٢  a, b )a – b(2 + 1> 0

• Some mirrored glyphs have not only opposite, but very different mathematical meaning
  – For example, the pair “\”, “/” is direction sensitive:
    • “A / B” means division in Left to Right notation
    • “B / A” means set subtraction in Right to Left notation
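
The mirroring rule can be sketched as a small lookup table plus a reversal. This is a hedged illustration, not the authors' implementation: the table `MIRROR`, the set `DIRECTION_SENSITIVE`, and the function `rtl_to_ltr` are hypothetical names, and the input is assumed to be a list of recognized glyphs in visual left-to-right order. Direction-sensitive glyphs such as the slashes are deliberately not swapped; they are only reported, since their meaning has to be decided elsewhere.

```python
from typing import List, Tuple

# Asymmetric glyphs and their mirror images (a small illustrative subset).
MIRROR = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
    "\u2264": "\u2265", "\u2265": "\u2264",  # less-or-equal, greater-or-equal
}

# Glyphs whose mirror image is a different operator, not just a flipped one.
DIRECTION_SENSITIVE = {"/", "\\"}


def rtl_to_ltr(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Convert a mirrored right-to-left glyph sequence to left-to-right.

    tokens: glyphs in visual left-to-right order, as they appear on the page.
    Returns the converted sequence and the direction-sensitive glyphs found.
    """
    converted: List[str] = []
    flagged: List[str] = []
    for tok in reversed(tokens):
        if tok in DIRECTION_SENSITIVE:
            flagged.append(tok)       # left unchanged; needs a semantic decision
            converted.append(tok)
        else:
            converted.append(MIRROR.get(tok, tok))
    return converted, flagged


# The mirror image of "( a - b ) > 0" appears on the page as "0 < ( b - a )".
expr, needs_review = rtl_to_ltr(["0", "<", "(", "b", "-", "a", ")"])
```

Keeping the slashes out of the mirror table reflects the point above: a blind swap would silently turn one operation into another.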

New Container Glyphs

notation for “5!”

• The notation for factorial introduces one more case of a container symbol, in addition to the symbols for radical and long division.

• A new set of rules must be added to the structural analyzer, so that the layout of the expression “n!” is detected as nested rather than linear.
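
A nested-versus-linear rule of this kind could be sketched with bounding boxes. The example below is an assumption-laden illustration: it supposes that the bounding box of a container glyph (radical, long division, or the Arabic factorial sign) encloses the bounding box of its operand, uses screen coordinates with y growing downward, and introduces the hypothetical names `Box`, `encloses`, and `layout`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a recognized glyph (y grows downward)."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def encloses(outer: Box, inner: Box, tolerance: float = 0.0) -> bool:
    """True if inner lies within outer, allowing a small tolerance."""
    return (outer.left - tolerance <= inner.left
            and inner.right <= outer.right + tolerance
            and outer.top - tolerance <= inner.top
            and inner.bottom <= outer.bottom + tolerance)


def layout(symbol: Box, operand: Box) -> str:
    """Classify the spatial relation between an operator and its operand."""
    return "nested" if encloses(symbol, operand) else "linear"


five = Box("5", left=2, top=2, right=8, bottom=9)

# A container-style factorial sign drawn around the operand (assumed shape).
arabic_factorial = Box("factorial", left=0, top=0, right=10, bottom=10)

# A Western "!" written after the operand.
western_factorial = Box("!", left=12, top=0, right=14, bottom=10)

nested_case = layout(arabic_factorial, five)
linear_case = layout(western_factorial, five)
```

A real analyzer would likely need a tolerance tuned to handwriting, since strokes rarely enclose their operands exactly.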

Advantages

• Stretched large operators help avoid ambiguities in structure recognition

Examples [figures not reproduced]: N-ary summation, N-ary product, and limit, each compared across notations; panel labels: Maghreb, Machrek, Farsi

Context Assistance

• Extra challenge: lots of ambiguous math characters
  – ١ ("1") and ا ("ALEF");
  – ٠ ("0") and a dot;
  – ٥ ("5") and ه ("HEH") or the symbol for degree "°".

• Ex:

• Suggested strategy for character disambiguation: use of Math Context Database for Arabic notations
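
A context lookup of this kind might work roughly as follows. The sketch is purely illustrative: the table `CONTEXT_DB`, its counts, the context categories ("digit", "letter"), and the function `disambiguate` are all invented for this example and do not describe the actual Math Context Database.

```python
from collections import Counter
from typing import List

ONE = "\u0661"    # Arabic-Indic digit one
ALEF = "\u0627"   # Arabic letter alef
FIVE = "\u0665"   # Arabic-Indic digit five
HEH = "\u0647"    # Arabic letter heh
DEGREE = "\u00b0"

# Hypothetical counts of (kind of previous symbol, candidate) pairs,
# as they might be gathered from a corpus of Arabic math expressions.
CONTEXT_DB = Counter({
    ("digit", ONE): 50, ("digit", ALEF): 2,
    ("letter", ONE): 3, ("letter", ALEF): 40,
    ("digit", FIVE): 45, ("digit", HEH): 1, ("digit", DEGREE): 20,
    ("letter", FIVE): 2, ("letter", HEH): 35, ("letter", DEGREE): 1,
})


def disambiguate(candidates: List[str], prev_kind: str) -> str:
    """Pick the candidate seen most often after a symbol of kind prev_kind.

    Unseen pairs count as zero; ties go to the earliest candidate.
    """
    return max(candidates, key=lambda c: CONTEXT_DB[(prev_kind, c)])


# A vertical stroke after a digit is more likely "1" than ALEF.
after_digit = disambiguate([ONE, ALEF], "digit")

# The same stroke after a letter is more likely ALEF.
after_letter = disambiguate([ONE, ALEF], "letter")
```

Richer contexts (following symbol, position in a superscript, notational profile) would plug into the same lookup by extending the key.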

Conclusions

• Recognition of Arabic handwritten math introduces new classes of problems, mainly dealing with
  – stroke segmentation
  – structure analysis in bidirectional notations.

• However, many methods developed for European-style math handwriting analysis are applicable to Arabic notations.

• Moreover, certain things are easier with Arabic notations:
  – clearer structure organization in the case of large delimiters
  – more explicit distinction between mathematical and text fragments (in bidirectional notations).

Future Work

• Identifying a suitable source of Arabic training material for building the Context DB and for training the structure analyzer

• Merging our Mathink framework with existing recognizers for Arabic script (for text fragments)

• Enhancing character recognizers to handle very stretched glyphs

• Adding direction awareness to the structure analyzer

• Developing tools for automated notational profile detection
