
LATIN-NASTALIQUE SCRIPT CLASSIFICATION SYSTEM

Muhammad Usman Ghani

Research Officer-III

Center for Language Engineering


INTRODUCTION

Latin script is also used for terminology illustration or other purposes in Urdu books and magazines.

The script detection system isolates Nastalique and Latin script.

The Nastalique script is recognized through Urdu OCR, and the Latin script is recognized by the Tesseract OCR.

A font-size-independent approach is used.
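The routing described above can be sketched as a small dispatcher: each isolated run carries a script label and is handed to the recognizer registered for that script. This is only an illustration; the `Run` type, the `recognize_runs` function and both stand-in recognizers are hypothetical names, not part of the presented system, and the real Urdu OCR and Tesseract calls are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Run:
    """A contiguous stretch of text in one script (hypothetical type)."""
    script: str    # "Nastalique" or "Latin"
    image: object  # placeholder for the cropped run image


def recognize_runs(runs: List[Run],
                   recognizers: Dict[str, Callable[[object], str]]) -> List[str]:
    """Send each run to the recognizer registered for its script."""
    return [recognizers[run.script](run.image) for run in runs]


# Stand-in recognizers (assumptions): a real system would call the Urdu OCR
# and the Tesseract OCR here instead of formatting a string.
def fake_urdu_ocr(image: object) -> str:
    return f"<urdu:{image}>"


def fake_latin_ocr(image: object) -> str:
    return f"<latin:{image}>"


runs = [Run("Nastalique", "img0"), Run("Latin", "img1")]
texts = recognize_runs(runs, {"Nastalique": fake_urdu_ocr,
                              "Latin": fake_latin_ocr})
print(texts)
```

Passing the recognizers in as a dictionary keeps the script classifier independent of whichever OCR engines are plugged in.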


SYSTEM OVERVIEW


SCRIPT CLASSIFICATION

Features Extraction

• Dimensional Features

• Morphological Features

Classification: C4.5 Decision Tree algorithm
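C4.5 is commonly described as choosing splits by gain ratio (information gain divided by split information). The sketch below evaluates that criterion for a threshold split on one continuous feature. It is a simplified illustration, not the classifier used in the system: pruning and missing-value handling are omitted, and the feature values and labels are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info


def best_threshold(values, labels):
    """Observed value with the highest gain ratio, or None if no split exists."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Made-up height-to-width ratios for six ligatures (illustrative only).
ratios = [0.4, 0.5, 0.6, 1.5, 1.8, 2.0]
scripts = ["Latin", "Latin", "Latin", "Nastalique", "Nastalique", "Nastalique"]
print(best_threshold(ratios, scripts))
```

A full tree would repeat this choice recursively over all features on each resulting subset.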


FEATURES EXTRACTION (1)

Dimensional Features

• Height

• Width

• Area

• Height-to-Width Ratio

• Centroid Composite Value
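The dimensional features listed above can be computed from a binary mask of one connected component, as in the sketch below. Several choices here are assumptions: the function name `dimensional_features` is hypothetical, area is taken as the foreground pixel count, and the raw centroid is returned as is because the slides do not say how the Centroid Composite Value is derived from it.

```python
def dimensional_features(mask):
    """Compute simple size features for one component.

    `mask` is a list of rows; truthy cells are foreground pixels.
    """
    pixels = [(r, c)
              for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("component has no foreground pixels")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1  # bounding-box height
    width = max(cols) - min(cols) + 1   # bounding-box width
    area = len(pixels)                  # assumption: area = pixel count
    return {
        "height": height,
        "width": width,
        "area": area,
        "height_to_width": height / width,
        # Raw centroid (row, column); the composite value is not specified.
        "centroid": (sum(rows) / area, sum(cols) / area),
    }


component = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
features = dimensional_features(component)
print(features)
```

Of these, the height-to-width ratio stays the same when a component is scaled uniformly, which is one way a feature can be independent of font size.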


FEATURES EXTRACTION (2)

Morphological Features


NEIGHBORING RULES

The script type of the first ligature in a line is changed to the script type of the next two connected components (CCs), if these two CCs have the same script type.

The script type of the last ligature in a line is changed to the script type of the previous two CCs, if these two CCs have the same script type.

If a ligature with script type Latin has Nastalique script CCs on its right and left, its script type is changed to Nastalique.

If a ligature with script type Nastalique has Latin script CCs on its right and left, its script type is changed to Latin.

If a Latin script ligature has a diacritic associated with it and it is placed below the main body (MB) or inside the MB, script type of such ligature would be converted to Latin.
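The first four rules can be sketched as one pass over the script labels of a single line, taken in reading order. The function name `apply_neighbor_rules` is hypothetical, and the sketch makes two assumptions: every rule reads the original labels, so the result does not depend on the order in which rules fire, and each item in the list is treated as both a ligature and a CC. The diacritic rule is left out because it needs positional information that a list of labels does not carry.

```python
def apply_neighbor_rules(labels):
    """Smooth per-ligature script labels using their neighbors.

    `labels` holds "Latin" or "Nastalique" for each ligature in a line.
    Returns a new list; the input is not modified.
    """
    out = list(labels)
    n = len(labels)
    if n >= 3:
        # First ligature follows the next two if they agree.
        if labels[1] == labels[2]:
            out[0] = labels[1]
        # Last ligature follows the previous two if they agree.
        if labels[-2] == labels[-3]:
            out[-1] = labels[-2]
    # A ligature sandwiched between two neighbors of the other script
    # takes their script (covers both the Latin and Nastalique cases).
    for i in range(1, n - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            out[i] = labels[i - 1]
    return out


line = ["Latin", "Nastalique", "Nastalique", "Latin",
        "Nastalique", "Nastalique", "Latin"]
smoothed = apply_neighbor_rules(line)
print(smoothed)
```

In this example the isolated Latin labels at the start, middle and end of the line are all relabeled as Nastalique.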


RUN MARKING


RECOGNITION

99Identity Crisis

(Collective WillNationality)

55(Gallstones(blle saltscholesterolcalcium


POST-PROCESSING

Identity Crisis

(Collective Will

Nationality)

(Gallstones)

blle salts

Cholesterol

Calcium


QUESTIONS?


THANK YOU ☺