character recognition - a review

Pattern Recognition, Vol. 23, No. 7, pp. 671-683, 1990. Printed in Great Britain

0031-3293/90 $3.00 + .00 Pergamon Press plc

1990 Pattern Recognition Society

CHARACTER RECOGNITION--A REVIEW

V. K. GOVINDAN Department of Electrical Engineering Calicut Regional Engineering College, Calicut-673 601, India

and

A. P. SHIVAPRASAD* Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore-560012, India

(Received 9 February 1989; received for publication 1 June 1989)

Abstract--The machine replication of human reading has been the subject of intensive research for more than three decades. A large number of research papers and reports have already been published on this topic. Many commercial establishments have manufactured recognizers of varying capabilities. Hand-held, desk-top, medium-size and large systems costing as much as half a million dollars are available, and are in use for various applications. However, the ultimate goal of developing a reading machine with the same reading capabilities as humans remains unachieved. There is still a great gap between human reading and machine reading capabilities, and a great amount of further effort is required to narrow this gap, if not bridge it. This review is organized into six major sections covering a general overview (an introduction), applications of character recognition techniques, methodologies in character recognition, research work in character recognition, some practical OCRs and the conclusions.

Character recognition Character recognition applications Statistical approach Syntactic approach Descriptive approach Off-line and On-line character recognition Template matching Correlation Feature analysis and matching Chinese character recognition Indian character recognition Automatic design Practical OCRs

1. INTRODUCTION

Character recognition techniques associate a symbolic identity with the image of a character. This problem of replication of human functions by machines (computers) involves the recognition of both machine printed and handprinted/cursive-written characters.

Character recognition is better known as optical character recognition (OCR) since it deals with the recognition of optically processed characters rather than magnetically processed ones. (1) Though the origin of character recognition can be found as early as 1870, it first appeared as an aid to the visually handicapped, and the first successful attempt was made by the Russian scientist Tyurin in 1900. (2) The modern version of OCR appeared in the middle of the 1940s with the development of digital computers. Thenceforth it was realized as a data processing approach with application to the business world. The principal motivation for the development of OCR systems is the need to cope with the enormous flood of paper, such as bank cheques, commercial forms, government records, credit card imprints and mail, generated by the expanding technological society.

* To whom correspondence should be addressed.

OCR machines have been commercially available since the middle of the 1950s. Since then extensive research has been carried out, and a large number of technical papers and reports have been published by various researchers in the area of character recognition. Several books have been published on optical character recognition. (3-11) Special issues and reports on the topic have also repeatedly appeared in the proceedings of the International Joint Conferences on Pattern Recognition and of the International System, Man and Cybernetics Conferences. Research work also appears in various other conferences, such as the British Conferences on Pattern Recognition and the Scandinavian Conferences on Image Analysis. State of the art reports on character recognition research have been presented by Nagy, (12) Harmon, (13) Stallings, (14) Suen et al., (15) Mori et al., (16) Mantas, (2) Davis and Lyall (17) and Chatterji. (18)

Presently, the methodologies in character recognition have advanced from the earlier use of primitive techniques for the recognition of machine printed numerals and a limited number of English (Latin) letters to the application of sophisticated techniques for the recognition of a wide variety of complex handprinted characters, symbols and word/script, including Chinese and Japanese characters. The computer recognition of Chinese characters was considered to be a very hard problem and regarded as one of the ultimate goals of character recognition research. Today, a number of research organizations and commercial establishments in Japan and China are actively engaged in introducing innovations and refinements to achieve better performance in Chinese character/text readers. (19-23) A number of character readers for Chinese and Japanese are now available. For example, the CLL-2000 (24) character reader can read Chinese as well as Japanese Hiragana and Katakana characters. The Japanese use many kinds of characters, about 2000 in their daily life, (16) which are composed of Kanji, Hiragana, Katakana, Roman letters and Arabic numerals. Kanji is almost the same as Chinese characters, and at most about 3000 are commonly used in Japan. About 5000 characters are commonly used in China, although more than 50,000 characters exist. Chinese characters are ideographs, each roughly equivalent to an entire Western word, and are mainly made of strokes, with horizontal and vertical strokes dominating over diagonal ones. An adequate representation of an ideographic character requires a matrix of pixels about 10 times that needed for a Roman letter. This demonstrates the complexity and sophistication needed in the development of a satisfactory character reading machine for these characters.

Most recognizers reported in the literature and those commercially available are dedicated to a specific alphabet set. However, as the Japanese use Kanji, Hiragana, Katakana, etc. in their daily life, some of their OCRs can read more than one alphabet set, as in the case of the CLL-2000. The research work reported so far includes the development of recognizers for English (Latin), Japanese, Chinese, Indian, Arabic and Korean. A few works are also reported on recognizers for Cyrillic (Russian), (25) Hebrew (a Semitic language), (26) Thai, (27-29) Greek (30) and Berber. (31)

The importance of work in the field of character recognition can be seen from the variety of practical applications, given in the next section, for which character recognition techniques have been employed.

2. APPLICATIONS OF CHARACTER RECOGNITION TECHNOLOGY

Optical character recognition technology has many practical applications. Some of the literature covering these is in languages other than English, such as German and Japanese. For completeness, these works are also cited with the applications. The following are some of the applications for which OCRs have been used or suggested by researchers.

--Use by blind people: as a reading aid using photosensors and tactile simulators, and as a sensory aid with sound output. (32-34) Also used for reading and reproduction of braille originals. (35)
--Use as a telecommunication aid for the deaf. (36)
--Use in postal departments: for postal address reading and as a reader for handwritten and printed postal codes. (37-39)
--For character print quality analysis/measurement, (40,41) document reading and sorting, (42) in airline reservation, (43) and in motor vehicle bureaus as an automatic number plate reader and recorder for road traffic control. (44)
--Use in the publishing industry, (45) and as a reader for data communication terminals. (46)
--For giro services: for giro document reading, sorting and ledgering, and for reading giro orders. (47)
--For direct processing of documents: as a multipurpose document reader for large-scale data processing, as a microfilm reader data input system, for high speed data entry, for changing text/graphics into a computer readable form, and as an electronic page reader to handle large volumes of mail. (48-51)
--For use in customer billing as in telephone exchange billing systems, (52) order data logging, (53) automated finger print identification, (54) as an automatic inspection system for I.C. mask inspection and defect detection in microcircuits, (55) and as a credit card scanner in credit personal identification systems. (56)
--For business applications: financial business applications like cheque sorting strategy optimization. (57,58)
--For digital bar code reading, (59) and as a handwriting analyser for automatic writer recognition and signature verification. (60,61)
--Use in health insurance data acquisition. (62)
--For mechanized document reading in textile and clothing manufacture enterprises, (63) automatic punching of industrial telegraphs, (64) retail data processing applications in food enterprises, and for retail product code name and price reading techniques. (65)
--In law enforcement applications, (66) in educational administration for examination assessment and attendance record evaluation, (67) and as a mark sheet reader for payroll accounting and book-keeping. (68)
--For optical census, (69) and for control of outside workers in sales and distributions. (70)
--In automated cartography, (71) metallurgical industries, (72) computer assisted forensic linguistic systems, (73) electronic mail, (74) information units and libraries, and for facsimile. (75)
--For shorthand transcription, (76,77) in electronic package industries (78) and for reading characters stamped on metallic parts. (79,80)

3. METHODOLOGIES IN CHARACTER RECOGNITION

Character recognition methodologies can be viewed in various ways. The three main viewpoints are based on

(1) the approaches used, (2) the nature of applications, and (3) the features used.


3.1. Character recognition approaches

There are two main approaches to pattern recognition: the statistical (decision-theoretic) approach and the syntactic (linguistic, grammatical or structural) approach. Each has its merits and demerits. (81,82) Structural information about the interconnections in complex patterns cannot be handled very well by statistical pattern recognition techniques. On the other hand, the use of formal language-theoretic models to represent patterns is the main drawback of the syntactic approach. Patterns are natural entities which cannot strictly obey the mathematical constraints set by formal language theory. Imposing a strict rule on the pattern structure is not particularly applicable to character recognition, where the intraclass variations are infinite. Further, the linguistic approach pays little attention to the limitations of the feature extractor.

So, a hybrid model is the only solution to practical character recognition problems. (4,83) To quote Fu, (4) "... the dichotomy of syntactic and decision-theoretic approaches appears to be convenient only from the viewpoint of theoretical study. In other words, the division of the two approaches is sometimes not clear-cut particularly in terms of practical applications."

For character recognition we need techniques that describe a large number of similar structures of the same category while allowing distinct descriptions for categorically different patterns. The ultimate goal of character recognition research is to develop machines which can read any text (including unconstrained handwriting) with the same recognition capability as humans (and, of course, at a faster rate). The expectation is that, if the features people use to recognize characters are properly described and used in a character recognition algorithm, the algorithm should perform as well as a human. This is the motivation for the so-called descriptive approach (e.g. reference 84), which is now popular in character recognition research. A descriptive approach can easily be given the flexibility needed to handle the infinite variation of character shapes within a category description. A description represents a higher level of intelligence. The description of a character involves features (structural details) and the rules under which they compose a character. To achieve an efficient description, the features used should be independent. That is, the presence of new feature(s) or the absence of old feature(s) should not affect the description of the remaining features. This provides some immunity to the limitations of feature extractors.

3.2. Schemes based on the nature of applications

On the basis of the nature of applications we can group the work in character recognition into two main schemes, namely, off-line character recognition and on-line character recognition. In off-line systems, the recognition is not done at the time of preparing the documents, whereas in on-line character recognition, the recognition is done as and when the characters are hand-drawn, and hence the timing information of each stroke is also available along with the character images.

On the basis of capabilities and complexity we can further classify the off-line schemes as:

(1) Fixed-font character recognition, which deals with the recognition of a specific typewriting font such as OCR-A, OCR-B, Pica, Elite, etc.

(2) Multifont character recognition which recognizes more than one font.

(3) Omni-font character recognition for the recognition of any font.

(4) Handwritten character recognition which deals with the recognition of unconnected normal handwritten characters.

(5) Script recognition which deals with the recognition of unconstrained handwritten characters which may be connected or cursive.

3.3. Classification based on features used

In terms of the features used, the character recognition techniques can be broadly classified as

(1) template matching and correlation techniques, and

(2) feature analysis and matching techniques.

3.3.1. Template matching and correlation techniques. These techniques directly compare an input character to a stored set of standard prototypes. The prototype that matches most closely provides recognition. The comparison methods can be as simple as one-to-one comparison, or as complex as decision tree analysis in which only selected pixels are tested. This type of technique suffers from sensitivity to noise and is not adaptive to differences in writing style. Moreover, from an Artificial Intelligence perspective, template matching has been ruled out as an explanation for human performance. (85)
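As an illustration of the simplest one-to-one comparison, the following is a minimal sketch and not a method taken from any of the cited works. It assumes characters are already normalized to equal-sized binary grids; the 3 x 3 prototypes and the function names are hypothetical.

```python
def match_score(image, template):
    """Fraction of pixels on which two equal-sized binary grids agree."""
    total = 0
    agree = 0
    for img_row, tpl_row in zip(image, template):
        for a, b in zip(img_row, tpl_row):
            total += 1
            agree += (a == b)
    return agree / total


def classify_by_template(image, prototypes):
    """Return the label of the stored prototype that matches most closely."""
    return max(prototypes, key=lambda label: match_score(image, prototypes[label]))


# Hypothetical 3 x 3 prototypes; "1" is ink, "0" is background.
PROTOTYPES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}

noisy_l = ["100", "100", "101"]  # an "L" with one ink pixel missing
print(classify_by_template(noisy_l, PROTOTYPES))
```

A single flipped pixel already lowers the best score, which hints at the noise sensitivity noted above.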

3.3.2. Feature analysis and matching. These techniques are based on matching on feature planes or spaces which are distributed on a two-dimensional plane. These are the most frequently used techniques for character recognition. In these methods, significant features are extracted from a character and compared to the feature descriptions of the ideal characters, and the description that matches most closely provides recognition. The capabilities of human reasoning are better captured by feature analysis techniques than by template matching. (85)

Many feature analysis techniques have been developed and applied to character recognition. Most of them are examples of traditional pattern recognition methods, and are usually suitable for application to constrained domains. Suen et al. (15) have given a very useful survey of various feature matching techniques. The details given below are mainly based on their work.


Based on the type of feature extraction technique used, the feature analysis techniques are grouped as:

(1) Global transformation and series expansion.
(2) Features derived from the statistical distribution of points.
(3) Geometrical and topological features.

Global transformation and series expansion techniques help to reduce the dimensionality of the feature vector and provide features invariant to some global deformations such as translation and rotation. Here researchers have used Fourier, (86-89) Walsh, (90,91) Haar (92) and Hadamard (93) series expansions, the Karhunen-Loeve expansion, ~2~ the Hough transform, (26,31,94) the projection transform, (89) the chain-code transform (19) and the principal axis transform. (95) The extraction and mask making processes are easy for these features. However, such feature extraction techniques demand high computational requirements.
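The invariance property can be illustrated with Fourier descriptors of a character boundary. The sketch below is only an assumed, textbook-style formulation and not the procedure of any cited work: the boundary is taken as a closed list of (x, y) points traced counter-clockwise, and the function name is hypothetical.

```python
import cmath
import math


def fourier_descriptors(boundary, count):
    """Normalized magnitudes of the first `count` DFT coefficients.

    Skipping coefficient 0 removes translation, taking magnitudes removes
    rotation, and dividing by the first magnitude removes scale. Assumes a
    counter-clockwise trace so that the first coefficient is non-zero.
    """
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]
    magnitudes = []
    for k in range(1, count + 1):
        c = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        magnitudes.append(abs(c))
    scale = magnitudes[0]
    return [m / scale for m in magnitudes]


# A square outline, and the same outline rotated by 90 degrees and shifted.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
moved = [(-y + 5, x + 7) for x, y in square]

print(fourier_descriptors(square, 3))
print(fourier_descriptors(moved, 3))
```

Both calls give the same values up to rounding, while the direct evaluation of each coefficient (n multiplications per coefficient) reflects the computational cost mentioned above.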

Features derived from the statistical distribution of points include zoning, (96) moments, (97) n-tuples, (98) characteristic loci, (99) and crossings and distances. (100-103) These features are tolerant to distortion and take care of style variations to some extent. They provide high speed and low complexity of implementation. However, in general, mask making is difficult for these types of features.
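Two of these features, zoning and crossings, can be sketched in a few lines. This is a minimal illustration under assumed conventions (a square grid whose side is divisible by the number of zones, "#" for ink and "." for background); the function names and the sample character are hypothetical.

```python
def zoning(grid, zones_per_side):
    """Count ink pixels in each cell of a zones_per_side x zones_per_side partition."""
    step = len(grid) // zones_per_side
    counts = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            total = 0
            for r in range(zr * step, (zr + 1) * step):
                for c in range(zc * step, (zc + 1) * step):
                    total += grid[r][c] == "#"
            counts.append(total)
    return counts


def crossings(scan_line):
    """Number of background-to-ink transitions along one scan line."""
    count = 0
    previous = "."
    for pixel in scan_line:
        if pixel == "#" and previous != "#":
            count += 1
        previous = pixel
    return count


LETTER_O = [
    ".##.",
    "#..#",
    "#..#",
    ".##.",
]

print(zoning(LETTER_O, 2))                    # ink count per zone
print([crossings(row) for row in LETTER_O])   # crossings per row
```

Both features need only a single pass over the pixels, which is consistent with the speed and low complexity noted above.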

Geometrical and topological feature analysis is the most popular technique investigated by researchers. The features may represent global and local properties of the characters. These include strokes and bays in various directions, end points, intersections of line segments, loops (e.g. references 84 and 104), stroke relations, angular properties and sharp protrusions (e.g. references 105 and 106). These features have high tolerance to distortions and style variations, and also tolerate a certain degree of translation and rotation. They help to process characters at high speeds. However, the extraction processes are in general very complex, and it is difficult to generate masks for these types of features.

4. RESEARCH WORK IN CHARACTER RECOGNITION

This Section presents brief descriptions of some of the important research work including automatic designs in the area of character recognition. The presentation is split into six sub-sections dealing with early research, recognition of Chinese characters, recognition of Indian characters, research work of the early eighties, current research work and research in automatic designs.

4.1. Early work in character recognition

A notable early attempt in the area of character recognition research is by Grimsdale et al. (107) in 1958. In their method, the input character pattern obtained by a flying spot scanner is described in terms of the length and slope of straight line segments and the length and curvature of curved segments. The description is compared with that of the prototype stored in the computer in order to reach the proper decision about the identity of the unknown character.

Another important work is the analysis-by-synthesis method suggested by Eden (108,109) at M.I.T. He put forward the idea that all Latin script characters can be formed by 18 strokes, which in turn can be generated from a subset of 4 strokes, namely, hump, bar, hook, and loop. Examples of work in this direction are those by Blesser et al., (110) Cox et al., (111) Shillman et al., (112) Yoshida and Eden, (113) and Berthod. (114) Blesser et al. proposed a theoretical approach based on phenomenological attributes. Cox et al. presented two main groups of grammar-like rules to deal with variability in type fonts. Three experimental techniques for studying ambiguous characters and for investigating the relationship between physical and functional attributes were suggested by Shillman et al. Yoshida and Eden proposed a Chinese character recognition system which employs a generative process to extract a stroke sequence from the input pattern, and a look-up dictionary of strokes to effect recognition. Berthod utilized Eden's primitives for cursive script analysis.

In the sixties, Narasimhan suggested a labeling schemata for syntactic description of pictures, (115) and a syntax directed interpretation of classes of pictures. (116) In another work, (117) he proposed a recognition technique based on description and generation. Using primitives and relations, he described a specification language for handprinted Fortran character recognition. Later, Narasimhan and Reddy (118) put forward a syntax-aided recognition scheme, wherein they incorporated in the decision rule some of the flexibility required for the satisfactory performance of a recognition system. The authors expressed the view that the rules currently in use must be refined, modified, and augmented continuously on the basis of experience and other relevant knowledge acquired.

Pavlidis and Ali (119) and Ali and Pavlidis (105) utilized the split-and-merge algorithm (120) for the polygonal approximation of characters for numeral recognition. A feature generation technique for syntactic pattern recognition, which approximates the character boundary by polygons and then decomposes them on the basis of concavity, was suggested by Feng and Pavlidis (121) in 1974.

4.2. Chinese character recognition

Major research activities in character recognition are now centred on the recognition of handprinted Chinese characters, which was once considered to be a very hard problem and regarded as one of the ultimate goals of character recognition research. In 1966, Casey and Nagy (122) at IBM presented one of the first attempts at Chinese character recognition. As the number of characters considered in their system was about 1000, they employed a two stage process, namely a pre-classification or rough classification stage that selects a group of similar characters, and a fine classification stage for resolving the identity of individual characters. Pre-classification is the general strategy of research in Chinese character recognition for dealing effectively with the large character set. The various techniques employed for the recognition of Chinese characters can be found in the reviews of Stallings (14) and Mori et al. (16)
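The two stage strategy can be outlined as follows. This is a minimal sketch under stated assumptions, not the procedure of Casey and Nagy: the rough stage is assumed to group prototypes by stroke count, the fine stage is assumed to use a squared distance between feature vectors, and all names and feature values are hypothetical.

```python
def coarse_key(features):
    """Rough classification: group by stroke count (an assumed, simple choice)."""
    return features["strokes"]


def fine_distance(a, b):
    """Fine classification: squared distance between detailed feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a["detail"], b["detail"]))


def build_groups(prototypes):
    """Index the prototype labels by their coarse key."""
    groups = {}
    for label, features in prototypes.items():
        groups.setdefault(coarse_key(features), []).append(label)
    return groups


def classify(unknown, prototypes, groups):
    """Compare the unknown only against prototypes in its coarse group."""
    candidates = groups.get(coarse_key(unknown), list(prototypes))
    return min(candidates, key=lambda label: fine_distance(unknown, prototypes[label]))


# Hypothetical prototypes for four simple characters.
PROTOTYPES = {
    "one": {"strokes": 1, "detail": [0.9, 0.1]},
    "two": {"strokes": 2, "detail": [0.5, 0.5]},
    "ten": {"strokes": 2, "detail": [0.1, 0.9]},
    "three": {"strokes": 3, "detail": [0.3, 0.3]},
}
GROUPS = build_groups(PROTOTYPES)

unknown = {"strokes": 2, "detail": [0.2, 0.8]}
print(classify(unknown, PROTOTYPES, GROUPS))
```

Only the two prototypes with two strokes are compared in detail here, which is the saving that makes the approach attractive for a large character set.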

In the late 1970s, Agui and Nagahashi (123) suggested a description method for handprinted Chinese character recognition. In their technique, a Chinese character is represented by partial patterns using three relations, namely concatenate, cross and near. The relative locations of the partial patterns are used for their categorization. Later, in 1981, when Fujii et al. (21) demonstrated a model of a handprinted Kanji character recognizer, the psychological barrier that the machine recognition of Chinese characters was very difficult was broken. (16) This triggered a lot of interest among researchers in Japan, and as a result various existing as well as new methods have been tried to bring Chinese OCRs into practical use. Now, the main technique used is the feature matching method, in which a feature vector at each point is matched pixel-wise against a feature vector at a corresponding point on a template, after size and skew normalization. The technique demands only one template for most of the characters, which is very important in the recognition of a large Chinese character set.

In 1980, Arakawa (124) suggested an on-line handwritten character recognition system for Japanese characters. Fourier coefficients of the pen-point movement loci relating to strokes are utilized as feature vectors. A method based on the Bayesian decision rule is used for recognition. Sekita et al. (22) presented a method of extracting features by using spline approximation. The method represents a character by contours expressed by well-approximating functions and by stable breakpoints which characterize the connection of the strokes, so that it provides proper features for recognition with relaxation matching. A new relaxation method based on features reflecting structural information for Chinese character recognition was introduced by Xie and Suk. (23) They defined a new distance measure, based on matching probabilities computed by a relaxation technique, for distinguishing similarly shaped characters within a cluster produced by pre-classification. A modified relaxation technique, which incorporates knowledge about the Chinese characters into the training system to reduce the computational load, was suggested by Leung et al. (125) Finally, Yong (126) suggested recognition via neural networks for achieving fast recognition of handprinted Chinese characters.

4.3. Indian character recognition

Not many attempts have been made at the recognition of Indian character sets. However, some major works are reported on Devanagari (an Indian script used for writing Sanskrit, Hindi and some other languages) (127-131) and Tamil (132-136) character recognition. Some attempts are also reported on Brahmi (a script widely used all over India during the third century BC), (133) Telugu (137) and Bengali (138) characters. These are briefly reviewed in the following.

Sethi and Chatterjee (127) have presented a Devanagari numeral recognition scheme in which the presence or absence of 4 basic primitives, namely, horizontal line segment, vertical line segment, right slant and left slant, and their interconnections are used for effecting recognition with the help of a decision tree. Later, (128) the authors attempted constrained handprinted Devanagari character recognition using a similar method.

Sinha (129-131) has carried out a few notable works in Devanagari script recognition. The first attempt was by Sinha and Mahabala. (129) They presented a syntactic pattern analysis system with an embedded picture language for Devanagari script recognition. The system stores structural descriptions for each symbol of the script in terms of primitives and their relationships. The recognition involves a search for the unknown character primitives based on the stored description and context. Sinha later (130,131) suggested knowledge based contextual post-processing systems for Devanagari text recognition.

Siromoney et al. (132) attempted machine recognition of Tamil characters using an encoded character string dictionary. Later, (133) they proposed a recognition technique for printed Brahmi. The scheme employs features in the form of strings which are extracted by row-wise and column-wise scanning of the character matrix. The features in each row and column are encoded suitably depending upon the complexity of the script to be recognized. Approaches similar to the above were later used by Chandrasekaran et al. (134) for constrained handprinted Tamil recognition, and by Chandrasekaran et al. (135) for multifont Tamil and special sets of printed Malayalam and Devanagari recognition.

In 1980, Chinnuswamy and Krishnamoorthy (136) presented an approach for handprinted Tamil character recognition employing labelled graphs to describe the structural composition of characters in terms of line-like primitives. Recognition is carried out by correlation matching of the labelled graph of the unknown character with those of the prototypes.

A two stage recognition system for Telugu alphabets has been described by Rajasekaran and Deekshatulu. (137) In the first stage, a directed curve tracing method is employed with a knowledge based search to recognize primitives (minor structural details) and to extract the basic character from the actual character pattern. In the second stage, the basic character is coded and, on the basis of the knowledge of the primitives and basic character present in the input pattern, classification is achieved by means of a decision tree.

An attempt at Bengali character recognition is that by Ray and Chatterjee. (138) They presented a nearest neighbour classifier employing features extracted by using a string connectivity criterion. Exploiting the similarity among the major Indian scripts, Dutta (139) presented a generalized formal approach for the generation and analysis of all Bengali and Hindi characters. Marudarajan et al. (140) employed adaptive threshold logic for printed Hindi numeral recognition.

4.4. Research of the early eighties

Important character recognition research of the early eighties includes the work of Tanaka et al., (141) Sarvarayudu and Sethi, (91) Shridhar and Badreldin, (86,87) Sato et al. (142) and Evangelisti. (143) Brief descriptions of these are given in the following.

Tanaka et al. (141) presented a new system for the recognition of distorted patterns using the Viterbi algorithm and a modified trellis incorporating pertinent statistics of distorted patterns. The trellis eliminates all the irrelevant pattern classes at the outset and leaves only the most probable for its final decision. The method is used for the recognition of handwritten English and Japanese Katakana characters.
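For orientation, the standard Viterbi algorithm on a plain (unmodified) trellis is sketched below. This is the generic textbook form and does not reproduce the modified trellis of Tanaka et al.; the state names, observation names and probabilities are hypothetical toy values.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observation sequence."""
    # Each trellis column maps a state to (best probability, best path so far).
    trellis = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, path = max(
                (
                    (trellis[-1][p][0] * trans_p[p][s] * emit_p[s][obs], trellis[-1][p][1])
                    for p in states
                ),
                key=lambda item: item[0],
            )
            column[s] = (prob, path + [s])
        trellis.append(column)
    return max(trellis[-1].values(), key=lambda item: item[0])[1]


# Toy model: the true letter is "o" or "c"; the observed shape may be distorted.
STATES = ("o", "c")
START = {"o": 0.6, "c": 0.4}
TRANS = {"o": {"o": 0.7, "c": 0.3}, "c": {"o": 0.4, "c": 0.6}}
EMIT = {
    "o": {"closed": 0.5, "unclear": 0.4, "open": 0.1},
    "c": {"closed": 0.1, "unclear": 0.3, "open": 0.6},
}

print(viterbi(["closed", "unclear", "open"], STATES, START, TRANS, EMIT))
```

At every step only the best path into each state is kept, so less probable alternatives are discarded as the trellis is traversed.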

The works of Sarvarayudu and Sethi (91) and Shridhar and Badreldin (86,87) are on numeral recognition. Sarvarayudu and Sethi used Walsh descriptors on the pattern boundary as features. They also presented a technique for the reconstruction of the pattern boundary from the Walsh descriptors. Shridhar and Badreldin first presented (86) a two stage character recognition algorithm using Fourier and topological descriptors to realize high accuracy in numeral recognition. To improve the speed of recognition, which was impaired by the high computational requirements of Fourier descriptors, they later (87) used a new set of topological features derived from a global description of the character. The recognition system consists of a syntactic classifier analysing the topological structure of the pattern.

To serve as an economical input device for distributed data processing systems, Sato et al. (142) suggested a low cost, hand scanning type OCR which can read printed or typed characters. It uses a one-dimensional image sensor, and scanning is done manually in the horizontal direction.

A method of evaluating a character recognition scanner prior to designing recognition logic has been suggested by Evangelisti. (143) The scanner is evaluated by comparing the pattern it produces with standard patterns selected by the computer.

4.5. Current research work in character recognition

A large amount of research work has been carried out in the mid eighties and after. A few of these works are reviewed here. They include contextual post processing by Nagy et al. (144) and Sinha, (131) word/script recognition by Almuallim and Yamaguchi, (145) El-Sheikh and Guindi, (146) Hull, (147) Aoki and Yamaya, (148) Wong and Fallside, (149) and Srihari and Bozinovic, (150) separation of connected characters by Tampi and Chetlur (151) and Ting and Ward, (152) numeral recognition by Lam and Suen, (153) and Baptista and Kulkarni, (154) multifont learning by Cannat et al., (155,156) learning by experience by Malyan and Sunthankar, (157) Pitman's shorthand recognition by Leedham and Downton, (76,77) a pattern description and generation technique by Nagahashi and Nakatsuyama, (158) description aided recognition by Harjinder, (84) a chain-code transform technique by Cheng and Leung (19) and pre-classification and recognition using the Walsh transform by Huang and Lung. (90)

Nagy et al. (144) have demonstrated a heuristic algorithm for assigning alphabetic identities to symbols in a textual context on the basis of a small vocabulary of frequent English words, with relatively modest storage and computing requirements. A rule based contextual post processor for Devanagari recognition is suggested by Sinha. (131) This consists of a composition syntax checker in the form of a finite state machine. The substitution rules are in the form of condition-action pairs, giving flexibility to the system for each alteration. Each substitution rule has a penalty associated with it, and the accumulated penalty value for a word gives a measure of its confidence level.

A cursive Arabic word recognition system has been proposed (145) in which words are first segmented into strokes, and these strokes are then classified using their geometrical and topological properties. The relative positions of the classified strokes are then examined, and the strokes are combined into a string of characters that represents the recognized word. Another work on cursive Arabic script is by El-Sheikh and Guindi, (146) who segmented the cursive word into characters and recognized them with the aid of the context. Hull's work (147) is on a knowledge based word shape analysis system with the capability to read text printed in a wide variety of fonts and scripts. The algorithm characterizes the shape of a word by the left-to-right sequence of occurrence of a small number of features. This characterization is input to a classification algorithm that uses a letter tree representation of a dictionary to locate a group or neighbourhood of words that share these features. Aoki and Yamaya (148) considered a syntactic recognizer for handwritten script words that uses a learning mechanism. A new dynamic programming method based on techniques used in the recognition of continuous speech has been described by Wong and Fallside. (149) Also, a multilevel perception approach to reading cursive script has been proposed by Srihari and Bozinovic. (150)

In reference (151), segmentation of connected handprinted characters is approached using an image description vocabulary which consists of words with built-in characteristics that give the features essential for segmentation. Another work which deals with connected character separation is by Ting and Ward.(152)

A system for classification by relaxation matching of totally unconstrained handwritten zip-code numbers has been described by Lam and Suen.(153) It comprises a feature extractor which decomposes the skeleton of the character into geometrical primitives, and two classification algorithms: one is a fast structural classifier that identifies the majority of the samples, and the other is a robust relaxation algorithm which classifies the rest of the data. Baptist and Kulkarni(154) employed the multilevel approach to processing of visual information by the human brain to yield high accuracy handwritten character recognition.

Cannat et al.(155,156) have employed the symbolic learning technique for multifont character recognition. They suggested a learning model in which the knowledge has to be found rather than modified in order to discover a discriminating generalization to achieve multifont character recognition. In reference (157), to aid reading by the blind, Malyan and Sunthankar have presented some preliminary results on the development of a low cost handprinted text reading system that learns by experience.

Leedham and Downton(76) described a number of evaluation experiments designed to establish the potential of Pitman's handwritten shorthand as an input for computer transcription to text. Later,(77) they suggested a recognition strategy for Pitman's shorthand. The technique involves splitting the shorthand outlines into two classes, namely, short forms and vocalized outlines. The short forms, which represent as much as 50% of normal shorthand, are recognized by a dynamic programming template matching technique, and the vocalized outlines are recognized using a syntactic method which interacts with a knowledge source derived from analysis of a large number of shorthand outlines.
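Dynamic programming template matching of the kind mentioned for the short forms can be sketched with a generic dynamic time warping distance. The scalar sequences, the absolute-difference local cost and the template labels are assumptions for illustration; the representation used by Leedham and Downton is not given in this review.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    computed by dynamic programming with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimum accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the stored template closest to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical templates: each short form is a sequence of stroke directions.
templates = {"the": [1, 2, 3], "of": [3, 1]}
label = classify([1, 2, 2, 3], templates)
```

Because the warping path may repeat elements, an outline drawn more slowly or quickly than its template can still match it at low cost.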

A pattern description and generation method for structural characters is reported by Nagahashi and Nakatsuyama.(158) In this method, any character is regarded as a composite pattern constructed from several simpler subpatterns, and is described in terms of them by introducing three kinds of positional relationships among them.

An important and informative work of the mid-eighties is that of Harjinder.(84) He has given a detailed implementation of a description-aided recognition scheme for handprinted English (Latin) characters, together with a literature survey of the important methodologies employed in character recognition. The flexibility of the approach lies in the open-endedness of the inventories of features and character descriptions. The scheme uses a hexagonal-cellular regular hexagon for curve following. Characters are described in terms of some grammar-like rules. A decision tree is used for the coupling of curve following, feature extraction and recognition.

Cheng and Leung(19) have suggested a new parameter transformation method, called the chain-code transform, suitable for the recognition of patterns containing straight lines. The chain-code transform method essentially maps the strokes of a character into a 2-dimensional parameter space similar to that of the Hough transform. The technique was employed in a preliminary recognition experiment with Chinese characters and obtained the best recognition rate when compared with the projection profile method, Fourier transforms, and Hough transforms.

Huang and Lung(90) pre-classified the commonly used Chinese characters into about 4096 classes, most containing 1-6 characters, using the 4C code (obtained by encoding four corner zones of a character) and the 4P code (obtained by encoding four peripheral rectangular zones). The Walsh transform is used for fine classification.

Other important works include that of Wolberg,(159) who suggested a syntactic omni-font system that recognizes a wide range of fonts including handprinted characters; the performance testing of mixed-font, variable-size character recognizers by Lam and Baird;(160) the vectorizer and feature extractor for the document reader suggested by Pavlidis;(161) and the guidelines for designing feature vectors for use with large character sets given by Hagita and Masuda.(162)

4.6. Research in automatic designs

No attempts are known to the authors on the automated design of recognizers suitable for structurally different character sets. However, some limited attempts have been made by a few authors. The most important among them are the works by Naylor,(104) Ishii et al.(163) and Kami.(164)

Naylor has described an interactive design for typewritten English (Latin) characters. A graphic console was used to aid the learning of decision logic (in the form of a decision tree) for discriminating between two patterns displayed on the console by the designer. The designer selects the pattern location which most strongly discriminates between the patterns, and the computer records the selections and builds up the decision logic. The scheme employed features such as 'square corners' and 'horizontal line end'. These are extracted from measurements at 12 extreme points determined by maximizing some quadratic functions, and at the centre of gravity of the character.

The works of Ishii et al. and Kami are on automatic dictionary design/generation for numeral recognition. Ishii et al. employed the feature concentration method to represent the topology of the characters by binary features. The features are selected on the basis of their usefulness in separating a certain class from all the other classes, employing a criterion based on feature probabilities estimated from the training set. The recognition logic of a class is expressed as a sum of products of the binary features (boolean variables) in which each term corresponds to a subclass. The ambiguities are checked by employing this classification rule to classify characters belonging to all the other classes of the training set. If any classes are classified into one of the subclasses, a new feature is added to the classification rule to separate them from those subclasses. This process is repeated until the ambiguity becomes zero.
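The sum-of-products logic and the ambiguity-driven refinement can be sketched as below. The data layout (a term as a mapping from feature index to required binary value), the greedy choice of the next feature and the function names are assumptions for illustration; the probability-based selection criterion of Ishii et al. is not reproduced.

```python
def term_matches(term, sample):
    """A term (product of binary literals) matches if every required
    feature has the required value in the sample."""
    return all(sample[i] == value for i, value in term.items())


def rule_matches(rule, sample):
    """A rule is a sum (logical OR) of terms, one term per subclass."""
    return any(term_matches(term, sample) for term in rule)


def ambiguity(rule, other_samples):
    """Number of training samples from other classes accepted by the rule."""
    return sum(rule_matches(rule, s) for s in other_samples)


def refine_term(term, prototype, intruders):
    """Add literals taken from the subclass prototype until no intruder
    matches the term, or until no remaining feature helps."""
    term = dict(term)
    remaining = [s for s in intruders if term_matches(term, s)]
    while remaining:
        unused = [i for i in range(len(prototype)) if i not in term]
        gains = {i: sum(s[i] != prototype[i] for s in remaining) for i in unused}
        if not gains or max(gains.values()) == 0:
            break  # the prototype cannot be separated from an intruder
        best = max(unused, key=lambda i: gains[i])
        term[best] = prototype[best]
        remaining = [s for s in remaining if term_matches(term, s)]
    return term


# Toy data: three binary features, one subclass prototype, two intruders.
prototype = (1, 0, 1)
intruders = [(1, 1, 1), (1, 0, 0)]
refined = refine_term({0: 1}, prototype, intruders)
```

Each pass adds one feature to the ambiguous term, mirroring the repeat-until-zero-ambiguity loop described in the text.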

Kami's work employed features like 'relation between one convex (or concave) line and another' and 'information in convex (or concave) line'. Each character is expressed as a feature vector in terms of a best feature set selected by a sequential procedure applied to learning data feature values. A feature subspace for each category is then obtained by combining the feature vectors of the various samples of the category in such a way that each feature subspace has some distance from each of the others.

The scope of all the above attempts was limited because they use simple features which do not exactly or directly reflect the structural details of the characters. They cannot represent the varying structural complexities of different alphabet sets. Moreover, with such simple features the automatic design problem will be easier to handle. Recently, the authors(165) have suggested an automated approach to the design of recognizers suitable for structurally different character sets. The approach is somewhat similar to that of Kami.(164) However, a flexible and unified/general feature representation is employed to take care of the controlled incorporation of structural details (to describe various character classes) depending upon the complexity of an alphabet set.

5. SOME PRACTICAL OCRs

Small hand-held OCRs cost about $1000 and desk-top OCRs cost about $10,000. The medium-size OCRs and large OCRs are very expensive. The Kurzweil Corporation manufactures medium-size character readers that cost about $35,000 each, but can recognize a wide range of fonts. °~6~ The United States and other countries have installed large postal address reading machines that cost about half a million dollars each to meet more stringent performance requirements than most other readers.

The details of technology/operations of various practical OCRs and scanners are given in the literature. °'kst's~'s°'tl'2'*'t66-tTs~ Given below is a very brief mention of some of the practical OCR systems in the marketplace.

An example of a small hand-held OCR is the Saba Handscan,(167) which reads a line at a time and transmits it for incorporation into application programs such as word processors, databases, and spreadsheets. It is available for the IBM PC, XT, AT or compatibles. Another example is the RH.530 model ~t t~ developed by Toshiba for reading machine printed Katakana characters using a template matching technique.

The CLL-200(24) is a desk-top OCR which can accurately recognize about 2400 handwritten characters including Chinese characters, Hirakana and Katakana characters. This portable OCR consists of 21 16-bit microprocessors assembled on 4 A4-sized circuit boards. This OCR can interface with Japanese wordprocessor systems.

Another portable text reader is the DELTA,(34) intended to let a sightless or severely visually impaired person read English, French, Spanish or any printed text without outside help by means of a character recognition system associated with a Braille tactile display. The reading principle is that when the sightless person moves a microcamera along a line of text, the characters are recognized in real time and converted into Braille on the tactile display.

McCormick t t ~ has presented a review of five character recognition machines for the IBM PC and compatibles. They are the CompuScan PCS 230, the Dest PC Scan, the Canon IX-12, the IOC Reader and the EIT Personal Scanner 2000.

DBS 3000(168) is a character recognizer developed by AEG, Wedel, Germany that can read writing, printing, etc. even when the impression is poor. The equipment has a CCD camera, the image of which is processed in real time and stored in a 512 x 512 x 8 bit memory.

Vossen(51) has described the Formscan TXL4 Workless Station that reads 1000 pages per day into a text system. This OCR was developed by Dest of the U.S.A. to handle large volumes of mail.

An example of a general purpose OCR is the N3670G tt~,~ developed by NEC utilizing new ideas and technologies to realize high processing throughput.

CSL 2610 "~'*~ is an OCR developed for mail order business applications. This can process order forms with a pre-printed customer address and other information handwritten or typed by the customer. An autonomous system for reading typed or handwritten documents is the Siemens optical character reader SLS9691.tt ~5~

An example of a high performance OCR is the TO-3000 ~t ~ developed by ETL (Electrotechnical Laboratory). This can recognize printed as well as handwritten characters. It employs the outermost point method of Yamamoto et al.(106)

OCR-3500C, ttt~ commercialized by OKI Inc., Japan, uses a feature concentration method(163) based on the characteristic loci(99) and the field effect technique.( t~6~

6. CONCLUSIONS

This review has attempted, in general, to bring out the present status of character recognition research. The various industrial, commercial, banking, and other activities to which optical character recognition technology is applied are listed. Major character recognition methodologies are discussed, and the need for a flexible approach to take care of the infinite intraclass variations is stressed. Much of the important research work reported is briefly described. Some of the commercially available OCRs are briefly mentioned.

Though researchers have suggested various sophisticated ideas and techniques to deal with the recognition of unconstrained and connected characters, practical OCR systems lack such capabilities. This may be because (1) the claims made by the researchers are not adequately substantiated by exposure of the systems to real working environments/conditions, and (2) such advanced techniques lack practical feasibility with the available hardware from an economical viewpoint.

Now if we look at the performances of various commercially available systems, we can see that the performances of all these machines are controlled by many constraints. Deviations from these constraints can cause a large deterioration in the specified performance figures. (t~) Some of the commonly imposed constraints in some or all of the machines are: ('s)

--Individual characters must not touch each other, and text must be clearly printed in dark ink on a highly coloured background.
--The location of the individual characters must fall within specified limits.
--Multifont capability is achievable only if the operator trains the machine on new fonts.

From these constraints and the lack of performances it can be concluded that the ability to read text by machines with the same fluency as the human remains an unachieved goal, though a great amount of effort has already been expended on the subject. That is, there is still a gap between human and machine reading capabilities, and further great efforts are needed to bridge this gap.

REFERENCES

1. Magnetic ink character recognition system with angled read head, IBM Technical Disclosure Bull. 28, 2555-2556 (1985).

2. J. Mantas, An overview of character recognition methodologies, Pattern Recognition 19, 425-430 (1986).

3. Character Recognition. British Computer Society, London, England (1971).

4. K. S. Fu, Syntactic Pattern Recognition and Applications. Prentice Hall, Englewood Cliffs, New Jersey (1982).

5. G. Nagy, Optical character recognition: theory and practice, Handbook of Statistics, P. R. Krishnaiah and L. N. Kanal, Eds, Vol. 2, pp. 621-649 (1982).

6. S. N. Srihari, Computer Text Recognition and Error Correction. IEEE Computer Society Press, Silver Spring, MD (1984).

7. J. R. Ullmann, Advances in character recognition, Application of Pattern Recognition, K. S. Fu, Ed., pp. 197-236. CRC Press, Boca Raton, FL (1982).

8. V. A. Kovalevsky, Ed., Character Readers and Pattern Recognition. Spartan Books, New York (1968).

9. V. A. Kovalevsky, Image Pattern Recognition. Springer, Berlin (1977).

10. Optical Character Recognition and the Years Ahead. The Business Press, Illinois, U.S.A. (1969).

11. C. Y. Suen and R. D. Mori, Eds, Computer Analysis and Perception, Vol. 1: Visual Signals. CRC Press, Boca Raton, FL (1982).

12. G. Nagy, State of the art in pattern recognition, Proc. IEEE 56, 836-860 (1968).

13. L. D. Harmon, Automatic recognition of print and script, Proc. IEEE 60, 1165-1176 (1972).

14. W. Stallings, Approaches to Chinese character recognition, Pattern Recognition 8, 87-98 (1976).

15. C. Y. Suen, M. Berthod and S. Mori, Automatic recognition of handprinted characters--the state of the art, Proc. IEEE 68, 469-485 (1980).

16. S. Mori, K. Yamamoto and M. Yasuda, Research on machine recognition of handprinted characters, IEEE Trans. Pattern Anal. Mach. Intell. 6, 386-405 (1984).

17. R. H. Davis and J. Lyall, Recognition of handwritten characters--a review, Image Vision Comput. 4, 208-218 (1986).

18. B. N. Chatterji, Feature extraction methods for character recognition, IETE Tech. Rev. 3, 9-22 (1986).

19. Y. S. Cheng and C. H. Leung, Chain-code transform for Chinese character recognition, IEEE 1985. Proc. Int. Conf. Cyb. Soc., Tucson, AZ, U.S.A. pp. 42-45 (1985).

20. K. Yakosava, M. Umeda and E. Yadogawa, A model of human Kanji character recognition, Proc. 1986 IEEE Int. Conf. Syst. Man Cyb., Atlanta, GA, U.S.A., Vol. 2, pp. 1282-1286, (14-17 October 1986).

21. N. Fujii, H. Sugawara, E. Yamamoto, C. Ito and Fujita, Some results on handprinted Kanji character recognition using the feature extracted from multiple standpoint, Trans. IECE Japan, PRL81-32 (1981).

22. I. Sekita, K. Toraichi, R. Mori, K. Yamamoto and H. Yamada, Feature extraction of handprinted Japanese characters by spline function for relaxation matching, Pattern Recognition 21, 9-17 (1988).

23. X. L. Xie and M. Suk, On machine recognition of handprinted Chinese characters by feature relaxation, Pattern Recognition 21, 1-7 (1988).

24. H. Matsumura, K. Aoki, T. Iwahara, H. Oohama and K. Kogura, Desktop optical handwritten character reader, Sanyo Tech. Rev. 18, 3-12 (1986). (In Japanese.)

25. E. P. Chornoshtan and I. B. Sirodzha, Machine model of a normalized written character classifier, Probl. Bioniki 19, 120-124 (1977). (In Russian.)

26. M. Kushnir, K. Abe and K. Matsumoto, An application of the Hough transform to the recognition of printed Hebrew character, Pattern Recognition 16, 183-191 (1983).

27. C. Kimpan, A. Atoh and K. Kawanishi, Fine classification of printed Thai character recognition using the Karhunen-Loeve expansion, IEEE Proc. E 134, 257-264 (1987).

28. P. Hiranvanichakorn, T. Agui and M. Nakajima, A recognition of handprinted Thai character by local features, Trans. Inst. Elec. Com. Engng. Japan, Section E E68, 83-90 (1985).

29. C. Kimpan, Printed Thai character recognition using topological properties method, Int. J. Elec. (U.K.) 60, 303-329 (1986).

30. J. Mantas, Recognition of Greek handwritten characters employing fuzzy set reasoning, Proc. MELECON '85 Mediterranean Electrotechnical Conf., Madrid, Spain, pp. 263-266 (8-10 October 1985).

31. A. Oulamara and J. Duvernoy, An application of the Hough transform to automatic recognition of Berber characters, Signal Process. (Netherlands) 14, 79-90 (1988).

32. J. C. Bliss, A relatively high-resolution reading aid for the blind, IEEE T. Man Mach. Syst. 10, 1-9 (1969).

33. G. C. Smith, The stereotoner reading aid for the blind, a progress report, 1973 Carnahan Conf. on Electronic Prosthetics, Lexington, U.S.A. pp. 74-76 (19-21 September 1973).

34. R. D. Badoux, DELTA [text reader for the blind], Computerised braille production, Proc. 5th Int. Workshop, Winterthur, Switzerland, pp. 21-25 (30 October-1 November 1985).

35. C. J. V. Spronsen and F. Bruggeman, Raised type reading, Mini and Microcomputers and their applications, Proc. ISMM Int. Symp., Sant Feliu de Guixols, Spain, pp. 274-277 (25-28 June 1985).

36. G. V. Kondraske and A. Shennib, Character pattern recognition for a telecommunication aid for the deaf, IEEE T. Biomed. Engng. 33, 366-370 (1986).

37. C. W. Swonger, An evaluation of character normalization, feature extraction and classification techniques for postal mail reading, Proc. Automatic Pattern Recognition, Washington, D.C., U.S.A., pp. 67-87 (6 May 1969).

38. H. Genchi, S. Watanabe, S. Matsunaga and M. Tamada, Automatic reader-sorter for mail with handwritten or printed postal code numbers, Toshiba Rev. (Int. edn), Japan, 49, 7-11 (1970).

39. K. Notbohm and W. Hanisch, Automatic digit recognition in a mail sorting machine, Nachrichtentech. Elektron. (Germany) 36, 472-476 (1986). (In German.)

40. J. L. Crawford, Pictorial information dissector and analyser system (PIDAS), IBM Technical Disclosure Bull. 15, 61-62 (June 1972).

41. W. R. Throssell and P. R. Fryer, The measurement of print quality for optical character recognition system, Pattern Recognition 6, 141-147 (1974).

42. Burroughs reports [B9137 reader sorter], Burroughs Clearing House (U.S.A.) 59, 32-33 (1975).

43. J. C. McAbee, OCR application at United Air Lines, Data Processing XII, Proc. 1967 Int. Data Process. Conf. and Business Exposition, Boston, MA, U.S.A., pp. 362-366 (20-23 June 1967).

44. A. Gyarfas, Experiments concerning the inspection and control of car and truck in France, Kozlekedes Tud. Sz. 24, 85-91 (1974). (In Hungarian.)

45. G. L. Skalski, OCR in the publishing industry, Data Processing XII, Proc. 1967 Int. Data Process. Conf. and Business Exposition, Boston, MA, U.S.A., pp. 255-260 (20-23 June 1967).

46. H. Genchi, Data communication terminal apparatus, optical character and mark readers, Denshi Tsushin Gakkai Zasshi 52, 418-428 (1969). (In Japanese.)

47. J. D. Haaley, National giro document reading and sorting optical character recognition, Datafair 1969, Manchester, England (25-29 August 1969).

48. J. Uler, Direct data processing with the IBM 1287 multipurpose document reader for standard article--fresh service to Joh. Jacob and Co., Bremen, IBM Nachr. (Germany) 20, 35-40 (February 1970). (In German.)

49. Speed readers [text readers], Which Computer (U.K.), pp. 119, 121, 123 (Nov. 1986).

50. S. Kroger, Scanner in practice, Chip (Germany), 5, 94-96 (1987). (In German.)

51. M. Vossen, Electronic page reader in use, Office Management (Germany) 34, 1148 (1986). (In German.)

52. K. Yoshida, Optical character reader for telephone exchange charge billing system, Japan Telecom. Rev. (Japan) 16, 105-110 (1974).

53. G. Hilgert, Method of dealing with orders on the IBM 1287 multifunction document reader at the decentralised sales organisation of the continental Gummi-Werke Aktiengesellschaft, IBM Nachr. 20, 122-125 (1970). (In German.)

54. Automatic identification of latent finger-prints, Report (unnumbered), (PB-192976), Metropolitan Atlanta Council of local Government, GA, U.S.A., p. 31 (April 1970).

55. W. Bojman, Detection and/or measurement on complex patterns, IBM Technical Disclosure Bull. (U.S.A.) 13, 1429-1430 (1970).

56. N. M. Herbst and C. N. Liu, Card-based personal identification system, IBM Technical Disclosure Bull. (U.S.A.) 22, 4291-4293 (February 1980).

57. Field-oriented scanning system, IBM Technical Disclosure Bull. (U.S.A.) 29, 2130-2133 (October 1986).

58. F. H. Murphy and E. A. Stohr, Optimal check sorting strategies, Bull. Oper. Res. Am. 23 (supplement 1), B/145 (Spring 1975).

59. E. G. Nassimbene, Digital compare circuitry, IBM Technical Disclosure Bull. 14, 3421-3422 (1972).

60. S. Kupriyanov, Electronic handwriting analyzer, Tekh. Misul (Bulgaria) 9, 7-13 (1972). (In Bulgarian.)

61. J. Sternberg, Automated signature verification using handwriting pressure, 1975 WESCON Technical Papers--Western Electronic Show and Convention 19, San Francisco, California, U.S.A., 31-4 (16-19 September 1975).

62. H. Timm, Registering of health insurance data using the IBM 1288 page reader, IBM Nachr. 23, 789-792 (1973). (In German.)

63. H. Schafer, Mechanised document reading in a textile and clothing manufacture enterprise, IBM Nachr. 23, 776-782 (1973). (In German.)

64. S. Inoue, A. Kurematsu, T. Wada and S. Nakabo, Studies on optical character recognition of international telegraph, KDD Tech. J. 77, 51-61 (1973). (In Japanese.)

65. W. Eggimann, Electronics in U.S.A. - - the computer in the supermarket, Electroniker (Switzerland) 13, 28-29 (1974). (In German.)

66. C. P. Joshi, Role of electronics in law enforcement, J. Inst. Elec. and Telecom. Engng (India) 20, 500-503 (1974).

67. B. R. Hemphill, Optical character recognition--the future is here, AEDS Monit. (U.S.A.) 13, 8-9 (1975).

68. E. Christ and G. Schrag, New tasks for the mark sheet reader, Data Rep. 11, 27-31 (1976). (In German.)

69. Z. Ress, Some experience with optically readable handwriting in solving the MIKROCENSUS 73, Mech. Autom. Adm. (Czechoslovakia), 15, 290-292 (1975). (In Czechoslovakian.)

70. D. Schacht, Control of outside workers in sales and distributions using the optical document reader IBM 3886, IBM Nachr. (Germany) 28, 131-138 (1978). (In German.)

71. M. L. Gronmeyer, Recognition of handprinted characters for automated cartography: a progress report, Proc. Soc. Photo-Opt. Instr. Engng (U.S.A.) 205, 165-174 (1979).

72. M. Pokluda, Optical pattern recognition in NHKG Ostrava, Mech. Autom. Adm. (Czechoslovakia) 19, 218-220 (1977). (In Czechoslovakian.)

73. U. Perret, Computer assisted forensic linguistic system 'TEXTOR', Proc. 3rd Int. Conf.: Security through Science Engng, Lexington, KY, U.S.A., 139-149 (23-26 September 1980).

74. P. F. Polizzano, OCR and electronic mail, Computer World 17, 49-52 (12 October 1983).

75. J. W. T. Smith and Z. Merali, Optical character recognition: the technology and its applications in information units and libraries, Report 33, British Library, Boston Spa, Wetherby, West Yorks, England (1985).

76. C. G. Leedham and A. C. Downton, On-line recognition of Pitman's handwritten shorthand--an evaluation of potential, Int. J. Man Mach. Stud. (U.K.) 24, 375-393 (1986).

77. C. G. Leedham and A. C. Downton, Automatic recognition and transcription of Pitman's handwriting shorthand--an approach to short forms, Pattern Recognition 20, 341-348 (1987).

78. A. ikrger, P. Dunbar and C. Robert, Machine vision recognition in the electronic packaging industry, three case studies, VISION 85 Conf. Proc., Detroit, MI, U.S.A. (25-28 March 1985).

79. Y. Nakamura, M. Suda, K. Sakal, Y. Takeda and M. Udaka, Development of a high performance stamped character reader, IEEE T. Ind. Electron. (U.S.A.) 33, 144-147 (1986).

80. Y. Nakamura, M. Suda, T. Hayashi, A. Tanaka and S. Watanabe, An optical character recognition system for industrial application: TOSEYE-1000, Proc. Int. Workshop on Industrial Application of Machine Vision and Machine Intelligence, Seiken Symp., Tokyo, Japan, pp. 364-368 (2-5 February 1987).

81. M. A. Aiserman, Remarks on two problems connected with pattern recognition, Methodologies of Pattern Recognition, S. Watanabe, Ed., Academic Press, New York (1969).

82. S. Watanabe, Ungrammatical grammars in pattern recognition, Pattern Recognition 3, 385-408 (1971).

83. L. N. Kanal and B. Chandrasekaran, On linguistic, statistical and mixed models of pattern recognition, Frontiers of Pattern Recognition, S. Watanabe, Ed., pp. 163-192. Academic Press, New York (1972).

84. Harjinder Singh, Description aided recognition of handprinted characters, Ph.D. Thesis, Indian Institute of Science, Bangalore, India (1985).

85. S. C. Shapiro et al., Eds, Encyclopaedia of Artificial Intelligence, Vol. 1. Wiley Interscience, New York (1987).

86. M. Shridhar and A. Badreldin, High accuracy character recognition algorithm using Fourier and topological descriptors, Pattern Recognition 17, 515-523 (1984).

87. M. Shridhar and A. Badreldin, A high accuracy syntactic recognition algorithm for handwritten numerals, IEEE T. Syst. Man Cyb. 15 (1985).

88. E. Persoon and K. S. Fu, Shape determination using Fourier descriptors, IEEE T. Syst. Man Cyb. 7, 170-179 (1977).

89. H. F. Li and S. C. Cheng, Projection profile and Fourier transform for Chinese character recognition, Int. J. Elec. (U.K.) 54, 299-300 (1983).

90. J. S. Huang and M. Lung, Separating similar complex Chinese characters by Walsh transform, Pattern Recognition 20, 425-428 (1987).

91. G. P. R. Sarvarayudu and I. K. Sethi, Walsh descriptors for polygonal curves, Pattern Recognition 16, 327-336 (1983).

92. S. Wendling, C. Gagneux and G. Stamon, Use of Haar transform and some of its properties in character recognition, 3rd Int. J. Conf. Pattern Recognition, Coronado, CA, U.S.A. pp. 844-848 (8-11 Nov 1976).

93. S. Wendling, G. Stamon, Hadamard and Haar transforms and their power spectrum in character recognition, 1976 Joint Workshop on Pattern Recognition and Artificial Intelligence, Hyannis, Mass, U.S.A., 1-3 June 1976, 103-112 (1976).

94. M. Kushnir and K. Matsumoto, Recognition of handprinted Hebrew characters using features selected in the Hough transform space, Pattern Recognition 18, 103-114 (1985).

95. R. Ott, On feature selection by means of principal axis transform and nonlinear classification, Proc. 2nd Int. J. Conf. Pattern Recognition, pp. 220-222 (1974).

96. A. B. S. Hussain, G. T. Toussaint and R. W. Donaldson, Results obtained using a simple character recognition procedure on Munson's handprinted data, IEEE T. Comput. 21, 201-205 (1972).

97. N. D. Tucker and F. C. Evans, A two-step strategy for character recognition using geometrical moments, Proc. 2nd Int. J. Conf. Pattern Recognition, pp. 223-225 (1974).

98. J. R. Ullmann, Experiments with the n-tuple method of pattern recognition, IEEE T. Comput. 18, 1135-1137 (1969).

99. A. L. Knoll, Experiments with "characteristic loci" for recognition of handprinted characters, IEEE T. Comput. 18, 366-372 (1969).

100. A. W. Holt, Algorithm for a low-cost handprint reader, Comput. Design, 85-89 (February 1974).

101. P. M. Lewis, The characteristic selection problem in recognition system, IEEE T. Inform. Theory 8, 171-178 (1962).

102. C. H. Chen, Computer searching criterion for best feature set in character recognition, Proc. IEEE 53, 2128-2129 (1965).

103. K. S. Fu and G. P. Cardillo, A note on optimum feature selection, IEEE T. Automat. Contr. 12, 588-591 (1967).

104. W. C. Naylor, Some studies in the interactive design of character recognition system, IEEE T. Comput. 20, 1075-1086 (1971).

105. F. Ali and T. Pavlidis, Syntactic recognition of handwritten numerals, IEEE T. Syst. Man Cyb. 7, 537-541 (1977).

106. K. Yamamoto and S. Mori, Recognition of handprinted characters by outermost point method, Proc. 4th Int. J. Conf. Pattern Recognition, Kyoto, Japan pp. 794-796 (7-10 November 1978).

107. R. L. Grimsdale, F. H. Sumner, C. J. Tunis and T. Kilburn, A system for the automatic recognition of patterns, Proc. IEE 106, 210-221 (1959).

108. M. Eden, On the formalization of handwriting, Structure of Language and its Mathematical Aspects, pp. 83-88, American Academic Society (1961).

109. M. Eden, Handwriting generalization and recognition, Recognizing Patterns, Kolers and M. Eden, Eds, pp. 138-154. M.I.T. Press, Cambridge, MA (1968).

110. B. Blesser, R. Shillman, T. Kuklinski, C. Cox, M. Eden and J. Ventura, A theoretical approach for character recognition based on phenomenological attributes, Proc. 1st Int. J. Conf. Pattern Recognition, Washington, pp. 33-40 (1973).

111. C. Cox, B. Blesser and M. Eden, The application of type font analysis to automatic character recognition, Proc. 2nd Int. J. Conf. Pattern Recognition, Copenhagen, pp. 226-232 (1974).

112. R. J. Shillman, T. T. Kuklinski and B. A. Blesser, Experimental methodologies for character recognition based on phenomenological attributes, Proc. 2nd Int. J. Conf. Pattern Recognition, Copenhagen, pp. 195-201 (1974).

113. M. Yoshida and M. Eden, Handwritten Chinese character recognition by an analysis by synthesis method, Proc. 1st Int. J. Conf. Pattern Recognition, Washington, pp. 197-204 (1973).

114. M. Berthod, Online analysis of cursive writing, Computer Analysis and Perception, Vol. 1, Visual Signals, C. Y. Suen and R. D. Mori, Eds. CRC Press, Cleveland, Ohio, U.S.A. (1982).

115. R. Narasimhan, Labelling schemata and syntactic description of pictures, Inform. Contr. 7, 151-179 (1964).

116. R. Narasimhan, Syntax directed interpretation of class of pictures, Commun. ACM 9, 166-173 (1966).

117. R. Narasimhan, On the description, generation and recognition of classes of pictures, Automatic Interpretation and Classification of Images, A. Grasselli, Ed., pp. 1-42. Academic Press, New York (1969).

118. R. Narasimhan and V. S. N. Reddy, A syntax-aided recognition scheme for handprinted English letters, Pattern Recognition 3, 345-361 (1971).

119. T. Pavlidis and F. Ali, Computer recognition of handprinted numerals by polygonal approximation, IEEE T. Syst. Man Cyb. 5, 610-614 (1975).

120. T. Pavlidis and S. L. Horowitz, Segmentation of plane curves, IEEE T. Comput. 23, 860-870 (1974).

121. H. F. Feng and T. Pavlidis, Decomposition of polygons into simpler components: Feature generation for syntactic pattern recognition, IEEE T. Comput. 24, 636-650 (1975).

122. R. Casey and G. Nagy, Recognition of printed Chinese characters, IEEE T. Elec. Comput. 15, 91-101 (1966).

123. T. Agui and N. Nagahashi, A description method of handprinted Chinese characters, IEEE T. Pat. Anal. Mach. Intell. 1, 20-24 (1979).


124. A. Arakawa, Online recognition of handwritten characters - - Alphanumerics and Hirakana, Katakana, Kanji, Pattern Recognition 16, 9-16 (1983).

125. C. H. Leung, Y. S. Cheung and Y. L. Wong, A knowledge based stroke-matching method for Chinese characters, IEEE T. Syst. Man Cyb. 17, 993-1003 (1987).

126. Y. Yong, Handprinted Chinese character recognition via neural networks, Pattern Recognition Lett. 7, 19-25 (1988).

127. I. K. Sethi and B. Chatterjee, Machine recognition of handprinted Devanagari numerals, J. Inst. Elec. Telecom. Engng (India) 22, 532-535 (1976).

128. I. K. Sethi, Machine recognition of constrained handprinted Devanagari, Pattern Recognition 9, 69-75 (1977).

129. R. M. K. Sinha and H. Mahabala, Machine recognition of Devanagari script, IEEE T. Syst. Man Cyb. 9, 435-449 (1979).

130. R. M. K. Sinha, Role of context in Devanagari script recognition, J. Inst. Elec. Telecom. Engng (India) 33, 86-91 (1987).

131. R. M. K. Sinha, Role of contextual postprocessing for Devanagari text recognition, Pattern Recognition 20, 475-485 (1987).

132. G. Siromoney, R. Chandrasekaran and M. Chandrasekaran, Machine recognition of printed Tamil characters, Pattern Recognition 10, 243-247 (1978).

133. G. Siromoney, R. Chandrasekaran and M. Chandrasekaran, Machine recognition of Brahmi script, IEEE T. Syst. Man Cyb. 13, (1983).

134. M. Chandrasekaran, R. Chandrasekaran and G. Siromoney, Context dependent recognition of handprinted Tamil characters, Proc. Int. Conf. Syst. Man Cyb., (India) 2, 786-790 (1984).

135. R. Chandrasekaran, M. Chandrasekaran and G. Siromoney, Computer recognition of Tamil, Malayalam and Devanagari characters, J. Inst. Elec. Telecom. Engng (India) 30, 150-154 (1984).

136. P. Chinnuswamy and S. G. Krishnamoorthy, Recognition of handprinted Tamil characters, Pattern Recognition 12, 141-152 (1980).

137. S. N. S. Rajasekaran and B. L. Deekshatulu, Recognition of printed Telugu characters, Comput. Graph. Image Process. 6, 335-360 (1977).

138. A. K. Ray and B. Chatterjee, Design of a nearest neighbour classifier system for Bengali character recognition, J. Inst. Elec. Telecom. Engng (India) 30, 226-229 (1984).

139. A. K. Dutta, A generalized formal approach for description and analysis for major Indian scripts, J. Inst. Elec. Telecom. Engng 30, 155-161 (1984).

140. A. R. Marudarajan, K. Jayanthi and M. Rajeswari, Extension of adaptive threshold logic to printed Hindi numeral recognition, J. Inst. Elec. Telecom. Engng (India) 24, 223-225 (1978).

141. H. Tanaka, Y. Hirakawa and S. Kaneku, Recognition of distorted patterns using Viterbi algorithm, IEEE T. Pattern Anal. Mach. Intell. 4, 18-25 (1982).

142. K. Sato, I. Isshiki, A. Ohoka and K. Yoshida, Hand-scan OCR with a one-dimensional image sensor, Pattern Recognition 16, 459-467 (1983).

143. C. J. Evangelisti, Some experiments in the evaluation of character recognition scanners, Pattern Recognition 16, 273-287 (1983).

144. G. Nagy, S. Sethi and Einspahr, Decoding substitution ciphers by means of word matching with application to OCR, IEEE T. Pattern Anal. Mach. Intell. 9, 710-715 (1987).

145. H. Almuallim and S. Yamaguchi, A method of recognition of Arabic cursive handwriting, IEEE T. Pattern Anal. Mach. Intell. 9, 715-722 (1987).

146. T. S. El-Sheikh and R. M. Guindi, Computer recognition of Arabic cursive scripts, Pattern Recognition 21, 293-302 (1988).

147. J. J. Hull, Word shape analysis in a knowledge based system for reading text, 2nd Conf. on Artificial Intelligence Applications: The Engineering of Knowledge Based System, Miami Beach, FL, U.S.A., pp. 114-119 (11-13 December 1985).

148. K. Aoki and Y. Yamaya, Recognizer with learning mechanism for handwritten English script words, Proc. 8th Int. J. Conf. Pattern Recognition, Paris, France, pp. 690-692 (27-31 October 1986).

149. K. H. Wong and F. Fallside, Dynamic programming in the recognition of connected handwritten scripts, 2nd Conf. Artificial Intelligence Applications: The Engineering of Knowledge Based Systems, Miami Beach, FL, U.S.A., pp. 666-670 (11-13 December 1985).

150. S. N. Srihari and R. M. Bozinovic, A multilevel perception approach to reading cursive script, Artificial Intell. (Netherlands) 33, 217-255 (October 1987).

151. K. R. Tampi and S. S. Chetlur, Segmentation of handwritten characters, Proc. 8th Int. J. Conf. Pattern Recognition, Paris, France, pp. 684-686 (27-31 October 1986).

152. V. R. Ting and R. K. Ward, Separation and recognition of connected handprinted English characters, IEEE Pacific Rim Conf. Communications, Computers and Signal Process. Proc., Victoria, BC, Canada, 4-5 June 1987, pp. 512-516 (1987).

153. L. Lam and C. Y. Suen, Structural classification and relaxation matching of totally unconstrained handwritten ZIP-code numbers, Pattern Recognition 21, 19-31 (1988).

154. G. Baptista and K. M. Kulkarni, A high accuracy algorithm for recognition of handwritten numerals, Pattern Recognition 21, 287-291 (1988).

155. J. J. Cannat and Y. Kodratoff, Learning technique applied to multifont character recognition, Proc. SPIE, Int. Soc. Opt. Engng 635, 469-479 (1986).

156. J. J. Cannat, Y. Kodratoff and S. Moscatelli, Learning techniques applied to multifont character recognition, Proc. 8th Int. J. Conf. Pattern Recognition, pp. 123-125 (1986).

157. R. Malyan and R. Sunthankar, Handprinted text reader that learns by experience, Microprocessor Microsystem (U.K.) 10, 377-385 (1986).

158. H. Nagahashi and M. Nakatsuyama, A pattern description and generation method for structural characters, IEEE T. Pattern Anal. Mach. Intell. 8, 112-118 (1986).

159. G. Wolberg, A syntactic Omni-font character recognition system, Proc. CVPR '86: IEEE Comput. Society Conf. on Computer Vision and Pattern Recognition, Miami Beach, FL, U.S.A., pp. 168-173 (22-26 June 1986).

160. S. W. Lam and H. S. Baird, Performance testing of mixed font variable size character recognizers, Proc. 5th Scandinavian Conf. Image Analysis, Stockholm, Sweden, Vol. 2, pp. 563-570 (2-5 June 1987).

161. T. Pavlidis, A vectorizer and feature extractor for document recognition, Comput. Vision Graph. Image Process. 35, 111-127 (1986).

162. H. Hagita and I. Masuda, Design principles of feature vectors for recognition of large character sets, Proc. 1987 Int. Conf. Syst. Man Cyb., Alexandria, VA, U.S.A., Vol. 2, pp. 826-830 (20-23 October 1987).

163. K. Ishii, N. Kanemaki and K. Komori, Automatic design of a character recognition dictionary based on feature concentration method, Proc. 4th Int. J. Conf. Pattern Recognition, Kyoto, Japan, pp. 804-806 (7-10 Nov 1978).

164. H. Kami, Evaluation of automatic dictionary generation for character recognition, NEC Res. Devel. pp. 42-47 (January 1984).

165. V. K. Govindan, Computer Recognition of Handprinted Characters: An Automated Approach to the Design of Recognizers, Ph.D. Thesis, Dept of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India (1988).

166. R. C. Kurzweil, Artificial intelligence program at CORE of scanning system, Graphic Arts Mon. 56, 564-566 (1984).

167. J. McCormick, Saba handscan, BYTE 12, 165-167 (December 1987).

168. Automatic recognition of indistinct characters, Electrotechnik (W. Germany) 69, 44, 47 (26 October 1987). (In German.)

169. A. E. Cawkell, Scanner tutorial and survey, Inf. Media Technol. 21, 19-25 (1988).

170. T. Coleman, Do you read me PC?, PC User (U.K.) No. 58, 125, 128 (March 1987).

171. B. Crider, Scanners: gaining recognition, PC World (U.S.A.) 4, 230-235 (August 1986).

172. H. Ishiguro, M. Miyamoto, K. Shigeta, K. Hiromori, A. Fukusawa, Y. Mural, F. Kawamata and K. Kondoh, N3670G hand printed OCR system, NEC Tech. J. (Japan) 39, 132-138 (1986). (In Japanese.)

173. M. McCormick, Text scanners for IBM PC, BYTE 12, 233-238 (April 1987).

174. A. Kauch and H. Lincke, OCR in mail order business, COM (Germany) 20, 24-27 (1985). (In German.)

175. D. Beckmann, Creating readable documents [Siemens SLS 9691 OCR], COM 20, 44-45 (1985). (In German.)

176. T. Mori, S. Mori and K. Yamamoto, Field effect method for feature extraction from patterns - extraction of concavities and enclosures, Syst. Comput. Control (U.S.A.) 5, 44-50 (1974).

177. J. Schurmann, Reading machines, Proc. 6th Int. J. Conf. Pattern Recognition, Munich, F.R.G., pp. 1031-1044 (October 1982).

About the Author--V. K. GOVINDAN received the Bachelor's degree in Electrical Engineering and the Master's degree in Instrumentation and Control Systems from Calicut University, Calicut, India, in 1975 and 1978 respectively. He recently submitted his Ph.D. thesis on 'Character Recognition' to the Indian Institute of Science, Bangalore, India. He is presently working as an Assistant Professor in the Electrical Engineering Department of Calicut Regional Engineering College, Calicut, India. His research interests include artificial intelligence, character recognition, automatic learning and microprocessor-based systems.

About the Author--A. P. SHIVAPRASAD received the B.E., M.E. and Ph.D. degrees in Electrical Communication Engineering from the Indian Institute of Science, Bangalore, in 1965, 1967 and 1972 respectively. Since 1967 he has been a member of the staff of the Indian Institute of Science, where he presently holds the post of Associate Professor in the Department of Electrical Communication Engineering. He has published a number of papers, and his fields of interest include microprocessor-based instrumentation, electronic circuits and communication systems.
