literature survey - inflibnetshodhganga.inflibnet.ac.in/bitstream/10603/35020/10/11_chapter2.pdf ·...

Literature Survey

For the past several years, researchers have shown increasing interest in the problem of
extracting text from video. Intensive research has been carried out in this area, as is
evident from the large number of technical papers published. One application of this
research is locating the number plate in a video.

Zoe Jeffrey, Xiaojun Zhai et al. have proposed an automatic number plate recognition
system based on an ARM-DSP platform. The arithmetic capability of digital signal
processors (DSPs), the multiple peripheral interfaces and the high-frequency execution
of ARM processors make them an attractive choice for real-time embedded systems. DSPs
are already widely used for applications such as audio and speech processing, image and
video processing, and wireless signal processing. Practical applications include
surveillance, video encoding and decoding, and object tracking and detection in images
and video. On the other hand, the rapid development of Field Programmable Gate Arrays
(FPGAs) offers an alternative way to provide low-cost acceleration for computationally
intensive tasks such as digital signal processing. Most of these applications use ARM
processors, DSPs and FPGAs for the processing power they offer, in order to provide
portability and real-time capability and to create custom embedded architectures for
different application requirements. The main goal of their work is to design and
implement efficient and novel architectures for an automatic number plate recognition
(ANPR) system using an ARM-DSP System-on-Chip platform that operates in high definition
(HD) and in real time. In addition, a separate ANPR algorithm is developed and
optimized by taking advantage of the technical features of FPGAs, which accelerate
digital image processing algorithms. The investigation of the algorithm and its
optimization focuses on real-time image and video processing for license plate (LP) or
number plate localization (NPL), LP character segmentation (NPS) and optical character
recognition (OCR) in particular, which are the three key stages of the ANPR process.
ANPR often forms part of an intelligent transportation system. Its applications include
identifying vehicles by their number plates for policing, access control and toll
collection.

The distance at which a vehicle plate can be identified using a specified lens at
maximum zoom is provided in the work by Mike Constant [10]. The distance can vary from
100 meters to 300 meters in some cases. Common guidelines suggest that, to read a number
plate, the car should occupy 50% of the screen height. The height of the vehicle is
assumed to be 1.5 meters and the lens is taken to be 7.5-75 mm.

Michael Lidenbaum et al. have devised an algorithm for recognizing the license plates of
moving cars. They developed a prototype system capable of recognizing a license plate
number. Recognition is performed in almost real time on cars passing at low speed in
front of a video recording device. Initially, a video was taken on a sunny day with
ordinary camera settings. During development, they concluded that this picture quality
was excellent for the first task, cutting out the frames that contain a license plate,
but that the second task, number recognition, was almost impossible for the following
reasons:

The pictures were taken with a normal exposure time, which caused smoothing of the
picture in general and of the number in particular.

The number was too small, leaving too few pixels available to analyze.

Pictures taken with a short exposure time were not used either, because their dynamic
range was too small for reliable detection of the yellow color. Acceptable picture
quality was achieved once the tradeoff between exposure time and dynamic range was
taken into account.

The objects are, in fact, portions of the original image that have a higher average
contrast measure. Under varying conditions (different lighting, different colors,
different distances from the camera, etc.), the area containing the car license plate
number proves to be one of these objects. Once the areas of interest are segmented in
this way, they are binarized (with the aid of some statistical methods and using several
test points) and passed to the recognition/training subsystem for further segmentation.

The recognition/training subsystem is based on a proprietary approach in the field of
hypersphere classifiers, which led its developers several years ago to a neural-like
technology simple enough to be implemented even on relatively slow PC systems. This
technology features a high degree of noise tolerance and generalization power. In
practice, it allows learning and subsequent recognition of any graphical symbol.

The first training experiments focused on car license plate recognition. About 200 test
images were taken under various conditions. This amount of training ensured a
recognition ratio of well over 99%. It is emphasized again that the recognition/training
system can, in fact, learn any kind of machine-printed text, provided it is properly
segmented within the image. The initial components of a character recognition algorithm
are the features extracted for the classifier. Feature analysis determines the
descriptors, or feature set, used to describe all characters. Given a character image,
the feature extractor derives the features that the character possesses. The derived
features are then used as input to the character classifier.

The process was similar to skeletonization, with only the significant internal borders
of the objects highlighted. Although this style of representation could greatly simplify
classification, there is a potentially major drawback. Most of the descriptions on the
top half of the labels are extremely dense. So, rather than emphasizing the shape of the
characters, median thresholding often distorts them to the point that they are
unrecognizable.

Several issues prevent this from being an ideal segmentation:

The label occupies less than 80% of the image, which causes a large portion of the
label border to be considered a region. When this is compressed for identification,
however, the point will be moot.

Due to particularly harsh lighting arrangements, characters such as major zero appear
to be internally separated. Even the preprocessing techniques cannot completely
eliminate this issue, but this is admittedly an extreme case.

Many of the characters on the top half of the label face upward at an angle, as a
result of the depths of the threads. This may have important ramifications for letter
identification. Recognizing these types, unfortunately, would require three-dimensional
vision techniques, and we plan to investigate this in the future.

Finally, an iteration is performed over the image, translating each of these objects
into an individual portable bitmap. To keep the files unique, each one is named
according to its label in the initial image.

Modern OCR technology dates from M. Sheppard's GISMO, A Robot Reader-Writer, invented in
1951. In 1954, J. Rainbow developed a prototype machine that was able to read uppercase
typewritten output at the fantastic speed of one character per minute. During the late
1960s, the technology underwent many dramatic developments, but OCR systems were still
considered exotic and futuristic, being used only by government agencies or large
corporations.

Today, OCR systems are less expensive, faster, and more reliable. It is not uncommon to
find PC-based OCR systems capable of recognizing several hundred characters per minute.
Less expensive electronic components and extensive research have paved the way for
these new systems. Commercial OCR systems can largely be grouped into two categories:
task-specific readers and general-purpose page readers.

The first technique considered, mean and median thresholding, works very similarly to
mean and median smoothing. That is, a neighborhood, typically 3x3 or 5x5, is analyzed
and, for mean filtering, the bisection is determined by the average of the surrounding
pixel values. With this procedure, though, the same issues are expected as with mean
smoothing; that is, averaging would weaken the edges of the objects we are attempting
to detect.
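
The neighborhood thresholding described above can be illustrated with a minimal Python
sketch. It is not the implementation used in the surveyed work: the function name
`local_threshold`, the border handling (windows clipped at the image edge) and the
sample image are assumptions made for illustration only.

```python
from statistics import mean, median


def local_threshold(image, size=3, use_median=False):
    """Binarize a grayscale image (list of rows) against a local statistic.

    Each pixel is compared with the mean (or median) of its size x size
    neighborhood. Windows are clipped at the border (an assumption).
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    stat = median if use_median else mean
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            # 1 = brighter than the local statistic, 0 = otherwise
            out_row.append(1 if image[r][c] > stat(window) else 0)
        out.append(out_row)
    return out


# Hypothetical sample: a dark stroke (10) on a bright background (200).
sample = [
    [200, 200, 200, 200, 200],
    [200, 10, 10, 10, 200],
    [200, 200, 200, 200, 200],
]
binary = local_threshold(sample, size=3)
```

Because the decision depends on an average, a pixel in a perfectly uniform region equals
its local mean and falls on the threshold itself, which is one way the weakening of
edges mentioned above shows up in practice.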

Although classification was obtained in the bottom half of the image, there is serious
degradation in the description. Ideally, as much of the text as possible should be
retained, so this was not the expected result.

According to Sorin Draghici et al., an artificial vision system based on artificial
neural networks is able to analyze the image of a car given by a camera, locate the
registration plate and recognize the registration number of the car. Their paper
describes in detail various practical problems encountered in implementing this
particular application and the methods used to solve them. The main features of the
system presented are controlled stability-plasticity behavior, a controlled reliability
threshold, both offline and online learning, self-assessment of the output reliability,
and high reliability based on high-level multiple feedback.

The system proposed by Sorin Draghici et al. was designed using a modular approach,
which allows easy upgrading and/or substitution of various sub-modules, thus making it
potentially suitable for a large range of vision applications. The OCR engine was
designed as an interchangeable plug-in module. This allows the user to choose an OCR
engine suited to the particular application and to upgrade it easily in the future. At
present, there are several versions of the OCR engine. One of them is based on a fully
connected feed-forward artificial neural network with sigmoidal activation functions.
This network can be trained with various training algorithms such as error
backpropagation. An alternative OCR engine is based on the constraint based
decomposition (CBD) training architecture.
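
As a rough illustration of what a fully connected feed-forward network with sigmoidal
activations computes, the following Python sketch performs a forward pass only. The
layer sizes, weights and the helper name `forward` are hypothetical and are not taken
from the system of Draghici et al.; training (for example by backpropagation) is not
shown.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Propagate inputs through fully connected sigmoid layers.

    layers is a list of (weights, biases); weights[j] holds the weights
    from every unit of the previous layer to unit j of this layer.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit_weights, activations)) + b)
            for unit_weights, b in zip(weights, biases)
        ]
    return activations


# Hypothetical 2-2-1 network with made-up weights.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer, 2 units
    ([[1.0, -1.0]], [0.0]),                    # output layer, 1 unit
]
scores = forward([1.0, 0.0], layers)
```

In an OCR setting the inputs would be the features of a segmented character and there
would be one output unit per character class, but those details depend on the engine.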

The system showed the following average performance on real-world data: successful
plate location and segmentation in about 99% of cases, successful character recognition
in about 98%, and successful recognition of complete registration number plates in
about 80%.

Leonard G. C. Hamy et al. have described the task of recognizing Australian vehicle
number plates (also called license plates or registration plates in other countries). A
system for Australian number plate recognition must cope with wide variations in the
appearance of the plates. Each state uses its own range of designs, with font
variations between the designs. Special designs are issued for significant events such
as the Sydney 2000 Olympic Games. Also, vehicle owners may place the plates inside
glass-covered frames or use plates made of non-standard materials. These issues compound
the complexity of automatic number plate recognition, making existing approaches
inadequate. They have developed a system that incorporates a novel combination of image
processing and artificial neural network technologies to successfully locate and read
Australian vehicle number plates in digital images. Commercial application of the
system is envisaged.

According to Serkan Ozbay and Ergun Ercelebi, Automatic Vehicle Identification (AVI) has
many applications in traffic systems (highway electronic toll collection, red light
violation enforcement, border and customs checkpoints, etc.). License plate recognition
is an effective form of AVI. In their study, a smart and simple algorithm is presented
for a vehicle license plate recognition system. The proposed algorithm consists of three
major parts:

1. Extraction of the plate region

2. Segmentation of characters

3. Recognition of plate characters

To extract the plate region, edge detection and smearing algorithms are used. In the
segmentation part, smearing algorithms, filtering and some morphological algorithms are
used. Finally, statistics-based template matching is used to recognize the plate
characters. The performance of the proposed algorithm has been tested on real images.
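
The idea behind smearing can be sketched as a generic run-length smoothing step: short
background gaps between foreground pixels are filled so that the dense edges of a plate
merge into one blob. The sketch below is an assumption about the general technique, not
the exact rule used by Ozbay and Ercelebi; `smear_row`, `smear_horizontal` and the
`max_gap` threshold are hypothetical names and values.

```python
def smear_row(row, max_gap):
    """Fill background runs of at most max_gap pixels that lie between
    two foreground (1) pixels in a single binary row."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel
    for i, value in enumerate(row):
        if value == 1:
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def smear_horizontal(binary, max_gap):
    """Apply the row-wise smearing to every row of a binary image."""
    return [smear_row(row, max_gap) for row in binary]


# Hypothetical edge map: the gap of 2 is filled, the gap of 5 is kept.
edges = [[0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]]
smeared = smear_horizontal(edges, max_gap=3)
# smeared == [[0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0]]
```

A vertical pass can be obtained by transposing the image, smearing, and transposing
back; combining both passes is a common way to isolate rectangular candidate regions.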

Halina Kwasnicka and Bartosz Wawrzyniak have described an approach to license plate
localization and recognition. They proposed a method designed to recognize any kind of
license plate under any environmental conditions. The main assumption of their method is
the ability to recognize all license plates that can be found in an individual picture.
To solve the problem of localizing a license plate, two independent methods are used.
The first is based on connected components analysis and the second searches for the
“signature” of the license plate in the image. Segmentation of characters is performed
using the vertical projection of the license plate image. A simple neural network is
then used to recognize them. Finally, to separate correct license plates from other
captions in the picture, a syntax analysis is used during the license plate recognition
process. The proposed approach is discussed together with results obtained on a
benchmark data set of license plate pictures. Their paper also presents examples of
correct and incorrect results, as well as possible practical applications of the
proposed method.
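
Character segmentation by vertical projection can be sketched in a few lines of Python:
foreground pixels are counted per column, and runs of non-empty columns are taken as
characters. This is a minimal sketch under simplifying assumptions (a clean binary plate
image, characters separated by fully empty columns); the function names and the sample
image are hypothetical.

```python
def vertical_projection(binary):
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(column) for column in zip(*binary)]


def segment_columns(binary, min_count=1):
    """Return (start, end) column ranges, end exclusive, where the
    projection stays at or above min_count."""
    profile = vertical_projection(binary)
    segments, start = [], None
    for x, count in enumerate(profile):
        if count >= min_count and start is None:
            start = x                      # a character begins
        elif count < min_count and start is not None:
            segments.append((start, x))    # a character ends
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


# Hypothetical binarized plate with three character-like blobs.
plate = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
]
segments = segment_columns(plate)
# segments == [(1, 3), (5, 6), (7, 10)]
```

Raising `min_count` makes the split tolerant of thin noise between characters, at the
risk of cutting through narrow strokes.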

According to Hyo Jong Lee et al., although the recognition of a license plate number or
vehicle type has been researched, the recognition of vehicles using all features has
not been studied, owing to its complexity. In their paper, a novel method is proposed to
identify vehicles using specific information, namely color, license plate, and vehicle
model. Low-level image processing and texture descriptors are computed from the front
image of vehicles. Then, two three-layer neural networks are built and trained for
license plate and vehicle model identification.

Zhong et al. (1995) have located text in images of compact discs, book covers, or
traffic scenes in two steps. In the first step, approximate locations of text lines are
obtained; in the second, text components in those lines are extracted using color
segmentation. Wu et al. (1999) have proposed a texture segmentation method to generate
candidate text regions. A set of feature components is computed for each pixel, and
these are clustered using the K-means algorithm.
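
The clustering step can be illustrated with a plain K-means (Lloyd's algorithm) sketch
in Python. The two-dimensional feature vectors below are made up, and seeding the
centroids with the first k points is a simplification chosen to keep the example
deterministic; it is not claimed to match the setup of Wu et al.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means on tuples of floats.

    The first k points seed the centroids (a simplifying assumption).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical per-pixel features, e.g. (edge strength, local variance).
features = [
    (0.90, 0.80), (0.10, 0.20), (0.85, 0.90),
    (0.15, 0.10), (0.80, 0.85), (0.20, 0.15),
]
labels, centroids = kmeans(features, k=2)
# labels == [0, 1, 0, 1, 0, 1]: cluster 0 is the high-texture group here
```

Which cluster corresponds to text is not decided by K-means itself; a later step has to
pick it, for instance by comparing the centroids' feature values.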

Shivakumara et al. (2010) have proposed an algorithm to detect video text in low- and
high-contrast images, which are classified by analysing the edge difference between the
Sobel and Canny edge detectors. After edge and texture features are computed,
low-contrast and high-contrast thresholds are used to extract text objects from low- and
high-contrast images separately.

According to Shyang-Lih Chang, Li-Shien Chen, Yun-Chung Chung, and Sei-Wan Chen,
automatic license plate recognition (LPR) plays an important role in numerous
applications, and a number of techniques have been proposed. However, most of them work
under restricted conditions, such as fixed illumination, limited vehicle speed,
designated routes, and stationary backgrounds. In their study, as few constraints as
possible on the working environment are considered. The proposed LPR technique consists
of two main modules: a license plate locating module and a license number
identification module. The former, characterized by fuzzy disciplines, attempts to
extract license plates from an input image, while the latter, conceptualized in terms of
neural subjects, aims to identify the number present in a license plate.

Experiments have been conducted for the respective modules. In the experiment on
locating license plates, 1088 images taken from various scenes and under different
conditions were employed. Of these, the license plate could not be located in 23 images,
so the license plate location success rate is 97.9%. In the experiment on identifying
license numbers, the 1065 images in which license plates had been successfully located
were used. Of these, the numbers on the located plates could not be identified in 47
images, giving an identification success rate of 95.6%. Combining the above success
rates, the overall success rate of their license plate recognition algorithm is 93.7%.
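
The figures reported above can be checked with a few lines of Python. The variable
names are illustrative; only the image counts come from the text. Multiplying the two
stage rates gives roughly 93.6%, marginally below the quoted 93.7%, which presumably
reflects rounding in the source.

```python
# Counts taken from the reported experiments.
located_total, located_failed = 1088, 23
identified_total, identified_failed = 1065, 47

location_rate = (located_total - located_failed) / located_total
identification_rate = (identified_total - identified_failed) / identified_total

# The overall rate is the product of the two stage rates, since
# identification is only attempted on successfully located plates.
overall_rate = location_rate * identification_rate

print(f"location:       {location_rate:.1%}")        # 97.9%
print(f"identification: {identification_rate:.1%}")  # 95.6%
print(f"overall:        {overall_rate:.1%}")         # 93.6%
```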

According to Prathamesh Kulkarni, Ashish Khatri, Prateek Banga, and Kushal Shah,
Automatic Number Plate Recognition (ANPR) is a real-time embedded system which
automatically recognizes the license number of vehicles. In their paper, the task of
recognizing number plates under Indian conditions is considered, where number plate
standards are rarely followed. The system integrates several algorithms: ‘Feature-based
number plate Localization’ for locating the number plate, Image Scissoring for character
segmentation, and statistical feature extraction for character recognition, all
specifically designed for Indian number plates. The system can recognize single- and
double-line number plates under widely varying illumination conditions with a success
rate of about 82%.

Papavassiliou et al. (2007) have proposed a parametric spectral-based method for text
verification in videos. Assuming that the horizontal projections of text regions are
periodic, the authors compute the spectrum of the projection and apply linear prediction
coefficient analysis to estimate the poles of the candidate block. The amplitude and
angle of the pole and the spectral centroid of the projection are used as features to
classify candidate text blocks. However, if a text block is mixed with background edges,
the periodicity of the text area is spoiled and the approach may fail.
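
The periodicity assumption can be made concrete with a small Python sketch that computes
the magnitude spectrum and spectral centroid of a projection profile. It covers only
that part of the description: the linear prediction step that estimates the poles is
not shown, the naive DFT is used purely for self-containment, and the synthetic
projection is an assumption rather than data from the paper.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N//2 of a mean-removed signal."""
    n = len(signal)
    avg = sum(signal) / n
    centered = [s - avg for s in signal]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(spectrum):
    """Magnitude-weighted mean bin index of a spectrum."""
    total = sum(spectrum)
    return sum(k * m for k, m in enumerate(spectrum)) / total if total else 0.0


# Synthetic projection whose stroke pattern repeats every 4 samples.
projection = [4, 2, 0, 2] * 4
spectrum = magnitude_spectrum(projection)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
period = len(projection) / peak_bin   # 16 samples / bin 4 = 4 samples
centroid = spectral_centroid(spectrum)
```

For a cleanly periodic profile the energy concentrates in one bin, so the centroid sits
on the peak. Background edges spread the energy across bins, which is consistent with
the failure mode noted above.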

Jing Zhang et al. (2008) have proposed a new edge-based text verification approach for
video. Based on an investigation of the relation between candidate blocks and their
neighboring areas, the approach first detects background edges in candidate blocks, then
erases them using an edge tracking technique, and finally eliminates as false alarms the
candidate blocks that contain too few remaining edges. Three measures for evaluating
text detection in video were used to assess the performance of the proposed text
verification approach.

Vassilis Papavassiliou et al. (2007) have proposed a new method for verifying text areas
detected in video streams. The algorithm explores the spectral properties of the
horizontal projection of candidate text regions in order to reduce the large number of
false alarms that most text detection algorithms suffer from. The full algorithm (text
localization followed by verification and a temporal redundancy module) has been tested
on newscast video sequences (MPEG-1, 720x576 resolution, 184 minutes). The detection
module produced a 94.82% recall rate but only a 51.84% precision rate. Adding the
verification module increased the precision rate to 78.93% while keeping the recall
rate almost unaffected.
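
For reference, precision and recall can be computed from detection counts as in the
sketch below. The counts passed to `precision_recall` are hypothetical. The F-measure
is not reported in the text; it is added here only as a single-number summary, and the
"after" value assumes the recall stayed exactly at 94.82%, which the text states only
approximately.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 9 correct detections, 1 false alarm, 3 misses.
p, r = precision_recall(true_pos=9, false_pos=1, false_neg=3)

# Rates quoted in the text; recall is assumed unchanged by verification.
before = f_measure(0.5184, 0.9482)   # detection module alone, about 0.67
after = f_measure(0.7893, 0.9482)    # with verification, about 0.86
```

Verification modules of this kind trade nothing in recall only if they never reject a
true text block, which is why the reported recall is described as almost unaffected.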

The closest related work is that of Li et al. (2000) on video text tracking. Their
system includes a text frame classification component that finds the first text frame in
a video stream in order to start text tracking. The text frame classification method is
based on supervised learning using a neural network classifier. The method is thus
dependent on the training set and requires considerable training time for the neural
network classifier. It also serves a different objective from our present work, as our
aim is to classify a set of unknown video images into text and non-text frames. Li's
system, on the other hand, is designed to locate a starting text frame in a video stream
known to contain text, using a training set of video text frames. Furthermore, our
proposed method is unsupervised.

Palaiahnakote Shivakumara et al. (2012) have proposed multi-oriented video scene text
detection through Bayesian classification and a boundary growing method. They presented
a new enhancement method that uses the product of Laplacian and Sobel operations to
enhance text pixels in video. To classify true text pixels, they propose a Bayesian
classifier that does not assume an a priori probability for the input frame but
estimates it from three probable matrices. Three different ways of clustering are
performed on the output of the enhancement method to obtain the three probable
matrices. Text candidates are obtained by intersecting the output of the Bayesian
classifier with the Canny edge map of the input frame. A boundary growing method is
introduced to traverse the multi-oriented scene text lines using the text candidates.
The boundary growing method is based on the concept of nearest neighbors. The robustness
of the method has been tested on a variety of datasets, including the authors' own data
(non-horizontal and horizontal text) and two publicly available datasets, namely the
video frames of Hua and the complex scene text data of the ICDAR 2003 competition
(camera images).
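
The enhancement step, a product of Laplacian and Sobel responses, can be sketched as
follows. The kernels, the use of the Sobel gradient magnitude, the absolute value of the
Laplacian and the zeroed border are all assumptions made for illustration; the exact
formulation in Shivakumara et al. may differ.

```python
import math

LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel):
    """Correlate an image (list of rows) with a 3x3 kernel.
    Border pixels are left at 0 (an assumption)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = float(sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3) for j in range(3)
            ))
    return out


def enhance(image):
    """Pixel-wise product of |Laplacian| and Sobel gradient magnitude."""
    lap = filter3x3(image, LAPLACIAN)
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return [
        [abs(l) * math.hypot(x, y) for l, x, y in zip(lap_row, gx_row, gy_row)]
        for lap_row, gx_row, gy_row in zip(lap, gx, gy)
    ]


# Hypothetical frame with a vertical step edge between columns 1 and 2.
frame = [[0, 0, 100, 100, 100] for _ in range(5)]
enhanced = enhance(frame)
# The response is large next to the edge and zero in the flat region.
```

Multiplying the two responses suppresses pixels where only one operator fires, so flat
regions and isolated noise score low while sharp transitions such as text strokes are
emphasized.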

Shivakumara et al. (2011) have proposed a Laplacian approach to multi-oriented text
detection in video. Unlike many other approaches, which assume that text is horizontally
oriented, this method is able to handle text of arbitrary orientation. The input image
is first filtered with a Fourier-Laplacian filter. K-means clustering is then used to
identify candidate text regions based on the maximum difference. The skeleton of each
connected component helps to separate the different text strings from each other.
Finally, text string straightness and edge density are used for false positive
elimination.

Lukas Neumann and Jiri Matas (2012) have proposed a method for real-time scene text
localization and recognition. Real-time performance is achieved by posing the character
detection problem as an efficient sequential selection from the set of Extremal Regions
(ERs). The ER detector is robust to blur, illumination, color and texture variation, and
handles low-contrast text. In the first classification stage, the probability of each ER
being a character is estimated using novel features calculated with O(1) complexity per
region. Only ERs with locally maximal probability are selected for the second stage,
where the classification is improved using more computationally expensive features. A
highly efficient exhaustive search with feedback loops is then applied to group ERs into
words and to select the most probable character segmentation. Finally, text is
recognized in an OCR stage trained using synthetic fonts.

Shivakumara et al. (2008, 2009, 2010, and 2012) have proposed methods for text detection
in video images, and in camera images as well, based on edge and texture features. The
main focus of these methods is text detection in video rather than in natural scene
images. The methods therefore give good accuracy for video text detection.

Yang Zhang et al. (2012) have proposed a new method for text verification based on
random forests. In their paper, they exploit the performance of random forests for text
verification. By combining random forests trained on different kinds of features, they
improve the accuracy of classification. Experimental results demonstrate that random
forests are suitable for text verification, that they are superior or comparable to SVM,
and that merging different kinds of features improves the accuracy of classification.

In the present state of the art, the concerned authorities have to stop the vehicle and
ask the driver to produce the documents related to all this information. Some of the
information, such as tax-paid receipts and insurance documents, can be verified.
However, the other information mentioned above is verified only on a case-by-case basis,
i.e., only if there is a request from higher authorities to check it. Tracking down
these types of vehicles manually is a difficult task because the authorities have to
monitor vehicles day and night. It is also difficult to note down the number on the
vehicle, to check whether the tax payment is up to date, and to detect duplicated
numbers. Automating this process by placing a camera at a fixed position could overcome
these problems. The camera takes pictures, which can then be processed further.

The main goal is to build a prototype system capable of recognizing a license plate
number of standard format. Recognition should be performed in real time on cars passing
at low speed in front of a video recording device.

Locating and detecting text in video is an interesting real-time research problem with
many applications in multimedia-related areas. The problem is close to human perception,
as some of the strategies can be borrowed from it. In this work, a method is proposed to
locate the vehicle number written on the front or back panel of the vehicle. The input
is taken from a stationary camera, which continuously records video of the vehicles
passing in front of it. The localization problem involves many preprocessing activities
such as normalization, skew detection and correction, and segmentation. The quality of
the video produced by the camera is not always consistent; hence, preprocessing
activities such as noise removal and edge detection must be carried out on the recorded
video. Any standard OCR can be used at a later stage to identify the text. Since the
domain of characters in a vehicle number is very limited, a high recognition rate can be
expected from the OCR. The segmented characters are then to be recognized. It was
decided to use an algorithm that is as simple as possible, since the types of characters
that appear on number plates are limited. Some of the papers which inspired us in
developing the 14-segment algorithm are mentioned here.
