
Introducing a large dataset of Persian license plate characters

Amir Ebrahimi Ghahnavieh (a,*), Mahmoud Enayati (b), and Abolghasem A. Raie (a)

(a) Amirkabir University of Technology (Tehran Polytechnic), Mobile Robots Research Laboratory, Faculty of Electrical Engineering, 424 Hafez Avenue, Tehran, Iran 15875-4413
(b) Bani Nick Pardazesh Company, Unit 13, No. 2, Jahangiri Alley, North Sohrevardi Street, Tehran 1558843747, Iran

Abstract. A large dataset of Persian license plate characters is introduced. The extracted characters are provided by Bani Nick Pardazesh Company, a pioneer in the intelligent transportation systems field whose license plate recognition system has been used for many applications in Iran. The natural-scene vehicle images delivered by this company were taken under various conditions. Most were taken with visible light during the day and a few with infrared light at a wavelength of 850 nm during the night. The vehicle images were acquired by color, black-and-white, or infrared cameras from the front and back views of automobiles in ∼20 different indoor and outdoor locations, such as streets, roads, and parking lots, for different purposes, such as traffic control and issuing fines. The images differ in size and angle and were taken against light and dark backgrounds, where the direction and intensity of the light varied. Also, some of the license plates were muddy and had parts that were shadowed. Out of all the available images, ∼20,145 Persian characters were extracted by an intelligent system and verified by human observers. The extracted images differ in size, and some of them suffer from elimination, distortion, rotation, and noise. © 2014 SPIE and IS&T [DOI: 10.1117/1.JEI.23.2.023015]

Keywords: intelligent transportation systems; license plate recognition; Persian license plate characters; Persian license plate dataset.

Paper 13589 received Oct. 18, 2013; revised manuscript received Jan. 27, 2014; accepted for publication Feb. 3, 2014; published online Apr. 3, 2014.

1 Introduction

Today, most developed and developing countries use intelligent transportation systems for traffic supervision. License plate recognition (LPR) is a key element of such a system, with a great variety of applications, such as traffic control, fee payment, and issuing fines. Therefore, LPR systems have attracted a great deal of attention, and many researchers have been trying to develop and enhance them. Each LPR system is composed of three subsystems: license plate detection, character segmentation, and character recognition. Two comprehensive reviews of the available LPR methods are given in Refs. 1 and 2. The character recognition subsystem is the last, but not the least important, part and has a direct effect on the overall system accuracy.

For the license plate character recognition step, many methods have been proposed so far. Template matching is a simple approach. Probabilistic neural networks have shown better results than template matching [3]. Moreover, multilayer perceptron neural networks are now very popular [4,5], but they involve many training parameters, and the training process stops as soon as the first satisfactory separating hyperplane is obtained.

Support vector machines (SVMs) provide higher generalization capability than neural networks because of their optimized separator and have been introduced as a superior classifier [6]. These classifiers are binary, however, and have to be generalized for license plate character recognition, which needs a multiclass classifier. In Ref. 7, an SVM is used for license plate character recognition, and a traditional method called one-against-all is used for the generalization. Combinations of SVMs with other classifiers, such as neural networks, are studied in Ref. 8.
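The one-against-all idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: a plain perceptron stands in for the binary SVM (it is not the classifier of Ref. 7), and the function names and toy data are hypothetical.

```python
def train_perceptron(samples, labels, epochs=200, lr=0.1):
    """Train a binary linear classifier; labels are +1 or -1.
    Assumption: a perceptron is used here as a simple stand-in for an SVM."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: move the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def train_one_against_all(samples, class_ids):
    """Train one binary classifier per class: that class versus all others."""
    models = {}
    for c in sorted(set(class_ids)):
        binary_labels = [1 if cid == c else -1 for cid in class_ids]
        models[c] = train_perceptron(samples, binary_labels)
    return models


def predict(models, x):
    """Return the class whose binary classifier gives the highest score."""
    def score(model):
        w, b = model
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=lambda c: score(models[c]))


# Toy data (hypothetical): three well-separated 2-D clusters.
samples = [(0.0, 0.0), (0.2, 0.1), (5.0, 0.0), (5.1, 0.3), (0.0, 5.0), (0.2, 5.2)]
class_ids = ["a", "a", "b", "b", "c", "c"]
models = train_one_against_all(samples, class_ids)
predictions = [predict(models, x) for x in samples]
```

For the 24 character classes of this dataset, the same scheme would train 24 binary classifiers, one per character.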

Some of the mentioned methods were used on Latin license plates and some on Persian ones. It should be noted that LPR is an ongoing research topic in Iran [5,8–11]. Unfortunately, there is no complete Persian dataset for researchers and developers. This need prompted us to create one. The major aim of this paper is to introduce a large dataset of Persian license plate characters. The remainder of this study is organized as follows. First, the available datasets for handwritten and license plate characters are reviewed. Second, the characteristics of the standard format of Persian license plates are described. Third, the necessary equipment and the image acquisition conditions are stated, and the camera parameters, such as resolution, shutter speed, orientation, and light, are listed. Finally, the extracted characters are evaluated.

2 Available Datasets

There are several datasets of handwritten digits and letters in English. The Center for Pattern Recognition and Machine Intelligence (CENPARMI) digit dataset [12] is available from CENPARMI, Concordia University. It contains 6000 digits collected from envelope images of the United States Postal Service (USPS), scanned at 166 dpi. In this dataset, 4000 images (400 samples per class) are specified for training, and the remaining 2000 images are for testing.

*Address all correspondence to: Amir Ebrahimi Ghahnavieh, E-mail: [email protected]

0091-3286/2014/$25.00 © 2014 SPIE and IS&T

Journal of Electronic Imaging 023015-1 Mar–Apr 2014 • Vol. 23(2)

Journal of Electronic Imaging 23(2), 023015 (Mar–Apr 2014)

The Center of Excellence for Document Analysis and Recognition (CEDAR) digit dataset is available from CEDAR, State University of New York at Buffalo. The images were scanned at 300 dpi. The training and test sets contain 18,468 and 2711 digits, respectively. The number of samples in both the training and test sets differs for each class. Since some images in the test set are poorly segmented, a subset of 2213 well-segmented images is also provided for testing [13].

The modified National Institute of Standards and Technology (MNIST) dataset [14] was extracted from the NIST data. Samples are normalized into 20 × 20 grayscale images with the aspect ratio preserved, and the normalized images are placed in a 28 × 28 frame. The numbers of training and test samples are 60,000 and 10,000, respectively. Another dataset, the USPS digit dataset, has 7291 training and 2007 test samples [15].
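The MNIST-style size normalization described above can be sketched as follows. This is a minimal illustration, assuming nearest-neighbour resampling and geometric centering; neither choice is taken from this paper, and the function names are hypothetical.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def normalize_to_frame(img, box=20, frame=28):
    """Scale img to fit a box x box area with the aspect ratio preserved,
    then centre it in a frame x frame image filled with zeros."""
    h, w = len(img), len(img[0])
    scale = box / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    small = resize_nearest(img, new_h, new_w)
    top = (frame - new_h) // 2
    left = (frame - new_w) // 2
    out = [[0] * frame for _ in range(frame)]
    for r in range(new_h):
        for c in range(new_w):
            out[top + r][left + c] = small[r][c]
    return out


# A 40 x 10 all-white stroke (tall and thin, like the digit 1).
digit = [[255] * 10 for _ in range(40)]
framed = normalize_to_frame(digit)
```

Here the 40 × 10 input is scaled by 0.5 to 20 × 5 and placed with a 4-pixel margin above and below, so the longer side fills the 20-pixel box while the width stays proportional.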

The datasets described above all contain English characters. The Hoda dataset [16] of Persian handwritten digits is available online. It consists of binary images of 102,352 digits, which were extracted from ∼12,000 registration forms filled in by BSc and senior high school students. These forms were scanned at 200 dpi with a high-speed scanner. Besides Hoda, the IFHCDB dataset from Amirkabir University of Technology is available for researchers to verify their methods on a common set of handwritten letters and digits [17].

All the above-mentioned datasets contain handwritten characters. Handwritten characters are difficult to recognize because of their variations, such as size and rotation. Hence, these datasets are not adequate for LPR systems, and a dedicated dataset is needed. There is a dataset of Greek license plates in Ref. 18, but it is limited in size and contains only Latin characters. In this paper, a new Persian license plate character dataset is introduced to help researchers evaluate their methods.

3 Persian License Plate Characteristics

License plates in Iran have changed frequently over the years. Some of the older ones are shown in Fig. 1 and some newer ones in Fig. 2. The variety of formats was a challenge for automatic LPR systems. Fortunately, all of them have now been changed to a new format, as shown in Fig. 3.

New license plates include a blue strip on the left side with the flag of Iran and the text “I. R. IRAN.” This colorful strip can be used as a clue for license plate detection. The middle part of a license plate includes six characters: five digits and one letter. On the right side, there is a two-digit number in a smaller size than the middle-part characters, and the name of Iran in Persian can be seen on the upper part. Standard license plates are available in four background colors: red for governmental cars, yellow for public automobiles, green for police cars, and white for the rest. Some features of the standard format are summarized in Table 1.

It is worth mentioning that the two-digit number on the right side of the license plate depends on the place where the license plate is registered. The twenty-four characters used in the standard format of Iranian license plates are listed in Fig. 4. These characters include 9 digits (1 to 9) and 15 letters, which are a subset of the Persian alphabet. Other letters are not used in Persian license plates because of the similarity between characters. For example, Fig. 5 shows some of the similar characters. Note that some of the letters are reserved for special purposes, which are listed in Fig. 6. Also, characters lose their dots during processing. Therefore, they will be recognized by their background color as stated in Fig. 6. The background color of the character is white.

Fig. 1 Older license plates used in Iran.

Fig. 2 Newer license plates used in Iran.

Fig. 3 Standard format of license plates in Iran.

Table 1 Features of standard license plates format in Iran.

| Feature | Value |
|---|---|
| Size | Length: 52 cm / Height: 11 cm |
| Background color | White/yellow/red/green |
| Background color type | Reflective |
| Foreground color | Black, white |
| Foreground color type | Nonreflective |


Ghahnavieh, Enayati, and Raie: Introducing a large dataset of Persian license plate characters

4 Necessary Equipment and Image Acquisition Condition

4.1 Necessary Equipment

Images of this dataset were taken by IP cameras from Axis Company [19]. The images are sent through a switch to a server for analysis using the TCP/IP protocol. License plates are recognized by software and saved on a computer. Then, all of the information is sent to a control center for further processing, such as issuing fines.

Figure 7 shows the steps of our LPR software system. After an image is taken by a camera, the license plate location is extracted. Then the characters are segmented and finally recognized. Computer software was designed based on this diagram. The software uses Visual C++ for the LPR kernel DLL, C#.NET for the graphical user interface, MATLAB® for data sample clustering and template matching, and NeuroSolutions for character recognition using neural networks [20]. The steps are described in detail as follows:

• Automobile detection: In this step, automobiles are detected using motion detection.

• Edge detection and morphology: Here, a 3 × 7 convolution mask is used to detect edges in the image. Afterward, the edges are joined using morphological operators and multiple interlaced scanning [21,22]. The convolution mask is an extended form of the Sobel mask, which is shown below.

\[
\begin{bmatrix}
-3 & -2 & -1 & 0 & 1 & 2 & 3 \\
-6 & -4 & -2 & 0 & 2 & 4 & 6 \\
-3 & -2 & -1 & 0 & 1 & 2 & 3
\end{bmatrix}
\]

• License plate extraction: Based on the abundance of edges (especially vertical edges) in each region, the license plate is extracted.

• Image enhancement and thresholding: Mean and median filters are used in this step to remove noise and illumination variations in the license plate region. Binarization uses a local average of neighboring pixels, computed over hundreds of pixels around each pixel.

Fig. 4 Twenty-four characters used in Iranian license plates.

Fig. 5 Some of the similar characters.

Fig. 6 Characters used for special purposes.

• Feature extraction and pattern recognition: In this step, candidate characters are extracted using the region growing method and morphological operators, such as dilation and erosion. Then, the pattern of each candidate is extracted to verify whether it can be a character.

• Postprocessing and verification: The characters extracted in the previous step are compared with templates of known characters using clustering methods.

• Control and save in the dataset: Finally, the result is checked by human observers.
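The edge detection and plate extraction steps listed above can be illustrated with a small sketch. The 3 × 7 mask values are the ones given in the text; everything else (zeroed borders, the absolute response, the threshold value, and the function names) is an assumption made for illustration and not the authors' implementation.

```python
# Extended Sobel mask from the text (3 rows x 7 columns).
MASK = [
    [-3, -2, -1, 0, 1, 2, 3],
    [-6, -4, -2, 0, 2, 4, 6],
    [-3, -2, -1, 0, 1, 2, 3],
]


def apply_mask(img, mask=MASK):
    """Slide the mask over a grayscale image (2-D list) and return the
    absolute response. Border pixels are left at 0 (an assumption)."""
    h, w = len(img), len(img[0])
    mh, mw = len(mask), len(mask[0])
    ry, rx = mh // 2, mw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(ry, h - ry):
        for x in range(rx, w - rx):
            acc = 0
            for j in range(mh):
                for i in range(mw):
                    acc += mask[j][i] * img[y + j - ry][x + i - rx]
            out[y][x] = abs(acc)
    return out


def edge_density_per_row(edges, threshold):
    """Count strong responses in each row; rows that cross a license plate
    are expected to contain many vertical edges."""
    return [sum(1 for v in row if v >= threshold) for row in edges]


# Toy image: dark left half, bright right half, i.e. one vertical edge.
img = [[0] * 8 + [100] * 8 for _ in range(5)]
edges = apply_mask(img)
density = edge_density_per_row(edges, threshold=1000)
```

Because the mask is wider than the 3 × 3 Sobel mask, a single step edge produces a response several pixels wide (six pixels above the threshold in this toy image), which helps neighboring character edges merge into one dense region.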

An approach similar to the diagram (Fig. 7) is presented in more detail in Ref. 22. The method is adequately robust to disturbances, such as nonuniform illumination across the license plate image and the plate color. It is worth noting that, although the procedure is designed for license plates from Iran, many parts of the above algorithm can easily be used with license plates from other countries.

Fig. 7 Block diagram of a license plate recognition system.

Fig. 8 A sample image, which is taken during night.

Fig. 9 Position of cameras in the Quds square, Tehran, Iran.

Table 2 Parameters of images taken and cameras.

| Parameter | Minimum | Maximum |
|---|---|---|
| Height difference between camera and automobile | 1.5 m | 6.5 m |
| Distance between camera and automobile | 5 m | 30 m |
| Horizontal angle difference between camera and automobile (pan) | −30 deg | +30 deg |
| Vertical angle difference between camera and automobile (tilt) | −25 deg | +25 deg |
| Roll angle difference between camera and automobile | −3 deg | +3 deg |
| Automobile speed | 0 km/h | 250 km/h |
| Shutter speed of cameras | 1/4000 s | 1/100 s |

4.2 Image Acquisition Condition

The aforementioned LPR system, designed by Bani Nick Pardazesh Company, has been used for many applications in Iran. Images of this system were taken during the day and at night. Most of them were taken with visible light during the day and the others with infrared light at a wavelength of 850 nm during the night. A sample image taken at night is shown in Fig. 8.

The images were taken by color, black-and-white, or infrared cameras from the front and back views of automobiles in ∼20 different indoor and outdoor locations, such as streets, roads, and parking lots. Figure 9 shows the position and angle of the cameras in a square, in a situation where a car crosses the pedestrian lane. Fines are issued by the control center, and human observers verify the automatic procedures. The images of the Tele Zone LPR cameras (blue ones) are used by the LPR system, and the images of the Wide Zone overview cameras (yellow ones) are used only by human observers.

The size of the images taken for license plate detection ranged from 640 × 480 to 2000 × 1500 because several cameras were used. Some other parameters of the images are listed in Table 2. The images were taken against light and dark backgrounds, where the direction and intensity of the light varied. Also, some of the license plates were muddy and had parts that were shadowed. Some sample images are shown in Fig. 10.

5 Extracted Characters

About 20,145 characters were extracted, covering 24 Persian character classes (9 digits and 15 letters). The dataset is available for research purposes by sending a request to Ref. 23 or to the authors of this paper. Figure 11 shows the abundance of each character in the dataset. As expected, the extracted digits outnumber the letters because each license plate contains seven digits and just one letter.
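The digit-to-letter imbalance follows directly from the plate layout. The short calculation below is only a rough estimate, assuming that every character of every plate was extracted equally often, which the paper does not claim; the actual per-character counts are those of Fig. 11.

```python
DIGITS_PER_PLATE = 7    # five middle digits plus the two-digit regional number
LETTERS_PER_PLATE = 1
TOTAL_CHARACTERS = 20145  # dataset size reported in the paper

chars_per_plate = DIGITS_PER_PLATE + LETTERS_PER_PLATE
digit_share = DIGITS_PER_PLATE / chars_per_plate

# Expected counts under the equal-extraction assumption stated above.
expected_digits = round(TOTAL_CHARACTERS * digit_share)
expected_letters = TOTAL_CHARACTERS - expected_digits

print(f"digit share: {digit_share:.1%}")
print(f"expected digits: {expected_digits}, expected letters: {expected_letters}")
```

Under this assumption, digits would make up 87.5% of the dataset, so each of the 15 letter classes would on average have far fewer samples than each of the 9 digit classes.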

Figure 12 shows some sample extracted images. As shown, the extracted characters differ in size because of the image resolution and the different distances between the camera and the automobiles. In addition, some of the extracted characters suffer from elimination, distortion, slight rotation, and noise. Elimination happens because mean and median filters are used for image enhancement and noise removal, and because of an imprecise threshold for binarization. Another cause of elimination is the erosion operator in region growing. Elimination in letters that include a dot leads to omission of the dot. Samples of this kind of elimination are shown in Fig. 12.

Fig. 10 Some sample images.

Fig. 11 Abundance of each character in the dataset.

Fig. 12 Some sample segmented characters.

The most important cause of distortion in the extracted characters is the motion of the automobiles. Also, morphological operations and severe shadows on the characters lead to distortion in the binarization stage. The angle between the camera and the license plate at image acquisition time creates rotation. In addition to the noise already present during image acquisition, joining the edges in the first steps of the algorithm and using morphological operators amplify the noise, because a small percentage of the edges originate from noise. However, using the method proposed in Ref. 21 for image scanning and identifying the non-noise edges, together with mean and median filters and binarization based on many neighboring pixels around each pixel, decreases the severity of the noise.
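To give a feeling for how much vehicle motion can matter, the distance a car travels during one exposure can be estimated from the extreme values in Table 2. This is a back-of-the-envelope sketch: it gives the displacement along the driving direction, not the blur in image pixels, which also depends on the viewing geometry and lens. The function name is hypothetical.

```python
def travel_during_exposure_m(speed_kmh, shutter_s):
    """Distance in metres that a vehicle travels while the shutter is open."""
    speed_ms = speed_kmh * 1000.0 / 3600.0
    return speed_ms * shutter_s


# Extreme values from Table 2: fastest car, slowest and fastest shutter.
slow_shutter = travel_during_exposure_m(250, 1 / 100)
fast_shutter = travel_during_exposure_m(250, 1 / 4000)

PLATE_LENGTH_M = 0.52  # plate length from Table 1
print(f"1/100 s:  {slow_shutter:.3f} m ({slow_shutter / PLATE_LENGTH_M:.2f} plate lengths)")
print(f"1/4000 s: {fast_shutter:.4f} m")
```

At 250 km/h and a 1/100 s shutter the car moves about 0.69 m, more than the 52 cm plate length, whereas at 1/4000 s it moves under 2 cm.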

Noncharacter objects can be observed around some characters in Fig. 12. Some of these objects are produced in the binarization step by an improper threshold on shadowed characters, and the others are caused by nails on the license plates.
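The effect of a shadow on binarization can be illustrated with a toy example that compares a fixed global threshold with a threshold based on the local neighborhood mean. This is a sketch under stated assumptions: the pixel values, the window radius, and the function names are hypothetical, and the real system averages over hundreds of neighboring pixels rather than the tiny window used here.

```python
def global_threshold(img, t):
    """Mark a pixel as foreground (1) when it is darker than a fixed value."""
    return [[1 if v < t else 0 for v in row] for row in img]


def local_mean_threshold(img, radius):
    """Mark a pixel as foreground (1) when it is darker than the mean of
    the window around it (window clipped at the image borders)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            mean = total / (len(ys) * len(xs))
            out[y][x] = 1 if img[y][x] < mean else 0
    return out


# Toy plate row: bright background with dark strokes; right half in shadow.
lit = [200, 200, 50, 200, 200, 50, 200, 200]
shadow = [90, 90, 20, 90, 90, 20, 90, 90]
img = [lit + shadow for _ in range(3)]

fixed = global_threshold(img, t=100)
adaptive = local_mean_threshold(img, radius=2)
```

With the fixed threshold, the whole shadowed background is marked as foreground. The local threshold recovers the four strokes but still marks two background pixels right at the shadow border, which is the kind of noncharacter object discussed above.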

The aforementioned problems make the recognition stage more difficult. Among the characters in the database, some are difficult to recognize even for the human eye. Some of these characters are shown in Fig. 13. In this figure, all the characters belong to a single class, which may be mistaken for another. Pairs of characters that are similar in shape and cause confusion are listed in Fig. 14. The resemblance of some of the pairs is obvious. It is worth mentioning that in Ref. 8, which is our previous research on the presented dataset, it is reported that certain characters cause 87% of the misclassifications when different combinations of classifiers with SVM are used.

6 Conclusion

In this paper, we introduced a large dataset of Persian license plate characters. The images of license plates were provided by Bani Nick Pardazesh Company and were taken under various conditions. Most of them were taken with visible light during the day and a few with infrared light during the night, in ∼20 different indoor and outdoor locations, such as streets, roads, and parking lots. They differed in size and angle and were taken against light and dark backgrounds, where the direction and intensity of the light caused nonuniform illumination. Some of the license plates were muddy or suffered from shadow. These problems, in addition to complex backgrounds and vehicle speed, are the challenges of LPR systems.

The methods used were described step by step in the necessary equipment section. From all the available images, ∼20,000 Persian characters were extracted by an intelligent system and verified by human observers. The extracted images differ in size, and some of them suffer from elimination, distortion, rotation, and noise.

References

1. C.-N. E. Anagnostopoulos et al., “License plate recognition from still images and video sequences: a survey,” IEEE Trans. Intell. Transp. Syst. 9(3), 371–391 (2008).

2. S. Du et al., “Automatic license plate recognition (ALPR): a state-of-the-art review,” IEEE Trans. Circuits Syst. Video Technol. 23(2), 311–325 (2013).

3. Y. Hu, F. Zhu, and X. Zhang, “A novel approach for license plate recognition using subspace projection and probabilistic neural network,” Lec. Notes Comput. Sci. 3497(1), 216–221 (2005).

4. C. Anagnostopoulos et al., “A license plate recognition system for intelligent transportation system applications,” IEEE Trans. Intell. Transp. Syst. 7(3), 377–392 (2006).

5. M. Akhtari and K. Faez, “The application of a CICA neural network on Farsi license plates recognition,” in 10th Int. Conf. on Hybrid Intelligent Systems, Atlanta, GA, pp. 205–208 (2010).

6. Y. Wen et al., “An algorithm for license plate recognition applied to intelligent transportation system,” IEEE Trans. Intell. Transp. Syst. 12(3), 830–845 (2011).

7. K. K. Kim et al., “Learning-based approach for license plate recognition,” in Proc. of IEEE Signal Processing Society Workshop, Sydney, NSW, Vol. 2, pp. 614–623 (2000).

8. A. Ebrahimi and A. Raie, “License plate character recognition using multiclass SVM,” J. Am. Sci. 8(1s), 38–42 (2012).

9. A. Sedighi and M. Vafadust, “A new and robust method for character segmentation and recognition in license plate images,” Expert Syst. Appl. 38(11), 13497–13504 (2011).

10. S. H. M. Kasaei and S. M. M. Kasaei, “Extraction and recognition of the vehicle license plate for passing under outside environment,” in European Intelligence and Security Informatics Conf., Athens, pp. 234–237 (2011).

11. M. Zahedi and S. M. Salehi, “License plate recognition system based on SIFT features,” Comput. Sci. 3(1), 998–1002 (2011).

12. C. Y. Suen et al., “Computer recognition of unconstrained handwritten numerals,” Proc. IEEE 80(7), 1162–1180 (1992).

13. C.-L. Liu et al., “Handwritten digit recognition: benchmarking of state-of-the-art techniques,” Pattern Recognit. 36(10), 2271–2285 (2003).

14. Y. LeCun, C. Cortes, and C. J. C. Burges, “The MNIST Database of Handwritten Digits,” http://yann.lecun.com/exdb/mnist/index.html.

15. J. J. Hull, “A database for handwritten text recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 16(5), 550–554 (1994).

16. H. Khosravi and E. Kabir, “Introducing a very large dataset of handwritten Farsi digits and a study on their varieties,” Pattern Recognit. Lett. 28(10), 1133–1141 (2007).

17. S. Mozaffari et al., “A comprehensive isolated Farsi/Arabic character database for handwritten OCR research,” in Proc. of 10th Int. Workshop on Frontiers in Handwriting Recognition, La Baule, France, pp. 385–389 (2006).

18. “Medialab LPR Database,” http://www.medialab.ntua.gr/research/LPRdatabase.html.

Fig. 13 Characters in the database that are even difficult to be recognized by the human eye.

Fig. 14 Pairs of characters that are similar in shape and cause confusion.


19. “AXIS221 Network Camera,” http://www.axis.com/products/cam_221/.

20. “The Premier Neural Network Development Environment,” http://www.neurosolutions.com/.

21. D. Zheng, Y. Zhao, and J. Wang, “An efficient method of license plate location,” Pattern Recognit. Lett. 26(15), 2431–2438 (2005).

22. A. A. Shahraki, A. E. Ghahnavieh, and S. A. Mirmahdavi, “License plate extraction from still images,” in Proc. of IEEE 4th Int. Conf. on Intelligent Systems, Modelling and Simulation, Bangkok, Thailand, pp. 45–48 (2013).

23. Bani Nick Pardazesh Co., http://www.baninick.com/.

Amir Ebrahimi Ghahnavieh received his BSc degree in electronics engineering from Shahrekord University, Shahrekord, Iran, in 2010 and his MSc degree in digital electronics from Amirkabir University of Technology, Tehran, Iran, in 2012. Currently, he is an imaging expert with the Space Research Center of Iran University of Science and Technology, Tehran, Iran. His research interests are image processing, computer vision, neural networks, pattern recognition, and artificial intelligence.

Mahmoud Enayati graduated in computer engineering from the Iran University of Science and Technology in 1997 and also received his BSc degree in software engineering from Sharif University of Technology in 1994. Since 1998, he has been working as a developer of license plate recognition applications. His main interests are in the fields of average speed measurement, multilanguage license plate recognition, hybrid access control systems, OCR, and real-time traffic management. Since 2011, he has been the CEO of Bani Nick Pardazesh Ltd.

Abolghasem A. Raie received his BSc degree in electrical engineering from Sharif University of Technology, Iran, in 1973 and his MSc and PhD degrees in electrical engineering from University of Minnesota in 1979 and 1982, respectively. Currently, he is an associate professor with the electrical engineering department of Amirkabir University of Technology, Iran. His research interests are algorithm design and performance analysis, machine vision, sensor fusion, and mobile robot navigation.
