Machine Learning Techniques for Acquiring New Knowledge in Image Tracking

Blanca Rodríguez, Óscar Pérez, Jesús García & José M. Molina* (2008) Machine Learning Techniques for Acquiring New Knowledge in Image Tracking, Applied Artificial Intelligence: An International Journal, 22:3, 266-282, DOI: 10.1080/08839510701821652. University Carlos III of Madrid, Department of Informatics, Madrid, Spain. Published online: 25 Apr 2008. Link: http://dx.doi.org/10.1080/08839510701821652



MACHINE LEARNING TECHNIQUES FOR ACQUIRING NEW KNOWLEDGE IN IMAGE TRACKING

Blanca Rodríguez, Óscar Pérez, Jesús García, and José M. Molina*

University Carlos III of Madrid, Department of Informatics, Madrid, Spain

The purpose of this research is to apply data mining (DM) to an optimized surveillance video system with the objective of improving tracking robustness and stability. Specifically, machine learning has been applied to blob extraction and detection, in order to decide whether a detected blob corresponds to a real target or not. Performance is assessed with an evaluation function, developed for optimizing the video surveillance system, which measures the quality level reached by the tracking system.

INTRODUCTION

Machine learning techniques can be applied to discover new relations among attributes in different domains. This application is named data mining (DM), and it is part of the knowledge discovery in databases (KDD) process (Witten and Frank 2000). The application of DM techniques to a specific problem takes several perspectives: classification, prediction, optimization, etc. In this work, DM techniques are used to learn a classifier able to determine whether a detected surface in an image should be considered a tentative target or not. This classifier avoids many computations in the association process of the surveillance system.

In a previous work (Perez et al. 2005), a surveillance system based on video cameras was optimized to obtain the best behavior from the recorded data, which are used as a set of examples. In that work, the optimization procedure used Evolutionary Computation (EC) to find the best set of parameters of the surveillance system.

*Funded by projects CICYT TSI2005-07344, CICYT TEC2005-07186, and CAM MADRINET S-0505/TIC/0255.

Address correspondence to Jesús García, Departamento de Informática, Avda. de la Universidad Carlos III, 22, Colmenarejo 28270, Madrid, Spain. E-mail: [email protected]

Applied Artificial Intelligence, 22:266–282. Copyright © 2008 Taylor & Francis Group, LLC. ISSN: 0883-9514 print / 1087-6545 online. DOI: 10.1080/08839510701821652


The automatic video surveillance system considered is capable of tracking multiple objects or groups of objects in real conditions (Rosin and Ioannidis 2003). The whole system is composed of several processes:

- A predictive process of the image background.

- A detector process of moving targets. This process works over the previous and current acquired frames. Detection is carried out by subtracting these two images and analyzing the intensity levels of the obtained difference. Detection is directly related to the previous process, which determines the threshold that defines whether a pixel should be considered a moving target or a variation in the background.

- A pixel-grouping process, which groups adjacent detected pixels to form detected regions. These regions are usually named blobs.

- An association process, which evaluates which detected blobs should be considered as belonging to each existing target.

- A tracking system that maintains a track for each existing target.

Applying EC to the whole surveillance system implied evaluating each set of parameters. The evaluation was performed by comparing the real positions of targets in the image with the positions obtained by the surveillance system.

In this work, we propose applying DM techniques to add new knowledge to the surveillance system without the need for a new EC process. In this case, the surveillance system can generate a set of files containing parameters of the detected blobs and, making use of the real positions, we can generate an identifier that determines whether a blob is part of a target or only noise. Using this strategy, to add new knowledge, for example the optical flow, the surveillance system is executed with the optimized parameters and the optical-flow information is recorded for each blob. The DM techniques can then use this new information to classify each blob as valid (if it is part of a target) or invalid (if it is just noise).

The new knowledge can be added to the system without perturbing the parameters optimized by the previous EC-based learning process. The acquired knowledge is used in the moving-target detector process (the step previous to association) to delete invalid blobs and thus simplify the association process.

SURVEILLANCE VIDEO SYSTEM

This section describes the structure of an image-based tracking system. It is based on a previously developed prototype, intended to analyze the integration of video technology in the A-SMGCS surveillance function for


Madrid/Barajas Airport. This work has been jointly developed by the Data Processing and Simulation (GPDS) and Applied Artificial Intelligence (GIAA) research groups at the universities Politécnica de Madrid and Carlos III de Madrid, respectively. Specifications and details of this video system have appeared in several publications (Besada et al. 2004; Besada et al. 2005; Besada et al. 2001).

The system architecture is completely described in those publications; Figure 1 depicts a simplified version:

It is a coupled tracking system where the detected objects are processed to initiate and maintain tracks representing the real targets in the scenario and to estimate their location and kinematic state. The system captures the frames in the video sequence and uses them to compute a background estimation. Background statistics are used to detect contrasting pixels corresponding to moving objects. These detected pixels are then connected to form image regions referred to as blobs. Blobs are defined by their spatial borders (generally a rectangular box), centroid location, and area. The tracker then reconnects these blobs to segment all targets from the background and track their motion, applying association and filtering processes. The association process assigns one or several blobs to each track, while unassociated blobs are used to initiate tracks. Map information and masks are used to tune specific aspects such as detection, track initiation, update parameters, etc. To illustrate the process, Figure 2 depicts the different levels of information interchanged, from the raw images to the tracks.

Since the objective of this work focuses on the lowest processing level, blob detection, this is the only block described here.

Blob extraction is based on the intensity of the image, that is, on the detection of targets contrasting with the local background (Cohen and Medioni 1998). This method is effective, has a low computational load, and tolerates gradual illumination changes.

The statistics of the background in the video images are estimated and updated in an auxiliary image, named Background. The pixel-level detector can then extract moving pixels from the static background simply by comparing the difference with a threshold:

Detection(x, y) := [Image(x, y) − Background(x, y)] > Threshold · σ,

FIGURE 1 Architecture of the video surveillance system.


where σ represents the standard deviation of the pixel intensity. A low threshold means higher sensitivity, but it may also lead to many false detections.

With a simple iterative process, taking the sequence of previous images and weighting them to give higher significance to the most recent frames, the background statistics (mean and variance) are updated in the following way:

Background(x, y, k) = α · Image(x, y, k) + (1 − α) · Background(x, y, k − 1),

σ²(x, y, k) = α · [Image(x, y, k) − Background(x, y, k − 1)]² + (1 − α) · σ²(x, y, k − 1).
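The detection rule and the recursive background update above can be sketched in a few lines of numpy. This is a minimal illustration under assumed parameter names (`alpha` for the update weight, `thr` for the threshold multiplier), not the authors' implementation:

```python
import numpy as np

def update_background(image, background, variance, alpha=0.05):
    """Recursive per-pixel update of background mean and variance.

    Note: the variance update uses the *previous* background, matching
    sigma^2(x, y, k) = alpha * [Image - Background(k-1)]^2 + (1 - alpha) * sigma^2(k-1).
    """
    diff = image - background
    variance = alpha * diff ** 2 + (1.0 - alpha) * variance
    background = alpha * image + (1.0 - alpha) * background
    return background, variance

def detect_moving_pixels(image, background, variance, thr=3.0):
    """Detection(x, y) := [Image(x, y) - Background(x, y)] > Threshold * sigma."""
    return (image - background) > thr * np.sqrt(variance)
```

A low `thr` increases sensitivity at the cost of more false detections, exactly the trade-off the DM-based filter described later is meant to absorb.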

FIGURE 2 Information levels in the processing chain.


Finally, the algorithm marks all connected detected pixels with a unique label, by means of a clustering and region-growing algorithm (Sanka et al. 1999). The resulting blobs are then filled. In order to reduce the number of false detections due to noise, a minimum area is required to form blobs.
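The clustering and region-growing step can be pictured with a pure-Python sketch: an illustrative breadth-first labeling (not the authors' code) that groups 8-connected detected pixels into blobs and discards blobs below the minimum area:

```python
from collections import deque

def extract_blobs(detection, min_area=4):
    """detection: 2-D list of 0/1 flags. Returns a list of blobs,
    each a set of (row, col) pixels, after minimum-area filtering."""
    rows, cols = len(detection), len(detection[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if detection[r][c] and not seen[r][c]:
                # Grow an 8-connected region from this seed pixel.
                blob, queue = set(), deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    blob.add((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and detection[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                if len(blob) >= min_area:  # noise rejection by minimum area
                    blobs.append(blob)
    return blobs
```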

DM FOR CLASSIFYING BLOBS

The starting point for this work is the video surveillance system described above. The objective now is to provide new knowledge in the initial phase (at the lowest level), with the purpose of removing false targets to simplify the association process and, in this way, improve the whole system. The architecture of the system is shown in Figure 3.

The novel block is the DM-based filter, situated after the blob-detection block with the objective of reducing false alarms. With this filter, low values for the threshold and minimum area may be chosen in order to achieve greater sensitivity.

As already noted, the detection of targets is based on the intensity gradient in the background image. However, not all the detected blobs correspond to real targets. These false blobs may appear because of noise, variations in illumination, etc. It is at this point that DM may be applied to remove false alarms without affecting real targets. The objective of DM is to find patterns in data in order to make nontrivial predictions on new data (Witten and Frank 2000). So, having various attributes of the detected blobs, the goal is to find patterns that allow us to decide whether a detected blob corresponds to a real target or not (Figure 4).

The input data take the form of a set of examples of blobs. Each instance or example is characterized by the values of attributes that measure different aspects of the instance. The learning scheme needed in this case is a classification scheme that takes a set of classified examples from which it is expected to learn a way of classifying unseen examples. That is, we start from a set of characteristics of blobs together with the decision for each as to whether it is a real target or not, and the problem is to learn how to classify new blobs as "real target" or "false target." The output must include a description of a structure that can be used to classify unknown examples, so that the decision can be explained. We have used WEKA

FIGURE 3 Architecture of the DM-based video surveillance system.


for experimenting with different learners. Four representative classification techniques have been chosen: Bayes, distance (nearest neighbor), decision trees (C4.5), and rules (PART). The percentage of correctly classified instances obtained with each technique is specified in Table 1.

Since Bayes offers a relatively low percentage of correctly classified instances, this technique was not tested further in the system. On the other hand, nearest neighbor gives us no model, so it is not implemented. C4.5 and PART are therefore the techniques tested in our system. Since both techniques are based on entropy, similar results are expected.
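The entropy-based criterion that C4.5 and PART share can be made concrete with a small sketch. The counts below are illustrative, not drawn from the paper's data:

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a set with `pos` true-target and `neg` false-target blobs."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * log2(p)
    return e

def information_gain(pos, neg, splits):
    """Gain of a candidate test; `splits` lists (pos, neg) counts per branch."""
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(pos, neg) - remainder
```

Both learners greedily pick the attribute test with the highest gain, which is why the tree in Figure 6 and the rules in Figure 7 end up keying on nearly the same attributes.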

Next, the input data used to represent this problem and the output data obtained are described.

Input Attributes

Three scenarios have been used; we describe them in the next section. In all cases, the training examples have to be extracted to obtain the classifier. These examples correspond to detected blobs, characterized by several attributes and classified as "true target" or "false target."

We try to use many attributes that may characterize a blob, so that a better classification can be done. Later, the classification method will decide which attributes are relevant for blob classification and which are not. The first attribute we may think of is the one used for extracting blobs: intensity. Regarding intensity, the mean, standard deviation, minimum, and maximum values of the intensity inside the blob may be considered.

Another approach for extracting moving objects is optical flow. Optical flow is the apparent motion of brightness patterns in the image and generally corresponds to the motion field. The computation of

FIGURE 4 DM for classifying detected blobs as real or false targets.

TABLE 1 Success Rate of Four Classification Techniques

Correctly classified instances (%)

Bayes  72.8691
C4.5   85.9327
PART   86.8592
IB1    83.3539


the optical flow is based on spatial and temporal partial derivatives (Horn 1981), so the computational load is considerable; we therefore propose calculating it only on the extracted blobs, and not on the whole image. Since optical flow gives the motion of an object, represented as a vector, its magnitude and phase may be considered, and therefore the mean, standard deviation, minimum, and maximum values of the magnitude and phase of the optical flow inside the blob may be stored. Finally, the following consideration is made: moving objects have edges, so some pixels in the blobs should correspond to edges. Two gradient-based methods have been used to detect edges: the Canny algorithm (Canny 1986) and corner detection. In addition, a high-pass filter has also been used. In these three cases, the number of pixels in the blob and its surrounding area that correspond to detected edges is stored.

These methods are applied once the blobs have been extracted using intensity, and are only calculated on the extracted blobs. Summarizing, the attributes used for characterizing these blobs are the following:

1. Intensity gradient. The mean, standard deviation, minimum, and maximum values of the intensity gradient inside the blob have been stored.

2. Optical flow. The mean, standard deviation, minimum, and maximum values of the magnitude and the phase of the optical flow inside the blob have been stored.

3. Edge detection. The number of pixels of the blob and its surrounding area that correspond to detected edges has been stored, using three different algorithms.
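Gathering the per-blob statistics of items 1 and 2 might look like the following numpy sketch. The names are hypothetical (`flow_x` and `flow_y` stand for a precomputed optical-flow field; the edge counts of item 3 are omitted):

```python
import numpy as np

def blob_attributes(pixels, gradient, flow_x, flow_y):
    """pixels: iterable of (row, col) blob coordinates;
    gradient, flow_x, flow_y: 2-D arrays covering the frame.
    Returns (min, max, mean, std) per attribute group."""
    rows, cols = zip(*pixels)
    grad = gradient[rows, cols]           # intensity gradient inside the blob
    vx, vy = flow_x[rows, cols], flow_y[rows, cols]
    magnitude = np.hypot(vx, vy)          # optical-flow vector magnitude
    phase = np.arctan2(vy, vx)            # optical-flow vector phase
    stats = lambda v: (v.min(), v.max(), v.mean(), v.std())
    return {"gradient": stats(grad),
            "flow_magnitude": stats(magnitude),
            "flow_phase": stats(phase)}
```

Restricting the computation to the blob's pixel coordinates is what keeps the optical-flow cost manageable, as argued above.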

Optical flow and edge detection are illustrated in the images of Figure 5, corresponding to the first raw image.

To classify the extracted blobs as "true target" or "false target," the ground truth must be used. Ground truth may be considered the true segmentation assigned by a human operator processing the data frame by frame; it is assumed to be the output a perfect system would produce. Specifically, ground truth is defined here by a rectangle surrounding each target. When the overlap of a detected blob with a real target exceeds a specific value (40%), the blob is classified as a "true target"; otherwise, it is classified as a "false target." This labeling has been done for the three scenarios we are working with. Table 2 shows a few training examples.
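The labeling rule can be sketched as follows. The paper states only the 40% threshold, so normalizing the overlap by the blob's area is an assumption of this sketch:

```python
def overlap_fraction(blob, truth):
    """Boxes as (xmin, ymin, xmax, ymax); fraction of `blob` covered by `truth`."""
    ix = max(0, min(blob[2], truth[2]) - max(blob[0], truth[0]))
    iy = max(0, min(blob[3], truth[3]) - max(blob[1], truth[1]))
    area = (blob[2] - blob[0]) * (blob[3] - blob[1])
    return (ix * iy) / area if area else 0.0

def label_blob(blob, ground_truth_boxes, threshold=0.40):
    """A blob overlapping any ground-truth rectangle by more than 40%
    becomes a training example of class 'true target'."""
    return ("true target"
            if any(overlap_fraction(blob, t) > threshold for t in ground_truth_boxes)
            else "false target")
```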

Output: Trained System and Classifier’s Performance

As previously noted, two algorithms are used in the system: C4.5 and PART. Both of them have been used with the confidence-factor parameter set to 0.0001 (Quinlan 1986; 1993) in order to obtain


models that are small but still have a small error rate. The trained system obtained, in the form of a decision tree, is shown in Figure 6; in the form of decision rules, it is shown in Figure 7 (only the first ten of the 28 rules are shown).

As can be observed in both models, nearly the same attributes are significant. In fact, both start with the same test: μ|OF| > 1.38719. The significant attributes cover the three following types of parameters.

Maximum and standard deviation values of the intensity gradient (max ΔI and σΔI, the latter considered only by the C4.5 algorithm). In general,

TABLE 2 Training Examples for Blob Classification: Input and Output Attributes

Intensity gradient       | Optical flow: magnitude   | Optical flow: phase           | Edge detection     |
min  max  μ      σ       | min   max    μ      σ     | min     max    μ      σ       | Canny  Corner  HPF | Target?
16   36   27.75  7.84    | 2.49  23.79  13.23  6.20  | −1.819  −1.01  −1.50  0.21    | 14     3       16  | YES
16   27   21.3   3.95    | 5.94  13.10  8.90   2.06  | −1.68   1.28   −1.48  0.14    | 5      0       4   | YES
2    68   43.99  18.63   | 0.19  3.20   1.34   0.72  | −3.09   3.12   0.32   1.65    | 0      0       0   | NO

FIGURE 5 Optical flow and edge detection.


a high max ΔI means that the detected blob is a true target, and a high σΔI (probably produced by noise) means that the detected blob is not a true target.

Mean value of the magnitude of the optical flow (μ|OF|). In general, a blob with a high μ|OF| corresponds to a true target.

The values corresponding to edge detection obtained by the Canny algorithm, the corner algorithm (only in the PART technique), and the high-pass filter (HPF). In general, blobs with high Canny or HPF counts correspond to true targets.

The algorithms also provide the classifier's performance in terms of the error rate. They have been executed with cross-validation, with the objective of getting a reliable error estimate. Cross-validation means that part of the instances is used for training and the rest for classification, and the process is repeated several times with random samples. The confusion matrix is used by both algorithms to show how many instances of each class have been assigned to each class. Both confusion matrices are shown in Tables 3 and 4.
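The cross-validation loop can be pictured with a small pure-Python sketch. The fold logic and the stub classifier interface here are illustrative, not WEKA's implementation:

```python
import random

def cross_validate(instances, train, predict, k=10, seed=0):
    """instances: list of (features, label) pairs.
    Each instance is classified by a model trained on the other k-1 folds;
    predictions accumulate into one confusion matrix {(true, pred): count}."""
    data = instances[:]
    random.Random(seed).shuffle(data)      # random samples, as described
    folds = [data[i::k] for i in range(k)]
    confusion = {}
    for i, test_fold in enumerate(folds):
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(train_set)
        for features, label in test_fold:
            pred = predict(model, features)
            confusion[(label, pred)] = confusion.get((label, pred), 0) + 1
    return confusion
```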

For example, in the C4.5 algorithm, 2958 blobs have been correctly classified as targets (True Positives), 2607 blobs have been correctly classified as false targets (True Negatives), 451 blobs have been incorrectly classified as false targets (False Negatives), and 460 blobs have been incorrectly

FIGURE 6 Decision tree obtained by C4.5 algorithm for classifying detected blobs as real or false targets.


classified as true targets (False Positives). The false negatives cause the deletion of true targets, which may carry a cost with respect to not applying machine learning.

The percentage of correct classification, which is the most widespread method for assessing a classifier's performance, gives the proportion of correctly classified instances:

PCC = (Total True Positives + Total True Negatives) / (Total True Positives + Total False Positives + Total True Negatives + Total False Negatives)
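Applying this formula to the confusion matrices of Tables 3 and 4 reproduces the PCC values reported below:

```python
def pcc(tp, fn, fp, tn):
    """Percentage of correctly classified instances."""
    return 100.0 * (tp + tn) / (tp + fn + fp + tn)

pcc_c45 = pcc(tp=2958, fn=451, fp=460, tn=2607)   # C4.5 (Table 3): ~85.9327
pcc_part = pcc(tp=3034, fn=375, fp=476, tn=2591)  # PART (Table 4): ~86.8592
```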

FIGURE 7 Rules obtained by PART algorithm for classifying detected blobs as real or false targets.

TABLE 3 10-fold Cross-Validation Confusion Matrix for C4.5

Target False target Classified as

2958   451    Target
460    2607   False target


The global PCC is 85.9327% for the C4.5 algorithm and 86.8592% for the PART algorithm. Considering that the maximum value is 100, the value a perfect system would obtain, the results are quite good: 85.9327% (C4.5) and 86.8592% (PART) of the blobs are correctly classified.

SURVEILLANCE SYSTEM EVALUATION

Before evaluating the DM-based surveillance video system, we describe the evaluation system used in the original surveillance video system.

The typical approach to evaluating detection and tracking performance consists of using ground truth to provide independent and objective data that can be related to the observations extracted and detected from the video sequence.

In each scenario, the ground truth has been extracted frame by frame, selecting the targets and storing the following data for each target:

- Number of analyzed frames.

- Track identifier.

- Minimum x coordinate of the rectangle that surrounds the target.

- Maximum x coordinate of the rectangle that surrounds the target.

- Minimum y coordinate of the rectangle that surrounds the target.

- Maximum y coordinate of the rectangle that surrounds the target.

These ground-truth data are compared with the actual detections by the evaluation system, which computes four parameters per target, classified into "accuracy metrics" and "continuity metrics."

Accuracy metrics:

- Overlap-area (OAP): overlap-area percentage between the real and the detected blobs.

TABLE 4 10-fold Cross-validation Confusion Matrix for PART

Target False target Classified as

3034   375    Target
476    2591   False target


- X-error (Ex) and Y-error (Ey): difference in x and y coordinates between the centers of the ideal blob and the detected blobs.

Continuity metrics:

- Number of tracks per target (NT): it is checked whether more than one detected track is matched with the same ideal track. If this happens, the program keeps the detected track with the larger overlapped area, removes the other one, and marks the frame with a flag indicating the number of detected tracks associated with this ideal one.

- Commutation (C): a commutation occurs when the identifier of a track matched to an ideal track changes. It typically takes place when the track is lost and recovered later.

In addition to these parameters, an evaluation function has been defined, with the objective of extracting a single number that measures the quality level of the tracking system. This number is based on the evaluation metrics specified before. Thus, the resulting number is obtained by means of a weighted sum of different terms, which are computed target by target:

- Mismatch (M): a counter that stores how many times the ground truth and the tracked-object data do not match up, that is, NT is not 1. Furthermore, this counter is normalized by the difference between the frames in which the ideal track disappears and appears (time of life, T).

- The next three terms are the total sums of the overlapped areas (ΣOAP) and of the center errors in the x (ΣEx) and y (ΣEy) axes. They are normalized by a number that indicates how many times these values are available (no continuity problem) in the whole video sequence (DZ).

- The next two elements are two counters:

  - Overmatch counter (OC): how many times the ground-truth track is matched with more than one tracked object.

  - Undermatch counter (UC): how many times the ground-truth track is not matched with any track at all.

- Finally, the last term is the number of commutations in the track under study (ΣC).

The last three elements are normalized by the same value as the first one (time of life, T). Clearly, the lower the evaluation function, the better the quality of the tracking system. With the objective of minimizing the evaluation function, the video surveillance system has been optimized, as explained in Perez et al. (2005).


Evaluation function:

e() = W1 · M / T
      + (W2 · Σ(1 − OAP) + W3 · ΣEx + W4 · ΣEy) / DZ
      + (W5 · OC + W6 · UC + W7 · ΣC) / T,

where W1, …, W7 are the weights of the parameters. Figure 8 depicts the different stages of the evaluation system.
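A sketch of this evaluation function follows. The weight values and inputs are illustrative, since the paper does not list the weights used:

```python
def evaluation(W, M, oap_list, ex_list, ey_list, Oc, Uc, commutations, DZ, T):
    """Weighted per-target score; lower values mean better tracking quality.

    W: dict {1..7: weight}. oap_list/ex_list/ey_list hold per-frame OAP and
    center errors; DZ counts the frames where they are available; T is the
    track's time of life (frames between appearance and disappearance)."""
    accuracy = (W[2] * sum(1 - oap for oap in oap_list)
                + W[3] * sum(ex_list) + W[4] * sum(ey_list)) / DZ
    continuity = (W[1] * M + W[5] * Oc + W[6] * Uc + W[7] * commutations) / T
    return accuracy + continuity
```

A perfect track (full overlap every frame, no mismatches, overmatches, undermatches, or commutations) scores 0, consistent with "the lower, the better."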

PERFORMANCE EVALUATION OF THE DM-BASED SURVEILLANCE VIDEO SYSTEM

The same evaluation metric described above has been applied to the DM-based surveillance video system.

But first, the three scenarios used throughout the research are described. They are located in an airport where several cameras are deployed for surveillance purposes.

The first scenario is a simple video with just three targets, but only the plane going from left to right is tracked (Figure 9). There are no other targets, and there are only two critical points, when the tracked aircraft crosses two other aircraft.

In the second scenario (Figure 10), complexity increases due to many occlusions of the tracked aircraft with parked aircraft. Besides, there are many targets corresponding to buses, cars, etc., that may interact with the tracked planes.

As in the second scenario, in the third one (Figure 11) there are many occlusions and many targets interacting with the tracked aircraft. In this case, three aircraft have been tracked.

With the scenarios described, the evaluation function is calculated for each track; the results are shown in Table 5.

FIGURE 8 Stages of the evaluation system.


As previously explained, the lower the evaluation function, the better the tracking system; so, in four of the five cases, the tracking system is improved with both DM techniques. The only case in which it worsens is the simple video, in which the original tracking system had no problems. It worsens when using the DM filter because the aircraft is detected some frames later. This is the cost of possibly removing true targets

FIGURE 9 Track 0 in scenario 1.

FIGURE 10 Track 0 in scenario 2.


(false negatives) due to the DM filtering, as discussed above. However, in more complex situations, the results are better.

An example of the most significant part of the evaluation metrics within the evaluation function is given next: the number of detected tracks associated to an ideal track. Figure 11 shows the number of tracks associated to track 0 in scenario 3.

From Figure 12, it can easily be seen how the number of associated tracks, ideally one, improves with both DM-based filters. Without filtering, the number of tracks associated to track 0 is '1' in 38 instances, '2' (the track is duplicated) in 22 instances, and '0' (the track is lost) in 3 instances. With the C4.5-DM-based filter, the number of instances in which exactly one track is associated increases to 53, while the track is duplicated in just 2 instances; on the other hand, the track is lost in 8 instances (this is the cost of filtering). Finally, with the PART-DM-based filter, the number of associated tracks is '1' in 48 instances and '2' in 15 instances, and there are no lost tracks.
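The tallies above can be reproduced by counting, instance by instance, how many detected tracks the evaluation associates to the ideal track. A minimal sketch in plain Python; the per-instance count lists are hypothetical stand-ins that reproduce the figures quoted above, not the real tracker output:

```python
from collections import Counter

def tally_associated(counts_per_frame):
    """Count how often 0, 1, 2, ... detected tracks are associated
    to the ideal track (ideally, always exactly 1)."""
    return Counter(counts_per_frame)

# Hypothetical per-instance association counts for the three runs.
no_filter = [1] * 38 + [2] * 22 + [0] * 3   # duplicated in 22 instances, lost in 3
c45_filter = [1] * 53 + [2] * 2 + [0] * 8   # fewer duplicates, more losses
part_filter = [1] * 48 + [2] * 15           # no lost tracks

for name, run in [("no filter", no_filter), ("C4.5", c45_filter), ("PART", part_filter)]:
    print(name, dict(tally_associated(run)))
```

A higher count for '1' and lower counts for '0' and '2' correspond to a better tracking run, which is what both DM-based filters achieve here.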

CONCLUSIONS AND FUTURE LINES

The optimized surveillance video system has been improved in scenarios of some complexity by applying DM-based filtering. The DM-based

TABLE 5 Evaluation parameter for each track, with and without DM-based filtering; the percentage improvement over the unfiltered system is given in parentheses, and (-) marks the cases in which performance worsened

                      Scenario 1       Scenario 2        Scenario 3
                      Track 0          Track 0           Track 0            Track 1            Track 2
Without DM Filter     33.71            7245.95           8085.81            12226.33           11610.18
With C4.5-DM Filter   67.34 (-)        6642.04 (8.33%)   6840.48 (15.40%)   9066.24 (25.85%)   9140.90 (21.27%)
With PART-DM Filter   9621.03 (-)      6498.15 (10.32%)  6536 (19.16%)      6994.10 (43%)      7797.86 (33%)
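The percentage improvements reported in Table 5 follow directly from the raw evaluation values (lower is better). A short check in plain Python, with the numbers copied from the table:

```python
# Recompute the percentage improvements in Table 5 for the C4.5-DM filter.
# Values are the evaluation parameter (lower is better), copied from the table.
baseline = {  # without DM filter
    "s2_t0": 7245.95, "s3_t0": 8085.81, "s3_t1": 12226.33, "s3_t2": 11610.18,
}
with_c45 = {  # with C4.5-DM filter
    "s2_t0": 6642.04, "s3_t0": 6840.48, "s3_t1": 9066.24, "s3_t2": 9140.90,
}

def improvement(before: float, after: float) -> float:
    """Percentage reduction of the evaluation parameter."""
    return 100.0 * (before - after) / before

for key in baseline:
    print(key, round(improvement(baseline[key], with_c45[key]), 2))
# s2_t0 -> 8.33, s3_t0 -> 15.4, s3_t1 -> 25.85, s3_t2 -> 21.27 (as in Table 5)
```

For scenario 1, `improvement` is negative (the evaluation parameter grows from 33.71), which is why the table marks those cells with (-) rather than a percentage.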

FIGURE 11 Tracks 0, 1 and 2 in scenario 3.


filter decides whether the blobs extracted by the system correspond to real targets or not. Four representative classification techniques were applied: Bayes, distance (nearest neighbor), decision trees (C4.5), and rules (PART). WEKA has been used to compare the percentage of correct classifications (PCC) given by each technique. Bayes provides a relatively low PCC, 72.8691%, in contrast to 85.9327% for C4.5 and 86.8592% for PART. On the other hand, the nearest neighbor builds no explicit model, although it provides a good PCC (83.3539%). Therefore, only C4.5 and PART have been implemented in the complete surveillance system.
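The PCC comparison can be illustrated with a self-contained sketch. WEKA was used in the original work; the 1-NN classifier and PCC computation below are a plain-Python stand-in, and the blob feature vectors are made up for illustration:

```python
import math

def nearest_neighbor_predict(train, x):
    """1-NN: return the label of the closest training example. No explicit
    model is built, which is why the text notes that nearest neighbor
    'gives no model' despite its good PCC."""
    return min(train, key=lambda ex: math.dist(ex[0], x))[1]

def pcc(train, test):
    """Percentage of Correct Classifications over a labeled test set."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return 100.0 * hits / len(test)

# Made-up blob descriptors: (gradient, optical-flow magnitude, edge density) -> label.
train = [((0.9, 0.8, 0.7), "true target"), ((0.1, 0.05, 0.2), "false target"),
         ((0.8, 0.9, 0.6), "true target"), ((0.2, 0.1, 0.1), "false target")]
test = [((0.85, 0.75, 0.65), "true target"), ((0.15, 0.1, 0.15), "false target")]

print(pcc(train, test))  # 100.0 on this toy split
```

The same `pcc` helper applies unchanged to any classifier's predictions, which is how the four techniques can be compared on a common footing.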

The training examples used in WEKA with the algorithms consist of detected blobs, characterized by several attributes based on the intensity gradient, optical flow, and edge detection, and classified as "true target" or "false target." The ground truth, extracted by a human operator, has been used for the blobs' classification.
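Training sets of this kind are supplied to WEKA as ARFF files. A minimal sketch of writing such a file; the attribute names and values are illustrative assumptions, since the article describes the feature set only at this level of detail:

```python
def to_arff(relation, attributes, classes, rows):
    """Serialize labeled blob examples into WEKA's ARFF text format."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attributes]
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    lines += [",".join(str(v) for v in feats) + "," + label for feats, label in rows]
    return "\n".join(lines)

# Illustrative attribute names and values (not the exact ones from the paper).
arff = to_arff(
    "blobs",
    ["intensity_gradient", "optical_flow", "edge_density"],
    ["true_target", "false_target"],
    [((0.9, 0.8, 0.7), "true_target"), ((0.1, 0.05, 0.2), "false_target")],
)
print(arff)
```

The resulting text can be loaded directly into the WEKA explorer to train and compare the classifiers.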

The resulting surveillance video system has been evaluated by analyzing an evaluation function that measures its quality level. This quality level has, in general, been improved with both the C4.5 and PART algorithms. PART has performed slightly better than C4.5, though very similarly, as expected, since both algorithms are based on entropy. Indeed, the performance has increased in all scenarios tested except one, in which the cost of filtering became manifest: because of filtering, blobs that correspond to real targets may be removed, which may cause the loss of the track or a later detection, as occurred in that scenario. In any case, that scenario was the simplest one, and the initial tracking system had no problems with it; we can therefore conclude that DM-based filtering improves the tracking system in scenarios of greater complexity.

FIGURE 12 Number of Tracks associated to Track 0 in Scenario 3 with and without DM-based filter.


In future work, further actions will be undertaken to continue this approach, such as applying machine learning to higher levels of video processing: data association, parameter estimation, etc.

REFERENCES

Besada, J. A., J. Portillo, J. García, J. M. Molina, A. Varona, and G. González. 2001. Image-based automatic surveillance for airport surface. FUSION 2001 Conference, Montreal, Canada.

Besada, J. A., J. García, J. Portillo, J. M. Molina, A. Varona, and G. González. 2005. Airport surface surveillance based on video images. IEEE Transactions on Aerospace and Electronic Systems 41:1075–1082.

Besada, J. A., J. M. Molina, J. García, A. Berlanga, and J. Portillo. 2004. Aircraft identification integrated in an airport surface surveillance video system. Machine Vision & Applications 15(3):164–171.

Canny, J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8:679–714.

Cohen, I. and G. Medioni. 1998. Detecting and tracking moving objects in video from an airborne observer. DARPA Image Understanding Workshop, IUW98, Monterey, Nov. 1998, 217–222.

Faceli, K., A. de Carvalho, and S. O. Rezende. 2004. Combining intelligent techniques for sensor fusion. Applied Intelligence 20(3):199–213.

Horn, B. K. P. and B. G. Schunck. 1981. Determining optical flow. Artificial Intelligence 17:185–203.

Pérez, O., J. García, A. Berlanga, and J. M. Molina. 2005. Evolving parameters of surveillance video systems for non-overfitted learning. 7th European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP), Lausanne, Switzerland.

Quinlan, J. R. 1986. Induction of decision trees. Machine Learning Journal 1:81–106.

Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann.

Rosin, P. L. and E. Ioannidis. 2003. Evaluation of global image thresholding for change detection. Pattern Recognition Letters 24(14):2345–2356.

Sonka, M., V. Hlavac, and R. Boyle. 1999. Image Processing, Analysis and Machine Vision. Pacific Grove, CA: Brooks/Cole Publishing Company.

Witten, I. H. and E. Frank. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann.
