kernel-basedobject tracking - dorin comaniciu · kernel-basedobject tracking dorin comaniciu...
TRANSCRIPT
Kernel-BasedObject Tracking
Dorin Comaniciu�
VisvanathanRamesh�
PeterMeer�
�Real-TimeVision andModelingDepartment
SiemensCorporateResearch755CollegeRoadEast,Princeton,NJ08540
�ElectricalandComputerEngineeringDepartment
RutgersUniversity94 Brett Road,Piscataway, NJ08854-8058
Abstract
A new approachtowardtargetrepresentationandlocalization,thecentralcomponentin visualtrack-ing of non-rigidobjects,is proposed.Thefeaturehistogrambasedtargetrepresentationsareregularizedby spatialmaskingwith anisotropickernel.Themaskinginducesspatially-smoothsimilarity functionssuitablefor gradient-basedoptimization,hence,the target localizationproblemcanbe formulatedus-ing the basinof attractionof the local maxima. We employ a metric derived from the Bhattacharyyacoefficient assimilarity measure,andusethemeanshift procedureto performtheoptimization.In thepresentedtrackingexamplesthenew methodsuccessfullycopedwith cameramotion,partialocclusions,clutter, andtargetscalevariations.Integrationwith motionfiltersanddataassociationtechniquesis alsodiscussed.We describeonly few of thepotentialapplications:exploitationof backgroundinformation,Kalmantrackingusingmotionmodels,andfacetracking.
Keywords: non-rigidobjecttracking;targetlocalizationandrepresentation;spatially-smoothsim-
ilarity function;Bhattacharyyacoefficient; facetracking.
1 Intr oduction
Real-timeobjecttrackingis thecritical taskin many computervisionapplicationssuchassurveil-
lance[44,16,32], perceptualuserinterfaces[10], augmentedreality[26], smartrooms[39, 75,47],
object-basedvideocompression[11], anddriverassistance[34, 4].
Two majorcomponentscanbedistinguishedin a typical visual tracker. Target Representa-
tion and Localizationis mostly a bottom-upprocesswhich hasalsoto copewith the changesin
theappearanceof thetarget.Filtering andData Associationis mostlya top-down processdealing
with thedynamicsof the trackedobject,learningof scenepriors,andevaluationof differenthy-
potheses.Theway thetwo componentsarecombinedandweightedis applicationdependentand
playsa decisive role in therobustnessandefficiency of thetracker. For example,facetrackingin
1
a crowdedscenereliesmoreon targetrepresentationthanon targetdynamics[21], while in aerial
video surveillance,e.g., [74], the target motion andthe ego-motionof the cameraare the more
importantcomponents.In real-timeapplicationsonly a smallpercentageof thesystemresources
canbeallocatedfor tracking,therestbeingrequiredfor thepreprocessingstagesor to high-level
taskssuchasrecognition,trajectoryinterpretation,andreasoning.Therefore,it is desirableto keep
thecomputationalcomplexity of a tracker aslow aspossible.
Themostabstractformulationof thefilteringanddataassociationprocessis throughthestate
spaceapproach for modelingdiscrete-timedynamicsystems[5]. The informationcharacterizing
the target is definedby the statesequence��������� �� ���������
, whoseevolution in time is specifiedby
the dynamicequation���������������������! "��#
. The availablemeasurements��$��%��������������
arerelatedto
the correspondingstatesthroughthe measurementequation$��&�(')�*���+���!,)��#
. In general,both�!�and
'-�arevector-valued,nonlinearandtime-varyingfunctions. Eachof thenoisesequences,�� "����������������and
��,)���������������is assumedto beindependentandidenticallydistributed(i.i.d.).
The objective of tracking is to estimatethe state���
given all the measurements$��/. �
up
that moment,or equivalently to constructthe probability densityfunction (pdf) 0 �����*1 $2�/. �%# . The
theoreticallyoptimalsolutionis providedby therecursiveBayesianfilter whichsolvestheproblem
in two steps.Thepredictionstepusesthedynamicequationandthealreadycomputedpdf of the
stateat time 3 �547698, 0 ����������1 $2�/. ������# , to derive theprior pdf of the currentstate,0 �����*1 $2�/. �����:# .
Then, the updatestepemploys the likelihood function 0 �;$%��1 ����# of the currentmeasurementto
computetheposteriorpdf 0 ���+��1 $��/. � ).When the noisesequencesare Gaussianand
���and
'<�are linear functions, the optimal
solutionis providedby theKalmanfilter [5, p.56],whichyieldstheposteriorbeingalsoGaussian.
(We will return to this topic in Section6.2.) When the functions�=�
and')�
are nonlinear, by
linearizationtheExtendedKalmanFilter (EKF) [5, p.106]is obtained,theposteriordensitybeing
still modeledasGaussian.A recentalternative to theEKF is theUnscentedKalmanFilter (UKF)
[42] which usesa setof discretelysampledpoints to parameterizethe meanandcovarianceof
the posteriordensity. Whenthe statespaceis discreteandconsistsof a finite numberof states,
HiddenMarkov Models(HMM) filters [60] canbe appliedfor tracking. The mostgeneralclass
of filters is representedby particlefilters [45], alsocalledbootstrapfilters [31], which arebased
on Monte Carlo integrationmethods.The currentdensityof the stateis representedby a setof
2
randomsampleswith associatedweightsandthenew densityis computedbasedon thesesamples
andweights(see[23,3] for reviews). TheUKF canbeemployedto generateproposaldistributions
for particlefilters, in whichcasethefilter is calledUnscentedParticleFilter (UPF)[54].
When the tracking is performedin a clutteredenvironmentwheremultiple targetscanbe
present[52], problemsrelatedto the validation and associationof the measurementsarise [5,
p.150]. Gating techniquesareusedto validateonly measurementswhosepredictedprobability
of appearanceis high. After validation,a strategy is neededto associatethe measurementswith
thecurrenttargets. In additionto theNearestNeighborFilter, which selectstheclosestmeasure-
ment,techniquessuchasProbabilisticDataAssociationFilter (PDAF) areavailablefor thesingle
target case.The underlyingassumptionof the PDAF is that for any given target only onemea-
surementis valid, andtheothermeasurementsaremodeledasrandominterference,that is, i.i.d.
uniformly distributedrandomvariables. The Joint DataAssociationFilter (JPDAF) [5, p.222],
on theotherhand,calculatesthemeasurement-to-targetassociationprobabilitiesjointly acrossall
the targets.A differentstrategy is representedby theMultiple HypothesisFilter (MHF) [63, 20],
[5, p.106]which evaluatesthe probability that a given target gave rise to a certainmeasurement
sequence.TheMHF formulationcanbeadaptedto trackthemodesof thestatedensity[13]. The
dataassociationproblemfor multiple targetparticlefiltering is presentedin [62, 38].
The filtering andassociationtechniquesdiscussedabove wereappliedin computervision
for varioustrackingscenarios.Boykov andHuttenlocher[9] employedtheKalmanfilter to track
vehiclesin anadaptive framework. RosalesandSclaroff [65] usedtheExtendedKalmanFilter to
estimatea 3D object trajectoryfrom 2D imagemotion. Particle filtering wasfirst introducedin
visionastheCondensationalgorithmby IsardandBlake [40]. Probabilisticexclusionfor tracking
multipleobjectswasdiscussedin [51]. Wu andHuangdevelopedanalgorithmto integratemultiple
targetclues[76]. Li andChellappa[48] proposedsimultaneoustrackingandverificationbasedon
particlefilters appliedto vehiclesand faces. Chenet al. [15] usedthe HiddenMarkov Model
formulationfor trackingcombinedwith JPDAF dataassociation.Rui andChenproposedto track
thefacecontourbasedon theunscentedparticlefilter [66]. ChamandRehg[13] applieda variant
of MHF for figuretracking.
Theemphasisin this paperis on theothercomponentof tracking: targetrepresentationand
localization.While thefiltering anddataassociationhave their rootsin controltheory, algorithms
3
for targetrepresentationandlocalizationarespecificto imagesandrelatedto registrationmethods
[72, 64, 56]. Both target localizationandregistrationmaximizesa likelihoodtype function. The
differenceis that in tracking,asopposedto registration,only small changesareassumedin the
locationandappearanceof the target in two consecutive frames. This propertycanbe exploited
to developefficient,gradientbasedlocalizationschemesusingthenormalizedcorrelationcriterion
[6]. Sincethecorrelationis sensitive to illumination, HagerandBelhumeur[33] explicitly mod-
eledthe geometryandillumination changes.The methodwasimprovedby Sclaroff andIsidoro
[67] usingrobustM-estimators.Learningof appearancemodelsby employing a mixtureof stable
imagestructure,motion informationandan outlier process,wasdiscussedin [41]. In a differ-
ent approach,Ferrariet al. [26] presentedan affine tracker basedon planarregionsandanchor
points. Trackingpeople,which risesmany challengesdueto thepresenceof large3D, non-rigid
motion,wasextensively analyzedin [36, 1, 30, 73]. Explicit trackingapproachesof people[69]
aretime-consumingandoften the simplerblob model [75] or adaptive mixture models[53] are
alsoemployed.
Themaincontribution of thepaperis to introduceanew framework for efficient trackingof
non-rigidobjects.Weshow thatbyspatiallymaskingthetargetwith anisotropickernel,aspatially-
smoothsimilarity function canbe definedandthe target localizationproblemis thenreducedto
a searchin the basinof attractionof this function. The smoothnessof the similarity function
allows applicationof a gradientoptimizationmethodwhich yieldsmuchfastertarget localization
comparedwith the(optimized)exhaustivesearch.Thesimilarity betweenthetargetmodelandthe
target candidatesin thenext frameis measuredusingthemetricderived from the Bhattacharyya
coefficient. In our casetheBhattacharyyacoefficient hasthemeaningof a correlationscore.The
new targetrepresentationandlocalizationmethodcanbeintegratedwith variousmotionfiltersand
dataassociationtechniques.We presenttrackingexperimentsin which our methodsuccessfully
copedwith complex cameramotion,partialocclusionof thetarget,presenceof significantclutter
andlargevariationsin targetscaleandappearance.We alsodiscusstheintegrationof background
informationandKalmanfilter basedtracking.
Thepaperis organizedasfollows. Section2 discussesissuesof targetrepresentationandthe
importanceof aspatially-smoothsimilarity function.Section3 introducesthemetricderivedfrom
theBhattacharyyacoefficient. Theoptimizationalgorithmis describedin Section4. Experimental
resultsareshown in Section5. Section6 presentsextensionsof thebasicalgorithmandthenew
4
approachis put in thecontext of computervision literaturein Section7.
2 TargetRepresentation
To characterizethetarget,first a featurespaceis chosen.Thereferencetargetmodelis represented
by its pdf > in thefeaturespace.For example,thereferencemodelcanbechosento bethecolor
pdf of the target. Without lossof generalitythe target model canbe consideredascenteredat
the spatiallocation ? . In the subsequentframea target candidateis definedat location @ , and
is characterizedby the pdf 0 � @ # . Both pdf-s are to be estimatedfrom the data. To satisfy the
low computationalcostimposedby real-timeprocessingdiscretedensities,i.e., A -bin histograms
shouldbeused.Thuswehave
targetmodel: BC �D� B>�E E ��������� FF
E ��� B>�E�G8
targetcandidate: BH � @ #I�J� B0KE � @ #L E ��������� FF
E ��� B0KE�M8ON
Thehistogramis not thebestnonparametricdensityestimate[68], but it sufficesfor our purposes.
Otherdiscretedensityestimatescanbealsoemployed.
Wewill denoteby
BP � @ #OQ PSR BH � @ #L� BC<T (1)
asimilarity functionbetweenBH and BC . Thefunction BP � @ # playstheroleof alikelihoodandits local
maximain the imageindicatethepresenceof objectsin thesecondframehaving representations
similar to BC definedin thefirst frame.If only spectralinformationis usedto characterizethetarget,
thesimilarity functioncanhave largevariationsfor adjacentlocationson theimagelatticeandthe
spatialinformationis lost. To find themaximaof suchfunctions,gradient-basedoptimizationpro-
ceduresaredifficult to applyandonly anexpensiveexhaustivesearchcanbeused.We regularize
thesimilarity functionby maskingtheobjectswith anisotropickernelin thespatialdomain.When
thekernelweights,carryingcontinuousspatialinformation,areusedin definingthefeaturespace
representations,BP � @ # becomesasmoothfunctionin @ .
5
2.1 TargetModel
A targetis representedby anellipsoidalregion in theimage.To eliminatetheinfluenceof different
targetdimensions,all targetsarefirst normalizedto aunit circle. This is achievedby independently
rescalingtherow andcolumndimensionswith UWV and UYX .Let
����Z[\ [ ��������� ]be the normalizedpixel locationsin the region definedasthe target model.
Theregion is centeredat ? . An isotropickernel,with a convex andmonotonicdecreasingkernel
profile4)�_^<#
1, assignssmallerweightsto pixels fartherfrom the center. Using theseweightsin-
creasesthe robustnessof the densityestimationsincethe peripheralpixelsarethe leastreliable,
beingoftenaffectedby occlusions(clutter)or interferencefrom thebackground.
Thefunction `ba"cedgf �+8<N\N�N A associatesto thepixel at location��Z[
theindex ` ����Z[�# of its
bin in thequantizedfeaturespace.Theprobabilityof thefeatureh ��8<N\N�N A in thetargetmodel
is thencomputedas
B>iE �Jj][ ��� 4)�Lk=� Z[Wk d #il R ` ��� Z[\#m6 h T � (2)
wherel
is theKronecker deltafunction.Thenormalizationconstantj
is derivedby imposingthe
conditionFE ��� B>�E �M8
, from where
jJ� 8][ ��� 4-�ik=� Z[Wk d #�
(3)
sincethesummationof deltafunctionsfor h �M8-N�N�N A is equalto one.
2.2 TargetCandidates
Let��� [ [ ��������� ]on
bethenormalizedpixel locationsof thetargetcandidate,centeredat @ in thecurrent
frame.Thenormalizationis inheritedfrom theframecontainingthetargetmodel.Usingthesame
kernelprofile4-�p^<#
, but with bandwidthU , theprobabilityof the featureh ��8-N�N\N A in thetarget
candidateis givenby
B0YE � @ #I�Jjrq]on[ ��� 4 @ 6s� [
Ud l R ` ��� [ #<6 h T � (4)
1Theprofile of akernel t is definedasa function ubvxw y%z�{}|*~D� suchthat t��;��|���u��i���)� � | .6
where jrq�� 8]n[ ��� 4-�ikx� �%�K�q k d # (5)
is thenormalizationconstant.Notethatjrq
doesnot dependon @ , sincethepixel locations� [
are
organizedin a regularlatticeand @ is oneof thelatticenodes.Therefore,jrq
canbeprecalculated
for a given kernel and different valuesof U . The bandwidth U definesthe scaleof the target
candidate,i.e., thenumberof pixelsconsideredin thelocalizationprocess.
2.3 Similarity Function Smoothness
Thesimilarity function(1) inheritsthepropertiesof thekernelprofile4-�p^<#
whenthetargetmodel
andcandidateare representedaccordingto (2) and (4). A differentiablekernelprofile yields a
differentiablesimilarity functionandefficientgradient-basedoptimizationsprocedurescanbeused
for finding its maxima.Thepresenceof thecontinuouskernelintroducesaninterpolationprocess
betweenthelocationson theimagelattice.Theemployedtargetrepresentationsdo not restrictthe
way similarity is measuredandvariousfunctionscanbeusedfor P . See[59] for anexperimental
evaluationof differenthistogramsimilarity measures.
3 Metric basedon Bhattacharyya Coefficient
The similarity function definesa distanceamongtarget modelandcandidates.To accommodate
comparisonsamongvarioustargets,this distanceshouldhave a metric structure.We definethe
distancebetweentwo discretedistributionsas
� � @ #r� 8r6 P�R BH � @ #L� BC<T � (6)
wherewechose
BP � @ #IQ P�R BH � @ #L� BC<T �F
E ��� B0YE � @ # B>iE � (7)
thesampleestimateof theBhattacharyyacoefficientbetweenH and C [43].
TheBhattacharyyacoefficient is adivergence-typemeasure[49] whichhasastraightforward
geometricinterpretation.It is thecosineof theanglebetweenthe A -dimensionalunit vectors� B0 ����N�N�N�� � B0 F �and
� B> �L�\N�N�N!� � B> F �. Thefactthat H and C aredistributionsis thusexplicitly
7
takeninto accountby representingthemontheunit hypersphere.At thesametimewecaninterpret
(7) asthe(normalized)correlationbetweenthevectors� B0 �i��N�N\N\� � B0 F �
and� B> �L��N\N�N\� � B> F �
.
Propertiesof theBhattacharyyacoefficientsuchasits relationto theFishermeasureof information,
quality of thesampleestimate,andexplicit formsfor variousdistributionsaregivenin [22, 43].
Thestatisticalmeasure(6) hasseveraldesirableproperties:
1. It imposesa metric structure(seeAppendix). The Bhattacharyyadistance[28, p.99] or
Kullbackdivergence[19, p.18]arenotmetricssincethey violateat leastoneof thedistance
axioms.
2. It hasa cleargeometricinterpretation.Note that the �<� histogrammetrics(including his-
togramintersection[71]) donot enforcetheconditionsFE ��� B>�E �98
andFE ��� B0KE �M8
.
3. It usesdiscretedensities,andthereforeit is invariantto thescaleof thetarget(up to quanti-
zationeffects).
4. It is valid for arbitrarydistributions,thusbeingsuperiorto the Fisherlinear discriminant,
which yieldsusefulresultsonly for distributionsthatareseparatedby themean-difference
[28, p.132].
5. It approximatesthechi-squaredstatistic,while avoiding thesingularityproblemof thechi-
squaretestwhencomparingemptyhistogrambins[2].
Divergencebasedmeasureswerealreadyusedin computervision. TheChernoff andBhat-
tacharyyaboundshavebeenemployedin [46] to determinetheeffectivenessof edgedetectors.The
Kullbackdivergencebetweenjoint distributionandproductof marginals(e.g.,themutualinforma-
tion) hasbeenusedin [72] for registration.Informationtheoreticmeasuresfor targetdistinctness
werediscussedin [29].
4 TargetLocalization
To find the locationcorrespondingto the target in the currentframe,the distance(6) shouldbe
minimizedasa functionof @ . The localizationprocedurestartsfrom thepositionof the target in
8
theprevious frame(themodel)andsearchesin theneighborhood.Sinceour distancefunction is
smooth,theprocedureusesgradientinformationwhich is providedby themeanshift vector[17].
More involvedoptimizationsbasedon theHessianof (6) canbeapplied[58].
Color informationwaschosenas the target feature,however, the sameframework canbe
usedfor texture and edges,or any combinationof them. In the sequelit is assumedthat the
following informationis available:(a) detectionandlocalizationin theinitial frameof theobjects
to track(targetmodels)[50, 8]; (b) periodicanalysisof eachobjectto accountfor possibleupdates
of thetargetmodelsdueto significantchangesin color [53].
4.1 DistanceMinimization
Minimizing thedistance(6) is equivalentto maximizingtheBhattacharyyacoefficient BP � @ # . The
searchfor thenew target locationin thecurrentframestartsat thelocation B@ of thetarget in the
previous frame. Thus,the probabilities� B0YE � B@ #L E ��������� F of the target candidateat location B@ in
thecurrentframehave to becomputedfirst. UsingTaylorexpansionaroundthevalues B0YE � B@ # , the
linearapproximationof theBhattacharyyacoefficient (7) is obtainedaftersomemanipulationsas
P�R BH � @ #L� BC-T+�8�F
E ��� B0YE � B@ # B>iE��8�F
E ��� B0YE� @ # B>�E
B0YE � B@ #N
(8)
Theapproximationis satisfactorywhenthetargetcandidate� B0YE � @ #L E ��������� F doesnotchangedrasti-
cally from theinitial� B0YE � B@ #L E ��������� F , which is mostoftenavalid assumptionbetweenconsecutive
frames.The condition B0YE � B@ #����(or somesmall threshold)for all h ��8-N�N\N A , canalwaysbe
enforcedby notusingthefeaturevaluesin violation. Recalling(4) resultsin
P�R BH � @ #L� BC-T+�8�F
E ��� B0YE � B@ # B>iE��jrq�] n[ ���"� [ 4 @ 6�� [
Ud
(9)
where
� [ �F
E ���B>iE
B0YE � B@ #l R ` ��� [ #<6 h T N (10)
Thus,to minimizethedistance(6), thesecondtermin (9) hasto bemaximized,thefirst termbeing
independentof @ . Observe that the secondterm representsthe densityestimatecomputedwith
kernelprofile4-�p^<#
at @ in thecurrentframe,with thedatabeingweightedby � [ (10). Themode
of this densityin the local neighborhoodis thesoughtmaximumwhich canbe foundemploying
9
themeanshift procedure[17]. In this procedurethekernelis recursively movedfrom thecurrent
location B@ to thenew location B@ � accordingto therelation
B@ � �]n[ ��� � [ � [�� �@S� � � �q d]on[ ��� � [�� �@m� � � �q d (11)
where� �_^<#��96�4 ��p^<#
, assumingthat thederivativeof4-�_^<#
exists for all^&¡ R �o��¢£# , exceptfor a
finite setof points.Thecompletetargetlocalizationalgorithmis presentedbelow.
BhattacharyyaCoefficient P¤R BH � @ #i� BC<T Maximization
Given:
thetargetmodel� B>iE E ��������� F andits location B@ in thepreviousframe.
1. Initialize the locationof the target in the currentframewith B@ , compute� B0YE � B@ #L E ��������� F ,
andevaluate
P¤R BH � B@ #i� BC<T �FE ��� B0YE � B@ # B>�E N
2. Derive theweights� � [ [ ��������� ]n accordingto (10).
3. Find thenext locationof thetargetcandidateaccordingto (11).
4. Compute� B0YE � B@ � #L E ��������� F , andevaluate
P¤R BH � B@ � #i� BC<T �FE ��� B0YE � B@ � # B>�E N
5. While P�R BH � B@ � #L� BC-T+¥ P�R BH � B@ #L� BC<TDo B@ �I¦
�d� B@ � B@ � #
Evaluate P¤R BH � B@ � #L� BC<T6. If
k B@ � 6 B@ k ¥J§ Stop.
Otherwise Set B@ ¦ B@ � andgo to Step2.
10
4.2 Implementation of the Algorithm
Thestoppingcriterionthreshold§ usedin Step6 is derivedby constrainingthevectors B@ and B@ �to bewithin thesamepixel in original imagecoordinates.A lower thresholdwill inducesubpixel
accuracy. From real-timeconstraints(i.e., uniform CPU load in time), we alsolimit the number
of meanshift iterationsto ¨ FS© V , typically taken equalto 20. In practicethe averagenumberof
iterationsis muchsmaller, about4.
Implementationof thetrackingalgorithmcanbemuchsimplerthanaspresentedabove. The
roleof Step5 is only to avoid potentialnumericalproblemsin themeanshift basedmaximization.
Theseproblemscanappeardueto thelinearapproximationof theBhattacharyyacoefficient. How-
ever, a largesetof experimentstrackingdifferentobjectsfor long periodsof time,hasshown that
theBhattacharyyacoefficient computedat thenew location B@ � failed to increasein only�KNª8!«
of
thecases.Therefore,theStep5 is not usedin practice,andasa result,thereis noneedto evaluate
theBhattacharyyacoefficient in Steps1 and4.
In the practicalalgorithm, we only iterateby computingthe weights in Step2, deriving
the new locationin Step3, andtestingthe sizeof the kernelshift in Step6. The Bhattacharyya
coefficient is computedonly afterthealgorithmcompletionto evaluatethesimilarity betweenthe
targetmodelandthechosencandidate.
Kernelswith Epanechnikov profile [17]
4-�_^<#I� �dY¬��� � � � � #��i8r6®^<# if
^°¯M8�otherwise
(12)
arerecommendedto beused.In this casethederivative of theprofile,� �_^<#
, is constantand(11)
reducesto
B@ � �]n[ ��� � [ � []n[ ��� � [
�(13)
i.e.,asimpleweightedaverage.
The maximizationof the Bhattacharyyacoefficient can be also interpretedas a matched
filtering procedure. Indeed,(7) is the correlationcoefficient betweenthe unit vectors� BC and
BH � @ # , representingthetargetmodelandcandidate.Themeanshift procedurethusfindsthelocal
maximumof thescalarfield of correlationcoefficients.
11
Will call the operational basinof attraction the region in the currentframe in which the
new location of the target can be found by the proposedalgorithm. Due to the useof kernels
this basinis at leastequalto thesizeof the targetmodel. In otherwords,if in thecurrentframe
the centerof the target remainsin the imageareacoveredby the target model in the previous
frame,thelocalmaximumof theBhattacharyyacoefficient is areliableindicatorfor thenew target
location.Weassumethatthetargetrepresentationprovidessufficientdiscrimination,suchthatthe
Bhattacharyyacoefficientpresentsauniquemaximumin thelocal neighborhood.
The meanshift procedurefinds a root of the gradientas function of location,which can,
however, alsocorrespondto asaddlepointof thesimilarity surface.Thesaddlepointsareunstable
solutions,andsincetheimagenoiseactsasanindependentperturbationfactoracrossconsecutive
frames,they cannotinfluencethetackingperformancein animagesequence.
4.3 AdaptiveScale
Accordingto thealgorithmdescribedin Section4.1, for a giventargetmodel,the locationof the
targetin thecurrentframeminimizesthedistance(6) in theneighborhoodof thepreviouslocation
estimate.However, the scaleof the target often changesin time, andthus in (4) the bandwidth
U of thekernelprofile hasto beadaptedaccordingly. This is possibledueto thescaleinvariance
propertyof (6).
Denoteby UY�:±ª²ª³ thebandwidthin thepreviousframe.WemeasurethebandwidthUY´µ��¶ in the
currentframeby runningthetargetlocalizationalgorithmthreetimes,with bandwidthsU � UY�:±ª²ª³ ,U � UY�:±ª²ª³��¸·¹U , and U � UY�:±ª²ª³ 6 ·¹U . A typical valueis ·¹U �º�KNª8 UW��±µ²�³ . Thebestresult, UY´µ��¶ ,yieldingthelargestBhattacharyyacoefficient,is retained.To avoid over-sensitivescaleadaptation,
thebandwidthassociatedwith thecurrentframeis obtainedthroughfiltering
U ] ²�» ��¼ UY´µ��¶+� �i8r6®¼m# UW��±µ²�³ (14)
wherethedefaultvaluefor¼
is 0.1.Notethatthesequenceof U ] ²�» containsimportantinformation
aboutthedynamicsof thetargetscalewhich canbefurtherexploited.
12
4.4 Computational Complexity
Let N betheaveragenumberof iterationsper frame. In Step2 thealgorithmrequiresto compute
therepresentation� B0YE � @ #i E ��������� F . Weightedhistogramcomputationhasroughly thesamecostas
theunweightedhistogramsincethekernelvaluesareprecomputed.In Step3 thecentroid(13) is
computedwhichinvolvesaweightedsumof itemsrepresentingthesquare-rootof adivisionof two
terms. We concludethat themeancostof theproposedalgorithmfor onescaleis approximately
givenby jr½¾� ¨ � ¬:¿ �®Àq¬�Á# � ¨7À q ¬�Á (15)
where ¬:¿ is thecostof thehistogramand ¬�Á thecostof anaddition,a square-root,anda division.
We assumethat thenumberof histogramentriesA andthenumberof targetpixels À q arein the
samerange.
It is of interestto comparethecomplexity of thenew algorithmwith thatof targetlocalization
without gradientoptimization,asdiscussedin [25]. Thesearchareais assumedto beequalto the
operationalbasinof attraction,i.e., a region covering the targetmodelpixels. Thefirst stepis to
computeÀ q histograms.Assumethateachhistogramis derivedin asquaredwindow of À q pixels.
To implementthecomputationefficiently weobtaina targethistogramandupdateit by sliding the
window À q times(� À q horizontalstepstimes
� À q verticalsteps).For eachmove� � À q additions
areneededto updatethehistogram,hence,theeffort is ¬:¿ �� À q � À q ¬: , where¬: is thecostof an
addition. Thesecondstepis to computeÀ q Bhattacharyyacoefficients. This canalsobedoneby
computingonecoefficient andthenupdatingit sequentially. Theeffort is A ¬�Á �� À q � À q ¬�Á . The
total effort for targetlocalizationwithoutgradientoptimizationis then
jrÃĽÅ�¬:¿ �
� À q � À q ¬� �� A�� � À q � À qo# ¬�Á �
� À q � À q ¬:ÁN
(16)
Theratiobetween(16)and(15) is�<Æ � À qKÇ ¨ . In atypicalsetting(asit will beshown in Section5)
thetargethasaboutÈ �IÉ È � pixels(i.e.,� À q�� È � ) andthemeannumberof iterationsis ¨ �ËÊKNª8!Ì
.
Thustheproposedoptimizationtechniquereducesthecomputationaltime�rÆ È �KÇÊoNª8�Ì � � Ê
times.
An optimizedimplementationof thenew trackingalgorithmhasbeentestedon a1GHzPC.
Thebasicframework with scaleadaptation(which involvesthreeoptimizationsat eachstep)runs
at a rateof 150 fps allowing simultaneoustrackingof up to five targetsin real time. Note that
withoutscaleadaptationsthesenumbersshouldbemultiplied by three.
13
Figure1: Football sequence,trackingplayerno. 75 . The frames30, 75, 105,140,and150areshown.
0 50 100 150
2
4
6
8
10
12
14
16
18
Frame Index
Mea
n S
hift
Itera
tions
Figure2: The numberof meanshift iterationsfunction of the frameindex for the Football se-quence.Themeannumberof iterationsis
ÊoNª8�Ìperframe.
5 Experimental Results
The kernel-basedvisual tracker wasappliedto many sequencesand it is integratedinto several
applications.Herewe just presentsomerepresentative results.In all but the lastexperimentsthe
RGB color spacewastaken asfeaturespaceandit wasquantizedinto8�Í}ÉD8�ÍÎÉJ8!Í
bins. The
algorithmwasimplementedasdiscussedin Section4.2. TheEpanechnikov profile wasusedfor
histogramcomputationsandthemeanshift iterationswerebasedon weightedaverages.
TheFootballsequence(Figure1) has8 È Ê framesof ÏoÈ � É � Ê� pixels,andthemovementof
playerno. 75 wastracked. The targetwasinitialized with a hand-drawn elliptical region (frame
14
−40−20
020
40
−40−2002040
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
XY
Bha
ttach
aryy
a C
oeffi
cien
t
Initial location Convergence point
Figure3: The similarity surface(valuesof the Bhattacharyyacoefficient) correspondingto therectanglemarkedin frame
8�� È of Figure1. Theinitial andfinal locationsof themeanshift itera-tionsarealsoshown.
Ï � ) of size Ð 8gÉ ÈoÏ (yielding normalizationconstantsequalto� UYV � UYX #Ñ��� ÏÈ � � Ío#�# . Thekernel-
basedtracker provesto berobust to partialocclusion,clutter, distractors(frame8�Êo�
) andcamera
motion. Sinceno motionmodelhasbeenassumed,the tracker adaptedwell to thenonstationary
characterof the player’s movementswhich alternatesabruptlybetweenslow andfastaction. In
addition,theintenseblurringpresentin someframesdueto thecameramotion,doesnot influence
the tracker performance(frame8 È � ). Thesameconditions,however, canlargely perturbcontour
basedtrackers.
Thenumberof meanshift iterationsnecessaryfor eachframe(onescale)is shown in Fig-
ure2. Thetwo centralpeakscorrespondto themovementof theplayerto thecenterof theimage
andbackto theleft side.Thelastandlargestpeakis dueto a fastmove from theleft to theright.
In all thesecasestherelative largemovementbetweentwo consecutive framesputsmoreburden
on themeanshift procedure.
To demonstratethe efficiency of our approach,Figure3 presentsthe surfaceobtainedby
computingthe Bhattacharyyacoefficient for the Ò 8¹É Ò 8 pixels rectanglemarked in Figure 1,
frame8�� È . The targetmodel(theelliptical region selectedin frame Ï � ) hasbeencomparedwith
thetargetcandidatesobtainedby sweepingin frame8�� È theelliptical region insidetherectangle.
The surfaceis asymmetricdueto neighboringcolors that aresimilar to the target. While most
of thetrackingapproachesbasedon regions[7, 27, 50] mustperformanexhaustive searchin the
rectangleto find themaximum,our algorithmconvergedin four iterationsasshown in Figure3.
Notethattheoperationalbasinof attractionof themodecoverstheentirerectangularwindow.
15
Figure4: Subway-1sequence.Theframes3140,3516,3697,5440,6081,and6681areshown.
3000 3500 4000 4500 5000 5500 6000 6500 70000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Frame Index
d
Figure5: The minimum valueof distance�
function of the frame index for the Subway-1se-quence.
16
Figure6: Subway-2sequence.Theframes30,240,330,450,510,and600areshown.
In controlledenvironmentswith fixedcamera,additionalgeometricconstraints(suchasthe
expectedscale)andbackgroundsubtraction[24] canbeexploitedto improvethetrackingprocess.
TheSubway-1sequence( Figure4) is suitablefor suchanapproach,however, theresultspresented
herehasbeenprocessedwith the algorithmunchanged.This is a�
minutesequencein which a
personis trackedfrom themomentsheentersthesubwayplatformtill shegetson thetrain (about
3600frames).The trackingis mademorechallengingby the low quality of thesequencedueto
imagecompressionartifacts.Notethechangesin thesizeof thetrackedtarget.
Theminimumvalueof thedistance(6) for eachframe,i.e., thedistancebetweenthetarget
modelandthechosencandidate,is shown in Figure5. Thecompressionnoiseelevatestheresidual
distancevaluefrom�
(perfectmatch)to a about�oN Ï . Significantdeviationsfrom this valuecorre-
spondto occlusionsgeneratedby otherpersonsor rotationsin depthof thetarget(largechangesin
therepresentation).For example,the� � �KN_Í
peakcorrespondsto thepartialocclusionin frame
Ï ÍoÌ Ð . At the endof the sequence,the personbeingtracked getson the train, which producesa
completeocclusion.
TheshorterSubway-2sequence(Figure6) of about600framesis evenmoredifficult since
17
Figure7: Mug sequence.Theframes60,150,240,270,360,and960areshown.
thecameraquality is worseandtheamountof compressionis higher, introducingclearlyvisible
artifacts.Nevertheless,thealgorithmwasstill ableto trackapersonthroughtheentiresequence.
In theMugsequence(Figure7) of about1000framestheimageof acup(frame60)wasused
astargetmodel.Thenormalizationconstantswere� UYV �ËÊoÊK� UWX �JÍoÊo#
. Thetracker wastestedfor
fastmotion (frame150),dramaticchangesin appearance(frame270), rotations(frame270)and
scalechanges(frames360-960).
6 Extensionsof the Tracking Algorithm
We presentthreeextensionsof the basicalgorithm: integrationof the backgroundinformation,
Kalmantracking,andanapplicationto facetracking.It shouldbeemphasized,however, thatthere
aremany otherpossibilitiesthroughwhich thevisualtrackercanbefurtherenhanced.
6.1 Background-WeightedHistogram
The backgroundinformation is importantfor at leasttwo reasons.First, if someof the target
featuresare also presentin the background,their relevancefor the localizationof the target is
diminished. Second,in many applicationsit is difficult to exactly delineatethe target, and its
model might containbackgroundfeaturesas well. At the sametime, the improperuseof the
18
backgroundinformationmayaffect thescaleselectionalgorithm,makingimpossibleto measure
similarity acrossscales,hence,to determinetheappropriatetargetscale.Ourapproachis to derive
asimplerepresentationof thebackgroundfeatures,andto useit for selectingonly thesalientparts
from therepresentationsof thetargetmodelandtargetcandidates.
Let� BÓ E E ��������� F (with
FE ��� BÓ E �Ô8
) bethediscreterepresentation(histogram)of theback-
groundin thefeaturespaceand BÓÕ beits smallestnonzeroentry. This representationis computed
in a region aroundthe target. The extent of the region is applicationdependentandwe usedan
areaequalto threetimesthetargetarea.Theweights
Ö E �Ë×7Ø�Ù BÓ�ÕBÓ E�o8
E ��������� F (17)
aresimilar in conceptto the ratio histogramcomputedfor backprojection[71]. However, in our
casetheseweightsareonly employedto defineatransformationfor therepresentationsof thetarget
modelandcandidates.Thetransformationdiminishestheimportanceof thosefeatureswhichhave
low Ö E , i.e.,areprominentin thebackground.Thenew targetmodelrepresentationis thendefined
by
B>�E �Ëj Ö E][ ��� 4-�ik=� Z[Yk d #il R ` �;� Z[�#<6 h T (18)
with thenormalizationconstantj
expressedas
j¸� 8][ ��� 4-�Lk=� Z[ k d #FE ��� Ö E l R ` ��� Z[ #<6 h T
N(19)
Comparewith (2) and(3). Similarly, thenew targetcandidaterepresentationis
B0YE � @ #O�Jjrq Ö E]on[ ��� 4 @ 6�� [
Ud l R ` ��� [ #<6 h T (20)
wherenowjrq
is givenby
jrqg� 8] n[ ��� 4-�Lkx� �%�o�q k d #FE ��� Ö E l R ` ��� [ #<6 h T
N(21)
An importantbenefitof usingbackground-weightedhistogramsis shown for the Ball se-
quence(Figure8). Themovementof theping-pongball from frameto frameis largerthanits size.
Applying the techniquedescribedabove, the target modelcanbe initialized with a� 8gÉ Ï 8 size
region (frame2), largerthanoneobtainedby accuratetargetdelineation.Thelargerregion yields
19
Figure8: Ball sequence.Theframes2, 12,16,26,32,40,48,and51 areshown.
a satisfactoryoperationalbasinof attraction,while theprobabilitiesof thecolorsthatarepartof
thebackgroundareconsiderablyreduced.Theball is reliably trackedover theentiresequenceofÍo�frames.
Thelastexampleis alsotakenfrom theFootball sequence.This time theheadandshoulder
of playerno. È Ì is tracked (Figure9). Note the changesin target appearancealong the entire
sequenceandtherapidmovementsof thetarget.
6.2 Kalman Prediction
It was alreadymentionedin Section1 that the Kalman filter assumesthat the noisesequences "�and
,-�areGaussianandthe functions
�!�and
'-�arelinear. The dynamicequationbecomes���Ú�¸Û��Ü����� � "����� while themeasurementequationis$��Ú�¸ÝÞ��� � ,)� . Thematrix
Ûis calledthe
systemmatrix andÝ
is themeasurementmatrix. As in thegeneralcase,theKalmanfilter solves
thestateestimationproblemin two steps:predictionandupdate.For moredetailssee[5, p.56].
Thekernel-basedtargetlocalizationmethodwasintegratedwith theKalmanfiltering frame-
work. For a fasterimplementation,two independenttrackersweredefinedfor horizontalandver-
tical movement. A constant-velocity dynamicmodelwith accelerationaffectedby white noise
20
Figure9: Football sequence,trackingplayerno. 59. The frames70, 96, 108,127,140,147areshown.
[5, p.82] hasbeenassumed.The uncertaintyof the measurementshasbeenestimatedaccording
to [55]. The ideais to normalizethe similarity surfaceandrepresentit asa probability density
function.Sincethesimilarity surfaceis smooth,for eachfilter only 3 measurementsaretakeninto
account,oneat theconvergencepoint (peakof thesurface)andtheothertwo atadistanceequalto
half of thetargetdimension,measuredfrom thepeak.We fit a scaledGaussianto thethreepoints
andcomputethemeasurementuncertaintyasthestandarddeviationof thefitted Gaussian.
A first setof trackingresultsincorporatingtheKalmanfilter is presentedin Figure 10 for
the 120 framesHand sequencewherethe dynamicmodel is assumedto be affectedby a noise
with standarddeviation equalto È . Thesizeof thegreencrossmarkedon thetarget indicatesthe
stateuncertaintyfor thetwo trackers.Observe thattheoverall algorithmis ableto trackthetarget
(hand)in thepresenceof completeocclusionby asimilarobject(theotherhand).Thepresenceof a
similarobjectin theneighborhoodincreasesthemeasurementuncertaintyin frame46,determining
an increasein stateuncertainty. In Figure 11a we presentthe measurements(dotted)and the
estimatedlocationof the target (continuous). Note that the graphis symmetricwith respectto
the numberof framessincethe sequencehasbeenplayedforward andbackward. The velocity
associatedwith thetwo trackersis shown in Figure11b.
A secondsetof resultsshowing trackingwith Kalmanfilter is displayedin Figure12. The
sequencehas8 ÒoÐ framesof Ï � �}É � Êo� pixels eachandthe initial normalizationconstantswere
21
Figure10: Handsequence.Theframes40,46,50,and57areshown.
0 20 40 60 80 100 120190
200
210
220
230
240
250
260
270
Frame Index
Mea
sure
men
t and
Est
imat
ed L
ocat
ion
0 20 40 60 80 100 120−6
−4
−2
0
2
4
6
Frame Index
Est
imat
ed V
eloc
ity
(a) (b)
Figure11: Measurementsandestimatedstatefor Hand sequence.(a) The measurementvalue(dottedcurve) andtheestimatedlocation(continuouscurve) functionof the frameindex. Uppercurvescorrespondto the y filter, while the lower curvescorrespondto thex filter. (b) Estimatedvelocity. Dottedcurve is for they filter andcontinuouscurve is for thex filter
22
Figure12: Subway-3sequence:Theframes470,529,580,624,and686areshown.
� UYV � UWX #e���i88\�K8 Ò # . Observe theadaptationof thealgorithmto scalechanges.Theuncertaintyof
thestateis indicatedby thegreencross.
6.3 FaceTracking
Weappliedtheproposedframework for real-timefacetracking.Thefacemodelis anelliptical re-
gionwhosehistogramis representedin theintensitynormalizedß � spacewith8 � Ò É�8 � Ò bins.To
adaptto fastscalechangeswe alsoexploit thegradientinformationin thedirectionperpendicular
to the borderof the hypothesizedfaceregion. The scaleadaptationis thusdeterminedby maxi-
mizing the sumof two normalizedscores,basedon color andgradientfeatures,respectively. In
Figure13wepresentthecapabilityof thekernel-basedtracker to handlescalechanges,thesubject
turningaway (frame150),in-planerotationsof thehead(frame498),andforeground/background
saturationdue to back-lighting(frame 576). The tracked faceis shown in the small upper-left
window.
23
Figure13: Facesequence:Theframes39,150,163,498,576,and619areshown.
7 Discussion
The kernel-basedtracking techniqueintroducedin this paperusesthe basinof attractionof the
similarity function. This functionis smoothsincethetargetrepresentationsarederivedfrom con-
tinuousdensities.Severalexamplesvalidatetheapproachandshow its efficiency. Extensionsof
the basicframework were presentedregardingthe useof backgroundinformation, Kalman fil-
tering, and facetracking. The new techniquecanbe further combinedwith moresophisticated
filtering andassociationapproachessuchasmultiplehypothesistracking[13].
Centroidcomputationhasbeenalsoemployed in [53]. Themeanshift is usedfor tracking
humanfacesby projectingthehistogramof a facemodelontotheincomingframe[10]. However,
thedirectprojectionof themodelhistogramonto thenew framecanintroducea largebiasin the
estimatedlocationof the target, andthe resultingmeasureis scalevariant(see[37, p.262] for a
discussion).
We mentionthatsinceits original publicationin [18] the ideaof kernel-basedtrackinghas
beenexploited anddevelopedforward by variousresearchers.ChenandLiu [14] experimented
with thesamekernel-weightedhistograms,but employedtheKullback-Leiblerdistanceasdissim-
ilarity while performingtheoptimizationbasedon trust-region methods.HaritaogluandFlickner
24
[35] usedanappearancemodelbasedoncolorandedgedensityin conjunctionwith akerneltracker
for monitoringshoppinggroupsin stores.Yilmazetal. [78] combinedkerneltrackingwith global
motion compensationfor forward-lookinginfrared(FLIR) imagery. Xu andFujimura[77] used
night vision for pedestriandetectionandtracking,wherethedetectionis performedby a support
vectormachineandthetrackingis kernel-based.Raoet al.[61] employedkerneltrackingin their
systemfor actionrecognition,while Caenenet al. [12] followed the sameprinciple for texture
analysis.Thebenefitsof guidingrandomparticlesby gradientoptimizationarediscussedin [70]
andaparticlefilter for colorhistogramtrackingbasedon themetric(6) is implementedin [57].
Finally we would like to add a word of caution. The tracking solution presentedin this
paperhasseveraldesirableproperties:it is efficient,modular, hasstraightforwardimplementation,
and provides superiorperformanceon most imagesequences.Nevertheless,we note that this
techniqueshouldnot be appliedblindly. Instead,the versatility of the basicalgorithm should
be exploited to integratea priori information which is almostalways availablewhena specific
applicationis considered.For example,if themotionof the target from frameto frameis known
to be larger thanthe operationalbasinof attraction,oneshouldinitialize the tracker in multiple
locationsin theneighborhoodof basinof attraction,accordingto themotionmodel. If occlusions
arepresent,oneshouldemploy a moresophisticatedmotion filter. Similarly, oneshouldverify
that the chosentarget representationis sufficiently discriminantfor the applicationdomain. The
kernel-basedtrackingtechnique,whencombinedwith prior task-specificinformation,canachieve
reliableperformance.2
APPENDIX
Proof that the distance� � BH � BC #r� 8r6 P � BH � BC # is a metric
Theproof is basedon thepropertiesof theBhattacharyyacoefficient (7). Accordingto the
Jensen’s inequality[19, p.25]wehave
P � BH � BC #<�F
E ��� B0KE B>iE �F
E ��� B0KEB>�EB0YE¯
F
E ��� B>�E�à8\�
(A.1)
2A patentapplicationhasbeenfiled covering the tracking algorithm togetherwith the extensionsand variousapplications[79].
25
with equalityif f BH � BC . Therefore,� � BH � BC #e� 8I6 P � BH � BC # existsfor all discretedistributions BH
and BC , is positive,symmetric,andis equalto zeroif f BH � BC .
ToprovethetriangleinequalityconsiderthreediscretedistributionsdefiningtheA -dimensional
vectors BH , BC , and Bá , associatedwith thepoints â � � � B0 �i��N�N�N�� � B0 F �, â*ã � � B> �L��N\N�N\� � B> F �
,
and â ± � � Bß �L��N\N�N�� � Bß F �on the unit hypersphere.From the geometricinterpretationof the
Bhattacharyyacoefficient, thetriangleinequality
� � BH � Bá # � � � BC � Bá #Iä � � BH � BC # (A.2)
is equivalentto
8r6¬ Ó�å
� â � � â ± # � 8r6¬ Ó�å
� â ã � â ± #Iä 8r6¬ Ó�å
� â � � â ã #æN (A.3)
If we fix the points â � and â ã , andthe anglebetweenâ � and â ± , the left sideof inequality
(A.3) is minimizedwhenthevectorsâ � , â ã , andâ ± lie in thesameplane.Thus,theinequality(A.3)
canbereducedto a�-dimensionalproblemthatcanbeeasilyprovenby employing thehalf-angle
sinusformulaanda few trigonometricmanipulations.
Acknowledgment
A preliminaryversionof thepaperwaspresentedatthe2000IEEEConferenceonComputerVision
and Pattern Recognition, Hilton Head,SC. PeterMeer wassupportedby the National Science
FoundationundertheawardIRI 99-87695.
References
[1] J.Aggarwal andQ.Cai,“Humanmotionanalysis:A review,” ComputerVisionandImageUnderstand-ing, vol. 73,pp.428–440,1999.
[2] F. Aherne,N. Thacker, andP. Rockett, “The Bhattacharyyametricasan absolutesimilarity measurefor frequency codeddata,” Kybernetika, vol. 34,no.4, pp.363–368,1998.
[3] S. Arulampalam,S. Maskell, N. Gordon,andT. Clapp,“A tutorial on particlefilters for on-linenon-linear/non-GaussianBayesiantracking,” IEEE Trans.Signal Process., vol. 50, no. 2, pp. 174–189,2002.
[4] S. Avidan, “Supportvectortracking,” in Proc. IEEE Conf. on ComputerVision andPatternRecogni-tion, Kauai,Hawaii, volumeI, 2001,pp.184–191.
[5] Y. Bar-ShalomandT. Fortmann,TrackingandDataAssociation. AcademicPress,1988.
26
[6] B. Bascleand R. Deriche,“Region tracking throughimagesequences,” in Proc. 5th Intl. Conf. onComputerVision,Cambridge,MA, 1995,pp.302–307.
[7] S. Birchfield, “Elliptical headtrackingusingintensitygradientsandcolor histograms,” in Proc. IEEEConf. on ComputerVisionandPatternRecognition,SantaBarbara,CA, 1998,pp.232–237.
[8] M. BlackandD. Fleet,“Probabilisticdetectionandtrackingof motionboundaries,” Intl. J. of ComputerVision, vol. 38,no.3, pp.231–245,2000.
[9] Y. Boykov andD. Huttenlocher, “Adaptive Bayesianrecognitionin trackingrigid objects,” in Proc.IEEE Conf. onComputerVisionandPatternRecognition,Hilton Head,SC,2000,pp.697–704.
[10] G. R. Bradski,“Computervision facetrackingasacomponentof aperceptualuserinterface,” in Proc.IEEE WorkshoponApplicationsof ComputerVision,Princeton,NJ,October1998,pp.214–219.
[11] A. D. Bue,D. Comaniciu,V. Ramesh,andC. Regazzoni,“Smartcameraswith real-timevideoobjectgeneration,” in Proc. IEEE Intl. Conf. on Image Processing, Rochester, NY, volume III, 2002, pp.429–432.
[12] G. Caenen,V. Ferrari,A. Zalesny, andL. VanGool,“Analyzing the layoutof compositetextures,” inProceedingsTexture 2002Workshop,Copenhagen,Denmark,2002,pp.15–19.
[13] T. ChamandJ. Rehg,“A multiple hypothesisapproachto figure tracking,” in Proc. IEEE Conf. onComputerVision andPatternRecognition,Fort Collins,CO,volumeII, 1999,pp.239–219.
[14] H. ChenandT. Liu, “Trust-regionmethodsfor real-timetracking,” in Proc.8thIntl. Conf. onComputerVision,Vancouver, Canada,volumeII, 2001,pp.717–722.
[15] Y. Chen,Y. Rui, andT. Huang,“JPDAF-basedHMM for real-timecontourtracking,” in Proc. IEEEConf. on ComputerVisionandPatternRecognition,Kauai,Hawaii, volumeI, 2001,pp.543–550.
[16] R. Collins, A. Lipton, H. Fujiyoshi,andT. Kanade,“Algorithmsfor cooperative multisensorsurveil-lance,” Proceedingsof theIEEE, vol. 89,no.10,pp.1456–1477,2001.
[17] D. Comaniciuand P. Meer, “Mean shift: A robust approachtoward featurespaceanalysis,” IEEETrans.PatternAnal.MachineIntell., vol. 24,no.5, pp.603–619,2002.
[18] D. Comaniciu,V. Ramesh,andP. Meer, “Real-timetrackingof non-rigid objectsusingmeanshift,”in Proc. IEEE Conf. on ComputerVision andPatternRecognition, Hilton Head,SC,volumeII, June2000,pp.142–149.
[19] T. CoverandJ.Thomas,Elementsof InformationTheory. JohnWiley & Sons,New York, 1991.
[20] I. CoxandS.Hingorani,“An efficient implementationof Reid’smultiplehypothesistrackingalgorithmanditsevaluationfor thepurposeof visualtracking,” IEEETrans.PatternAnal.MachineIntell., vol. 18,no.2, pp.138–150,1996.
[21] D. DeCarloandD. Metaxas,“Optical flow constraintsondeformablemodelswith applicationsto facetracking,” Intl. J. of ComputerVision, vol. 38,no.2, pp.99–127,2000.
[22] A. Djouadi, O. Snorrason,and F. Garber, “The quality of training-sampleestimatesof the Bhat-tacharyyacoefficient,” IEEE Trans.PatternAnal.MachineIntell., vol. 12,pp.92–97,1990.
[23] A. Doucet,S. Godsill, andC. Andrieu, “On sequentialMonteCarlosamplingmethodsfor Bayesianfiltering,” StatisticsandComputing, vol. 10,no.3, pp.197–208,2000.
[24] A. Elgammal,D. Harwood, and L. Davis, “Non-parametricmodel for backgroundsubtraction,” inProc.EuropeanConf. on ComputerVision,Dublin, Ireland,volumeII, June2000,pp.751–767.
[25] F. EnnesserandG. Medioni, “Finding Waldo, or focusof attentionusing local color information,”IEEE Trans.PatternAnal.MachineIntell., vol. 17,no.8, pp.805–809,1995.
27
[26] V. Ferrari,T. Tuytelaars,andL. V. Gool, “Real-timeaffine region trackingandcoplanargrouping,” inProc. IEEE Conf. on ComputerVision andPatternRecognition, Kauai,Hawaii, volumeII, 2001,pp.226–233.
[27] P. FieguthandD. Teropoulos,“Color-basedtrackingof headsandothermobileobjectsat videoframerates,” in Proc.IEEEConf. onComputerVisionandPatternRecognition,SanJuan,PuertoRico,1997,pp.21–27.
[28] K. Fukunaga,Introductionto StatisticalPatternRecognition. AcademicPress,secondedition,1990.
[29] J. Garcia,J. Valdivia, and X. Vidal, “Information theoreticmeasurefor visual target distinctness,”IEEE Trans.PatternAnal.MachineIntell., vol. 23,no.4, pp.362–383,2001.
[30] D. Gavrila, “The visualanalysisof humanmovement:A survey,” ComputerVisionandImage Under-standing, vol. 73,pp.82–98,1999.
[31] N. Gordon,D. Salmond,andA. Smith, “A novel approachto non-linearandnon-GaussianBayesianstateestimation,” IEE Proceedings-F, vol. 140,pp.107–113,1993.
[32] M. Greiffenhagen,D. Comaniciu,H. Niemann,andV. Ramesh,“Design,analysisandengineeringofvideomonitoringsystems:An approachanda casestudy,” Proceedingsof the IEEE, vol. 89, no. 10,pp.1498–1517,2001.
[33] G. Hagerand P. Belhumeur, “Real-time tracking of imageregions with changesin geometryandillumination,” in Proc. IEEE Conf. on ComputerVision andPatternRecognition, SanFrancisco,CA,1996,pp.403–410.
[34] U. Handmann,T. Kalinke, C. Tzomakas,M. Werner, andW. von Seelen,“Computervision for driverassistancesystems,” in ProceedingsSPIE, volume3364,1998,pp.136–147.
[35] I. HaritaogluandM. Flickner, “Detectionandtrackingof shoppinggroupsin stores,” in Proc. IEEEConf. on ComputerVisionandPatternRecognition,Kauai,Hawaii, 2001.
[36] I. Haritaoglu,D. Harwood,andL. Davis, “W4: Who? When?Where?What? - A real time systemfor detectingandtrackingpeople,” in Proc.of Intl. Conf. on AutomaticFaceandGesture Recognition,Nara,Japan,1998,pp.222–227.
[37] J. Huang,S. Kumar, M. Mitra, W. Zhu, andR. Zabih,“Spatialcolor indexing andapplications,” Intl.J. of ComputerVision, vol. 35,no.3, pp.245–268,1999.
[38] C. Hue,J.Cadre,andP. Perez,“SequentialMonteCarlofiltering for multiple target trackinganddatafusion,” IEEETrans.SignalProcess., vol. 50,no.2, pp.309–325,2002.
[39] S. Intille, J. Davis, andA. Bobick, “Real-timeclosed-world tracking,” in Proc. IEEE Conf. on Com-puterVisionandPatternRecognition,SanJuan,PuertoRico,1997,pp.697–703.
[40] M. IsardandA. Blake, “Condensation- Conditionaldensitypropagationfor visual tracking,” Intl. J.of ComputerVision, vol. 29,no.1, 1998.
[41] A. Jepson,D. Fleet,andT. El-Maraghi, “Robust online appearancemodelsfor visual tracking,” inProc. IEEEConf. onComputerVisionandPatternRecognition,Hawaii, volumeI, 2001,pp.415–422.
[42] S.JulierandJ.Uhlmann,“A new extensionof theKalmanfilter to nonlinearsystems,” in Proc.SPIE,volume3068,1997,pp.182–193.
[43] T. Kailath, “The divergenceandBhattacharyyadistancemeasuresin signalselection,” IEEE Trans.Commun.Tech., vol. 15,pp.52–60,1967.
[44] V. Kettnaker andR. Zabih,“Bayesianmulti-camerasurveillance,” in Proc. IEEE Conf. on ComputerVisionandPatternRecognition,Fort Collins,CO,1999,pp.253–259.
28
[45] G. Kitagawa, “Non-Gaussianstate-spacemodelingof nonstationarytime series,” J. of Amer. Stat.Assoc., vol. 82,pp.1032–1063,1987.
[46] S.Konishi,A. Yuille, J.Coughlan,andS.Zhu, “Fundamentalboundson edgedetection:An informa-tion theoreticevaluationof differentedgecues,” in Proc. IEEE Conf. on ComputerVision andPatternRecognition,Fort Collins,1999,pp.573–579.
[47] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale, and S. Shafer, “Multi-cameramulti-persontrackingfor EasyLiving,” in Proc. IEEE Intl. Workshopon Visual Surveillance, Dublin, Ireland,2000,pp.3–10.
[48] B. Li andR. Chellappa,“Simultaneoustrackingandverificationvia sequentialposteriorestimation,”in Proc. IEEE Conf. on ComputerVision andPatternRecognition, Hilton Head,SC,volumeII, 2000,pp.110–117.
[49] J.Lin, “Di vergencemeasuresbasedontheShannonentropy,” IEEETrans.InformationTheory, vol. 37,pp.145–151,1991.
[50] A. Lipton, H. Fujiyoshi,andR. Patil, “Moving targetclassificationandtrackingfrom real-timevideo,”in IEEEWorkshopon Applicationsof ComputerVision,Princeton,NJ,1998,pp.8–14.
[51] J.MacCormickandA. Blake, “A probabilisticexclusionprinciplefor trackingmultiple objects,” Intl.J. of ComputerVision, vol. 39,no.1, pp.57–71,2000.
[52] R. Mahler, “Engineeringstatisticsfor multi-object tracking,” in Proc. IEEE WorkshopMulti-ObjectTracking, 2001.
[53] S. McKenna,Y. Raja,andS. Gong,“Trackingcolourobjectsusingadaptive mixturemodels,” ImageandVision ComputingJournal, vol. 17,pp.223–229,1999.
[54] R. Merwe, A. Doucet, N. Freitas,and E. Wan, “The unscentedparticle filter,” TechnicalReportCUED/F-INFENG/TR380,CambridgeUniversityEngineeringDepartment,2000.
[55] K. Nickels and S. Hutchinson,“Estimatinguncertaintyin SSD-basedfeaturetracking,” Image andVisionComputing, vol. 20,pp.47–58,2002.
[56] C. Olson, “Image registrationby aligning entropies,” in Proc. IEEE Conf. on ComputerVision andPatternRecognition,Kauai,Hawaii, volumeII, 2001,pp.331–336.
[57] P. Perez,C.Hue,J.Vermaak,andM. Gangnet,“Color-basedprobabilistictracking,” in Proc.EuropeanConf. on ComputerVision,Copenhagen,Denmark,volumeI, 2002,pp.661–675.
[58] W. Press,S.Teukolsky, W. Vetterling,andB. Flannery, NumericalRecipesin C. CambridgeUniversityPress,secondedition,1992.
[59] J.Puzicha,Y. Rubner, C.Tomasi,andJ.Buhmann,“Empirical evaluationof dissimilaritymeasuresforcolorandtexture,” in Proc.7th Intl. Conf. onComputerVision,Kerkyra,Greece,1999,pp.1165–1173.
[60] L. Rabiner, “A tutorial on hiddenMarkov modelsandselectedapplicationsin speechrecognition,”Proceedingsof theIEEE, vol. 77,no.2, pp.257–285,1989.
[61] C. Rao,A. Yilmaz,andM. Shah,“View-invariantrepresentationandrecognitionof actions,” Intl. J. ofComputerVision, vol. 50,no.2, p. To appear, 2003.
[62] C. RasmussenandG. Hager, “Probabilisticdataassociationmethodsfor trackingcomplex visualob-jects,” IEEE Trans.PatternAnal.MachineIntell., vol. 23,no.6, pp.560–576,2001.
[63] D. Reid,“An algorithmfor trackingmultiple targets,” IEEETrans.AutomaticControl, vol. AC-24,pp.843–854,1979.
29
[64] A. Roche,G.Malandain,andN. Ayache,“Unifying maximumlikelihoodapproachesin medicalimageregistration,” TechnicalReport3741,INRIA, 1999.
[65] R. RosalesandS.Sclaroff, “3D trajectoryrecovery for trackingmultipleobjectsandtrajectoryguidedrecognitionof actions,” in Proc.IEEEConf. onComputerVisionandPatternRecognition,Fort Collins,CO,1999,pp.117–123.
[66] Y. Rui andY. Chen,“Better proposaldistributions: Objecttrackingusingunscentedparticlefilter,” inProc. IEEE Conf. on ComputerVision andPatternRecognition, Kauai,Hawaii, volumeII, 2001,pp.786–793.
[67] S.Sclaroff andJ. Isidoro,“Active blobs,” in Proc.6th Intl. Conf. on ComputerVision,Bombay, India,1998,pp.1146–1153.
[68] D. W. Scott,MultivariateDensityEstimation. Wiley, 1992.
[69] C. SminchisescuandB. Triggs, “Covariancescaledsamplingfor monocular3D body tracking,” inProc. IEEE Conf. on ComputerVision and Pattern Recognition, Kauai,Hawaii, volumeI, 2001,pp.447–454.
[70] J.SullivanandJ.Rittscher, “Guiding randomparticlesby deterministicsearch,” in Proc.8thIntl. Conf.on ComputerVision,Vancouver, Canada,volumeI, 2001,pp.323–330.
[71] M. SwainandD. Ballard,“Color indexing,” Intl. J. of ComputerVision, vol. 7, no.1, pp.11–32,1991.
[72] P. Viola and W. Wells, “Alignment by maximizationof mutual information,” Intl. J. of ComputerVision, vol. 24,no.2, pp.137–154,1997.
[73] S. WachterandH. Nagel,“Trackingpersonsin monocularimagesequences,” ComputerVision andImage Understanding, vol. 74,no.3, pp.174–192,1999.
[74] R. Wildes, R. Kumar, H. Sawhney, S. Samasekera, S. Hsu, H. Tao, Y. Guo, K. Hanna,A. Pope,D. Hirvonen,M. Hansen,andP. Burt, “Aerial videosurveillanceandexploitation,” Proceedingsof theIEEE, vol. 89,no.10,pp.1518–1539,2001.
[75] C. Wren, A. Azarbayejani,T. Darrell, andA. Pentland,“Pfinder: Real-timetrackingof the humanbody,” IEEE Trans.PatternAnal.MachineIntell., vol. 19,pp.780–785,1997.
[76] Y. Wu andT. Huang,“A co-inferenceapproachto robusttracking,” in Proc.8thIntl. Conf. onComputerVision,Vancouver, Canada,volumeII, 2001,pp.26–33.
[77] F. Xu andK. Fujimura,“Pedestriandetectionandtrackingwith nightvision,” in Proc.IEEEIntelligentVehicleSymposium,Versailles,France,2002.
[78] A. Yilmaz, K. Shafique,N. Lobo, X. Li, T. Olson,andM. Shah,“Target trackingin FLIR imageryusingmeanshift andglobal motion compensation,” in IEEE Workshopon ComputerVision BeyondVisible Spectrum,Kauai,Hawaii, 2001.
[79] “Real-timetrackingof non-rigidobjectsusingmeanshift.” US patentpending,2000.
30