Research Article: Automatic Person Identification in Camera Video by Motion Correlation
Dingbo Duan,1 Guangyu Gao,2 Chi Harold Liu,2 and Jian Ma1
1 Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Beijing Institute of Technology, Beijing 100081, China
Correspondence should be addressed to Chi Harold Liu; liuchi02@gmail.com
Received 23 February 2014; Revised 13 May 2014; Accepted 13 May 2014; Published 3 June 2014
Academic Editor: Eugenio Martinelli
Copyright © 2014 Dingbo Duan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Person identification plays an important role in semantic analysis of video content. This paper presents a novel method to automatically label persons in video sequences captured by a fixed camera. Instead of leveraging traditional face recognition approaches, we deal with the task of person identification by fusing motion information from sensor platforms, such as smart phones carried on human bodies, with motion information extracted from camera video. More specifically, a sequence of motion features extracted from camera video is compared with each of those collected from the accelerometers of smart phones. When strong correlation is detected, identity information transmitted from the corresponding smart phone is used to identify the phone wearer. To test the feasibility and efficiency of the proposed method, extensive experiments are conducted, which achieved impressive performance.
1 Introduction
With the rapid growth in storage devices, networks, and compression techniques, large-scale video data have become available to more and more ordinary users. Thus, it also becomes a challenging task to search and browse desirable data according to content in large video datasets. Generally, person information is one of the most important semantic clues when people are recalling video contents. Consequently, person identification is crucial for content-based video summary and retrieval.
The main purpose of person identification is to associate each subject that appears in video clips with a real person. However, manually labeling all subjects that appear in a large-scale video archive is labor-intensive, time-consuming, and prohibitively expensive. To deal with this, automatic face detection [1–3] and face recognition (FR) [4–7] were introduced. However, traditional FR methods are still far from supporting practical and reliable automatic person identification, even when just a limited number of people appear in the video. This is mainly because only appearance information (e.g., color, shape, and texture) of a single face image is used to determine the identity of a subject. Specifically, variation in illumination, pose, and facial expression, as well as partial or total face occlusion, can make recognition an extremely difficult task.
The main contributions of the proposed method are as follows. First, this method provides an alternative route to automatic person identification through the integration of a new sensing model. This integration broadens the domain of semantic analysis of video content and will be catalyzed by the growing popularity of wearable devices and concurrent advances in personal sensing technology and ubiquitous computing. Second, the method is fully automatic, with no need for a predefined model or user interaction during the person identification process. Moreover, its independence from any recognition technique makes the proposed method more robust against the issues mentioned above, which degrade the efficiency and accuracy of FR techniques. Last but not least, the simplicity and computational efficiency of the method make it possible to plug it into real-time systems.
2 Related Work
To improve the performance of person identification con-textual information was utilized in recent research Authors
Hindawi Publishing CorporationJournal of SensorsVolume 2014 Article ID 838751 8 pageshttpdxdoiorg1011552014838751
2 Journal of Sensors
in [8] proposed a framework exploiting heterogeneouscontextual information including clothing activity humanattributes gait and people cooccurrence together with facialfeatures to recognize a person in low quality video dataNevertheless it suffers the difficulty in discerning multiplepersons resembling each other in clothing color or actionView angle and subject-to-camera distance were integratedto identify person in video by fusion of gait and face in[9] only in situations when people walk along a straightpath with five quantized angles Temporal spatial and socialcontext information was also employed in conjunction withlow level feature analysis to annotate person in personal andfamily photo collections [10ndash14] in which only static imagesare dealt with Moreover in all these methods a predefinedmodel has to be trained to start the identification process andthe performance is limited by the quality and scale of trainingsets
In contrast to the above efforts, we propose a novel method to automatically identify persons in video using human motion patterns. We argue that, in the field of view (FOV) of a fixed camera, the motion pattern of each human body is unique. Under this assumption, in addition to visual analysis, we also analyze the motion pattern of the human body measured by sensor modules in smart phones. In this paper, we use smart phones equipped with 3-axis accelerometers, carried on human bodies, to collect and transmit acceleration information and identity information. By analyzing the correlation between motion features extracted from these two different types of sensing, the problem of person identification is handled simply and accurately.
The remainder of the paper is organized as follows. Section 3 details the proposed method. In Section 4, experiments are conducted and results are discussed. Concluding remarks are placed in Section 5.
3 General Framework
A flowchart of the proposed method is depicted in Figure 1. As can be seen, visual features of the human body are first extracted to track people across different video frames. Then, optical flows of potential human bodies are estimated and segmented using the previously obtained body features. Meanwhile, accelerometer measurements from smart phones on human bodies are transmitted and collected together with identity information. Motion features are calculated from both optical flow and acceleration measurements in a sliding-window style, as described in Section 3.3. When people disappear from the video sequence, correlation analysis starts the annotation process. Details of the method are illustrated in the following subsections.
3.1. Camera Data Acquisition. First of all, background subtraction (BGS), which is widely adopted for moving object detection in video, is utilized in our method. The main idea of BGS is to detect moving objects from the difference between the current frame and a reference frame, often called the "background image" or "background model" [15]. In this subsection, we need to detect image patches corresponding to potential human bodies moving around in the camera FOV. To this end, an algorithm of adaptive Gaussian mixture model [16, 17] is employed to segment foreground patches. This algorithm represents each pixel by a mixture of Gaussians to build a robust background model at run time.

Figure 1: Flowchart of the proposed method (raw frame → visual feature extraction, optical flow estimation, and motion feature calculation, fused with acceleration data for person identification).
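As a rough illustration of the idea behind BGS, the sketch below maintains a single running Gaussian per pixel rather than the full adaptive mixture of [16, 17]; a pixel far from its background model is classified as foreground. The class name, parameters, and thresholds are illustrative, not from the paper.

```python
import numpy as np

class RunningGaussianBackground:
    """Simplified per-pixel background model: one running Gaussian per
    pixel (a hedged sketch of the mixture-of-Gaussians idea -- pixels
    far from the background model are marked as foreground)."""

    def __init__(self, alpha=0.05, k=2.5):
        self.alpha, self.k = alpha, k  # learning rate, deviation threshold
        self.mean = None
        self.var = None

    def apply(self, frame):
        """Return a boolean foreground mask for a grayscale frame."""
        f = frame.astype(np.float64)
        if self.mean is None:                    # first frame initializes the model
            self.mean = f.copy()
            self.var = np.full_like(f, 25.0)
            return np.zeros(frame.shape, dtype=bool)
        d = f - self.mean
        fg = np.abs(d) > self.k * np.sqrt(self.var)
        upd = ~fg                                # update only background pixels
        self.mean[upd] += self.alpha * d[upd]
        self.var[upd] += self.alpha * (d[upd] ** 2 - self.var[upd])
        return fg
```

Connected foreground regions of this mask would then be grouped into candidate patches; the paper's actual implementation uses the OpenCV mixture-model subtractors.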
When people enter the camera FOV, image patches corresponding to potential human bodies are extracted and tracked by descriptors composed of patch ID, color histogram, and patch mass center, as in Algorithm 1. Moreover, we also include the frame index of the first and last appearance of each patch in the descriptor, in order to facilitate person annotation.
For a patch $p$ obtained from BGS, we try to associate $p$ with previous patch descriptors. Histogram similarity between patches from consecutive frames is analyzed first. Normally, image patches corresponding to the same subject are more similar to each other than those of different subjects. The comparison of color histograms of patches used in Algorithm 1 is defined in (1). The range of $s(H_a, H_b)$ is $[-1, 1]$; the larger $s(H_a, H_b)$, the more similar patches $a$ and $b$. Then, from the set of similar descriptors of $p$, the nearest one in terms of horizontal movement of the patch center is selected to track $p$:

$$s(H_a, H_b) = \frac{\sum_{i=1}^{N} \left(H_a(i) - \bar{H}_a\right)\left(H_b(i) - \bar{H}_b\right)}{\sqrt{\sum_{i=1}^{N} \left(H_a(i) - \bar{H}_a\right)^2 \sum_{i=1}^{N} \left(H_b(i) - \bar{H}_b\right)^2}}, \quad (1)$$

$$\bar{H} = \frac{1}{N} \sum_{i=1}^{N} H(i), \quad (2)$$

where $N$ is the number of bins in histogram $H$.

For each patch $p$, we employ an optical flow method [18] to estimate the motion pattern and approximate the patch acceleration as the mean of the vertical acceleration of the keypoints within it, as defined in

$$y\_\mathrm{acc}_p = \frac{1}{M} \sum_{i=1}^{M} y\_\mathrm{acc}_i, \quad (3)$$

where $y\_\mathrm{acc}_i$ is the second-order derivative of the $y$ coordinate of keypoint $i$ with respect to time, and $M$ is the total number of keypoints within patch $p$.
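Equation (1) is the Pearson correlation of the two histograms; a minimal sketch (function name is illustrative):

```python
import numpy as np

def hist_similarity(ha, hb):
    """Histogram similarity s(H_a, H_b) of Eq. (1): the Pearson
    correlation of two color histograms, with range [-1, 1]."""
    ha, hb = np.asarray(ha, float), np.asarray(hb, float)
    da, db = ha - ha.mean(), hb - hb.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))
```

Identical (or linearly scaled) histograms score 1.0; a ramp and its reverse score -1.0, matching the stated range of $s(H_a, H_b)$.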
Variables:
pDesc: patch descriptor, pDesc = {id, frameStart, frameEnd, center, hist}
pDescs: an array of patch descriptors
pCount: patch descriptor counter, initialized to zero
frameIdx: frame counter, initialized to zero
id: the ID of a patch
frameStart, frameEnd: frame index of first and last appearance of a patch
center, hist, accs: center, color histogram, and acceleration series of a patch
s_thr, d_thr, a_thr: thresholds for histogram similarity, patch distance, and patch area; a_thr ≥ 0, d_thr ≥ 0, 0 ≤ s_thr ≤ 1

Procedure:
(1) Grab a video frame; frameIdx = frameIdx + 1
(2) Optical flow estimation
(3) Background subtraction
(4) for each patch in current frame do
(5)   Calculate pArea, pCenter, pHist, pWidth
(6)   if pArea < a_thr then
(7)     continue
(8)   end if
(9)   pDescs* = ∅
(10)  for all pDesc ∈ pDescs do
(11)    if pDesc.frameEnd + 1 == frameIdx and s(pHist, pDesc.hist) ≥ s_thr then
(12)      pDescs* = {pDesc} ∪ pDescs*
(13)    end if
(14)  end for
(15)  d_min = d_thr · pWidth; pDesc_min = null
(16)  for all pDesc ∈ pDescs* do
(17)    d_p = |pCenter.x − pDesc.center.x|
(18)    if d_p < d_min then
(19)      d_min = d_p; pDesc_min = pDesc
(20)    end if
(21)  end for
(22)  if pDesc_min is null then
(23)    pDesc_min = {pCount, frameIdx, frameIdx, pCenter, pHist};
        pDescs = pDescs ∪ {pDesc_min};
        pCount = pCount + 1
(24)  else
(25)    pDesc_min.frameEnd = frameIdx;
        pDesc_min.center = pCenter;
        pDesc_min.hist = pHist
(26)  end if
(27)  Calculate and save vertical acceleration for pDesc_min
(28) end for

Algorithm 1: Patch tracking and motion estimation.
Pseudocode of patch tracking and motion estimation islisted in Algorithm 1
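The matching loop of Algorithm 1 (steps (9)–(26)) can be sketched as follows. This is an illustrative rendering: descriptor fields are dict keys chosen for readability, and `np.corrcoef` stands in for the histogram similarity of Eq. (1).

```python
import numpy as np

def associate_patch(patch, descs, frame_idx, s_thr=0.2, d_thr=5.0):
    """One iteration of the matching loop of Algorithm 1 (sketch).
    `patch` is a dict with 'center' (x, y), 'hist', and 'width';
    `descs` is the list of existing descriptors, mutated in place."""
    best, d_min = None, d_thr * patch["width"]
    for desc in descs:
        # candidate set pDescs*: active in the previous frame, similar histogram
        if desc["frame_end"] + 1 != frame_idx:
            continue
        if np.corrcoef(patch["hist"], desc["hist"])[0, 1] < s_thr:
            continue
        d = abs(patch["center"][0] - desc["center"][0])  # horizontal distance
        if d < d_min:
            best, d_min = desc, d
    if best is None:   # no match within d_thr * pWidth: create a new descriptor
        best = {"id": len(descs), "frame_start": frame_idx,
                "frame_end": frame_idx, "center": patch["center"],
                "hist": patch["hist"]}
        descs.append(best)
    else:              # match: extend the tracked descriptor
        best["frame_end"] = frame_idx
        best["center"] = patch["center"]
        best["hist"] = patch["hist"]
    return best
```

A nearby, similar patch in the next frame extends an existing descriptor; a distant one spawns a new descriptor, exactly the branch structure of steps (22)–(26).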
3.2. Accelerometer Measurements Collection. In this subsection, we depict the procedure of acceleration measurement collection using wearable sensors. Android smart phones equipped with 3-axis accelerometers are utilized as sensing platforms. Of the three component accelerometer readings, only the one with the largest absolute mean value is analyzed in our experiments, as it best reflects the vertical motion pattern of the human body. Three different placements are tested and compared in order to assess the impact of phone placement on the accuracy of motion collection. In each test, a participant performs a set of activities randomly, including standing, walking, and jumping, while carrying three smart phones on the body: two phones placed in the chest pocket and jacket side pocket, respectively, and one attached to the waist belt, as shown in Figure 2. Results illustrated in Figure 3 qualitatively show that all three types of placement can correctly capture the vertical motion feature of the participant, with minor, acceptable discrepancy. This test makes the choice of phone attachment more flexible and unobtrusive.
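The axis-selection rule above (largest absolute mean component, since gravity dominates the vertical axis) can be sketched as (function name is illustrative):

```python
import numpy as np

def vertical_axis(samples):
    """Pick the accelerometer component to analyze: the axis with the
    largest absolute mean reading, which (gravity included) best
    reflects the body's vertical motion.
    `samples` is an (n, 3) array-like of x/y/z accelerometer readings."""
    samples = np.asarray(samples, float)
    axis = int(np.argmax(np.abs(samples.mean(axis=0))))
    return axis, samples[:, axis]
```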
3.3. Feature Extraction and Person Identification. Noisy raw motion measurements with different sample frequencies, previously obtained from different sensor sources, cannot be
Figure 2: Attachment of smart phones to the human body. From left to right: jacket side pocket, chest pocket, and belt attachment.
Figure 3: Acceleration measurements from the three ways of phone attachment (belt, chest, jacket) during jumping, walking, and standing.
compared directly. Instead, standard deviation and energy [19, 20] are employed as motion features for comparison, after noise suppression and data cleansing. Energy is defined as the sum of squared discrete FFT component magnitudes of the data samples, divided by the sample count for normalization. These features are computed in a sliding window of length $t_w$, with $t_w/2$ overlap between consecutive windows. Feature extraction on sliding windows with 50 percent overlap has demonstrated its success in [21].

To find out whether $p$ represents a human body, correlation analysis is conducted. As a matter of fact, motion features extracted from video frames are supposed to be positively linearly related to those from accelerometer measurements of the same subject. We adopt the correlation coefficient to reliably measure the strength of this linear relationship, as defined in

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}, \quad (4)$$

where $X$ and $Y$ are the motion features to be compared, $\operatorname{cov}(X, Y)$ is their covariance, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$. $\rho$ ranges from $-1$ to $1$ inclusive, where $0$ indicates no linear relationship, $+1$ a perfect positive linear relationship, and $-1$ a perfect negative linear relationship. The larger $\rho(X, Y)$, the more correlated $X$ and $Y$. In our case, motion features of $p$ are compared with each of those extracted from smart phones in the same period of time. Identity information of the smart phone corresponding to the largest positive correlation coefficient is used to identify $p$.
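The pipeline of Section 3.3 — windowed standard-deviation/energy features, then correlation-based matching per Eq. (4) — can be sketched as follows. Function names are illustrative; for brevity the matcher correlates only the standard-deviation sequences, whereas the paper compares both feature types.

```python
import numpy as np

def motion_features(signal, win):
    """Standard deviation and FFT energy per sliding window of length
    `win`, with 50% (win/2) overlap between consecutive windows."""
    feats = []
    for start in range(0, len(signal) - win + 1, win // 2):
        w = np.asarray(signal[start:start + win], float)
        std = w.std()
        # energy: sum of squared FFT magnitudes, normalized by sample count
        energy = (np.abs(np.fft.fft(w)) ** 2).sum() / win
        feats.append((std, energy))
    return np.array(feats)

def identify(patch_feats, phone_feats_by_id):
    """Label the patch with the phone whose feature sequence has the
    largest positive correlation coefficient, Eq. (4). Here only the
    standard-deviation column (index 0) is correlated, for brevity."""
    best_id, best_rho = None, 0.0
    for pid, feats in phone_feats_by_id.items():
        n = min(len(patch_feats), len(feats))
        rho = np.corrcoef(patch_feats[:n, 0], feats[:n, 0])[0, 1]
        if rho > best_rho:
            best_id, best_rho = pid, rho
    return best_id, best_rho
```

A patch whose camera-derived motion tracks one phone's accelerometer signal (up to scale) correlates near +1 with that phone's features and is labeled with its identity.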
4 Experiments and Discussions
In this section, we conduct detailed experiments in various situations to optimize Algorithm 1 and evaluate the proposed person identification algorithm. We use a digital camera and two Android smart phones for data collection. A simple GUI application is created to start and stop data collection on the phones. Acceleration measurements are recorded and saved in text files on the phone SD card and later accessed via USB. Video clips are recorded as mp4 files at a resolution of 640 × 480, 15 frames per second. The timestamps of video frames and accelerometer readings are synchronized before the experiment. Algorithm 1 is implemented based on the OpenCV library and tested on an Intel 3.4 GHz platform running Ubuntu 13.04. We recruit two participants, labeled A and B, respectively, to take part in our experiments, and place the smart phones in their jacket side pockets. We choose four different scenarios for our experiments: outdoor near field, outdoor far field, indoor near field, and indoor far field, as illustrated in Figure 8. In near field situations, the subjects moved around within a scope about five meters from the camera; the silhouette height of the human body is not less than half of the image height, and the human face can be clearly distinguished. In far field situations, the subjects moved around about twenty meters away, where detailed visual features of the human body are mostly lost and body height in the image is not more than thirty pixels. In each scenario, we repeated the experiment four times, each lasting about five minutes. In all, we collected sixteen video clips and thirty-two text files of acceleration measurements.
4.1. Tracking Optimization. Patch tracking is an essential step for motion estimation from camera video and directly affects the accuracy and robustness of subsequent person identification. As listed in Algorithm 1, the aim of patch tracking is to estimate motion measurements for each patch that appears in video frames. In the ideal case, a subject is continuously tracked in camera video by only one descriptor during the whole experiment, and we can extract a sequence of acceleration measurements closest, in time duration, to that collected from the smart phone. In the worst case, we have to create new descriptors for all patches in each frame, and the number of descriptors used for tracking a subject is as large as the number of frames in which the subject appears. We present a metric in (5) to measure the performance of Algorithm 1. The metric $L(i)$ is defined as the ratio between the number of subjects in a video clip and the number of descriptors used for tracking the subjects. The range of $L(i)$ is $(0, 1]$; the larger $L(i)$, the better the tracking performance. Moreover, we also provide a metric to evaluate tracking accuracy, as shown in (6). An accurate descriptor is one that tracks only one subject during its lifetime. The larger $K(i)$, the more accurate Algorithm 1:

$$L(i) = \frac{\text{subjects in video } i}{\text{descriptors in } i}, \quad (5)$$

$$K(i) = \frac{\text{accurate descriptors in video } i}{\text{descriptors in } i}. \quad (6)$$
Figure 4: Tracking performance $L$ and accuracy $K$ in the four scenarios with different values of $d_{\mathrm{thr}}$ (0.01 to 5) and $s_{\mathrm{thr}}$ (0.1 to 1): (a) outdoor near field, (b) indoor near field, (c) outdoor far field, (d) indoor far field.
As depicted in Algorithm 1, three parameters, $a_{\mathrm{thr}}$, $d_{\mathrm{thr}}$, and $s_{\mathrm{thr}}$, affect $L$ and $K$. $a_{\mathrm{thr}}$ indicates the minimum area of a patch that potentially represents a subject; patches with an area less than $a_{\mathrm{thr}}$ are filtered out. Generally, in a specific application scenario, the value of $a_{\mathrm{thr}}$ can be determined empirically; in our experiments, we set it to 150, which works fine. $s_{\mathrm{thr}}$ specifies the minimum histogram similarity between the current patch $p$ and potential descriptors of $p$; each active descriptor that satisfies this requirement is tested in terms of horizontal distance to $p$. $d_{\mathrm{thr}}$ stipulates a distance threshold to rule out inappropriate alternative descriptors; the nearest descriptor satisfying this threshold is selected to track $p$, if it exists. Otherwise, we create a new descriptor for $p$. Moreover, many interference factors in the scenario, including poor lighting conditions, clothing color similar to the background, incidental shadows of the human body, and unpredictable motion patterns of subjects such as fast turning and crossing, can also negatively affect the patch tracking process. To rule out the impact of these factors and optimize patch tracking, we select a representative video clip from each of the four scenarios and run Algorithm 1 over the video with different $s_{\mathrm{thr}}$ and $d_{\mathrm{thr}}$. The resulting $L$ and $K$ are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are listed in Figure 8.

Due to different motion patterns of the subjects, $L$ may vary among video clips of different scenarios. However, from Figure 4 we can conclude that $L$ drops dramatically when $s_{\mathrm{thr}} > 0.8$ in near field scenarios and $s_{\mathrm{thr}} > 0.2$ in far field scenarios, with $d_{\mathrm{thr}} \geq 0.1$. This is mainly caused by background subtraction noise. The histogram similarity of patches of the same subject from two consecutive frames is about 0.8 in near field in this situation. In far field scenarios, with relatively smaller foreground patches, the negative impacts become more severe, and the similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the histogram similarity falls below these thresholds. When $d_{\mathrm{thr}} < 0.1$, the worst case occurs: we need to create new descriptors for patches in every frame, as the horizontal distance between patches of the same subject from two consecutive frames is mostly beyond this limit. As $d_{\mathrm{thr}}$ increases, $L$ increases and converges at $d_{\mathrm{thr}} = 5$.
Figure 5: Standard deviation of accelerations of patch $p$ and subjects A and B over the feature samples.
In near field scenarios, Algorithm 1 achieves 100 percent accuracy with any $s_{\mathrm{thr}}$ and $d_{\mathrm{thr}}$, while in far field scenarios it does not perform so perfectly when $d_{\mathrm{thr}} \geq 0.1$ and $s_{\mathrm{thr}} \leq 0.2$. In the experiments, we found that this happened mostly in situations where subjects were close to each other and the patch of one subject was lost in the following frame.

To balance $L$ and $K$, we set $d_{\mathrm{thr}} = 5$ and $s_{\mathrm{thr}} = 0.2$, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from video is finished, we obtain a set of patch descriptors, and each descriptor is associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually fewer than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion pattern of potential subjects and are filtered out in the first place. As shown in Figure 7, there are comparatively more noisy descriptors in far field scenarios, especially outdoor far field, where nearly 50 percent of descriptors are ruled out in each video.
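The pre-filtering step above amounts to dropping descriptors whose lifetime is too short; a minimal sketch, with the ten-frame cutoff taken from the text and the field names illustrative:

```python
def filter_noisy(descs, min_frames=10):
    """Drop descriptors whose motion series is too short (fewer than
    min_frames frames) to reflect a real subject's motion pattern."""
    return [d for d in descs
            if d["frame_end"] - d["frame_start"] + 1 >= min_frames]
```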
Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window in motion feature calculation is closely related to the subjects and application scenarios: it should be large enough to capture the distinctive pattern of subject movement, but not so large as to confuse different subjects. In our experiments, we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch $p$ represents subject B during its lifetime.

The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in near field environments, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch $p$ and subjects A and B over the feature samples.
Figure 7: Obtained descriptors with optimized parameters, filtered descriptors, and accurately identified descriptors per video, where videos 1–4 represent the outdoor near field, 5–8 the indoor near field, 9–12 the outdoor far field, and 13–16 the indoor far field.
happens in the outdoor far field scenario. In this case, there are fewer optical flows within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before. The obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse situations.
5 Conclusions
In this paper, we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experiment results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in camera FOVs. Second, we assume that phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected during these occasions would damage the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)–(d) represent the outdoor near field, (e)–(h) the indoor near field, (i)–(l) the outdoor far field, and (m)–(p) the indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus, a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, no. 61271041, and no. 61300179).
References
[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385–1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671–686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037–2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19–49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106–2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660–3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220–228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1–8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries—Digital Libraries: Cyberinfrastructure for Research and Education, pp. 178–187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163–172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099–3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28–31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135–144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363–370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214–221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541–1546, July 2005.
Journal of Sensors
The work in [8] proposed a framework exploiting heterogeneous contextual information, including clothing, activity, human attributes, gait, and people cooccurrence, together with facial features, to recognize a person in low-quality video data. Nevertheless, it suffers from the difficulty of discerning multiple persons who resemble each other in clothing color or action. View angle and subject-to-camera distance were integrated to identify persons in video by fusing gait and face in [9], but only in situations where people walk along a straight path at five quantized angles. Temporal, spatial, and social context information was also employed in conjunction with low-level feature analysis to annotate persons in personal and family photo collections [10-14], although only static images are dealt with there. Moreover, in all these methods a predefined model has to be trained to start the identification process, and the performance is limited by the quality and scale of the training sets.
In contrast to the above efforts, we propose a novel method to automatically identify persons in video using human motion patterns. We argue that, in the field of view (FOV) of a fixed camera, the motion pattern of each human body is unique. Under this assumption, in addition to visual analysis, we also analyze the motion pattern of the human body as measured by sensor modules in smart phones. In this paper, we use smart phones equipped with 3-axis accelerometers, carried on human bodies, to collect and transmit acceleration and identity information. By analyzing the correlation between motion features extracted from these two different types of sensing, the problem of person identification is handled simply and accurately.
The remainder of the paper is organized as follows. Section 3 details the proposed method. In Section 4, experiments are conducted and results are discussed. Concluding remarks are given in Section 5.
3. General Framework
A flowchart of the proposed method is depicted in Figure 1. As can be seen, visual features of the human body are first extracted to track people across video frames. Then, optical flows of potential human bodies are estimated and segmented using the previously obtained body features. Meanwhile, accelerometer measurements from smart phones on human bodies are transmitted and collected together with identity information. Motion features are calculated from both optical flow and acceleration measurements in a sliding-window style, as described in Section 3.3. When people disappear from the video sequence, correlation analysis starts the annotation process. Details of the method are given in the following subsections.
3.1. Camera Data Acquisition. First of all, background subtraction (BGS), which is widely adopted for moving object detection in video, is utilized in our method. The main idea of BGS is to detect moving objects from the difference between the current frame and a reference frame, often called the "background image" or "background model" [15]. In this subsection, we need to detect image patches corresponding to
Figure 1: Flowchart of the proposed method (raw frames feed visual feature extraction and optical flow estimation; motion features calculated from both video and phone acceleration drive person identification).
potential human bodies moving around in the camera FOV. To this end, an adaptive Gaussian mixture model [16, 17] is employed to segment foreground patches. This algorithm represents each pixel by a mixture of Gaussians to build a robust background model at run time.
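The per-pixel idea can be sketched with a single running Gaussian per pixel. The following is a minimal NumPy stand-in, not the paper's implementation: the class and parameter names are ours, and a full adaptive mixture model [16, 17] keeps several Gaussians per pixel rather than one.

```python
import numpy as np

class RunningGaussianBGS:
    """Single running Gaussian per pixel (simplified stand-in for a mixture)."""

    def __init__(self, alpha=0.05, k=2.5):
        self.alpha = alpha      # learning rate of the background model
        self.k = k              # foreground threshold in standard deviations
        self.mean = None
        self.var = None

    def apply(self, frame):
        f = frame.astype(np.float64)
        if self.mean is None:   # bootstrap the model from the first frame
            self.mean, self.var = f, np.full_like(f, 25.0)
            return np.zeros(frame.shape, dtype=bool)
        d = f - self.mean
        fg = np.abs(d) > self.k * np.sqrt(self.var)   # foreground mask
        # update the model only where the pixel still looks like background
        a = np.where(fg, 0.0, self.alpha)
        self.mean += a * d
        self.var += a * (d * d - self.var)
        return fg

bgs = RunningGaussianBGS()
static = np.full((4, 4), 100.0)
for _ in range(10):
    bgs.apply(static)                  # learn a static background
moved = static.copy()
moved[1, 1] = 200.0                    # one pixel changes between frames
mask = bgs.apply(moved)                # only the changed pixel is flagged
```

Connected regions of such a foreground mask are the candidate body patches passed to tracking.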
When people enter the camera FOV, image patches corresponding to potential human bodies are extracted and tracked by descriptors composed of a patch ID, color histogram, and patch mass center, as in Algorithm 1. Moreover, we also include in the descriptor the frame index of the first and last appearance of each patch, in order to facilitate person annotation.
For a patch p obtained from BGS, we try to associate p with previous patch descriptors. Histogram similarity between patches from consecutive frames is analyzed first. Normally, image patches corresponding to the same subject are more similar to each other than those of different subjects. The comparison of color histograms of patches used in Algorithm 1 is defined in (1). The range of s(H_a, H_b) is [-1, 1]; the larger s(H_a, H_b), the more similar patches a and b. Then, from the set of similar descriptors of p, the nearest one in terms of horizontal movement of the patch center is selected to track p:

$$ s(H_a, H_b) = \frac{\sum_{i=1}^{N} (H_a(i) - \bar{H}_a)(H_b(i) - \bar{H}_b)}{\sqrt{\sum_{i=1}^{N} (H_a(i) - \bar{H}_a)^2 \sum_{i=1}^{N} (H_b(i) - \bar{H}_b)^2}} \quad (1) $$

$$ \bar{H} = \frac{1}{N} \sum_{i=1}^{N} H(i) \quad (2) $$

where N is the number of bins in histogram H.

For each patch p, we employ the optical flow method [18] to estimate its motion pattern and approximate the patch acceleration as the mean of the vertical accelerations of the keypoints within it, as defined in (3):

$$ y\_acc_p = \frac{1}{M} \sum_{i=1}^{M} y\_acc_i \quad (3) $$

where y_acc_i is the second-order derivative of the y coordinate of keypoint i with respect to time and M is the total number of keypoints within patch p.
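Equations (1)-(3) transcribe directly into code. The sketch below is ours (function names are assumptions, not from the paper); (3) is approximated with a second finite difference of the keypoint y tracks.

```python
import numpy as np

def hist_similarity(ha, hb):
    """Normalized correlation of two color histograms, eq. (1), using the
    bin means of eq. (2). Returns a value in [-1, 1]."""
    ha, hb = np.asarray(ha, float), np.asarray(hb, float)
    da, db = ha - ha.mean(), hb - hb.mean()
    return float(np.sum(da * db) / np.sqrt(np.sum(da**2) * np.sum(db**2)))

def patch_vertical_acc(y, dt):
    """Mean vertical acceleration of a patch, eq. (3): the second time
    derivative of each keypoint's y coordinate, averaged over the M keypoints.
    y has shape (M, frames); returns one value per interior frame."""
    acc = np.diff(y, n=2, axis=1) / dt**2   # second finite difference
    return acc.mean(axis=0)

h = np.array([10.0, 20.0, 30.0, 40.0])
assert abs(hist_similarity(h, 2 * h + 5) - 1.0) < 1e-12   # same shape -> +1
assert abs(hist_similarity(h, h[::-1]) + 1.0) < 1e-12     # reversed  -> -1

y = np.array([[0.0, 1.0, 4.0, 9.0, 16.0]])   # y = t^2 sampled at dt = 1
assert np.allclose(patch_vertical_acc(y, dt=1.0), 2.0)
```

Note that (1) is invariant to affine scaling of the histograms, which is why it tolerates global brightness changes between frames.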
Variables:
  pDesc: patch descriptor, pDesc = {id, frameStart, frameEnd, center, hist}
  pDescs: an array of patch descriptors
  pCount: patch descriptor counter, initialized to zero
  frameIdx: frame counter, initialized to zero
  id: the ID of a patch
  frameStart, frameEnd: frame index of first and last appearance of a patch
  center, hist, accs: center, color histogram, and accelerations of a patch
  s_thr, d_thr, a_thr: thresholds for histogram similarity, patch distance, and patch area; a_thr >= 0, d_thr >= 0, 0 <= s_thr <= 1
Procedure:
  (1)  Grab a video frame; frameIdx = frameIdx + 1
  (2)  Optical flow estimation
  (3)  Background subtraction
  (4)  for each patch in current frame do
  (5)    Calculate pArea, pCenter, pHist, pWidth
  (6)    if pArea < a_thr then
  (7)      continue
  (8)    end if
  (9)    pDescs* = EMPTY
  (10)   for all pDesc in pDescs do
  (11)     if pDesc.frameEnd + 1 == frameIdx and s(pHist, pDesc.hist) >= s_thr then
  (12)       pDescs* = {pDesc} UNION pDescs*
  (13)     end if
  (14)   end for
  (15)   d_min = d_thr * pWidth; pDesc_min = null
  (16)   for all pDesc in pDescs* do
  (17)     d_p = |pCenter.x - pDesc.center.x|
  (18)     if d_p < d_min then
  (19)       d_min = d_p; pDesc_min = pDesc
  (20)     end if
  (21)   end for
  (22)   if pDesc_min is null then
  (23)     pDesc_min = {pCount, frameIdx, frameIdx, pCenter, pHist}; pDescs = pDescs UNION {pDesc_min}; pCount = pCount + 1
  (24)   else
  (25)     pDesc_min.frameEnd = frameIdx; pDesc_min.center = pCenter; pDesc_min.hist = pHist
  (26)   end if
  (27)   Calculate and save vertical acceleration for pDesc_min
  (28) end for

Algorithm 1: Patch tracking and motion estimation.
Pseudocode of patch tracking and motion estimation is listed in Algorithm 1.
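The association step of Algorithm 1 (steps (9)-(26)) can be sketched compactly. The names below are ours, and the similarity function is passed in so any histogram comparison (e.g., eq. (1)) can be plugged in; this is a sketch, not the paper's implementation.

```python
import numpy as np

def associate(patch, descs, frame_idx, sim, s_thr=0.2, d_thr=5.0):
    """Attach a foreground patch to an existing descriptor, or open a new one.
    Candidates must have been active in the previous frame and pass the
    histogram-similarity threshold; the nearest in horizontal center distance
    (within d_thr * patch width) wins."""
    candidates = [d for d in descs
                  if d["frameEnd"] + 1 == frame_idx
                  and sim(patch["hist"], d["hist"]) >= s_thr]
    d_min, best = d_thr * patch["width"], None
    for d in candidates:                      # steps (16)-(21)
        dist = abs(patch["center"][0] - d["center"][0])
        if dist < d_min:
            d_min, best = dist, d
    if best is None:                          # step (23): new descriptor
        best = {"id": len(descs), "frameStart": frame_idx,
                "frameEnd": frame_idx, "center": patch["center"],
                "hist": patch["hist"], "width": patch["width"]}
        descs.append(best)
    else:                                     # step (25): extend the track
        best["frameEnd"] = frame_idx
        best["center"], best["hist"] = patch["center"], patch["hist"]
    return best

sim = lambda a, b: float(np.corrcoef(a, b)[0, 1])
descs = []
p1 = {"hist": np.array([1.0, 2.0, 3.0, 4.0]), "center": (100, 50), "width": 20}
d1 = associate(p1, descs, frame_idx=1, sim=sim)
p2 = {"hist": np.array([1.0, 2.0, 3.1, 4.0]), "center": (102, 50), "width": 20}
d2 = associate(p2, descs, frame_idx=2, sim=sim)
assert d2 is d1 and d1["frameEnd"] == 2 and len(descs) == 1
```

Scaling the distance gate by patch width, as in step (15), makes the tolerance adapt to subject size: far-field subjects produce narrow patches and hence tighter gates.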
3.2. Accelerometer Measurement Collection. In this subsection, we describe the procedure for collecting acceleration measurements with wearable sensors. Android smart phones equipped with 3-axis accelerometers are used as sensing platforms. Of the three accelerometer components, only the one with the largest absolute mean value is analyzed in our experiments, because it best reflects the vertical motion pattern of the human body. Three different placements are tested and compared in order to assess the impact of phone placement on the accuracy of motion collection. In each test, a participant randomly performs a set of activities, including standing, walking, and jumping, while carrying three smart phones on the body: two placed in the chest pocket and jacket side pocket, respectively, and one attached to the waist belt, as shown in Figure 2. The results illustrated in Figure 3 qualitatively show that all three placements correctly capture the vertical motion feature of the participant, with minor, acceptable discrepancies. This test makes the choice of phone attachment more flexible and unobtrusive.
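Selecting the component with the largest absolute mean is a one-liner: gravity dominates that axis, so it tracks vertical body motion best. A small sketch with our own naming and synthetic readings:

```python
import numpy as np

def vertical_axis(samples):
    """samples: (n, 3) array of x, y, z accelerometer readings in m/s^2.
    Returns the index of the gravity-dominated axis and its signal."""
    axis = int(np.argmax(np.abs(samples.mean(axis=0))))
    return axis, samples[:, axis]

# Synthetic readings for a phone carried roughly upright: gravity (~9.8) on y.
rng = np.random.default_rng(0)
acc = rng.normal(loc=[0.3, 9.8, 0.5], scale=0.2, size=(100, 3))
axis, signal = vertical_axis(acc)
assert axis == 1 and signal.shape == (100,)
```

Taking the absolute mean (rather than the mean) keeps the selection correct when the phone is carried upside down in a pocket.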
3.3. Feature Extraction and Person Identification. Noisy raw motion measurements of different sample frequencies, previously obtained from different sensor sources, cannot be
Figure 2: Attachment of smart phones to the human body. From left to right: jacket side pocket, chest pocket, and belt attachment.
Figure 3: Acceleration measurements over 30 s from the three ways of phone attachment (belt, chest, jacket) during jumping, walking, and standing.
compared directly. Instead, standard deviation and energy [19, 20] are employed as motion features for comparison, after noise suppression and data cleansing. Energy is defined as the sum of squared discrete FFT component magnitudes of the data samples, divided by the sample count for normalization. These features are computed in a sliding window of length t_w with t_w/2 overlap between consecutive windows; feature extraction on sliding windows with 50 percent overlap has demonstrated its success in [21].

To find out whether p represents a human body, correlation analysis is conducted. As a matter of fact, motion features extracted from video frames are supposed to be positively linearly related to those from accelerometer measurements of the same subject. We adopt the correlation coefficient to reliably measure the strength of this linear relationship, as defined in (4):

$$ \rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \quad (4) $$

where X and Y are the motion features to be compared, cov(X, Y) is their covariance, and sigma_X and sigma_Y are the standard deviations of X and Y. rho ranges from -1 to 1 inclusively, where 0 indicates no linear relationship, +1 a perfect positive linear relationship, and -1 a perfect negative linear relationship. The larger rho(X, Y), the more correlated X and Y. In our case, the motion features of p are compared with each of those extracted from the smart phones in the same period of time. The identity information of the smart phone with the largest positive correlation coefficient is used to identify p.
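The whole matching step can be sketched end to end under our own naming (a sketch, not the paper's implementation): standard deviation and normalized FFT energy are computed in half-overlapping windows [21] for the camera-side acceleration of a patch and for each phone, and the phone whose feature sequence gives the largest correlation coefficient (4) labels the patch.

```python
import numpy as np

def window_features(x, win):
    """Std and normalized FFT energy per sliding window (50% overlap)."""
    feats = []
    for start in range(0, len(x) - win + 1, win // 2):
        w = x[start:start + win]
        energy = np.sum(np.abs(np.fft.fft(w)) ** 2) / win  # normalized energy
        feats.append((np.std(w), energy))
    return np.array(feats).ravel()

def identify(patch_acc, phone_accs, win):
    """Return the phone id whose feature sequence best correlates, eq. (4)."""
    f = window_features(patch_acc, win)
    rho = {pid: np.corrcoef(f, window_features(a, win))[0, 1]
           for pid, a in phone_accs.items()}
    return max(rho, key=rho.get)

# Synthetic demo: a 2 Hz gait whose amplitude envelope is shared with phone A.
t = np.arange(300) / 30.0                            # 10 s at 30 Hz
envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t)
gait = np.sin(2 * np.pi * 2.0 * t)
patch_acc = envelope * gait                          # camera units (pixels)
rng = np.random.default_rng(1)
phones = {"A": 9.8 * envelope * gait + 0.05 * rng.normal(size=300),
          "B": 9.8 * (1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t + 2.5)) * gait}
assert identify(patch_acc, phones, win=30) == "A"
```

Because (4) is scale-invariant, the camera-side signal (in pixels per frame squared) and the phone-side signal (in m/s^2) never need to share units, which is what makes the cross-modal comparison possible at all.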
4. Experiments and Discussion
In this section, we conduct detailed experiments in various situations to optimize Algorithm 1 and evaluate the proposed person identification algorithm. We use a digital camera and two Android smart phones for data collection. A simple GUI application is created to start and stop data collection on the phones. Acceleration measurements are recorded in text files on the phone SD card and later accessed via USB. Video clips are recorded as mp4 files at a resolution of 640 x 480 and 15 frames per second. The timestamps of video frames and accelerometer readings are synchronized before the experiment. Algorithm 1 is implemented with the OpenCV library and tested on an Intel 3.4 GHz platform running Ubuntu 13.04. We recruit two participants, labeled A and B, to take part in our experiments, with smart phones placed in their jacket side pockets. We choose four different scenarios for our experiments: outdoor near field, outdoor far field, indoor near field, and indoor far field, as illustrated in Figure 8. In the near field situations, the subjects moved around within a scope of about five meters from the camera; the silhouette height of the human body is not less than half of the image height, and the human face can be clearly distinguished. In the far field situations, the subjects moved around about twenty meters away, where detailed visual features of the human body are mostly lost and body height in the image is not more than thirty pixels. In each scenario we repeated the experiment four times, each lasting about five minutes. In all, we collected sixteen video clips and thirty-two text files of acceleration measurements.
4.1. Tracking Optimization. Patch tracking is an essential step for motion estimation from camera video and directly affects the accuracy and robustness of subsequent person identification. As listed in Algorithm 1, the aim of patch tracking is to estimate motion measurements for each patch that appears in the video frames. In the ideal case, a subject is continuously tracked in camera video by only one descriptor during the whole experiment, and we can extract a sequence of acceleration measurements closest, in terms of time duration, to that collected from the smart phone. In the worst case, we have to create new descriptors for all patches in each frame, and the number of descriptors used for tracking a subject is as large as the number of frames in which the subject appears. We present a metric in (5) to measure the performance of Algorithm 1: L(i) is defined as the ratio between the number of subjects in a video clip and the number of descriptors used for tracking them. The range of L(i) is (0, 1]; the larger L(i), the better the tracking performance. Moreover, we also provide a metric to evaluate tracking accuracy, as shown in (6). An accurate descriptor is one that tracks only one subject during its lifetime. The larger K(i), the more accurate Algorithm 1:

$$ L(i) = \frac{\#\text{subjects in video } i}{\#\text{descriptors in video } i} \quad (5) $$

$$ K(i) = \frac{\#\text{accurate descriptors in video } i}{\#\text{descriptors in video } i} \quad (6) $$
Figure 4: Tracking performance L and accuracy K in the four scenarios ((a) outdoor near field, (b) indoor near field, (c) outdoor far field, (d) indoor far field) as functions of s_thr for values of d_thr from 0.01 to 5.
As depicted in Algorithm 1, three parameters, a_thr, d_thr, and s_thr, affect L and K. a_thr indicates the minimum area of a patch that potentially represents a subject; patches with an area less than a_thr are filtered out. Generally, in a specific application scenario, the value of a_thr can be determined empirically; in our experiments, we set it to 150, which works well. s_thr specifies the minimum histogram similarity between the current patch p and potential descriptors of p. Each active descriptor that satisfies this requirement is tested in terms of horizontal distance to p. d_thr stipulates a distance threshold to rule out inappropriate alternative descriptors: the nearest descriptor satisfying this threshold is selected to track p if it exists; otherwise we create a new descriptor for p. Moreover, many interference factors in the scenario, including poor lighting conditions, clothing color similar to the background, incidental shadows of the human body, and unpredictable motion patterns of the subjects, such as fast turning and crossing, also affect the patch tracking process negatively. To rule out the impact of these factors and optimize patch tracking, we select a representative video clip from each of the four scenarios and run Algorithm 1 over the video with different s_thr and d_thr. The resulting L and K are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are shown in Figure 8.

Due to the different motion patterns of the subjects, L may vary among the video clips of different scenarios. However, from Figure 4 we can conclude that L drops dramatically when s_thr > 0.8 in the near field scenarios and s_thr > 0.2 in the far field scenarios, with d_thr >= 0.1. This is mainly caused by background subtraction noise. The histogram similarity of patches of the same subject from two consecutive frames is about 0.8 in the near field in this situation. In the far field scenarios, with relatively smaller foreground patches, the negative impacts become more severe and the similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the histogram similarity is beyond these thresholds. When d_thr < 0.1, the worst case occurs: we need to create new descriptors for the patches in every frame, as the horizontal distance between patches of the same subject from two consecutive frames is mostly beyond this limit. As d_thr increases, L increases and converges at d_thr = 5.
Figure 5: Standard deviation of accelerations of patch p and subjects A and B over eleven feature samples.
In the near field scenarios, Algorithm 1 achieves 100 percent accuracy regardless of s_thr and d_thr, while in the far field scenarios it does not perform as well when d_thr >= 0.1 and s_thr <= 0.2. In the experiments, we found that this happened mostly when subjects were close to each other and the patch of one subject was lost in the following frame.

To balance L and K, we set d_thr = 5 and s_thr = 0.2, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from video has finished, we obtain a set of patch descriptors, and each descriptor is associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually less than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion pattern of potential subjects and are filtered out in the first place. As shown in Figure 7, there are comparatively more noisy descriptors in the far field scenarios, especially the outdoor far field scenario, where nearly 50 percent of the descriptors are ruled out in each video.

Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window used in motion feature calculation is closely related to the subjects and application scenario: it should be large enough to capture the distinctive pattern of subject movement, but not so large as to confuse different subjects. In our experiments, we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch p represents subject B during its lifetime.

The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in the near field environment, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch p and subjects A and B over eleven feature samples.
Figure 7: Per-video counts of obtained descriptors with optimized parameters, filtered descriptors, and accurately identified descriptors, where videos 1-4 are outdoor near field, 5-8 indoor near field, 9-12 outdoor far field, and 13-16 indoor far field.
happens in the outdoor far field scenario, where there are fewer optical flows within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before; the obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse situations.
5. Conclusions
In this paper, we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experimental results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in camera FOVs. Second, we assume that the phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected on these occasions would degrade the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)-(d) are outdoor near field, (e)-(h) indoor near field, (i)-(l) outdoor far field, and (m)-(p) indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, 61271041, and 61300179).
References
[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385-1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671-686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19-49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106-2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660-3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220-228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1-8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1-7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries (Digital Libraries: Cyberinfrastructure for Research and Education), pp. 178-187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163-172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099-3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28-31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135-144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363-370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74-82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214-221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541-1546, July 2005.
Journal of Sensors 3
Variables:
pDesc: patch descriptor, pDesc = {id, frameStart, frameEnd, center, hist}
pDescs: an array of patch descriptors
pCount: patch descriptor counter, initialized to zero
frameIdx: frame counter, initialized to zero
id: the ID of a patch
frameStart, frameEnd: frame index of first and last appearance of a patch
center, hist, accs: center, color histogram, and acceleration series of a patch
s_thr, d_thr, a_thr: thresholds for histogram similarity, patch distance, and patch area; a_thr >= 0, d_thr >= 0, and 0 <= s_thr <= 1

Procedure:
(1) Grab a video frame; frameIdx = frameIdx + 1
(2) Optical flow estimation
(3) Background subtraction
(4) for each patch in current frame do
(5)   Calculate pArea, pCenter, pHist, pWidth
(6)   if pArea < a_thr then
(7)     continue
(8)   end if
(9)   pDescs* = ∅
(10)  for all pDesc ∈ pDescs do
(11)    if pDesc.frameEnd + 1 == frameIdx and s(pHist, pDesc.hist) >= s_thr then
(12)      pDescs* = {pDesc} ∪ pDescs*
(13)    end if
(14)  end for
(15)  d_min = d_thr · pWidth; pDesc_min = null
(16)  for all pDesc ∈ pDescs* do
(17)    d_p = |pCenter.x − pDesc.center.x|
(18)    if d_p < d_min then
(19)      d_min = d_p; pDesc_min = pDesc
(20)    end if
(21)  end for
(22)  if pDesc_min is null then
(23)    pDesc_min = {pCount, frameIdx, frameIdx, pCenter, pHist}; pDescs = pDescs ∪ {pDesc_min}; pCount = pCount + 1
(24)  else
(25)    pDesc_min.frameEnd = frameIdx; pDesc_min.center = pCenter; pDesc_min.hist = pHist
(26)  end if
(27)  Calculate and save vertical acceleration for pDesc_min
(28) end for

Algorithm 1: Patch tracking and motion estimation.
Pseudocode of patch tracking and motion estimation is listed in Algorithm 1.
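The per-patch matching loop of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the dictionary field names and the normalized-correlation histogram similarity are our own illustrative choices (the paper does not fix the similarity function s), and the default thresholds are the values the experiments later settle on (a_thr = 150, s_thr = 0.2, d_thr = 5).

```python
import numpy as np

def hist_similarity(h1, h2):
    # One possible choice of s(.,.): normalized correlation between two
    # color histograms. The paper does not specify the exact measure.
    h1 = h1 / (np.linalg.norm(h1) + 1e-12)
    h2 = h2 / (np.linalg.norm(h2) + 1e-12)
    return float(np.dot(h1, h2))

def track_patch(patch, descs, frame_idx, next_id,
                s_thr=0.2, d_thr=5.0, a_thr=150):
    """One iteration of the per-patch loop in Algorithm 1.

    patch: dict with keys 'area', 'center_x', 'hist', 'width'
    descs: list of descriptor dicts with keys
           'id', 'frame_start', 'frame_end', 'center_x', 'hist'
    Returns (matched_or_new_descriptor, next_id), or (None, next_id)
    if the patch is filtered out by the area threshold.
    """
    if patch['area'] < a_thr:                     # steps (6)-(8)
        return None, next_id
    # Steps (10)-(14): keep descriptors that were active in the previous
    # frame and whose histogram is similar enough to the current patch.
    candidates = [d for d in descs
                  if d['frame_end'] + 1 == frame_idx
                  and hist_similarity(patch['hist'], d['hist']) >= s_thr]
    # Steps (15)-(21): nearest candidate within d_thr * patch width.
    d_min, best = d_thr * patch['width'], None
    for d in candidates:
        d_p = abs(patch['center_x'] - d['center_x'])
        if d_p < d_min:
            d_min, best = d_p, d
    if best is None:                              # steps (22)-(23): new descriptor
        best = {'id': next_id, 'frame_start': frame_idx,
                'frame_end': frame_idx, 'center_x': patch['center_x'],
                'hist': patch['hist']}
        descs.append(best)
        next_id += 1
    else:                                         # steps (24)-(25): update match
        best['frame_end'] = frame_idx
        best['center_x'] = patch['center_x']
        best['hist'] = patch['hist']
    return best, next_id
```

Calling this once per foreground patch and frame reproduces the descriptor bookkeeping of Algorithm 1; step (27), the per-descriptor vertical acceleration, would be estimated separately from the optical flow.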
3.2. Accelerometer Measurements Collection. In this subsection we describe the procedure for collecting acceleration measurements with wearable sensors. Android smart phones equipped with 3-axis accelerometers are utilized as sensing platforms. Of the three accelerometer components, only the one with the largest absolute mean value is analyzed in our experiments, as it best reflects the vertical motion pattern of the human body. Three different placements are tested and compared in order to assess the impact of phone placement on the accuracy of motion collection. In each test a participant randomly performs a set of activities, including standing, walking, and jumping, while carrying three smart phones on the body: two phones placed in the chest pocket and jacket side pocket, respectively, and one attached to the waist belt, as shown in Figure 2. The results illustrated in Figure 3 qualitatively show that all three placements correctly capture the vertical motion feature of the participant with minor, acceptable discrepancy. This test makes the choice of phone attachment more flexible and unobtrusive.
3.3. Feature Extraction and Person Identification. Noisy raw motion measurements of different sample frequencies, previously obtained from different sensor sources, cannot be
Figure 2: Attachment of smart phones to the human body. From left to right: jacket side pocket, chest pocket, and belt attachment.
Figure 3: Acceleration measurements over time (0-30 s) from the three ways of phone attachment (belt, chest, jacket) during the jumping, walking, and standing activities.
compared directly. Instead, standard deviation and energy [19, 20] are employed as motion features for comparison after noise suppression and data cleansing. Energy is defined as the sum of squared discrete FFT component magnitudes of the data samples, divided by the sample count for normalization. These features are computed in a sliding window of length t_w, with t_w/2 overlap between consecutive windows. Feature extraction on sliding windows with 50 percent overlap has demonstrated its success in [21].
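The feature computation just described can be sketched as follows; the function name and the assumption that the signal has already been resampled to a fixed rate fs are ours.

```python
import numpy as np

def motion_features(signal, fs, win_sec=1.0):
    """Standard deviation and FFT energy per sliding window with 50%
    overlap, as in Section 3.3. fs is the sample rate of the signal."""
    w = int(win_sec * fs)            # window length t_w in samples
    step = w // 2                    # consecutive windows overlap by t_w/2
    feats = []
    for start in range(0, len(signal) - w + 1, step):
        window = np.asarray(signal[start:start + w], dtype=float)
        std = window.std()
        mags = np.abs(np.fft.fft(window))
        energy = float((mags ** 2).sum() / w)   # normalized by sample count
        feats.append((std, energy))
    return feats
```

Applying this to the vertical acceleration series of a patch descriptor and to each phone's accelerometer stream yields the feature sequences that are correlated in the next step.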
ρ(X, Y) = cov(X, Y) / (σ_X σ_Y). (4)
To find out whether p represents a human body, correlation analysis is conducted. Motion features extracted from the video frames are expected to be positively linearly related to those from accelerometer measurements of the same subject. We adopt the correlation coefficient to measure the strength of this linear relationship, as defined in (4), where X and Y are the motion features to be compared, cov(X, Y) is their covariance, and σ_X and σ_Y are the standard deviations of X and Y. ρ ranges from −1 to 1 inclusive, where 0 indicates no linear relationship, +1 a perfect positive linear relationship, and −1 a perfect negative linear relationship; the larger ρ(X, Y), the more correlated X and Y. In our case, the motion features of p are compared with each of those extracted from the smart phones over the same period of time, and the identity information of the smart phone with the largest positive correlation coefficient is used to identify p.
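The identification rule above amounts to an argmax over Pearson correlations; a sketch, where the mapping from identity labels to phone feature arrays is our own illustrative representation:

```python
import numpy as np

def identify(patch_feats, phone_feats):
    """Label a patch with the phone whose feature sequence has the largest
    positive correlation coefficient (equation (4)) with the patch's.

    patch_feats: 1-D array of motion features from the video patch
    phone_feats: dict mapping identity -> 1-D feature array (same length)
    Returns (identity, rho), or (None, 0.0) if no positive correlation.
    """
    best_id, best_rho = None, 0.0
    for ident, feats in phone_feats.items():
        # np.corrcoef returns the 2x2 correlation matrix of the two series.
        rho = float(np.corrcoef(patch_feats, feats)[0, 1])
        if rho > best_rho:
            best_id, best_rho = ident, rho
    return best_id, best_rho
```

Requiring a strictly positive ρ matches the expectation that video and accelerometer features of the same subject are positively linearly related; a patch with no positive correlation is left unlabeled.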
4 Experiments and Discussions
In this section we conduct detailed experiments in various situations to optimize Algorithm 1 and evaluate the proposed person identification algorithm. We use a digital camera and two Android smart phones for data collection. A simple GUI application is created to start and stop data collection on the phones. Acceleration measurements are recorded and saved in text files on the phone SD card and later accessed via USB. Video clips are recorded as mp4 files at a resolution of 640 × 480 and 15 frames per second. The timestamps of video frames and accelerometer readings are synchronized before the experiment. Algorithm 1 is implemented on top of the OpenCV library and tested on an Intel 3.4 GHz platform running Ubuntu 13.04. We recruit two participants, labeled A and B, respectively, to take part in our experiments and place smart phones in their jacket side pockets. We choose four different scenarios for our experiments: outdoor near field, outdoor far field, indoor near field, and indoor far field, as illustrated in Figure 8. In near field situations the subjects moved around within a scope about five meters away from the camera; the silhouette height of the human body is not less than half of the image height, and the human face can be clearly distinguished. In far field situations the subjects moved around about twenty meters away, where detailed visual features of the human body are mostly lost and body height in the image is not more than thirty pixels. In each scenario we repeated the experiment four times, each lasting about five minutes. In all, we collected sixteen video clips and thirty-two text files of acceleration measurements.
4.1. Tracking Optimization. Patch tracking is an essential step for motion estimation from camera video and directly affects the accuracy and robustness of subsequent person identification. As listed in Algorithm 1, the aim of patch tracking is to estimate motion measurements for each patch that appears in the video frames. In the ideal case a subject is continuously tracked in the camera video by only one descriptor during the whole experiment, and we can extract a sequence of acceleration measurements closest in time duration to that collected from the smart phone. In the worst case we have to create new descriptors for all patches in each frame, and the number of descriptors used for tracking a subject is as large as the number of frames in which he appears. We present a metric in (5) to measure the performance of Algorithm 1. The metric L(i) is defined as the ratio between the number of subjects in a video clip and the number of descriptors used for tracking the subjects. The range of L(i) is (0, 1]; the larger L(i), the better the tracking performance. Moreover, we also provide a metric to evaluate tracking accuracy, as shown in (6). An accurate descriptor is one that tracks only one subject during its lifetime. The larger K(i), the more accurate Algorithm 1.
L(i) = (number of subjects in video i) / (number of descriptors in video i), (5)

K(i) = (number of accurate descriptors in video i) / (number of descriptors in video i). (6)
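Metrics (5) and (6) reduce to simple ratios over the descriptor set; a sketch, assuming (our own representation) that each descriptor is summarized by a boolean flag indicating whether it tracked a single subject:

```python
def tracking_metrics(n_subjects, descriptor_flags):
    """Compute L(i) and K(i) for one video.

    n_subjects: number of subjects appearing in video i
    descriptor_flags: one boolean per descriptor used in video i,
                      True if it tracked only one subject (accurate)
    """
    n_descriptors = len(descriptor_flags)
    L = n_subjects / n_descriptors            # equation (5)
    K = sum(descriptor_flags) / n_descriptors  # equation (6)
    return L, K
```

For example, two subjects tracked with four descriptors, three of them accurate, gives L = 0.5 and K = 0.75.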
Figure 4: Tracking performance L and tracking accuracy K in the four scenarios with different values of d_thr and s_thr: (a) outdoor near field; (b) indoor near field; (c) outdoor far field; (d) indoor far field. L and K are plotted against s_thr from 0.1 to 1 for d_thr in {0.01, 0.05, 0.1, 0.5, 1, 5}.
As depicted in Algorithm 1, three parameters, a_thr, d_thr, and s_thr, affect L and K. a_thr indicates the minimum area of a patch that potentially represents a subject; patches with an area less than a_thr are filtered out. Generally, in a specific application scenario, the value of a_thr can be determined empirically; in our experiments we set it to 150, which works well. s_thr specifies a minimum histogram similarity between the current patch p and potential descriptors of p. Each active descriptor that satisfies this requirement is tested in terms of horizontal distance to p. d_thr stipulates a distance threshold to rule out inappropriate alternative descriptors; the nearest descriptor satisfying this threshold is selected to track p if it exists, and otherwise we create a new descriptor for p. Moreover, many interference factors in the scene, including poor lighting conditions, clothing color similar to the background, incidental shadows of the human body, and unpredictable motion patterns of subjects like fast turning and crossing, also affect the patch tracking process negatively. To rule out the impact of these factors and optimize patch tracking, we select a representative video clip from each of the four scenarios and run Algorithm 1 over the video with different s_thr and d_thr. The resulting L and K are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are listed in Figure 8.

Due to different motion patterns of the subjects, L may vary among video clips of different scenarios. However, from Figure 4 we can conclude that L drops dramatically when s_thr > 0.8 in near field scenarios and s_thr > 0.2 in far field scenarios with d_thr ≥ 0.1. This is mainly caused by background subtraction noise: the histogram similarity of patches of the same subject in two consecutive frames is about 0.8 in near field in this situation. In far field scenarios, with relatively smaller foreground patches, the negative impact becomes more severe and the similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the histogram similarity requirement exceeds these thresholds. When d_thr < 0.1, the worst case occurs: we need to create new descriptors for patches in every frame, as the horizontal distance between patches of the same subject in two consecutive frames is mostly beyond this limit. As d_thr increases, L increases and converges at d_thr = 5.
Figure 5: Standard deviation of accelerations of patch p and subjects A and B over eleven feature samples.
In near field scenarios Algorithm 1 achieves 100 percent accuracy regardless of s_thr and d_thr, while in far field scenarios it does not perform as well when d_thr ≥ 0.1 and s_thr ≤ 0.2. In the experiments we found that this happened mostly when subjects were close to each other and the patch of one subject was lost in the following frame.

To balance L and K, we set d_thr = 5 and s_thr = 0.2, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from the videos is finished, we obtain a set of patch descriptors, each associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually less than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion pattern of potential subjects and are filtered out in the first place. As shown in Figure 7, there are comparatively more noisy descriptors in far field scenarios, especially outdoor far field scenarios, where nearly 50 percent of descriptors are ruled out in each video.
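The filtering step can be sketched as a one-line predicate over descriptor lifetimes; the ten-frame cutoff follows the text's "usually less than ten frames", and the field names match the descriptor layout of Algorithm 1 (our own representation).

```python
def filter_descriptors(descs, min_frames=10):
    """Drop descriptors whose motion series is too short to reflect a
    real subject's motion pattern (Section 4.2)."""
    return [d for d in descs
            if d['frame_end'] - d['frame_start'] + 1 >= min_frames]
```

Only the descriptors that survive this filter enter the feature extraction and correlation steps that follow.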
Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window in motion feature calculation is closely related to the subjects and the application scenario: it should be large enough to capture the distinctive pattern of a subject's movement, but not so large that it confuses different ones. In our experiments we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch p represents subject B during its lifetime.

The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in near field environments, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch p and subjects A and B over eleven feature samples.
Figure 7: Total descriptors obtained with optimized parameters, filtered descriptors, and accurately tagged descriptors for each of the sixteen videos, where 1-4 represent the outdoor near field, 5-8 the indoor near field, 9-12 the outdoor far field, and 13-16 the indoor far field.
happens in the outdoor far field scenario, where there are fewer optical flows within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before; the obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse conditions.
5 Conclusions
In this paper we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experiment results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in camera FOVs. Second, we assume that phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected during these occasions would damage the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)-(d) represent the outdoor near field, (e)-(h) the indoor near field, (i)-(l) the outdoor far field, and (m)-(p) the indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, the method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, 61271041, and 61300179).
References

[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385-1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671-686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19-49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106-2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660-3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220-228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1-8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1-7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries: Cyberinfrastructure for Research and Education, pp. 178-187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163-172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099-3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28-31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135-144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363-370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74-82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214-221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541-1546, July 2005.
4 Journal of Sensors
Figure 2 Attachment of smart phones to human body From left toright jacket side pocket chest pocket and belt attachment
0 5 10 15 20 25 30Time (s)
BeltChest Jacket
Jumping StandingWalking
Figure 3 Acceleration measurements from three ways of phoneattachment for the above mentioned activities
compared directly Instead standard deviation and energy[19 20] are employed as motion features for comparisonafter noise suppression and data cleansing Energy is definedas sum of squared discrete FFT component magnitudes ofdata samples and divided by sample count for normalizationThese features are computed in a sliding window of length 119905
119908
with 1199051199082 overlapping between consecutive windows Feature
extraction on sliding windows with 50 percent overlappinghas demonstrated its success in [21]
120588 (119883 119884) =cov (119883 119884)120590119883120590119884
(4)
To find out whether 119901 represents a human body correlationanalysis is conducted As a matter of fact motion featuresextracted from video frames are supposed to be positivelylinear with those from accelerometer measurements of thesame subject We adopt correlation coefficient to reliablymeasure strength of linear relationship as defined in (4)where119883 and119884 aremotion features to be compared cov(119883 119884)the covariance and 120590
119883and 120590
119884the standard deviation of 119883
and 119884 120588 ranges from minus1 to 1 inclusively where 0 indicatesno linear relationship +1 indicates a perfect positive linearrelationship and minus1 indicates a perfect negative linear rela-tionship The larger 120588(119883 119884) the more correlated 119883 and 119884In our case motion features of 119901 are compared with eachof those extracted from smart phones in the same periodof time Identity information of smart phone correspondingto the largest positive correlation coefficient is utilized toidentify 119901
4 Experiments and Discussions
In this section we conduct detailed experiments in varioussituations to optimize Algorithm 1 and evaluate the proposedperson identification algorithm We use a digital camera andtwo Android smart phones for data collection A simpleGUI application is created to start and stop data collectionon phones Acceleration measurements are recorded andsaved in text files on phone SD card and later accessedvia USB Video clips are recorded in the format of mp4files at a resolution of 640 times 480 15 frames per secondThe timestamps of video frames and accelerometer readingsare well synchronized before the experiment Algorithm 1 isimplemented based on OpenCV library and tested on anIntel 34GHz platform runningUbuntu 1304We recruit twoparticipants labeled as A and B respectively to take partin our experiments and place smart phones in jacket sidepockets We choose four different scenarios to perform ourexperiments including outdoor near field outdoor far fieldindoor near field indoor far field as illustrated in Figure 8In near field situations the subjects moved around within ascope about five meters away from the cameraThe silhouetteheight of human body is not less than half of the image heightand human face could be clearly distinguished In far fieldsituations the subjects moved around about twenty metersawaywhere detailed visual features of human body aremostlylost and body height in image is not more than thirty pixelsIn each scenario we repeated the experiment four times andeach lasts about five minutes In all we collect sixteen videoclips and thirty-two text files of acceleration measurements
41 Tracking Optimization Patch tracking is an essential stepfor motion estimation from camera video and directly affectsaccuracy and robustness of subsequent person identificationAs listed in Algorithm 1 the aim of patch tracking is toestimate motion measurements for each patch that appearedin video frames In the ideal case a subject is continuouslytracked in camera video by only one descriptor duringthe whole experiment and we could extract a sequence ofacceleration measurements closest to that collected from thesmart phone in terms of time duration while in the worstcase we have to create new descriptors for all patches ineach frame and the number of descriptors used for tackinga subject is as many as that of the frames of his appearanceWe present a metric in (5) to measure the performance ofAlgorithm 1 The metric 119871(119894) is defined as a ratio betweennumber of subjects in a video clip and number of descriptorsused for tracking the subjects The range of 119871(119894) is (0 1] Thelarger 119871(119894) the better the tracking performanceMoreover wealso provide a metric to evaluate tracking accuracy as shownin (6) Accurate descriptor means that a descriptor tracksonly one subject during its lifetimeThe larger119870(119894) the moreaccurate Algorithm 1
119871 (119894) =subjects in video 119894
descriptors in 119894 (5)
119870 (119894) =accurate descriptors in video 119894
descriptors in 119894 (6)
Journal of Sensors 5
0
008
007
006
005
004
003
002
001
0
L
01 02 03 04 05 06 07 08 09 1
02
04
06
08
1
12
14
K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
sthr
(a) Outdoor near field
01 02 03 04 05 06 07 08 09 1
sthr
012
01
008
006
004
002
0
15
1
05
0
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
L K
(b) Indoor near field
0005
001
0
0015
002
0025
0930940950960970980991
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(c) Outdoor far field
00005001
0015002
0025003
0035004
0045005
086088090920940960981
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(d) Indoor far field
Figure 4 Tracking performance and performance in the four scenarios with different values of 119889thr and 119904thr
As depicted in Algorithm 1, three parameters, a_thr, d_thr, and s_thr, affect L and K. a_thr indicates the minimum area of a patch that potentially represents a subject; patches with an area less than a_thr are filtered out. Generally, in a specific application scenario, the value of a_thr can be determined empirically; in our experiments we set it to 150, which works well. s_thr specifies the minimum histogram similarity between the current patch p and potential descriptors of p. Each active descriptor that satisfies this requirement is then tested in terms of its horizontal distance to p. d_thr stipulates a distance threshold to rule out inappropriate candidate descriptors. The nearest descriptor satisfying this threshold is selected to track p if it exists; otherwise, we create a new descriptor for p. Moreover, many interference factors in the scene, including poor lighting conditions, clothing colors similar to the background, incidental shadows of the human body, and unpredictable motion patterns of subjects such as fast turning and crossing, also negatively affect the patch tracking process. To rule out the impact of these factors and optimize patch tracking in each of the four scenarios, we select a representative video clip and run Algorithm 1 over the video with different values of s_thr and d_thr. The resulting L and K are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are shown in Figure 8.
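The patch-to-descriptor matching step described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the `Patch` and `Descriptor` structures, the histogram-intersection similarity, and the treatment of d_thr as a plain horizontal-distance bound are all assumptions; the default thresholds mirror the values used in the experiments (a_thr = 150, s_thr = 0.2, d_thr = 5).

```python
from dataclasses import dataclass, field

def hist_similarity(h1, h2):
    """Histogram-intersection similarity, normalized to [0, 1] (assumed metric)."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return overlap / max(sum(h1), 1e-9)

@dataclass
class Patch:
    area: float       # foreground patch area in pixels
    x: float          # horizontal centroid
    histogram: list   # color histogram of the patch

@dataclass
class Descriptor:
    histogram: list
    x: float
    active: bool = True
    frames: list = field(default_factory=list)

    def update(self, patch):
        """Track the patch: refresh position/appearance and log the frame."""
        self.x, self.histogram = patch.x, patch.histogram
        self.frames.append(patch)

def match_patch(patch, descriptors, a_thr=150, s_thr=0.2, d_thr=5.0):
    """Assign a patch to the nearest qualifying descriptor, or create a new one."""
    if patch.area < a_thr:                          # too small to be a subject
        return None
    candidates = [d for d in descriptors
                  if d.active
                  and hist_similarity(d.histogram, patch.histogram) >= s_thr
                  and abs(d.x - patch.x) < d_thr]
    if candidates:                                  # track with the nearest one
        best = min(candidates, key=lambda d: abs(d.x - patch.x))
        best.update(patch)
        return best
    new_desc = Descriptor(patch.histogram, patch.x)  # no match: new track
    descriptors.append(new_desc)
    return new_desc
```

When no active descriptor passes both the similarity and distance tests, a fresh descriptor is started, which is exactly the behavior that inflates K at overly strict thresholds.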
Due to the different motion patterns of the subjects, L may vary among video clips of different scenarios. However, from Figure 4 we can conclude that L drops dramatically when s_thr > 0.8 in the near field scenarios and when s_thr > 0.2 in the far field scenarios, with d_thr >= 0.1. This is mainly caused by background subtraction noise: the histogram similarity of patches of the same subject in two consecutive frames is about 0.8 in the near field under these conditions. In the far field scenarios, with relatively smaller foreground patches, the negative impact becomes more severe and the usable similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the similarity requirement exceeds these thresholds. When d_thr < 0.1 the worst case occurs: we need to create new descriptors for patches in every frame, as the horizontal distance between patches of the same subject in two consecutive frames is mostly beyond this limit. As d_thr increases, L increases and converges at d_thr = 5.
6 Journal of Sensors
Figure 5: Standard deviation of accelerations of patch p and subjects A and B (feature samples 1–11).
In the near field scenarios, Algorithm 1 achieves 100 percent accuracy regardless of s_thr and d_thr, while in the far field scenarios it does not perform as well when d_thr >= 0.1 and s_thr <= 0.2. In the experiments, we found that this happened mostly when subjects were close to each other and the patch of one subject was lost in the following frame.
To balance L and K, we set d_thr = 5 and s_thr = 0.2, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from the video is finished, we obtain a set of patch descriptors, each of which is associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually less than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion patterns of potential subjects and are filtered out first. As shown in Figure 7, there are comparatively more noisy descriptors in the far field scenarios, especially the outdoor far field scenario, where nearly 50 percent of the descriptors are ruled out in each video.
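This filtering step can be sketched as below. The dict-based descriptor representation and the helper name are hypothetical choices for illustration; the cutoff of ten samples follows the "less than ten frames" rule stated above.

```python
def filter_descriptors(descriptors, min_frames=10):
    """Discard descriptors whose acceleration series is too short to be
    meaningful; returns the surviving descriptors and the number dropped."""
    kept = [d for d in descriptors if len(d["accel"]) >= min_frames]
    dropped = len(descriptors) - len(kept)
    return kept, dropped
```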
Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window used in motion feature calculation is closely related to the subjects and the application scenario: it should be large enough to capture the distinctive pattern of a subject's movement, but not so large as to confuse different subjects. In our experiments, we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch p represents subject B during its lifetime.
The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in the near field environment, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch p and subjects A and B (feature samples 1–11).
Figure 7: Obtained descriptors with the optimized parameters, filtered descriptors, and accurately identified descriptors per video, where videos 1–4 represent the outdoor near field, 5–8 the indoor near field, 9–12 the outdoor far field, and 13–16 the indoor far field.
happens in the outdoor far field scenario, where there are fewer optical flow vectors within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before; the obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse conditions.
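A minimal sketch of this identification step, assuming per-window standard deviation and energy as the motion features (as in Figures 5 and 6) and Pearson correlation over the standard-deviation sequence as the matching criterion; the paper does not specify its exact correlation measure, so that choice is an assumption:

```python
import math

def window_features(accel, rate_hz, win_s=1.0):
    """Per-window (std, energy) features of an acceleration-magnitude series,
    using non-overlapping windows of win_s seconds."""
    n = max(1, int(rate_hz * win_s))
    feats = []
    for i in range(0, len(accel) - n + 1, n):
        w = accel[i:i + n]
        mean = sum(w) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in w) / n)
        energy = sum(v * v for v in w) / n
        feats.append((std, energy))
    return feats

def correlation(a, b):
    """Pearson correlation of two sequences, truncated to equal length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0

def identify(patch_feats, phone_feats_by_id):
    """Label the patch with the phone identity whose feature sequence
    correlates best with the patch's standard-deviation sequence."""
    stds = [f[0] for f in patch_feats]
    return max(phone_feats_by_id,
               key=lambda pid: correlation(stds,
                                           [f[0] for f in phone_feats_by_id[pid]]))
```

A phone left motionless yields a near-constant feature sequence and a correlation of zero, so it never wins the assignment, which matches the intuition that only a moving, phone-carrying subject can be labeled.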
5. Conclusions
In this paper, we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experimental results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in the camera FOV. Second, we assume that the phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected on these occasions would damage the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)–(d) represent the outdoor near field, (e)–(h) the indoor near field, (i)–(l) the outdoor far field, and (m)–(p) the indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus, a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, 61271041, and 61300179).
Journal of Sensors 5
0
008
007
006
005
004
003
002
001
0
L
01 02 03 04 05 06 07 08 09 1
02
04
06
08
1
12
14
K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
sthr
(a) Outdoor near field
01 02 03 04 05 06 07 08 09 1
sthr
012
01
008
006
004
002
0
15
1
05
0
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
L K
(b) Indoor near field
0005
001
0
0015
002
0025
0930940950960970980991
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(c) Outdoor far field
00005001
0015002
0025003
0035004
0045005
086088090920940960981
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(d) Indoor far field
Figure 4 Tracking performance and performance in the four scenarios with different values of 119889thr and 119904thr
As depicted in Algorithm 1 three parameters 119886thr 119889thrand 119904thr affect 119871 and 119870 119886thr indicates minimum area of apatch that potentially represents a subject Patches with anarea less than 119886thr are filtered out Generally in a specificapplication scenario the value of 119886thr could be figured outempirically In our experiments we set it to 150 which worksfine 119904thr specifies a minimum histogram similarity betweencurrent patch 119901 and potential descriptors of 119901 Each activedescriptor that satisfies this requirement is tested in terms ofhorizontal distance to 119901 119889thr stipulates a distance thresholdto rule out inappropriate alternative descriptors A nearestdescriptor satisfying this threshold is selected to track 119901 if itexits Otherwise we create a new descriptor for 119901 Moreovermany interference factors in the scenario including poorlighting condition similar clothing color to the backgroundincidental shadow of human body and unpredictable motionpattern of subjects like fast turning and crossing would alsopose negative effects to patch tracking process To rule outimpacts of these factors and optimize patch tracking fromeach of the four scenarios we select a representative video clip
and runAlgorithm 1 over the videowith different 119904thr and119889thrResulted 119871 and119870 are illustrated in Figure 4 Extracted framesfrom video clips with labeled patches are listed in Figure 8
Due to different motion patterns of the subjects 119871 mayvary among video clips of different scenarios However fromFigure 4 we can conclude that 119871 drops dramatically when119904thr gt 08 in near field scenario and 119904thr gt 02 in far fieldscenario with 119889thr ge 01This ismainly caused by backgroundsubtraction noises Histogram similarity of patches of thesame subject from two consecutive frames is about 08 innear field in this situation In far field scenarios with relativelysmaller foreground patches the negative impacts becomemore severe and threshold similarity degrades to 02 Patchesof the same subject are associated with different descriptorswhen histogram similarity is beyond these thresholds When119889thr lt 01 the worst case occurs We need to create newdescriptors for patches in every frame as horizontal distancebetween patches of the same subject from two consecutiveframes is mostly beyond this limit As 119889thr increases 119871increases and converges at 119889thr = 5
6 Journal of Sensors
1 2 3 4 5 6 7 8 9 10 11
Stan
dard
dev
iatio
n
Feature sample
p
AB
Figure 5 Standard deviation of accelerations of patch119901 and subjectsA and B
In near field scenarios Algorithm 1 achieves 100 percentaccuracy with whatever 119904thr and 119889thr while in far fieldscenarios it does not perform so perfectly when 119889thr ge 01
and 119904thr le 02 In the experiments we found that thishappened mostly in situations when subjects were close andthe patch of one subject lost in the following frame
To balance 119871 and 119870 we set 119889thr = 5 119904thr = 02run Algorithm 1 over the sixteen video clips and collectmotion measurements for person identification in the fol-lowing experiments Statistics of the obtained descriptors areillustrated in Figure 7
42 Person Identification When motion measurements col-lection from video finished we obtain a set of patch descrip-tors and each descriptor associates with a time series ofacceleration data of a potential subject Some descriptorswithin the set come with short series of motion data usuallyless than ten frames This is possibly caused by subjectscrossing each other fake foreground from flashing lights fastturning of human body moving objects at the edge of cameraFOV and so forth These insufficient and noisy data fail toreflect actual motion pattern of potential subjects and arefiltered out in the first place As shown in Figure 7 there arecomparatively more noisy descriptors in far field scenariosespecially in outdoor far field scenarios where nearly 50percent of descriptors are ruled out in each video
Then we calculate a sequence of motion features for eachdescriptor and compare the feature sequence with each ofthose obtained from smart phones in the same period oftime Sliding window in motion feature calculation is closelyrelated to subjects and application scenarios It should belarge enough to capture the distinctive pattern of subjectmovement but not too large to confuse different ones Inour experiments we set window size to 1 second empiricallyMotion features from an example patch descriptor and thosefrom the two smart phones in the same period are shownin Figures 5 and 6 where we could conclude that patch 119901
represents subject B during its lifetimeThe total number of accurately identified patch descrip-
tors in each video is listed in Figure 7 The proposedmethod achieves comparatively better performance in nearfield environment where we can capture more accurate androbust motion measurements of human bodyThe worst case
1 2 3 4 5 6 7 8 9 10 11Feature sample
Ener
gy
ABp
Figure 6 Energy of accelerations of patch 119901 and subjects A and B
2 4 6 8 10 12 14 16Video
3731
28
38
3026
40
3030
383230
242020
241816
201616
221615
80
5245
82
5046
80
5348
84
5047
40
21
12
40
20
10
41
20
12
40
18
8
Total descriptorsFiltered descriptorsAccurately tagged descriptors
Des
crip
tor c
ount
Figure 7 Obtained descriptors with optimized parameters filtereddescriptors and accurately identified descriptors where 1ndash4 repre-sent the outdoor near field 5ndash8 represent the indoor near field 9ndash12represent the outdoor far field and 13ndash16 represent the indoor farfield
happens in outdoor far field scenario In this case there areless optical flows within each patch and less frames asso-ciated with each descriptor We save the mapping betweenpatch descriptors and their estimated identity and rerunAlgorithm 1with the same parameter configuration as beforeThe obtained patch identity is labeled in the video right afterpatch ID As illustrated in Figure 8 the proposed methodcould maintain comparatively acceptable performance evenunder adverse situations
5 Conclusions
In this paper we propose a novel method for automaticperson identification The method innovatively leveragescorrelation of body motion features from two different sens-ing sources that is accelerometer and camera Experimentresults demonstrate the performance and accuracy of theproposed method However the proposed method is limitedin the following aspects First users have to register andcarry their smart phones in order to be discernable in cameraFOVs Second we assume that phones stay relatively still withhuman body during the experiments but in practice peopletend to take out and check their phones from time to timeAcceleration data collected during these occasions woulddamage the identification accuracy Besides the method
Journal of Sensors 7
(a) (b) (c) (d)
(e) (f) (g) (h)
(i) (j) (k) (l)
(m) (n) (o) (p)
Figure 8 Screenshots of identification results where (a)ndash(d) represent the outdoor near field (e)ndash(h) represent the indoor near field (i)ndash(l)represent the outdoor far field and (m)ndash(p) represent the indoor far field
relies heavily on background subtraction in the process ofpatch trackingThus amore practical and reliable strategy formotion data collection is needed Third subjects in archivedvideo clips without available contextual motion informationcannot be identified using the proposed method Thereforethis method only works at the time of video capture In thefuture we plan to overcome the aforementioned constraintsand extend the application of the proposedmethod intomorecomplex environments
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work was supported in part by the National NaturalScience Foundation of China (Grant no 61202436 Grant no61271041 and Grant no 61300179)
References
[1] P Vadakkepat P Lim L C de Silva L Jing and L L LingldquoMultimodal approach to human-face detection and trackingrdquoIEEE Transactions on Industrial Electronics vol 55 no 3 pp1385ndash1393 2008
[2] C Zhang and Z Zhang ldquoA survey of recent advances in facedetectionrdquo Tech Rep Microsoft Research 2010
[3] C Huang H Ai Y Li and S Lao ldquoHigh-performance rotationinvariant multiview face detectionrdquo IEEE Transactions on Pat-tern Analysis and Machine Intelligence vol 29 no 4 pp 671ndash686 2007
8 Journal of Sensors
[4] JWright A Y Yang A Ganesh S S Sastry and YMa ldquoRobustface recognition via sparse representationrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 31 no 2 pp210ndash227 2009
[5] T Ahonen A Hadid and M Pietikainen ldquoFace descriptionwith local binary patterns application to face recognitionrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol28 no 12 pp 2037ndash2041 2006
[6] G Shakhnarovich and B Moghaddam ldquoFace recognitionin subspacesrdquo in Handbook of Face Recognition pp 19ndash49Springer 2011
[7] I Naseem R Togneri and M Bennamoun ldquoLinear regressionfor face recognitionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 32 no 11 pp 2106ndash2112 2010
[8] L Zhang D V Kalashnikov S Mehrotra and R VaisenbergldquoContext-based person identification framework for smartvideo surveillancerdquoMachine Vision and Applications 2013
[9] X Geng K Smith-Miles L Wang M Li and QWu ldquoContext-aware fusion a case study on fusion of gait and face for humanidentification in videordquo Pattern Recognition vol 43 no 10 pp3660ndash3673 2010
[10] N OrsquoHare and A F Smeaton ldquoContext-aware person identi-fication in personal photo collectionsrdquo IEEE Transactions onMultimedia vol 11 no 2 pp 220ndash228 2009
[11] Z Stone T Zickler and T Darrell ldquoAutotagging FacebookSocial network context improves photo annotationrdquo in Pro-ceedings of the IEEE Computer Society Conference on ComputerVision and Pattern Recognition Workshops (CVPR rsquo08) pp 1ndash8June 2008
[12] D Anguelov K-C Lee S B Gokturk and B SumengenldquoContextual identity recognition in personal photo albumsrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo07) pp 1ndash7June 2007
[13] M Naaman R B Yeh H Garcia-Molina and A PaepckeldquoLeveraging context to resolve identity in photo albumsrdquo inProceedings of the 5th ACMIEEE Joint Conference on DigitalLibrariesmdashDigital Libraries Cyberinfrastructure for Researchand Education pp 178ndash187 June 2005
[14] M Zhao Y W Teo S Liu T S Chua and R Jain ldquoAutomaticperson annotation of family photo albumrdquo in Image and VideoRetrieval pp 163ndash172 Springer 2006
[15] M Piccardi ldquoBackground subtraction techniques a reviewrdquo inProceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC rsquo04) vol 4 pp 3099ndash3104 October2004
[16] Z Zivkovic ldquoImproved adaptive Gaussian mixture model forbackground subtractionrdquo inProceedings of the 17th InternationalConference on Pattern Recognition (ICPR rsquo04) vol 2 pp 28ndash31IEEE August 2004
[17] P KaewTraKulPong and R Bowden ldquoAn improved adaptivebackground mixture model for real-time tracking with shadowdetectionrdquo in Video-Based Surveillance Systems pp 135ndash144Springer 2002
[18] G Farneback ldquoTwo-framemotion estimation based on polyno-mial expansionrdquo in Image Analysis pp 363ndash370 Springer 2003
[19] J R Kwapisz G M Weiss and S A Moore ldquoActivityrecognition using cell phone accelerometersrdquo ACM SIGKDDExplorations Newsletter vol 12 no 2 pp 74ndash82 2011
[20] S Dernbach B Das N C Krishnan B L Thomas and D JCook ldquoSimple and complex activity recognition through smart
phonesrdquo in Proceedings of the 8th International Conference onIntelligent Environments (IE rsquo12) pp 214ndash221 IEEE June 2012
[21] N Ravi N Dandekar P Mysore and M L Littman ldquoActivityrecognition from accelerometer datardquo in Proceedings of the20th National Conference on Artificial Intelligence and the17th Innovative Applications of Artificial Intelligence Conference(AAAIIAAI rsquo05) pp 1541ndash1546 July 2005
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
6 Journal of Sensors
1 2 3 4 5 6 7 8 9 10 11
Stan
dard
dev
iatio
n
Feature sample
p
AB
Figure 5 Standard deviation of accelerations of patch119901 and subjectsA and B
In near field scenarios Algorithm 1 achieves 100 percentaccuracy with whatever 119904thr and 119889thr while in far fieldscenarios it does not perform so perfectly when 119889thr ge 01
and 119904thr le 02 In the experiments we found that thishappened mostly in situations when subjects were close andthe patch of one subject lost in the following frame
To balance 119871 and 119870 we set 119889thr = 5 119904thr = 02run Algorithm 1 over the sixteen video clips and collectmotion measurements for person identification in the fol-lowing experiments Statistics of the obtained descriptors areillustrated in Figure 7
42 Person Identification When motion measurements col-lection from video finished we obtain a set of patch descrip-tors and each descriptor associates with a time series ofacceleration data of a potential subject Some descriptorswithin the set come with short series of motion data usuallyless than ten frames This is possibly caused by subjectscrossing each other fake foreground from flashing lights fastturning of human body moving objects at the edge of cameraFOV and so forth These insufficient and noisy data fail toreflect actual motion pattern of potential subjects and arefiltered out in the first place As shown in Figure 7 there arecomparatively more noisy descriptors in far field scenariosespecially in outdoor far field scenarios where nearly 50percent of descriptors are ruled out in each video
Then we calculate a sequence of motion features for eachdescriptor and compare the feature sequence with each ofthose obtained from smart phones in the same period oftime Sliding window in motion feature calculation is closelyrelated to subjects and application scenarios It should belarge enough to capture the distinctive pattern of subjectmovement but not too large to confuse different ones Inour experiments we set window size to 1 second empiricallyMotion features from an example patch descriptor and thosefrom the two smart phones in the same period are shownin Figures 5 and 6 where we could conclude that patch 119901
represents subject B during its lifetimeThe total number of accurately identified patch descrip-
tors in each video is listed in Figure 7 The proposedmethod achieves comparatively better performance in nearfield environment where we can capture more accurate androbust motion measurements of human bodyThe worst case
1 2 3 4 5 6 7 8 9 10 11Feature sample
Ener
gy
ABp
Figure 6 Energy of accelerations of patch 119901 and subjects A and B
2 4 6 8 10 12 14 16Video
3731
28
38
3026
40
3030
383230
242020
241816
201616
221615
80
5245
82
5046
80
5348
84
5047
40
21
12
40
20
10
41
20
12
40
18
8
Total descriptorsFiltered descriptorsAccurately tagged descriptors
Des
crip
tor c
ount
Figure 7 Obtained descriptors with optimized parameters filtereddescriptors and accurately identified descriptors where 1ndash4 repre-sent the outdoor near field 5ndash8 represent the indoor near field 9ndash12represent the outdoor far field and 13ndash16 represent the indoor farfield
happens in outdoor far field scenario In this case there areless optical flows within each patch and less frames asso-ciated with each descriptor We save the mapping betweenpatch descriptors and their estimated identity and rerunAlgorithm 1with the same parameter configuration as beforeThe obtained patch identity is labeled in the video right afterpatch ID As illustrated in Figure 8 the proposed methodcould maintain comparatively acceptable performance evenunder adverse situations
5 Conclusions
In this paper we propose a novel method for automaticperson identification The method innovatively leveragescorrelation of body motion features from two different sens-ing sources that is accelerometer and camera Experimentresults demonstrate the performance and accuracy of theproposed method However the proposed method is limitedin the following aspects First users have to register andcarry their smart phones in order to be discernable in cameraFOVs Second we assume that phones stay relatively still withhuman body during the experiments but in practice peopletend to take out and check their phones from time to timeAcceleration data collected during these occasions woulddamage the identification accuracy Besides the method
Journal of Sensors 7
[Figure 8: Screenshots of identification results, where (a)–(d) represent the outdoor near field, (e)–(h) the indoor near field, (i)–(l) the outdoor far field, and (m)–(p) the indoor far field.]
relies heavily on background subtraction in the process of patch tracking. Thus, a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method into more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants nos. 61202436, 61271041, and 61300179).
References
[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385–1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671–686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037–2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19–49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106–2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660–3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220–228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1–8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries—Digital Libraries: Cyberinfrastructure for Research and Education, pp. 178–187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163–172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099–3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28–31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135–144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363–370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214–221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541–1546, July 2005.