Research Article: Automatic Person Identification in Camera Video by Motion Correlation
Dingbo Duan,1 Guangyu Gao,2 Chi Harold Liu,2 and Jian Ma1
1 Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Beijing Institute of Technology, Beijing 100081, China
Correspondence should be addressed to Chi Harold Liu; liuchi02@gmail.com
Received 23 February 2014; Revised 13 May 2014; Accepted 13 May 2014; Published 3 June 2014
Academic Editor: Eugenio Martinelli
Copyright © 2014 Dingbo Duan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Person identification plays an important role in semantic analysis of video content. This paper presents a novel method to automatically label persons in video sequences captured by a fixed camera. Instead of leveraging traditional face recognition approaches, we deal with the task of person identification by fusing motion information from sensor platforms, such as smart phones carried on human bodies, with motion information extracted from camera video. More specifically, a sequence of motion features extracted from camera video is compared with each of those collected from the accelerometers of smart phones. When strong correlation is detected, identity information transmitted from the corresponding smart phone is used to identify the phone wearer. To test the feasibility and efficiency of the proposed method, extensive experiments are conducted, which achieved impressive performance.
1 Introduction
With the rapid growth in storage devices, networks, and compression techniques, large-scale video data have become available to more and more ordinary users. Thus, it also becomes a challenging task to search and browse desirable data according to content in large video datasets. Generally, person information is one of the most important semantic clues when people are recalling video contents. Consequently, person identification is crucial for content-based video summary and retrieval.
The main purpose of person identification is to associate each subject that appears in video clips with a real person. However, manually labeling all subjects that appear in a large-scale video archive is labor-intensive, time-consuming, and prohibitively expensive. To deal with this, automatic face detection [1–3] and face recognition (FR) [4–7] were introduced. However, traditional FR methods are still far from supporting practical and reliable automatic person identification, even when just a limited number of people appear in the video. This is mainly because only appearance information (e.g., color, shape, and texture) of a single face image is used to determine the identity of a subject. Specifically, variation in illumination, pose, and facial expression, as well as partial or total face occlusion, can make recognition an extremely difficult task.
The main contributions of the proposed method are as follows. First, this method provides an alternative route to automatic person identification through the integration of a new sensing model. This integration broadens the domain of semantic analysis of video content and will be catalyzed by the growing popularity of wearable devices and concurrent advances in personal sensing technology and ubiquitous computing. Second, the method is fully automatic, with no need for a predefined model or user interaction during the person identification process. Moreover, its independence from any recognition technique makes the proposed method more robust against the issues mentioned above, which degrade the efficiency and accuracy of FR techniques. Last but not least, the simplicity and computational efficiency of the method make it possible to plug it into real-time systems.
2 Related Work
To improve the performance of person identification con-textual information was utilized in recent research Authors
Hindawi Publishing CorporationJournal of SensorsVolume 2014 Article ID 838751 8 pageshttpdxdoiorg1011552014838751
2 Journal of Sensors
in [8] proposed a framework exploiting heterogeneouscontextual information including clothing activity humanattributes gait and people cooccurrence together with facialfeatures to recognize a person in low quality video dataNevertheless it suffers the difficulty in discerning multiplepersons resembling each other in clothing color or actionView angle and subject-to-camera distance were integratedto identify person in video by fusion of gait and face in[9] only in situations when people walk along a straightpath with five quantized angles Temporal spatial and socialcontext information was also employed in conjunction withlow level feature analysis to annotate person in personal andfamily photo collections [10ndash14] in which only static imagesare dealt with Moreover in all these methods a predefinedmodel has to be trained to start the identification process andthe performance is limited by the quality and scale of trainingsets
In contrast to the above efforts, we propose a novel method to automatically identify persons in video using human motion patterns. We argue that, in the field of view (FOV) of a fixed camera, the motion pattern of each human body is unique. Under this assumption, in addition to visual analysis, we also analyze the motion pattern of the human body measured by sensor modules in smart phones. In this paper, we use smart phones equipped with 3-axis accelerometers, carried on human bodies, to collect and transmit acceleration information and identity information. By analyzing the correlation between motion features extracted from these two different types of sensing, the problem of person identification is handled simply and accurately.
The remainder of the paper is organized as follows. Section 3 details the proposed method. In Section 4, experiments are conducted and results are discussed. Concluding remarks are placed in Section 5.
3 General Framework
A flowchart of the proposed method is depicted in Figure 1. As can be seen, visual features of the human body are first extracted to track people across different video frames. Then, optical flows of potential human bodies are estimated and segmented using the previously obtained body features. Meanwhile, accelerometer measurements from smart phones on human bodies are transmitted and collected together with identity information. Motion features are calculated from both optical flow and acceleration measurements in a sliding-window style, as described in Section 3.3. When people disappear from the video sequence, correlation analysis starts the annotation process. Details of the method are illustrated in the following subsections.
3.1. Camera Data Acquisition. First of all, background subtraction (BGS), which is widely adopted for moving object detection in video, is utilized in our method. The main idea of BGS is to detect moving objects from the difference between the current frame and a reference frame, often called the "background image" or "background model" [15]. In this subsection, we need to detect image patches corresponding to potential human bodies moving around in the camera FOV. To this end, an algorithm of adaptive Gaussian mixture model [16, 17] is employed to segment foreground patches. This algorithm represents each pixel by a mixture of Gaussians to build a robust background model at run time.

Figure 1: Flowchart of the proposed method (raw frame → visual feature extraction, optical flow estimation, and motion feature calculation, fused with acceleration data for person identification).
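As a rough illustration of the idea behind BGS, the sketch below maintains a single running Gaussian per pixel rather than the full adaptive mixture of [16, 17]; a pixel far from its background model is classified as foreground. The class name, parameters, and thresholds are illustrative, not from the paper.

```python
import numpy as np

class RunningGaussianBackground:
    """Simplified per-pixel background model: one running Gaussian per
    pixel (a hedged sketch of the mixture-of-Gaussians idea -- pixels
    far from the background model are marked as foreground)."""

    def __init__(self, alpha=0.05, k=2.5):
        self.alpha, self.k = alpha, k  # learning rate, deviation threshold
        self.mean = None
        self.var = None

    def apply(self, frame):
        """Return a boolean foreground mask for a grayscale frame."""
        f = frame.astype(np.float64)
        if self.mean is None:                    # first frame initializes the model
            self.mean = f.copy()
            self.var = np.full_like(f, 25.0)
            return np.zeros(frame.shape, dtype=bool)
        d = f - self.mean
        fg = np.abs(d) > self.k * np.sqrt(self.var)
        upd = ~fg                                # update only background pixels
        self.mean[upd] += self.alpha * d[upd]
        self.var[upd] += self.alpha * (d[upd] ** 2 - self.var[upd])
        return fg
```

Connected foreground regions of this mask would then be grouped into candidate patches; the paper's actual implementation uses the OpenCV mixture-model subtractors.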
When people enter the camera FOV, image patches corresponding to potential human bodies are extracted and tracked by descriptors composed of patch ID, color histogram, and patch mass center, as in Algorithm 1. Moreover, we also include the frame index of the first and last appearance of each patch in the descriptor, in order to facilitate person annotation.
For a patch $p$ obtained from BGS, we try to associate $p$ with previous patch descriptors. Histogram similarity between patches from consecutive frames is analyzed first. Normally, image patches corresponding to the same subject are more similar to each other than those of different subjects. The comparison of color histograms of patches used in Algorithm 1 is defined in (1). The range of $s(H_a, H_b)$ is $[-1, 1]$; the larger $s(H_a, H_b)$, the more similar patches $a$ and $b$. Then, from the set of similar descriptors of $p$, the nearest one in terms of horizontal movement of the patch center is selected to track $p$:

$$s(H_a, H_b) = \frac{\sum_{i=1}^{N} \left(H_a(i) - \bar{H}_a\right)\left(H_b(i) - \bar{H}_b\right)}{\sqrt{\sum_{i=1}^{N} \left(H_a(i) - \bar{H}_a\right)^2 \sum_{i=1}^{N} \left(H_b(i) - \bar{H}_b\right)^2}}, \quad (1)$$

$$\bar{H} = \frac{1}{N} \sum_{i=1}^{N} H(i), \quad (2)$$

where $N$ is the number of bins in histogram $H$.

For each patch $p$, we employ an optical flow method [18] to estimate the motion pattern and approximate the patch acceleration as the mean of the vertical acceleration of the keypoints within it, as defined in

$$y\_\mathrm{acc}_p = \frac{1}{M} \sum_{i=1}^{M} y\_\mathrm{acc}_i, \quad (3)$$

where $y\_\mathrm{acc}_i$ is the second-order derivative of the $y$ coordinate of keypoint $i$ with respect to time, and $M$ is the total number of keypoints within patch $p$.
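Equation (1) is the Pearson correlation of the two histograms; a minimal sketch (function name is illustrative):

```python
import numpy as np

def hist_similarity(ha, hb):
    """Histogram similarity s(H_a, H_b) of Eq. (1): the Pearson
    correlation of two color histograms, with range [-1, 1]."""
    ha, hb = np.asarray(ha, float), np.asarray(hb, float)
    da, db = ha - ha.mean(), hb - hb.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))
```

Identical (or linearly scaled) histograms score 1.0; a ramp and its reverse score -1.0, matching the stated range of $s(H_a, H_b)$.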
Variables:
pDesc: patch descriptor, pDesc = {id, frameStart, frameEnd, center, hist}
pDescs: an array of patch descriptors
pCount: patch descriptor counter, initialized to zero
frameIdx: frame counter, initialized to zero
id: the ID of a patch
frameStart, frameEnd: frame index of first and last appearance of a patch
center, hist, accs: center, color histogram, and acceleration series of a patch
s_thr, d_thr, a_thr: thresholds for histogram similarity, patch distance, and patch area; a_thr ≥ 0, d_thr ≥ 0, 0 ≤ s_thr ≤ 1

Procedure:
(1) Grab a video frame; frameIdx = frameIdx + 1
(2) Optical flow estimation
(3) Background subtraction
(4) for each patch in current frame do
(5)   Calculate pArea, pCenter, pHist, pWidth
(6)   if pArea < a_thr then
(7)     continue
(8)   end if
(9)   pDescs* = ∅
(10)  for all pDesc ∈ pDescs do
(11)    if pDesc.frameEnd + 1 == frameIdx and s(pHist, pDesc.hist) ≥ s_thr then
(12)      pDescs* = {pDesc} ∪ pDescs*
(13)    end if
(14)  end for
(15)  d_min = d_thr · pWidth; pDesc_min = null
(16)  for all pDesc ∈ pDescs* do
(17)    d_p = |pCenter.x − pDesc.center.x|
(18)    if d_p < d_min then
(19)      d_min = d_p; pDesc_min = pDesc
(20)    end if
(21)  end for
(22)  if pDesc_min is null then
(23)    pDesc_min = {pCount, frameIdx, frameIdx, pCenter, pHist};
        pDescs = pDescs ∪ {pDesc_min};
        pCount = pCount + 1
(24)  else
(25)    pDesc_min.frameEnd = frameIdx;
        pDesc_min.center = pCenter;
        pDesc_min.hist = pHist
(26)  end if
(27)  Calculate and save vertical acceleration for pDesc_min
(28) end for

Algorithm 1: Patch tracking and motion estimation.
Pseudocode of patch tracking and motion estimation islisted in Algorithm 1
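The matching loop of Algorithm 1 (steps (9)–(26)) can be sketched as follows. This is an illustrative rendering: descriptor fields are dict keys chosen for readability, and `np.corrcoef` stands in for the histogram similarity of Eq. (1).

```python
import numpy as np

def associate_patch(patch, descs, frame_idx, s_thr=0.2, d_thr=5.0):
    """One iteration of the matching loop of Algorithm 1 (sketch).
    `patch` is a dict with 'center' (x, y), 'hist', and 'width';
    `descs` is the list of existing descriptors, mutated in place."""
    best, d_min = None, d_thr * patch["width"]
    for desc in descs:
        # candidate set pDescs*: active in the previous frame, similar histogram
        if desc["frame_end"] + 1 != frame_idx:
            continue
        if np.corrcoef(patch["hist"], desc["hist"])[0, 1] < s_thr:
            continue
        d = abs(patch["center"][0] - desc["center"][0])  # horizontal distance
        if d < d_min:
            best, d_min = desc, d
    if best is None:   # no match within d_thr * pWidth: create a new descriptor
        best = {"id": len(descs), "frame_start": frame_idx,
                "frame_end": frame_idx, "center": patch["center"],
                "hist": patch["hist"]}
        descs.append(best)
    else:              # match: extend the tracked descriptor
        best["frame_end"] = frame_idx
        best["center"] = patch["center"]
        best["hist"] = patch["hist"]
    return best
```

A nearby, similar patch in the next frame extends an existing descriptor; a distant one spawns a new descriptor, exactly the branch structure of steps (22)–(26).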
3.2. Accelerometer Measurements Collection. In this subsection, we depict the procedure of acceleration measurement collection using wearable sensors. Android smart phones equipped with 3-axis accelerometers are utilized as sensing platforms. Of the three component accelerometer readings, only the one with the largest absolute mean value is analyzed in our experiments, as it best reflects the vertical motion pattern of the human body. Three different placements are tested and compared in order to assess the impact of phone placement on the accuracy of motion collection. In each test, a participant performs a set of activities randomly, including standing, walking, and jumping, while carrying three smart phones on the body: two phones placed in the chest pocket and jacket side pocket, respectively, and one attached to the waist belt, as shown in Figure 2. Results illustrated in Figure 3 qualitatively show that all three types of placement can correctly capture the vertical motion feature of the participant, with minor, acceptable discrepancy. This test makes the choice of phone attachment more flexible and unobtrusive.
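The axis-selection rule above (largest absolute mean component, since gravity dominates the vertical axis) can be sketched as (function name is illustrative):

```python
import numpy as np

def vertical_axis(samples):
    """Pick the accelerometer component to analyze: the axis with the
    largest absolute mean reading, which (gravity included) best
    reflects the body's vertical motion.
    `samples` is an (n, 3) array-like of x/y/z accelerometer readings."""
    samples = np.asarray(samples, float)
    axis = int(np.argmax(np.abs(samples.mean(axis=0))))
    return axis, samples[:, axis]
```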
3.3. Feature Extraction and Person Identification. Noisy raw motion measurements with different sample frequencies, previously obtained from different sensor sources, cannot be
Figure 2: Attachment of smart phones to the human body. From left to right: jacket side pocket, chest pocket, and belt attachment.
Figure 3: Acceleration measurements from the three ways of phone attachment (belt, chest, jacket) during jumping, walking, and standing.
compared directly. Instead, standard deviation and energy [19, 20] are employed as motion features for comparison, after noise suppression and data cleansing. Energy is defined as the sum of squared discrete FFT component magnitudes of the data samples, divided by the sample count for normalization. These features are computed in a sliding window of length $t_w$, with $t_w/2$ overlap between consecutive windows. Feature extraction on sliding windows with 50 percent overlap has demonstrated its success in [21].

To find out whether $p$ represents a human body, correlation analysis is conducted. As a matter of fact, motion features extracted from video frames are supposed to be positively linearly related to those from accelerometer measurements of the same subject. We adopt the correlation coefficient to reliably measure the strength of this linear relationship, as defined in

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}, \quad (4)$$

where $X$ and $Y$ are the motion features to be compared, $\operatorname{cov}(X, Y)$ is their covariance, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$. $\rho$ ranges from $-1$ to $1$ inclusive, where $0$ indicates no linear relationship, $+1$ a perfect positive linear relationship, and $-1$ a perfect negative linear relationship. The larger $\rho(X, Y)$, the more correlated $X$ and $Y$. In our case, motion features of $p$ are compared with each of those extracted from smart phones in the same period of time. Identity information of the smart phone corresponding to the largest positive correlation coefficient is used to identify $p$.
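The pipeline of Section 3.3 — windowed standard-deviation/energy features, then correlation-based matching per Eq. (4) — can be sketched as follows. Function names are illustrative; for brevity the matcher correlates only the standard-deviation sequences, whereas the paper compares both feature types.

```python
import numpy as np

def motion_features(signal, win):
    """Standard deviation and FFT energy per sliding window of length
    `win`, with 50% (win/2) overlap between consecutive windows."""
    feats = []
    for start in range(0, len(signal) - win + 1, win // 2):
        w = np.asarray(signal[start:start + win], float)
        std = w.std()
        # energy: sum of squared FFT magnitudes, normalized by sample count
        energy = (np.abs(np.fft.fft(w)) ** 2).sum() / win
        feats.append((std, energy))
    return np.array(feats)

def identify(patch_feats, phone_feats_by_id):
    """Label the patch with the phone whose feature sequence has the
    largest positive correlation coefficient, Eq. (4). Here only the
    standard-deviation column (index 0) is correlated, for brevity."""
    best_id, best_rho = None, 0.0
    for pid, feats in phone_feats_by_id.items():
        n = min(len(patch_feats), len(feats))
        rho = np.corrcoef(patch_feats[:n, 0], feats[:n, 0])[0, 1]
        if rho > best_rho:
            best_id, best_rho = pid, rho
    return best_id, best_rho
```

A patch whose camera-derived motion tracks one phone's accelerometer signal (up to scale) correlates near +1 with that phone's features and is labeled with its identity.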
4 Experiments and Discussions
In this section, we conduct detailed experiments in various situations to optimize Algorithm 1 and evaluate the proposed person identification algorithm. We use a digital camera and two Android smart phones for data collection. A simple GUI application is created to start and stop data collection on the phones. Acceleration measurements are recorded and saved in text files on the phone SD card and later accessed via USB. Video clips are recorded as mp4 files at a resolution of 640 × 480, 15 frames per second. The timestamps of video frames and accelerometer readings are synchronized before the experiment. Algorithm 1 is implemented based on the OpenCV library and tested on an Intel 3.4 GHz platform running Ubuntu 13.04. We recruit two participants, labeled A and B, respectively, to take part in our experiments, and place the smart phones in their jacket side pockets. We choose four different scenarios for our experiments: outdoor near field, outdoor far field, indoor near field, and indoor far field, as illustrated in Figure 8. In near field situations, the subjects moved around within a scope about five meters from the camera; the silhouette height of the human body is not less than half of the image height, and the human face can be clearly distinguished. In far field situations, the subjects moved around about twenty meters away, where detailed visual features of the human body are mostly lost and body height in the image is not more than thirty pixels. In each scenario, we repeated the experiment four times, each lasting about five minutes. In all, we collected sixteen video clips and thirty-two text files of acceleration measurements.
4.1. Tracking Optimization. Patch tracking is an essential step for motion estimation from camera video and directly affects the accuracy and robustness of subsequent person identification. As listed in Algorithm 1, the aim of patch tracking is to estimate motion measurements for each patch that appears in video frames. In the ideal case, a subject is continuously tracked in camera video by only one descriptor during the whole experiment, and we can extract a sequence of acceleration measurements closest, in time duration, to that collected from the smart phone. In the worst case, we have to create new descriptors for all patches in each frame, and the number of descriptors used for tracking a subject is as large as the number of frames in which the subject appears. We present a metric in (5) to measure the performance of Algorithm 1. The metric $L(i)$ is defined as the ratio between the number of subjects in a video clip and the number of descriptors used for tracking the subjects. The range of $L(i)$ is $(0, 1]$; the larger $L(i)$, the better the tracking performance. Moreover, we also provide a metric to evaluate tracking accuracy, as shown in (6). An accurate descriptor is one that tracks only one subject during its lifetime. The larger $K(i)$, the more accurate Algorithm 1:

$$L(i) = \frac{\text{subjects in video } i}{\text{descriptors in } i}, \quad (5)$$

$$K(i) = \frac{\text{accurate descriptors in video } i}{\text{descriptors in } i}. \quad (6)$$
Figure 4: Tracking performance $L$ and accuracy $K$ in the four scenarios with different values of $d_{\mathrm{thr}}$ (0.01 to 5) and $s_{\mathrm{thr}}$ (0.1 to 1): (a) outdoor near field, (b) indoor near field, (c) outdoor far field, (d) indoor far field.
As depicted in Algorithm 1, three parameters, $a_{\mathrm{thr}}$, $d_{\mathrm{thr}}$, and $s_{\mathrm{thr}}$, affect $L$ and $K$. $a_{\mathrm{thr}}$ indicates the minimum area of a patch that potentially represents a subject; patches with an area less than $a_{\mathrm{thr}}$ are filtered out. Generally, in a specific application scenario, the value of $a_{\mathrm{thr}}$ can be determined empirically; in our experiments, we set it to 150, which works fine. $s_{\mathrm{thr}}$ specifies the minimum histogram similarity between the current patch $p$ and potential descriptors of $p$; each active descriptor that satisfies this requirement is tested in terms of horizontal distance to $p$. $d_{\mathrm{thr}}$ stipulates a distance threshold to rule out inappropriate alternative descriptors; the nearest descriptor satisfying this threshold is selected to track $p$, if it exists. Otherwise, we create a new descriptor for $p$. Moreover, many interference factors in the scenario, including poor lighting conditions, clothing color similar to the background, incidental shadows of the human body, and unpredictable motion patterns of subjects such as fast turning and crossing, can also negatively affect the patch tracking process. To rule out the impact of these factors and optimize patch tracking, we select a representative video clip from each of the four scenarios and run Algorithm 1 over the video with different $s_{\mathrm{thr}}$ and $d_{\mathrm{thr}}$. The resulting $L$ and $K$ are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are listed in Figure 8.

Due to different motion patterns of the subjects, $L$ may vary among video clips of different scenarios. However, from Figure 4 we can conclude that $L$ drops dramatically when $s_{\mathrm{thr}} > 0.8$ in near field scenarios and $s_{\mathrm{thr}} > 0.2$ in far field scenarios, with $d_{\mathrm{thr}} \geq 0.1$. This is mainly caused by background subtraction noise. The histogram similarity of patches of the same subject from two consecutive frames is about 0.8 in near field in this situation. In far field scenarios, with relatively smaller foreground patches, the negative impacts become more severe, and the similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the histogram similarity falls below these thresholds. When $d_{\mathrm{thr}} < 0.1$, the worst case occurs: we need to create new descriptors for patches in every frame, as the horizontal distance between patches of the same subject from two consecutive frames is mostly beyond this limit. As $d_{\mathrm{thr}}$ increases, $L$ increases and converges at $d_{\mathrm{thr}} = 5$.
Figure 5: Standard deviation of accelerations of patch $p$ and subjects A and B over the feature samples.
In near field scenarios, Algorithm 1 achieves 100 percent accuracy with any $s_{\mathrm{thr}}$ and $d_{\mathrm{thr}}$, while in far field scenarios it does not perform so perfectly when $d_{\mathrm{thr}} \geq 0.1$ and $s_{\mathrm{thr}} \leq 0.2$. In the experiments, we found that this happened mostly in situations where subjects were close to each other and the patch of one subject was lost in the following frame.

To balance $L$ and $K$, we set $d_{\mathrm{thr}} = 5$ and $s_{\mathrm{thr}} = 0.2$, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from video is finished, we obtain a set of patch descriptors, and each descriptor is associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually fewer than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion pattern of potential subjects and are filtered out in the first place. As shown in Figure 7, there are comparatively more noisy descriptors in far field scenarios, especially outdoor far field, where nearly 50 percent of descriptors are ruled out in each video.
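The pre-filtering step above amounts to dropping descriptors whose lifetime is too short; a minimal sketch, with the ten-frame cutoff taken from the text and the field names illustrative:

```python
def filter_noisy(descs, min_frames=10):
    """Drop descriptors whose motion series is too short (fewer than
    min_frames frames) to reflect a real subject's motion pattern."""
    return [d for d in descs
            if d["frame_end"] - d["frame_start"] + 1 >= min_frames]
```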
Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window in motion feature calculation is closely related to the subjects and application scenarios: it should be large enough to capture the distinctive pattern of subject movement, but not so large as to confuse different subjects. In our experiments, we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch $p$ represents subject B during its lifetime.

The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in near field environments, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch $p$ and subjects A and B over the feature samples.
Figure 7: Obtained descriptors with optimized parameters, filtered descriptors, and accurately identified descriptors per video, where videos 1–4 represent the outdoor near field, 5–8 the indoor near field, 9–12 the outdoor far field, and 13–16 the indoor far field.
happens in the outdoor far field scenario. In this case, there are fewer optical flows within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before. The obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse situations.
5 Conclusions
In this paper, we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experiment results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in camera FOVs. Second, we assume that phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected during these occasions would damage the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)–(d) represent the outdoor near field, (e)–(h) the indoor near field, (i)–(l) the outdoor far field, and (m)–(p) the indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus, a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, no. 61271041, and no. 61300179).
References
[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385–1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671–686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037–2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19–49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106–2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660–3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220–228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1–8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries—Digital Libraries: Cyberinfrastructure for Research and Education, pp. 178–187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163–172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099–3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28–31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135–144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363–370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214–221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541–1546, July 2005.
Journal of Sensors
The work in [8] proposed a framework exploiting heterogeneous contextual information, including clothing, activity, human attributes, gait, and people cooccurrence, together with facial features, to recognize a person in low-quality video data. Nevertheless, it suffers from the difficulty of discerning multiple persons who resemble each other in clothing color or action. View angle and subject-to-camera distance were integrated to identify persons in video by fusing gait and face in [9], but only in situations where people walk along a straight path at five quantized angles. Temporal, spatial, and social context information was also employed in conjunction with low-level feature analysis to annotate persons in personal and family photo collections [10-14], although only static images are dealt with there. Moreover, in all these methods a predefined model has to be trained to start the identification process, and the performance is limited by the quality and scale of the training sets.
In contrast to the above efforts, we propose a novel method to automatically identify persons in video using human motion patterns. We argue that, in the field of view (FOV) of a fixed camera, the motion pattern of each human body is unique. Under this assumption, in addition to visual analysis, we also analyze the motion pattern of the human body as measured by sensor modules in smart phones. In this paper, we use smart phones equipped with 3-axis accelerometers, carried on human bodies, to collect and transmit acceleration and identity information. By analyzing the correlation between motion features extracted from these two different types of sensing, the problem of person identification is handled simply and accurately.
The remainder of the paper is organized as follows. Section 3 details the proposed method. In Section 4, experiments are conducted and results are discussed. Concluding remarks are given in Section 5.
3. General Framework
A flowchart of the proposed method is depicted in Figure 1. As can be seen, visual features of the human body are first extracted to track people across video frames. Then, optical flows of potential human bodies are estimated and segmented using the previously obtained body features. Meanwhile, accelerometer measurements from smart phones on human bodies are transmitted and collected together with identity information. Motion features are calculated from both optical flow and acceleration measurements in a sliding-window style, as described in Section 3.3. When people disappear from the video sequence, correlation analysis starts the annotation process. Details of the method are given in the following subsections.
3.1. Camera Data Acquisition. First of all, background subtraction (BGS), which is widely adopted for moving object detection in video, is utilized in our method. The main idea of BGS is to detect moving objects from the difference between the current frame and a reference frame, often called the "background image" or "background model" [15]. In this subsection, we need to detect image patches corresponding to
Figure 1: Flowchart of the proposed method (raw frames feed visual feature extraction and optical flow estimation; motion features calculated from both video and phone acceleration drive person identification).
potential human bodies moving around in the camera FOV. To this end, an adaptive Gaussian mixture model [16, 17] is employed to segment foreground patches. This algorithm represents each pixel by a mixture of Gaussians to build a robust background model at run time.
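The per-pixel idea can be sketched with a single running Gaussian per pixel. The following is a minimal NumPy stand-in, not the paper's implementation: the class and parameter names are ours, and a full adaptive mixture model [16, 17] keeps several Gaussians per pixel rather than one.

```python
import numpy as np

class RunningGaussianBGS:
    """Single running Gaussian per pixel (simplified stand-in for a mixture)."""

    def __init__(self, alpha=0.05, k=2.5):
        self.alpha = alpha      # learning rate of the background model
        self.k = k              # foreground threshold in standard deviations
        self.mean = None
        self.var = None

    def apply(self, frame):
        f = frame.astype(np.float64)
        if self.mean is None:   # bootstrap the model from the first frame
            self.mean, self.var = f, np.full_like(f, 25.0)
            return np.zeros(frame.shape, dtype=bool)
        d = f - self.mean
        fg = np.abs(d) > self.k * np.sqrt(self.var)   # foreground mask
        # update the model only where the pixel still looks like background
        a = np.where(fg, 0.0, self.alpha)
        self.mean += a * d
        self.var += a * (d * d - self.var)
        return fg

bgs = RunningGaussianBGS()
static = np.full((4, 4), 100.0)
for _ in range(10):
    bgs.apply(static)                  # learn a static background
moved = static.copy()
moved[1, 1] = 200.0                    # one pixel changes between frames
mask = bgs.apply(moved)                # only the changed pixel is flagged
```

Connected regions of such a foreground mask are the candidate body patches passed to tracking.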
When people enter the camera FOV, image patches corresponding to potential human bodies are extracted and tracked by descriptors composed of a patch ID, color histogram, and patch mass center, as in Algorithm 1. Moreover, we also include in the descriptor the frame index of the first and last appearance of each patch, in order to facilitate person annotation.
For a patch p obtained from BGS, we try to associate p with previous patch descriptors. Histogram similarity between patches from consecutive frames is analyzed first. Normally, image patches corresponding to the same subject are more similar to each other than those of different subjects. The comparison of color histograms of patches used in Algorithm 1 is defined in (1). The range of s(H_a, H_b) is [-1, 1]; the larger s(H_a, H_b), the more similar patches a and b. Then, from the set of similar descriptors of p, the nearest one in terms of horizontal movement of the patch center is selected to track p:

$$ s(H_a, H_b) = \frac{\sum_{i=1}^{N} (H_a(i) - \bar{H}_a)(H_b(i) - \bar{H}_b)}{\sqrt{\sum_{i=1}^{N} (H_a(i) - \bar{H}_a)^2 \sum_{i=1}^{N} (H_b(i) - \bar{H}_b)^2}} \quad (1) $$

$$ \bar{H} = \frac{1}{N} \sum_{i=1}^{N} H(i) \quad (2) $$

where N is the number of bins in histogram H.

For each patch p, we employ the optical flow method [18] to estimate its motion pattern and approximate the patch acceleration as the mean of the vertical accelerations of the keypoints within it, as defined in (3):

$$ y\_acc_p = \frac{1}{M} \sum_{i=1}^{M} y\_acc_i \quad (3) $$

where y_acc_i is the second-order derivative of the y coordinate of keypoint i with respect to time and M is the total number of keypoints within patch p.
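Equations (1)-(3) transcribe directly into code. The sketch below is ours (function names are assumptions, not from the paper); (3) is approximated with a second finite difference of the keypoint y tracks.

```python
import numpy as np

def hist_similarity(ha, hb):
    """Normalized correlation of two color histograms, eq. (1), using the
    bin means of eq. (2). Returns a value in [-1, 1]."""
    ha, hb = np.asarray(ha, float), np.asarray(hb, float)
    da, db = ha - ha.mean(), hb - hb.mean()
    return float(np.sum(da * db) / np.sqrt(np.sum(da**2) * np.sum(db**2)))

def patch_vertical_acc(y, dt):
    """Mean vertical acceleration of a patch, eq. (3): the second time
    derivative of each keypoint's y coordinate, averaged over the M keypoints.
    y has shape (M, frames); returns one value per interior frame."""
    acc = np.diff(y, n=2, axis=1) / dt**2   # second finite difference
    return acc.mean(axis=0)

h = np.array([10.0, 20.0, 30.0, 40.0])
assert abs(hist_similarity(h, 2 * h + 5) - 1.0) < 1e-12   # same shape -> +1
assert abs(hist_similarity(h, h[::-1]) + 1.0) < 1e-12     # reversed  -> -1

y = np.array([[0.0, 1.0, 4.0, 9.0, 16.0]])   # y = t^2 sampled at dt = 1
assert np.allclose(patch_vertical_acc(y, dt=1.0), 2.0)
```

Note that (1) is invariant to affine scaling of the histograms, which is why it tolerates global brightness changes between frames.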
Variables:
  pDesc: patch descriptor, pDesc = {id, frameStart, frameEnd, center, hist}
  pDescs: an array of patch descriptors
  pCount: patch descriptor counter, initialized to zero
  frameIdx: frame counter, initialized to zero
  id: the ID of a patch
  frameStart, frameEnd: frame index of first and last appearance of a patch
  center, hist, accs: center, color histogram, and accelerations of a patch
  s_thr, d_thr, a_thr: thresholds for histogram similarity, patch distance, and patch area; a_thr >= 0, d_thr >= 0, 0 <= s_thr <= 1
Procedure:
  (1)  Grab a video frame; frameIdx = frameIdx + 1
  (2)  Optical flow estimation
  (3)  Background subtraction
  (4)  for each patch in current frame do
  (5)    Calculate pArea, pCenter, pHist, pWidth
  (6)    if pArea < a_thr then
  (7)      continue
  (8)    end if
  (9)    pDescs* = EMPTY
  (10)   for all pDesc in pDescs do
  (11)     if pDesc.frameEnd + 1 == frameIdx and s(pHist, pDesc.hist) >= s_thr then
  (12)       pDescs* = {pDesc} UNION pDescs*
  (13)     end if
  (14)   end for
  (15)   d_min = d_thr * pWidth; pDesc_min = null
  (16)   for all pDesc in pDescs* do
  (17)     d_p = |pCenter.x - pDesc.center.x|
  (18)     if d_p < d_min then
  (19)       d_min = d_p; pDesc_min = pDesc
  (20)     end if
  (21)   end for
  (22)   if pDesc_min is null then
  (23)     pDesc_min = {pCount, frameIdx, frameIdx, pCenter, pHist}; pDescs = pDescs UNION {pDesc_min}; pCount = pCount + 1
  (24)   else
  (25)     pDesc_min.frameEnd = frameIdx; pDesc_min.center = pCenter; pDesc_min.hist = pHist
  (26)   end if
  (27)   Calculate and save vertical acceleration for pDesc_min
  (28) end for

Algorithm 1: Patch tracking and motion estimation.
Pseudocode of patch tracking and motion estimation is listed in Algorithm 1.
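The association step of Algorithm 1 (steps (9)-(26)) can be sketched compactly. The names below are ours, and the similarity function is passed in so any histogram comparison (e.g., eq. (1)) can be plugged in; this is a sketch, not the paper's implementation.

```python
import numpy as np

def associate(patch, descs, frame_idx, sim, s_thr=0.2, d_thr=5.0):
    """Attach a foreground patch to an existing descriptor, or open a new one.
    Candidates must have been active in the previous frame and pass the
    histogram-similarity threshold; the nearest in horizontal center distance
    (within d_thr * patch width) wins."""
    candidates = [d for d in descs
                  if d["frameEnd"] + 1 == frame_idx
                  and sim(patch["hist"], d["hist"]) >= s_thr]
    d_min, best = d_thr * patch["width"], None
    for d in candidates:                      # steps (16)-(21)
        dist = abs(patch["center"][0] - d["center"][0])
        if dist < d_min:
            d_min, best = dist, d
    if best is None:                          # step (23): new descriptor
        best = {"id": len(descs), "frameStart": frame_idx,
                "frameEnd": frame_idx, "center": patch["center"],
                "hist": patch["hist"], "width": patch["width"]}
        descs.append(best)
    else:                                     # step (25): extend the track
        best["frameEnd"] = frame_idx
        best["center"], best["hist"] = patch["center"], patch["hist"]
    return best

sim = lambda a, b: float(np.corrcoef(a, b)[0, 1])
descs = []
p1 = {"hist": np.array([1.0, 2.0, 3.0, 4.0]), "center": (100, 50), "width": 20}
d1 = associate(p1, descs, frame_idx=1, sim=sim)
p2 = {"hist": np.array([1.0, 2.0, 3.1, 4.0]), "center": (102, 50), "width": 20}
d2 = associate(p2, descs, frame_idx=2, sim=sim)
assert d2 is d1 and d1["frameEnd"] == 2 and len(descs) == 1
```

Scaling the distance gate by patch width, as in step (15), makes the tolerance adapt to subject size: far-field subjects produce narrow patches and hence tighter gates.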
3.2. Accelerometer Measurement Collection. In this subsection, we describe the procedure for collecting acceleration measurements with wearable sensors. Android smart phones equipped with 3-axis accelerometers are used as sensing platforms. Of the three accelerometer components, only the one with the largest absolute mean value is analyzed in our experiments, because it best reflects the vertical motion pattern of the human body. Three different placements are tested and compared in order to assess the impact of phone placement on the accuracy of motion collection. In each test, a participant randomly performs a set of activities, including standing, walking, and jumping, while carrying three smart phones on the body: two placed in the chest pocket and jacket side pocket, respectively, and one attached to the waist belt, as shown in Figure 2. The results illustrated in Figure 3 qualitatively show that all three placements correctly capture the vertical motion feature of the participant, with minor, acceptable discrepancies. This test makes the choice of phone attachment more flexible and unobtrusive.
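Selecting the component with the largest absolute mean is a one-liner: gravity dominates that axis, so it tracks vertical body motion best. A small sketch with our own naming and synthetic readings:

```python
import numpy as np

def vertical_axis(samples):
    """samples: (n, 3) array of x, y, z accelerometer readings in m/s^2.
    Returns the index of the gravity-dominated axis and its signal."""
    axis = int(np.argmax(np.abs(samples.mean(axis=0))))
    return axis, samples[:, axis]

# Synthetic readings for a phone carried roughly upright: gravity (~9.8) on y.
rng = np.random.default_rng(0)
acc = rng.normal(loc=[0.3, 9.8, 0.5], scale=0.2, size=(100, 3))
axis, signal = vertical_axis(acc)
assert axis == 1 and signal.shape == (100,)
```

Taking the absolute mean (rather than the mean) keeps the selection correct when the phone is carried upside down in a pocket.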
3.3. Feature Extraction and Person Identification. Noisy raw motion measurements of different sample frequencies, previously obtained from different sensor sources, cannot be
Figure 2: Attachment of smart phones to the human body. From left to right: jacket side pocket, chest pocket, and belt attachment.
Figure 3: Acceleration measurements over 30 s from the three ways of phone attachment (belt, chest, jacket) during jumping, walking, and standing.
compared directly. Instead, standard deviation and energy [19, 20] are employed as motion features for comparison, after noise suppression and data cleansing. Energy is defined as the sum of squared discrete FFT component magnitudes of the data samples, divided by the sample count for normalization. These features are computed in a sliding window of length t_w with t_w/2 overlap between consecutive windows; feature extraction on sliding windows with 50 percent overlap has demonstrated its success in [21].

To find out whether p represents a human body, correlation analysis is conducted. As a matter of fact, motion features extracted from video frames are supposed to be positively linearly related to those from accelerometer measurements of the same subject. We adopt the correlation coefficient to reliably measure the strength of this linear relationship, as defined in (4):

$$ \rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \quad (4) $$

where X and Y are the motion features to be compared, cov(X, Y) is their covariance, and sigma_X and sigma_Y are the standard deviations of X and Y. rho ranges from -1 to 1 inclusively, where 0 indicates no linear relationship, +1 a perfect positive linear relationship, and -1 a perfect negative linear relationship. The larger rho(X, Y), the more correlated X and Y. In our case, the motion features of p are compared with each of those extracted from the smart phones in the same period of time. The identity information of the smart phone with the largest positive correlation coefficient is used to identify p.
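The whole matching step can be sketched end to end under our own naming (a sketch, not the paper's implementation): standard deviation and normalized FFT energy are computed in half-overlapping windows [21] for the camera-side acceleration of a patch and for each phone, and the phone whose feature sequence gives the largest correlation coefficient (4) labels the patch.

```python
import numpy as np

def window_features(x, win):
    """Std and normalized FFT energy per sliding window (50% overlap)."""
    feats = []
    for start in range(0, len(x) - win + 1, win // 2):
        w = x[start:start + win]
        energy = np.sum(np.abs(np.fft.fft(w)) ** 2) / win  # normalized energy
        feats.append((np.std(w), energy))
    return np.array(feats).ravel()

def identify(patch_acc, phone_accs, win):
    """Return the phone id whose feature sequence best correlates, eq. (4)."""
    f = window_features(patch_acc, win)
    rho = {pid: np.corrcoef(f, window_features(a, win))[0, 1]
           for pid, a in phone_accs.items()}
    return max(rho, key=rho.get)

# Synthetic demo: a 2 Hz gait whose amplitude envelope is shared with phone A.
t = np.arange(300) / 30.0                            # 10 s at 30 Hz
envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t)
gait = np.sin(2 * np.pi * 2.0 * t)
patch_acc = envelope * gait                          # camera units (pixels)
rng = np.random.default_rng(1)
phones = {"A": 9.8 * envelope * gait + 0.05 * rng.normal(size=300),
          "B": 9.8 * (1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t + 2.5)) * gait}
assert identify(patch_acc, phones, win=30) == "A"
```

Because (4) is scale-invariant, the camera-side signal (in pixels per frame squared) and the phone-side signal (in m/s^2) never need to share units, which is what makes the cross-modal comparison possible at all.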
4. Experiments and Discussion
In this section, we conduct detailed experiments in various situations to optimize Algorithm 1 and evaluate the proposed person identification algorithm. We use a digital camera and two Android smart phones for data collection. A simple GUI application is created to start and stop data collection on the phones. Acceleration measurements are recorded in text files on the phone SD card and later accessed via USB. Video clips are recorded as mp4 files at a resolution of 640 x 480 and 15 frames per second. The timestamps of video frames and accelerometer readings are synchronized before the experiment. Algorithm 1 is implemented with the OpenCV library and tested on an Intel 3.4 GHz platform running Ubuntu 13.04. We recruit two participants, labeled A and B, to take part in our experiments, with smart phones placed in their jacket side pockets. We choose four different scenarios for our experiments: outdoor near field, outdoor far field, indoor near field, and indoor far field, as illustrated in Figure 8. In the near field situations, the subjects moved around within a scope of about five meters from the camera; the silhouette height of the human body is not less than half of the image height, and the human face can be clearly distinguished. In the far field situations, the subjects moved around about twenty meters away, where detailed visual features of the human body are mostly lost and body height in the image is not more than thirty pixels. In each scenario we repeated the experiment four times, each lasting about five minutes. In all, we collected sixteen video clips and thirty-two text files of acceleration measurements.
4.1. Tracking Optimization. Patch tracking is an essential step for motion estimation from camera video and directly affects the accuracy and robustness of subsequent person identification. As listed in Algorithm 1, the aim of patch tracking is to estimate motion measurements for each patch that appears in the video frames. In the ideal case, a subject is continuously tracked in camera video by only one descriptor during the whole experiment, and we can extract a sequence of acceleration measurements closest, in terms of time duration, to that collected from the smart phone. In the worst case, we have to create new descriptors for all patches in each frame, and the number of descriptors used for tracking a subject is as large as the number of frames in which the subject appears. We present a metric in (5) to measure the performance of Algorithm 1: L(i) is defined as the ratio between the number of subjects in a video clip and the number of descriptors used for tracking them. The range of L(i) is (0, 1]; the larger L(i), the better the tracking performance. Moreover, we also provide a metric to evaluate tracking accuracy, as shown in (6). An accurate descriptor is one that tracks only one subject during its lifetime. The larger K(i), the more accurate Algorithm 1:

$$ L(i) = \frac{\#\text{subjects in video } i}{\#\text{descriptors in video } i} \quad (5) $$

$$ K(i) = \frac{\#\text{accurate descriptors in video } i}{\#\text{descriptors in video } i} \quad (6) $$
Figure 4: Tracking performance L and accuracy K in the four scenarios ((a) outdoor near field, (b) indoor near field, (c) outdoor far field, (d) indoor far field) as functions of s_thr for values of d_thr from 0.01 to 5.
As depicted in Algorithm 1, three parameters, a_thr, d_thr, and s_thr, affect L and K. a_thr indicates the minimum area of a patch that potentially represents a subject; patches with an area less than a_thr are filtered out. Generally, in a specific application scenario, the value of a_thr can be determined empirically; in our experiments, we set it to 150, which works well. s_thr specifies the minimum histogram similarity between the current patch p and potential descriptors of p. Each active descriptor that satisfies this requirement is tested in terms of horizontal distance to p. d_thr stipulates a distance threshold to rule out inappropriate alternative descriptors: the nearest descriptor satisfying this threshold is selected to track p if it exists; otherwise we create a new descriptor for p. Moreover, many interference factors in the scenario, including poor lighting conditions, clothing color similar to the background, incidental shadows of the human body, and unpredictable motion patterns of the subjects, such as fast turning and crossing, also affect the patch tracking process negatively. To rule out the impact of these factors and optimize patch tracking, we select a representative video clip from each of the four scenarios and run Algorithm 1 over the video with different s_thr and d_thr. The resulting L and K are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are shown in Figure 8.

Due to the different motion patterns of the subjects, L may vary among the video clips of different scenarios. However, from Figure 4 we can conclude that L drops dramatically when s_thr > 0.8 in the near field scenarios and s_thr > 0.2 in the far field scenarios, with d_thr >= 0.1. This is mainly caused by background subtraction noise. The histogram similarity of patches of the same subject from two consecutive frames is about 0.8 in the near field in this situation. In the far field scenarios, with relatively smaller foreground patches, the negative impacts become more severe and the similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the histogram similarity is beyond these thresholds. When d_thr < 0.1, the worst case occurs: we need to create new descriptors for the patches in every frame, as the horizontal distance between patches of the same subject from two consecutive frames is mostly beyond this limit. As d_thr increases, L increases and converges at d_thr = 5.
Figure 5: Standard deviation of accelerations of patch p and subjects A and B over eleven feature samples.
In the near field scenarios, Algorithm 1 achieves 100 percent accuracy regardless of s_thr and d_thr, while in the far field scenarios it does not perform as well when d_thr >= 0.1 and s_thr <= 0.2. In the experiments, we found that this happened mostly when subjects were close to each other and the patch of one subject was lost in the following frame.

To balance L and K, we set d_thr = 5 and s_thr = 0.2, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from video has finished, we obtain a set of patch descriptors, and each descriptor is associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually less than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion pattern of potential subjects and are filtered out in the first place. As shown in Figure 7, there are comparatively more noisy descriptors in the far field scenarios, especially the outdoor far field scenario, where nearly 50 percent of the descriptors are ruled out in each video.

Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window used in motion feature calculation is closely related to the subjects and application scenario: it should be large enough to capture the distinctive pattern of subject movement, but not so large as to confuse different subjects. In our experiments, we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch p represents subject B during its lifetime.

The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in the near field environment, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch p and subjects A and B over eleven feature samples.
Figure 7: Per-video counts of obtained descriptors with optimized parameters, filtered descriptors, and accurately identified descriptors, where videos 1-4 are outdoor near field, 5-8 indoor near field, 9-12 outdoor far field, and 13-16 indoor far field.
happens in the outdoor far field scenario, where there are fewer optical flows within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before; the obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse situations.
5. Conclusions
In this paper, we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experimental results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in camera FOVs. Second, we assume that the phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected on these occasions would degrade the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)-(d) are outdoor near field, (e)-(h) indoor near field, (i)-(l) outdoor far field, and (m)-(p) indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, 61271041, and 61300179).
References
[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385-1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671-686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19-49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106-2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660-3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220-228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1-8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1-7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries (Digital Libraries: Cyberinfrastructure for Research and Education), pp. 178-187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163-172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099-3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28-31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135-144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363-370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74-82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214-221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541-1546, July 2005.
Journal of Sensors 3
Variables:
pDesc: patch descriptor, pDesc = {id, frameStart, frameEnd, center, hist}
pDescs: an array of patch descriptors
pCount: patch descriptor counter, initialized to zero
frameIdx: frame counter, initialized to zero
id: the ID of a patch
frameStart, frameEnd: frame index of first and last appearance of a patch
center, hist, accs: center, color histogram, and acceleration series of a patch
s_thr, d_thr, a_thr: thresholds for histogram similarity, patch distance, and patch area; a_thr >= 0, d_thr >= 0, and 0 <= s_thr <= 1

Procedure:
(1) Grab a video frame; frameIdx = frameIdx + 1
(2) Optical flow estimation
(3) Background subtraction
(4) for each patch in current frame do
(5)   Calculate pArea, pCenter, pHist, pWidth
(6)   if pArea < a_thr then
(7)     continue
(8)   end if
(9)   pDescs* = ∅
(10)  for all pDesc ∈ pDescs do
(11)    if pDesc.frameEnd + 1 == frameIdx and s(pHist, pDesc.hist) >= s_thr then
(12)      pDescs* = {pDesc} ∪ pDescs*
(13)    end if
(14)  end for
(15)  d_min = d_thr · pWidth; pDesc_min = null
(16)  for all pDesc ∈ pDescs* do
(17)    d_p = |pCenter.x − pDesc.center.x|
(18)    if d_p < d_min then
(19)      d_min = d_p; pDesc_min = pDesc
(20)    end if
(21)  end for
(22)  if pDesc_min is null then
(23)    pDesc_min = {pCount, frameIdx, frameIdx, pCenter, pHist}; pDescs = pDescs ∪ {pDesc_min}; pCount = pCount + 1
(24)  else
(25)    pDesc_min.frameEnd = frameIdx; pDesc_min.center = pCenter; pDesc_min.hist = pHist
(26)  end if
(27)  Calculate and save vertical acceleration for pDesc_min
(28) end for

Algorithm 1: Patch tracking and motion estimation.
Pseudocode of patch tracking and motion estimation is listed in Algorithm 1.
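The per-patch matching loop of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the dictionary field names and the normalized-correlation histogram similarity are our own illustrative choices (the paper does not fix the similarity function s), and the default thresholds are the values the experiments later settle on (a_thr = 150, s_thr = 0.2, d_thr = 5).

```python
import numpy as np

def hist_similarity(h1, h2):
    # One possible choice of s(.,.): normalized correlation between two
    # color histograms. The paper does not specify the exact measure.
    h1 = h1 / (np.linalg.norm(h1) + 1e-12)
    h2 = h2 / (np.linalg.norm(h2) + 1e-12)
    return float(np.dot(h1, h2))

def track_patch(patch, descs, frame_idx, next_id,
                s_thr=0.2, d_thr=5.0, a_thr=150):
    """One iteration of the per-patch loop in Algorithm 1.

    patch: dict with keys 'area', 'center_x', 'hist', 'width'
    descs: list of descriptor dicts with keys
           'id', 'frame_start', 'frame_end', 'center_x', 'hist'
    Returns (matched_or_new_descriptor, next_id), or (None, next_id)
    if the patch is filtered out by the area threshold.
    """
    if patch['area'] < a_thr:                     # steps (6)-(8)
        return None, next_id
    # Steps (10)-(14): keep descriptors that were active in the previous
    # frame and whose histogram is similar enough to the current patch.
    candidates = [d for d in descs
                  if d['frame_end'] + 1 == frame_idx
                  and hist_similarity(patch['hist'], d['hist']) >= s_thr]
    # Steps (15)-(21): nearest candidate within d_thr * patch width.
    d_min, best = d_thr * patch['width'], None
    for d in candidates:
        d_p = abs(patch['center_x'] - d['center_x'])
        if d_p < d_min:
            d_min, best = d_p, d
    if best is None:                              # steps (22)-(23): new descriptor
        best = {'id': next_id, 'frame_start': frame_idx,
                'frame_end': frame_idx, 'center_x': patch['center_x'],
                'hist': patch['hist']}
        descs.append(best)
        next_id += 1
    else:                                         # steps (24)-(25): update match
        best['frame_end'] = frame_idx
        best['center_x'] = patch['center_x']
        best['hist'] = patch['hist']
    return best, next_id
```

Calling this once per foreground patch and frame reproduces the descriptor bookkeeping of Algorithm 1; step (27), the per-descriptor vertical acceleration, would be estimated separately from the optical flow.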
3.2. Accelerometer Measurements Collection. In this subsection we describe the procedure for collecting acceleration measurements with wearable sensors. Android smart phones equipped with 3-axis accelerometers are utilized as sensing platforms. Of the three accelerometer components, only the one with the largest absolute mean value is analyzed in our experiments, as it best reflects the vertical motion pattern of the human body. Three different placements are tested and compared in order to assess the impact of phone placement on the accuracy of motion collection. In each test a participant randomly performs a set of activities, including standing, walking, and jumping, while carrying three smart phones on the body: two phones placed in the chest pocket and jacket side pocket, respectively, and one attached to the waist belt, as shown in Figure 2. The results illustrated in Figure 3 qualitatively show that all three placements correctly capture the vertical motion feature of the participant with minor, acceptable discrepancy. This test makes the choice of phone attachment more flexible and unobtrusive.
3.3. Feature Extraction and Person Identification. Noisy raw motion measurements of different sample frequencies, previously obtained from different sensor sources, cannot be
Figure 2: Attachment of smart phones to the human body. From left to right: jacket side pocket, chest pocket, and belt attachment.
Figure 3: Acceleration measurements over time (0-30 s) from the three ways of phone attachment (belt, chest, jacket) during the jumping, walking, and standing activities.
compared directly. Instead, standard deviation and energy [19, 20] are employed as motion features for comparison after noise suppression and data cleansing. Energy is defined as the sum of squared discrete FFT component magnitudes of the data samples, divided by the sample count for normalization. These features are computed in a sliding window of length t_w, with t_w/2 overlap between consecutive windows. Feature extraction on sliding windows with 50 percent overlap has demonstrated its success in [21].
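The feature computation just described can be sketched as follows; the function name and the assumption that the signal has already been resampled to a fixed rate fs are ours.

```python
import numpy as np

def motion_features(signal, fs, win_sec=1.0):
    """Standard deviation and FFT energy per sliding window with 50%
    overlap, as in Section 3.3. fs is the sample rate of the signal."""
    w = int(win_sec * fs)            # window length t_w in samples
    step = w // 2                    # consecutive windows overlap by t_w/2
    feats = []
    for start in range(0, len(signal) - w + 1, step):
        window = np.asarray(signal[start:start + w], dtype=float)
        std = window.std()
        mags = np.abs(np.fft.fft(window))
        energy = float((mags ** 2).sum() / w)   # normalized by sample count
        feats.append((std, energy))
    return feats
```

Applying this to the vertical acceleration series of a patch descriptor and to each phone's accelerometer stream yields the feature sequences that are correlated in the next step.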
ρ(X, Y) = cov(X, Y) / (σ_X σ_Y). (4)
To find out whether p represents a human body, correlation analysis is conducted. Motion features extracted from the video frames are expected to be positively linearly related to those from accelerometer measurements of the same subject. We adopt the correlation coefficient to measure the strength of this linear relationship, as defined in (4), where X and Y are the motion features to be compared, cov(X, Y) is their covariance, and σ_X and σ_Y are the standard deviations of X and Y. ρ ranges from −1 to 1 inclusive, where 0 indicates no linear relationship, +1 a perfect positive linear relationship, and −1 a perfect negative linear relationship; the larger ρ(X, Y), the more correlated X and Y. In our case, the motion features of p are compared with each of those extracted from the smart phones over the same period of time, and the identity information of the smart phone with the largest positive correlation coefficient is used to identify p.
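The identification rule above amounts to an argmax over Pearson correlations; a sketch, where the mapping from identity labels to phone feature arrays is our own illustrative representation:

```python
import numpy as np

def identify(patch_feats, phone_feats):
    """Label a patch with the phone whose feature sequence has the largest
    positive correlation coefficient (equation (4)) with the patch's.

    patch_feats: 1-D array of motion features from the video patch
    phone_feats: dict mapping identity -> 1-D feature array (same length)
    Returns (identity, rho), or (None, 0.0) if no positive correlation.
    """
    best_id, best_rho = None, 0.0
    for ident, feats in phone_feats.items():
        # np.corrcoef returns the 2x2 correlation matrix of the two series.
        rho = float(np.corrcoef(patch_feats, feats)[0, 1])
        if rho > best_rho:
            best_id, best_rho = ident, rho
    return best_id, best_rho
```

Requiring a strictly positive ρ matches the expectation that video and accelerometer features of the same subject are positively linearly related; a patch with no positive correlation is left unlabeled.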
4 Experiments and Discussions
In this section we conduct detailed experiments in various situations to optimize Algorithm 1 and evaluate the proposed person identification algorithm. We use a digital camera and two Android smart phones for data collection. A simple GUI application is created to start and stop data collection on the phones. Acceleration measurements are recorded and saved in text files on the phone SD card and later accessed via USB. Video clips are recorded as mp4 files at a resolution of 640 × 480 and 15 frames per second. The timestamps of video frames and accelerometer readings are synchronized before the experiment. Algorithm 1 is implemented on top of the OpenCV library and tested on an Intel 3.4 GHz platform running Ubuntu 13.04. We recruit two participants, labeled A and B, respectively, to take part in our experiments and place smart phones in their jacket side pockets. We choose four different scenarios for our experiments: outdoor near field, outdoor far field, indoor near field, and indoor far field, as illustrated in Figure 8. In near field situations the subjects moved around within a scope about five meters away from the camera; the silhouette height of the human body is not less than half of the image height, and the human face can be clearly distinguished. In far field situations the subjects moved around about twenty meters away, where detailed visual features of the human body are mostly lost and body height in the image is not more than thirty pixels. In each scenario we repeated the experiment four times, each lasting about five minutes. In all, we collected sixteen video clips and thirty-two text files of acceleration measurements.
4.1. Tracking Optimization. Patch tracking is an essential step for motion estimation from camera video and directly affects the accuracy and robustness of subsequent person identification. As listed in Algorithm 1, the aim of patch tracking is to estimate motion measurements for each patch that appears in the video frames. In the ideal case a subject is continuously tracked in the camera video by only one descriptor during the whole experiment, and we can extract a sequence of acceleration measurements closest in time duration to that collected from the smart phone. In the worst case we have to create new descriptors for all patches in each frame, and the number of descriptors used for tracking a subject is as large as the number of frames in which he appears. We present a metric in (5) to measure the performance of Algorithm 1. The metric L(i) is defined as the ratio between the number of subjects in a video clip and the number of descriptors used for tracking the subjects. The range of L(i) is (0, 1]; the larger L(i), the better the tracking performance. Moreover, we also provide a metric to evaluate tracking accuracy, as shown in (6). An accurate descriptor is one that tracks only one subject during its lifetime. The larger K(i), the more accurate Algorithm 1.
L(i) = (number of subjects in video i) / (number of descriptors in video i), (5)

K(i) = (number of accurate descriptors in video i) / (number of descriptors in video i). (6)
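Metrics (5) and (6) reduce to simple ratios over the descriptor set; a sketch, assuming (our own representation) that each descriptor is summarized by a boolean flag indicating whether it tracked a single subject:

```python
def tracking_metrics(n_subjects, descriptor_flags):
    """Compute L(i) and K(i) for one video.

    n_subjects: number of subjects appearing in video i
    descriptor_flags: one boolean per descriptor used in video i,
                      True if it tracked only one subject (accurate)
    """
    n_descriptors = len(descriptor_flags)
    L = n_subjects / n_descriptors            # equation (5)
    K = sum(descriptor_flags) / n_descriptors  # equation (6)
    return L, K
```

For example, two subjects tracked with four descriptors, three of them accurate, gives L = 0.5 and K = 0.75.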
Figure 4: Tracking performance L and tracking accuracy K in the four scenarios with different values of d_thr and s_thr: (a) outdoor near field; (b) indoor near field; (c) outdoor far field; (d) indoor far field. L and K are plotted against s_thr from 0.1 to 1 for d_thr in {0.01, 0.05, 0.1, 0.5, 1, 5}.
As depicted in Algorithm 1, three parameters, a_thr, d_thr, and s_thr, affect L and K. a_thr indicates the minimum area of a patch that potentially represents a subject; patches with an area less than a_thr are filtered out. Generally, in a specific application scenario, the value of a_thr can be determined empirically; in our experiments we set it to 150, which works well. s_thr specifies a minimum histogram similarity between the current patch p and potential descriptors of p. Each active descriptor that satisfies this requirement is tested in terms of horizontal distance to p. d_thr stipulates a distance threshold to rule out inappropriate alternative descriptors; the nearest descriptor satisfying this threshold is selected to track p if it exists, and otherwise we create a new descriptor for p. Moreover, many interference factors in the scene, including poor lighting conditions, clothing color similar to the background, incidental shadows of the human body, and unpredictable motion patterns of subjects like fast turning and crossing, also affect the patch tracking process negatively. To rule out the impact of these factors and optimize patch tracking, we select a representative video clip from each of the four scenarios and run Algorithm 1 over the video with different s_thr and d_thr. The resulting L and K are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are listed in Figure 8.

Due to different motion patterns of the subjects, L may vary among video clips of different scenarios. However, from Figure 4 we can conclude that L drops dramatically when s_thr > 0.8 in near field scenarios and s_thr > 0.2 in far field scenarios with d_thr ≥ 0.1. This is mainly caused by background subtraction noise: the histogram similarity of patches of the same subject in two consecutive frames is about 0.8 in near field in this situation. In far field scenarios, with relatively smaller foreground patches, the negative impact becomes more severe and the similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the histogram similarity requirement exceeds these thresholds. When d_thr < 0.1, the worst case occurs: we need to create new descriptors for patches in every frame, as the horizontal distance between patches of the same subject in two consecutive frames is mostly beyond this limit. As d_thr increases, L increases and converges at d_thr = 5.
Figure 5: Standard deviation of accelerations of patch p and subjects A and B over eleven feature samples.
In near field scenarios Algorithm 1 achieves 100 percent accuracy regardless of s_thr and d_thr, while in far field scenarios it does not perform as well when d_thr ≥ 0.1 and s_thr ≤ 0.2. In the experiments we found that this happened mostly when subjects were close to each other and the patch of one subject was lost in the following frame.

To balance L and K, we set d_thr = 5 and s_thr = 0.2, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from the videos is finished, we obtain a set of patch descriptors, each associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually less than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion pattern of potential subjects and are filtered out in the first place. As shown in Figure 7, there are comparatively more noisy descriptors in far field scenarios, especially outdoor far field scenarios, where nearly 50 percent of descriptors are ruled out in each video.
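The filtering step can be sketched as a one-line predicate over descriptor lifetimes; the ten-frame cutoff follows the text's "usually less than ten frames", and the field names match the descriptor layout of Algorithm 1 (our own representation).

```python
def filter_descriptors(descs, min_frames=10):
    """Drop descriptors whose motion series is too short to reflect a
    real subject's motion pattern (Section 4.2)."""
    return [d for d in descs
            if d['frame_end'] - d['frame_start'] + 1 >= min_frames]
```

Only the descriptors that survive this filter enter the feature extraction and correlation steps that follow.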
Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window in motion feature calculation is closely related to the subjects and the application scenario: it should be large enough to capture the distinctive pattern of a subject's movement, but not so large that it confuses different ones. In our experiments we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch p represents subject B during its lifetime.

The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in near field environments, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch p and subjects A and B over eleven feature samples.
Figure 7: Total descriptors obtained with optimized parameters, filtered descriptors, and accurately tagged descriptors for each of the sixteen videos, where 1-4 represent the outdoor near field, 5-8 the indoor near field, 9-12 the outdoor far field, and 13-16 the indoor far field.
happens in the outdoor far field scenario, where there are fewer optical flows within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before; the obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse conditions.
5 Conclusions
In this paper we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experiment results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in camera FOVs. Second, we assume that phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected during these occasions would damage the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)-(d) represent the outdoor near field, (e)-(h) the indoor near field, (i)-(l) the outdoor far field, and (m)-(p) the indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, the method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, 61271041, and 61300179).
References

[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385-1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671-686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19-49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106-2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660-3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220-228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1-8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1-7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries: Cyberinfrastructure for Research and Education, pp. 178-187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163-172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099-3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28-31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135-144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363-370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74-82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214-221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541-1546, July 2005.
4 Journal of Sensors
Figure 2 Attachment of smart phones to human body From left toright jacket side pocket chest pocket and belt attachment
0 5 10 15 20 25 30Time (s)
BeltChest Jacket
Jumping StandingWalking
Figure 3 Acceleration measurements from three ways of phoneattachment for the above mentioned activities
compared directly Instead standard deviation and energy[19 20] are employed as motion features for comparisonafter noise suppression and data cleansing Energy is definedas sum of squared discrete FFT component magnitudes ofdata samples and divided by sample count for normalizationThese features are computed in a sliding window of length 119905
119908
with 1199051199082 overlapping between consecutive windows Feature
extraction on sliding windows with 50 percent overlappinghas demonstrated its success in [21]
120588 (119883 119884) =cov (119883 119884)120590119883120590119884
(4)
To find out whether 119901 represents a human body correlationanalysis is conducted As a matter of fact motion featuresextracted from video frames are supposed to be positivelylinear with those from accelerometer measurements of thesame subject We adopt correlation coefficient to reliablymeasure strength of linear relationship as defined in (4)where119883 and119884 aremotion features to be compared cov(119883 119884)the covariance and 120590
119883and 120590
119884the standard deviation of 119883
and 119884 120588 ranges from minus1 to 1 inclusively where 0 indicatesno linear relationship +1 indicates a perfect positive linearrelationship and minus1 indicates a perfect negative linear rela-tionship The larger 120588(119883 119884) the more correlated 119883 and 119884In our case motion features of 119901 are compared with eachof those extracted from smart phones in the same periodof time Identity information of smart phone correspondingto the largest positive correlation coefficient is utilized toidentify 119901
4 Experiments and Discussions
In this section we conduct detailed experiments in varioussituations to optimize Algorithm 1 and evaluate the proposedperson identification algorithm We use a digital camera andtwo Android smart phones for data collection A simpleGUI application is created to start and stop data collectionon phones Acceleration measurements are recorded andsaved in text files on phone SD card and later accessedvia USB Video clips are recorded in the format of mp4files at a resolution of 640 times 480 15 frames per secondThe timestamps of video frames and accelerometer readingsare well synchronized before the experiment Algorithm 1 isimplemented based on OpenCV library and tested on anIntel 34GHz platform runningUbuntu 1304We recruit twoparticipants labeled as A and B respectively to take partin our experiments and place smart phones in jacket sidepockets We choose four different scenarios to perform ourexperiments including outdoor near field outdoor far fieldindoor near field indoor far field as illustrated in Figure 8In near field situations the subjects moved around within ascope about five meters away from the cameraThe silhouetteheight of human body is not less than half of the image heightand human face could be clearly distinguished In far fieldsituations the subjects moved around about twenty metersawaywhere detailed visual features of human body aremostlylost and body height in image is not more than thirty pixelsIn each scenario we repeated the experiment four times andeach lasts about five minutes In all we collect sixteen videoclips and thirty-two text files of acceleration measurements
41 Tracking Optimization Patch tracking is an essential stepfor motion estimation from camera video and directly affectsaccuracy and robustness of subsequent person identificationAs listed in Algorithm 1 the aim of patch tracking is toestimate motion measurements for each patch that appearedin video frames In the ideal case a subject is continuouslytracked in camera video by only one descriptor duringthe whole experiment and we could extract a sequence ofacceleration measurements closest to that collected from thesmart phone in terms of time duration while in the worstcase we have to create new descriptors for all patches ineach frame and the number of descriptors used for tackinga subject is as many as that of the frames of his appearanceWe present a metric in (5) to measure the performance ofAlgorithm 1 The metric 119871(119894) is defined as a ratio betweennumber of subjects in a video clip and number of descriptorsused for tracking the subjects The range of 119871(119894) is (0 1] Thelarger 119871(119894) the better the tracking performanceMoreover wealso provide a metric to evaluate tracking accuracy as shownin (6) Accurate descriptor means that a descriptor tracksonly one subject during its lifetimeThe larger119870(119894) the moreaccurate Algorithm 1
119871 (119894) =subjects in video 119894
descriptors in 119894 (5)
119870 (119894) =accurate descriptors in video 119894
descriptors in 119894 (6)
Journal of Sensors 5
0
008
007
006
005
004
003
002
001
0
L
01 02 03 04 05 06 07 08 09 1
02
04
06
08
1
12
14
K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
sthr
(a) Outdoor near field
01 02 03 04 05 06 07 08 09 1
sthr
012
01
008
006
004
002
0
15
1
05
0
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
L K
(b) Indoor near field
0005
001
0
0015
002
0025
0930940950960970980991
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(c) Outdoor far field
00005001
0015002
0025003
0035004
0045005
086088090920940960981
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(d) Indoor far field
Figure 4 Tracking performance and performance in the four scenarios with different values of 119889thr and 119904thr
As depicted in Algorithm 1, three parameters, a_thr, d_thr, and s_thr, affect L and K. a_thr indicates the minimum area of a patch that potentially represents a subject; patches with an area less than a_thr are filtered out. Generally, in a specific application scenario, the value of a_thr can be determined empirically; in our experiments we set it to 150, which works well. s_thr specifies the minimum histogram similarity between the current patch p and potential descriptors of p. Each active descriptor that satisfies this requirement is then tested in terms of its horizontal distance to p. d_thr stipulates a distance threshold to rule out inappropriate candidate descriptors. The nearest descriptor satisfying this threshold is selected to track p if it exists; otherwise, we create a new descriptor for p. Moreover, many interference factors in the scene, including poor lighting conditions, clothing colors similar to the background, incidental shadows of the human body, and unpredictable motion patterns of subjects such as fast turning and crossing, also negatively affect the patch tracking process. To rule out the impact of these factors and optimize patch tracking in each of the four scenarios, we select a representative video clip and run Algorithm 1 over the video with different values of s_thr and d_thr. The resulting L and K are illustrated in Figure 4. Extracted frames from the video clips with labeled patches are shown in Figure 8.
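The patch-to-descriptor matching step described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the `Patch` and `Descriptor` structures, the histogram-intersection similarity, and the treatment of d_thr as a plain horizontal-distance bound are all assumptions; the default thresholds mirror the values used in the experiments (a_thr = 150, s_thr = 0.2, d_thr = 5).

```python
from dataclasses import dataclass, field

def hist_similarity(h1, h2):
    """Histogram-intersection similarity, normalized to [0, 1] (assumed metric)."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return overlap / max(sum(h1), 1e-9)

@dataclass
class Patch:
    area: float       # foreground patch area in pixels
    x: float          # horizontal centroid
    histogram: list   # color histogram of the patch

@dataclass
class Descriptor:
    histogram: list
    x: float
    active: bool = True
    frames: list = field(default_factory=list)

    def update(self, patch):
        """Track the patch: refresh position/appearance and log the frame."""
        self.x, self.histogram = patch.x, patch.histogram
        self.frames.append(patch)

def match_patch(patch, descriptors, a_thr=150, s_thr=0.2, d_thr=5.0):
    """Assign a patch to the nearest qualifying descriptor, or create a new one."""
    if patch.area < a_thr:                          # too small to be a subject
        return None
    candidates = [d for d in descriptors
                  if d.active
                  and hist_similarity(d.histogram, patch.histogram) >= s_thr
                  and abs(d.x - patch.x) < d_thr]
    if candidates:                                  # track with the nearest one
        best = min(candidates, key=lambda d: abs(d.x - patch.x))
        best.update(patch)
        return best
    new_desc = Descriptor(patch.histogram, patch.x)  # no match: new track
    descriptors.append(new_desc)
    return new_desc
```

When no active descriptor passes both the similarity and distance tests, a fresh descriptor is started, which is exactly the behavior that inflates K at overly strict thresholds.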
Due to the different motion patterns of the subjects, L may vary among video clips of different scenarios. However, from Figure 4 we can conclude that L drops dramatically when s_thr > 0.8 in the near field scenarios and when s_thr > 0.2 in the far field scenarios, with d_thr >= 0.1. This is mainly caused by background subtraction noise: the histogram similarity of patches of the same subject in two consecutive frames is about 0.8 in the near field under these conditions. In the far field scenarios, with relatively smaller foreground patches, the negative impact becomes more severe and the usable similarity threshold degrades to 0.2. Patches of the same subject are associated with different descriptors when the similarity requirement exceeds these thresholds. When d_thr < 0.1 the worst case occurs: we need to create new descriptors for patches in every frame, as the horizontal distance between patches of the same subject in two consecutive frames is mostly beyond this limit. As d_thr increases, L increases and converges at d_thr = 5.
6 Journal of Sensors
Figure 5: Standard deviation of accelerations of patch p and subjects A and B (feature samples 1–11).
In the near field scenarios, Algorithm 1 achieves 100 percent accuracy regardless of s_thr and d_thr, while in the far field scenarios it does not perform as well when d_thr >= 0.1 and s_thr <= 0.2. In the experiments, we found that this happened mostly when subjects were close to each other and the patch of one subject was lost in the following frame.
To balance L and K, we set d_thr = 5 and s_thr = 0.2, run Algorithm 1 over the sixteen video clips, and collect motion measurements for person identification in the following experiments. Statistics of the obtained descriptors are illustrated in Figure 7.
4.2. Person Identification. When motion measurement collection from the video is finished, we obtain a set of patch descriptors, each of which is associated with a time series of acceleration data of a potential subject. Some descriptors within the set come with short series of motion data, usually less than ten frames. This is possibly caused by subjects crossing each other, fake foreground from flashing lights, fast turning of the human body, moving objects at the edge of the camera FOV, and so forth. These insufficient and noisy data fail to reflect the actual motion patterns of potential subjects and are filtered out first. As shown in Figure 7, there are comparatively more noisy descriptors in the far field scenarios, especially the outdoor far field scenario, where nearly 50 percent of the descriptors are ruled out in each video.
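This filtering step can be sketched as below. The dict-based descriptor representation and the helper name are hypothetical choices for illustration; the cutoff of ten samples follows the "less than ten frames" rule stated above.

```python
def filter_descriptors(descriptors, min_frames=10):
    """Discard descriptors whose acceleration series is too short to be
    meaningful; returns the surviving descriptors and the number dropped."""
    kept = [d for d in descriptors if len(d["accel"]) >= min_frames]
    dropped = len(descriptors) - len(kept)
    return kept, dropped
```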
Then we calculate a sequence of motion features for each descriptor and compare the feature sequence with each of those obtained from the smart phones in the same period of time. The sliding window used in motion feature calculation is closely related to the subjects and the application scenario: it should be large enough to capture the distinctive pattern of a subject's movement, but not so large as to confuse different subjects. In our experiments, we set the window size to 1 second empirically. Motion features from an example patch descriptor and those from the two smart phones in the same period are shown in Figures 5 and 6, from which we can conclude that patch p represents subject B during its lifetime.
The total number of accurately identified patch descriptors in each video is listed in Figure 7. The proposed method achieves comparatively better performance in the near field environment, where we can capture more accurate and robust motion measurements of the human body. The worst case
Figure 6: Energy of accelerations of patch p and subjects A and B (feature samples 1–11).
Figure 7: Obtained descriptors with the optimized parameters, filtered descriptors, and accurately identified descriptors per video, where videos 1–4 represent the outdoor near field, 5–8 the indoor near field, 9–12 the outdoor far field, and 13–16 the indoor far field.
happens in the outdoor far field scenario, where there are fewer optical flow vectors within each patch and fewer frames associated with each descriptor. We save the mapping between patch descriptors and their estimated identities and rerun Algorithm 1 with the same parameter configuration as before; the obtained patch identity is labeled in the video right after the patch ID. As illustrated in Figure 8, the proposed method maintains comparatively acceptable performance even under adverse conditions.
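A minimal sketch of this identification step, assuming per-window standard deviation and energy as the motion features (as in Figures 5 and 6) and Pearson correlation over the standard-deviation sequence as the matching criterion; the paper does not specify its exact correlation measure, so that choice is an assumption:

```python
import math

def window_features(accel, rate_hz, win_s=1.0):
    """Per-window (std, energy) features of an acceleration-magnitude series,
    using non-overlapping windows of win_s seconds."""
    n = max(1, int(rate_hz * win_s))
    feats = []
    for i in range(0, len(accel) - n + 1, n):
        w = accel[i:i + n]
        mean = sum(w) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in w) / n)
        energy = sum(v * v for v in w) / n
        feats.append((std, energy))
    return feats

def correlation(a, b):
    """Pearson correlation of two sequences, truncated to equal length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0

def identify(patch_feats, phone_feats_by_id):
    """Label the patch with the phone identity whose feature sequence
    correlates best with the patch's standard-deviation sequence."""
    stds = [f[0] for f in patch_feats]
    return max(phone_feats_by_id,
               key=lambda pid: correlation(stds,
                                           [f[0] for f in phone_feats_by_id[pid]]))
```

A phone left motionless yields a near-constant feature sequence and a correlation of zero, so it never wins the assignment, which matches the intuition that only a moving, phone-carrying subject can be labeled.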
5. Conclusions
In this paper, we propose a novel method for automatic person identification. The method innovatively leverages the correlation of body motion features from two different sensing sources, that is, accelerometer and camera. Experimental results demonstrate the performance and accuracy of the proposed method. However, the proposed method is limited in the following aspects. First, users have to register and carry their smart phones in order to be discernible in the camera FOV. Second, we assume that the phones stay relatively still with respect to the human body during the experiments, but in practice people tend to take out and check their phones from time to time; acceleration data collected on these occasions would damage the identification accuracy. Besides, the method
Figure 8: Screenshots of identification results, where (a)–(d) represent the outdoor near field, (e)–(h) the indoor near field, (i)–(l) the outdoor far field, and (m)–(p) the indoor far field.
relies heavily on background subtraction in the process of patch tracking; thus, a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method to more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants no. 61202436, 61271041, and 61300179).
Journal of Sensors 5
0
008
007
006
005
004
003
002
001
0
L
01 02 03 04 05 06 07 08 09 1
02
04
06
08
1
12
14
K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
sthr
(a) Outdoor near field
01 02 03 04 05 06 07 08 09 1
sthr
012
01
008
006
004
002
0
15
1
05
0
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
L K
(b) Indoor near field
0005
001
0
0015
002
0025
0930940950960970980991
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(c) Outdoor far field
00005001
0015002
0025003
0035004
0045005
086088090920940960981
01 02 03 04 05 06 07 08 09 1
sthr
L K
dLthr = 001 d
Kthr = 001
dKthr = 005
dKthr = 01
dKthr = 05
dKthr = 1
dKthr = 5
dLthr = 01
dLthr = 05
dLthr = 1
dLthr = 5
dLthr = 005
(d) Indoor far field
Figure 4 Tracking performance and performance in the four scenarios with different values of 119889thr and 119904thr
As depicted in Algorithm 1 three parameters 119886thr 119889thrand 119904thr affect 119871 and 119870 119886thr indicates minimum area of apatch that potentially represents a subject Patches with anarea less than 119886thr are filtered out Generally in a specificapplication scenario the value of 119886thr could be figured outempirically In our experiments we set it to 150 which worksfine 119904thr specifies a minimum histogram similarity betweencurrent patch 119901 and potential descriptors of 119901 Each activedescriptor that satisfies this requirement is tested in terms ofhorizontal distance to 119901 119889thr stipulates a distance thresholdto rule out inappropriate alternative descriptors A nearestdescriptor satisfying this threshold is selected to track 119901 if itexits Otherwise we create a new descriptor for 119901 Moreovermany interference factors in the scenario including poorlighting condition similar clothing color to the backgroundincidental shadow of human body and unpredictable motionpattern of subjects like fast turning and crossing would alsopose negative effects to patch tracking process To rule outimpacts of these factors and optimize patch tracking fromeach of the four scenarios we select a representative video clip
and runAlgorithm 1 over the videowith different 119904thr and119889thrResulted 119871 and119870 are illustrated in Figure 4 Extracted framesfrom video clips with labeled patches are listed in Figure 8
Due to different motion patterns of the subjects 119871 mayvary among video clips of different scenarios However fromFigure 4 we can conclude that 119871 drops dramatically when119904thr gt 08 in near field scenario and 119904thr gt 02 in far fieldscenario with 119889thr ge 01This ismainly caused by backgroundsubtraction noises Histogram similarity of patches of thesame subject from two consecutive frames is about 08 innear field in this situation In far field scenarios with relativelysmaller foreground patches the negative impacts becomemore severe and threshold similarity degrades to 02 Patchesof the same subject are associated with different descriptorswhen histogram similarity is beyond these thresholds When119889thr lt 01 the worst case occurs We need to create newdescriptors for patches in every frame as horizontal distancebetween patches of the same subject from two consecutiveframes is mostly beyond this limit As 119889thr increases 119871increases and converges at 119889thr = 5
6 Journal of Sensors
1 2 3 4 5 6 7 8 9 10 11
Stan
dard
dev
iatio
n
Feature sample
p
AB
Figure 5 Standard deviation of accelerations of patch119901 and subjectsA and B
In near field scenarios Algorithm 1 achieves 100 percentaccuracy with whatever 119904thr and 119889thr while in far fieldscenarios it does not perform so perfectly when 119889thr ge 01
and 119904thr le 02 In the experiments we found that thishappened mostly in situations when subjects were close andthe patch of one subject lost in the following frame
To balance 119871 and 119870 we set 119889thr = 5 119904thr = 02run Algorithm 1 over the sixteen video clips and collectmotion measurements for person identification in the fol-lowing experiments Statistics of the obtained descriptors areillustrated in Figure 7
42 Person Identification When motion measurements col-lection from video finished we obtain a set of patch descrip-tors and each descriptor associates with a time series ofacceleration data of a potential subject Some descriptorswithin the set come with short series of motion data usuallyless than ten frames This is possibly caused by subjectscrossing each other fake foreground from flashing lights fastturning of human body moving objects at the edge of cameraFOV and so forth These insufficient and noisy data fail toreflect actual motion pattern of potential subjects and arefiltered out in the first place As shown in Figure 7 there arecomparatively more noisy descriptors in far field scenariosespecially in outdoor far field scenarios where nearly 50percent of descriptors are ruled out in each video
Then we calculate a sequence of motion features for eachdescriptor and compare the feature sequence with each ofthose obtained from smart phones in the same period oftime Sliding window in motion feature calculation is closelyrelated to subjects and application scenarios It should belarge enough to capture the distinctive pattern of subjectmovement but not too large to confuse different ones Inour experiments we set window size to 1 second empiricallyMotion features from an example patch descriptor and thosefrom the two smart phones in the same period are shownin Figures 5 and 6 where we could conclude that patch 119901
represents subject B during its lifetimeThe total number of accurately identified patch descrip-
tors in each video is listed in Figure 7 The proposedmethod achieves comparatively better performance in nearfield environment where we can capture more accurate androbust motion measurements of human bodyThe worst case
1 2 3 4 5 6 7 8 9 10 11Feature sample
Ener
gy
ABp
Figure 6 Energy of accelerations of patch 119901 and subjects A and B
2 4 6 8 10 12 14 16Video
3731
28
38
3026
40
3030
383230
242020
241816
201616
221615
80
5245
82
5046
80
5348
84
5047
40
21
12
40
20
10
41
20
12
40
18
8
Total descriptorsFiltered descriptorsAccurately tagged descriptors
Des
crip
tor c
ount
Figure 7 Obtained descriptors with optimized parameters filtereddescriptors and accurately identified descriptors where 1ndash4 repre-sent the outdoor near field 5ndash8 represent the indoor near field 9ndash12represent the outdoor far field and 13ndash16 represent the indoor farfield
happens in outdoor far field scenario In this case there areless optical flows within each patch and less frames asso-ciated with each descriptor We save the mapping betweenpatch descriptors and their estimated identity and rerunAlgorithm 1with the same parameter configuration as beforeThe obtained patch identity is labeled in the video right afterpatch ID As illustrated in Figure 8 the proposed methodcould maintain comparatively acceptable performance evenunder adverse situations
5 Conclusions
In this paper we propose a novel method for automaticperson identification The method innovatively leveragescorrelation of body motion features from two different sens-ing sources that is accelerometer and camera Experimentresults demonstrate the performance and accuracy of theproposed method However the proposed method is limitedin the following aspects First users have to register andcarry their smart phones in order to be discernable in cameraFOVs Second we assume that phones stay relatively still withhuman body during the experiments but in practice peopletend to take out and check their phones from time to timeAcceleration data collected during these occasions woulddamage the identification accuracy Besides the method
Journal of Sensors 7
(a) (b) (c) (d)
(e) (f) (g) (h)
(i) (j) (k) (l)
(m) (n) (o) (p)
Figure 8 Screenshots of identification results where (a)ndash(d) represent the outdoor near field (e)ndash(h) represent the indoor near field (i)ndash(l)represent the outdoor far field and (m)ndash(p) represent the indoor far field
relies heavily on background subtraction in the process ofpatch trackingThus amore practical and reliable strategy formotion data collection is needed Third subjects in archivedvideo clips without available contextual motion informationcannot be identified using the proposed method Thereforethis method only works at the time of video capture In thefuture we plan to overcome the aforementioned constraintsand extend the application of the proposedmethod intomorecomplex environments
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work was supported in part by the National NaturalScience Foundation of China (Grant no 61202436 Grant no61271041 and Grant no 61300179)
References
[1] P Vadakkepat P Lim L C de Silva L Jing and L L LingldquoMultimodal approach to human-face detection and trackingrdquoIEEE Transactions on Industrial Electronics vol 55 no 3 pp1385ndash1393 2008
[2] C Zhang and Z Zhang ldquoA survey of recent advances in facedetectionrdquo Tech Rep Microsoft Research 2010
[3] C Huang H Ai Y Li and S Lao ldquoHigh-performance rotationinvariant multiview face detectionrdquo IEEE Transactions on Pat-tern Analysis and Machine Intelligence vol 29 no 4 pp 671ndash686 2007
8 Journal of Sensors
[4] JWright A Y Yang A Ganesh S S Sastry and YMa ldquoRobustface recognition via sparse representationrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 31 no 2 pp210ndash227 2009
[5] T Ahonen A Hadid and M Pietikainen ldquoFace descriptionwith local binary patterns application to face recognitionrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol28 no 12 pp 2037ndash2041 2006
[6] G Shakhnarovich and B Moghaddam ldquoFace recognitionin subspacesrdquo in Handbook of Face Recognition pp 19ndash49Springer 2011
[7] I Naseem R Togneri and M Bennamoun ldquoLinear regressionfor face recognitionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 32 no 11 pp 2106ndash2112 2010
[8] L Zhang D V Kalashnikov S Mehrotra and R VaisenbergldquoContext-based person identification framework for smartvideo surveillancerdquoMachine Vision and Applications 2013
[9] X Geng K Smith-Miles L Wang M Li and QWu ldquoContext-aware fusion a case study on fusion of gait and face for humanidentification in videordquo Pattern Recognition vol 43 no 10 pp3660ndash3673 2010
[10] N OrsquoHare and A F Smeaton ldquoContext-aware person identi-fication in personal photo collectionsrdquo IEEE Transactions onMultimedia vol 11 no 2 pp 220ndash228 2009
[11] Z Stone T Zickler and T Darrell ldquoAutotagging FacebookSocial network context improves photo annotationrdquo in Pro-ceedings of the IEEE Computer Society Conference on ComputerVision and Pattern Recognition Workshops (CVPR rsquo08) pp 1ndash8June 2008
[12] D Anguelov K-C Lee S B Gokturk and B SumengenldquoContextual identity recognition in personal photo albumsrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo07) pp 1ndash7June 2007
[13] M Naaman R B Yeh H Garcia-Molina and A PaepckeldquoLeveraging context to resolve identity in photo albumsrdquo inProceedings of the 5th ACMIEEE Joint Conference on DigitalLibrariesmdashDigital Libraries Cyberinfrastructure for Researchand Education pp 178ndash187 June 2005
[14] M Zhao Y W Teo S Liu T S Chua and R Jain ldquoAutomaticperson annotation of family photo albumrdquo in Image and VideoRetrieval pp 163ndash172 Springer 2006
[15] M Piccardi ldquoBackground subtraction techniques a reviewrdquo inProceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC rsquo04) vol 4 pp 3099ndash3104 October2004
[16] Z Zivkovic ldquoImproved adaptive Gaussian mixture model forbackground subtractionrdquo inProceedings of the 17th InternationalConference on Pattern Recognition (ICPR rsquo04) vol 2 pp 28ndash31IEEE August 2004
[17] P KaewTraKulPong and R Bowden ldquoAn improved adaptivebackground mixture model for real-time tracking with shadowdetectionrdquo in Video-Based Surveillance Systems pp 135ndash144Springer 2002
[18] G Farneback ldquoTwo-framemotion estimation based on polyno-mial expansionrdquo in Image Analysis pp 363ndash370 Springer 2003
[19] J R Kwapisz G M Weiss and S A Moore ldquoActivityrecognition using cell phone accelerometersrdquo ACM SIGKDDExplorations Newsletter vol 12 no 2 pp 74ndash82 2011
[20] S Dernbach B Das N C Krishnan B L Thomas and D JCook ldquoSimple and complex activity recognition through smart
phonesrdquo in Proceedings of the 8th International Conference onIntelligent Environments (IE rsquo12) pp 214ndash221 IEEE June 2012
[21] N Ravi N Dandekar P Mysore and M L Littman ldquoActivityrecognition from accelerometer datardquo in Proceedings of the20th National Conference on Artificial Intelligence and the17th Innovative Applications of Artificial Intelligence Conference(AAAIIAAI rsquo05) pp 1541ndash1546 July 2005
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
6 Journal of Sensors
1 2 3 4 5 6 7 8 9 10 11
Stan
dard
dev
iatio
n
Feature sample
p
AB
Figure 5 Standard deviation of accelerations of patch119901 and subjectsA and B
In near field scenarios Algorithm 1 achieves 100 percentaccuracy with whatever 119904thr and 119889thr while in far fieldscenarios it does not perform so perfectly when 119889thr ge 01
and 119904thr le 02 In the experiments we found that thishappened mostly in situations when subjects were close andthe patch of one subject lost in the following frame
To balance 119871 and 119870 we set 119889thr = 5 119904thr = 02run Algorithm 1 over the sixteen video clips and collectmotion measurements for person identification in the fol-lowing experiments Statistics of the obtained descriptors areillustrated in Figure 7
42 Person Identification When motion measurements col-lection from video finished we obtain a set of patch descrip-tors and each descriptor associates with a time series ofacceleration data of a potential subject Some descriptorswithin the set come with short series of motion data usuallyless than ten frames This is possibly caused by subjectscrossing each other fake foreground from flashing lights fastturning of human body moving objects at the edge of cameraFOV and so forth These insufficient and noisy data fail toreflect actual motion pattern of potential subjects and arefiltered out in the first place As shown in Figure 7 there arecomparatively more noisy descriptors in far field scenariosespecially in outdoor far field scenarios where nearly 50percent of descriptors are ruled out in each video
Then we calculate a sequence of motion features for eachdescriptor and compare the feature sequence with each ofthose obtained from smart phones in the same period oftime Sliding window in motion feature calculation is closelyrelated to subjects and application scenarios It should belarge enough to capture the distinctive pattern of subjectmovement but not too large to confuse different ones Inour experiments we set window size to 1 second empiricallyMotion features from an example patch descriptor and thosefrom the two smart phones in the same period are shownin Figures 5 and 6 where we could conclude that patch 119901
represents subject B during its lifetimeThe total number of accurately identified patch descrip-
tors in each video is listed in Figure 7 The proposedmethod achieves comparatively better performance in nearfield environment where we can capture more accurate androbust motion measurements of human bodyThe worst case
1 2 3 4 5 6 7 8 9 10 11Feature sample
Ener
gy
ABp
Figure 6 Energy of accelerations of patch 119901 and subjects A and B
2 4 6 8 10 12 14 16Video
3731
28
38
3026
40
3030
383230
242020
241816
201616
221615
80
5245
82
5046
80
5348
84
5047
40
21
12
40
20
10
41
20
12
40
18
8
Total descriptorsFiltered descriptorsAccurately tagged descriptors
Des
crip
tor c
ount
Figure 7 Obtained descriptors with optimized parameters filtereddescriptors and accurately identified descriptors where 1ndash4 repre-sent the outdoor near field 5ndash8 represent the indoor near field 9ndash12represent the outdoor far field and 13ndash16 represent the indoor farfield
happens in outdoor far field scenario In this case there areless optical flows within each patch and less frames asso-ciated with each descriptor We save the mapping betweenpatch descriptors and their estimated identity and rerunAlgorithm 1with the same parameter configuration as beforeThe obtained patch identity is labeled in the video right afterpatch ID As illustrated in Figure 8 the proposed methodcould maintain comparatively acceptable performance evenunder adverse situations
5 Conclusions
In this paper we propose a novel method for automaticperson identification The method innovatively leveragescorrelation of body motion features from two different sens-ing sources that is accelerometer and camera Experimentresults demonstrate the performance and accuracy of theproposed method However the proposed method is limitedin the following aspects First users have to register andcarry their smart phones in order to be discernable in cameraFOVs Second we assume that phones stay relatively still withhuman body during the experiments but in practice peopletend to take out and check their phones from time to timeAcceleration data collected during these occasions woulddamage the identification accuracy Besides the method
Journal of Sensors 7
[Figure 8: Screenshots of identification results, where (a)–(d) represent the outdoor near field, (e)–(h) the indoor near field, (i)–(l) the outdoor far field, and (m)–(p) the indoor far field.]
relies heavily on background subtraction in the process of patch tracking. Thus, a more practical and reliable strategy for motion data collection is needed. Third, subjects in archived video clips without available contextual motion information cannot be identified using the proposed method; therefore, this method only works at the time of video capture. In the future, we plan to overcome the aforementioned constraints and extend the application of the proposed method into more complex environments.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (Grants nos. 61202436, 61271041, and 61300179).
References
[1] P. Vadakkepat, P. Lim, L. C. de Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, pp. 1385–1393, 2008.
[2] C. Zhang and Z. Zhang, "A survey of recent advances in face detection," Tech. Rep., Microsoft Research, 2010.
[3] C. Huang, H. Ai, Y. Li, and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671–686, 2007.
[4] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[5] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037–2041, 2006.
[6] G. Shakhnarovich and B. Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19–49, Springer, 2011.
[7] I. Naseem, R. Togneri, and M. Bennamoun, "Linear regression for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 2106–2112, 2010.
[8] L. Zhang, D. V. Kalashnikov, S. Mehrotra, and R. Vaisenberg, "Context-based person identification framework for smart video surveillance," Machine Vision and Applications, 2013.
[9] X. Geng, K. Smith-Miles, L. Wang, M. Li, and Q. Wu, "Context-aware fusion: a case study on fusion of gait and face for human identification in video," Pattern Recognition, vol. 43, no. 10, pp. 3660–3673, 2010.
[10] N. O'Hare and A. F. Smeaton, "Context-aware person identification in personal photo collections," IEEE Transactions on Multimedia, vol. 11, no. 2, pp. 220–228, 2009.
[11] Z. Stone, T. Zickler, and T. Darrell, "Autotagging Facebook: social network context improves photo annotation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR '08), pp. 1–8, June 2008.
[12] D. Anguelov, K.-C. Lee, S. B. Gokturk, and B. Sumengen, "Contextual identity recognition in personal photo albums," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–7, June 2007.
[13] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke, "Leveraging context to resolve identity in photo albums," in Proceedings of the 5th ACM/IEEE Joint Conference on Digital Libraries—Digital Libraries: Cyberinfrastructure for Research and Education, pp. 178–187, June 2005.
[14] M. Zhao, Y. W. Teo, S. Liu, T. S. Chua, and R. Jain, "Automatic person annotation of family photo album," in Image and Video Retrieval, pp. 163–172, Springer, 2006.
[15] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3099–3104, October 2004.
[16] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), vol. 2, pp. 28–31, IEEE, August 2004.
[17] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Video-Based Surveillance Systems, pp. 135–144, Springer, 2002.
[18] G. Farneback, "Two-frame motion estimation based on polynomial expansion," in Image Analysis, pp. 363–370, Springer, 2003.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[20] S. Dernbach, B. Das, N. C. Krishnan, B. L. Thomas, and D. J. Cook, "Simple and complex activity recognition through smart phones," in Proceedings of the 8th International Conference on Intelligent Environments (IE '12), pp. 214–221, IEEE, June 2012.
[21] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, "Activity recognition from accelerometer data," in Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI '05), pp. 1541–1546, July 2005.