tri-modal human body segmentation - ubsergio/linked/cristinamaster.pdf · tri-modal human body...
TRANSCRIPT
![Page 1: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/1.jpg)
Tri-modal Human Body SegmentationMaster of Science Thesis
Cristina Palmero Cantarino
Advisor: Sergio Escalera Guerrero
February 6, 2014
![Page 2: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/2.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Outline
1 Introduction
2 Tri-modal dataset
3 Proposed baseline
4 Evaluation
5 Conclusions and future work
2/47
![Page 3: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/3.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Outline
1 IntroductionHuman body segmentationMotivationProposal
2 Tri-modal dataset
3 Proposed baseline
4 Evaluation
5 Conclusions and future work
3/47
![Page 4: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/4.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Human body segmentation
Segmentation: labeling problem.
Main challenges:
Different points of view.
Illumination changes.
Complex and clutteredbackgrounds.
Presence of occlusions.
Human body articulatednature.
Diversity of poses.
Variable appearance.Segmentation using Grabcut
Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. “Grabcut:Interactive foreground extraction using iterated graph cuts”. In: ACMTransactions on Graphics (TOG). Vol. 23. 3. ACM. 2004, pp. 309–314.4/47
![Page 5: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/5.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Motivation
Applications:
Security.
Leisure.
Health.
Imaging modalities:
Mostly RGB cues from color cameras.
Recently, RGB-Depth cues (Microsoft R©KinectTM).
Little attention to thermal.
Thermal Imaging:
Price of thermal sensors is lowering substantially every year.
Less intrinsic problems than RGB cues.
Lack of benchmarks comparing RGB-Depth-Thermalmodalities.
5/47
![Page 6: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/6.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Motivation
Applications:
Security.
Leisure.
Health.
Imaging modalities:
Mostly RGB cues from color cameras.
Recently, RGB-Depth cues (Microsoft R©KinectTM).
Little attention to thermal.
Thermal Imaging:
Price of thermal sensors is lowering substantially every year.
Less intrinsic problems than RGB cues.
Lack of benchmarks comparing RGB-Depth-Thermalmodalities.
5/47
![Page 7: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/7.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Motivation
Applications:
Security.
Leisure.
Health.
Imaging modalities:
Mostly RGB cues from color cameras.
Recently, RGB-Depth cues (Microsoft R©KinectTM).
Little attention to thermal.
Thermal Imaging:
Price of thermal sensors is lowering substantially every year.
Less intrinsic problems than RGB cues.
Lack of benchmarks comparing RGB-Depth-Thermalmodalities.
5/47
![Page 8: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/8.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Proposal
Tri-modal dataset
Novel tri-modal dataset of continuous image sequences.
RGB-Depth-Thermal modalities.
People interacting with everyday objects.
Baseline methodology
Automatic segmentation of people in video sequences inindoor scenarios with a fixed camera.
Usage of state-of-the-art descriptors for feature extractionamong modalities.
GMM modeling of subject/object regions.
Multi-modal fusion using several approaches.
6/47
![Page 9: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/9.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Proposal
Tri-modal dataset
Novel tri-modal dataset of continuous image sequences.
RGB-Depth-Thermal modalities.
People interacting with everyday objects.
Baseline methodology
Automatic segmentation of people in video sequences inindoor scenarios with a fixed camera.
Usage of state-of-the-art descriptors for feature extractionamong modalities.
GMM modeling of subject/object regions.
Multi-modal fusion using several approaches.
6/47
![Page 10: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/10.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Outline
1 Introduction
2 Tri-modal datasetDatasetScenes
3 Proposed baseline
4 Evaluation
5 Conclusions and future work
7/47
![Page 11: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/11.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Tri-modal dataset
Novel registered multi-modal dataset
RGB - Depth - Thermal modalities
3 different scenarios
3 continuous image sequences
More than 2,000 frames per sequence
RGB - Depth pixel-level registration
Thermal near pixel-level registration
Manually annotated ground truth
Registration algorithm provided
8/47
![Page 12: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/12.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Tri-modal datasetScene 1
9/47
![Page 13: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/13.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Tri-modal datasetScene 2
10/47
![Page 14: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/14.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Tri-modal datasetScene 3
11/47
![Page 15: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/15.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Outline
1 Introduction
2 Tri-modal dataset
3 Proposed baselineExtraction of masksExtraction of regions of interestFeature extractionClassifiers overviewCell classificationIndividual PredictionMulti-modal fusion
4 Evaluation
5 Conclusions and future work
12/47
![Page 16: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/16.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of masks
13/47
![Page 17: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/17.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of masksBackground Subtraction
Limit the search space.
Non-adaptive background modeling using Mixture ofGaussians.
Select modality:
14/47
![Page 18: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/18.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of masksBackground Subtraction
Limit the search space.
Non-adaptive background modeling using Mixture ofGaussians.
Select modality: depth.
15/47
![Page 19: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/19.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of masksMask registration
Registration
Depth/RGB Foreground Masks Thermal Foreground Masks
16/47
![Page 20: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/20.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of regions of interest
17/47
![Page 21: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/21.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of regions of interestBounding box generation from regions of interest
People overlap:
Bimodal disparity distribution.
Otsu’s threshold to split regions.
18/47
![Page 22: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/22.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of regions of interestBounding box transformation and correspondence to other modalities
All modalities must have the same number of bounding boxes,corresponding to the same regions of interest.
Tasks:
1 Find correspondence between rgb/depth and thermal regionsof interest.
2 Compute the corresponding bounding boxes in thermalmodality generated after applying Otsu’s threshold in depthmodality.
19/47
![Page 23: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/23.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of regions of interestBounding box transformation and correspondence to other modalities
1 Find correspondence between rgb/depth and thermal regions ofinterest.
Iterative search among depth and thermal modalities.Takes into account deviation among them.Best match: bounding box coordinates, amount of overlap andarea similarity.Correspondence function:
bthermaliq = β(bdepthij ) (1)
where bij is the j-th bounding box in frame i.
20/47
![Page 24: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/24.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of regions of interestBounding box transformation and correspondence to other modalities
2 Compute the corresponding bounding boxes in thermal modalitygenerated after applying Otsu’s threshold in depth modality.
Assuming bounding boxes of both rgb/depth and thermalmodalities are proportional, find the equivalence ratio to createthe split bounding boxes in thermal.Ratio k:
kh =hbdepth
ij
hbthermaliq
, kw =wbdepth
ij
wbthermaliq
(2)
where h and w are the size of a given bounding box.
21/47
![Page 25: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/25.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of regions of interestBounding box transformation and correspondence to other modalities
Result:
Correspondence of regions of interest among modalities.
Grid partitioning 2× 2 cells per bounding box.
RGB Depth Thermal
1 11
2 2 2
22/47
![Page 26: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/26.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Extraction of regions of interestBounding box ground truth generation
Comparing overlap between:
Bounding boxes extracted from Ground Truth Masks
Bounding boxes extracted from Background SubtractionMasks
Label:
tdr =
0 (Object) if overlap ≤ 0.1−1 (Unknown) if 0.1 < overlap < 0.61 (Subject) if overlap ≥ 0.6
23/47
![Page 27: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/27.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Feature extraction
24/47
![Page 28: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/28.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Feature extraction: ColorHistogram of Oriented Gradients (HOG)
Unsigned gradients (0 - 180degrees).
9-bin histogram.
Contribution to the histogramgiven by the vector magnitude.
No block overlap applied.
Final vector of 288 values percell.
HOG
Navneet Dalal and Bill Triggs. “Histograms of oriented gradients forhuman detection”. In: Computer Vision and Pattern Recognition, 2005. CVPR2005. IEEE Computer Society Conference on. Vol. 1. IEEE. 2005, pp. 886–893.25/47
![Page 29: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/29.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Feature extraction: ColorHistogram of Oriented Optical Flow (HOOF)
Dense optical flow computation.
8-bin histogram.
Signed gradients (0 - 360degrees).
Contribution to the histogramgiven by the vector magnitude.
Final vector of 8 values per cell.Optical flow
26/47
![Page 30: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/30.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Feature extraction: ColorScore Maps (SM)
Score map based on Gaborfilters.
C = 6 component filters perbody part.
M = 26 body parts.
L scales per image.
Score maps from Ramanan et al
score(pl) =1
C
1
M
∑c∈C
∑m∈M
score(pl)mc (3)
score(p) =1
L
∑l∈L
score(pl)′ (4)
Yi Yang and Deva Ramanan. “Articulated pose estimation with flexiblemixtures-of-parts”. In: Computer Vision and Pattern Recognition (CVPR),2011 IEEE Conference on. IEEE. 2011, pp. 1385–1392.27/47
![Page 31: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/31.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Feature extraction: DepthHistogram of Oriented Depth Normals (HON)
Depth dense maps to 3Dpoint cloud structures.
Surface normalscomputations.
Angle distribution quantizedin 8-bin histogram.
Depth normals
28/47
![Page 32: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/32.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Feature extraction: ThermalHistogram of Thermal Intensities and Oriented Gradients (HIOG)
Concatenation of 2histograms:
1 Thermal intensitities[0, 255].
2 Orientation of thermalgradients (similar toHOG).
8 bins per histogram.Thermal intensities and
oriented gradients
29/47
![Page 33: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/33.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Classifiers overview
1 Statistical Learning:
Gaussian Mixture Models (GMM)Subject and object probabilitiesIndividual prediction
2 Multi-modal fusion approaches:
Naive approachDiscriminative classifiersStacked learning fashion
30/47
![Page 34: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/34.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Cell classificationGaussian Mixture Models
Unsupervised learning method for fitting multiple Gaussians toa set of multi-dimensional data points to obtain a likelihood L.
Trained using Expectation Maximization algorithm.
L =
N∏x∈X
K∏k=1
p(x|k)P (k) (5)
h1
GMM
HOGHOOFHONHIOG
Cell-based
y1
HOG HOOF
HON HIOG
31/47
![Page 35: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/35.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Individual Prediction
h3
Multi-modalFusion
⎨y1' ∪ y2⎬
Normalization
Pixel to cell
description +
Normalization
y3
+
y1'
h1
GMM
h2
Individual Prediction
HOGHOOFHONHIOG
Cell-based
SM
Pixel-based
y1
y2
32/47
![Page 36: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/36.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Individual PredictionCell-based
Predict if a region corresponds to subject or object, for eachcell-based descriptor individually. Grid cell voting v:
v =∑i,j
1{Ld,subij > Ld,objij } (6)
Based on a threshold vthr that defines the minimum number ofpositive votes needed to assign the subject label to the givenregion:
vthr =vgridhgrid
2(7)
Final decision tdr :
tdr = 1
{v > vthr
}∨{1{v = vthr
}·1{∑
i,j
(Ld,subij −Ld,objij
)> 0}}(8)
33/47
![Page 37: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/37.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Individual PredictionPixel-based
Prediction of a given region defined by:
α: minimum score of a pixel to be considered as a person.
η: minimum percentage of pixels inside a region considered asperson that are needed to label the whole region as a person.
Final decision tdr :
tdr = 1
{ 1
Nr
Nr∑i=1
1{score(pi) > α} > η}
(9)
where Nr denotes the number of pixels of a region.
34/47
![Page 38: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/38.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Multi-modal fusion
h3
Multi-modalFusion
⎨y1' ∪ y2⎬
Normalization
Pixel to cell
description +
Normalization
y3
+
y1'
h1
GMM
h2
Individual Prediction
HOGHOOFHONHIOG
Cell-based
SM
Pixel-based
y1
y2
35/47
![Page 39: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/39.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Multi-modal fusionNaive approach
1 Voting among all descriptors using individual predictions tdr .
2 If there is a strong agreement between descriptions, thosedescriptions that differ are not taking into account in the thirdstep.
3 Cell level fusion:
Ld,subij =∑d∈D′
Ld,subij , Ld,objij =∑d∈D′
Ld,objij (10)
4 Predict tr following the same procedure as in individualprediction.
36/47
![Page 40: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/40.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Multi-modal fusionSVM-based approach
Discriminative supervised binary classifier that learns a modelwhich represents the instances as points in space, mapped insuch a way that instances of different classes are separated bya hyperplane in a high dimensional space.
Approaches:
Simple: {Ld,subij , Ld,obj
ij }Stacked: {Ld,sub
ij , Ld,objij , tdr}
37/47
![Page 41: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/41.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Outline
1 Introduction
2 Tri-modal dataset
3 Proposed baseline
4 EvaluationExperimental methodology and validation measuresExperimental results
5 Conclusions and future work
38/47
![Page 42: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/42.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Experimental methodology and validation measures
10-fold cross validation.
Grid search to optimize all the parameters.
Training without unknown labels.
GMM with 3 components per Gaussian.
SVM approaches for multi-modal fusion (simple and stacked):LinearRBF
Don’t care region.
Segmentation accuracy measure: Jaccard Index
overlap(A,B) =|A ∩B||A ∪B|
(11)
39/47
![Page 43: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/43.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Quantitative results
0 1 3 5 70.45
0.5
0.55
0.6
0.65
0.7
0.75
DCR (pixels)
Overlap
Individual Prediction
HOG
SM
HOOF
HIOG
HON
(a) Individual prediction
0 1 3 5 70.35
0.4
0.45
0.5
0.55
0.6
DCR (pixels)
Overlap
Naive Fusion
Thermal
Color/Depth
(b) Naive fusion
40/47
![Page 44: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/44.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Quantitative results
0 1 3 5 70.45
0.5
0.55
0.6
0.65
0.7
0.75
DCR (pixels)
Ove
rla
p
Simple Linear SVM
Thermal
Color/Depth
(c) Fusion usingSimple linear SVM
0 1 3 5 70.45
0.5
0.55
0.6
0.65
0.7
0.75
DCR (pixels)
Ove
rla
p
Simple RBF SVM
Thermal
Color/Depth
(d) Fusion usingSimple RBF SVM
0 1 3 5 70.45
0.5
0.55
0.6
0.65
0.7
0.75
DCR (pixels)
Ove
rla
p
Stacked Linear SVM
Thermal
Color/Depth
(e) Fusion usingStacked Linear SVM
0 1 3 5 70.45
0.5
0.55
0.6
0.65
0.7
0.75
DCR (pixels)
Ove
rla
p
Stacked RBF SVM
Thermal
Color/Depth
(f) Fusion usingStacked RBF SVM
41/47
![Page 45: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/45.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Quantitative results
Table: Overlap results of the individual predictions for each description
DCR HOG SM HOOF HIOG HON0 62.10 % 63.12 % 56.97 % 46.35 % 56.76 %
1 64.71 % 65.85 % 59.41 % 47.99 % 59.09 %
3 67.59 % 69.02 % 62.13 % 50.85 % 61.70 %
5 68.65 % 70.40 % 63.20 % 53.02 % 62.77 %
7 68.65 % 70.72 % 63.28 % 54.45 % 62.94 %
Table: Overlap results of fusion using Stacked Linear SVM for each modality
DCR Thermal Color/Depth0 49.64 % 64.65 %
1 51.33 % 67.39 %
3 54.29 % 70.43 %
5 56.56 % 71.58 %
7 58.11 % 71.63 %
42/47
![Page 46: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/46.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Qualitative results
Comparison between masks generated after background subtractionand masks generated using Stacked Linear SVM.
43/47
![Page 47: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/47.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Outline
1 Introduction
2 Tri-modal dataset
3 Proposed baseline
4 Evaluation
5 Conclusions and future workConclusionsFuture work
44/47
![Page 48: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/48.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Conclusions
A solution for human body segmentation in multi-modal datahas been proposed.
A novel tri-modal dataset has been presented, containing RGB- Depth - Thermal modalities.
Results show variable performance for the different modalitieswhen segmenting people in multi-modal data, and improvedsegmentation accuracy of the multi-modal GMM-SVMstacked learning method.
45/47
![Page 49: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/49.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Future work
Impact factor journal in progress in collaboration with AalborgUniversity and HuPBA group.
Silhouette masks refinement using Grabcuts.
If pixel-level registration available among all modalities:
Combination of different modalities in background subtraction.Pixel-level feature extraction.Pixel-level description.
Extensive validation in real surveillance scenarios as a first realcase study, including gesture recognition methodologies(planning just started with Aalborg University).
46/47
![Page 50: Tri-modal Human Body Segmentation - UBsergio/linked/cristinamaster.pdf · Tri-modal Human Body Segmentation ... Human body articulated nature. ... M= 26 body parts. L scales per image](https://reader031.vdocuments.mx/reader031/viewer/2022022509/5ad3895c7f8b9afa798e0a98/html5/thumbnails/50.jpg)
Introduction Tri-modal dataset Proposed baseline Evaluation Conclusions and future work
Thank you.
47/47