food and activity detection in life logging images
Detecting Food & Activities in Life-logging Images
Bahjat [email protected]
Introduction
- Massive multimedia archives are continuously produced as every moment of life experience is captured and recorded to form a lifelog.
- Such a lifelog needs to be indexed, organised and searchable to be valuable to the lifelogger.
- Typical questions: “When did I meet X in place Y?” or “What have I eaten over the past month?”, among many similar ones.
Labels: President Obama, Cheering, Bar, People, Beer (Guinness) …
Semantic gap
Search: image of President Obama drinking Guinness
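Content-based Multimedia Indexing
[Figure: classical CBMI pipeline]
Modeling: training set + labels (+/-) → multimedia description → Train → Model
Indexing: test samples → multimedia description → Classifier → Predict → scores
• For each concept (e.g. Eaters), the classifier outputs one score per test sample (e.g. Eaters: 0.95, 0.15)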
- Feature extraction is domain-specific and time-consuming.
- Generic systems use several descriptors: SIFT, VLAD, HoG, BoW…
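The per-concept scheme above can be illustrated with a minimal sketch. It assumes toy 2-dimensional descriptors and uses a small logistic-regression scorer as a stand-in for the classifier; the data, the function names and the choice of classifier are illustrative assumptions, not the system described in the slides.

```python
import math

def train_concept_classifier(descriptors, labels, epochs=200, lr=0.5):
    """Fit one binary scorer for one concept (e.g. "Eaters").

    Logistic regression trained by stochastic gradient descent is an
    assumption made for illustration only.
    """
    weights = [0.0] * len(descriptors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(descriptors, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log-loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def score(model, descriptor):
    """Return a score in (0, 1) for one test sample."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, descriptor)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training set: descriptors with +/- labels (1 = Eaters, 0 = not).
train_x = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_y = [1, 1, 0, 0]
eaters_model = train_concept_classifier(train_x, train_y)

# Indexing: one score per test sample.
test_scores = [score(eaters_model, d) for d in ([0.85, 0.9], [0.1, 0.1])]
```

One such model would be trained per concept, and the test scores are then used to rank samples for that concept.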
Deep-learning CBMI
Given N visual concepts:
1- Build features automatically from the training data
2- Combine feature extraction and classification
DL experts define the NN topology and train the NN.
The net is trained in multi-class mode (one model with N neurons at the output layer).
Deep learning learns good features automatically, and the same method applies to different domains.
[Figure: CBMI system with deep learning. Training set → ConvNet → full connection → outputs O1, O2, O3, O4, …]
Neural Networks (NN)
Deep-learning Neural Networks (DCNN)
Output scores:
O1 O2 O3 O4 …
0.2 0.9 0.3 0.5 …
0.9 0.5 0.4 0.3 …
0.3 0.1 0.9 0.2 …
Indexing phase: runs the forward function of the NN, which yields N scores corresponding to the learned concepts.
Learning phase: uses back-propagation, with a different function at each layer.
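As a rough illustration of the indexing phase, the sketch below runs a forward pass through a single fully connected layer followed by a softmax, giving N scores for N concepts. The concept names, weights and input features are hypothetical values chosen for the example; a real network would have many more layers and learned weights.

```python
import math

def softmax(logits):
    """Turn raw output-layer values into scores that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(features, weights, biases):
    """Forward pass of one fully connected layer with N output neurons."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)

# Hypothetical setup: N = 3 concepts, 2 input features.
concepts = ["eating", "drinking", "walking"]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]

scores = forward([2.0, 0.5], weights, biases)
best_concept = concepts[scores.index(max(scores))]
```

In the learning phase, back-propagation would adjust `weights` and `biases` from labelled examples; that step is left out here.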
Image
Layer1: conv+pool
Layer2: conv+pool
Layer3: conv
Layer4: conv
Layer5: conv+pool
Layer6: FC
Layer7: FC
Softmax output
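To see how the feature map shrinks through such a stack, the sketch below applies the standard output-size formula for convolution and pooling. The slide gives no kernel sizes, strides or input size; the values used here are assumed from the widely used AlexNet reference configuration, which has the same layer layout.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, padding): assumed AlexNet-style hyperparameters,
# not taken from the slides.
layers = [
    ("Layer1 conv", 11, 4, 0),
    ("Layer1 pool", 3, 2, 0),
    ("Layer2 conv", 5, 1, 2),
    ("Layer2 pool", 3, 2, 0),
    ("Layer3 conv", 3, 1, 1),
    ("Layer4 conv", 3, 1, 1),
    ("Layer5 conv", 3, 1, 1),
    ("Layer5 pool", 3, 2, 0),
]

size = 227  # assumed input width and height in pixels
sizes = []
for name, kernel, stride, pad in layers:
    size = out_size(size, kernel, stride, pad)
    sizes.append((name, size))
```

Under these assumptions the map that feeds Layer6 (FC) is 6 x 6 per channel.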
Deep Learning NN
Deep-Learning with Classical CBMI system
[Figure: classical CBMI pipeline with DCNN descriptors]
Modeling: training set + labels (+/-) → multimedia description: deep-learning ‘DCNN’ → Train → Model
Indexing: test samples → multimedia description: deep-learning ‘DCNN’ → Classifier (SVM) → Predict → scores
• For each visual concept (e.g. Eaters), the SVM classifier outputs one score per test sample (e.g. Eaters: 0.95, 0.45, 0.10)
Early fusion of three DCNN-based descriptors:
1- AlexNet
2- GoogLeNet
3- VGG (Visual Geometry Group)
- Each descriptor is optimized separately before the fusion (using power-law normalization and PCA).
- An additional optimization is applied after fusion (power-law, PCA, power-law), giving a final descriptor of 294 dimensions.
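A minimal sketch of the fusion step follows, assuming the power transform is the usual signed power-law normalization with exponent alpha and that early fusion means concatenating the normalized descriptors. The exponent, the L2 normalization and the toy vectors are assumptions; the PCA reduction that produces the 294 dimensions is omitted.

```python
import math

def power_law(vector, alpha=0.5):
    """Signed power-law normalization: x -> sign(x) * |x| ** alpha."""
    return [math.copysign(abs(v) ** alpha, v) for v in vector]

def l2_normalize(vector):
    """Scale a vector to unit length (assumed here so that descriptors are comparable)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

def early_fusion(descriptors, alpha=0.5):
    """Normalize each descriptor separately, then concatenate them."""
    fused = []
    for d in descriptors:
        fused.extend(l2_normalize(power_law(d, alpha)))
    return fused

# Hypothetical toy descriptors standing in for the three DCNN outputs.
alexnet_desc = [4.0, 0.0, 9.0]
googlenet_desc = [1.0, 16.0]
vgg_desc = [0.25]

fused = early_fusion([alexnet_desc, googlenet_desc, vgg_desc])
```

The fused vector would then go through the post-fusion optimization before being passed to the classifier.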
Classifier: FMSVM, chosen for its effectiveness on class-imbalanced problems and its efficiency.
- Ranked first at the Pascal-VOC challenge, with very good performance on TRECVid.
Our approach
Thank You!
[email protected]: http://mrim.imag.fr/lifelog/