![Page 1: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/1.jpg)
How Machines Learn to Talk
Amitabha Mukerjee IIT Kanpur
work done with:
Computer Vision: Profs. C. Venkatesh, Pabitra Mitra
Prithvijit Guha, A. Ramakrishna Rao, Pradeep Vaghela
Natural Language: Prof. Achla Raina, V. Shreeniwas
![Page 2: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/2.jpg)
Robotics
Collaborations: IGCAR Kalpakkam
Sanjay Gandhi PG Medical Hospital
![Page 3: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/3.jpg)
Visual Robot Navigation
Time-to-Collision-based Robot Navigation
![Page 4: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/4.jpg)
Hyper-Redundant Manipulators
• Reconfigurable Workspaces / Emergency Access
• Optimal Design of Hyper-Redundant Systems – SCARA and 3D
The same manipulator can work in changing workspaces
![Page 5: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/5.jpg)
Planar Hyper-Redundancy
4-link Planar Robot
Motion Planning
![Page 6: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/6.jpg)
Micro-Robots
• Micro Soccer Robots (1999-)
• 8cm Smart Surveillance Robot – 1m/s
• Autonomous Flying Robot (2004)
• Omni-directional platform (2002)
Omni-Directional Robot Sponsor: [email protected]
![Page 7: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/7.jpg)
Flying Robot
heli-flight.wmv
Test flight of UAV. Inertial Measurement Unit (IMU) under commercial production.
Start-up at IIT Kanpur: Whirligig Robotics
![Page 8: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/8.jpg)
Tracheal Intubation Device
Device for intubation during general anaesthesia
Aperture for Fibre optic video cable
Endotracheal tube Aperture
Aperture for Oxygenation tube
Hole for suction tube
Control cables Attachment Points
Ball & Socket joint
Assists surgeon while inserting breathing tube during general anaesthesia
Sponsor: DST / [email protected]
![Page 9: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/9.jpg)
Draupadi’s Swayamvar
Can the Arrow hit the rotating mark? Sponsor: Media Lab Asia
![Page 10: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/10.jpg)
High DOF Motion Planning
• Accessing Hard to Reach spaces
• Design of Hyper-Redundant Systems
• Parallel Manipulators
Sponsor: BRNS / [email protected]
10-link 3D Robot – Optimal Design
![Page 11: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/11.jpg)
Multimodal Language Acquisition
Consider a child observing a scene together with adults talking about it
Grounded Language : Symbols are grounded in perceptual signals
Uses simple videos with boxes and simple shapes, of the kind that are standard in social psychology
![Page 12: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/12.jpg)
Objective
To develop a computational framework for Multimodal Language Acquisition
• Acquiring the perceptual structure corresponding to verbs
• Using Recurrent Neural Networks as a biologically plausible model for temporal abstraction
• Adapting the learned model to interpret activities in real videos
![Page 13: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/13.jpg)
Visually Grounded Corpus
Two psychological research films, one based on the classic Heider & Simmel (1944) and the other based on Hide & Seek
These animations portray motion paths of geometric figures (Big Square, Small Square & Circle)
Chase Alt
![Page 14: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/14.jpg)
Cognate clustering
• Similarity Clustering: Different expressions for the same action, e.g. “move away from center” vs “go to a corner”
• Frequency: Remove infrequent lexical units
• Synonymy: Sets of lexical units used consistently in the same intervals, to mark the same action, for the same set of agents
![Page 15: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/15.jpg)
Perceptual Process
[Block diagram with labels: Multi Modal Input, Video, Feature Extraction, Features, Descriptions, Cognate Clustering, Events, Trained Simple Recurrent Network, VICES]
![Page 16: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/16.jpg)
Design of Feature Set
• The features selected here are related to spatial aspects of conceptual primitives in children, such as position, relative pose, velocity etc.
• Use features that are kinematic in nature: temporal derivatives or simple transforms of the basic ones.
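A minimal sketch of how kinematic features of this kind might be computed from tracked positions, assuming each object is given as a list of (x, y) centroids per frame. The function names, the choice of features and the unit time step `dt` are illustrative assumptions, not the feature set used in the talk.

```python
import math


def velocity(track, dt=1.0):
    """Monadic feature: finite-difference velocity of one object."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


def speed(track, dt=1.0):
    """Monadic feature: magnitude of the velocity at each step."""
    return [math.hypot(vx, vy) for vx, vy in velocity(track, dt)]


def distance(track_a, track_b):
    """Dyadic feature: distance between two objects at each frame."""
    return [math.hypot(xa - xb, ya - yb)
            for (xa, ya), (xb, yb) in zip(track_a, track_b)]


def approach_rate(track_a, track_b, dt=1.0):
    """Dyadic feature: temporal derivative of the distance (negative = approaching)."""
    d = distance(track_a, track_b)
    return [(d1 - d0) / dt for d0, d1 in zip(d, d[1:])]


# Hypothetical tracks: A moves toward a stationary B.
track_a = [(0, 0), (3, 4), (6, 8)]
track_b = [(6, 8), (6, 8), (6, 8)]
speeds = speed(track_a)
rates = approach_rate(track_a, track_b)
```

Derived features such as the approach rate are simple transforms of the basic position signal, which is the kind of input a verb like "come closer" would plausibly depend on.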
![Page 17: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/17.jpg)
Monadic Features
![Page 18: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/18.jpg)
Dyadic Predicates
![Page 19: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/19.jpg)
VIdeo and Commentary for Event Structures [VICES]
[Block diagram with labels: Multi Modal Input, Video, Feature Extraction, Features, Descriptions, Cognate Clustering, Events, Trained Simple Recurrent Network, VICES]
![Page 20: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/20.jpg)
The classification problem
The problem is one of time series classification.
Possible methodologies include:
• Logic-based methods
• Hidden Markov Models
• Recurrent Neural Networks
![Page 21: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/21.jpg)
Elman Network
• Commonly a two-layer network with feedback from the first-layer output to the first-layer input
• Elman networks detect and generate time-varying patterns
• They can also learn spatial patterns
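The feedback loop described above can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the layer sizes, the tanh and sigmoid activations, the random untrained weights and the class name `ElmanNetwork` are all hypothetical, and training is omitted.

```python
import math
import random


class ElmanNetwork:
    """Forward pass of a simple recurrent (Elman) network; weights are untrained."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)       # input -> first layer
        self.w_ctx = matrix(n_hidden, n_hidden)  # previous first-layer output -> first layer
        self.w_out = matrix(n_out, n_hidden)     # first layer -> output layer
        self.context = [0.0] * n_hidden

    def reset(self):
        self.context = [0.0] * len(self.context)

    def step(self, x):
        # The first layer sees the current input plus its own previous output.
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(row_in, x))
                + sum(w * c for w, c in zip(row_ctx, self.context))
            )
            for row_in, row_ctx in zip(self.w_in, self.w_ctx)
        ]
        self.context = hidden  # fed back at the next time step
        return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
                for row in self.w_out]

    def run(self, sequence):
        self.reset()
        return [self.step(x) for x in sequence]


net = ElmanNetwork(n_in=2, n_hidden=4, n_out=1)
sequence = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
outputs = net.run(sequence)
```

Because the context is carried between steps, the output for a frame depends on the frames before it, which is what makes the architecture suited to time series of features.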
![Page 22: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/22.jpg)
Feature Extraction in Abstract Videos
• Each image is read into a 2D matrix
• Connected Component Analysis is performed
• A bounding box is computed for each such connected component
• Dynamic tracking is used to keep track of each object
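The first three steps can be sketched as follows, assuming a binary image stored as a list of rows and 4-connectivity. The function name and the `(top, left, bottom, right)` box layout are illustrative choices, not taken from the talk.

```python
from collections import deque


def connected_components(image):
    """Return one (top, left, bottom, right) bounding box per 4-connected region."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4x5 frame containing two shapes.
frame = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
boxes = connected_components(frame)
```

The centre of each box gives a per-frame position that the tracking step can link across frames.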
![Page 23: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/23.jpg)
![Page 24: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/24.jpg)
Working with Real Videos
Challenges
• Noise in real world videos
• Illumination changes
• Occlusions
• Extracting depth information
Our Setup
• Camera is fixed at head height.
• Angle of depression is approximately 0 degrees.
Video
![Page 25: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/25.jpg)
Background Subtraction
• Learn on still background images
• Find pixel intensity distributions
• Classify each pixel as background if $|P(x,y) - \mu(x,y)| < k\,\sigma(x,y)$
Remove Shadows
• Special case of reduced illumination: $S = k \cdot P$ where $k < 1.0$
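A minimal sketch of this kind of per-pixel background model, assuming grayscale frames stored as lists of rows. The threshold `k=2.5`, the floor `min_sigma` on the standard deviation and the shadow ratio band `[low, high)` are assumed values chosen for illustration, not parameters from the talk.

```python
import statistics


def learn_background(frames):
    """Per-pixel mean and standard deviation over still background frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    mean = [[0.0] * cols for _ in range(rows)]
    std = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            samples = [frame[r][c] for frame in frames]
            mean[r][c] = statistics.fmean(samples)
            std[r][c] = statistics.pstdev(samples)
    return mean, std


def is_background(p, mu, sigma, k=2.5, min_sigma=1.0):
    # min_sigma (an assumption) avoids a zero-width band for constant pixels.
    return abs(p - mu) < k * max(sigma, min_sigma)


def is_shadow(p, mu, low=0.5, high=0.95):
    # A shadow is modelled as the background darkened by a factor below 1.
    return mu > 0 and low <= p / mu < high


def foreground_mask(frame, mean, std, k=2.5):
    """1 marks foreground; background and shadow pixels are 0."""
    return [
        [0 if is_background(p, mu, sd, k) or is_shadow(p, mu) else 1
         for p, mu, sd in zip(row, mu_row, sd_row)]
        for row, mu_row, sd_row in zip(frame, mean, std)
    ]


# Hypothetical 1x3 frames: three still background images, then a test frame.
background = [[[100, 50, 80]], [[102, 50, 80]], [[98, 50, 80]]]
mean, std = learn_background(background)
mask = foreground_mask([[101, 200, 56]], mean, std)
```

In the test frame the first pixel is within the learned band, the second is a genuine foreground change, and the third is a darkened copy of the background that the shadow test suppresses.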
![Page 26: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/26.jpg)
Contd..
Extract Human Blobs
• By Connected Component Analysis
• Bounding box is computed for each person
Track Human Blobs
• Each object is tracked using a mean-shift tracking algorithm.
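The core mean-shift iteration can be sketched as below. This is a simplified flat-kernel version running on a generic weight map (for example a foreground mask); practical trackers usually derive the weights from colour histograms, and the radius, tolerance and function name here are assumptions.

```python
import math


def mean_shift(weights, center, radius, max_iter=20, tol=0.5):
    """Move a circular window to the weighted centroid of the pixels it covers."""
    cy, cx = center
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sum_y = sum_x = 0.0
        for y in range(max(0, int(cy - radius)), min(rows, int(cy + radius) + 1)):
            for x in range(max(0, int(cx - radius)), min(cols, int(cx + radius) + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                    w = weights[y][x]
                    total += w
                    sum_y += w * y
                    sum_x += w * x
        if total == 0:
            break  # nothing under the window; keep the last estimate
        new_cy, new_cx = sum_y / total, sum_x / total
        shift = math.hypot(new_cy - cy, new_cx - cx)
        cy, cx = new_cy, new_cx
        if shift < tol:
            break
    return cy, cx


# Hypothetical 7x7 mask with a 2x2 blob; the window starts at its previous position.
mask = [[0] * 7 for _ in range(7)]
for y in (4, 5):
    for x in (4, 5):
        mask[y][x] = 1
position = mean_shift(mask, center=(3, 3), radius=2)
```

Starting each frame from the previous frame's position lets the window follow a blob as it moves, as long as the displacement between frames stays within the window radius.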
![Page 27: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/27.jpg)
Contd..
![Page 28: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/28.jpg)
Depth Estimation
Two approximations
• Using Gibson’s affordances
• Camera geometry
Affordances: Visual Clues
• The action of a human is triggered by the environment itself. A floor offers walk-on ability.
• Every object affords certain actions, which are perceived along with their anticipated effects. A cup’s handle affords grasping, lifting and drinking.
![Page 29: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/29.jpg)
Contd..
Gibson’s model
• Horizon is fixed at the head height of the observer.
Monocular Depth Cues
• Interposition: an object that occludes another is closer.
• Height in the visual field: the higher an object appears, the farther away it is.
![Page 30: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/30.jpg)
Depth Estimation
Pinhole Camera Model
• Mapping $(X, Y, Z)$ to $(x, y)$: $x = X f / Z$, $y = Y f / Z$
• For the point of contact with the ground: $Z \propto 1/y$, $X \propto x/y$
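For a ground-contact point the world height Y equals the camera height, so the projection equations can be inverted. The sketch below assumes image coordinates measured from the principal point with y increasing downward from the horizon, a horizontal optical axis, and hypothetical values for the focal length and camera height.

```python
def ground_point_from_image(x, y, focal_length, camera_height):
    """Invert x = X f / Z, y = Y f / Z for a point on the ground (Y = camera height)."""
    if y <= 0:
        raise ValueError("ground points must project below the horizon (y > 0)")
    Z = focal_length * camera_height / y  # depth is proportional to 1 / y
    X = x * Z / focal_length              # lateral position is proportional to x / y
    return X, Z


# Hypothetical values: f = 500 pixels, camera 1.6 m above the floor.
X, Z = ground_point_from_image(x=100, y=200, focal_length=500, camera_height=1.6)
```

Applying this to the bottom edge of each tracked bounding box gives a top-view (Z-X) trajectory. When only relative depth matters, the constants can be dropped and the proportionalities used directly.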
![Page 31: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/31.jpg)
Depth plot for A chase B Top view (Z-X plane)
![Page 32: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/32.jpg)
Results (contd..)
![Page 33: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/33.jpg)
Results (contd..)
![Page 34: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/34.jpg)
Results (contd..)
![Page 35: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/35.jpg)
Results
Separate SRN for each action
• Trained & tested on different parts of the abstract video
• Trained on abstract video and tested on real video
Single SRN for all actions
• Trained on synthetic video and tested on real video
![Page 36: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/36.jpg)
Basis for Comparison
Let the total time of the visual sequence for each verb be $t$ time units.
• $E$: intervals when subjects describe an event as occurring
• $E'$: intervals when VICES describes an event as occurring
• $\bar{E} = t - E$ and $\bar{E'} = t - E'$
• $FM$: intervals classified as Focus Mismatches
• True Positives $= |E' \cap E| / |E|$
• False Positives $= |E' - E| / |E|$
• False Negatives $= |E - E'| / |E|$
• Focus Mismatches $= |FM| / |E|$
• Accuracy $= (|E \cap E'| + |\bar{E} \cap \bar{E'}|) / t$
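A rough sketch of interval-based measures of this kind, representing each interval set as a set of discrete time units in `range(t)`. Normalising the false-positive count by the human-described time is an assumption made here, and focus mismatches are left out because they need agent labels that this sketch does not model.

```python
def interval_metrics(described, detected, t):
    """Compare human-described intervals (E) with system-detected intervals (E')."""
    E, E_prime = set(described), set(detected)
    everything = set(range(t))
    not_E, not_E_prime = everything - E, everything - E_prime
    return {
        "true_positives": len(E & E_prime) / len(E),
        "false_negatives": len(E - E_prime) / len(E),
        # Assumption: normalised by |E|, like the other two rates.
        "false_positives": len(E_prime - E) / len(E),
        "accuracy": (len(E & E_prime) + len(not_E & not_E_prime)) / t,
    }


# Hypothetical 10-unit sequence: subjects mark units 2-5, the system marks 4-6.
metrics = interval_metrics(described={2, 3, 4, 5}, detected={4, 5, 6}, t=10)
```

Accuracy counts agreement on both the occurring and the non-occurring time units, so a rare verb can score high accuracy even when its true-positive rate is modest.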
![Page 37: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/37.jpg)
Separate SRN for each action
Framework: Abstract video

| Verb | True Positives | False Positives | False Negatives | Focus Mismatches | Accuracy |
|---|---|---|---|---|---|
| hit | 46.02% | 3.06% | 53.98% | 2.4% | 92.37% |
| chase | 24.44% | 0% | 75.24% | 0.72% | 93.71% |
| come Closer | 25.87% | 14.61% | 73.26% | 16.77% | 63.66% |
| move Away | 46.34% | 7.21% | 52.33% | 15.95% | 73.37% |
| spins | 82.54% | 0% | 16.51% | 24.7% | 97.03% |
| moves | 68.24% | 0.12% | 31.76% | 1.97% | 77.33% |

| Verb | True Positives | False Positives | False Negatives | Focus Mismatches |
|---|---|---|---|---|
| hit | 3 | 3 | 1 | 1 |
| chase | 6 | 0 | 3 | 4 |
| come Closer | 6 | 20 | 7 | 24 |
| move Away | 8 | 3 | 0 | 14 |
| spins | 22 | 0 | 1 | 9 |
| moves | 5 | 1 | 2 | 7 |
![Page 38: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/38.jpg)
Time Line comparison for Chase
![Page 39: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/39.jpg)
Separate SRN for each action
Real video (action recognition only)

| Verb | Retrieved | Relevant | True Positives | False Positives | False Negatives | Precision | Recall |
|---|---|---|---|---|---|---|---|
| A Chase B | 237 | 140 | 135 | 96 | 5 | 58.4% | 96.4% |
| B Chase A | 76 | 130 | 76 | 0 | 56 | 100% | 58.4% |
![Page 40: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/40.jpg)
Single SRN for all actions
Framework: Real video

| Verb | Retrieved | Relevant | True Positives | False Positives | False Negatives | Precision | Recall |
|---|---|---|---|---|---|---|---|
| Chase | 239 | 270 | 217 | 23 | 5 | 91.2% | 80.7% |
| Going Away | 21 | 44 | 13 | 8 | 31 | 61.9% | 29.5% |
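For reference, the usual definitions of the last two columns can be sketched as below, assuming precision is the share of retrieved items that are true positives and recall is the share of relevant items that were retrieved. The function name is illustrative.

```python
def precision_recall(true_positives, false_positives, relevant):
    """Precision = TP / (TP + FP); recall = TP / relevant."""
    retrieved = true_positives + false_positives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Counts from the "Going Away" row above.
precision, recall = precision_recall(true_positives=13, false_positives=8, relevant=44)
```

With these counts the function gives a precision of about 61.9% and a recall of about 29.5%, matching the row.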
![Page 41: How Machines Learn to Talk Amitabha Mukerjee IIT Kanpur work done with: Computer Vision: Profs. C. Venkatesh, Pabitra Mitra Prithvijit Guha, A. Ramakrishna](https://reader036.vdocuments.mx/reader036/viewer/2022062301/5697bfc91a28abf838ca8c97/html5/thumbnails/41.jpg)
Conclusions & Future Work
• Sparse nature of video provides for ease of visual analysis
• Directly learning event structures from perceptual stream
Extensions:
• Learn fine nuances between event structures of related action words
• Learn the morphological variations
• Extend the work towards using Long Short-Term Memory (LSTM)
• Hierarchical acquisition of higher-level action verbs