learning and inference in cartoon videos using deep ... · byoung-tak zhang, kyung- min kim...
Post on 21-Jul-2020
0 Views
Preview:
TRANSCRIPT
Byoung-Tak Zhang, Kyung-Min Kim
Learning and Inference in Cartoon Videos using Deep Hypernetworks Concept Structure
2016-11-01
Introduction to Machine Learning
2
Definition – What is Concept?
Concept ‘Pororo’
High-level
blue, penguin, pororo, cute, playful
Visual and linguistic representation
Low-level
7. 딥개념신경망 (딥하이퍼넷)
w w w w w w v v v v v v
e e e e e e e e e e e e
c c c c c
“it”
SC4 SC2 SC5 SC6
“playful” “start” “crong” “sky” “plane”
Pororo = {SC1, SC4, …} SC1 = e_1 = {w1, w3, v1, v4} SC4 = e_5 = {w1, w5, v2, v6} Pororo = {e_1, e_5, …} = {w1, w3, w5, v1, v2, v4, v6, …}
SC1 SC3
Pororo
Population code = a collection of sparse microcodes
Microcodes (sparse)
Crong Eddy
Sparse Population Coding model1 (SPC)
1. Zhang B.-T., Ha J.-W., Kang M., Sparse Population Code Models of Word Learning in Concept Drift, Cogsci 2012
Deep Concept Hierarchies
4
r r r r w w w w w
c1
h
xGrowing
Growing and shrinking
c2
e1
e2
e3
e4
e5
𝑐11𝑐21
𝑐31 𝑐41
𝑐22𝑐12
(b) Hypergraph representation of (a)
h
c1
c2
“it”
“playful”
“start”“crong”
“sky”
“plane”
“robot”
(a) Example of deep concept hierarchy learned from Pororo videos
r
1 2 1 2 1 2( , | ) ( , | , ) ( | )P P P=∑h
r w c ,c r w h c ,c h c ,c
Algorithm: Learning of Concept Layers Three issues
Determining the number of the nodes of the concrete concept layer c1 (c1-nodes)
Associating c1-nodes and the modality layer h Associating c1-nodes and the abstract concept nodes (c2-nodes)
Number of c1-nodes and association of the c1 layer and h A c1-node is associated with a subgraph of the hypernetwork A subgraph = a cluster of hyperedges The number of clusters changes depending on the closeness
centrality using word2vec:
If Sim(hm) > θmax or Sim(hm) < θmin then hm is split or merged. Association of c1 and c2 layers
A hyperedge contains the c2 node information
( )( )| |
mm
mDistSim =
hhh (hm : the hyperedge cluster associated with )
1mc
21 2
(c , )(c ,c )
i m j mhmi j
i mhm
C hαω
α∈
∈
=∑
∑h
h
(Mikolov et al. 2013)
Why Cartoon Videos?
Children’s cartoon videos Multimodal Vision + language Simple grammars Explicit story lines Image processing Pseudo-real Educational Cognitive
6
Comparison to Previous Approaches
Previous approaches to knowledge acquisition and representation Semantic networks (Steyvers
2006) WordNet (Fellbaum 2010) Single-modality (usually
linguistic)
Here: Automated construction of conceptual knowledge from videos Visually-grounded knowledge Dynamic Concept drift Multimodal
Cartoon video data 17 ‘Pororo’ DVDs, 183 episodes, 1232 minutes, 16000 scene-
subtitle pairs
Image
R-CNN for extracting image patches CNN feature vector (4096-D vector) Color histogram (1000-D vector)
Text
Word set including functional words for sentence generation Total 3452 words Represent each word as a real vector by word2vec1
Data Description (1/2)
1 Mikolov T., Sutskever I., Chen K., Corrado G., Dean J., Distributed Representation of Words and Phrases and Their Compositionality, NIPS 2013
Data Description (2/2)
Image preprocessing: segmentation + descriptor Each scene image → a set of image patches (MSER) Image patch → CNN feature / RGB histogram
Text preprocessing: real vectors by word2vec
9
(dim: 4096)
CNN feature vector
10
10 10
RGB histogram tensor
Scene image
r1
r2
r3
R-CNN
Patches Visual features (SIFT / RGB histogram)
1Vv
2Vv
3Vv
1Cv
2Cv
3Cv
CNN feature, Color quantization
(nonnegative integer)
10
Concept Learning
Data Pipeline
i-th episode
Scene #1
Scene #2
….
Scene #n
Unit of input : an episode
Preprocessing Concept Networks
Kids Videome
v1 v2 v5
v2 v3 v7
v3 v4 v6
Hyperedge Large-scale video
Preprocessing
HE Sampling
Network construction
Image 개수 : Word 개수 :
300 50 350 80 400 110 450 140 500 170 550 210 600 240 650 270 700 300 750 330 800 360 850 390 900 420 950 450 1000 480 1050 510 1100 540 1150 570 1200 600 1250 630 1300 660 1400 690 1500 720 1600 750 1700 780 1800 810 1900 830 2000 850 3000 1600 4000 1700 7000 1800 10000 1900 2500 1000
Episode 개수 : 183 150 100 80 75 70 65 60 55 50 45 40 35 30 25 20 15 10 5 1
Evolution of Concept Map
Image Generation from Sentence Intermediate images generated from query sentences
13
Query sentences Episodes 1~52 (1 season) Episodes 1~104 (2 seasons) Episodes 1~183 (all seasons)
• Tongtong, please change this book using magic.
• Kurikuri, Kurikuri-tongtong!
• I like
cookies. • It looks
delicious • Thank you,
loopy
질의 문장 :
Hi
Pororo Let’s
Play
Hi Pororo Let’s
Image Generation from Sentence
질의 문장 : Hi Pororo Let’s Play!
Pororo Let’s
Play
Hi
Image Generation from Sentence
Sentence Generation from Image
16
Original subscript: clock, I have made another potion come and try it Generated subscripts • as i don't have the right magic potion come and try it was nice • ah, finished i finally made another potion come and try it we'll all alone?
Query = {i, try} Query = {thanks, drink} Original subscript: Oh, thanks Make him drink this Generated subscripts • oh, thanks make him drink this bread • oh, thanks make him drink this forest
Query = {he, take}
Original: Tong. Tong Let. Me. Take. It. To. Clock Generated • take your magic to know what is he doing? • take your magic to avoid the house to know what is he keeps going in
circles like this will turn you back to normal
Query = {ship, pulled}
Original: The ship is being pulled Generated • wow looks as if that's the ship is being pulled • the ship is being pulled
질의 이미지
magic
making
i
food
am sorry
i am making magic sorry
i am making food
Sentence Generation from Image
Analysis of Constructed Concept Structures
Distinguishable concept nodes
Formation of the character properties (Petty and Loopy)
(a) Microcodes (b) Centroids of c1-nodes
• Tongtong • Eddy • Pororo
• Pororo • Eddy
(a) Episode 1
• Loopy • Crong • Petty
(b) Episodes 1~52
• Loopy • Crong • Petty
Loopy: - A female beaver - Girlish and shy - Likes cooking
Petty: - A female penguin - Boyish and active - Likes sports
Additional Experiment 1
Proposing 3 different graph search methods
Additional Experiment 1
Graph Monte-Carlo Method Learning strategy: The probability of selecting a vertex, P(v(x)), for
generating hyperedges Uniform graph MC (UGMC)
Same probability for all vertices
Random selection Poorer-richer graph MC (PRGMC)
Prefer more frequently appearing vertices in graphs
Smaller and denser graph → fast but premature convergence Fair graph MC (FGMC)
Prefer less frequently appearing vertices Larger and sparse graph → a diverse representation but slow convergence
20
( )
1( ( )){ | }n
P v xx x +
=∈x ( : the set of variables with the positive value of the n-th instance) ( )n
+x
{ }( ( ))( ( )) , ( ( )) ( ( ), )
i t
i ie G
R d v xP v x d v x h v x eα
+
∈= = ∑x
{ }( ( ))( ( ))
R d v xP v x
−
=x
( R+(.): a ranking function of d(v(x)) in the ascending order)
( R-(.): a ranking function of d(v(x)) in the descending order)
© 2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 21
Scene-to-Subtitle Translation Scenes Episodes 1~13 Episodes 1~36
Original And petty taught loopy how to ski.
UGMC
- did you ask me how to swim.- the end how grateful I think she is coming. UGMC
- Wow petty that how that is not always so loopy taught if i can do fly it well.
- How did you have to be that I could ski just.
PRGMC - end how was so happy- the end how did you I would PRGMC - How did you pororo you.
- How about now you can I do not worry.
FGMC- To show how big you found- The end how grateful I am petty nice to lose careful
FGMC- Harry realized that how that is it is dangerous
- I thought that how that I could ski just
SPC- But how do someone stop.- The end how was it. SPC
- How about now you can you give me that how that is great.
- I will see let see how big.
Original Wow poby, you caught so many already.
UGMC
- Has been caughtUGMC
- Come out if you go in to hear you guys you have got a lot of fish I caught.
- You have caught a lot today did you see you later.
PRGMC- Has been caught
PRGMC- Everyone has caught a fish for dinner.- You have caught a lot today did you ask me how.
FGMC
- What are you guys you have caught a lot.
- What happened to ten everyone has caught a lot.
FGMC- Poby caught a boat a secret that all the wind is so big.
- You have caught a fish for the art diving.
SPC
- Pororo no pororo has caught- She caught the first place SPC
- You come with his new friend has caught a very interesting book recently
- What about pororo has caught a lot of fish
Proposing Dual Memory for Video Q&A
Additional Experiment 2
23
Video Q&A System (1/2)
Kids Videome
Concept learning
c1 c2 c5c3 c4
r r r r r w w w w w
er er er ew ew ew
s
Short-term memory
Knowledge base
c1 c2 c5c3 c4
r r r r r w w w w w
er er er ew ew ew
s
Long-term memory
Question
What did Pororo eat last night?
Question generation
Dinner Poby and Harry
…
Hypothesis answer set
Porongporong Dark castle
…
Hypothesis answer set
Answer evaluation Dinner
Final answer
c1 c2 c5c3 c4
r r r r r w w w w w
er er er ew ew ew
s
Image & Question multimodal learning
Overview of Video Q&A System
sO is a scoring function to get a candidate hyperedge-answer set In this work, cosine distance measuring between words in the question and words in hyperedges will be
used as sO
sR is a retrieval function to give answers.
24
Video Q&A System (2/2)
It is the day that pororo and crong
are flying.
Video
4096D CNN-feature vector
200D real valued vector using RNN
Long-term memory
Q : What are they doing now?
Short-term memory constructed by HNs
Candidate HEs
o = argmax sO(Q, Ei) i=1,…,N
A : Greeting
r = argmax sR([Q,E1,…EM], w) w⊆W
Q : What is pororo?
Candidate HEs
o = argmax sO(Q, Ei) i=1,…,N
A : Penguin
r = argmax sR([Q,E1,…EM], w) w⊆W
25
Dual Memory Networks
26
Evaluation
Sequence of Images
Sequence of Images
Answers (S/L)
Baking / Making a new toy Stars / Ball
Because they were so loud / His tall height and great strength Yes / Yes
Questions Can pororo swim out too far? How can pororo swim well?
Answers (S/L) Questions What did eddy trying to go to the playground all day?
What does eddy find in her sleep?
* S and L indicate short-term memory and long-term memory
Examples of generated questions & retrieved answers
Video turing test
Future Works : Human-machine Interactive system
비디오 학습
교육용 비디오
아이 교육용 로봇
비디오 개념 학습
학습 패턴 관찰에 따른 사용자 모델링
피드백
1)Pororo likes cookie 2)Eddy goes 3)Petty plays
The ship is being pulled
top related