Understanding the Semantics of Media
Lecture Notes on Video Search & Mining, Spring 2012
Presented by Jun Hee Yoo
Biointelligence Laboratory
School of Computer Science and Engineering
Seoul National University
http://bi.snu.ac.kr
© 2012, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr
Problem Statement
Semantic Understanding
Some tools attempt to segment video at a higher level, but this level of analysis tells us little about the meaning represented in the media.
Approach
Segmentation Literature
Use LSI because it allows us to quantify the position of a portion of the document in a multi-dimensional semantic space.
Propose to summarize the text with LSI and analyze the signal with smooth Gaussians.
Semantic Retrieval Literature
Use mixtures of probability experts for semantic-audio retrieval (MPESAR), a more sophisticated model connecting words and media.
Analysis Tools
SVD
Reduces the dimensionality of a signal in a manner that is optimal in a least-squares sense. It is used to reduce the dimensionality of both the audio and the image data of the video.
Color Space
Color values are concatenated into 512 histogram bins.
Word Space
Uses latent semantic indexing (LSI) with SVD. Distance is measured by the angle between vectors.
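The word-space idea can be sketched as follows. This is a minimal illustration, not the method used in the lecture: the term-by-document counts are made up, the helper names `lsi_project` and `angle_between` are hypothetical, and the choice of k = 2 dimensions is an assumption for the toy data.

```python
import numpy as np

def lsi_project(term_doc, k):
    """Project documents (columns of term_doc) into a k-dimensional space."""
    # Thin SVD: term_doc = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (s[:k, None] * vt[:k, :]).T

def angle_between(a, b):
    """Angle in radians between two vectors (smaller means more similar)."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Toy term-by-document counts (rows: terms, columns: documents); invented data.
counts = np.array([
    [2.0, 1.0, 0.0, 0.0],   # "engine"
    [1.0, 2.0, 0.0, 0.0],   # "wing"
    [0.0, 0.0, 3.0, 1.0],   # "election"
    [0.0, 0.0, 1.0, 2.0],   # "vote"
])
docs = lsi_project(counts, k=2)

same_topic = angle_between(docs[0], docs[1])    # both aviation documents
cross_topic = angle_between(docs[0], docs[2])   # aviation vs. politics
```

In this toy case the two aviation documents end up at a much smaller angle than an aviation document and a politics document, which is the property the angle measure is meant to capture.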
Segmenting Video
Temporal Properties of Video
Color:
It provides robust evidence for a shot change in a video signal.
However, it cannot tell us the global structure of the video.
Random words from a transcript:
The words indicate a lot about the overall structure of the story.
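As a rough sketch of how color can signal a shot change, the code below builds a 512-bin color histogram per frame and compares consecutive frames. The notes only state that there are 512 bins; packing the top 3 bits of each 8-bit RGB channel into a 9-bit index is an assumption, as is the use of an L1 distance, and the function names are hypothetical.

```python
def color_histogram(pixels):
    """Normalized 512-bin histogram of (r, g, b) pixels with 8-bit channels."""
    hist = [0.0] * 512
    for r, g, b in pixels:
        # Assumed scheme: keep the top 3 bits per channel, concatenate to 9 bits.
        index = ((r >> 5) << 6) | ((g >> 5) << 3) | (b >> 5)
        hist[index] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Toy frames (invented): an all-red frame followed by an all-blue frame.
red_frame = color_histogram([(255, 0, 0)] * 10)
blue_frame = color_histogram([(0, 0, 255)] * 10)
jump = histogram_distance(red_frame, blue_frame)
```

A large distance between consecutive frames is local evidence for a cut, which fits the point above that color alone says little about the global structure.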
Segmenting Video
Test Material
CNN Headline News (30-minute TV show).
21st Century Jet (documentary).
Automatic speech recognition (ASR) is used to provide a transcript of the audio.
Segmenting Video
Scale Space
Convert the original signal into scale space.
In scale space, we analyze a signal with many different kernels.
Figure labels: With Low Pass Filter; Histogram
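A scale-space representation can be sketched by smoothing the same signal with Gaussian kernels of increasing width. This is an illustration under assumptions: the kernel is truncated at three standard deviations, the edges are handled by repeating the boundary sample, and the function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights, truncated at 3 standard deviations."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Low-pass filter a 1-D signal with a Gaussian of width sigma."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Clamp the index so the edges repeat the boundary sample.
            idx = min(max(t + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def scale_space(signal, sigmas):
    """Map each scale (sigma) to the signal smoothed at that scale."""
    return {sigma: smooth(signal, sigma) for sigma in sigmas}

# Toy signal (invented): a single step edge, analyzed at two scales.
step = [0.0] * 20 + [1.0] * 20
space = scale_space(step, [1.0, 4.0])
```

At the larger scale the step is spread over more samples, so only edges that survive heavy smoothing remain prominent.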
Segmenting Video
Combined Image and Audio Data
Combines color, words, and scale-space analysis. The result is a 20-dimensional vector function of time and scale.
Segmenting Video
Hierarchical Segmentation Results
Color and word autocorrelations for the Boeing 777 video
Segmenting Video
Hierarchical Segmentation Results
Grouping 4-8 sentences produces a larger semantic autocorrelation.
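The effect of grouping can be illustrated with a toy example. The sketch below is not the lecture's procedure: it assumes autocorrelation is measured as the mean cosine similarity between vectors a fixed lag apart, that groups are non-overlapping sums of consecutive sentence vectors, and the data are invented so that per-sentence noise cancels within a group.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_sentences(vectors, size):
    """Sum each run of `size` consecutive sentence vectors into one vector."""
    groups = []
    for start in range(0, len(vectors) - size + 1, size):
        chunk = vectors[start:start + size]
        groups.append([sum(col) for col in zip(*chunk)])
    return groups

def autocorrelation(vectors, lag=1):
    """Mean cosine similarity between vectors that are `lag` steps apart."""
    pairs = list(zip(vectors, vectors[lag:]))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

# Toy data (invented): a constant topic component plus alternating noise.
sentences = [[1.0, 1.0], [1.0, -1.0]] * 4
single = autocorrelation(sentences)
grouped = autocorrelation(group_sentences(sentences, 2))
```

Here the single-sentence vectors are uncorrelated at lag 1, while the grouped vectors line up on the shared topic component, so the grouped autocorrelation is larger.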
Segmenting Video
Intermediate Results
A scale-space segmentation algorithm produced a boundary map showing the edges in the signal.
Segmenting Video
Comparison with ground truth. Left: estimated result. Right: ground truth.
Segmenting Video
Shot Boundary Segmentation
Uses a commercial product designed by YesVideo.
Segmenting Video
Manual Segmentation result
Semantic Retrieval
MPESAR process
Semantic Retrieval
Acoustic Signal processing chain
Acoustic to Semantic Lookup
Semantic Retrieval
Testing
Retrieval Results
Histogram of true label ranks based on likelihoods from audio-to-semantic tests
Histogram of true label ranks based on likelihoods from semantic-to-acoustic tests
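A histogram of true label ranks can be computed as sketched below. The likelihood values and labels are invented for illustration, and the function names are hypothetical; the sketch only shows how the rank of the true label is obtained from a set of likelihoods, with rank 1 meaning the true label was the most likely candidate.

```python
from collections import Counter

def true_label_rank(likelihoods, true_label):
    """Rank (1 = most likely) of true_label among the candidate labels."""
    ordered = sorted(likelihoods, key=likelihoods.get, reverse=True)
    return ordered.index(true_label) + 1

def rank_histogram(tests):
    """Count how often the true label lands at each rank."""
    return Counter(true_label_rank(lik, label) for lik, label in tests)

# Toy test results (invented): per-test likelihoods and the true label.
tests = [
    ({"a": 0.7, "b": 0.2, "c": 0.1}, "a"),
    ({"a": 0.1, "b": 0.3, "c": 0.6}, "b"),
    ({"a": 0.5, "b": 0.4, "c": 0.1}, "a"),
]
hist = rank_histogram(tests)
```

A histogram concentrated at low ranks indicates that the model usually places the true label near the top of its ranking.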