Video Summarization Using Mutual Reinforcement Principle and Shot Arrangement Patterns
Lu Shi
Oct. 4, 2004

Outline
- Background
- Video semantics and annotation
- Mutual reinforcement
- Shot arrangement analysis
- Video skim selection
- Preliminary experiments

Background
- Why video summarization
  - Help the user quickly grasp the content of a video
- Video summary targets
  - Conciseness
  - Content coverage
  - Coherency
- Type
  - Static and dynamic

Background
- Two kinds of video summarization
  - Unconstrained
    - Generate a preview that only tries to cover all the content of the video, constrained only by the time limit L
    - Can be helped by the mutual reinforcement result
  - Constrained
    - The user may have a preference for specific content, such as a specific time range or a specific person

System overview
[Figure: system flowchart. Data boxes: raw video; video shots; semantic content description; video structural information; semantic video shot groups and shot importance values; key video shot patterns; final video skimming. Processing steps: video segmentation; semantic annotation; mutual reinforcement; shot arrangement pattern analysis; select and assemble.]

Video semantics
- Low-level features and high-level concepts: the semantic gap
- A summary based on low-level features cannot ensure the perceived quality
- Solution: obtain video semantic information by manual or semi-automatic annotation
- Usage:
  - Retrieval
  - Summary

Video semantics
- Semantic content template for a video shot
  - Who
  - Where
  - What action
  - What other
  - When
  - Dialog script
- Concept term and video shot description (user editable)

Video semantics
- Concept term and video shot description
  - Term: denotes an entity, e.g. “Joe”, “talking”, “in the bank”
  - Context: “who”, “what action”…
  - Shot description: the set of all concept terms related to the shot, $d = \{t_1, \ldots, t_n\}$
- Obtained by manual or semi-automatic video annotation
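
One way to picture the template and the shot description is as a small data structure. The sketch below is only an illustration: the class name `ShotAnnotation`, its fields, and the sample annotation are hypothetical and not taken from the slides; only the context names and the example terms come from the text.

```python
from dataclasses import dataclass, field

# Context names follow the semantic content template on the slide.
# Everything else (class, method and field names) is hypothetical.
CONTEXTS = ("who", "where", "what action", "what other", "when", "dialog script")


@dataclass
class ShotAnnotation:
    shot_id: int
    # Maps a context name to the set of concept terms annotated under it.
    terms_by_context: dict = field(default_factory=dict)

    def add(self, context, term):
        """Attach a concept term to this shot under one context of the template."""
        if context not in CONTEXTS:
            raise ValueError(f"unknown context: {context}")
        self.terms_by_context.setdefault(context, set()).add(term)

    def description(self):
        """Return the shot description: the set of all terms related to the shot."""
        result = set()
        for terms in self.terms_by_context.values():
            result |= terms
        return result


# Illustrative annotation using the example terms from the slide.
shot = ShotAnnotation(shot_id=1)
shot.add("who", "Joe")
shot.add("what action", "talking")
shot.add("where", "in the bank")
print(sorted(shot.description()))
```

Keeping the terms grouped by context makes it possible to run the later importance analysis separately for each context, e.g. only on the “who” terms.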

Video Edit Process
- Shoot a set of video shot groups with similar semantic content (takes)
- Select video shots from the takes, then arrange the video shots from different video shot groups to depict the story scene

Video summarization
- Video summarization can be viewed as an “inversion” of video editing
- Recover the semantic video shot groups, then select the important parts

Mutual Reinforcement
- Given the annotated video shots:
  - How do we measure the priority of a set of concept terms and a set of descriptions? Who is the most important person? Which shot is the most important one?
  - A more important description contains more important terms
  - A more important term is contained in more important descriptions
- Mutual reinforcement principle [1]

Mutual Reinforcement
- Let W be the weight matrix that describes the relationship between a set of terms $\{t_i\}$ and a set of shot descriptions $\{d_i\}$ (W can be defined in various ways, e.g. the number of occurrences of a term in a description)
- Let U and V be the vectors of importance values of the video shot description set $\{d_i\}$ and the concept term set $\{t_i\}$
- We have
  $U = \frac{1}{k_1} W V, \qquad V = \frac{1}{k_2} W^{T} U$
  where $k_1$ and $k_2$ are scaling constants
- U and V can be calculated by the SVD of W
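
Why the SVD solves this pair of equations can be seen by substituting one equation into the other. The short derivation below uses only the symbols from the slide; the singular value symbol $\sigma$ is notation introduced here for illustration.

```latex
\begin{aligned}
U &= \frac{1}{k_1} W V = \frac{1}{k_1 k_2}\, W W^{T} U, \\
V &= \frac{1}{k_2} W^{T} U = \frac{1}{k_1 k_2}\, W^{T} W V .
\end{aligned}
```

So U is an eigenvector of $W W^{T}$ and V is an eigenvector of $W^{T} W$, both with eigenvalue $k_1 k_2$. These are a left and a right singular vector of W with $\sigma^2 = k_1 k_2$; if U and V are both scaled to unit length, then $k_1 = k_2 = \sigma$.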

Mutual Reinforcement
- For each semantic context:
  - We choose the singular vectors corresponding to W’s largest singular value as the importance vectors for concept terms and shot descriptions
  - Since W is non-negative, the first singular vector will be non-negative
  - The importance score vector can be used to group semantically similar video shots
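
The principal singular vectors can also be approximated without a full SVD by alternating the two update equations and rescaling, which is ordinary power iteration. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, W is built from term occurrence counts (one of the definitions the slides mention), and the four “who” annotations are invented sample data.

```python
import math


def build_weight_matrix(descriptions, terms):
    """W[i][j] = number of occurrences of terms[j] in descriptions[i]."""
    return [[desc.count(t) for t in terms] for desc in descriptions]


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mutual_reinforcement(W, iterations=100):
    """Alternate U <- W V and V <- W^T U, rescaling to unit length each time."""
    n_desc, n_terms = len(W), len(W[0])
    V = normalize([1.0] * n_terms)  # positive start keeps every iterate non-negative
    U = [0.0] * n_desc
    for _ in range(iterations):
        U = normalize([sum(W[i][j] * V[j] for j in range(n_terms))
                       for i in range(n_desc)])
        V = normalize([sum(W[i][j] * U[i] for i in range(n_desc))
                       for j in range(n_terms)])
    return U, V


# Invented "who" annotations for four shots (lists, so repeats could be counted).
descriptions = [["Joe", "Ann"], ["Joe"], ["Joe", "Bob"], ["Ann"]]
terms = ["Joe", "Ann", "Bob"]

W = build_weight_matrix(descriptions, terms)
U, V = mutual_reinforcement(W)

for term, score in zip(terms, V):
    print(f"{term}: {score:.3f}")
for index, score in enumerate(U):
    print(f"shot {index}: {score:.3f}")
```

In this toy data “Joe” appears in the most shots and receives the highest term score, and the first shot, which contains the two highest-ranked people, receives the highest description score. Because W is non-negative and the iteration starts from a positive vector, all scores stay non-negative.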

Experiments
- Priority calculation on one video scene
  - Based on context “who”

Experiments
- Priority calculation
  - Based on context “what action”
[Figure: chart of priority values, single data series; vertical axis from 0 to 0.3, horizontal axis indexed from 1 to 73]

Shot arrangement patterns
- The way the director arranges the video shots conveys his intention
- Minimal content redundancy and visual coherence
- Semantic video shot group labels form a string
  - K-Non-Repetitive Strings (k-nrs)
  - String coverage
    - {3124} covers {312, 124, 31, 12, 24, 3, 1, 2, 4}
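
The coverage example can be reproduced with a few lines of code. This sketch assumes, based only on that one example, that a string covers every contiguous substring shorter than itself, and that each shot group label is a single character; the function names are hypothetical.

```python
def covered_strings(s):
    """Return every contiguous substring of s that is shorter than s itself."""
    n = len(s)
    return {s[i:j] for i in range(n) for j in range(i + 1, n + 1) if j - i < n}


def covers(a, b):
    """True if string a covers string b under the assumption stated above."""
    return b in covered_strings(a)


# Reproduces the slide's example for the label string 3124.
print(sorted(covered_strings("3124"), key=lambda x: (-len(x), x)))
print(covers("3124", "12"), covers("3124", "32"))
```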

Shot arrangement patterns
- Several detected nrs strings

Video skim selection
- do
  - Select the most important k-nrs string into the skim shot set
  - Remove those nrs strings covered by the selected string
- until the target skim length is reached
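
The loop above can be sketched as a greedy selection. Several details are assumptions made only for illustration and are not stated in the slides: the importance of a string is taken as the sum of the importance values of its group labels, each label contributes one shot of known duration, and a selected string covers its contiguous substrings. All names and the sample numbers are hypothetical.

```python
def covered_by(selected, candidate):
    """Assumed coverage test: candidate is a contiguous substring of selected."""
    return candidate in selected


def select_skim(nrs_strings, importance, shot_length, target_length):
    """Greedy loop: take the most important remaining string, drop what it covers."""
    remaining = set(nrs_strings)
    skim_shots = []  # group labels chosen for the skim, in selection order
    total = 0.0      # accumulated skim duration
    while remaining and total < target_length:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda s: sum(importance[c] for c in s))
        # Removes the selected string itself and every string it covers.
        remaining = {s for s in remaining if not covered_by(best, s)}
        for label in best:
            if label not in skim_shots:
                skim_shots.append(label)
                total += shot_length[label]
    return skim_shots, total


# Invented sample data: five shot groups, each represented by a 2-second shot.
strings = ["3124", "312", "12", "45", "5"]
importance = {"1": 0.3, "2": 0.2, "3": 0.25, "4": 0.1, "5": 0.05}
shot_length = {label: 2.0 for label in importance}

print(select_skim(strings, importance, shot_length, target_length=8.0))
```

Because whole strings are added at once, this sketch can overshoot the target length by part of the last string; trimming the final string is one possible refinement.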

Experiments
- We conducted a subjective test
- Compared with the previous graph-based algorithm
- Achieved better coherency

Future work
- More efficient way to annotate video shots
- Augment the semantic template
- Personalized video summary