Video Summarization Using Mutual Reinforcement Principle and Shot Arrangement Patterns

Lu Shi

Oct. 4, 2004

Outline

- Background
- Video semantics and annotation
- Mutual reinforcement
- Shot arrangement analysis
- Video skim selection
- Preliminary experiments

Background

- Why video summarization?
  - Help the user quickly grasp the content of a video
- Video summary targets:
  - Conciseness
  - Content coverage
  - Coherency
- Types: static and dynamic

Background

- Two kinds of video summarization:
  - Unconstrained: generate a preview that tries only to cover all the content of the video, constrained only by the time limit L. This can be helped by the mutual reinforcement result.
  - Constrained: the user may prefer specific content, such as a specific time range or a specific person.

Background

- 4-level hierarchical video structure

System overview

[Figure: system overview. Processing steps: video segmentation, semantic annotation, mutual reinforcement, shot arrangement pattern analysis, select and assemble. Data: raw video, video shots, semantic content description, video structural information, semantic video shot groups and shot importance values, key video shot patterns, final video skimming.]

Video semantics

- Low-level features vs. high-level concepts: the semantic gap
- A summary based on low-level features cannot ensure the perceived quality
- Solution: obtain video semantic information by manual or semi-automatic annotation
- Usage:
  - Retrieval
  - Summary

Video semantics

- Semantic content template for a video shot:
  - Who
  - Where
  - What action
  - What other
  - When
  - Dialog script
- Concept terms and video shot descriptions (user editable)

Video semantics

- Concept terms and video shot descriptions
  - Term: denotes an entity, e.g. “Joe”, “talking”, “in the bank”
  - Context: “who”, “what action”, …
  - Shot description: the set of all concept terms related to the shot, $d = \{t_1, \ldots, t_n\}$
  - Obtained by manual or semi-automatic video annotation
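The template and the shot description above could be represented roughly as follows. This is a minimal Python sketch. The class name `ShotAnnotation` and its field names are hypothetical, and so is the choice to keep the dialog script as free text instead of splitting it into concept terms. None of these come from the slides.

```python
from dataclasses import dataclass, field

# Hypothetical names for the template fields that hold concept terms.
# The dialog script is kept as free text, not split into terms (assumption).
TERM_CONTEXTS = ("who", "where", "what_action", "what_other", "when")


@dataclass
class ShotAnnotation:
    """One annotated video shot, following the semantic content template."""
    shot_id: int
    who: set = field(default_factory=set)
    where: set = field(default_factory=set)
    what_action: set = field(default_factory=set)
    what_other: set = field(default_factory=set)
    when: set = field(default_factory=set)
    dialog_script: str = ""

    def description(self, context=None):
        """Shot description d = {t_1, ..., t_n}.

        With a context name (e.g. "who"), only that context's terms are
        returned; with None, the terms of every context are merged.
        """
        contexts = TERM_CONTEXTS if context is None else (context,)
        terms = set()
        for ctx in contexts:
            terms |= getattr(self, ctx)
        return frozenset(terms)


shot = ShotAnnotation(1, who={"Joe"}, where={"in the bank"}, what_action={"talking"})
print(sorted(shot.description()))       # ['Joe', 'in the bank', 'talking']
print(sorted(shot.description("who")))  # ['Joe']
```

Restricting a description to one context is what makes a separate term/description weight matrix per semantic context possible.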

Video shot annotation

- Annotation interface

Video Edit Process

- Shoot a set of video shot groups with similar semantic content (takes)
- Select video shots from the takes, then arrange the shots from different video shot groups to depict the story scene

Video summarization

- Recover the semantic video shot groups
- Video summarization can be viewed as an “inversion” of video editing: after recovering the groups, select the important parts

Mutual Reinforcement

- Given the annotated video shots, how can we measure the priority of a set of concept terms and a set of descriptions? Who is the most important person? Which shot is the most important one?
- A more important description contains more important terms
- A more important term is contained in more important descriptions
- Mutual reinforcement principle [1]

Mutual Reinforcement

- Let W be the weight matrix describing the relationship between the concept terms $\{t_i\}$ and the shot descriptions $\{d_i\}$, with one row per description and one column per term (W can have various definitions, e.g. the number of occurrences of a term in a description)
- Let U and V be the importance vectors of the video shot description set and the concept term set, respectively
- We have

$$U = \frac{1}{k_1} W V, \qquad V = \frac{1}{k_2} W^{T} U$$

- U and V can be calculated by the SVD of W
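Substituting one relation into the other shows why the SVD yields U and V (assuming $k_1, k_2 > 0$):

```latex
U = \frac{1}{k_1} W V, \qquad V = \frac{1}{k_2} W^{T} U
\;\Longrightarrow\;
W W^{T} U = k_1 k_2 \, U, \qquad W^{T} W V = k_1 k_2 \, V
```

U and V are therefore eigenvectors of $W W^{T}$ and $W^{T} W$ with the same eigenvalue $k_1 k_2$. In other words, they are a left and right singular-vector pair of W with singular value $\sigma = \sqrt{k_1 k_2}$. Taking $k_1 = k_2 = \sigma$ recovers the usual SVD relations $W V = \sigma U$ and $W^{T} U = \sigma V$.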

Mutual Reinforcement

- For each semantic context:
  - Choose the singular vectors corresponding to the largest singular value of W as the importance vectors for the concept terms and the shot descriptions
- Since W is non-negative, $W^{T} W$ is non-negative, so by the Perron-Frobenius theorem the first singular vectors can be chosen to be non-negative
- The importance score vector can be used to group semantically similar video shots
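For a single semantic context, the importance vectors could be computed as shown below. This sketch swaps the full SVD for plain power iteration. It alternates U = WV and V = WᵀU with renormalization, and for a non-negative W whose largest singular value is simple it converges to the same leading singular vectors. The function names are hypothetical. The `group_by_score` rule (start a new group wherever the sorted scores jump by more than a small gap) is also an assumption, because the slides do not say how shots are grouped from the scores.

```python
import math


def _normalize(x):
    """Scale x to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def mutual_reinforcement(W, iterations=200, tol=1e-12):
    """Leading singular vectors of a non-negative matrix W by power iteration.

    W[i][j] is the weight of concept term t_j in shot description d_i.
    Returns (U, V, sigma): description scores, term scores, singular value.
    """
    n_desc, n_terms = len(W), len(W[0])
    V = _normalize([1.0] * n_terms)  # a positive start keeps every iterate non-negative
    U, sigma = [0.0] * n_desc, 0.0
    for _ in range(iterations):
        U = _normalize([sum(W[i][j] * V[j] for j in range(n_terms)) for i in range(n_desc)])
        WtU = [sum(W[i][j] * U[i] for i in range(n_desc)) for j in range(n_terms)]
        sigma = math.sqrt(sum(v * v for v in WtU))  # ||W^T U|| estimates the singular value
        V_new = _normalize(WtU)
        converged = max(abs(a - b) for a, b in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    return U, V, sigma


def group_by_score(scores, gap=1e-6):
    """Hypothetical grouping rule: sort shots by score and open a new group
    whenever the score rises by more than `gap` over the previous shot."""
    groups = []
    for i in sorted(range(len(scores)), key=lambda i: scores[i]):
        if groups and scores[i] - scores[groups[-1][-1]] <= gap:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


# Context "who": shots 0 and 3 show Joe, shot 1 shows Joe and Terry, shot 2 shows Terry.
W = [[1, 0], [1, 1], [0, 1], [1, 0]]  # columns: "Joe", "Terry"
U, V, sigma = mutual_reinforcement(W)
print(group_by_score(U))  # [[2], [0, 3], [1]]
```

In this toy example, "Joe" gets the higher term score because he appears in more descriptions. The shot showing both people gets the highest description score. Shots with identical descriptions get identical scores and fall into the same group.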

Experiments

- Priority calculation on one video scene, based on context “who”

Experiments

- Shot groups:
  - Joe
  - Joe and Terry
  - Terry
  - Background people

Experiments

- Priority calculation based on context “what action”

[Figure: priority values for context “what action”; x-axis: index 1 to 73, y-axis: 0 to 0.3]

Experiments

- Shot groups:
  - Fight
  - Quarrel
  - Background

Shot arrangement patterns

- The way the director arranges the video shots conveys his intention
- Minimal content redundancy and visual coherence
- The semantic video shot group labels form a string
- K-non-repetitive strings (k-nrs)
- String coverage: {3124} covers {312, 124, 31, 12, 24, 3, 1, 2, 4}
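Here is a minimal sketch of the two string notions on this slide. It assumes that each semantic shot group label is a single character. It also reads a k-nrs as k consecutive shots whose group labels are all distinct, which is our interpretation of the name. The helper names are hypothetical. `covers` reproduces the coverage example above.

```python
def is_non_repetitive(s):
    """True if no shot-group label occurs more than once in s."""
    return len(set(s)) == len(s)


def find_k_nrs(labels, k):
    """Return (start, substring) for every window of k consecutive labels
    in which no label repeats."""
    return [(i, labels[i:i + k]) for i in range(len(labels) - k + 1)
            if is_non_repetitive(labels[i:i + k])]


def covers(s):
    """The strings covered by s: all of its proper, non-empty substrings."""
    subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return subs - {s}


print(find_k_nrs("1213124", 3))  # [(1, '213'), (3, '312'), (4, '124')]
print(covers("3124") == {"312", "124", "31", "12", "24", "3", "1", "2", "4"})  # True
```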

Shot arrangement patterns

- Several detected nrs strings

Video skim selection

do
  Select the most important k-nrs string into the skim shot set
  Remove the nrs strings covered by the selected string
until the target skim length is reached
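The do/until loop above can be sketched in Python as follows. Everything beyond the loop itself is an assumption:

- the function name `select_skim`;
- single-character group labels;
- treating every non-repetitive window of 1 to k consecutive shots as a candidate nrs string;
- scoring a string by the sum of its shots' importance values (for instance, the mutual reinforcement scores).

The slides do not specify how the importance of a string is computed.

```python
def select_skim(labels, importance, durations, k, target_length):
    """Greedy skim selection sketch.

    labels[i]: shot-group label of shot i (one character per group)
    importance[i]: importance value of shot i
    durations[i]: length of shot i
    A candidate's importance is the sum of its shots' importance (assumption).
    """
    candidates = [(start, n) for n in range(1, k + 1)
                  for start in range(len(labels) - n + 1)
                  if len(set(labels[start:start + n])) == n]
    skim, length = set(), 0
    while candidates and length < target_length:
        start, n = max(candidates, key=lambda c: sum(importance[c[0]:c[0] + c[1]]))
        for shot in range(start, start + n):
            if shot not in skim:
                skim.add(shot)
                length += durations[shot]
        picked = labels[start:start + n]
        # Drop the pick itself and every candidate whose label string it covers.
        candidates = [(s, m) for (s, m) in candidates if labels[s:s + m] not in picked]
    return sorted(skim)


print(select_skim("12312", [1, 5, 4, 2, 1], [1] * 5, k=3, target_length=3))  # [1, 2, 3]
```

With a target length of 5, the same input selects all five shots. After “231” is picked, the covered strings (such as “23”, “31”, and the single labels) are removed, and the loop then picks “123” and “312”.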

Experiments

- We conducted a subjective test
- Compared with the previous graph-based algorithm, the proposed method achieves better coherency

Future work

- A more efficient way to annotate video shots
- Augment the semantic template
- Personalized video summary

Q & A

Thank you!