© Copyright 2005 Michael Smith. AVA Media, Copyright © 2001-2003. Digital Media Classification and Asset Management. Michael Smith. IS246 Spring 2007.


AVA Media

Digital Media Classification and Asset Management

Michael Smith

IS246 Spring 2007

History

1990 – 1998 Technical Innovation
• Digital Libraries and Automated Video Editing
• Multimodal Content Analysis

1996 – 2002 Attempts at Commercialization
• Corporate Spin Offs lead to Mergers

2000 – 2004 Broadband for Video
• Internet Search and Enterprise Asset Management

2005 –
• Mobile Media, Social Media and Personalization

Image and Audio Features
• Camera and Object Motion
• Text, Face and Object Detection

Video Summarization
• Hierarchical Rules for Video Summaries
• Combination of Text, Audio and Image Features

Goal: Automatic Video Characterization

[Figure: an example video annotated shot by shot. Row labels as extracted: Scene, Cuts, Camera, Objects, Action, Captions, Scenery. Annotation values shown: Yellowstone, Static, Adult Female, Head Motion, [Logo], Indoor, Static, Animal, Left Motion, Yellowstone, Outdoor, Zoom, Two adults, [Logo], Indoor.]

Static Filmstrip Abstraction

Active “Video Skim” Generation

Techniques Underlying Video Metadata

• Image processing
  • Detection of text overlaid on video
  • Detection of faces
  • Identification of camera and object motion
  • Breaking video into component shots
  • Detecting corpus-specific categories, e.g., anchorperson shots and weather map shots

• Speech recognition
  • Text extraction and alignment

• Natural language processing
  • Determining best text matches for a given query
  • Identifying places, organizations, people
  • Producing phrase summaries
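As an illustration of "breaking video into component shots", here is a minimal sketch that flags a cut whenever the gray-level histogram changes sharply between consecutive frames. This is a common textbook approach and not necessarily the method used in the work described; the function names, the 8-bin histogram and the 0.5 threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized histogram of 8-bit gray levels for one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [count / total for count in counts]


def detect_shot_changes(frames: Sequence[Sequence[int]],
                        threshold: float = 0.5,
                        bins: int = 8) -> List[int]:
    """Return each index i where a cut is assumed between frame i-1 and frame i."""
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        hist = gray_histogram(frame, bins)
        if previous is not None:
            # L1 distance between two normalized histograms lies in [0, 2].
            distance = sum(abs(a - b) for a, b in zip(hist, previous))
            if distance > threshold:
                cuts.append(index)
        previous = hist
    return cuts


# Toy input: three dark frames followed by two bright frames (hypothetical data).
dark = [20] * 64
bright = [220] * 64
frames = [dark, dark, dark, bright, bright]
print(detect_shot_changes(frames))  # prints [3]
```

A fixed threshold like this tends to miss gradual transitions such as fades and dissolves, which is one reason practical systems combine several cues.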

Combined Technologies Integration

Text Detection

Camera Motion

Face Detection

Shot Changes

Word Relevance

Audio Level

Text Detection

Text and Face Detection

“Name-It” Face/Name Association

Video Transcript

…said President Clinton. Al Gore presented his policies….Gore stated…. In a gala affair, Clinton addressed….

Face/Name Association (Co-occurrence evaluation)

Face Extraction

Name Extraction

Who is Gore?

Clinton
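The co-occurrence idea can be sketched as follows: count how often each face appears in the same video segment as each name in the transcript, then pick the name with the highest count. This is a deliberately simplified illustration, not the Name-It system itself; the face identifiers, the segment data and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Segment = Tuple[Set[str], Set[str]]  # (faces seen, names heard) for one segment


def cooccurrence_scores(segments: Iterable[Segment]) -> Counter:
    """Count how often each (face, name) pair shares a segment."""
    scores = Counter()
    for faces, names in segments:
        for face in faces:
            for name in names:
                scores[(face, name)] += 1
    return scores


def best_name_for_face(scores: Counter, face: str) -> Optional[str]:
    """Return the name that co-occurs most often with the given face."""
    candidates = {name: n for (f, name), n in scores.items() if f == face}
    if not candidates:
        return None
    # Sorting first makes ties resolve alphabetically instead of arbitrarily.
    return max(sorted(candidates), key=candidates.get)


# Hypothetical segments: face clusters from face extraction,
# names from name extraction on the transcript.
segments = [
    ({"face_A"}, {"Clinton"}),
    ({"face_B"}, {"Gore", "Clinton"}),
    ({"face_B"}, {"Gore"}),
    ({"face_A"}, {"Clinton", "Gore"}),
]
scores = cooccurrence_scores(segments)
print(best_name_for_face(scores, "face_A"))  # prints Clinton
print(best_name_for_face(scores, "face_B"))  # prints Gore
```

The reverse lookup ("Who is Gore?") works the same way with the roles of face and name swapped.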

Camera and Motion Detection

Pan

Right object motion (not pan left)
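The distinction in this figure can be sketched with block motion vectors: if nearly every block moves the same way, the camera is probably panning; if only a few blocks move, an object is probably moving in front of a static background. The sketch below handles horizontal motion only, and the function name and both thresholds are assumptions for illustration.

```python
from typing import List, Tuple

Vector = Tuple[float, float]  # (dx, dy) displacement of one image block


def classify_motion(vectors: List[Vector],
                    moving_eps: float = 0.5,
                    pan_fraction: float = 0.8) -> str:
    """Label a frame as static, camera pan, or object motion (horizontal only)."""
    moving = [(dx, dy) for dx, dy in vectors if abs(dx) + abs(dy) > moving_eps]
    if not moving:
        return "static"
    mean_dx = sum(dx for dx, _ in moving) / len(moving)
    if len(moving) / len(vectors) >= pan_fraction:
        # Content shifting right across the whole frame implies the camera turned left.
        return "pan left" if mean_dx > 0 else "pan right"
    # Only part of the frame moves, so attribute the motion to an object.
    return "object motion right" if mean_dx > 0 else "object motion left"


# Hypothetical 16-block frames.
object_frame = [(0.0, 0.0)] * 12 + [(3.0, 0.0)] * 4
pan_frame = [(3.0, 0.0)] * 16
print(classify_motion(object_frame))  # prints object motion right
print(classify_motion(pan_frame))     # prints pan left
```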

MPEG IBP Frames

MPEG and Editing

• Limitations in Frame and Cut Accuracy

• Most editors use raw or lossless compression

• Thomson to release JPEG 2000 based I-frame only compression
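The cut-accuracy limitation follows from the frame types: only I-frames decode on their own, so a cut made without re-encoding has to land on one. The sketch below assumes a simplified closed GOP in display order with 12 frames per group and two B-frames between reference frames; both parameters and both function names are illustrative assumptions.

```python
def frame_types(num_frames: int, gop_size: int = 12, b_between: int = 2) -> str:
    """Build a display-order frame-type string such as 'IBBPBBPBBPBB...'."""
    types = []
    for index in range(num_frames):
        position = index % gop_size
        if position == 0:
            types.append("I")
        elif position % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


def snap_cut_to_i_frame(types: str, requested: int) -> int:
    """Return the nearest I-frame at or before the requested cut frame."""
    for index in range(requested, -1, -1):
        if types[index] == "I":
            return index
    raise ValueError("no I-frame at or before the requested frame")


types = frame_types(15)
print(types)                           # prints IBBPBBPBBPBBIBB
print(snap_cut_to_i_frame(types, 10))  # prints 0
print(snap_cut_to_i_frame(types, 13))  # prints 12
```

With I-frame only compression every frame is a valid cut point, so the requested frame and the actual cut always coincide.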


MPEG 7 Ontology

http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm#E11E42

Useful Video Format Links

• http://www.ultimatewebdesigning.com/articles/formats.html
• http://www.theasc.com/news/index.html
• http://users.tkk.fi/~iisakkil/videoformats.html

Speech Recognition Functions

• Generates transcript to enable text-based retrieval from spoken language documents
• Improves text synchronization to audio/video in presence of scripts
• Provides speech interface to digital library
• Supplies necessary information for library segmentation and multimedia abstractions
• Modern systems rely more on single phoneme detection than double or triple phoneme pairs

Speech Recognition Accuracy

[Figure: Word Error Rate (axis 0 to 90) by source type. Category labels as extracted: Commercial, Benchmark, Lab, TV Studio, Dialog, Broadcast News.]
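For reference, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. A minimal sketch, with a hypothetical function name and example sentences:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Two of five reference words are misrecognized.
wer = word_error_rate("al gore presented his policies",
                      "al gore presented is policy")
print(wer)  # prints 0.4
```

Because insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.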

Information Retrieval Recall vs. Speech Recognition Accuracy

[Figure: plot of Relative Recall (% of Text IR, axis 40 to 100) against Word Error Rate (axis 0 to 80).]

Early Lessons Learned

• Titles frequently used, should include length and production date
• Results and title placement affect usage
• Greater quantity of video was desired
• Storyboards (filmstrips) used infrequently

Yahoo Search Example

• Search bars placed at different locations for each Browser
• Placement of Search bar improves usage
• Centered Placement preferred over left or right placement in portal applications

Empirical Study: Skims

[Figure: Skim Audio and Skim Image selections for the DFL, DFS and NEW skims.]

DFL - “default” long skim
DFS - default short skim
NEW - selective skim
RND - same audio as NEW but with unsynchronized video

Skim Study Results

[Figure panel: Images Correct (out of 10), axis 6 to 10, for RND, DFS, DFL, NEW and FULL. Subjects asked if image was in the video just seen.]

[Figure panel: Phrases Correct (out of 15), axis 7.5 to 15, for RND, DFS, DFL, NEW and FULL. Subjects asked if text summarizes info. that would be in full source video.]

© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann Carnegie Mellon

Skim Study QUIS Results

[Figure: QUIS ratings on a 1 to 9 scale for RND, DFS, DFL, NEW and FULL, on three scales: terrible-wonderful, frustrating-satisfying, dull-stimulating. The high end is wonderful, satisfying, stimulating; the low end is terrible, frustrating, dull.]

Skims: Preliminary Findings

• Real benefit for skims appears to be for comprehension rather than navigation
• For PBS documentaries, information in audio track is very important
• Empirical study conducted in September 1997 to determine advantages of skims over subsampled video, and synchronization requirements for audio and visuals

Adding Imagery to Visualizations

• Query-based thumbnail images added to timeline and map interfaces
• Extend concept of “highest scoring” to represent country, or a point in time

How Much Text, and Does Layout Matter?

NoText

BriefByRow

Brief

AllByRow

All

Informedia Research Timeline

Applications

Corporate Spin Offs
• Virage -> Autonomy
• Virage -> Pictron -> Yahoo?
• Excalibur -> Convera (Enterprise Search)
• ISLIP -> MediaSite -> Sonic Foundry

Media Asset Management
• Context Media
• Semagix
• Blue Order

Systems That Work

Image Matching
• Evolution Robotics
• Neven Vision

Internet Video
• Google
• Yahoo
• Blinkx

ViPR™ Algorithm

Database building
• SIFT feature extraction
• Add SIFT features to database

Matching a new image
• SIFT feature extraction
• Feature pair-wise matching
• Clustering by voting
• Pose refinement

Model

Match
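The database-building and matching steps can be sketched in a few lines. This is not the ViPR implementation: real SIFT descriptors are 128-dimensional and need an image library to extract, so the sketch uses toy 2-D descriptors, replaces pose-space clustering with a simple vote count per model, and omits pose refinement. The nearest-neighbour ratio test is the one commonly used with SIFT features; the model names, descriptor values and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Descriptor = Sequence[float]


def distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two feature descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_database(models: Dict[str, List[Descriptor]]) -> List[Tuple[Descriptor, str]]:
    """Flatten per-model descriptors into (descriptor, model name) entries."""
    return [(desc, name) for name, descs in models.items() for desc in descs]


def match_image(query: List[Descriptor],
                database: List[Tuple[Descriptor, str]],
                ratio: float = 0.8) -> Counter:
    """Vote for the model owning each query feature's nearest database feature."""
    votes = Counter()
    for q in query:
        ranked = sorted(database, key=lambda entry: distance(q, entry[0]))
        if len(ranked) < 2:
            continue
        best, second = ranked[0], ranked[1]
        # Ratio test: accept only matches clearly better than the runner-up.
        if distance(q, best[0]) < ratio * distance(q, second[0]):
            votes[best[1]] += 1
    return votes


# Hypothetical models and query features.
models = {"cereal_box": [[0, 0], [10, 0]], "soda_can": [[0, 10], [10, 10]]}
database = build_database(models)
query = [[0.5, 0.2], [9.8, 0.1], [5, 5]]  # the last feature is ambiguous
votes = match_image(query, database)
print(votes.most_common(1))  # prints [('cereal_box', 2)]
```

The ambiguous feature is equally far from every database entry, so the ratio test rejects it and it casts no vote.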

Semantic Music Correlation

Predexis
• Automated Statistical Features (pitch, frequency, etc.)

Siren Systems
• Pseudo Automated Features (Statistical and User Genre)

Savage Beast
• Manual Feature Set (300 – 400 features per Genre)

Semantic Video Correlation

Commercial
• Netflix, Amazon, Blockbuster, YouTube

Social
• YouTube, Cuts, StumbleVideo

Research
• Machine Learning on User and Commercial MetaData
• Video Buzz Tracking and Usage
• Visual Pattern Recognition?

Comparison of Video Buzz Sites

Copyright material removed

See Splashcast Blog

What’s next

Content
• Media Remixing – Video Remixing
• User-Generated Content

Monetization
• The Legal Aspects of New Media
• Digital Advertising

Emerging Technology
• Mobility
• High Def and Super High Def
• Virtual Environments and Immersive Systems
• Previsualization,

2007 Advertising Market Projection

Source: expand-March 2007

Online            $19 Billion
Radio             $21 Billion
Other             $43 Billion
TV                $71 Billion
Print             $102 Billion
Direct Marketing  $478 Billion
Total             $734 Billion
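As a quick check on these figures, the six categories do add up to the stated total, and each category's share follows directly. The sketch assumes every figure is in billions of US dollars; the variable names are illustrative.

```python
# Projected 2007 spend per category, in billions of US dollars (from the slide).
projection = {
    "Online": 19,
    "Radio": 21,
    "Other": 43,
    "TV": 71,
    "Print": 102,
    "Direct Marketing": 478,
}

total = sum(projection.values())
# Share of the total per category, as a percentage rounded to one decimal.
shares = {name: round(100 * value / total, 1) for name, value in projection.items()}

print(total)  # prints 734
for name, share in shares.items():
    print(f"{name}: {share}%")
```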

Emerging Technology

High Def and Super High Def and Photo Realism

Emerging Technology

Virtual Environments and Immersive Systems

• Previsualization
• Synthetic Humans

3D Visualization

• 3D Previsualization
  Pixel Liberation Front www.thefront.com

• 3D morphable model face animation
  http://www.kyb.tuebingen.mpg.de/bu/people/volker/

Emerging Technology

• Sports
  • Ad insertion
  • Logging - http://www.dixonsports.com/images/liveevent/diagrams.html

• Healthcare
  • Patient Monitoring
  • Remote Diagnostics

• Security and Surveillance

• Forensics and DRM
  • Cameras as Sensors
  • Watermarking

Credits

Many Informedia Project and CMU research community members contributed to this work; a partial list appears here:

Project Director: Howard Wactlar

User Interface: Mike Christel, Chang Huang, Adrienne Warmack, Dave Winkler

Image Processing: Takeo Kanade, Norm Papernick, Toshio Sato, Henry Schneiderman, Michael Smith

Speech and Language Processing: Alex Hauptmann, Ricky Houghton, Rong Jin, Raj Reddy, Michael Witbrock

Informedia Library Essentials: Bob Baron, Bruce Cardwell, Colleen Everett, Mark Hoy, Melissa Keaton, Bryan Maher, Craig Marcus