Posted on 19-Dec-2015
© Copyright 2005 Michael Smith 1
AVA Media
Copyright © 2001-2003
Digital Media Classification and Asset Management
Michael Smith
IS246 Spring 2007
History
1990 – 1998 Technical Innovation
• Digital Libraries and Automated Video Editing
• Multimodal Content Analysis
1996 – 2002 Attempts at Commercialization
• Corporate Spin-offs Lead to Mergers
2000 – 2004 Broadband for Video
• Internet Search and Enterprise Asset Management
2005 –
• Mobile Media, Social Media and Personalization
Image and Audio Features
• Camera and Object Motion
• Text, Face and Object Detection
Video Summarization
• Hierarchical Rules for Video Summaries
• Combination of Text, Audio and Image Features
Goal: Automatic Video Characterization
[Figure: example characterization of video shots along Scene Cuts, Camera, Objects, Action, Captions and Scenery, with values such as: Yellowstone; Static, Adult Female, Head Motion, [Logo], Indoor; Static, Animal, Left Motion, Yellowstone, Outdoor; Zoom, Two adults, [Logo], Indoor]
Static Filmstrip Abstraction
Active “Video Skim” Generation
Techniques Underlying Video Metadata
• Image processing
  • Detection of text overlaid on video
  • Detection of faces
  • Identification of camera and object motion
  • Breaking video into component shots
  • Detecting corpus-specific categories, e.g., anchorperson shots and weather map shots
• Speech recognition
  • Text extraction and alignment
• Natural language processing
  • Determining best text matches for a given query
  • Identifying places, organizations, people
  • Producing phrase summaries
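Of the image-processing steps listed, breaking video into component shots is the simplest to illustrate. The sketch below is a classic histogram-difference cut detector, not necessarily the method used in this work; it assumes grayscale frames given as flat lists of 0-255 values, and the bin count and threshold are hypothetical parameters.

```python
from typing import List, Sequence

def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit gray levels for one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]

def detect_cuts(frames: Sequence[Sequence[int]],
                threshold: float = 0.5, bins: int = 16) -> List[int]:
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    cuts = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        # Half the L1 distance lies in [0, 1]: 0 = identical, 1 = disjoint histograms.
        distance = 0.5 * sum(abs(a - b) for a, b in zip(prev, cur))
        if distance > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

Gradual transitions (fades, dissolves) spread the change over many frames and usually need a cumulative or two-threshold variant of this test.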
Combined Technologies Integration
Text Detection
Camera Motion
Face Detection
Shot Changes
Word Relevance
Audio Level
Text Detection
Text and Face Detection
“Name-It” Face/Name Association
Video Transcript
…said President Clinton. Al Gore presented his policies….Gore stated…. In a gala affair, Clinton addressed….
Face/Name Association (Co-occurrence evaluation)
Face Extraction
Name Extraction
Who is Gore?
Clinton
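The co-occurrence evaluation step pairs faces extracted from video with names extracted from the transcript. A minimal sketch, assuming each segment has already been reduced to a set of face-track ids and a set of nearby names (the ids are hypothetical); it uses a simple cosine-normalized co-occurrence count, not the original Name-It scoring.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, Iterable, Set, Tuple

# One video segment: (face-track ids seen on screen, names found in the nearby transcript).
Segment = Tuple[Set[str], Set[str]]

def cooccurrence_scores(segments: Iterable[Segment]) -> Dict[Tuple[str, str], float]:
    """Score every (face, name) pair by cosine-normalized co-occurrence."""
    face_counts: Counter = Counter()
    name_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for faces, names in segments:
        face_counts.update(faces)
        name_counts.update(names)
        pair_counts.update(product(faces, names))
    # Dividing by sqrt(count(face) * count(name)) keeps frequent names from dominating.
    return {
        (face, name): n / sqrt(face_counts[face] * name_counts[name])
        for (face, name), n in pair_counts.items()
    }

def best_name(face: str, scores: Dict[Tuple[str, str], float]) -> str:
    """Return the highest-scoring name for one face track."""
    candidates = [(score, name) for (f, name), score in scores.items() if f == face]
    return max(candidates)[1]
```

With the transcript above, a face that appears whenever "Gore" is mentioned, but only sometimes with "Clinton", ends up labeled "Gore".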
Camera and Motion Detection
Pan
Right object motion (not pan left)
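The slide's point is that an object moving right can look like the camera panning left. One way to separate the two, sketched below under stated assumptions, is to treat the median of per-block motion vectors (for example, MPEG macroblock vectors) as the camera's global motion and call it a pan only when most blocks agree with it. The thresholds are hypothetical, and only horizontal pans are labeled.

```python
from statistics import median
from typing import Sequence, Tuple

Vector = Tuple[float, float]  # (dx, dy) displacement of one block between frames, in pixels

def classify_motion(vectors: Sequence[Vector], min_speed: float = 1.0,
                    agreement: float = 0.6) -> str:
    """Label a frame pair as 'pan left', 'pan right', 'object motion' or 'static'."""
    # The median vector is a robust estimate of global (camera-induced) motion.
    gx = median(v[0] for v in vectors)
    gy = median(v[1] for v in vectors)
    if abs(gx) >= min_speed:
        # Count blocks that move consistently with the global estimate.
        consistent = sum(
            1 for dx, dy in vectors
            if abs(dx - gx) < min_speed and abs(dy - gy) < min_speed
        )
        if consistent / len(vectors) >= agreement:
            # Content drifting right in the image means the camera is panning left.
            return "pan left" if gx > 0 else "pan right"
    # No dominant global motion: any fast blocks belong to moving objects.
    if any(abs(dx) >= min_speed or abs(dy) >= min_speed for dx, dy in vectors):
        return "object motion"
    return "static"
```

The heuristic still fails when a moving object fills most of the frame, which is exactly the ambiguity the slide illustrates.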
MPEG IBP Frames
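In MPEG, a B-frame is predicted from both the previous and the next anchor (I- or P-) frame, so each anchor must be transmitted before the B-frames that precede it in display order. A minimal sketch of that reordering, assuming the group of pictures is given as a display-order string such as "IBBPBBP":

```python
from typing import List, Tuple

def decode_order(display: str) -> List[Tuple[int, str]]:
    """Map a display-order frame-type string to (display index, type) in decode order."""
    order: List[Tuple[int, str]] = []
    pending_b: List[Tuple[int, str]] = []
    for index, kind in enumerate(display):
        if kind == "B":
            pending_b.append((index, kind))  # must wait for the next anchor frame
        else:  # "I" or "P" anchor: send it, then the B-frames that depend on it
            order.append((index, kind))
            order.extend(pending_b)
            pending_b = []
    # Trailing B-frames have no following anchor in this string; append them as-is.
    order.extend(pending_b)
    return order
```

For "IBBPBBP" this yields display frames 0, 3, 1, 2, 6, 4, 5.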
MPEG and Editing
• Limitations in Frame and Cut Accuracy
• Most editors use raw or lossless compression
• Thomson to release JPEG 2000-based, I-frame-only compression
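The cut-accuracy limitation follows from the frame structure: without re-encoding, a cleanly decodable segment can only start at an I-frame, which is why I-frame-only formats suit editing. A minimal sketch, assuming a closed GOP and no re-encoding (real editors often re-encode the frames around the cut instead):

```python
def snap_to_i_frame(frame_types: str, requested: int) -> int:
    """Return the nearest I-frame index at or before the requested cut frame."""
    for index in range(requested, -1, -1):
        if frame_types[index] == "I":
            return index
    raise ValueError("no I-frame at or before the requested frame")
```

With a hypothetical 12-frame GOP ("IBBPBBPBBPBB" repeated), a cut requested at frame 17 lands at frame 12, five frames early; in an I-frame-only stream every request is honored exactly.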
http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm#E11E42
MPEG-7 Ontology
Useful Video Format Links
• http://www.ultimatewebdesigning.com/articles/formats.html
• http://www.theasc.com/news/index.html
• http://users.tkk.fi/~iisakkil/videoformats.html
Speech Recognition Functions
• Generates transcript to enable text-based retrieval from spoken language documents
• Improves text synchronization to audio/video in the presence of scripts
• Provides a speech interface to the digital library
• Supplies necessary information for library segmentation and multimedia abstractions
• Modern systems rely more on single phoneme detection than on double or triple phoneme pairs
Speech Recognition Accuracy
[Figure: word error rate (axis 0 to 90) for conditions labeled Commercial, Benchmark, Lab, TV Studio, Dialog and Broadcast News]
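Word error rate, the quantity charted here, is the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and a reference transcript, divided by the number of reference words. A minimal sketch using whitespace tokenization (an assumption; real scoring tools also normalize case and punctuation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(substitution, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(ref)][len(hyp)] / len(ref)
```

Dropping one word from "al gore presented his policies" gives a WER of 1/5, i.e. 20% on the chart's scale.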
Information Retrieval Recall vs. Speech Recognition Accuracy
[Figure: relative recall (% of text IR, axis 40 to 100) plotted against word error rate (axis 0 to 80)]
Early Lessons Learned
• Titles were frequently used and should include length and production date
• Placement of results and titles affects usage
• A greater quantity of video was desired
• Storyboards (filmstrips) were used infrequently
Yahoo Search Example
• Search bars placed at different locations for each browser
• Placement of the search bar improves usage
• Centered placement preferred over left or right placement in portal applications
Empirical Study: Skims
[Figure: skim audio and skim image tracks for DFL, DFS and NEW]
DFL - “default” long skim
DFS - default short skim
NEW - selective skim
RND - same audio as NEW but with unsynchronized video
Skim Study Results
[Figure: Images Correct (out of 10) for RND, DFS, DFL, NEW and FULL; subjects were asked whether an image was in the video just seen]
[Figure: Phrases Correct (out of 15) for RND, DFS, DFL, NEW and FULL; subjects were asked whether a text phrase summarizes information that would be in the full source video]
© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann Carnegie Mellon
Skim Study QUIS Results
[Figure: QUIS ratings on a 1 to 9 scale for RND, DFS, DFL, NEW and FULL on three scales (terrible-wonderful, frustrating-satisfying, dull-stimulating); higher ratings lean toward wonderful, satisfying and stimulating]
Skims: Preliminary Findings
• The real benefit of skims appears to be for comprehension rather than navigation
• For PBS documentaries, information in the audio track is very important
• An empirical study was conducted in September 1997 to determine the advantages of skims over subsampled video, and the synchronization requirements for audio and visuals
Adding Imagery to Visualizations
• Query-based thumbnail images added to timeline and map interfaces
• Extend the concept of “highest scoring” to represent a country or a point in time
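Extending “highest scoring” to a country or a point in time amounts to a group-by-max over query results. A minimal sketch, assuming each hit is a hypothetical (group key, thumbnail id, query score) tuple, where the key is a country name or a time bucket such as a year:

```python
from typing import Dict, Iterable, Tuple

# (group key such as a country or a year, thumbnail id, query relevance score)
Hit = Tuple[str, str, float]

def representative_thumbnails(hits: Iterable[Hit]) -> Dict[str, str]:
    """Pick the highest-scoring thumbnail for each map region or timeline bucket."""
    best: Dict[str, Tuple[float, str]] = {}
    for key, thumbnail, score in hits:
        if key not in best or score > best[key][0]:
            best[key] = (score, thumbnail)
    return {key: thumb for key, (score, thumb) in best.items()}
```

Each map region or timeline slot then displays the thumbnail of its best-matching segment for the current query.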
How Much Text, and Does Layout Matter?
NoText
BriefByRow
Brief
AllByRow
All
Informedia Research Timeline
Applications
Corporate Spin-offs
• Virage -> Autonomy
• Virage -> Pictron -> Yahoo?
• Excalibur -> Convera (Enterprise Search)
• ISLIP -> MediaSite -> Sonic Foundry
Media Asset Management
• Context Media
• Semagix
• Blue Order
Systems That Work
Image Matching
• Evolution Robotics
• Neven Vision
Internet Video
• Google
• Yahoo
• Blinkx
ViPR™ Algorithm
Database building
• SIFT feature extraction
• Add SIFT features to database
Matching a new image
• SIFT feature extraction
• Feature pair-wise matching
• Clustering by voting
• Pose refinement
[Figure panels: Model, Match]
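The matching steps above can be sketched roughly as follows. This is not Evolution Robotics' implementation: it swaps in brute-force nearest-neighbor matching with a Lowe-style ratio test and a simple vote per database model, and it omits clustering in pose space and pose refinement. Descriptors are short toy tuples standing in for 128-dimensional SIFT vectors, and the ratio and min_votes values are assumptions.

```python
from collections import Counter
from math import dist
from typing import Dict, List, Optional, Sequence, Tuple

Descriptor = Tuple[float, ...]

def build_database(models: Dict[str, List[Descriptor]]) -> List[Tuple[Descriptor, str]]:
    """Flatten per-model feature lists into (descriptor, model name) entries."""
    return [(d, name) for name, descs in models.items() for d in descs]

def match_image(query: Sequence[Descriptor], database: List[Tuple[Descriptor, str]],
                ratio: float = 0.8, min_votes: int = 2) -> Optional[str]:
    """Match query features pair-wise, then let surviving matches vote for a model."""
    votes: Counter = Counter()
    for q in query:
        ranked = sorted(database, key=lambda entry: dist(q, entry[0]))
        if len(ranked) < 2:
            continue
        (best_desc, best_model), (second_desc, _) = ranked[0], ranked[1]
        # Ratio test: keep the match only if it is clearly better than the runner-up.
        if dist(q, best_desc) < ratio * dist(q, second_desc):
            votes[best_model] += 1
    if not votes:
        return None
    model, count = votes.most_common(1)[0]
    return model if count >= min_votes else None
```

Voting by model id is the crudest form of the clustering step; voting in a quantized pose space additionally requires matches to agree on position, scale and rotation.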
Semantic Music Correlation
Predexis
• Automated Statistical Features (pitch, frequency, etc.)
Siren Systems
• Pseudo-Automated Features (Statistical and User Genre)
Savage Beast
• Manual Feature Set (300 – 400 features per Genre)
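Whether features are computed automatically or assigned by hand, correlating tracks typically reduces to comparing feature vectors. A minimal sketch, assuming each track is already summarized as a fixed-length vector with features scaled to comparable ranges (track ids and values are hypothetical); cosine similarity is an illustrative choice, not a method attributed to any company listed.

```python
from math import sqrt
from typing import Dict, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(target: Sequence[float], catalog: Dict[str, Sequence[float]]) -> str:
    """Return the catalog track whose feature vector is closest in angle to the target."""
    return max(catalog, key=lambda track: cosine_similarity(target, catalog[track]))
```

Raw statistics such as pitch in Hz and loudness in dB would need normalizing first, or the largest-valued feature dominates the comparison.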
Semantic Video Correlation
Commercial
• Netflix, Amazon, Blockbuster, YouTube
Social
• YouTube, Cuts, StumbleVideo
Research
• Machine Learning on User and Commercial Metadata
• Video Buzz Tracking and Usage
• Visual Pattern Recognition?
Comparison of Video Buzz Sites
Copyright material removed
See Splashcast Blog
What’s next
Content
• Media Remixing – Video Remixing
• User-Generated Content
Monetization
• The Legal Aspects of New Media
• Digital Advertising
Emerging Technology
• Mobility
• High Def and Super High Def
• Virtual Environments and Immersive Systems
  • Previsualization
2007 Advertising Market Projection
Source: expand-March 2007
Online $19 Billion
Radio $21 Billion
Other $43 Billion
TV $71 Billion
Print $102 Billion
Direct Marketing $478 Billion
Total $734 Billion
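The segment figures add up to the stated total, and converting them to shares makes the gap between online and direct marketing explicit. A small arithmetic check, assuming the figures are in billions of US dollars as the slide's "$" suggests:

```python
# Segment projections in billions of US dollars, as listed on the slide.
projection_billions = {
    "Online": 19, "Radio": 21, "Other": 43,
    "TV": 71, "Print": 102, "Direct Marketing": 478,
}

total = sum(projection_billions.values())
# Share of the total market for each segment, in percent (one decimal place).
shares = {segment: round(100 * value / total, 1)
          for segment, value in projection_billions.items()}

print(total)  # 734
print(shares["Direct Marketing"], shares["Online"])  # 65.1 2.6
```

By this projection, online advertising is about 2.6% of the total, while direct marketing accounts for roughly two thirds.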
Emerging Technology
High Def and Super High Def and Photo Realism
Emerging Technology
Virtual Environments and Immersive Systems
• Previsualization
• Synthetic Humans
3D Visualization
• 3D Previsualization
Pixel Liberation Front www.thefront.com
• 3D morphable model face animation
http://www.kyb.tuebingen.mpg.de/bu/people/volker/
Emerging Technology
• Sports
  • Ad insertion
  • Logging - http://www.dixonsports.com/images/liveevent/diagrams.html
• Healthcare
  • Patient Monitoring
  • Remote Diagnostics
• Security and Surveillance
• Forensics and DRM
  • Cameras as Sensors
  • Watermarking
Credits
Many Informedia Project and CMU research community members contributed to this work; a partial list appears here:
Project Director: Howard Wactlar
User Interface: Mike Christel, Chang Huang, Adrienne Warmack, Dave Winkler
Image Processing: Takeo Kanade, Norm Papernick, Toshio Sato, Henry Schneiderman, Michael Smith
Speech and Language Processing: Alex Hauptmann, Ricky Houghton, Rong Jin, Raj Reddy, Michael Witbrock
Informedia Library Essentials: Bob Baron, Bruce Cardwell, Colleen Everett, Mark Hoy, Melissa Keaton, Bryan Maher, Craig Marcus