maximizing the synergy between man and machine

29
Carnegie Mellon Maximizing the Synergy between Man and Machine Alex Hauptmann School of Computer Science Carnegie Mellon University Exploiting Human Abilities in Video Retrieval Interfaces

Upload: haru

Post on 13-Jan-2016

49 views

Category:

Documents


6 download

DESCRIPTION

Exploiting Human Abilities in Video Retrieval Interfaces. Maximizing the Synergy between Man and Machine. Alex Hauptmann School of Computer Science Carnegie Mellon University. Background. Automatic video analysis for detection/recognition is still quite poor - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Maximizing the Synergy between Man and Machine

Carnegie Mellon

Maximizing the Synergy between Man and Machine

Alex Hauptmann

School of Computer Science

Carnegie Mellon University

Exploiting Human Abilities in Video Retrieval Interfaces

Page 2: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Background

Automatic video analysis for detection/recognition is still quite poor• Consider baseline (random guessing)

• Improvement is limited

• Consider near-duplicates (trivial similarity)

• Does not generalize well over video sources

• Better than nothing

Need humans to make up for this shortcoming!

Page 3: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Differences to VIRAT• Most interface work done on broadcast TV

• Harder: • Unconstrained subject matter

• Graphics, animations, photos• Broadcasters

• Many different, short shots out of context

• Easier:• Better resolution• Conventions in editing, structure• Audio track

• Keyframes are typical unit of analysis

Page 4: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

“Classic” Interface Work

• Interactive Video Queries• Fielded text query matching capabilities• Fast image matching with simplified interface for

launching image queries• Interactive Browsing, Filtering, and Summarizing

• Browsing by visual concepts• Quick display of contents and context in

synchronized views

Page 5: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Informedia Client Interface Example

Page 6: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Interface optionsChristel08

Page 7: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Suggesting Related ConceptsZavesky07

Page 8: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Page 9: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Fork BrowserSnoek09

Page 10: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

“Classic” Video Interface Results

• Concept browsing and image search used frequently

• Novices still have lower performance than experts

• Some topics cause “interactivity” to be one-shot query with little browsing/exploration

• Classic Informedia interface including concept browsing often good enough that user never proceeds to any additional text or image query

• “Classic Informedia” scored highest of those testing with novice users in TRECVID evaluations

Page 11: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Visual Browsing

Page 12: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Augmented Video Retrieval

The computer observes the user and LEARNS, based on what is marked as relevant

The system can learn:• What image characteristics are relevant

• What text characteristics (words) are relevant

• What combination weights should be used

We exploit the human’s ability to quickly mark relevant video and the computer’s ability to learn from given examples

Page 13: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Combining Concept Detectors for Retrieval

Final ranked list for interface

Combination of diverse knowledge sources

TextFind Pope John Paul

Audio ImageMotionmultimodal question

Output1 Output2 Outputn

…Closed Caption

Color Feature

Motion Feature

Audio Feature

Video library

Face … (3k)

Knowledge Source/API

Building

Page 14: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Why Relevance Feedback?• Limited training data

• Untrained sources are useful for some specific searches

Query: finding some boats/ships

Txt: 0.5, Img: 0.3, Face: -0.5

Q-Type: general object

(Learned from training set)

Outdoor: ?, Ocean: ?

(Unable to be learned)

Page 15: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Probabilistic Local Context Analysis (pLCA)Yan07

• Goal: Refine results of the current query • Method: assume the combination parameters of “un-learned”

sources υ to be latent variables and compute P(yj|aj,Dj)

• Discover useful search knowledge based on initial results Ai

Query

InitialSearchResult

Video1

Video 2

Video M

A1

A2

Am

Y1

Y2

Ym

υ1:?, υ2:?,…, υN:?

Page 16: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Undirected Model and Parameter Estimation

Compute the posterior probability of document relevance Y given initial results A based on an undirected graphical model

D1D1Y1Y1A1A1

υυ

0

1

1( | ; , ) ( | ; ) exp ,

D

l j j

M

l lj l jll j

yP y a D Q P Q v a f Q dZ

y D

Variational inference, i.e., iterate until convergence and approximate P(yj|aj) by qyj

Maximize w.r.t. variational para. of Y,

Maximize w.r.t. variational para. of v

D2D2Y2Y2A2A2

DmDmYmYmAmAm

1

1 exp ( , )yj j l l jl

q a q f D Q

0 ( , )l l yj l jj

q q f D Q

Page 17: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Automatic vs Interactive Search

Page 18: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Extreme Video Retrieval

• Automatic retrieval baseline for ranking order

• Two methods of presentation: • System-controlled Presentation - Rapid Serial

Visual Presentation (RSVP)

• User-controlled Presentation – Manual Browsing with Resizing of Pages

Page 19: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

System-controlled Presentation• Rapid Serial Visual Presentation (RSVP)

• Minimizes eye movements • All images in same location

• Maximizes information transfer: System Human• Up to 10 key images/second• 1 or 2 images per page• Presentation intervals are dynamically adjustable • Click when relevant shot is seen• Mark previous page also as relevant

• A final verification step is necessary

Page 20: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

User-controlled presentations

• Manual Browsing with Resizing of Pages• Manually page through images

• User decides to view next page

• Vary the number of images on a page

• Allow chording on the keypad to mark shots

• A very brief final verification step

Page 21: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

MBRP - Manual Browsing with Resizable Pages

Page 22: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Extreme QA with RSVP

3x3 display

1 page/second

Numpad chording to select shots

Page 23: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Mindreading – an EEG interface

• LearnRelevant/Non-Relevant

• 5 EEG probes

• Simple features

• Too slow• 250 ms/image

• Significant recoverytime after hit

• Relevance feedback

Page 24: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Summarizing Video: Beyond Keyframes

• BBC Rushes• Unedited video for TV series production

• Summarize as video in 1/50th of total• Note the non-scalable target factor

• Lots of smart analysis• Clustering, salience, redundancy, importance

• Best performance for retrieval was to play every frame

Page 25: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Speed-up summarization results

Page 26: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Surveillance Event Detection

• Interesting stuff is rare

• Detection accuracy is limited

• Monitor many streams

Page 27: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Surveillance Event Detection

• Need actionnot key frame

• Difficult for humans

• Combine speed-upwith automatic analysis• Slow down when interesting stuff happens

Page 28: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Summary

Interfaces have much to contribute in retrieval• We don’t know what is best

• Task-specific• User-specific• System-dependent

• Collaborative search• Combining “best of current systems”• Simpler is usually better (Occam’s razor)

General principles are difficult to find

Page 29: Maximizing the Synergy between Man and Machine

Carnegie MellonCarnegie Mellon

Questions?