A Study of Awareness in Multimedia Search
Robert Villa, Nick Gildea, Joemon Jose
Information Retrieval Group
April 2008
Overview
• Introduction
• Collaboration and awareness in search
• Research questions
• Experimental study
  – Inducing awareness: a game scenario
  – Video retrieval
  – Demo of multimedia retrieval system
  – Some results
• Conclusions
Information Retrieval
• Deals with practical and theoretical models of searching unstructured collections of documents
  – Idealised aim: supply the system with a natural language description of your need
  – The system returns a ranked list of the documents relevant to your need
• The process is naturally probabilistic and uncertain
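The ranked-list idea can be illustrated with a toy TF-IDF scorer (a minimal sketch; the function and corpus are invented for illustration and are not the system described in this talk):

```python
import math
from collections import Counter

def rank(query, docs):
    """Toy ranked retrieval: score documents by summed TF-IDF of query terms."""
    n = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    # document frequency: in how many documents each term occurs
    df = Counter(term for toks in tokenised for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenised):
        tf = Counter(toks)
        score = sum(tf[t] * math.log(n / df[t])
                    for t in query.lower().split() if t in df)
        scores.append((score, i))
    # return document indices, best-scoring first
    return [i for score, i in sorted(scores, reverse=True)]

docs = ["dramatic arrival of the president",
        "weather report for the weekend",
        "president greets arrival of delegates"]
print(rank("president arrival", docs))  # the weather report ranks last
```

The uncertainty mentioned above shows up here too: the scores are only evidence of relevance, not a guarantee of it.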
Video retrieval
• Video retrieval systems index and search collections of videos
  – Like traditional IR, the indexing of the video data is assumed to be automatic
    • Extraction of visual or audio features
    • Use of automatic speech recognition
  – Queries are typically textual or by example
Example interface
Video retrieval
• Videos are automatically split into ‘shots’ (shot segmentation)
  – Shot boundaries are determined using the visual content of video frames
  – Each shot is a short element of a video
    • In TRECVID 2006, typically 2 to 3 seconds, although there are some much longer shots
• Shots are the unit of retrieval
  – Where a text system retrieves documents, video retrieval systems retrieve shots
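One common way to detect the shot boundaries described above is to threshold the colour-histogram distance between consecutive frames. A minimal sketch (hypothetical function and synthetic data, not TRECVID's segmentation):

```python
def shot_boundaries(frame_histograms, threshold=0.5):
    """Toy shot segmentation: flag a boundary wherever the colour-histogram
    distance between consecutive frames exceeds a threshold."""
    boundaries = []
    for i in range(1, len(frame_histograms)):
        prev, curr = frame_histograms[i - 1], frame_histograms[i]
        # L1 distance between normalised histograms, in [0, 2]
        dist = sum(abs(a - b) for a, b in zip(prev, curr))
        if dist > threshold:
            boundaries.append(i)
    return boundaries

# Two synthetic "shots": frames 0-2 are reddish, frames 3-4 greenish
frames = [[0.9, 0.05, 0.05]] * 3 + [[0.1, 0.8, 0.1]] * 2
print(shot_boundaries(frames))  # a cut is detected at frame 3
```

Real systems use more robust features and adaptive thresholds, but the principle is the same: a large visual discontinuity marks a cut.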
Example of a shot
• Every shot has an associated text transcript
  – E.g. “A dramatic arrival”
• Generated by Automatic Speech Recognition (ASR)
• The transcript can often be very wrong
Collaborative Retrieval
• Most current search systems assume searching is a solitary activity
  – Is this always the case, or can collaborative searching with one or more others be effective?
• Rather than focus on collaboration in general, we decided to look at only one aspect of collaboration: awareness
Awareness
• Awareness enables an “understanding of the activities of others”, an important aspect of collaboration
• Paul Dourish and Victoria Bellotti. “Awareness and Coordination in Shared Workspaces”, CSCW'92
• Scenario:
  – Two users are searching on the same task at the same time in different places
  – Synchronous and remote
Previous work – collaborative search
• Cerchiamo (FXPAL, Fuji Xerox)
  – Adcock et al., TRECVID 2007
  – Two people collaborating, one a “gatherer” and the other a “reviewer”
• SearchTogether (Microsoft)
  – Morris, M. R. (2007)
  – Provides a messaging system, recommendation of web pages to the other user, query awareness, etc.
• Físchlár-DiamondTouch (DCU)
  – Smeaton et al. (2007)
  – A table-top display which allows two people to work around it
Research question
• Can awareness of another searcher aid a user when carrying out a multimedia search?
  – Will their performance increase?
  – Will less effort be needed to reach a given performance?
    • Shots played, browsing required
  – Will the user’s search behaviour change?
    • Number of queries executed, shots found independently
Competitive game scenario
• We wanted to evaluate the effect of awareness in a “best case” scenario
  – i.e. a situation where there was some benefit to users in being aware of another’s actions
• A competitive game scenario was used, where pairs of users competed to “win” the search tasks
Aim of the ‘game’
• The aim of the ‘game’ was to find as many relevant shots as possible for the task
  – The domain was video retrieval, where users had to search a video collection for ‘shots’
  – Whoever found the most shots ‘won’
  – A monetary award was given to the winner
System
• Our existing video retrieval system was modified to allow collaboration
  – Each user could be given a view of the other user’s search screen
  – This was designed to work with two monitors:
    • The user’s own search interface on one screen
    • The other screen optionally showing the other user’s search screen
  – We supported 4 different situations
• “Mutually Aware” – A can see B’s screen and B can see A’s screen
• “A aware of B” – A can watch B’s screen while B cannot watch A
• “B aware of A” – B can watch A’s screen while A cannot watch B
• “Independent” – neither A nor B can watch the other
System interface
[Screenshot: the local search interface, showing the text query box, the search results, the shots selected for relevance feedback, and the user’s own result shots]
Remote search interface
• The user cannot see the other user’s final results, only a count of the number of shots currently marked by that user
• This screen does not update automatically; the user must press the “Refresh” button to update it
Video browser
• A simple video browser pops up when the user clicks a keyframe
• Allows the user to view the shot, and to move backwards and forwards in the video
Conditions
• From the point of view of an individual user:
  – Working independently
  – Cannot watch the other user, and knows that the other user can watch him/her
  – Can watch the other user, and knows that the other user cannot watch him/her
  – Can watch the other user, and knows that the other user can watch him/her
TRECVID 2006 Collection
• Almost 260 hours of mostly news data from the end of 2005
  – CNN, LBC, CCTV, etc.
  – Multilingual (English, Chinese and Arabic)
• Has a standard shot segmentation
• ASR transcripts provided
  – For Chinese and Arabic video, also automatically translated into English
TRECVID 2006 Topics
• 24 topics, of which we used the 4 worst performing overall from the interactive track
  – Hoped that these would present a similar challenge for our users
  – Adcock et al. (2007) found that users collaborated better on difficult tasks
Topics

Topic  Median MAP  Topic description
0189   0.038       Find shots of a group including at least four people dressed in suits, seated, and with at least one flag
0173   0.037       Find shots with one or more emergency vehicles in motion (e.g., ambulance, police car, fire truck)
0175   0.034       Find shots with one or more people leaving or entering a vehicle
0192   0.030       Find shots of a greeting by at least one kiss on the cheek
Experimental design
• A within-user study was carried out
• Latin square design
  – 4 tasks
  – 4 conditions
  – 24 users (12 pairs)
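A 4×4 Latin square rotation of tasks over conditions can be generated with a simple cyclic construction (a generic sketch, not necessarily the exact ordering used in the study):

```python
def latin_square(items):
    """Cyclic Latin square: row i is the item list rotated left by i,
    so every item appears exactly once per row and per column."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

conditions = ["Independent", "Watched", "Watching", "Mutual"]
# one row per task: the condition each user pair gets on that task
for row in latin_square(conditions):
    print(row)
```

Rotating rows like this balances order effects: each condition is tried in every task position across the design.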
Procedure
• Users took part in pairs
• Users had 15 minutes to find as many shots as possible
• At the end of the 4 tasks, the “winner” was announced
  – Each user was paid £10
  – The winner got an extra £5, shared if there was a draw
Results
• 12 competitive runs
  – 11 wins and 1 draw
• And there was an immediate issue with one of the users ...
Search performance

                      Independent  Watched  Watching  Mutual
MAP             Mean  0.0163       0.0199   0.0222    0.0243
                SD    0.0150       0.0163   0.0165    0.0204
Precision@10    Mean  0.3083       0.3750   0.4167    0.4667
                SD    0.2569       0.2996   0.3158    0.3332
Search Performance
• No significant difference was found between the level of performance in the four different conditions
  – Overall performance was very low (typical in video IR, for these hard topics)
  – Performance does vary widely across the four tasks
    • Tasks 189 and 192 performed worst
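The two measures reported above, MAP and precision at 10 shots, can be computed per topic roughly as follows (a standard textbook sketch, not the TRECVID evaluation code):

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k retrieved shots that are relevant."""
    return sum(1 for shot in ranked[:k] if shot in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision at each rank where a relevant shot appears;
    MAP is this value averaged over topics."""
    hits, total = 0, 0.0
    for rank, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

ranked = ["s1", "s2", "s3", "s4"]   # system output, best first
relevant = {"s1", "s3"}             # ground-truth relevant shots
print(precision_at_k(ranked, relevant, k=2))  # 0.5
print(average_precision(ranked, relevant))    # (1/1 + 2/3) / 2 ≈ 0.83
```

Average precision rewards placing relevant shots early in the ranking, which is why it suits the ranked-list retrieval described earlier.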
Search behaviour: queries
                   Ind      Watched  Watching  Mutual
Total queries      603      501      473       570
Queries per task
  Mean (SD)        25.13    20.88    19.71     23.75
                   (14.72)  (13.61)  (9.34)    (13.35)

• Do users execute more queries when searching alone?
• A significant difference was found between the Watching and Independent conditions
Number of shots found independently
                     Ind     Watched  Watching  Mutual
Total shots found    155     188      244       222
Shots found per task
  Mean (SD)          6.46    7.83     10.17     9.25
                     (4.11)  (7.17)   (9.31)    (8.07)

• A significant difference was found between the Independent and Watching conditions
Changes in search behaviour
• Users searched less when watching someone else
  – Also less searching in the watched condition (not significant)
• Users found more shots themselves when watching someone else
  – Also found more shots in the mutual and watched conditions, but not significantly
Search terms used
• One possible way awareness may help is by providing a user with new terms to use in queries
• Did users copy search terms from the remote user?
  – We could not directly record this in the logs (terms are easily retyped)
Estimating copied search terms
• Search terms which could have been copied were derived from the logs
• Method:
  – Found the set of terms common to both users
  – Found who used each term first
  – Checked for a click of the “Refresh” button by the user who was second
  – Assumed that the second user could then have copied that term
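The copying heuristic above can be sketched as a single pass over a time-ordered log (the event format here is hypothetical, not the study's actual log schema):

```python
def estimate_copied_terms(events):
    """events: time-ordered (time, user, kind, payload) tuples, where kind is
    'query' (payload = query string) or 'refresh' (payload ignored).
    A term counts as potentially copied if one user issues it after the other
    user first used it AND has pressed Refresh since that first use."""
    first_use = {}      # term -> (user who used it first, time of first use)
    last_refresh = {}   # user -> time of their most recent refresh
    copied = set()
    for time, user, kind, payload in events:
        if kind == "refresh":
            last_refresh[user] = time
        elif kind == "query":
            for term in payload.lower().split():
                if term not in first_use:
                    first_use[term] = (user, time)
                else:
                    origin_user, origin_time = first_use[term]
                    if (user != origin_user
                            and last_refresh.get(user, -1) >= origin_time):
                        copied.add(term)
    return copied

log = [(1, "A", "query", "emergency vehicle"),
       (2, "B", "refresh", None),
       (3, "B", "query", "emergency siren"),
       (4, "B", "query", "vehicle")]
# both 'emergency' and 'vehicle' are flagged as potentially copied by B
print(estimate_copied_terms(log))
```

As the slides note, this only gives an upper bound: seeing the term and retyping it independently are indistinguishable in the log.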
Copied terms
                     Watching  Mutual
Total unique terms   355       388
Total terms copied   44        40
% terms copied       12%       10%

• Suggests that a user is able to reuse search terms used by the other user
Searcher effort

                                 Independent  Watched  Watching  Mutual
Play events per task      Mean   187.08       192.21   170.08    184.79
                          SD     116.51       112.91   101.28    117.52
Next-shot events          Mean   115.04       121.08   97.04     104.13
per task                  SD     106.64       99.23    96.35     107.01
Previous-shot events      Mean   16.88        20.71    12.33     12.29
per task                  SD     23.90        22.20    18.77     15.86
• Recorded three types of events to gauge searcher effort
  – Play events, when a user clicks a shot
  – Move to next shot in video
  – Move to previous shot in video
• Only significant relationship:
  – Between the Watching and Watched conditions, for move to previous shot
Where did a user’s final results come from?
• From the interface, we logged the user dragging and dropping shots between the different parts of the interface
  – We could record when someone copied a shot from the other user
• Using this, we can estimate where users got their final results
  – (roughly!)
Conclusions
• Despite the game scenario, users didn’t copy other people’s shots much
  – This came as something of a surprise
• There was no significant increase in a user’s performance
  – Only a trend ...
• There is evidence that users do reuse search terms
  – 10 and 12% of terms were potentially copied
Conclusions
• Results on searcher effort were unclear
  – Only one event type showed significantly less interaction
The End