HUMANE INFORMATION SEEKING: GOING BEYOND THE IR WAY
JIN YOUNG KIM @ SNU DCC
Jin Young Kim
• Graduate of SNU EE / Business
• 5th Year Ph.D. Student in UMass Computer Science
• Starting as an Applied Researcher at Microsoft Bing
Today’s Agenda
• A brief introduction of IR as a research area
• An example of how we design a retrieval model
• Other research projects and recent trends in IR
BACKGROUND
An Information Retrieval Primer
Information Retrieval?
• The study of how an automated system can enable its users to access, interact with, and make sense of information.

[Diagram: the user issues a query; the system surfaces documents; the user visits them]
IR Research in Context
• Situated between human interface and system / analytics research
• Aims at satisfying the user’s information needs
• Based on large-scale system infrastructure & analytics
• Need for convergence research!

[Diagram: Information Retrieval at the intersection of large-scale system infrastructure, large-scale (text) analytics, and end-user interfaces (UX / HCI / InfoViz)]
Major Problems in IR
• Matching
  • (Keyword) Search: query – document
  • Personalized Search: (user+query) – document
  • Contextual Advertising: (user+context) – advertisement
• Quality
  • Authority / Spam / Freshness
  • Various ways to capture them
• Relevance Scoring
  • Combination of matching and quality features
  • Evaluation is critical for optimal performance
HUMANE INFORMATION RETRIEVAL
Going Beyond the IR Way
You need the freedom of expression. You need someone who understands.
Information seeking requires communication.

Information Seeking circa 2012
The search engine accepts keywords only. The search engine doesn’t understand you.
Toward Humane Information Seeking
• Rich User Interactions: Search / Browsing / Filtering (from Query to Session)
• Rich User Modeling: Profile / Context / Behavior

The IR Way: a single query and response.
The HCIR Way: rich user interaction and rich user modeling across the session.

[Diagram: the user and the system exchange repeated action-response pairs (filtering / browsing and relevance feedback from the user; filtering conditions and related items from the system); the interaction history feeds a user model of profile, context, and behavior]

HCIR = HCI + IR
The Rest of the Talk…
• Personal Search
  • Improving search and browsing for known-item finding
  • Evaluating interactions combining search and browsing
• Web Search
  • User modeling based on reading level and topic
  • Providing non-intrusive recommendations for browsing
• Book Search
  • Analyzing interactions combining search and filtering
PERSONAL SEARCH
Retrieval and Evaluation Techniques for Personal Information [Thesis]
Example: Desktop Search — Ranking using Multiple Document Types for Desktop Search [SIGIR10]
Example: Search over Social Media — Evaluating Search in Personal Social Media Collections [WSDM12]

Structured Document Retrieval: Background
• Field Operator / Advanced Search Interface
• User’s search terms are found in multiple fields
Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11]
Structured Document Retrieval: Models
• Document-based Retrieval Model
  • Score each document as a whole
• Field-based Retrieval Model
  • Combine evidence from each field

[Diagram: document-based scoring matches query terms q1 ... qm against the document as one unit; field-based scoring matches each query term against fields f1 ... fn and combines the per-field scores with field weights w1 ... wn]
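The field-based model above can be sketched as a mixture of per-field language models: each query term’s probability is a weighted sum of its smoothed probability in each field. The toy email collection, field weights, and smoothing parameter below are invented for illustration, not data from the talk:

```python
from collections import Counter
import math

# Hypothetical toy email collection: each document is a dict of fields.
docs = [
    {"to": "james smith", "subject": "registration deadline", "content": "please register by friday"},
    {"to": "mary jones", "subject": "james party", "content": "bring snacks and drinks"},
]

FIELDS = ["to", "subject", "content"]
FIELD_WEIGHTS = {"to": 0.2, "subject": 0.3, "content": 0.5}  # fixed weights (assumed values)
MU = 10.0  # Dirichlet smoothing parameter (assumed value)

# Collection-level term counts per field, used for smoothing.
coll_counts = {f: Counter() for f in FIELDS}
for d in docs:
    for f in FIELDS:
        coll_counts[f].update(d[f].split())
coll_totals = {f: sum(coll_counts[f].values()) for f in FIELDS}

def p_term_in_field(term, doc, field):
    """Dirichlet-smoothed P(term | field of doc)."""
    tokens = doc[field].split()
    tf = Counter(tokens)[term]
    cp = (coll_counts[field][term] + 0.5) / (coll_totals[field] + 1.0)
    return (tf + MU * cp) / (len(tokens) + MU)

def field_based_score(query, doc):
    """log P(q|d) = sum_i log sum_j w_j * P(q_i | f_j)."""
    score = 0.0
    for term in query.split():
        mix = sum(FIELD_WEIGHTS[f] * p_term_in_field(term, doc, f) for f in FIELDS)
        score += math.log(mix)
    return score

ranked = sorted(docs, key=lambda d: field_based_score("james registration", d), reverse=True)
```

With these made-up weights, the email addressed to james with a registration subject outranks the other document.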
Improved Matching for Email Search: Structured Documents [CIKM09, ECIR09, ECIR12]
• Field Relevance
  • A different field is important for a different query term
  • ‘james’ is relevant when it occurs in <to>
  • ‘registration’ is relevant when it occurs in <subject>
Estimating the Field Relevance
• If the user provides feedback
  • Relevant documents provide sufficient information
• If no feedback is available
  • Combine field-level term statistics from multiple sources

[Diagram: field-level statistics (content / title / from-to) over relevant documents ≅ collection statistics combined with top-k retrieved documents]
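One way to read the diagram above: per-term field relevance can be approximated by normalizing field-level term frequencies from each source and interpolating them. The counts and interpolation weight below are fabricated for illustration:

```python
from collections import Counter

# Hypothetical field-level term counts from two sources:
# collection-wide statistics and top-k pseudo-relevant documents.
collection_stats = {
    "to":      Counter({"james": 40, "mary": 35, "registration": 1}),
    "subject": Counter({"registration": 25, "party": 20, "james": 5}),
    "content": Counter({"registration": 30, "james": 15, "party": 25}),
}
topk_stats = {
    "to":      Counter({"james": 8}),
    "subject": Counter({"registration": 6, "james": 1}),
    "content": Counter({"registration": 3, "james": 2}),
}

LAMBDA = 0.5  # interpolation weight between the two sources (assumed)

def field_relevance(term, fields=("to", "subject", "content")):
    """Estimate P(F|q): how likely each field is the intended match for a query term."""
    def normalized(stats):
        # Relative frequency of the term in each field, normalized across fields.
        freqs = {f: stats[f][term] / sum(stats[f].values()) for f in fields}
        total = sum(freqs.values()) or 1.0
        return {f: v / total for f, v in freqs.items()}
    coll, topk = normalized(collection_stats), normalized(topk_stats)
    return {f: LAMBDA * coll[f] + (1 - LAMBDA) * topk[f] for f in fields}

# With these toy counts, 'james' leans toward <to> and 'registration' toward <subject>.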
Retrieval Using the Field Relevance• Comparison with Previous Work
• Ranking in the Field Relevance Model
q1 q2 ... qm
f1
f2
fn
...
f1
f2
fn
...
w1
w2
wn
w1
w2
wn
q1 q2 ... qm
f1
f2
fn
...
f1
f2
fn
...
P(F1|q1)
P(F2|q1)
P(Fn|q1)
P(F1|qm)
P(F2|qm)
P(Fn|qm)
Per-term Field Weight
Per-term Field Score
sum
multiply
21
• Retrieval Effectiveness (Metric: Mean Reciprocal Rank)
DQL BM25F MFLM FRM-C FRM-T FRM-RTREC 54.2% 59.7% 60.1% 62.4% 66.8% 79.4%IMDB 40.8% 52.4% 61.2% 63.7% 65.7% 70.4%Monster 42.9% 27.9% 46.0% 54.2% 55.8% 71.6%
Evaluating the Field Relevance Model
DQL BM25F MFLM FRM-C FRM-T FRM-R40.0%
45.0%
50.0%
55.0%
60.0%
65.0%
70.0%
75.0%
80.0%
TRECIMDBMonster
Fixed Field WeightsPer-term Field Weights
Evaluation Challenges for Personal Search• Evaluation of Personal Search• Each based on its own user study• No comparative evaluation was performed yet
• Solution: Simulated Collections• Crawl CS department webpages, docs and calendars• Recruit department people for user study
• Collecting User Logs• DocTrack: a human-computation search game• Probabilistic User Model: a method for user simulation
22
[CIKM09,SIGIR10,CIKM11]
23
DocTrack Game
24
Summary so far…• Query Modeling for Structured Documents• Using the estimated field relevance improves the retrieval• User’s feedback can help personalize the field relevance
• Evaluation Challenges in Personal Search• Simulation of the search task using game-like structures• Related work : ‘Find It If You Can’ [SIGIR11]
25
WEB SEARCHCharacterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic
[WSDM12]
Reading level distribution varies across major topical categories
User Modeling by Reading Level and Topic• Reading Level and Topic• Reading Level: proficiency (comprehensibility)• Topic: topical areas of interests
• Profile Construction
• Profile Applications• Improving personalized search ranking• Enabling expert content recommendation
P(R|d1) P(T|d1)P(R|d1) P(T|d1)P(R|d1) P(T|d1) P(R,T|u)
Profile matching can predict user’s preference over search results• Metric• % of user’s preferences predicted by profile matching• Profile matching measured in KL-Divergence of RT profiles
• Results• By the degree of focus in user profile• By the distance metric between user and website
User Group #Clicks KLR(u,s) KLT(u,s) KLRT(u,s)
↑Focused 5,960 59.23% 60.79% 65.27% 147,195 52.25% 54.20% 54.41%
↓Diverse 197,733 52.75% 53.36% 53.63%
Comparing Expert vs. Non-expert URLs• Expert vs. Non-expert URLs taken from [White’09]
Higher Reading Level
Lower Topic D
iversity
30
Enabling Browsing for Web Search
• SurfCanyon®
• Recommend results based on clicks
Initial results indicate that recommendations are useful for shopping
domain.
[Work-in-progress]
31
BOOK SEARCHUnderstanding Book Search Behavior on the Web
[Submitted to SIGIR12]
32
Understanding Book Search on the Web• OpenLibrary• User-contributed online digital library• DataSet: 8M records from web server log
33
Comparison of Navigational Behavior• Users entering directly show different behaviors from users entering via web search engines
Users entering the site directly Users entering via Google
34
Comparison of Search Behavior
Rich interaction reduces the query lengthsFiltering induces more interactions than search
35
LOOKING ONWARD
36
Where’s the Future? – Social Search• The New Bing Sidebar makes search a social activity.
37
Where’s the Future? – Semantic Search• The New Google serves ‘knowledge’ as well as docs.
38
Where’s the Future? – Siri-like Agent• The New Google serves ‘knowledge’ as well as docs.
39
Exciting Future is Awaiting US!• Recommended Readings in IR:• http://www.cs.rmit.edu.au/swirl12
Any Questions
?
Selected Publications• Structured Document Retrieval• A Probabilistic Retrieval Model for Semi-structured Data [ECIR09]
• A Field Relevance Model for Structured Document Retrieval [ECIR11]
• Personal Search• Retrieval Experiments using Pseudo-Desktop Collections [CIKM09]
• Ranking using Multiple Document Types in Desktop Search [SIGIR10]
• Building a Semantic Representation for Personal Information [CIKM10]
• Evaluating an Associative Browsing Model for Personal Info. [CIKM11]
• Evaluating Search in Personal Social Media Collections [WSDM12]
• Web / Book Search• Characterizing Web Content, User Interests, and Search Behavior by Reading
Level and Topic [WSDM12]
• Understanding Book Search Behavior on the Web [In submission to SIGIR12]
40
More at @lifidea, or
cs.umass.edu/~jykim
41
My Self-tracking Efforts• Life-optimization Project (2002~2006)
• LiFiDeA Project (2011-2012)
42
OPTIONAL SLIDES
43
The Great Divide: IR vs. HCIIR
• Query / Document• Relevant Results• Ranking / Suggestions• Feature Engineering• Batch Evaluation (TREC)• SIGIR / CIKM / WSDM
HCI• User / System• User Value / Satisfaction• Interface / Visualization• Human-centered Design• User Study• CHI / UIST / CSCW
Can we learn from each other?
44
The Great Divide: IR vs. RecSysIR
• Query / Document• Reactive (given query)• SIGIR / CIKM / WSDM
RecSys• User / Item• Proactive (push item)• RecSys / KDD / UMAP
45
The Great Divide: IR in CS vs. LISIR in CS
• Focus on ranking & relevance optimization
• Batch & quantitative evaluation
• SIGIR / CIKM / WSDM
• UMass / CMU / Glasgow
IR in LIS• Focus on behavioral study & understanding
• User study & qualitative evaluation
• ASIS&T / JCDL
• UNC / Rutgers / UW
46
• What
• How
Problems & Techniques in IR
Data Format (documents, records and linked data) /Size / Dynamics (static, dynamic, streaming)
User & Domain
End User (web and library)Business User (legal, medical and patent)System Component (e.g., IBM Watson)
Needs Known-item vs. Exploratory SearchRecommendation
System Indexing and Retrieval(Platforms for Big Data Handling)
Analytics Feature ExtractionRetrieval Model Tuning & Evaluation
Presentation
User InterfaceInformation Visualization
47
More about the Matching Problem• Finding Representations• Term vector vs. Term distribution• Topical category, Reading level, …
• Estimating Representations• By counting terms• Using automatic classifiers
• Calculating Matching Scores• Cosine similarity vs. KL-divergence• Combining multiple reps.
User
Query
DocumentVisit
IssueSurface