The Economics in Interactive Information Retrieval
Leif Azzopardi, http://www.dcs.gla.ac.uk/~leif


DESCRIPTION

In this talk, I discuss how microeconomics can be used to describe, explain and predict the interactions between a user and an information retrieval system. The work is based on the ACM SIGIR 2011 paper ( http://dl.acm.org/citation.cfm?id=2009923 ) and is available to download from: http://www.dcs.gla.ac.uk/~leif/papers/azzopardi2011economics.pdf

TRANSCRIPT

Page 1

The Economics in Interactive Information Retrieval

Leif Azzopardi, http://www.dcs.gla.ac.uk/~leif

Page 2

[Diagram: the interplay of Cost, Interaction and Benefit.]

Page 3

Interactive and Iterative Search

[Diagram: a simplified, abstracted representation of search. The user, with an information need, issues queries to the system; the system returns documents; the user draws relevant information from them.]

Page 4

[Diagram: approaches to studying Interactive IR, from the observational and empirical (ASK, Berry Picking, the IS&R framework) to the theoretical and formal (Information Foraging Theory, Pirolli 1999).]

Page 5

A Major Research Challenge: Theoretical & Formal Models

Interactive Information Retrieval needs formal models to:
• describe, explain and predict the interaction of users with systems,
• provide a basis on which to reason about interaction,
• understand the relationships between interaction, performance and cost,
• help guide the design, development and research of information systems, and
• derive laws and principles of interaction.

Belkin (2008), Jarvelin (2011)

Page 6

How do users behave?
• User queries tend to be short (only 2-3 terms).
• Users will often pose a series of short queries.
• Web searchers typically only examine the first page of results.
• Users rarely provide explicit relevance feedback.
• Users adapt to degraded systems by issuing more queries.
• Patent searchers usually express longer and more complex queries.
• Patent searchers typically examine 100-200 documents per query (using a Boolean system).

Why do users behave like this?

Page 7

So why do users pose short queries?

User queries tend to be short, but longer queries tend to be more effective!

Page 8

So why do users pose short queries?

[Plot: Performance vs. Query Length (No. of Terms), showing the Total Performance and Marginal Performance curves.]

Exponentially diminishing returns kick in after 2 query terms; around 2-3 terms is where the user gets the most bang for their buck.

Azzopardi (2009)

Page 9

How can we use microeconomics to model the search process?

Page 10

Microeconomics

[Diagram: Consumer Theory (utility maximization) and Production Theory (cost minimization).]

Page 11

Production Theory, a.k.a. the Theory of the Firm

[Diagram: the firm turns inputs (capital, labor) into output (widgets); it utilizes technology, and the technology constrains what it can produce.]

Varian (1987)

Page 12

Production Functions

[Plot: Capital vs. Labor, with a production function.]

Page 13

[Plot: as above, with isoquants at Quantity 1, 2 and 3. Quantity = F(Capital, Labor).]

Page 14

[Plot: as above, showing the production set under the production function.]

Page 15

[Plot: as above; technology constrains the production set.]
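As a concrete instance (a standard textbook example in the spirit of Varian 1987, not something drawn from this deck), the Cobb-Douglas production function has the form

$$ F(K, L) = c \, K^{\alpha} L^{1-\alpha}, \qquad 0 < \alpha < 1, $$

so output rises with either input but at a diminishing rate, and the isoquants look like the curves sketched above. The same functional form reappears on Page 27 as the fitted search production function.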

Page 16

Applying Production Theory to Interactive Information Retrieval

Page 17

Interactive and Iterative Search

[Diagram: the same simplified, abstracted representation as Page 3: the user, with an information need, issues queries; the system returns documents; the user draws relevant information from them.]

Page 18

Search as Production

[Diagram: search recast as a firm. Inputs: queries and assessments. Output: relevance gain. The search engine technology is what the user utilizes, and it constrains what can be produced.]

Page 19

Search Production Function

[Plot: No. of Queries (Q) vs. No. of Assessments per Query (A), with isoquants at Gain = 10, 20 and 30.]

Gain = F(Q, A)

The function represents how well a system could be used, i.e. the minimum input required to achieve that level of gain.

Page 20

Few queries and lots of assessments? Lots of queries and few assessments? Or some other way?

What strategies can the user employ when interacting with the search system to achieve their end goal?

What is the most cost-efficient way for a user to interact with an IR system?

Page 21

Modeling Caveats (of an economic model of the search process)

Gain = F(Q, A) is abstracted and simplified, but representative.

Page 22

What does the model tell us about search & interaction?

Page 23

Search Scenario
• Task: find news articles about ….
• Goal: find a number of relevant documents and reach the desired level of Cumulative Gain.
• Output: Total Cumulative Gain (G) across the session.
• Inputs: Y No. of Queries, and X No. of Assessments per Query.
• Collections: TREC News Collections (AP, LA, Aquaint); each topic had about 30 or more relevant documents.
• Simulation: built using C++ and the Lemur IR toolkit.

Page 24

Simulating User Interaction

[Diagram: the simulated user issues Y queries of length 3, generated from the relevant set (TREC documents marked relevant; TREC Aquaint topics), selecting the best query first/next, and assesses X documents per query. Retrieval models: probabilistic, vector space and Boolean. X and Y are recorded for each level of gain.]

The simulation assumes the user has perfect information, in order to find out how well the system could be used.
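A minimal sketch of this simulation loop, in Python rather than the original C++/Lemur implementation; the function names, the toy data and the run_query stand-in are illustrative assumptions, not from the deck or paper:

```python
# Hypothetical re-implementation sketch; the original simulation was
# built in C++ with the Lemur IR toolkit.

def production_curve(queries, run_query, rel_gain, target, max_depth=300):
    """For each assessment depth A, find the fewest queries Q needed to
    reach the target cumulative gain; the (A, Q) pairs trace one
    production curve (isoquant) like those on Pages 25-26."""
    curve = []
    for depth in range(1, max_depth + 1):          # A: docs assessed per query
        seen, gain = set(), 0.0
        for q, query in enumerate(queries, 1):     # best query first/next
            for doc in run_query(query)[:depth]:   # assess the top-A results
                if doc not in seen:
                    seen.add(doc)
                    gain += rel_gain.get(doc, 0.0) # gain from relevant docs only
            if gain >= target:                     # target gain reached
                curve.append((depth, q))
                break
    return curve

# Toy usage with stand-in data (not the TREC Aquaint setup):
docs_for = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d4", "d5"]}
print(production_curve(
    queries=["q1", "q2"],
    run_query=lambda q: docs_for[q],
    rel_gain={"d1": 1.0, "d4": 1.0, "d5": 1.0},
    target=2.0,
    max_depth=5,
))  # [(2, 2), (3, 2), (4, 2), (5, 2)]: at depth >= 2, two queries suffice
```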

Page 25: Azzopardi2012economics of iir_tech_talk

0 50 100 150 200 250 3000

2

4

6

8

10

12

14

16

18

20

BM25 NCG=0.2

BM25 NCG=0.4

Search Production Curves

No. of Assessments per Query

No.

of Q

uerie

sTREC Aquaint Collection

8 Q & 15 Q/A gets NCG = 0.44 Q & 40 Q/A gets NCG = 0.4

7.7 Q & 5 Q/A gets NCG = 0.23.6 Q & 15 Q/A gets NCG = 0.2

Same Retrieval Model, Different Gain

To double the gain, requires more than double the no. of assessments

Page 26

Different Retrieval Models, Same Gain

[Plot: Search Production Curves on the TREC Aquaint Collection; No. of Queries vs. No. of Assessments per Query, for BM25, BOOL and TFIDF at NCG = 0.4.]

• No input combinations with depth less than the curve are technically feasible!
• BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF.
• User adaptation: BM25: 5 Q @ 25 A/Q; BOOL: 10 Q @ 25 A/Q. More queries on the degraded system.
• For the same gain, BOOL and TFIDF require a lot more interaction.

Page 27

Search Production Function: the Cobb-Douglas Production Function

G = K · Q^α · A^(1−α)

where Q is the no. of queries issued, A is the no. of assessments per query, α is a mixing parameter determined by the technology, and K is the efficiency of the technology used.

Example values on Aquaint when NCG = 0.6:

Model   K      α      Goodness of Fit
BM25    5.39   0.58   0.995
BOOL    3.47   0.58   0.992
TFIDF   1.69   0.50   0.997
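A small sketch evaluating this function and its inverse, assuming the two-parameter Cobb-Douglas form written above; the function names are illustrative:

```python
def gain(q, a, k, alpha):
    """Cobb-Douglas search production function: G = K * Q^alpha * A^(1-alpha)."""
    return k * q**alpha * a**(1 - alpha)

def assessments_needed(g, q, k, alpha):
    """Invert the function: the depth A per query needed to reach gain g
    given q queries."""
    return (g / (k * q**alpha)) ** (1 / (1 - alpha))

# BM25 on Aquaint, using the fitted values above: K = 5.39, alpha = 0.58
print(gain(2, 10, k=5.39, alpha=0.58))          # ~21.2: 2 queries @ 10 A/Q
print(assessments_needed(20.0, 2, 5.39, 0.58))  # ~8.7 A/Q for gain 20 with 2 queries
```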

Page 28

Using the Cobb-Douglas Search Function

We can differentiate the function to find the rates of change of the input variables.

Marginal Product of Querying: the change in gain over the change in querying, i.e. how much more gain we get if we pose extra queries.

Marginal Product of Assessing: the change in gain over the change in assessing, i.e. how much more gain we get if we assess extra documents.
Page 29

Technical Rate of Substitution

[Plot: the BM25 NCG = 0.4 production curve, annotated with TRS values of 0.4, 1.2, 2.5, 4.2 and 8.3 at points along it.]

TRS of Assessments for Queries: how many more assessments per query are needed if one less query is posed?

EXAMPLE: if 5 queries are submitted instead of 6, then 24.2 docs/query need to be assessed instead of 20 docs/query. 6 Q @ 20 A/Q = 120 A; 5 Q @ 24.2 A/Q = 121 A. At the point marked 1.2, giving up one query would require assessing 1.2 extra docs/query.
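Under the Cobb-Douglas form, the TRS is simply the ratio of the two marginal products derived above (the numeric check below is mine, not the slide's):

$$ TRS = -\frac{MP_Q}{MP_A} = -\frac{\alpha}{1-\alpha} \cdot \frac{A}{Q}. $$

For instance, with α = 0.58 at the point Q = 6, A = 20, this gives about 4.6 extra assessments per query for one query fewer, in the same ballpark as the 4.2 read off the plotted curve (the curve here is at NCG = 0.4, so its fitted α need not equal the NCG = 0.6 value in the table).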

Page 30

What about the cost of interaction?

Page 31

User Search Cost Function

A linear cost function: Cost = β · Q + Q · A

where Q is the no. of queries issued, A is the no. of assessments per query (so Q · A is the total no. of documents assessed), and β is the relative cost of a query to an assessment.

What is the relative cost of a query? Using cognitive costs of querying and assessing taken from Gwizdka (2010):
• The average cost of querying was 2628 ms.
• The average cost of assessing was 2226 ms.
• So β was set to 2628/2226 ≈ 1.18.
Page 32

Cost Efficient Strategies

[Plots: for BM25 at gains 0.4 and 0.6, Cost and No. of Queries vs. No. of Assessments per Query, with the minimum-cost point marked on each curve.]

On BM25, to increase gain, pose more queries but examine the same no. of docs per query.

Page 33

Cost Efficient Strategies

[Plots: for BOOL at gains 0.4 and 0.6, Cost and No. of Queries vs. No. of Assessments per Query, with the minimum-cost point marked on each curve.]

On Boolean, to increase gain, issue about the same no. of queries but examine more docs per query.

Page 34

Contrasting Systems

[Plots: the BM25 0.4 and 0.6 gain curves alongside the BOOL 0.4 and 0.6 gain curves; Cost and No. of Queries vs. No. of Assessments per Query, with minimum-cost points marked.]

BM25 is less costly to use than BOOL. On BM25, issue more queries but examine fewer docs per query.

Page 35

A Hypothetical Experiment

What happens if querying costs go down? More queries issued, and a decrease in assessments per query.

What happens if querying costs go up? A decrease in queries issued, and an increase in assessments per query.

Page 36

Changing the Relative Query Cost

[Plot: Cost vs. No. of Assessments per Query, for increasing values of β.]

As β increases, the relative cost of querying goes up; it becomes cheaper to assess more documents per query and consequently to query less!
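Under the same assumptions as the sketch on Page 32, the cost-minimizing depth can also be written in closed form, which makes the effect of β easy to see (the derivation is mine, and the β values are illustrative):

```python
def optimal_depth(beta, alpha):
    """Cost-minimizing assessments per query under G = K * Q^alpha * A^(1-alpha)
    and C = beta * Q + Q * A; valid for alpha > 0.5, and independent of both
    the gain target and the efficiency K."""
    return beta * (1 - alpha) / (2 * alpha - 1)

for beta in (0.5, 1.18, 5.0, 20.0):  # illustrative relative query costs
    print(f"beta = {beta:>5}: assess ~{optimal_depth(beta, alpha=0.58):.1f} docs/query")
# As beta rises, the optimal strategy shifts to deeper assessment and fewer queries.
```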

Page 37: Azzopardi2012economics of iir_tech_talk

• Knowing how benefit, interaction and cost relate can help guide how we design systems – We can theorize about how changes to the system

will affect the user’s interaction• Is this desirable? Do we want the user to query more? Or

for them to assess more?

– We can categorize the type of user• Is this a savvy rational user? Or is this a user behaving

irrationally?

– We can scrutinize the introduce of new features• Are they going to be of any use? Are they worth it for the

user? i.e. how much more performance, or how little must they cost?

Implications for Design

Page 38

Future Directions
• Validate the theory by conducting observational & empirical research. Do the predictions about user behavior hold?
• Incorporate other inputs into the model: Find Similar, Relevance Feedback, Browsing, Query Length, Query Type, etc.
• Develop more accurate cost functions; obtain better estimates of costs.
• Model other search tasks.

Page 39

Contact Details

Email: [email protected]

Skype: Leifos

Twitter: @leifos

Questions

Page 40

Selected References
• Varian, H., Intermediate Microeconomics, 1987.
• Varian, H., Economics and Search, ACM SIGIR Forum, 1999.
• Pirolli, P., Information Foraging Theory, 1999.
• Belkin, N., Some(what) Grand Challenges of Interactive Information Retrieval, ACM SIGIR Forum, 2008.
• Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009. http://dl.acm.org/citation.cfm?doid=1571941.1572037
• Azzopardi, L., The Economics of Interactive Information Retrieval, ACM SIGIR 2011. http://dl.acm.org/citation.cfm?doid=2009916.2009923
• Jarvelin, K., IR Research: Systems, Interaction, Evaluation and Theories, ACM SIGIR Forum, 2011.

Page 41

Example: Search Production Function

[Plot: Interaction X vs. Interaction Y, with G = F(X, Y).]

Page 42

Example Application for Web Search: Search Production Function

[Plot: Length of Query (L) vs. No. of Assessments (A), with isoquants at P@10 = 0.1, 0.2 and 0.3.]

P@10 = F(L, A)