learning by example: training users through high-quality query suggestions

Learning by Example: training users through high-quality query suggestions (SIGIR’15) A collaboration with Morgan Harvey & David Elsweiler. Claudia Hauff Web Information Systems

Upload: claudia-hauff

Post on 22-Jan-2018


Page 1: Learning by example: training users through high-quality query suggestions

Learning by Example: training users through high-quality query suggestions (SIGIR’15)

A collaboration with Morgan Harvey & David Elsweiler.

Claudia Hauff, Web Information Systems

Page 2: Learning by example: training users through high-quality query suggestions

[Figure: DuckDuckGo traffic over time, Sep 2012 to Jan 2016; y-axis from 0 to 350,000,000.]

Data available at https://duckduckgo.com/traffic.html

NSA collecting phone records of millions of Verizon customers daily. The Guardian. June 6, 2013.

Not everyone stays around.

Page 3: Learning by example: training users through high-quality query suggestions

I do care about privacy … until the moment my searches fail me.

@flickr:eviloars

Can we teach searchers to use an arbitrary search engine as best as possible?

Page 4: Learning by example: training users through high-quality query suggestions

@flickr:practicalowl

Advanced retrieval algorithms; queries as a given.

Assisting users in creating better queries.

query suggestions, related searches, query autocompletion

Personalised & context-driven search.

Educate users to become better searchers.

complementary to technical solutions; system specific

Page 5: Learning by example: training users through high-quality query suggestions

• Altering the size [Franzen & Karlgren, 2000] and wording [Belkin et al., 2003] of the search box influences the length of submitted queries

• Exchanging a complex multi-field catalogue interface for a simple search box radically alters user behaviour [McKay & Buchanan, 2013]

• Training users how to construct boolean logic queries can change search behaviour [Lucas & Topi, 2004]

• Allowing users to compare their search behaviour to expert searchers enables them to reflect and change their habits [Bateman et al., 2012]


Behaviour change support systems

“… information systems designed to form, alter, or reinforce attitudes or behaviours or both without using coercion or deception” [Oinas-Kukkonen & Harjumaa, 2008]

Page 6: Learning by example: training users through high-quality query suggestions

We created zing

Page 7: Learning by example: training users through high-quality query suggestions

Our questions

Are users able to notice differences between good queries and their own? Can they abstract these differences to change their own behaviour?

How effectively can users learn and abstract from good queries? Do users who are “trained” perform better than users who did not receive training?

@flickr:eviloars

Page 8: Learning by example: training users through high-quality query suggestions

Our hypotheses

@flickr:carbonnyc

H1: Users can adapt their querying behaviour to pose good queries to an unfamiliar search system.

H2: Users are able to identify salient characteristics of good queries.

H3: A small number of “training queries” is sufficient.

H4: A user who receives training with queries he can relate to learns better than a user who receives training with less-relatable queries.

H5: A user who receives training with queries he can relate to learns faster than a user who receives training with less-relatable queries.

Page 9: Learning by example: training users through high-quality query suggestions

A collection of user studies

Generating training queries

User perception of high-quality queries

Piloting zing

Main study: zing

Training size study

All studies are based on AQUAINT and the TREC 2005 Robust track topics.

Page 10: Learning by example: training users through high-quality query suggestions

Generating high-quality queries I

• Query quality is measured in Average Precision
• The queries should intuitively make sense to humans (instead of relying on quirks in documents)
• The queries should not be overly verbose or specific
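Since query quality is measured in Average Precision (AP), a minimal sketch of that measure may help. The function name and the document ids are illustrative assumptions, not taken from the slides; it assumes a ranking without duplicate documents and binary relevance judgments.

```python
from typing import Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average Precision of one ranked result list.

    ranking:  document ids in the order the system returned them
              (assumed to contain no duplicates).
    relevant: ids of all documents judged relevant for the topic.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)
```

For a ranking d1, d2, d3, d4 with d1 and d3 relevant, the precision values at the relevant ranks are 1/1 and 2/3, so AP is their mean, 5/6.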

Page 11: Learning by example: training users through high-quality query suggestions

Generating high-quality queries II

For each TREC topic: relevant documents (AQUAINT) → 100 single-term queries

Hand-crafted filtering rules to avoid unintuitive term selection.

Page 12: Learning by example: training users through high-quality query suggestions

Generating high-quality queries II

For each TREC topic: relevant documents (AQUAINT) → AP-based query ranking → top two-term queries

Hand-crafted filtering rules to avoid unintuitive term selection.

Page 13: Learning by example: training users through high-quality query suggestions

Generating high-quality queries II

For each TREC topic: relevant documents (AQUAINT) → AP-based query ranking (3x) → top 100 queries up to length 4

Hand-crafted filtering rules to avoid unintuitive term selection.
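The slides give this pipeline only in outline, so the following is a rough sketch under stated assumptions rather than the authors' procedure. It assumes a beam-style expansion: rank single-term queries by AP, keep the best ones, extend each by one term, re-rank, and repeat up to length 4. The names `generate_queries`, `score` and `keep` are hypothetical; `score` stands in for running a query against the AQUAINT index and computing its AP, and `keep` stands in for the hand-crafted filtering rules, which the slides do not spell out.

```python
from typing import Callable, Iterable, List, Tuple

Query = Tuple[str, ...]


def generate_queries(
    candidate_terms: Iterable[str],
    score: Callable[[Query], float],           # hypothetical: AP of a query
    keep: Callable[[str], bool] = lambda term: True,  # hypothetical filter rules
    beam_size: int = 100,
    max_length: int = 4,
) -> List[Tuple[Query, float]]:
    """Return up to `beam_size` (query, score) pairs, best first."""
    terms = sorted({t for t in candidate_terms if keep(t)})
    # Round 1: rank all single-term queries and keep the best ones.
    beam = sorted(((score((t,)), (t,)) for t in terms), reverse=True)[:beam_size]
    pool = list(beam)
    # Later rounds: extend every surviving query by one unused term.
    for _ in range(max_length - 1):
        seen = set()
        extended = []
        for _, query in beam:
            for t in terms:
                if t in query:
                    continue
                new_query = query + (t,)
                key = frozenset(new_query)  # ignore term order when deduplicating
                if key in seen:
                    continue
                seen.add(key)
                extended.append((score(new_query), new_query))
        beam = sorted(extended, reverse=True)[:beam_size]
        pool.extend(beam)
    pool.sort(reverse=True)
    return [(query, s) for s, query in pool[:beam_size]]
```

Keeping queries of every length in the final pool is a design choice of this sketch, so that a short query can outrank a longer one when its score is higher.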

Page 14: Learning by example: training users through high-quality query suggestions

Generating high-quality queries III

Identify positive accomplishments of the Hubble telescope since it was launched in 1991. (303)
* universe astronomer faint hubble
* infrared galaxies universe hubble
* infrared stars universe hubble

Identify drugs used in the treatment of mental illness. (383)
* antidepressant risk zoloft prozac
* zoloft studies prozac
* antidepressant effective zoloft

What is the status of The Three Gorges Project? (416)
* cofferdams damming generating 2009
* dam corporation phase 2009
* 2009 river construction

Median AP across the 100 generated queries: 0.38

Page 15: Learning by example: training users through high-quality query suggestions

A collection of user studies

Generating training queries

User perception of high-quality queries

Piloting zing

Main study: zing

Training size study

Page 16: Learning by example: training users through high-quality query suggestions

User perception I

Task: You are given an information need and a query suggestion that has been derived for this information need. Rate the suggestion along four dimensions: knowledge, surprise, usage and relevance.

Information need: Identify positive accomplishments of the Hubble telescope since it was launched in 1991.

Query suggestion: universe astronomer faint hubble

Top 15 queries per topic. HIT: 10 tasks, 12 cents. 3 workers per task.

Page 17: Learning by example: training users through high-quality query suggestions

User perception II

[Figure: three histograms of number of ratings per rating (1 to 5). Panels: “How surprised were you?” (Not to Very), “Would you use the suggestion?” (No to Yes), “What will the quality of the search results be?” (Low to High).]

Page 18: Learning by example: training users through high-quality query suggestions

User perception II

[Figure: three histograms of number of ratings per rating (1 to 5). Panels: “How surprised were you?” (Not to Very), “Would you use the suggestion?” (No to Yes), “What will the quality of the search results be?” (Low to High).]

Indicates that our query generation approach is valid.

Many of our suggestions are not very convincing.

Expected search result quality is mostly average.

Page 19: Learning by example: training users through high-quality query suggestions

User perception III

• Familiar topics tend to be of broad interest
• Topics covering specific themes attract low knowledge ratings

What factors contributed to the growth of consumer on-line shopping? (639) 3.0/5

Identify drugs used in the treatment of mental illness. (383) 2.89/5

What is the status of The Three Gorges Project? (416) 1.58/5
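Per-topic scores like those above are presumably averages over the individual worker ratings; the slides do not state the aggregation, so the sketch below simply assumes an arithmetic mean. The function name and the example ratings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_rating_per_topic(ratings: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Average the 1-to-5 ratings of all workers for each topic.

    ratings: (topic_id, rating) pairs, one per worker judgment.
    """
    by_topic = defaultdict(list)
    for topic_id, rating in ratings:
        by_topic[topic_id].append(rating)
    return {topic_id: mean(values) for topic_id, values in by_topic.items()}


# Illustrative data only, not the study's ratings.
example = [(416, 1), (416, 2), (639, 3), (639, 3)]
print(mean_rating_per_topic(example))
```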

Page 20: Learning by example: training users through high-quality query suggestions

A collection of user studies

Generating training queries

User perception of high-quality queries

Piloting zing

Main study: zing

Training size study

Page 21: Learning by example: training users through high-quality query suggestions

A closer look at zing

How well am I doing?

Suggestions (higher AP than user queries) after 2 initial queries.

Relevant documents are marked by the system
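The suggestion rule described above can be sketched as follows. This is an illustration under assumptions, not zing's actual code: it assumes "higher AP than user queries" means higher than the user's best query so far on the topic, and the cap of three suggestions (`max_suggestions`) is a hypothetical parameter.

```python
from typing import List, Sequence, Tuple


def suggestions_to_show(
    user_query_aps: Sequence[float],
    generated: Sequence[Tuple[str, float]],
    min_user_queries: int = 2,   # from the slide: shown after 2 initial queries
    max_suggestions: int = 3,    # hypothetical cap, not stated in the slides
) -> List[str]:
    """Pick generated queries that beat the user's own queries on this topic.

    user_query_aps: AP of each query the user has submitted so far.
    generated:      (query text, AP) pairs from the generated pool.
    """
    if len(user_query_aps) < min_user_queries:
        return []  # the user has to try on their own first
    best_user_ap = max(user_query_aps)  # assumption: compare against the best
    better = [(q, ap) for q, ap in generated if ap > best_user_ap]
    better.sort(key=lambda pair: pair[1], reverse=True)
    return [q for q, _ in better[:max_suggestions]]
```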

Page 22: Learning by example: training users through high-quality query suggestions

Piloting

• N=22 undergraduates
• 10 medium difficulty topics
• Randomized topic order
• Reflection prompts

When does fatigue set in? By topic 7, median AP≈0

Query characteristics (81 reflections encoded)

C1: Specific query terms
C2: More general query terms
C3: Queries not in topic description
C4: Unexpected or surprising vocab.
C5: Surprising non-use of vocab.
C6: Terms the user was surprised at the usefulness of
C7: Thinking creatively
C8: Advanced vocabulary (rare)
C9: Specialist vocabulary (rare)
C10: Good combination of search terms
C11: Synonyms and related concepts
C12: Query requires specialist knowledge

Users are able to identify salient characteristics of good queries.

Page 23: Learning by example: training users through high-quality query suggestions

A collection of user studies

Generating training queries

User perception of high-quality queries

Piloting zing

Main study: zing

Training size study

Page 24: Learning by example: training users through high-quality query suggestions

Main study

• Between-group design, N=91
• 6 medium difficulty topics
• Randomized topic order
• Training & test phase

Group Gexp_high: Trained on high-quality suggestions that were also perceived as high quality.

Group Gexp_low: Trained on high-quality suggestions that were perceived as low quality.

Group Gcontrol: No training at any stage.

[Diagram: the experimental groups work on 4 topics with suggestions and 2 topics without; the control group works on 6 topics without suggestions.]

Page 25: Learning by example: training users through high-quality query suggestions

Main study: query effectiveness

[Figure: two panels, “Training topics” and “Test topics”.]

Users who receive high-quality training suggestions perform better on average & achieve considerably higher max. AP scores.

Page 26: Learning by example: training users through high-quality query suggestions

Main study: query sequence effectiveness

[Figure: Average Precision (0 to 0.4) against query sequence (1 to 10) for Control, Exp_High and Exp_Low.]

Average precision over sequences of queries on test topics. Each point represents the mean AP of all queries submitted as nth query.

Gexp_high & Gexp_low significantly outperform Gcontrol. No significant differences observed between Gexp_high & Gexp_low.
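Each point on the curve described above is the mean AP of all queries submitted as the nth query. A minimal sketch of that aggregation for one group follows; the function name and the input layout (one list of AP values per user and topic, in submission order) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Sequence


def mean_ap_by_position(
    sessions: Iterable[Sequence[float]], max_position: int = 10
) -> Dict[int, float]:
    """Mean AP of the 1st, 2nd, ... nth query across all sessions.

    sessions: one sequence of AP values per (user, topic) pair,
              ordered by the time the queries were submitted.
    """
    by_position = defaultdict(list)
    for aps in sessions:
        for position, ap in enumerate(aps[:max_position], start=1):
            by_position[position].append(ap)
    # Positions that nobody reached are simply absent from the result.
    return {position: mean(values) for position, values in sorted(by_position.items())}
```

Later positions are averaged over fewer sessions, since not every user submits ten queries per topic, so the right-hand end of such a curve tends to be noisier.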

Page 27: Learning by example: training users through high-quality query suggestions

A collection of user studies

Generating training queries

User perception of high-quality queries

Piloting zing

Main study: zing

Training size study

Page 28: Learning by example: training users through high-quality query suggestions

Training size study

• Between-group design, N=57
• Analogous setup to Main study

[Figure: two panels of Average Precision (0 to 0.4) against query sequence (1 to 10) for Control, Exp_High and Exp_Low. One panel shows the main study (4 training & 2 test topics), the other the training size study (now: 2 training & 4 test topics).]

Less training yields fewer (but still statistically significant) improvements. Similarity between Gexp_high & Gexp_low remains stable.

Page 29: Learning by example: training users through high-quality query suggestions

Looking back at our hypotheses

@flickr:carbonnyc

H1: Users can adapt their querying behaviour to pose good queries to an unfamiliar search system.

H2: Users are able to identify salient characteristics of good queries.

H3: A small number of “training queries” is sufficient.

H4: A user who receives training with queries he can relate to learns better than a user who receives training with less-relatable queries.

H5: A user who receives training with queries he can relate to learns faster than a user who receives training with less-relatable queries.

Page 30: Learning by example: training users through high-quality query suggestions

Looking ahead

• Learning is limited to a single session
  • Does the learning effect hold across sessions and over time?
• How to translate this approach (requiring qrels) into settings where users are unwilling to train?
  • Are implicit relevance indicators sufficient?
• What is the most efficient manner of presenting such “learning queries” to users?

@flickr:

Page 31: Learning by example: training users through high-quality query suggestions

Ideas, comments & suggestions are more than welcome!

Thank you.

[email protected]