predicting user satisfaction with intelligent assistants

Predicting User Satisfaction with Intelligent Assistants. Julia Kiseleva, Kyle Williams, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos. Eindhoven University of Technology, Pennsylvania State University, Microsoft. SIGIR’16, Pisa, Italy

Upload: julia-kiseleva

Post on 11-Jan-2017


TRANSCRIPT

Page 1: Predicting User Satisfaction with Intelligent Assistants

Predicting User Satisfaction with Intelligent Assistants

Julia Kiseleva, Kyle Williams, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos

Eindhoven University of Technology, Pennsylvania State University, Microsoft

SIGIR’16, Pisa, Italy

Page 2: Predicting User Satisfaction with Intelligent Assistants

From Queries to Dialogues

User’s dialogue with Cortana: Task is “Finding a hotel in Chicago”

Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one

Page 3: Predicting User Satisfaction with Intelligent Assistants

From Queries to Dialogues

User’s dialogue with Cortana: Task is “Finding a pharmacy”

Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks

Page 4: Predicting User Satisfaction with Intelligent Assistants

From Queries to Dialogues

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”

User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”

User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”

Page 5: Predicting User Satisfaction with Intelligent Assistants

Main Research Question

How can we automatically predict user satisfaction with search dialogues on intelligent assistants using click, touch, and voice interactions?

Page 6: Predicting User Satisfaction with Intelligent Assistants

Single Task Search Dialogue

User: “Do I need to have a jacket tomorrow?”
Cortana: “You could probably go without one. The forecast shows …”

Page 7: Predicting User Satisfaction with Intelligent Assistants

Multi-Task Search Dialogues

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”

User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”

User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”

Page 8: Predicting User Satisfaction with Intelligent Assistants

How to define user satisfaction with search dialogues?

Page 9: Predicting User Satisfaction with Intelligent Assistants

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”

User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”

User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”

No clicks???

Page 10: Predicting User Satisfaction with Intelligent Assistants

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
SAT?

User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
SAT?

User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”
SAT?

Overall SAT?

Page 11: Predicting User Satisfaction with Intelligent Assistants

User Frustration

Dialog with Intelligent Assistant
Task is “Planning a weekend”

Q1: what's the weather like in San Francisco
Q2: what's the weather like in Mountain View
Q3: can you find me a hotel close to Mountain View
Q4: can you show me the cheapest ones
Q5: show me the third one
Q6: show me the directions from SFO to this hotel
Q7: go back to first hotel (misrecognition)
Q8: show me hotels in Mountain View
Q9: show me cheap hotels in Mountain View
Q10: show me more about the third one

Annotations along the dialogue: intelligent assistant lost context; restart search; a user is satisfied

Page 12: Predicting User Satisfaction with Intelligent Assistants

What interaction signals can we track during search dialogues?

Page 13: Predicting User Satisfaction with Intelligent Assistants

Tracking User Interaction: Click Signals

• Number of queries in a dialogue

• Number of clicks in a dialogue

• Number of SAT clicks (> 30 sec. dwell time) in a dialogue

• Number of DSAT clicks (< 15 sec. dwell time) in a dialogue

• Time (seconds) until the first click in a dialogue
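The click signals listed above can be computed from a simple per-dialogue click log. The following is a minimal sketch, not the authors' implementation: the `Click` record, its field names, and the `click_features` helper are hypothetical, and only the 30-second and 15-second dwell thresholds come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SAT_DWELL_SECONDS = 30.0   # slide: SAT click has > 30 sec. dwell time
DSAT_DWELL_SECONDS = 15.0  # slide: DSAT click has < 15 sec. dwell time


@dataclass
class Click:
    """Hypothetical log record for one click in a dialogue."""
    time: float   # seconds since the start of the dialogue
    dwell: float  # seconds spent on the clicked result


def click_features(num_queries: int, clicks: List[Click]) -> Dict[str, Optional[float]]:
    """Compute the five click signals for one dialogue."""
    first_click = min((c.time for c in clicks), default=None)
    return {
        "num_queries": num_queries,
        "num_clicks": len(clicks),
        "num_sat_clicks": sum(c.dwell > SAT_DWELL_SECONDS for c in clicks),
        "num_dsat_clicks": sum(c.dwell < DSAT_DWELL_SECONDS for c in clicks),
        # None when the dialogue has no clicks at all
        "time_to_first_click": first_click,
    }


example = click_features(3, [Click(12.0, 45.0), Click(40.0, 5.0), Click(70.0, 20.0)])
```

Clicks with a dwell time between 15 and 30 seconds count toward neither the SAT nor the DSAT total, which follows directly from the two thresholds on the slide.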

Page 14: Predicting User Satisfaction with Intelligent Assistants

Tracking User Interaction: Acoustic Signals

Phonetic Similarity between consecutive requests
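The slide does not say which phonetic measure is used. As an illustration only, the sketch below swaps in Soundex codes per word and a Jaccard overlap between the code sets of two consecutive requests; both choices, and the function names `soundex` and `phonetic_similarity`, are assumptions rather than the method from the talk.

```python
# Soundex digit for each consonant; vowels, h, w, and y carry no digit.
_SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != previous:
            encoded += digit
        if ch not in "hw":  # h and w do not separate equal digits
            previous = digit
    return (encoded + "000")[:4]


def phonetic_similarity(request_a: str, request_b: str) -> float:
    """Jaccard overlap of the Soundex codes of two requests (0.0 to 1.0)."""
    codes_a = {soundex(w) for w in request_a.split()} - {""}
    codes_b = {soundex(w) for w in request_b.split()} - {""}
    if not codes_a and not codes_b:
        return 0.0
    return len(codes_a & codes_b) / len(codes_a | codes_b)


similarity = phonetic_similarity(
    "show me hotels in Mountain View",
    "show me cheap hotels in Mountain View",
)
```

A high value between consecutive requests suggests the user is repeating or rephrasing the same request, which is one plausible reading of why this signal is tracked.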

Page 15: Predicting User Satisfaction with Intelligent Assistants

Tracking User Interaction

Page 16: Predicting User Satisfaction with Intelligent Assistants

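Tracking User Interaction

Viewing time is attributed to each answer in proportion to the share of the viewport height it occupies while on screen:

• Shown for 3 seconds at 33% of the viewport: 1 s
• Shown for 6 seconds at 66% of the viewport: 4 s
• Shown for 2 seconds at 20% of the viewport: 0.4 s
• Total attributed time: 1 s + 4 s + 0.4 s = 5.4 s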
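The viewport figures on this slide (3 seconds at 33%, 6 seconds at 66%, 2 seconds at 20%, summing to 5.4 s) are consistent with weighting each on-screen interval by the fraction of the viewport the element occupies. A minimal sketch under that reading; the function name and the input shape are hypothetical, and the slide's 33% and 66% are treated as one third and two thirds.

```python
from typing import Iterable, Tuple


def attributed_time(views: Iterable[Tuple[float, float]]) -> float:
    """Sum of seconds on screen weighted by the fraction of the viewport occupied.

    Each item is (seconds_on_screen, fraction_of_viewport), with the
    fraction between 0.0 and 1.0.
    """
    return sum(seconds * fraction for seconds, fraction in views)


# Values from the slide: 3 s at ~33%, 6 s at ~66%, 2 s at 20%.
total = attributed_time([(3.0, 1 / 3), (6.0, 2 / 3), (2.0, 0.20)])
```

With these inputs the three terms are 1 s, 4 s, and 0.4 s, matching the sum shown on the slide.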

Page 17: Predicting User Satisfaction with Intelligent Assistants

Tracking User Interaction: Touch Signals

• Number of swipes
• Number of up-swipes
• Number of down-swipes
• Total distance swiped (pixels)
• Number of swipes normalized by time
• Total distance divided by number of swipes
• Total swiped distance divided by time
• Number of swipe direction changes
• Duration (seconds) for which a SERP answer is shown on screen (even partially)
• Fraction of visible pixels belonging to a SERP answer
• Attributed time (seconds) to viewing a particular element (answer) on the SERP
• Attributed time (seconds) per unit height (pixels) associated with a particular element on the SERP
• Attributed time (milliseconds) per unit area (square pixels) associated with a particular element on the SERP
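The swipe-based touch signals can be derived from a sequence of swipe events. This is a rough sketch under stated assumptions: the `Swipe` record, the sign convention (positive `delta_y` means an up-swipe), and the `swipe_features` helper are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Swipe:
    """Hypothetical swipe event on a SERP."""
    time: float     # seconds since the SERP was shown
    delta_y: float  # signed pixels; positive = up-swipe (assumed convention)


def swipe_features(swipes: List[Swipe], duration: float) -> Dict[str, float]:
    """Compute the swipe signals for one SERP view lasting `duration` seconds."""
    count = len(swipes)
    distance = sum(abs(s.delta_y) for s in swipes)
    # Direction of each non-zero swipe, in order, to count reversals.
    directions = [s.delta_y > 0 for s in swipes if s.delta_y != 0]
    changes = sum(a != b for a, b in zip(directions, directions[1:]))
    return {
        "num_swipes": count,
        "num_up_swipes": sum(s.delta_y > 0 for s in swipes),
        "num_down_swipes": sum(s.delta_y < 0 for s in swipes),
        "total_distance": distance,
        "swipes_per_second": count / duration if duration > 0 else 0.0,
        "distance_per_swipe": distance / count if count else 0.0,
        "distance_per_second": distance / duration if duration > 0 else 0.0,
        "direction_changes": changes,
    }


example = swipe_features(
    [Swipe(1.0, 300.0), Swipe(2.0, 200.0), Swipe(4.0, -100.0), Swipe(6.0, 400.0)],
    duration=10.0,
)
```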

Page 18: Predicting User Satisfaction with Intelligent Assistants

How to collect data?

Page 19: Predicting User Satisfaction with Intelligent Assistants

User Study Participants

• 60 participants
• Age: 25.53 +/- 5.42 years
• Gender: 75% male, 25% female
• Language: 55% English, 45% other
• Education: 82% Computer Science, 8% Electrical Engineering, 2% Mathematics, 8% other

Page 20: Predicting User Satisfaction with Intelligent Assistants

You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.


Page 22: Predicting User Satisfaction with Intelligent Assistants

Questionnaire

• Were you able to complete the task?
  o Yes/No
• How satisfied are you with your experience in this task?
  o If the task has sub-tasks, participants indicate their graded satisfaction with each, e.g.
    a. How satisfied are you with your experience in finding a hotel?
    b. How satisfied are you with your experience in finding directions?
• How well did Cortana recognize what you said?
  o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
  o 5-point Likert scale

Page 23: Predicting User Satisfaction with Intelligent Assistants

Questionnaire

• 8 tasks: 1 simple, 4 with 2 subtasks, 3 with 3 subtasks
• ~30 minutes

Page 24: Predicting User Satisfaction with Intelligent Assistants

Search Dialog Dataset

• Total number of queries: 2,040
• Number of unique queries: 1,969
• Average query length: 7.07

Page 25: Predicting User Satisfaction with Intelligent Assistants

Search Dialog Dataset

• The simple task generated 130 queries
• Tasks with 2 context switches generated 685 queries
• Tasks with 3 context switches generated 1,355 queries

Page 26: Predicting User Satisfaction with Intelligent Assistants

How can we predict user satisfaction

with search dialogues using interaction signals?

Page 27: Predicting User Satisfaction with Intelligent Assistants

User’s dialogue about the ‘stomach ache’

Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter

General Web SERP

Page 28: Predicting User Satisfaction with Intelligent Assistants

User’s dialogue about the ‘stomach ache’

Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter
General Web SERP

Q3: show me the nearest pharmacy
Q4: more information on the second one
Structured SERP

Page 29: Predicting User Satisfaction with Intelligent Assistants

General Web and Structured SERP

Page 30: Predicting User Satisfaction with Intelligent Assistants

General Web and Structured SERP

Page 31: Predicting User Satisfaction with Intelligent Assistants

Aggregating Touch Interactions

1. I( )

Page 32: Predicting User Satisfaction with Intelligent Assistants

Aggregating Touch Interactions

1. I( )
2. I( , )

Page 33: Predicting User Satisfaction with Intelligent Assistants

Aggregating Touch Interactions

1. I( )
2. I( , )
3. I( ), I( )

Page 34: Predicting User Satisfaction with Intelligent Assistants

Quality of Interaction Model

Method                 Accuracy (%)       Average F1 (%)
Baseline               70.62              61.38
Interaction Model 1    78.78* (+11.55)    83.59* (+35.90)
Interaction Model 2    80.21* (+13.58)    83.31* (+35.44)
Interaction Model 3    80.81* (+14.43)    79.08* (+28.83)

* Statistically significant improvement (p < 0.05)
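For readers who want to reproduce this kind of comparison, the sketch below computes accuracy, a class-averaged F1, and the relative gain over a baseline. It assumes that "Average F1" means the unweighted mean of per-class F1 over the satisfaction labels and that the bracketed accuracy figures are relative improvements in percent; neither is stated on the slide, and the "SAT"/"DSAT" label names are illustrative.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of dialogues whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed meaning of "Average F1")."""
    labels: List[str] = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, l) for l in labels) / len(labels)


def relative_gain(score: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (score - baseline) / baseline * 100.0


# Accuracy of Interaction Model 1 versus the baseline, from the table.
gain = round(relative_gain(78.78, 70.62), 2)
```

Applied to the accuracy column, `relative_gain` gives 11.55, 13.58, and 14.43 for the three interaction models, which agrees with the bracketed accuracy values in the table.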

Page 35: Predicting User Satisfaction with Intelligent Assistants

Which interaction signals have the highest impact on predicting user satisfaction with search dialogues?

Page 36: Predicting User Satisfaction with Intelligent Assistants

Predicting User Satisfaction

• F1: Because the SERP for a query is ordered by a measure of relevance as determined by the system, additional exploration is unlikely to achieve user satisfaction; it is more likely an indication that the best-provided results (i.e. the top of the SERP) are insufficient to address the user intent

Page 37: Predicting User Satisfaction with Intelligent Assistants

Predicting User Satisfaction

• F2: In the converse case of F1, when users find content that satisfies their intent, their likelihood of scrolling is reduced, and they dwell for an extended period on the top viewport

Page 38: Predicting User Satisfaction with Intelligent Assistants

Predicting User Satisfaction

• F3: When users are involved in a complex task, they are dissatisfied when redirected to a general web SERP. Unlike F2, the absence of scrolling on this landing page is an indication of dissatisfaction

Page 39: Predicting User Satisfaction with Intelligent Assistants

Conclusion

How can we define user satisfaction with search dialogues?
• User satisfaction with search dialogues is defined in a generalized form: as an aggregation of satisfaction with all of a dialogue’s tasks, not as satisfaction with each of the dialogue’s queries separately

How can we predict user satisfaction with search dialogues using interaction signals?
• We showed that features derived from voice interactions, and especially from touch and voice interactions together, add a significant gain in accuracy over the baseline

Which interaction signals have the highest impact on predicting user satisfaction with search dialogues?
• Our analysis showed a strong negative correlation between user satisfaction and swipe actions

Page 40: Predicting User Satisfaction with Intelligent Assistants


Thank you!

Questions?