how emotional are users' needs? emotion in query logs

How Emotional are Users’ Needs? Exploring Emotion in Query Logs. Marina Santini, 29 Jan 2013. CyberEmotions2013, Warsaw University of Technology, 29-30 Jan 2013.

Upload: marina-santini

Post on 28-Jan-2015

Category:

Technology


DESCRIPTION

Emotional behaviour seems to be ubiquitous on the web. Predictably, social media web genres such as tweets, blog posts and blog comments show high emotional involvement. What about other genres on the web? In this talk, the focus is on the search query log genre. According to recent IR research, searchers’ behaviour is not only limited to traditional informational, navigational and transactional needs. A novel hypothesis is that the seeking behaviour is driven by emotion. But can emotion be detected by analysing the queries typed by users in a search box? In this talk, I will present the results of some experiments carried out to investigate whether it is possible to identify emotion in the query log genre, and discuss how emotion could be utilized to improve the relevance of retrieved documents in searches. These experiments are part of SearchInFocus, a study centred on search.

TRANSCRIPT

Page 1: How Emotional Are Users' Needs? Emotion in Query Logs

How Emotional are Users’ Needs?
Exploring Emotion in Query Logs

Marina Santini
29 Jan 2013

Marina Santini - CyberEmotions2013 Warsaw University of Technology 29-30 Jan 2013

Page 2: How Emotional Are Users' Needs? Emotion in Query Logs

Outline
• Inspirational Triggers:
  o The Big Unstructured Textual Data Issue
  o Emotion in IR
  o Research hypothesis
• Genre- and Emotion-Profiling of Query Logs
  o Characterization of genre
  o Definition of emotion
  o Benefits of genre and emotion awareness in query log analysis
• Experiments
  o Query Logs from GenitoriCrescono thematic blog (in Italian)
  o Query Logs from Västra Götalands Region (in Swedish)
• Conclusions

Page 3: How Emotional Are Users' Needs? Emotion in Query Logs

Inspirational Trigger 1: BIG UNSTRUCTURED TEXTUAL DATA

Page 4: How Emotional Are Users' Needs? Emotion in Query Logs

Big Unstructured Textual Data

“Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data –commonly appearing in e‐mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations and web pages.” [DM Review Magazine, February 2003 Issue]

ECONOMIC LOSS!
Lots of different genres!

Page 5: How Emotional Are Users' Needs? Emotion in Query Logs

Simple search is not enough…

• Of course, it is possible to use simple search. But simple search is unrewarding, because it is based on single terms.
  o “a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies” [Source: Inmon, B. & A. Nesavich, "Unstructured Textual Data in the Organization" from "Managing Unstructured data in the organization", Prentice Hall 2008, pp. 1–13]
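
The contrast drawn in the quotation can be sketched in a few lines of Python. This is only an illustration: the `FELONY_TYPES` table and the sample documents are made up here and are not taken from Inmon and Nesavich. A simple search matches the literal term only, while a search that knows which terms are kinds of felony also reaches the related documents.

```python
# Hypothetical mini-taxonomy built from the terms listed in the quotation.
FELONY_TYPES = {
    "felony": ["crime", "arson", "murder", "embezzlement", "vehicular homicide"],
}


def simple_search(term, documents):
    """Return the documents that contain the literal term (case-insensitive)."""
    term = term.lower()
    return [doc for doc in documents if term in doc.lower()]


def expanded_search(term, documents, taxonomy=FELONY_TYPES):
    """Return the documents that contain the term or any narrower term."""
    terms = [term.lower()] + [t.lower() for t in taxonomy.get(term.lower(), [])]
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Made-up documents, for illustration only.
documents = [
    "The suspect was charged with a felony.",
    "Police are investigating an arson attack.",
    "The memo concerns an embezzlement case.",
    "Quarterly marketing report.",
]

simple_hits = simple_search("felony", documents)
expanded_hits = expanded_search("felony", documents)
print(len(simple_hits), len(expanded_hits))
```

On this toy data the simple search misses the arson and embezzlement documents, which is the limitation the quotation describes.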

Page 6: How Emotional Are Users' Needs? Emotion in Query Logs

Text Analytics
• A set of NLP techniques that provide some structure to textual documents.
• Common components:
  o Tokenization
  o Morphological Analysis
  o Syntactic Analysis
  o Named Entity Recognition
  o Sentiment Analysis
  o Automatic Summarization
  o Etc.

Page 7: How Emotional Are Users' Needs? Emotion in Query Logs

Text Analytics Products and Frameworks

• Commercial:
  o Attensity
  o Clarabridge
  o Temis
  o Lexalytics
  o Texify
  o SAS
  o IBM Cognos
  o etc.
• Open Source:
  o GATE
  o NLTK
  o UIMA
  o etc.

Business Intelligence (BI)
Customer Experience Management (CEM)

Page 8: How Emotional Are Users' Needs? Emotion in Query Logs

Actionable Intelligence

• Business Intelligence (BI) + Customer Experience Management (CEM) = Actionable Intelligence

• Actionable Intelligence is information that:
  1. must be accurate and verifiable
  2. must be timely
  3. must be comprehensive
  4. must be comprehensible
  5. must give the power to make decisions and to act straightaway

Page 9: How Emotional Are Users' Needs? Emotion in Query Logs

Today…

In 2003, Merrill Lynch pointed out that it was too difficult to automatically extract usable intelligence from the following genres:
  o e‐mails
  o memos
  o notes from call centers and support operations
  o news
  o user groups
  o chats
  o reports
  o letters
  o surveys
  o white papers
  o marketing material
  o research
  o presentations
  o web pages

Previous genres plus:
• Blogs
• Tweets
• FB microposts
• FB comments
• Many other social network textual ”interactions”

Page 10: How Emotional Are Users' Needs? Emotion in Query Logs

From Big Data to Query Logs

Current State of Affairs:
1. Big Unstructured Textual Data
2. Text Analytics (commercial products and frameworks)
3. Structured information for BI and CEM

Viable Alternative:
• Query Logs
• Genre- & Context-aware Text Analytics
• Actionable Information (BI, CEM, sentiment, emerging topics…)

The main advantage of using query logs (when they are available) instead of other genres consists in REDUCED DATA SIZE, REDUCED PRE-PROCESSING, REDUCED NOISE, REDUCED DATA CLEANING!

Typical Use Case
A company managing:
• Website
• Blog
• eMails
• Facebook Page
• Twitter account

Page 11: How Emotional Are Users' Needs? Emotion in Query Logs

Query Logs provide Actionable Intelligence for:
- search providers
- clients
- end-users

SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence

Exploratory Query-log Analysis Workshop
Organized by Findwise AB – Sweden
SLTC 2012

Page 12: How Emotional Are Users' Needs? Emotion in Query Logs

Inspirational Trigger 2: EMOTION IN INFORMATION RETRIEVAL (IR)

Page 13: How Emotional Are Users' Needs? Emotion in Query Logs

Role of Emotion in Information Retrieval
by Yashar Moshfeghi
PhD Thesis at University of Glasgow, 2012

Emotion in IR
o Three concepts:
  • Emotion need
  • Emotion object
  • Emotion relevance

”uncover social situations where emotion is the primary factor (i.e., source of motivation) in an IR&S process.” (from the abstract)

Page 14: How Emotional Are Users' Needs? Emotion in Query Logs

Emotion Need
• The whole IR&S behaviour is driven by an emotion need.

• An emotion need is more fundamental than an information need in the sense that if an information need exists it implies that there is an underlying emotion need to satisfy this information need.

• Emotion needs, even when they do not lead to a particular information need, can motivate searchers to use an IR system.

Page 15: How Emotional Are Users' Needs? Emotion in Query Logs

Research Hypothesis for the exploration of emotion in query logs

It is plausible that much of the IR&S behaviour is driven by an emotion need and that users’ emotions are expressed in the queries that are typed in search boxes and stored in query logs.

If this is true, emotion extraction from query logs also provides actionable intelligence, because extracted emotions can be used to improve decision making and to make future choices more grounded.

Page 16: How Emotional Are Users' Needs? Emotion in Query Logs

Research Questions
• Is it possible to extract emotion from query logs?

• If so, is it possible to use emotion from query logs for actionable intelligence?

Page 17: How Emotional Are Users' Needs? Emotion in Query Logs

Genre Profiling of Query Logs

Page 18: How Emotional Are Users' Needs? Emotion in Query Logs

What characterizes a genre?

1. Must have a name
2. Must be recognized within a community
3. Must be produced during a task
4. Must have conventions
5. Must raise expectations
6. Can change over time. It is a cultural artifact (culture here includes society, media, technology, etc.)

Page 19: How Emotional Are Users' Needs? Emotion in Query Logs

The query log genre is…

a newly acknowledged but fully emerged web genre
1. Name: in line with other digital genres (ex: web log, blog)
2. Community: internet users, IR practitioners
3. Task: to express searchers’ needs in a search engine
4. Conventions: short texts written in ”keywordese”
5. Expectations: to find information relevant to the query
6. Cultural artifact: a product of internet-based society OR a subproduct of search engines

Page 20: How Emotional Are Users' Needs? Emotion in Query Logs

The query log genre: Linguistic and Textual Conventions
• Length: short text (a query log can be seen as a corpus of very short texts, shorter than tweets, mobile text messages, chat logs, etc.)
• Sublanguage/Jargon: ”keywordese”
• Register: neutral
• Morphology: REDUCED
• Syntax: REDUCED (usually no subclauses, etc.)

Page 21: How Emotional Are Users' Needs? Emotion in Query Logs

The Query Log Genre: Benefits

• wrt discourse analysis:
  o Conceptually lean and essential jargon
    • reduced morphology
    • reduced syntax
    • short texts
    • mostly nouns and verbs

BENEFIT 1: PREDICTABLE SUBLANGUAGE

• wrt BIG UNSTRUCTURED TEXTUAL DATA

BENEFIT 2: REDUCED SIZE, REDUCED PRE-PROCESSING, LITTLE DATA CLEANING!

Page 22: How Emotional Are Users' Needs? Emotion in Query Logs

Emotion Profiling of Query Logs

Page 23: How Emotional Are Users' Needs? Emotion in Query Logs

What is emotion?
BROAD DEFINITION: ANY DEGREE OF JUDGEMENTAL EVALUATION.

LIKE SENTISTRENGTH’S SCALE: DUAL 5-POINT SYSTEM FOR POSITIVE [1; 2; 3; 4; 5] AND NEGATIVE [-1; -2; -3; -4; -5] EMOTIONS
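
A dual scale of this kind can be illustrated with a minimal Python sketch. The toy lexicon below is invented for the example, and the rule that a text receives its strongest positive and its strongest negative word score is an assumption made here; this is not SentiStrength's code or lexicon.

```python
# Hypothetical toy lexicon: positive words score 2..5, negative words -2..-5.
# These entries are illustrative only.
TOY_LEXICON = {
    "love": 3,
    "happy": 3,
    "wonderful": 4,
    "afraid": -3,
    "aggressive": -3,
    "hate": -4,
}


def dual_score(text, lexicon=TOY_LEXICON):
    """Return (positive, negative), positive in 1..5 and negative in -1..-5.

    1 and -1 mean that no positive or no negative word was found.
    """
    positive, negative = 1, -1
    for word in text.lower().split():
        strength = lexicon.get(word, 0)
        if strength > 0:
            positive = max(positive, strength)
        elif strength < 0:
            negative = min(negative, strength)
    return positive, negative


print(dual_score("aggressive children"))
print(dual_score("happy but afraid"))
```

Keeping the two scores separate means that a text can be reported as carrying positive and negative emotion at the same time.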

Page 24: How Emotional Are Users' Needs? Emotion in Query Logs

Explorations

Page 25: How Emotional Are Users' Needs? Emotion in Query Logs

Thematic Blog – Italian

Logs from Google Analytics

Page 26: How Emotional Are Users' Needs? Emotion in Query Logs

Genitori Crescono
http://genitoricrescono.com/
• Parents Grow Up: to learn together the parent profession
• About: parenthood, childcare, maternity, upbringing, behaviours during childhood…

belongs to: FattoreMammaNetwork (gathers websites targeted to mothers and written by mothers)

Page 27: How Emotional Are Users' Needs? Emotion in Query Logs

Queries from Google Analytics

www.genitoricrescono.com - Search Overview, 2009-01-01 to 2012-11-10

togliere il pannolino = stop wearing nappies/stop using diapers
genitori crescono = the website name
Nopron = the name of a controversial syrup to make children sleep all night long
Tracy Hogg = a maternity nurse to Hollywood stars, known as 'the baby whisperer' for her skill in calming unruly infants
nanna = (familiar) bye-byes (Brit), beddy-byes
neonato 4 mesi = 4-month-old baby
io mi svezzo da solo = I wean by myself
nulla osta = certificate of no impediment
aborto terapeutico = therapeutic abortion

Page 28: How Emotional Are Users' Needs? Emotion in Query Logs

Zipf’s distribution

“… much research has shown that query term frequency distributions conform to the power law, or long tail distribution curves. That is, a small portion of the terms observed in a large query log (e.g. > 100 million queries) are used most often, while the remaining terms are used less often individually.”
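
One way to inspect this long-tail shape in a log is to count term frequencies and check how much of the total usage the few most frequent terms account for. The Python sketch below is a minimal illustration; the sample queries are made up (they reuse words from the Genitori Crescono examples) and the helper names are hypothetical.

```python
from collections import Counter


def term_frequencies(queries):
    """Count how often each whitespace-separated term occurs across all queries."""
    counts = Counter()
    for query in queries:
        counts.update(query.lower().split())
    return counts


def head_share(counts, top_n):
    """Fraction of all term occurrences covered by the top_n most frequent terms."""
    total = sum(counts.values())
    head = sum(freq for _, freq in counts.most_common(top_n))
    return head / total if total else 0.0


# Made-up sample log, for illustration only.
sample_queries = [
    "togliere il pannolino",
    "pannolino notte",
    "togliere pannolino",
    "nanna neonato",
    "pannolino",
]

counts = term_frequencies(sample_queries)
print(counts.most_common(3))
print(head_share(counts, 1))
```

In a real log, a high `head_share` for a small `top_n` together with many terms that occur only once would be consistent with the distribution described in the quotation.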

Page 29: How Emotional Are Users' Needs? Emotion in Query Logs

Parts of Speech

[Chart: parts of speech in the queries. Categories: nouns; verbs; adjectives and adverbs; articles and prepositions. Only one data label, 1.9, survives.]

Page 30: How Emotional Are Users' Needs? Emotion in Query Logs

Most Frequent Syntactic Patterns

inserimento al nido = settling in at nursery
bambini aggressivi = aggressive children
metodo estivill = Estivill method

Page 31: How Emotional Are Users' Needs? Emotion in Query Logs

Average Lengths

“The average length of a search query was 2.4 terms”

“In a recent study in 2011 it was found that the average length of queries has grown steadily over time and average length of non-English languages queries had increased more than English queries.”

Page 32: How Emotional Are Users' Needs? Emotion in Query Logs

Long query, informal syntax

How to stop breastfeeding and make it sleep alone i am planning second pregnacy

Page 33: How Emotional Are Users' Needs? Emotion in Query Logs

SentiStrength (basic options)

• Queries’ Emotional Strength (i)

Page 34: How Emotional Are Users' Needs? Emotion in Query Logs

The power of genre and the importance of

the communicative situation

• ”bambini aggressivi”

• Refinement of the concept presented in ”Topic-based Sentiment Analysis in the Social Media …” (Thelwall and Buckley, 2012): the polarity of affect words might flip according to genre and the communicative situation, and not only according to the topic.
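
The idea that polarity depends on genre can be sketched as a lexicon lookup in which a genre-specific entry overrides a default entry. All names and values in the Python sketch below are invented for illustration, including the direction and size of the shift for ”aggressivi”; the slides do not specify them.

```python
# Hypothetical default lexicon and per-genre overrides. All values are invented.
DEFAULT_LEXICON = {"aggressivi": -3}
GENRE_OVERRIDES = {
    # Assumption: in a query the word signals a request for help, not a judgement.
    "query_log": {"aggressivi": 1},
}


def word_polarity(word, genre, default=DEFAULT_LEXICON, overrides=GENRE_OVERRIDES):
    """Look up a word's polarity; a genre-specific entry overrides the default."""
    word = word.lower()
    genre_lexicon = overrides.get(genre, {})
    if word in genre_lexicon:
        return genre_lexicon[word]
    return default.get(word, 0)


print(word_polarity("aggressivi", "facebook_post"))
print(word_polarity("aggressivi", "query_log"))
```

The design choice is that genre acts as a second key next to the word itself, so the same word can carry different polarity in a query log and in a social media post.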

Page 35: How Emotional Are Users' Needs? Emotion in Query Logs

Addition: Emotion Words

Page 36: How Emotional Are Users' Needs? Emotion in Query Logs

Emotional Strength: Basic vs. Boosted

Page 37: How Emotional Are Users' Needs? Emotion in Query Logs

Negation

Basic Options:
bambini che non mangiano: 1 -1
quando i bambini non dormono: 1 -1

Boosted:
bambini che non mangiano: 2 -1
quando i bambini non dormono: 2 -1

bambini che non mangiano = children who do not eat
quando i bambini non dormono = when children do not sleep

Page 38: How Emotional Are Users' Needs? Emotion in Query Logs

Most frequent word trigrams

Page 39: How Emotional Are Users' Needs? Emotion in Query Logs

Query ”Normalization”
• Stopword removal
• Lemmatization
• And ideally synonym expansion
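
The three normalization steps can be sketched as follows. This is a minimal Python illustration: the stopword list, lemma table and synonym table are tiny hypothetical stand-ins for real Italian resources, and keeping the negation word ”non” is a choice made for this example.

```python
# Hypothetical resources: a real system would load a full Italian stopword list
# and use a proper lemmatizer. These tiny tables are for illustration only.
STOPWORDS = {"il", "i", "al", "che", "quando"}
LEMMAS = {"bambini": "bambino", "mangiano": "mangiare", "dormono": "dormire"}
SYNONYMS = {"bambino": ["bimbo"]}


def normalize_query(query, expand_synonyms=False):
    """Lowercase, drop stopwords, map tokens to lemmas, optionally add synonyms."""
    tokens = [t for t in query.lower().split() if t not in STOPWORDS]
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    if not expand_synonyms:
        return lemmas
    expanded = []
    for lemma in lemmas:
        expanded.append(lemma)
        expanded.extend(SYNONYMS.get(lemma, []))
    return expanded


print(normalize_query("quando i bambini non dormono"))
print(normalize_query("bambini che non mangiano", expand_synonyms=True))
```

After normalization, queries that differ only in function words or inflection map to the same form, which makes frequency counts over the log more meaningful.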

Page 40: How Emotional Are Users' Needs? Emotion in Query Logs

Ex: for increasing traffic to a website

Increase emotion relevance:
• be empathetic to searchers’ problems by sympathising and by converting the negative words into more neutral concepts
• Give heart and hope and offer many solutions…
• In a few words: offer a new communication strategy…

Use emotion needs as Actionable Intelligence

Page 41: How Emotional Are Users' Needs? Emotion in Query Logs

Public organization website

Enterprise search and log server by Findwise AB.

Page 42: How Emotional Are Users' Needs? Emotion in Query Logs

Within the Västra Götaland Region website…

Page 43: How Emotional Are Users' Needs? Emotion in Query Logs

…hittavård [find health care center]

Regional HealthCare

Page 44: How Emotional Are Users' Needs? Emotion in Query Logs

VGR Corpus Description
• Corpus time frame: 2010-2011 (2 years)
• Description: “These logs come from the search at hittavard.vgregion.se. The biggest bulk should come from 1177.se. The rest should be from vgregion.se. The target audience are both VGR (Västra Götalands Region) users/employees as well as the general public, as it is a public site. The internal files are searches made from within the VGR…”
• Corpus size:
  o size = 3,167 KB (only queries) (BIG DATA is usually > 1TB)
  o number of queries = 249,243
  o number of words = 306,453
• Average query length: 1.23 words
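
The average query length follows directly from the two counts above (words divided by queries). A short Python check, with hypothetical helper names and whitespace tokenization assumed for the second helper:

```python
def average_query_length(total_words, total_queries):
    """Average number of words per query, from corpus-level counts."""
    return total_words / total_queries


def average_length_from_queries(queries):
    """Average number of whitespace-separated words per query in a list of queries."""
    lengths = [len(query.split()) for query in queries]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Counts reported for the VGR corpus.
vgr_average = average_query_length(306_453, 249_243)
print(round(vgr_average, 2))
```

An average this close to one word per query fits the later observation that simple nouns and compounds dominate the top of the VGR frequency list.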

Page 45: How Emotional Are Users' Needs? Emotion in Query Logs

VGR Top Queries

egenremiss = self-referral
mina vårdkontakter = my healthcare contacts
webbisar = an invented word referring to newborn babies whose pictures have been published on the web
sjukresa/or = trip(s) to the hospital

Page 46: How Emotional Are Users' Needs? Emotion in Query Logs

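Linguistic Remarks
• At the top of the frequency list:
  o Simple nouns
  o Compounds
  o V+N

Simple nouns
• feber = fever
• influensa = influenza
• klamydia = chlamydia
• …

Compounds
• urinvägsinfektion = urinary tract infection
• öroninflammation = ear infection
• Reseersättning = travel reimbursement
• …

V+N
• byta vårdcentral = change health care centre
• avboka tid = cancel an appointment
• boka tid = book an appointment
• …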

Page 47: How Emotional Are Users' Needs? Emotion in Query Logs

More complex constructions at the bottom

Page 48: How Emotional Are Users' Needs? Emotion in Query Logs

SentiStrength on VGR

Page 49: How Emotional Are Users' Needs? Emotion in Query Logs

It seems that no emotion is conveyed by VGR users…

• Are Swedes less emotional than Italians?
• Is the ”healthcare” topic less emotional than the ”childcare” topic?

Page 50: How Emotional Are Users' Needs? Emotion in Query Logs

It might be that…
There is a difference in users’ emotional behaviour between specifying queries to a web search engine and using the search engine of a specialized website.

Page 51: How Emotional Are Users' Needs? Emotion in Query Logs

Emotion Interpretation… is not straightforward…

• There are several factors to be accounted for:
  o One important factor is the context of communication: similar words or sentences can convey positive emotion in a query and negative emotion in a Facebook post, for example.

Page 52: How Emotional Are Users' Needs? Emotion in Query Logs

different communicative contexts = different genres

Page 53: How Emotional Are Users' Needs? Emotion in Query Logs

Genre Awareness
• In practical terms, genre awareness is important in text analytics and sentiment analysis because, all things being equal:
  o it lets you choose the easiest and least problematic texts to process;
  o it helps interpret and disambiguate the real meanings of words and sentences according to the different communicative contexts in which they appear.

Page 54: How Emotional Are Users' Needs? Emotion in Query Logs

In Summary

Page 55: How Emotional Are Users' Needs? Emotion in Query Logs

Is it possible to identify and extract emotion from query logs?
• It is possible to identify and extract emotion from web query logs.

• It seems more difficult to extract emotion from enterprise search engine query logs.

Page 56: How Emotional Are Users' Needs? Emotion in Query Logs

Is it possible to use emotion from query logs for actionable intelligence?

• If present, query log emotion can be used for actionable intelligence.

Page 57: How Emotional Are Users' Needs? Emotion in Query Logs

What do you think?

Thank you for your attention!

Page 58: How Emotional Are Users' Needs? Emotion in Query Logs

Details

Page 59: How Emotional Are Users' Needs? Emotion in Query Logs

Benefits for the Search Provider

• Mining query logs to extract user-created knowledge, i.e. queries that can be used as tags (metadata)
• Quickly create domain-specific taxonomies you can capitalize upon, especially for new client companies working in related fields
• Enhancements of current search products
• Inexpensive creation of annotated corpora: document annotation through query logs is a simple technique that in a short time will build massive annotated corpora to use for machine learning, which will allow more sophisticated search refinements.
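
Document annotation through query logs could look roughly like the Python sketch below. It rests on an assumption that is not stated in the slides: that the log records which document a user opened after each query, here as hypothetical (query, document id) pairs. Queries that repeatedly lead to the same document are then attached to it as tags.

```python
from collections import Counter, defaultdict


def tags_from_query_log(click_log, min_count=2):
    """Attach to each document the queries that led to it at least min_count times.

    click_log is an iterable of (query, document_id) pairs. This format is an
    assumption made for illustration, not the format of any real log.
    """
    per_document = defaultdict(Counter)
    for query, document_id in click_log:
        per_document[document_id][query.strip().lower()] += 1
    return {
        document_id: sorted(q for q, n in query_counts.items() if n >= min_count)
        for document_id, query_counts in per_document.items()
    }


# Made-up log entries, for illustration only.
sample_log = [
    ("boka tid", "doc-1"),
    ("Boka tid", "doc-1"),
    ("avboka tid", "doc-1"),
    ("feber", "doc-2"),
    ("feber", "doc-2"),
]

tags = tags_from_query_log(sample_log)
print(tags)
```

The `min_count` threshold is a simple guard against one-off queries; a real pipeline would need to tune it and to handle noise in the log.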

Page 60: How Emotional Are Users' Needs? Emotion in Query Logs

Benefits for Clients & End Users

• Somebody said: SEARCH MUST BE MIND READER!
• BUT ALSO faster, more friendly, more exhaustive and more accurate.
• If this happens, clients will spend less on customer care. If the end user finds what s/he needs online and quickly, there is no need to call a helpdesk or customer care service.
• Through the analysis of query logs, log analysts can spot the less ”satisfied” queries (i.e. users’ needs). Companies can use this information to plan future products or product enhancements or marketing strategies, etc. (BI)