Clustering as presented at UX Poland 2013

Post on 29-Nov-2014


Copyright © President & Fellows of Harvard College.

Ravi Mynampaty

Categorizing Your Search Queries to Improve Findability

About this talk…

Case study on how we are improving search and browse by performing clustering exercises on search query data

Not rocket science

High-level overview

You can follow this method, with your own insights and tweaks

You can kick this off next week at your work

Inspired by…

• Chapters 8 & 9

• The power of incrementalism

What is clustering?

A process for organizing and analyzing search log data that:

• Is repeatable, low-cost, scalable, simple

• Yields actionable results

• Supports constant incremental improvement to search

What’s clustering good for?

Ensure results for high-frequency queries

Improve metadata and taxonomy

Inform and validate decision making in site IA

Inform editorial/curatorial activities

Provide feedback for search suggestions

o Autosuggest, synonym lists, no-hits page suggestions

But more on this later...

So how do I cluster search queries?

A simple set of steps

Create query report

Cluster queries

Determine # queries to analyze

Analyze clusters

Draw conclusions and ACT

Step 1: Create a query report

We started with the site with the most traffic

• Upper-bound limit

• One year’s data by quarter

• Cut off tail at frequency < 10
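The report step can be sketched in a few lines of Python. This is only a minimal sketch, not the tooling used in the talk: it assumes the search log has already been exported as a list of raw query strings, and the `raw_queries` sample and the `build_query_report` name are hypothetical. It counts each distinct query and drops the tail below a frequency of 10.

```python
from collections import Counter

def build_query_report(raw_queries, min_frequency=10):
    """Count each distinct query and drop the long tail below min_frequency."""
    counts = Counter(raw_queries)
    # most_common() yields (query, frequency) pairs, most frequent first
    return [(q, n) for q, n in counts.most_common() if n >= min_frequency]

# Hypothetical sample log: one entry per search performed
raw_queries = ["innovation"] * 12 + ["leadership"] * 10 + ["zebra pricing"] * 3
report = build_query_report(raw_queries)
```

Running the same function once per quarter of log data would give the by-quarter view mentioned above.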

HBS Working Knowledge FY12 Use Snapshot

Overall Traffic

Page Views: 6,439,485

Visits: 3,635,746

Unique visitors: 2,734,620

On-site searches: 174,425

Views per Visit: 1.77

Local Search visit rate: 5%

Organic Search visit rate: 46%

Step 2: Cluster the queries

Step 2 (cont’d): Three levels of clustering

Level | Method | Example
Narrow | Simple normalization | Eliminate grammatical, spelling, typos, and punctuation differences
Mid-level | Group by subject | management, finance, decision making
Broad | Group by facet | topic, name, date, content type
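As an illustration of the narrow level, the sketch below merges queries that differ only in case, punctuation, or spacing. It is a simplified assumption of what "simple normalization" could look like: the `normalize` and `narrow_cluster` names and the sample rows are hypothetical, and spelling variants or typos would still need a lookup list or human review.

```python
import re
from collections import Counter

def normalize(query):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    query = query.lower()
    query = re.sub(r"[^\w\s]", " ", query)
    return " ".join(query.split())

def narrow_cluster(report):
    """Merge the frequencies of queries that normalize to the same string."""
    merged = Counter()
    for query, freq in report:
        merged[normalize(query)] += freq
    return merged

# Hypothetical report rows: (query, frequency)
report = [
    ("Balanced Scorecard", 40),
    ("balanced  scorecard", 12),
    ("balanced-scorecard", 5),
]
clusters = narrow_cluster(report)
```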

Step 2 (cont’d): Levels Tasks Enabled

Level | Improve your base for query analysis | Ensure representation of major clusters on your site | Improve Metadata/Index/Taxonomy | Improve Search Suggestions
Narrow (simple) X X X
Mid-level (group by subject) X X X
Broad (group by facet) X X

Step 2 (cont’d): Narrow Clustering Example

Step 2 (cont’d): Mid-level Example

Cluster: brand

branding 245

brand 160

brand management 73

consumer branding 57

global brand 32

service brands 24

brand image retail bank 17

employer branding 16

brand management professional services 16

global branding 13

b2b branding 13

importance of branding 12

brand 2002 12

brand equity 11

brand image 11
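A mid-level cluster like the one above can be approximated by matching on a shared stem and summing the frequencies. The sketch below uses the rows from the "brand" cluster; the `cluster_by_stem` helper and the extra non-matching row are hypothetical, and a plain substring match is only a first pass that still needs human judgment.

```python
def cluster_by_stem(rows, stem):
    """Return the rows whose query contains the stem, plus their total frequency."""
    members = [(q, n) for q, n in rows if stem in q.lower()]
    return members, sum(n for _, n in members)

# (query, frequency) rows from the "brand" cluster shown above
rows = [
    ("branding", 245), ("brand", 160), ("brand management", 73),
    ("consumer branding", 57), ("global brand", 32), ("service brands", 24),
    ("brand image retail bank", 17), ("employer branding", 16),
    ("brand management professional services", 16), ("global branding", 13),
    ("b2b branding", 13), ("importance of branding", 12), ("brand 2002", 12),
    ("brand equity", 11), ("brand image", 11),
    ("negotiation tactics", 30),  # hypothetical row that should not match
]
members, total = cluster_by_stem(rows, "brand")
```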

Step 2 (cont’d): Broad Clustering Example

Step 2 (cont’d): List of facets we used

Facet | Example
content type | case studies, cases, working papers, articles, newspaper
date | 2011, world in 2030
demographic characteristics | women, Gen Y, gender, baby boomers
event | economic crisis
format | podcast, video
geographic area | india, japan, mount everest
industry | global wine industry
job type/role | independent director, entrepreneur, ceo, phd economist
organization name | ikea, zara, toyota
person name | michael porter, kanter, sebenius
product name / brand name | ipad
product/commodity | coffee, wine, cement
topic | this covers the majority of keywords
work | faculty work, ex: publication name, title of a case
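One way to picture broad clustering is a lookup from known terms to facets. The sketch below is an assumption for illustration only: `FACET_TERMS` holds just a few of the example terms from the table, `tag_facets` is a hypothetical helper, and falling back to "topic" reflects the note that topic covers the majority of keywords.

```python
# A few example terms per facet, taken from the table above
FACET_TERMS = {
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan", "mount everest"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
}

def tag_facets(query):
    """Return the facets whose example terms appear in the query; default to 'topic'."""
    q = query.lower()
    facets = [
        facet
        for facet, terms in FACET_TERMS.items()
        if any(term in q for term in terms)
    ]
    return facets or ["topic"]
```

A query can carry more than one facet, for example `tag_facets("toyota india")`, which is why the function returns a list.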

Step 3: Choose #clusters to analyze

Number of Clusters Analyzed | Analyze Top Hits | Improve Metadata/Taxonomy/Index | Supply Search Suggestions
50 | X | |
150 | X | X |
300+ | X | X | X

A small number of clusters can cover a lot of your data

Number of top clusters | % Total Queries
Top 20 clusters | 14%
Top 30 clusters | 18%
Top 50 clusters | 26%
Top 100 clusters | 37%
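Coverage figures of this kind come from dividing the summed frequency of the top N clusters by the total number of searches. The sketch below shows the arithmetic; the `coverage` helper and all numbers in it are hypothetical, not the WK data.

```python
def coverage(cluster_freqs, total_queries, top_n):
    """Percent of all searches accounted for by the top_n largest clusters."""
    top = sorted(cluster_freqs, reverse=True)[:top_n]
    return 100 * sum(top) / total_queries

# Hypothetical cluster totals and overall search count
cluster_freqs = [500, 300, 200, 100, 50]
total_queries = 5000

top2 = coverage(cluster_freqs, total_queries, top_n=2)
top5 = coverage(cluster_freqs, total_queries, top_n=5)
```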

Now you have your clusters…

What do you do with them?

TAKE ACTION!

Analyze Top (“Short Head”) Clusters

Clustering has created a condensed and reliable list of your top search queries

Are they what you thought they would be?

Does the information on your site accurately represent the top searches?

Are you fulfilling user needs?

Use your clusters: Improve Site Navigation

Examine the short head of clusters:

• For each cluster, add up the frequencies of queries

• Reorder clusters by cumulative frequency, descending

• Ensure top clusters are accounted for in your navigation

• Use cluster topics as browse/navigation headers/footers for your website
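The add-up-and-reorder steps above can be sketched as follows. The `rank_clusters` helper and the sample clusters are hypothetical; the only assumption is that each cluster is stored as a list of (query, frequency) pairs.

```python
def rank_clusters(clusters):
    """Sum query frequencies per cluster and sort clusters by that total, descending."""
    totals = {
        name: sum(freq for _, freq in queries)
        for name, queries in clusters.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical clusters: name -> list of (query, frequency)
clusters = {
    "leadership": [("leadership", 40), ("leadership styles", 15)],
    "negotiation": [("negotiation", 50), ("negotiation tactics", 20)],
}
ranked = rank_clusters(clusters)
```

The head of `ranked` is the list to check against the site's navigation.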

WK Top Clusters

Cluster Frequency

innovation 867

balanced scorecard 794

leadership 570

cases 545

social media 508

negotiation 470

knowledge management 457

ethics 448

apple 430

corporate social responsibility 398

Use your clusters: Improve Taxonomy

• Missing categories in browse taxonomy

• "Balanced Scorecard"

• “Ethics”

• “Social media”

• Second-level topics in the WK context
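Spotting missing categories amounts to comparing the top clusters with the existing browse taxonomy. The sketch below uses five cluster names from the WK list; the `browse_categories` set is purely hypothetical (it is not the actual WK taxonomy), and the exact string comparison would miss near-matches that a person would catch.

```python
def missing_categories(clusters, categories):
    """Return clusters with no same-named category (case-insensitive exact match)."""
    known = {c.lower() for c in categories}
    return [c for c in clusters if c.lower() not in known]

# Cluster names from the top-clusters list
top_clusters = ["innovation", "balanced scorecard", "leadership", "cases", "social media"]

# Hypothetical browse taxonomy, for illustration only
browse_categories = {"Innovation", "Leadership", "Cases"}

gaps = missing_categories(top_clusters, browse_categories)
```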

Mid-level clustering: Informs editorial/curatorial activities

“Featured Topics”

o What topics to highlight this week/month/year

o News items to focus on

o What research guides to create

o How to formulate queries for the topics

How about improving search?

Clustered list provides synonyms for taxonomy

Requires human judgment and standards/guidelines for synonyms – in our case, synonyms are exact

Map to one "like term" in the search engine

Example:

Balanced Scorecard, BSC, Balanced score card, kaplan and norton -> Balanced Scorecard
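The mapping in the example can be pictured as a simple lookup table. This is a sketch rather than any search engine's synonym format: `LIKE_TERMS` and `to_like_term` are hypothetical names, and "exact" is assumed here to mean an exact match after folding case and extra spaces.

```python
# Variants from the example above, all mapped to one like term
LIKE_TERMS = {
    "balanced scorecard": "Balanced Scorecard",
    "bsc": "Balanced Scorecard",
    "balanced score card": "Balanced Scorecard",
    "kaplan and norton": "Balanced Scorecard",
}

def to_like_term(query):
    """Return the like term for an exact synonym match, else the query unchanged."""
    key = " ".join(query.lower().split())
    return LIKE_TERMS.get(key, query)
```

Queries with no entry pass through untouched, so the table can grow one cluster at a time.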

Use your clusters: Improve no-hits page

Time Commitment

• 2 hours to 2 weeks

• Variables include:

• What kind of information you want to gather

• How broad or narrow you want your clusters

• How many queries you analyze

• In our case ~2 person-weeks

Results vs. Time Invested

Time Invested | Analyze top clusters | Update Taxonomy | Create New Metadata | Determine New Search Suggestions
2 Hours | X | X | |
6 Hours | X | X | X |
One Week | X | X | X | X

Next Steps: Autosuggest

Your top clusters probably make up a large percentage of what people are looking for

o Use them to establish/supplement auto-suggest!

Example: suggestions for “innovation”

o innovation and leadership

o disruptive innovation

o innovation management

o open innovation
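A cluster can seed auto-suggest by returning its most frequent member queries that contain what the user has typed. The sketch below is an assumption about how that might work: the `suggest` helper is hypothetical, the query strings come from the example above, and the frequencies are invented for illustration.

```python
def suggest(typed, cluster_queries, limit=4):
    """Return up to `limit` queries containing the typed text, most frequent first."""
    typed = typed.lower().strip()
    matches = [
        (q, n)
        for q, n in cluster_queries
        if typed in q.lower() and q.lower() != typed
    ]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [q for q, _ in matches[:limit]]

# Queries from the example above; frequencies are hypothetical
cluster_queries = [
    ("innovation", 400),
    ("innovation and leadership", 90),
    ("disruptive innovation", 80),
    ("innovation management", 70),
    ("open innovation", 60),
    ("leadership", 300),
]
suggestions = suggest("innovation", cluster_queries)
```

Matching on any position in the query, not just the prefix, is what lets "disruptive innovation" and "open innovation" appear.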

Next Steps: New Access Structures

Needed an obvious way to search podcasts

o Put in best bets for now

A lot of people searching for article titles

o Considering simple interface/approach for select field-specific search, e.g. “title”

Consider adding other facets to browse taxonomy where we have entities tagged

o “company name”, “job type/class”, etc.

Summary

Established plan/process, but be willing to tweak as you go

Keep it very simple.

Play with your data – the more we played, the better we understood what benefits could be realized by levels of clustering and effort

Tuning process/results

o Build staging/working prototypes

o Repeat process on other sites

Thank you! And remember…TAKE ACTION!

Kropla drąży skałę! (Dripping water hollows out the stone.)

Questions?

searchguy@hbs.edu

@ravimynampaty

http://www.slideshare.net/mynampaty/
