Separating the Swarm
Categorization Methods for User Sessions on the Web
Jeffrey Heer, Ed H. Chi
Palo Alto Research Center
2002-04-24, CHI Web Behavior Patterns



Web Analytics: What can you measure?
Marketing
- content
- page traffic
Infrastructure
- load testing
Site Design
- user intent
- usability
- user experience
Want to improve site design, content, and performance

The Change in Web Sites: What should you measure?
[Figure: shift from page-based websites to activity-based websites. Axes: Time, Site Complexity. Labels: TRAFFIC, USER EXPERIENCE.]
Example requests shown in the figure:
- Products
- Management Team
- I’d like information on used cars.
- Search for a car dealer in my neighborhood.

Motivation
What are users’ information goals?
Understanding the composition of web user traffic.
Strategy: Use all available data to discover user goals (Content, Usage, Topology).
Outline: System Description, Evaluation, Implications, Conclusion

System Description
Generate a user profile for each user session.
- How: Use access logs and site content to build a multi-featured model of user activity (multi-modal clustering).
Group user profiles into common activities like “product browsing” and “job seeking”.
- How: Apply clustering algorithms to user profiles.

System Description
[Pipeline diagram: Web Crawl, Access Logs, Document Model, User Sessions, User Profiles, Clustered Profiles]

Steps:

1. Process Access Logs

2. Crawl Web Site

3. Build Document Model

4. Extract User Sessions

5. Build User Profiles

6. Cluster Profiles

Document Model
Site is crawled.
- Pay special attention to pages in logs.
Documents described by feature vectors:
- Content: TF.IDF weighted keyword vector
- URL: Tokenized and TF.IDF weighted
- Inlinks: Column vectors in topology matrix
- Outlinks: Row vectors in topology matrix
Vectors are concatenated to form a single multi-modal vector P_d for each document.
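The document model above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenizer, the `tf * log(N / df)` variant of TF.IDF, and the sample pages are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Fixed modality order used when concatenating (an assumption for this sketch).
MODALITIES = ("content", "url", "inlink", "outlink")

def tfidf_vectors(token_lists):
    """Return one TF.IDF vector per document over a shared, sorted vocabulary."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(token_lists)
    # Document frequency: number of documents that contain each token.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vectors

def build_document_vectors(pages, links):
    """pages: list of (url, text); links: set of (source_index, target_index)."""
    n = len(pages)
    content = tfidf_vectors([text.lower().split() for _, text in pages])
    url = tfidf_vectors(
        [[t for t in re.split(r"[^a-z0-9]+", u.lower()) if t] for u, _ in pages]
    )
    # topology[i][j] == 1 when page i links to page j.
    topology = [[1 if (i, j) in links else 0 for j in range(n)] for i in range(n)]
    docs = []
    for d in range(n):
        docs.append({
            "content": content[d],
            "url": url[d],
            "inlink": [topology[i][d] for i in range(n)],  # column d
            "outlink": topology[d],                         # row d
        })
    return docs

def concatenate(doc):
    """Join the per-modality segments into one multi-modal vector P_d."""
    return [x for m in MODALITIES for x in doc[m]]

# Hypothetical three-page site.
pages = [
    ("/products/printers.html", "printer toner printer"),
    ("/jobs/index.html", "jobs careers"),
    ("/products/toner.html", "toner supplies"),
]
links = {(0, 2), (1, 0)}
docs = build_document_vectors(pages, links)
```

Keeping the segments separate until the final concatenation makes it easy to weight or drop a modality later.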


User Sessions
Sessions extracted and represented by a vector s:
- For path i = A→B→D, s_i = <1,1,0,1,0> (for a site with 5 documents <A,B,C,D,E>)
Different weightings can be employed in creating the session vector s:
- Frequency: number of times each page is accessed. A→B→D, s = <1,1,0,1,0>
- TF.IDF: hits / # paths including page
- Position: Use order of pages within surfing path. A→B→D, s = <1,2,0,3,0>
- View Time: Use time spent viewing pages. A (10s)→B (20s)→D (15s), s = <10,20,0,15,0>
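The four weighting schemes can be illustrated with a short sketch that reproduces the slide's own examples. The function names are hypothetical, and two details are assumptions the slide does not settle: a revisited page keeps its last position, and repeated view times are summed.

```python
def frequency_vector(path, pages):
    """Number of times each page is accessed in the path."""
    return [path.count(p) for p in pages]

def position_vector(path, pages):
    """1-based position of each page in the path (last visit wins; 0 if unvisited)."""
    last_pos = {p: i for i, p in enumerate(path, start=1)}
    return [last_pos.get(p, 0) for p in pages]

def view_time_vector(timed_path, pages):
    """Total seconds spent on each page; timed_path is a list of (page, seconds)."""
    totals = {}
    for page, seconds in timed_path:
        totals[page] = totals.get(page, 0) + seconds
    return [totals.get(p, 0) for p in pages]

def tfidf_vector(path, pages, all_paths):
    """Hits on a page divided by the number of paths that include that page."""
    vec = []
    for p in pages:
        n_paths = sum(1 for other in all_paths if p in other)
        vec.append(path.count(p) / n_paths if n_paths else 0.0)
    return vec

pages = ["A", "B", "C", "D", "E"]
path = ["A", "B", "D"]
freq = frequency_vector(path, pages)
pos = position_vector(path, pages)
times = view_time_vector([("A", 10), ("B", 20), ("D", 15)], pages)
# A second, made-up session is needed for the TF.IDF-style weighting.
tfidf = tfidf_vector(path, pages, [path, ["A", "C"]])
```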


User Profiles
User profiles are a linear combination of the viewed pages.
- “You are what you see.”

$$UP_i = \sum_{d=1}^{N} s_{id} P_d$$
where UP_i is the user profile for session i, s_{id} are the session weights, and P_d are the document vectors.
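As a worked illustration of the linear combination, the sketch below sums document vectors weighted by a session vector. The two-dimensional document vectors are made up for the example; real multi-modal vectors would be much longer.

```python
def user_profile(session_weights, document_vectors):
    """Compute UP_i = sum over documents d of s_id * P_d."""
    dim = len(document_vectors[0])
    profile = [0.0] * dim
    for weight, doc_vec in zip(session_weights, document_vectors):
        for k, value in enumerate(doc_vec):
            profile[k] += weight * value
    return profile

# Hypothetical document vectors for pages A..E.
document_vectors = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 3]]

# Frequency weighting for the path A, B, D.
profile_freq = user_profile([1, 1, 0, 1, 0], document_vectors)
# View-time weighting for the same path (10s, 20s, 15s).
profile_time = user_profile([10, 20, 0, 15, 0], document_vectors)
```

Pages the user never viewed have weight 0, so they contribute nothing to the profile.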


Clustering
Clustering is a form of statistical analysis which organizes data into individual clusters.
- Groupings are determined by a shared similarity.
- Similarity is defined by a computable similarity metric.
Clustering proceeds by recursive bisection, using K-Means to perform the bisections [Zhao01].
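A minimal sketch of recursive bisection with 2-means is given below. It is not the CLUTO implementation: the deterministic seeding, the rule of always splitting the largest cluster, and the plain cosine similarity on flat vectors are assumptions chosen to keep the example short.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def two_means(vectors, max_iter=20):
    """Split vectors into two groups; returns a 0/1 label per vector."""
    # Seeding (an assumption): the first vector and the one least similar to it.
    seed_a = vectors[0]
    seed_b = min(vectors, key=lambda v: cosine(seed_a, v))
    centers = [seed_a, seed_b]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels

def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the largest cluster until k clusters exist.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        target = clusters.pop(0)
        if len(target) < 2:
            clusters.append(target)
            break  # nothing left to split
        labels = two_means([vectors[i] for i in target])
        left = [i for i, lab in zip(target, labels) if lab == 0]
        right = [i for i, lab in zip(target, labels) if lab == 1]
        if not left or not right:
            clusters.append(target)
            break  # the split was degenerate
        clusters += [left, right]
    return clusters

# Hypothetical user profiles forming three obvious groups.
profiles = [
    [1, 0, 0], [0.9, 0.1, 0],
    [0, 1, 0], [0, 0.9, 0.1],
    [0, 0, 1], [0.1, 0, 0.9],
]
clusters = bisecting_kmeans(profiles, 3)
```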


$$d(UP_i, UP_j) = \sum_{m \in \text{Modalities}} w_m \cos(UP_i^m, UP_j^m)$$
The weights w_m specify the contribution of each modality.
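The weighted per-modality similarity can be sketched as follows. Storing each profile as a dictionary keyed by modality name is an assumption of this example, as are the modality names and the weight values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def multimodal_similarity(profile_i, profile_j, weights):
    """d(UP_i, UP_j) = sum over modalities m of w_m * cos(UP_i^m, UP_j^m)."""
    return sum(
        w * cosine(profile_i[m], profile_j[m]) for m, w in weights.items()
    )

# Hypothetical weights and profiles.
weights = {"content": 0.5, "url": 0.25, "inlink": 0.25}
up_a = {"content": [1, 0], "url": [1, 1], "inlink": [0, 1]}
up_b = {"content": [1, 0], "url": [1, 0], "inlink": [1, 0]}
similarity = multimodal_similarity(up_a, up_b, weights)
```

Setting a weight to 0 removes that modality, and a single non-zero weight gives the unimodal case.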

[Screenshot of clustering output, annotated with: user population breakdown; detailed stats; keywords describing user groups; frequent documents accessed by group]

Clustering Results

Users reached end of tutorial, had nowhere to go.

http://www.diamondreview.com

System Evaluation
Does the system correctly infer user intentions?
[Diagram: Logs go into the System, which produces User Intent Groupings; these are compared against the actual User Intent]
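The slides do not state how inferred groupings are scored against known user intent. One plausible scoring, shown here purely as an assumption, labels each cluster with its most common true task group and counts the sessions that match that label. The session data is made up.

```python
from collections import Counter

def majority_accuracy(cluster_ids, true_groups):
    """Fraction of sessions whose true group matches their cluster's majority group."""
    by_cluster = {}
    for cluster_id, group in zip(cluster_ids, true_groups):
        by_cluster.setdefault(cluster_id, []).append(group)
    correct = sum(
        Counter(groups).most_common(1)[0][1] for groups in by_cluster.values()
    )
    return correct / len(true_groups)

# Hypothetical result: six sessions, three clusters.
cluster_ids = [0, 0, 0, 1, 1, 2]
true_groups = ["Jobs", "Jobs", "Products", "Products", "Products", "Supplies"]
accuracy = majority_accuracy(cluster_ids, true_groups)
```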

User Study
Asked users to carry out specific tasks on www.xerox.com
- Captured actions using the WebQuilt proxy logger [Hong01]
- Done at their leisure
15 unique tasks:
- Tasks developed after exploring xerox.com and reading user e-mail feedback
- 5 task groups with 3 tasks per group
- Products, TechSupport, Supplies, Company Info, and Jobs
Participation:
- 21 users signed up, 18 went through, 104 usable sessions

Results: 340 combinations of clustering schemes

Outlink-based schemes performed poorly (omitted).

Analysis: Modalities
[Chart: “Analysis of Modalities in Unimodal Cases”. Vertical axis: % correctly clustered (0% to 100%); horizontal axis: Path Weighting Schemes; series: RAW PATH, CONTENT, URL, INLINK, OUTLINK]

Linear Contrast shows Content is significantly different:
- (unimodal) F(1,105)=32.51, MSE=.005361, p<0.0001
- (multimodal) F(1,35)=33.36, MSE=.007332, p<0.0001
Content is King! Mean=0.96, StdDev=0.07

Analysis: Path Weighting
Paired t-Test between Time-based and non-Time-based weightings: n=60, t(59)=4.85, p=4.68e-6
- View Time: mean=89.5%, s.d.=12.7%
- Non-View Time: mean=83.2%, s.d.=12.0%

[Chart: “Analysis of Path Weighting”. Vertical axis: % correctly clustered (0% to 100%); horizontal axis: Modalities; series: uniform, tfidf, time, position, tfidf+time, tfidf+pos, time+pos, time+pos+tfidf]

View Time is best!
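For readers unfamiliar with the paired t-test reported above, the sketch below shows how the statistic is formed from paired accuracies: the mean of the pairwise differences divided by its standard error, with n - 1 degrees of freedom (59 for n=60). The four accuracy pairs are invented for illustration and do not reproduce the study's numbers.

```python
import math

def paired_t(xs, ys):
    """Return (t statistic, degrees of freedom) for paired samples xs and ys."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return t, n - 1

# Made-up accuracies for the same schemes with and without view time.
with_time = [0.90, 0.80, 0.95, 0.85]
without_time = [0.80, 0.75, 0.90, 0.70]
t_stat, dof = paired_t(with_time, without_time)
```

Turning the statistic into a p-value needs the t distribution, which a statistics library would normally supply.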

Observation: Multi-Modal vs. Unimodal
In practice, Multi-Modal should be more robust
- Some pages don’t have much content
  » Images, Audio, Video
  » PDF, PS (if you don’t have necessary software)
- URL Tokens: All pages have URLs.
- Inlinks: don’t depend on any features of a page!
In our experience, Content-based Multi-Modal Clustering retains accuracy.
Linear Contrast shows no significant difference between multi-modal and uni-modal schemes: F(1,77)=1.63, MSE=.004407, p=.21

Findings

Incorporating View Time improves clustering accuracy.

Though it involves extra work, extracting Content can provide very high accuracy.

Adding other modalities makes clustering more robust.

Modalities should be chosen carefully, and tailored for each specific site.

Implications for Designers
Good design means understanding your users.
It’s possible to understand trends of user activities accurately.
- Requires well-defined user tasks doable on the site.
Now you can design and tailor user experience.
- Address discovered usability issues.
- Update design to facilitate common tasks.

Summary: “You are what you see.”
[Diagram: InfoScent Clustering connects User Information Goals, the Web site (Page Content, Topology), and Observed Usage]

Users follow the best Information Scent to accomplish their goals.

Future Work
Determining # of clusters
- Currently done semi-manually
Model unstructured tasks more directly
Directly recommend design changes
Integrate with
- Clustering Visualization
- User Path Visualization
Lots of Commercial Interest, Licensing

Conclusion

Performed first known user study to characterize the analytic space of session clustering techniques.

Found that session clustering can be highly accurate with respect to user intentions.

Demonstrated our method is scalable and useful in real-world scenarios.

This should prove to be a useful tool for web designers and researchers!

Acknowledgements
Peter Pirolli, Stu Card, Adam Rosien, Pam Schraedley and the UIR and Bloodhound Team at PARC.
George Karypis for CLUTO software
Participants in our user study
Office of Naval Research

Contact: Jeff Heer, Ed H. Chi
