Analyzing Web Logs
Sarah Waterson
18 April 2002
SIMS 213
Group for User Interface Research


SIMS 213, 18 April 2002

Talk Outline

- What is a web log?
- Where do they come from?
- Why are they relevant?
- How can we analyze them?
- Study
- Discussion


What is a web log?

A record of a visit to a web page:

- Visitor (IP address)
- URL
- Time of visit
- Time spent on a page
- Browser used
- Referring URL
- Type of request
- Reply code
- Number of bytes in the reply
- etc.



What is a clickstream?

A record of a path through web pages:

- Visitor (IP address)
- URL
- Time of visit
- Time spent on a page
- Browser used
- Referring URL
- Type of request
- Reply code
- Number of bytes in the reply
- Next URL
- etc.

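A clickstream can be reconstructed from server log records by grouping requests per visitor and ordering them in time; the "next URL" and the "time spent on a page" are then inferred from the following request rather than logged directly. A minimal Python sketch, assuming hits arrive as (IP address, time, URL) tuples, that one IP address stands for one visitor (a simplification that proxies break), that a gap of more than 30 minutes starts a new session, and that image, script, and stylesheet requests are embedded resources rather than page views. These are common heuristics, not part of the talk, and `build_clickstreams` is a hypothetical name:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple

# Assumption: a visitor's session ends after 30 minutes of inactivity.
SESSION_GAP = timedelta(minutes=30)
# Assumption: requests for these file types are embedded resources, not page views.
EMBEDDED = (".gif", ".jpg", ".png", ".js", ".css")


class Step(NamedTuple):
    url: str
    seconds_on_page: Optional[float]  # inferred from the next request; None for the last page
    next_url: Optional[str]


def build_clickstreams(
    hits: List[Tuple[str, datetime, str]], gap: timedelta = SESSION_GAP
) -> List[Tuple[str, List[Step]]]:
    """Group (visitor, time, url) hits into per-visitor sessions of page steps."""
    by_visitor = defaultdict(list)
    for visitor, when, url in hits:
        if not url.lower().endswith(EMBEDDED):
            by_visitor[visitor].append((when, url))

    streams = []
    for visitor, visits in by_visitor.items():
        visits.sort()  # chronological order
        sessions = [[visits[0]]]
        for prev, cur in zip(visits, visits[1:]):
            if cur[0] - prev[0] > gap:
                sessions.append([])  # long pause: start a new session
            sessions[-1].append(cur)
        for session in sessions:
            # Each page's duration and successor come from the request that follows it.
            steps = [
                Step(url, (nxt[0] - when).total_seconds(), nxt[1])
                for (when, url), nxt in zip(session, session[1:])
            ]
            steps.append(Step(session[-1][1], None, None))  # last page: duration unknown
            streams.append((visitor, steps))
    return streams
```

Because the last page of each session has no following request, its time on page stays unknown; this is one reason the "time spent" field of a clickstream is always an estimate.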


What is a Web Log?

Apache web log:

205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" "Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"

216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" "Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)"

202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510 "http://www.csua.berkeley.edu/~tahir/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/animate.js HTTP/1.1" 200 14261 "http://www.csua.berkeley.edu/~tahir/indextop.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
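These entries follow Apache's combined log format: client address, identity, user, timestamp, request line, reply code, bytes sent, referring URL, and user agent. A minimal Python parser for that format follows; the regular expression and the `LogRecord` type are illustrative assumptions rather than code from Apache or any log analysis tool, and lines that do not match simply return `None`:

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Apache "combined" format (assumed):
# host ident authuser [timestamp] "request line" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class LogRecord(NamedTuple):
    host: str               # visitor IP address (or hostname)
    time: datetime          # time of visit, including its UTC offset
    method: str             # type of request, e.g. GET
    url: str
    status: int             # reply code
    size: int               # bytes in the reply ("-" is stored as 0)
    referer: Optional[str]  # referring URL, None when logged as "-"
    agent: str              # browser / user-agent string


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one combined-format line; return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    parts = m["request"].split()
    if len(parts) != 3:  # malformed request lines (e.g. logged as "-")
        return None
    method, url, _protocol = parts
    return LogRecord(
        host=m["host"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=method,
        url=url,
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),
        referer=None if m["referer"] == "-" else m["referer"],
        agent=m["agent"],
    )
```

Applied to the first entry above, this sketch yields a GET of `/~sophal/whole5.gif` with reply code 200, 9609 bytes, and the referring page `http://www.csua.berkeley.edu/~sophal/whole.html`.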


Where do they come from?

- Servers: logging is done on most web servers, in standard formats
- Clients: browsers or loggers on the client machine; must send the data back
- Proxies: similar to servers; hang out between the client and the server


Why are web logs relevant?

- Lots of data: quantitative analysis is much more fun!
- User behavior and patterns: real users and tasks, or at least more realistic users and tasks
- Leaving the usability lab: less of a testing effect
- Fast, easy, cheap: automatic or almost automatic


Ed Chi asks…

Usage:
- How has information been accessed? How frequently?
- What's popular? What's not?
- How do people enter the site? Exit?
- Where do people spend time? How long do they spend there?
- How do people travel within the site?
- Who are the people visiting?
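Several of these usage questions reduce to counting over reconstructed visits. A minimal Python sketch, assuming each visit is already available as the ordered list of URLs one visitor viewed; `usage_summary` is a hypothetical name:

```python
from collections import Counter
from typing import Dict, List


def usage_summary(paths: List[List[str]]) -> Dict[str, Counter]:
    """Summarize visits, each given as the ordered list of URLs one visitor viewed."""
    visits = [p for p in paths if p]  # ignore empty sessions
    return {
        # How frequently was each page accessed? What's popular, what's not?
        "popularity": Counter(url for p in visits for url in p),
        # How do people enter the site?
        "entry": Counter(p[0] for p in visits),
        # ...and where do they exit?
        "exit": Counter(p[-1] for p in visits),
    }
```

Questions such as "Who are the people visiting?" need data beyond URLs (for example, demographics), so this sketch leaves them out.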


Ed Chi asks…

Structural: What information has been added, deleted, modified, moved?

Usage + Structural:
- What happens when the site changes? (Google)
- Does navigation change? Does popularity change?
- What about missing data?
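Combining usage and structure requires a snapshot of the site's pages (for example, from a crawl) for each version, alongside the requests logged while each version was live. A minimal Python sketch under those assumptions; the function name and the "still requested" measure are illustrative, and a moved page shows up as one deleted plus one added URL:

```python
from collections import Counter
from typing import Dict, Iterable, Set


def compare_site_versions(
    pages_before: Set[str], pages_after: Set[str],
    hits_before: Iterable[str], hits_after: Iterable[str],
) -> Dict[str, object]:
    """Relate a structural change (two page snapshots) to requests logged before and after it."""
    before, after = Counter(hits_before), Counter(hits_after)
    kept = sorted(pages_before & pages_after)
    deleted = sorted(pages_before - pages_after)
    return {
        "added": sorted(pages_after - pages_before),
        "deleted": deleted,
        # Does popularity change? Raw counts; normalize if the periods differ in length.
        "popularity_change": {url: after[url] - before[url] for url in kept},
        # Missing data: deleted pages that visitors still request after the change.
        "still_requested": [url for url in deleted if after[url] > 0],
    }
```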


How do you analyze web logs?

1. Data Mining: task or intent unknown
   - “Automated extraction of hidden predictive information from (large) databases” – Kurt Thearling
   - Server log analysis
   - What are people doing?
2. Remote Usability Testing: task or intent known
   - Similar to traditional lab usability testing
   - Clickstream analysis
   - How well does the site support what people are doing?


How? Data Mining

Statistics and numbers galore!
- Gazillions of tools for server log analysis (Computers > Software > Internet > Site Management > Log Analysis)
- Usually charts, graphs, numbers galore
- Analog & NetTracker: typical statistics
- In 3D too (eBizinsights)
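Much of a typical statistics report is simple tallying over parsed log entries. A minimal Python sketch, assuming entries have already been parsed into (time, reply code, bytes) tuples; it illustrates the kind of counts involved and is not how Analog or NetTracker are implemented:

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def summary_stats(records: Iterable[Tuple[datetime, int, int]]):
    """Tally (time, status, bytes) records into requests per day, status classes, and total bytes."""
    per_day: Counter = Counter()
    status_class: Counter = Counter()
    total_bytes = 0
    for when, status, size in records:
        per_day[when.date().isoformat()] += 1
        status_class[f"{status // 100}xx"] += 1  # 2xx success, 3xx redirect, 4xx client error, ...
        total_bytes += size
    return per_day, status_class, total_bytes
```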


How? Data Mining cont’d

Other interesting work:
- Web Ecologies (Chi): development over time
- Information scent (Chi): behavior patterns; understanding how to organize information
  “Information scent is made of cues that people use to decide whether a path is interesting.”
- Useful for web designers?


Web Ecologies (Chi 1998)


How? Remote Usability Testing

Analyze the clickstream in the context of the task and user intentions.
- Can be gathered on the client, on the server, or via a proxy
- Varied granularities of interaction: from mouse movements to page accesses
- Varied levels of user awareness: from interactive to invisible
- Varied levels of access: from the site only to the entire web


How? Remote Usability Testing: WebVip and VisVip (NIST)
- Server-side logging
- JavaScript instrumentation
- Individual paths within the context of the site
- Animation/replay of sessions

Questions:
- What part of the site is used for a task? Not used?
- How long does it take to finish the task? Per page?
- What sorts of behavior occur for a task?
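These task-oriented questions can be answered from a single participant's session once the task's goal page is known. A minimal Python sketch, assuming a chronological list of (time, URL) visits, a hypothetical target URL that marks task completion, and a known set of site pages; as with any request log, the time on the last page viewed cannot be measured:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple


def task_metrics(
    visits: List[Tuple[datetime, str]], target: str, site_pages: Set[str]
) -> Dict[str, object]:
    """Per-task measures from one participant's chronological (time, url) visits."""
    start = visits[0][0]
    # First arrival at the target page counts as task completion (an assumption).
    finish: Optional[datetime] = next((t for t, url in visits if url == target), None)
    per_page: Dict[str, float] = defaultdict(float)
    for (t, url), (t_next, _) in zip(visits, visits[1:]):
        per_page[url] += (t_next - t).total_seconds()  # last page's time is unknown
    used = {url for _, url in visits}
    return {
        "completed": finish is not None,
        "task_seconds": None if finish is None else (finish - start).total_seconds(),
        "per_page": dict(per_page),
        "unused": sorted(site_pages - used),  # parts of the site not used for the task
    }
```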


How? Remote Usability Testing: ClickViz (Blue Martini)
- Server-side logging
- Custom instrumentation
- Aggregate paths based on the file system
- Include demographics, purchase history
- Filtering

Questions:
- How does a visitor of type X compare to type Y?
- Success vs. “failure”
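Comparing visitor types amounts to grouping sessions by segment before aggregating. A minimal Python sketch in the spirit of these questions, assuming each session has already been labeled with a visitor type (for example, from demographics) and with task success; the labels and the function name are hypothetical, and this is not ClickViz's method:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def compare_segments(
    sessions: List[Tuple[str, List[str], bool]]
) -> Dict[str, Dict[str, float]]:
    """sessions: (visitor type, page path, task succeeded?) triples."""
    totals = defaultdict(lambda: [0, 0, 0])  # [sessions, successes, pages viewed]
    for segment, path, succeeded in sessions:
        t = totals[segment]
        t[0] += 1
        t[1] += int(succeeded)
        t[2] += len(path)
    return {
        seg: {"sessions": n, "success_rate": ok / n, "mean_pages": pages / n}
        for seg, (n, ok, pages) in totals.items()
    }
```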


How? Remote Usability Testing

- NetRaker Clickstream
- Vividence ClickStreams
- Not restricted to servers
- Testing suites
- Interesting aggregation methods
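One common way to aggregate many clickstreams is a page-to-page transition graph whose edge weights count how often visitors followed each link. A minimal Python sketch of that general technique, not of any particular product's aggregation method:

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def transition_graph(paths: List[List[str]]) -> Counter:
    """Count page-to-page transitions over many visitors' paths."""
    edges: Counter = Counter()
    for path in paths:
        edges.update(zip(path, path[1:]))  # consecutive (from, to) pairs
    return edges


def outgoing_share(edges: Counter) -> Dict[Edge, float]:
    """Fraction of the transitions leaving each page that go to each next page."""
    out_totals: Counter = Counter()
    for (src, _dst), n in edges.items():
        out_totals[src] += n
    return {(src, dst): n / out_totals[src] for (src, dst), n in edges.items()}
```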


How? Remote Usability Testing

WebQuilt (GUIR)

Logging design goals:
- Extensible, scalable
- Allow for unobtrusive, “naturalistic” user interaction
- Multi-platform, multi-device compatibility
- Fast and easy to deploy on any website

Solution: a proxy-based logger that rewrites links
- Nearly invisible to the user
- Independent of the client browser
- Infers actions (e.g. Back button clicks)
- Stand-alone or used with other tools
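The core of a link-rewriting proxy logger is replacing each link in a served page so that the next click comes back through the proxy, carrying both the page it was clicked on and its target. Because Back navigations are often served from the browser cache without any request, they can then be inferred when a click originates from a page other than the last one served. A minimal Python sketch of both ideas; the proxy URL, query parameters, and function names are hypothetical rather than WebQuilt's actual scheme, and the regex-based rewriting only handles double-quoted `href` attributes:

```python
import re
from typing import List, Tuple
from urllib.parse import quote, urljoin

# Hypothetical logging-proxy endpoint (not WebQuilt's actual URL scheme).
PROXY = "http://proxy.example.com/log"
# Naive pattern for double-quoted href attributes; a real proxy would use an HTML parser.
HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def rewrite_links(html: str, page_url: str, proxy: str = PROXY) -> str:
    """Point every link at the proxy, recording which page it was clicked on."""
    def to_proxy(m: re.Match) -> str:
        target = urljoin(page_url, m.group(1))  # resolve relative links
        query = f"from={quote(page_url, safe='')}&amp;to={quote(target, safe='')}"
        return f'href="{proxy}?{query}"'
    return HREF.sub(to_proxy, html)


def infer_actions(clicks: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn logged (from_page, to_page) clicks into actions, inferring returns such as Back."""
    actions: List[Tuple[str, str]] = []
    current = None
    for src, dst in clicks:
        if current is None:
            actions.append(("start", src))
        elif src != current:
            # The click came from a page other than the last one served, so the user
            # returned to it without a logged request (e.g. Back, served from cache).
            actions.append(("back", src))
        actions.append(("click", dst))
        current = dst
    return actions
```

For instance, clicks ("/", "/cars"), ("/cars", "/cars/sentra"), ("/", "/dealers") are reported as a start at "/", two clicks, an inferred return to "/", and a click to "/dealers".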


How? Remote Usability Testing

WebQuilt (GUIR)

Visual analysis tool:
- Puts data within the context of the design
- Shows deviations from expected paths
- Interactive graph
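Showing deviations requires comparing each observed path with the path the designer expected for the task. A minimal Python sketch, assuming both paths are lists of URLs; the measures it reports (first deviation, extra steps, off-path pages, goal reached) are illustrative choices, not WebQuilt's:

```python
from typing import Dict, List, Optional


def path_deviation(observed: List[str], expected: List[str]) -> Dict[str, object]:
    """Compare one participant's page path with the designer's expected path."""
    first: Optional[int] = next(
        (i for i, (o, e) in enumerate(zip(observed, expected)) if o != e), None
    )
    if first is None and len(observed) != len(expected):
        first = min(len(observed), len(expected))  # one path is a prefix of the other
    expected_pages = set(expected)
    return {
        "first_deviation": first,  # index of the first differing step; None if identical
        "extra_steps": len(observed) - len(expected),
        "off_path": [url for url in observed if url not in expected_pages],
        "reached_goal": bool(expected) and expected[-1] in observed,
    }
```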


Study: Purpose

Exploratory comparison of lab and remote usability testing with mobile devices

What types of usability issues can we:
- find with either method?
- find with one that we can’t find with the other?

Design implications for:
- testing tools
- testing strategies


Study: The Mobile Web

- Limited and/or new interaction methods: small screens; Graffiti, keypads, thumb-pads
- Beyond the desktop: driving, traveling, walking; noisy, public

Gathering good usability data is vital to making these interfaces, and subsequently these devices, successful.


Study: Design

10 users asked to find:
- Anti-lock brake information on the latest Nissan Sentra
- The closest Nissan dealer

Site: http://pda.edmunds.com
Device: Handspring Visor Edge with OmniSky wireless modem
5 users in the lab, 5 users in the wild
Web-based questionnaires


Study: Identifying Usability Issues

Lab data: tester observations, participant comments, questionnaire
Remote data: clickstream analysis, questionnaire

Severity levels: 0 indicates a comment; 1–5 range from minor to critical

Four categories: Device, Browser, Site Design, Test Design


Study: Caveats

- Analysis and observation for both tests done by the same person
- Issues identified from the remote tests first
  - Avoids biasing remote analysis tools
  - Looking for potential problem areas


Study: Results

Totals: 18 unique issues, 7 found remotely

Category       Lab   Remote
Device           4        1
Browser          2        0
Test Design      6        2
Site Design      9        5

- Site Design: 5 of the 9 issues; 3 of the 4 with severity level > 3
  - 1/3 device or browser related
- Test Design: 2 of the 6 issues; 2 of the 4 with severity level > 3


Study: Process Observations

- Remote usability testing can capture some of the usability issues that lab testing already discovers
- Lab testing gets me: qualitative observations, think-aloud comments, non-content usability issues


Study: Process Observations

What can remote testing get us that labs can’t?
- No lab effect: quitting a task is easier when not in the lab; network problems are more realistic
- With more users: patterns emerge; can reduce uncertainty
- Faster


Study: Conclusions

Remote usability testing is a promising technique for capturing realistic usage data for mobile web site design.

Main concerns:
- Gathering user feedback on mobile devices is even more difficult because of limited input
- Understanding users can be ambiguous

Potentially alleviated by the ability to test a larger number of users


Discussion

Comments? Questions?
- Where does web log analysis fit into a design cycle?
- Understanding what methods to use when and where
- Experiences? These or other tools?

(Figure: the design cycle, Design → Prototype → Evaluate)