Lies, Damn lies and Web Statistics

Dr. Mike Lowndes, Interactive Media Manager, Natural History Museum, London. The museum houses 350 permanent scientific staff, plus postgraduate students, and is one of the largest UK research institutes in the natural sciences. IWMW 2005: Whose web is it anyway? Lies, Damn lies and Web Statistics

Post on 21-Jan-2016

TRANSCRIPT

Page 1: Lies, Damn lies and Web Statistics

Dr. Mike Lowndes, Interactive Media Manager, Natural History Museum, London

– Houses 350 permanent scientific staff, plus postgraduate students; one of the largest UK research institutes in the natural sciences.

IWMW 2005: Whose web is it anyway?

Lies, Damn lies and Web Statistics

Page 2: Lies, Damn lies and Web Statistics

Contents

• Why bother?
• Issues with web logs
• Issues with analytic tools
• Browser tracking
• Comparison between approaches
• Known issues with browser tracking
• Nedstat input and findings from Newcastle University

Page 3: Lies, Damn lies and Web Statistics

Why bother?

• Web log analysis is currently the main method used to quantify web site usage for reporting.

• Results are used by the government as performance indicators for institutional websites.

• Not accurate or meaningful most of the time
  – No good for absolute measurement of usage.

Can be used for:
• Trend analysis
• Content preferences
• ROI estimation
• Checking and fixing your site
• Understanding user behaviour
• Testing assumed pathways

Page 4: Lies, Damn lies and Web Statistics

Issues with server logs

• Dynamic IP
  – Many users using the same IP number over time.
  – Same user assigned many IP numbers over time.
• Proxies
  – Several or many users behind 1 IP number
• Caches (can be ‘in’ Proxies)
  – Commonly requested files cached closer to the users.
  – Can form the top 20-50 hosts accessing sites.
• Robots and spiders
  – Few visits but lots of hits.
  – Analytic packages cannot keep up to date with all of them for exclusion.
• Syndication
  – RSS feeds generate huge logs, but are not ‘read’ by humans initially.
  – Click-through configuration.
• Reporting by analysis tools
  – Often weekly or monthly reports: realtime is very labour/server intensive
  – Reports often complex and techy.
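The proxy and robot issues above can be sketched in a few lines. This is a minimal illustration only: the log records, the `ROBOT_MARKERS` list and the `summarise` helper are all hypothetical, and real analysis packages rely on much larger, regularly updated robot lists.

```python
# Hypothetical, deliberately tiny list of user-agent substrings that mark robots.
ROBOT_MARKERS = ("bot", "spider", "crawler")

# Hypothetical parsed log records: (ip, user_agent, path).
LOG = [
    ("10.0.0.1", "Mozilla/4.0 (MSIE 6.0)", "/visiting/index.html"),
    # A second person behind the same proxy shares the IP number.
    ("10.0.0.1", "Mozilla/5.0 (Firefox/1.0)", "/visiting/index.html"),
    # One robot: a single "visitor" but many hits.
    ("10.0.0.2", "Googlebot/2.1", "/visiting/index.html"),
    ("10.0.0.2", "Googlebot/2.1", "/visiting/events.html"),
    ("10.0.0.2", "Googlebot/2.1", "/visiting/map.html"),
    ("10.0.0.3", "Mozilla/4.0 (MSIE 6.0)", "/visiting/map.html"),
]


def is_robot(user_agent):
    """Return True if the user agent contains any known robot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def summarise(records, exclude_robots):
    """Count hits and 'unique hosts' under two different host definitions."""
    kept = [r for r in records if not (exclude_robots and is_robot(r[1]))]
    return {
        "hits": len(kept),
        "hosts_by_ip": len({ip for ip, _, _ in kept}),
        "hosts_by_ip_and_agent": len({(ip, ua) for ip, ua, _ in kept}),
    }


print(summarise(LOG, exclude_robots=False))
print(summarise(LOG, exclude_robots=True))
```

In this toy log, excluding the robot halves the hit count, and counting hosts by IP alone merges the two people behind the shared proxy into one.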

Page 5: Lies, Damn lies and Web Statistics

Issues with log analysis tools

• Webtrends vs Summary.net

• 1. Natural History Museum
  – Summary SP (summary.net) Version 4.2.1, unregistered demo, default configuration
• 2. UKOLN (Bath)
  – WebTrends (www.webtrends.com) Version 5, default configuration
• Both tools were applied to the same log file
• Default configurations – not removing robots
  – Note: WebTrends documentation not clear on this point

Page 6: Lies, Damn lies and Web Statistics

Measurement discrepancies

                            Summary SP    Webtrends 7
Connections (hits)          -             +0.67%
Page views (page hits)      -             +5.00%
Visits (user sessions)      -             +0.07%
Failed hits                 -             +0.30%
Average visit duration      -             -30.0% (+250%)

Browsers
IE                          75%           86%
Netscape compatible         2%            4%

Referrers
Top Level Domains           US            US
                            UK            UK
                            AUS           CAN
                            NETHER        NETHER
                            CAN           AUS
                            JAP           JAP

Page 7: Lies, Damn lies and Web Statistics

Comparison between tools

• Not a single measurement was identical.
• Most measurements were within 5%
• Visit duration measurement widely different, and can depend on configuration. Possible bug in WebTrends version 5.

• Page view measurements were quite different.

Results broadly similar but direct comparisons, especially of Page Views, are not really justified.
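One way visit counts and visit duration can depend on configuration is the idle timeout used to split requests into visits. The sketch below is illustrative only: the request times and both timeout values are hypothetical, and the slides do not say which settings either tool used.

```python
def sessionise(timestamps, idle_timeout):
    """Split one visitor's sorted request times (in seconds) into visits.

    A new visit starts whenever the gap since the previous request exceeds
    idle_timeout. Assumes timestamps is non-empty and sorted.
    """
    visits = []
    current = [timestamps[0]]
    for t in timestamps[1:]:
        if t - current[-1] > idle_timeout:
            visits.append(current)
            current = [t]
        else:
            current.append(t)
    visits.append(current)
    return visits


def average_duration(visits):
    """Average of (last request - first request) across visits, in seconds."""
    durations = [v[-1] - v[0] for v in visits]
    return sum(durations) / len(durations)


# Hypothetical request times for a single visitor, in seconds.
REQUESTS = [0, 60, 120, 1500, 1560]

for timeout in (1800, 600):  # hypothetical 30-minute and 10-minute settings
    visits = sessionise(REQUESTS, timeout)
    print(timeout, len(visits), average_duration(visits))
```

With the longer timeout the same five requests form one long visit; with the shorter one they form two brief visits, so both the visit count and the average duration change without any change in the underlying log.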

Page 8: Lies, Damn lies and Web Statistics

Browser tracking

• Does it have fewer inaccuracies and distortions?
• Is it easier on the web team?
• Is it affordable?
• Does it give us more information / better information?

Page 9: Lies, Damn lies and Web Statistics

Browser tracking

• Requires code to be added to pages
• Uses an image, sourced from the tracking website. Also uses JavaScript and cookies for gathering extended and repeat-visit information
• Usually hosted services
• Provide near real-time tracking
• Few of the issues distorting logs affect these measurements (according to the blurb)
• Main players: Nedstat, Nielsen/NetRatings, WebSideStory
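The image-plus-cookie mechanism can be sketched from the tracking service's side. This is a hedged illustration, not any vendor's implementation: the `handle_pixel_request` function, the `visitor_id` cookie name, the in-memory `PAGE_VIEWS` list and the response headers are all assumptions made for the example.

```python
import base64
import uuid
from http.cookies import SimpleCookie

# A widely used 1x1 GIF, base64-encoded; the page embeds it as an <img>.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# Stands in for the hosted service's database (hypothetical).
PAGE_VIEWS = []


def handle_pixel_request(page, cookie_header=""):
    """Record one page view and return (headers, body) for the image response.

    page is the tagged page reported by the tracking code; cookie_header is
    the raw Cookie header sent by the browser, if any.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if "visitor_id" in cookies:
        visitor_id = cookies["visitor_id"].value  # returning browser
        repeat = True
    else:
        visitor_id = uuid.uuid4().hex  # first sighting: issue a new id
        repeat = False
    PAGE_VIEWS.append({"page": page, "visitor": visitor_id, "repeat": repeat})
    headers = [
        ("Content-Type", "image/gif"),
        # Assumption: ask caches not to store the image, so every page view
        # results in a request to the tracking server.
        ("Cache-Control", "no-store"),
        ("Set-Cookie", f"visitor_id={visitor_id}; Max-Age=31536000; Path=/"),
    ]
    return headers, PIXEL
```

Because the count is driven by the browser fetching the image rather than by the site's own server log, a page served from a proxy cache can still register a view, and the cookie is what allows repeat visits to be recognised. If the cookie is deleted or refused, the same browser is counted as a new visitor.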

Page 10: Lies, Damn lies and Web Statistics

Comparison between tools

• Summary SP vs Nielsen/NetRatings
• Run on one section of a site over a month.
• ‘Visiting’ section of the Natural History Museum site – small but popular and easily tagged.

Page 11: Lies, Damn lies and Web Statistics

Results 1 – visits and visitors

Visits / User sessions                 27,663     40,402    -32%     35,395
Visits per day (ave)                      922      1,347              1,180
Visits per visitor per month (ave)        1.1        1.7                1.5
Unique visitors (browsers)             25,127     23,585             23,084
Pages per visit (ave)                    3.31          3                2.1
Visit duration (ave)                    02:09      07:13              04:08
Page impressions                       91,506    117,447             71,895
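The derived figures in this table can be checked with simple arithmetic. The column labels are not preserved in the transcript, so the sketch below refers only to the "first" and "second" columns; reading -32% as the relative difference between them, and taking the month as 30 days, are assumptions that happen to reproduce the printed values.

```python
def percent_difference(value, baseline):
    """Relative difference of value from baseline, as a percentage."""
    return (value - baseline) / baseline * 100


# Figures copied from the first two columns of the table.
visits_first, visits_second = 27_663, 40_402
impressions_first = 91_506
DAYS_IN_PERIOD = 30  # assumption: a 30-day month

diff = percent_difference(visits_first, visits_second)
pages_per_visit = impressions_first / visits_first
visits_per_day = visits_first / DAYS_IN_PERIOD

print(f"{diff:.0f}%")             # -32%
print(f"{pages_per_visit:.2f}")   # 3.31
print(f"{visits_per_day:.0f}")    # 922
```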

Page 12: Lies, Damn lies and Web Statistics

Results 2 – pages viewed

Top 10                               Browser track    Log analysis
index.html, Visiting home                   31,117          28,591
where are we? page                          17,897          26,566
planning your visit page                     6,835          16,773
events calendar page                         9,221           9,369
howtogethere - local map page                4,700           5,005
access guide introduction page               1,978           4,653
travel details page                          3,550           3,668
facilities page                              2,767           3,497
activities page                              3,293           3,375
multilingual info.                             828           1,901
top ten totals                              82,186         103,398
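A short script makes the per-page pattern easier to see. The figures are copied from the table; the `relative_gap` helper and the variable names are hypothetical. It checks the printed totals and lists any page where log analysis reports fewer views than browser tracking.

```python
# (browser track, log analysis) page views, copied from the table.
TOP_TEN = {
    "index.html, Visiting home": (31_117, 28_591),
    "where are we? page": (17_897, 26_566),
    "planning your visit page": (6_835, 16_773),
    "events calendar page": (9_221, 9_369),
    "howtogethere - local map page": (4_700, 5_005),
    "access guide introduction page": (1_978, 4_653),
    "travel details page": (3_550, 3_668),
    "facilities page": (2_767, 3_497),
    "activities page": (3_293, 3_375),
    "multilingual info.": (828, 1_901),
}


def relative_gap(browser, log):
    """How far the log figure sits above (+) or below (-) browser tracking, in %."""
    return (log - browser) / browser * 100


totals = (
    sum(browser for browser, _ in TOP_TEN.values()),
    sum(log for _, log in TOP_TEN.values()),
)
undercounted_by_logs = [
    page for page, (browser, log) in TOP_TEN.items() if log < browser
]

print(totals)
print(undercounted_by_logs)
for page, (browser, log) in TOP_TEN.items():
    print(f"{page}: {relative_gap(browser, log):+.0f}%")
```

Only the section home page, the most requested page in the list, has a lower figure in the logs than in browser tracking; every other page is higher in the logs. That is consistent with heavily requested pages being served from caches, though the table alone does not establish the cause.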

Page 13: Lies, Damn lies and Web Statistics

Results 3 – country

Countries    Browser tr.     GeoIP (Sum.)
             uk 75%          uk 62%
             us 5%           us 8%
             spain           spain
             italy           netherlands
             netherlands     germany
             france          italy
             germany         france
             belgium         canada
             poland          poland

• Depends on the quality of the geographical IP database, not the mode of tracking?

Page 14: Lies, Damn lies and Web Statistics

Conclusions regarding traditional Log analysis

Assuming browser tracking is more accurate…
• We have fewer visit sessions than we thought, but more visitors
  – Fewer visits (sessions), possibly due to robot exclusion
  – More visitors (unique users), possibly due to the masking effect of proxies/caches and browser caches
• Visit duration is much shorter than thought
  – Possibly due to robots/spiders and cache updating.

• Country information is roughly accurate so long as a geographical lookup is used.

• Activity of popular pages, which are often cached, will be underestimated

Page 15: Lies, Damn lies and Web Statistics

Browser tracking advantages

• Almost real-time analysis, incremental data.
• Better repeat user tracking and individual pathway analysis.
• Configurable, graphical reports for non-techies
  – Techie still needs to configure those reports however, as an understanding of web analytics is required

• Cut our monthly staff time down from 1.5 days to 1 hour

• Appear to be more accurate in describing the activity of real people, but we would like to see some independent research.

Page 16: Lies, Damn lies and Web Statistics

Issues with browser tracking

• Setup is not trivial: You need to add code to every page.
  – Multiple server / ownership issues.
• Does not always work (or get full user details) if JavaScript is turned off or cookies disallowed.
• Does not work with text-only browsers.
• Unknown compatibility with PDAs, mobiles etc.

Questions:
• Would we get different results with different hosted services?
  – ABCE: industry standards for measurement
• Cookies often deleted unless user is confident in the source?
  – This would affect the measurement of repeat visitors and behaviour

Political issues:
• Issues with external hosting of institutional data
• Security of personal data issues with external hosting
  – E.g. measurements of student and staff use of a VLE.

Page 17: Lies, Damn lies and Web Statistics

Next steps

• Many private sector and public sector sites have already moved to browser tracking.

• About 6 National Museums are currently discussing hosted browser tracking.

• 5 Universities currently involved in a trial of NedStat.

Page 18: Lies, Damn lies and Web Statistics

Thank you