Web Usage Patterns Ryan McFadden IST 497E December 5, 2002

Web Usage Patterns

Ryan McFadden, IST 497E, December 5, 2002

Introduction

Web Data Mining

Application Areas of Web Data Mining

Problems with Web Data Mining

Current Research

Nielsen//NetRatings

Other Issues – Privacy, Security, etc.

Conclusions

Web Data Mining

Web Data Mining is the application of data mining techniques to discover and retrieve useful information and patterns from World Wide Web documents and services.

What web data is being mined?

Content – data from Web documents – text & graphics

Structure – data from Web structure – HTML or XML tags

Usage – data from Web logs – IP addresses, date & time of access

User Profile – data that is user specific – registration and customer profile
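As an illustration of the usage data listed above, the following minimal sketch pulls the IP address and access time out of one server log line. The log line is made up, and the NCSA Common Log Format layout is an assumption; the slides do not say which log format was used.

```python
import re
from datetime import datetime

# Hypothetical log line in the NCSA Common Log Format (not from the slides).
LOG_LINE = '192.0.2.10 - - [05/Dec/2002:10:15:32 -0500] "GET /index.html HTTP/1.0" 200 2326'

# One named group per field of interest.
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of usage fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp text into a timezone-aware datetime.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

record = parse_log_line(LOG_LINE)
```

Each parsed record then carries the IP address and the date and time of access, the raw material for usage mining.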

Web Data Mining Process

Web Data Mining Process Tasks

Resource finding: the task of retrieving the intended Web documents

Information selection and pre-processing: automatically selecting and pre-processing specific information from the retrieved Web resources

Generalization: automatically discovering general patterns at individual Web sites as well as across multiple sites

Analysis: validation and/or interpretation of the mined patterns
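The four tasks can be walked through on a toy example. This is only a sketch: the two sites, their page texts, and the function names are invented, and "pattern" is reduced here to a word that shows up on more than one site.

```python
from collections import Counter

# Hypothetical in-memory "Web": site -> {url: page text}. Not from the slides.
SITES = {
    "shop.example": {"/a": "cheap laptop deals", "/b": "laptop bags and cases"},
    "news.example": {"/x": "laptop sales rise", "/y": "weather report today"},
}

def find_resources(sites, keyword):
    """Resource finding: retrieve the pages that mention the keyword."""
    return [(site, url, text)
            for site, pages in sites.items()
            for url, text in pages.items()
            if keyword in text.split()]

def preprocess(resources):
    """Selection and pre-processing: keep a lowercase word set per page."""
    return [(site, set(text.lower().split())) for site, _url, text in resources]

def generalize(pages):
    """Generalization: count on how many distinct sites each word occurs."""
    sites_per_word = {}
    for site, words in pages:
        for word in words:
            sites_per_word.setdefault(word, set()).add(site)
    return Counter({word: len(s) for word, s in sites_per_word.items()})

def analyze(counts, min_sites):
    """Analysis: keep only the words seen on at least min_sites sites."""
    return sorted(word for word, n in counts.items() if n >= min_sites)

resources = find_resources(SITES, "laptop")
cross_site_words = analyze(generalize(preprocess(resources)), min_sites=2)
```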

Application Areas for Web Usage Mining

Personalization

System Improvement

Site Modification

Business Intelligence

Usage Characterization

Personalization

Personalizing the Web experience for a user is the holy grail of many Web-based applications

Dynamic recommendations to a Web user are based on a profile in addition to usage behavior

Products, services, and information (or information relating to products or services) are tailored to the individual
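One way to combine a profile with usage behavior is sketched below. The sessions, page categories, and the boost factor are all hypothetical; real recommenders are considerably more involved.

```python
from collections import Counter
from itertools import permutations

# Hypothetical past sessions (sets of pages viewed together).
SESSIONS = [
    {"/laptops", "/bags", "/mice"},
    {"/laptops", "/bags"},
    {"/laptops", "/mice"},
    {"/phones", "/cases"},
]
# Hypothetical page categories, matched against the user's profile interests.
CATEGORY = {"/laptops": "computers", "/bags": "accessories",
            "/mice": "computers", "/phones": "mobile", "/cases": "accessories"}

def co_visit_counts(sessions):
    """Usage behavior: how often page b was seen in a session with page a."""
    counts = Counter()
    for session in sessions:
        for a, b in permutations(session, 2):
            counts[(a, b)] += 1
    return counts

def recommend(current_pages, profile_interests, sessions, top_n=2, boost=1.5):
    """Score co-visited pages, then boost those matching the profile."""
    scores = Counter()
    for (a, b), n in co_visit_counts(sessions).items():
        if a in current_pages and b not in current_pages:
            scores[b] += n
    for page in scores:
        if CATEGORY.get(page) in profile_interests:
            scores[page] *= boost
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]
```

The same usage data yields different rankings for different profiles, which is the point of adding the profile to the usage signal.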

System Improvement

Performance and other service quality attributes are crucial to user satisfaction, and high-quality performance of a Web application is expected

Web usage mining provides a key to understanding Web traffic behavior, which can inform policies on Web caching, network transmission, load balancing, or data distribution

Web usage mining is also useful for detecting intrusion, fraud, and attempted break-ins to the system
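A very simple form of break-in detection from log data can be sketched as a threshold on denied requests. The request list and the threshold are assumptions for illustration; this is not a real intrusion detection method from the cited work.

```python
from collections import Counter

# Hypothetical (client IP, HTTP status) pairs taken from an access log.
REQUESTS = [
    ("198.51.100.7", 401), ("198.51.100.7", 401), ("198.51.100.7", 403),
    ("198.51.100.7", 401), ("192.0.2.10", 200), ("192.0.2.10", 401),
    ("203.0.113.5", 200),
]

def suspicious_ips(requests, max_failures=3):
    """Return IPs with more than max_failures denied requests (401 or 403)."""
    failures = Counter(ip for ip, status in requests if status in (401, 403))
    return sorted(ip for ip, count in failures.items() if count > max_failures)
```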

Site Modification

This application of Web usage patterns concerns the attractiveness of a Web site, in terms of content and structure

Web usage mining can provide detailed feedback on user behavior, giving the Web site designer information on which to base redesign decisions

This could lead to future applications where the structure and content of a Web site change automatically based on usage patterns
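One kind of feedback a designer might look at is the exit rate per page: how often a page is the last one viewed in a session. The sessions below are hypothetical, and exit rate is just one possible metric, not one named in the slides.

```python
from collections import Counter

# Hypothetical sessions: ordered lists of pages viewed by one visitor.
SESSIONS = [
    ["/home", "/products", "/checkout"],
    ["/home", "/products"],
    ["/home", "/search", "/products"],
    ["/home"],
]

def exit_rates(sessions):
    """Fraction of a page's views that were the last view of a session."""
    views = Counter(page for session in sessions for page in session)
    exits = Counter(session[-1] for session in sessions if session)
    return {page: exits[page] / views[page] for page in views}

rates = exit_rates(SESSIONS)
```

A page with a high exit rate that is not meant to end a visit (here `/products`) is a candidate for redesign.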

Business Intelligence

Information on how customers are using a Web site is critical for marketers of e-commerce businesses

Customer relationship life cycle: customer attraction, customer retention, cross sales, customer departure

Usage mining can provide information on products bought and advertisement click-through rates
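The click-through rate of an advertisement is its number of clicks divided by its number of impressions. A minimal sketch, using made-up ad events:

```python
from collections import Counter

# Hypothetical ad events: (ad id, event type).
EVENTS = [
    ("banner-1", "impression"), ("banner-1", "impression"),
    ("banner-1", "click"), ("banner-1", "impression"),
    ("banner-1", "impression"), ("banner-2", "impression"),
]

def click_through_rates(events):
    """Clicks divided by impressions, for every ad that was shown."""
    impressions = Counter(ad for ad, kind in events if kind == "impression")
    clicks = Counter(ad for ad, kind in events if kind == "click")
    return {ad: clicks[ad] / shown for ad, shown in impressions.items()}

ctr = click_through_rates(EVENTS)
```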

Usage Characterization

Mining of web usage patterns can help in the study of how browsers are used and the user’s interaction with a browser interface

Usage characterization can also look into navigational strategy when browsing a particular site

Web usage mining focuses on techniques that could predict user behavior while the user interacts with the Web
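A common baseline for predicting user behavior is a first-order Markov model over page transitions. The slides do not name a specific technique, so this choice and the sample sessions are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical navigation sessions (ordered page views).
SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/search"],
    ["/products", "/cart"],
]

def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, page):
    """Most frequent follower of page, or None if the page was never left."""
    followers = transitions.get(page)
    if not followers:
        return None
    # Iterate in sorted order so ties resolve alphabetically.
    return max(sorted(followers), key=lambda p: followers[p])

transitions = build_transitions(SESSIONS)
```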

Problems with Web Data Mining

The World Wide Web is a huge, diverse, and dynamic medium for the dissemination of information. There may be too much information to mine (information overload), and much of it is irrelevant and not indexed

Other problems with Web Data Mining:

Finding relevant information to mine

Personalization & mass customization are difficult

E-commerce businesses have to know what the customers want

Current Research

WebSIFT example

Data Mining for Intelligent Web Caching

Areas of Future Research

WebSIFT Example

Web Site Information Filter System (WebSIFT) is a Web usage mining framework that uses the content and structure information from a Web site to identify the interesting results from mining usage data

Input of the mining process: server logs (access, referrer, and agent), HTML files, optional data

Prototypical Web usage mining system

Data Mining for Intelligent Web Caching

An application based on data warehouse technology that is capable of adapting its behavior based on the access patterns of the clients/users

It uses an algorithm to maximize the hit rate: the percentage of requested Web entities that are retrieved directly from the cache, without requesting them from the origin server

This approach enhances least recently used (LRU) caching with data mining models based on historical data, aimed at increasing the hit rate
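To make the hit rate concrete, here is a minimal sketch of plain least recently used caching and its hit rate on a request stream. It covers only the LRU baseline; the mined models that the cited work adds on top are not reproduced, and the request stream is invented.

```python
from collections import OrderedDict

def lru_hit_rate(requests, capacity):
    """Simulate an LRU cache and return the fraction of requests served from it."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests) if requests else 0.0

# Hypothetical request stream.
REQUESTS = ["a", "b", "a", "c", "b", "a"]
```

Running the same stream with different capacities shows how the hit rate depends on what the cache keeps, which is the quantity a smarter eviction policy tries to raise.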

Areas of Future Research

Data mining in the following application areas: electronic commerce, bioinformatics, computer security, Web intelligence, intelligent learning, database systems, finance, marketing, healthcare, telecommunications, and other fields

Nielsen//NetRatings

What are they?

What is the purpose?

Current NetRatings for home and work

Nielsen//NetRatings – What are they?

This service is provided via a partnership between NetRatings, Nielsen Media Research, and ACNielsen

The service measures Internet audiences and reports Internet usage estimates based on a sample of households that have access to the Internet

Nielsen//NetRatings – What is the purpose?

The purpose of the Nielsen//NetRatings service is to provide a source of global information on consumer and business usage of the Internet

This information helps companies make business-critical decisions

Average Web Usage at Home – Month of October 2002, US Data

| Metric | Value |
|---|---|
| Number of Sessions per Month | 23 |
| Number of Unique Sites Visited | 49 |
| Time Spent per Month (hh:mm:ss) | 12:06:56 |
| Time Spent During Surfing Session (mm:ss) | 32:03 |
| Duration of a Page Viewed (m:ss) | 0:55 |
| Active Internet Universe | 106,567,327 |
| Current Internet Universe Estimate | 168,366,482 |

Average Web Usage at Work – Month of October 2002, US Data

| Metric | Value |
|---|---|
| Number of Sessions per Month | 56 |
| Number of Unique Sites Visited | 95 |
| Time Spent per Month (hh:mm:ss) | 31:08:04 |
| Time Spent During Surfing Session (mm:ss) | 33:21 |
| Duration of a Page Viewed (m:ss) | 1:01 |
| Active Internet Universe | 47,844,347 |
| Current Internet Universe Estimate | 53,057,035 |
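As a sanity check on the home and work figures, the average session length should be roughly the monthly time divided by the number of sessions. The sketch below does that arithmetic. The helper names are made up, and small differences from the reported session lengths (about 32 and 33 minutes) are to be expected, presumably because the published session counts are rounded.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def average_session_seconds(time_per_month, sessions_per_month):
    """Monthly time online divided by the number of sessions."""
    return to_seconds(time_per_month) / sessions_per_month

def format_mmss(seconds):
    """Render a duration as 'm:ss', dropping fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"

# Figures from the October 2002 US home and work data.
home = format_mmss(average_session_seconds("12:06:56", 23))
work = format_mmss(average_session_seconds("31:08:04", 56))
```

The work figure comes out at 33:21, matching the reported session length; the home figure comes out at 31:36, close to the reported value.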

September 2002 Global Internet Index, Average Usage (* Home Internet Access)

| Metric | September | August | % Change |
|---|---|---|---|
| Number of Sessions per Month | 19 | 19 | 1.99 |
| Number of Unique Domains Visited | 49 | 48 | 0.77 |
| Page Views per Month | 778 | 785 | -0.97 |
| Page Views per Surfing Session | 40 | 41 | -2.9 |
| Time Spent per Month | 10:17:45 | 10:17:44 | 0 |
| Time Spent During Surfing Session | 0:31:44 | 0:32:22 | -1.95 |
| Duration of a Page Viewed | 0:00:48 | 0:00:47 | 0.98 |
| Active Internet Universe | 220,444,008 | 218,038,452 | 1.1 |
| Current Internet Universe Estimate | 385,564,028 | 385,998,080 | -0.11 |
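The % Change column is the month-over-month relative change, (September - August) / August × 100. The sketch below checks it on the two universe rows, which are reported unrounded. Rows such as sessions per month (19 vs 19, listed as 1.99) cannot be reproduced this way because their published values are rounded.

```python
def percent_change(previous, current):
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0

# August and September 2002 figures from the Global Internet Index.
active_change = round(percent_change(218_038_452, 220_444_008), 2)
estimate_change = round(percent_change(385_998_080, 385_564_028), 2)
```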

Other Issues

Privacy

Security

Intellectual Ownership

Visual Data Mining

Risk Analysis

Conclusions

Mining Web usage data for patterns is a growing area, driven by the growth of Web-based applications

Web usage data can be used to better understand how the Web is used, and that knowledge can be applied to serve users better

Web usage patterns and data mining can be the basis for a great deal of future research

Any Questions?

References

Data Mining for Intelligent Web Caching – Francesco Bonchi, Fosca Giannotti, Giuseppe Manco, Mirco Nanni, Dino Pedreschi, Chiara Renso, Salvatore Ruggieri

IEEE International Conference on Data Mining – http://www.cs.uvm.edu/~xwu/icdm.html

Nielsen//NetRatings – http://www.nielsen-netratings.com

Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data – Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, Dept of CSE, University of Minnesota

Web Mining: Pattern Discovery from World Wide Web Transactions

Web Mining Research: A Survey – Raymond Kosala, Hendrik Blockeel, Dept of CS, Katholieke Universiteit Leuven