Web Usage Mining: An Overview


Page 1: Web Usage  Mining: An Overview

Web Usage Mining: An Overview

Lin Lin Department of Management

Lehigh University Jan. 30th

Page 2: Web Usage  Mining: An Overview

Agenda

• Web Usage Mining: Definition
• Research Issues in Web Usage Mining
• Current Research in Web Usage Mining
• Going Forward

Page 3: Web Usage  Mining: An Overview

Web Usage Mining: A Definition

• The process of applying data mining techniques to the discovery of usage patterns from Web data, targeted towards various applications
• Different from content mining & structure mining

(Adamic, L. A., and Adar, E. 2003. Friends and neighbors on the web. Social Networks 25(3):211–230.)

Page 4: Web Usage  Mining: An Overview

Web Usage Mining: Data Source

• Typical data sources for web usage mining are:
– Web structure data (site map, links, etc.)
– Web content data
– User profile (may not be available)
– Web log (web usage data, clickstream data)
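To make the "web log" source concrete, here is a minimal sketch of turning one server log line into a structured record. It assumes the log is in the NCSA Common Log Format; the sample line, the IP address, and the names `CLF_PATTERN` and `parse_log_line` are illustrative and do not come from the slides.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
# (assumed format; real servers are often configured differently)
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

# Hypothetical log line, for illustration only.
sample = '192.0.2.7 - - [30/Jan/2008:10:15:32 -0500] "GET /products/index.html HTTP/1.0" 200 2326'
record = parse_log_line(sample)
print(record["host"], record["path"], record["status"])
```

Each parsed record carries the three fields that most later steps rely on: who (host), when (time), and what (path).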

Page 5: Web Usage  Mining: An Overview

Web Usage Mining: Procedure

Page 6: Web Usage  Mining: An Overview

Preprocessing: Challenges

• WHO are the users?
– IP vs. real people
• HOW LONG did the users stay?
– Measuring session time
(L. Catledge and J. Pitkow. Characterizing browsing behaviors on the world wide web. Computer Networks and ISDN Systems, 27(6), 1995)
(B. Berendt, B. Mobasher, M. Nakagawa, and M. Spiliopoulou. The impact of site structure and user environment on session reconstruction in web usage analysis. In Proceedings of the 4th WebKDD 2002 Workshop, at the ACM-SIGKDD Conference on Knowledge Discovery in Databases (KDD’2002), Edmonton, Alberta, Canada, July 2002.)
• WHERE did the users go?
– Server side vs. Client side
• WHAT did the users view?
– Content processing
(Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational click-stream. J. Consumer Psych. 13(1, 2) 29–40.)

For the best review on preprocessing methods, refer to: R. Cooley, B. Mobasher, J. Srivastava, Data preparation for mining world wide web browsing patterns, Knowledge and Information Systems 1 (1) (1999) 5–32
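The HOW LONG challenge is usually handled with a time-based heuristic: a new session starts when a user has been idle for longer than some cutoff. The sketch below assumes clicks arrive as (user key, timestamp in seconds, page) tuples and uses a 25.5-minute cutoff as the default; the function name `sessionize` and the sample clicks are hypothetical. Note that the user key here is just an IP-like string, so the WHO problem (IP vs. real people) is left unsolved.

```python
from collections import defaultdict

def sessionize(clicks, timeout_minutes=25.5):
    """Group (user_key, timestamp_seconds, page) clicks into sessions.

    A new session starts whenever the gap since the same user's previous
    click exceeds the timeout.
    """
    timeout = timeout_minutes * 60
    by_user = defaultdict(list)
    for user, ts, page in clicks:
        by_user[user].append((ts, page))

    sessions = []
    for user, events in by_user.items():
        events.sort()  # order each user's clicks by time
        current = [events[0][1]]
        for (prev_ts, _), (ts, page) in zip(events, events[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions

# Hypothetical clickstream, for illustration only.
clicks = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.1", 120, "/catalog"),
    ("10.0.0.1", 4000, "/home"),   # gap of 3880 s exceeds 1530 s: new session
    ("10.0.0.2", 60, "/home"),
]
sessions = sessionize(clicks)
print(sessions)
```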

Page 7: Web Usage  Mining: An Overview

Usage Pattern Discovery: Methods

• Statistical Methods (including dependency modeling and stochastic modeling)
• Association Rule Mining
• Clustering (user cluster vs. page cluster)
• Classification
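As one illustration of the methods listed above, association rule mining over sessions can be sketched with plain support and confidence counts on page pairs. This is a simplified stand-in for a full Apriori-style miner; the thresholds, the function name `pair_rules`, and the sample sessions are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules over pairs of pages."""
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical sessions, for illustration only.
sessions = [
    ["/home", "/catalog", "/cart"],
    ["/home", "/catalog"],
    ["/home", "/about"],
    ["/catalog", "/cart"],
]
rules = pair_rules(sessions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Support is the share of sessions containing both pages; confidence is the share of sessions containing the left-hand page that also contain the right-hand page.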

Page 8: Web Usage  Mining: An Overview

Usage Pattern Discovery: Research Streams

• Why am I interested in web usage mining? (a.k.a., what’s the big deal?)
– Blattberg, Robert C. and John Deighton (1991), "Interactive Marketing: Exploring the Age of Addressability," Sloan Management Review, 33 (1), 5-14
– Ghosh, S. 1998. Making business sense of the Internet. Harvard Business Review 76(2) 126–135
– Bucklin, R. E., Lattin, J. M., Ansari, A., Bell, D., Coupey, E., Gupta, S., Little, J. D. C., Mela, C., Montgomery, A., Steckel, J. Choice and the Internet: From Clickstream to Research Stream. Marketing Letters, 2002, Vol. 13, No. 3, pp. 245-258

Page 9: Web Usage  Mining: An Overview

Usage Pattern Discovery: Research Streams

• Lin’s two cents on current research streams
– Build a better site:
  • For everybody – system improvement (caching & web design)
  • For individuals – personalization
  • For search engines – SEO
– Know your visitors better:
  • Customer behavior
– Be a better business

Page 10: Web Usage  Mining: An Overview

Build a Better Site: System Improvement

• Server-side caching of web pages (association rules)
– Y.-H. Wu, A.L.P. Chen, Prediction of web page accesses by proxy server log, World Wide Web 5 (1) (2002) 67–88
– Preprocessing: No IP discussion, sessions split by time-based heuristics
– Method: Sequential pattern mining
– Data: Usage
– Contribution: Use frequent sequence to predict candidate page, “personalize” based on user maturity
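The idea of using frequent sequences to pick a candidate page for caching can be sketched with the simplest possible sequence: pairs of consecutive pages. This is not the algorithm of Wu and Chen; it is a minimal illustration, and the names `build_successors` and `predict_next`, the `min_count` threshold, and the sample sessions are assumptions.

```python
from collections import Counter, defaultdict

def build_successors(sessions, min_count=2):
    """Count how often page B directly follows page A; keep frequent pairs."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        page: [(nxt, c) for nxt, c in followers.most_common() if c >= min_count]
        for page, followers in counts.items()
    }

def predict_next(successors, page):
    """Return the most frequent follower of a page, or None if none is frequent."""
    candidates = successors.get(page, [])
    return candidates[0][0] if candidates else None

# Hypothetical sessions, for illustration only.
sessions = [
    ["/home", "/catalog", "/cart"],
    ["/home", "/catalog", "/item"],
    ["/home", "/catalog", "/cart"],
    ["/about", "/home"],
]
successors = build_successors(sessions)
print(predict_next(successors, "/home"))
print(predict_next(successors, "/catalog"))
```

A cache could prefetch the predicted page while the user is still reading the current one; longer sequences would give sharper predictions at the cost of sparser counts.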

Page 11: Web Usage  Mining: An Overview

Build a Better Site: System Improvement

• Improvement of general web design (AR, SP, MM)
– Fang, X. and O. R. L. Sheng (2004). Link Selector: A web mining approach to hyperlink selection for web portals. ACM Transactions on Internet Technology 4, 209–237
– Preprocessing: No IP distinguished, sessions split by 25.5 minutes
– Method: Association mining
– Data: Usage & Structure
– Contribution: Combine structure info. and usage info. to optimize portal page design
• Where are we headed: adaptive web design
– Y. Fu, M. Creado, C. Ju, Reorganizing web sites based on user access patterns, in: Proceedings of the Tenth International Conference on Information and Knowledge Management, ACM Press, 2001, pp. 583–585 (usage & content)

Page 12: Web Usage  Mining: An Overview

Build a Better Site: Personalization

• Personalize the web site based on usage patterns (AR, Clustering)
– A key research domain: recommender systems*
– Content clustering vs. users clustering vs. hybrid approach
– C. Shahabi and F. Banaei-Kashani. Efficient and anonymous web usage mining for web personalization. INFORMS Journal on Computing, Special Issue on Data Mining, 2002
– Method: Clustering of sessions
– Data: Client side usage data
• Where are we headed: incorporate time and web 2.0
– *: Refer to Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749 for a good review on recommender systems
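Clustering of sessions can be illustrated with a very small example: treat each session as a set of pages, measure overlap with Jaccard similarity, and assign each session to the first cluster whose leader is similar enough. This leader-style clustering is a deliberately simple stand-in, not the method of Shahabi and Banaei-Kashani; the threshold and the sample sessions are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sessions viewed as sets of pages."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def leader_cluster(sessions, threshold=0.5):
    """Greedy clustering: each cluster is a list of session indices,
    and its first member acts as the leader that new sessions are compared to."""
    clusters = []
    for i, session in enumerate(sessions):
        for cluster in clusters:
            if jaccard(session, sessions[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar leader found: start a new cluster
    return clusters

# Hypothetical sessions, for illustration only.
sessions = [
    ["/home", "/shoes", "/cart"],
    ["/home", "/shoes"],
    ["/blog", "/about"],
    ["/blog", "/about", "/contact"],
]
clusters = leader_cluster(sessions)
print(clusters)
```

A personalization layer could then recommend pages that are popular within the cluster a new session falls into.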

Page 13: Web Usage  Mining: An Overview

Build a Better Site: SEO

• Adding usage information into PageRank
– Kalyan Beemanapalli, Ramya Rangarajan, Jaideep Srivastava, “Usage-Aware Average Clicks”, In Proc. of WebKDD 2006: KDD Workshop on Web Mining and Web Usage Analysis, in conjunction with the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), August 20-23 2006
– Method: Association rule in spirit
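One simple way to picture "adding usage information into PageRank" is to let the random surfer follow links in proportion to observed click counts instead of uniformly. The sketch below is a generic usage-weighted PageRank by power iteration; it is not the Usage-Aware Average Clicks measure from the cited paper, and the function name, the damping value, and the click counts are assumptions.

```python
def usage_weighted_pagerank(click_counts, damping=0.85, iterations=50):
    """click_counts[src][dst] = observed number of src -> dst clicks."""
    pages = set(click_counts)
    for targets in click_counts.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = click_counts.get(src, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
            else:
                # Follow each link in proportion to how often users clicked it.
                for dst, count in targets.items():
                    new_rank[dst] += damping * rank[src] * count / total
        rank = new_rank
    return rank

# Hypothetical click counts, for illustration only.
clicks = {
    "/home": {"/catalog": 9, "/about": 1},
    "/catalog": {"/home": 5},
    "/about": {"/home": 5},
}
rank = usage_weighted_pagerank(clicks)
print(sorted(rank.items(), key=lambda kv: -kv[1]))
```

With uniform link weights `/catalog` and `/about` would score the same; the click counts are what separate them here.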

Page 14: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• A favorite research stream by marketers and MIS researchers
– Statistical models are used most of the time
– “Macro-level” behavior is often the focus
– Interesting questions related to firm performance and profitability

Page 15: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• Johnson, E. J., Wendy Moe, Peter S. Fader, Steven Bellman, and Jerry Lohse. "On the Depth and Dynamics of Online Search Behavior," Management Science, Vol. 50, No. 3, March 2004, pp. 299–308
– model an individual’s tendency to search as a logarithmic process
– hierarchical Bayesian model with depth of search, dynamics of search and activity of search
– interested in the number of unique sites searched by each household within a given product category
– Preprocessing: Households identified by client-side programs, session is month-based
– Method: Statistical Modeling (log model)
– Data: Usage (search)

Page 16: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1, 2) 29–40
– WHY do the customers visit?
– Preprocessing: Content Processing
– Method: Clustering of sessions by visiting behavior parameters and content parameters
– Data: Usage & Content
– Conclusion:

Page 17: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1, 2) 29–40

Page 18: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research. 41 (3), 306-323
– HOW do the customers visit?
– Predicts online buying by linking the purchase decision to what visitors do and to what they are exposed while at the site.
– Preprocessing: Content Processing
– Method: Statistical Modeling
– Data: Usage & Content
– Conclusion:

Page 19: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research. 41 (3), 306-323
– browsing behavior (i.e., time and page views)
– repeat visitation to the site (return and total number of sessions)
– use of interactive decision aids
– data input effort and information gathering and processing
– a series of page specific characteristics

Page 20: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• My Research: Online Customer Lifetime
– predict an individual’s tendency to stay with an e-tailer
– Hybrid BG/NBD model + Neural Networks
– interested in the relationship between online customer lifetime and firm profitability
– Preprocessing: Households identified by client-side programs, session is month-based
– Method: Statistical Modeling & Classification
– Data: Usage

Page 21: Web Usage  Mining: An Overview

Know your visitors better: Customer behavior

• My Research: Online Customer Lifetime
• Given N customers with visiting history (X_i, t_{x_i}, T)
– T: the observed time period
– X_i: number of visits customer i made during T
– t_{x_i}: time of the last visit made by customer i
• Find the best fit for the following maximum likelihood equation to estimate the four parameters r, a, b and α

$$
L(r, \alpha, a, b) = \prod_{i=1}^{N} \left[ \frac{B(a,\, b + X_i)}{B(a, b)} \cdot \frac{\Gamma(r + X_i)\, \alpha^{r}}{\Gamma(r)\, (\alpha + T)^{r + X_i}} + \delta_{X_i > 0}\, \frac{B(a + 1,\, b + X_i - 1)}{B(a, b)} \cdot \frac{\Gamma(r + X_i)\, \alpha^{r}}{\Gamma(r)\, (\alpha + t_{x_i})^{r + X_i}} \right]
$$
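A minimal sketch of evaluating this likelihood numerically, assuming the standard BG/NBD form with parameters r, α, a and b and one (x, t_x, T) triple per customer. It covers only the plain BG/NBD part, not the hybrid model with neural networks mentioned in the deck; the function names and the test customers are hypothetical. Working in logs avoids overflow in the gamma and beta functions.

```python
import math

def log_beta(p, q):
    """log B(p, q) via log-gamma."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def bgnbd_log_likelihood(r, alpha, a, b, customers):
    """Sum of per-customer BG/NBD log-likelihoods.

    customers is an iterable of (x, t_x, T): number of visits,
    time of the last visit, and length of the observation period.
    """
    total = 0.0
    for x, t_x, T in customers:
        # Factor shared by both terms: Gamma(r + x) * alpha^r / Gamma(r)
        log_common = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
        terms = [
            log_beta(a, b + x) - log_beta(a, b) - (r + x) * math.log(alpha + T)
        ]
        if x > 0:  # the delta_{x > 0} term
            terms.append(
                log_beta(a + 1, b + x - 1) - log_beta(a, b)
                - (r + x) * math.log(alpha + t_x)
            )
        peak = max(terms)  # log-sum-exp for numerical stability
        total += log_common + peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

# Hypothetical customers: (visits, time of last visit, observation period).
print(bgnbd_log_likelihood(1.0, 1.0, 1.0, 1.0, [(0, 0.0, 1.0), (1, 1.0, 2.0)]))
```

Fitting the model then amounts to handing the negative of this function to a numerical optimizer over the four parameters, constrained to be positive.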

Page 22: Web Usage  Mining: An Overview

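Know your visitors better: Customer behavior

• Given r, a, b and α, we can predict:
– Total number of visits during a time period t (starting from time 0)

$$
N \cdot \frac{a + b - 1}{a - 1} \left[ 1 - \left( \frac{\alpha}{\alpha + t} \right)^{r} {}_2F_1\!\left( r,\, b;\; a + b - 1;\; \frac{t}{\alpha + t} \right) \right]
$$

– Number of visits an individual will make in the future t time units Y(t) (from T+1 to T+t)

$$
E[Y(t) \mid x, t_x, T] = \frac{ \dfrac{a + b + x - 1}{a - 1} \left[ 1 - \left( \dfrac{\alpha + T}{\alpha + T + t} \right)^{r + x} {}_2F_1\!\left( r + x,\, b + x;\; a + b + x - 1;\; \dfrac{t}{\alpha + T + t} \right) \right] }{ 1 + \delta_{x > 0}\, \dfrac{a}{b + x - 1} \left( \dfrac{\alpha + T}{\alpha + t_x} \right)^{r + x} }
$$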
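The prediction of Y(t) for one individual can be sketched in a few lines, assuming the standard BG/NBD conditional expectation with parameters r, α, a and b. The Gauss hypergeometric function is summed as a plain power series, which is valid here because its argument t / (α + T + t) is always below 1; the formula also needs a ≠ 1. The function names and the parameter values in the example are hypothetical.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10000):
    """Gauss hypergeometric series 2F1(a, b; c; z); converges for |z| < 1."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def expected_future_visits(r, alpha, a, b, x, t_x, T, t):
    """Expected visits in the next t time units for a customer with
    x past visits, last visit at t_x, observed for T (requires a != 1)."""
    z = t / (alpha + T + t)
    hyper = hyp2f1(r + x, b + x, a + b + x - 1, z)
    numerator = (a + b + x - 1) / (a - 1) * (
        1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyper
    )
    denominator = 1.0
    if x > 0:  # the delta_{x > 0} term
        denominator += a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return numerator / denominator

# Hypothetical parameters and history, for illustration only.
print(expected_future_visits(0.5, 2.0, 1.5, 2.0, x=3, t_x=4.0, T=5.0, t=1.0))
```

The denominator grows as the gap between T and t_x widens, so a customer who has been silent for a long time gets a lower forecast than an equally active customer who visited recently.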

Page 23: Web Usage  Mining: An Overview

Know your visitors betterKnow your visitors better::Customer behaviorCustomer behavior

Product Type Company Number of visitors Calibration period Testing period Mean LifetimePercentage Right censoredB. Acc Acc.5 1 125.8 44.91% 75.27% 77.82%4 2 136 42.86% 72.45% 74.90%5 1 123.93 23.73% 67.23% 77.40%4 2 128.16 38.98% 79.10% 70.62%5 1 131.31 43.75% 88.16% 78.62%4 2 126.96 47.70% 80.59% 70.07%5 1 74.16 28.08% 84.21% 78.95%4 2 72.28 21.06% 78.95% 75.43%5 1 100.96 30.34% 80.89% 74.72%4 2 102.97 18.44% 78.77% 70.95%5 1 45.6 6.25% 71.88% 81.25%4 2 51.5 9.38% 87.50% 75.00%5 1 101.41 39.77% 79.31% 80.68%4 2 113.85 39.77% 78.41% 72.73%5 1 52.56 14.82% 74.07% 77.78%4 2 63.92 12.96% 72.22% 79.63%

Search Goods

Amazon

Drugstore

1267

BMG 177

Columbia 304

57

Ticketmaster 179

Experience Goods

landsend 32

doldNavy 88

victoriassecret 54

•My Research: Online Customer LifetimeMy Research: Online Customer Lifetime

Page 24: Web Usage  Mining: An Overview

Web Usage Mining: The Future