Web Analytics: Challenges in Data Modeling
DESCRIPTION
This presentation accompanied a great talk on Web Analytics by Anne Marie Macek, Senior Manager in Data Strategy at Marriott International, at the DC Business Intelligentsia Meetup on December 11. For more info on future events visit: http://www.meetup.com/BusinessIntelligentsiaDC/events/150884302/
TRANSCRIPT
WEB ANALYTICS
CHALLENGES IN DATA MODELING
AGENDA
• Introduction to Web Analytics
  • Data Sources, Data Capture
  • Vocabulary
• Data Modeling Basics
  • Relational vs. Dimensional
  • Normalization, De-normalization, Aggregation
• Web Analytics + Data Modeling
  • Four-tiered Data Model for Web Data
  • Challenges
• Q & A
INTRODUCTION
• Anne Marie Macek
  • Senior Manager, Data Strategy
  • Consumer Insight and Revenue Strategy
  • Marriott International
• 30+ years Data Modeling and Reporting
• 14+ years Data Warehousing and Business Intelligence
• 4+ years Web Analytics Data and Reporting
• MBA, Management Information Systems
• BS, Mathematics and Computer Science
EXPERIENCE
• Data Modeling:
  • Flat Files, IMS/DB, DB2, Oracle, Netezza
  • MS Access, Borland Paradox
  • Cognos PowerPlay, MS Analysis Services, Cognos 10.2 Dynamic Cubes
• Reporting:
  • COBOL, Focus, SAS, Actuate
  • Cognos BI Suite
• Business Functions:
  • eCommerce, Revenue Management, Sales & Marketing
  • Human Resources, Finance
DEFINITION
• Web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage.
Source: Wikipedia
OBJECTIVES
• Website Performance
  • Conversion Rate (# sales / # visits)
  • Trends over time
  • In response to campaigns
• Website Optimization
  • Customer Behavior
  • Technological Trends
• Integration
  • Customer Lifetime Value / Segmentation
• Personalization
  • Proactive display of pertinent information
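The conversion-rate objective can be made concrete with a small calculation. This is a minimal sketch that assumes a conversion means a visit ending in a sale (for example, a booking); the function name and the sample counts are hypothetical.

```python
def conversion_rate(converting_visits: int, total_visits: int) -> float:
    """Share of visits that ended in a sale (e.g. a booking)."""
    if total_visits == 0:
        return 0.0  # avoid division by zero on periods with no traffic
    return converting_visits / total_visits

# Hypothetical example: 45 converting visits out of 1,500 visits
rate = conversion_rate(45, 1500)
print(f"{rate:.2%}")  # 3.00%
```

Dividing revenue by visits instead gives revenue per visit, a different (also useful) metric.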
DATA SOURCES
• Click-stream Data
• Search Engine Optimization (SEO)
• Campaign Classification
• Email Campaigns
• Advertising Impressions
• 3rd Party Marketing Data
• IP Geolocation
• Competitive Analysis
• Customer Information
• Multi-channel Analysis
• Outcome Data
CLICKSTREAM COLLECTION
• Web Log Files
  • Rudimentary data collected on the company’s web server
  • Page name, IP address, browser, date/time
  • Does not screen out search engine robots
• JavaScript Tagging (Google Analytics, Omniture, WebTrends)
  • As the page loads, data is sent to a 3rd party for collection
  • Assigns a cookie to the user
  • Can implement custom tags on specific pages
  • Counts pages served from cache (web log files do not)
• Packet Sniffers (Cloudmeter Pion, Tealeaf CX Connect)
  • Software or hardware layer installed on web servers
  • Parsing raw data and ensuring PII is protected can be complex
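Because raw web log files include search engine robots, in-house processing usually has to filter them out. The sketch below assumes logs in Apache's Combined Log Format and detects robots with a short, hypothetical list of user-agent substrings (`BOT_MARKERS`); real robot detection relies on much larger, maintained lists. The function `parse_page_views` and the sample lines are illustrative only.

```python
import re

# Apache "combined" log format (assumed): IP, identity, user, [timestamp],
# "request", status, bytes, "referrer", "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of robot markers
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def parse_page_views(lines):
    """Yield (ip, timestamp, path, agent) for parseable, non-robot lines."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # skip malformed lines
        agent = m.group("agent")
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            continue  # raw logs include crawler hits; drop them here
        yield m.group("ip"), m.group("ts"), m.group("path"), agent

sample = [
    '203.0.113.5 - - [01/Mar/2020:10:00:01 -0500] "GET /hotels/dc HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '198.51.100.7 - - [01/Mar/2020:10:00:02 -0500] "GET /hotels/dc HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(list(parse_page_views(sample)))  # only the first (non-robot) line survives
```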
CLICKSTREAM ANALYSIS
• Number of Visitors
  • Total vs. Unique
  • New vs. Repeat
• Source of Visit (Session)
  • External Link (Campaign Analysis / Attribution)
  • Direct
• Searches Performed on Site
  • Keywords
  • Sort Order of Results
• Page Analysis
  • Specific Actions Performed
  • Order (Booking)
  • Signup for Membership, Credit Card, Event
• Abandonment (Bounce Rate)
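Total vs. unique visitors and bounce rate both depend on first grouping hits into visits (sessions). A minimal sketch, assuming each hit carries a cookie-based `visitor_id` and that a visit ends after 30 minutes of inactivity (a common convention, not one stated in the talk); `sessionize` and `bounce_rate` are hypothetical helpers.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff

def sessionize(hits):
    """Group (visitor_id, timestamp, page) hits into sessions.

    A new session starts when a visitor has been idle longer than
    SESSION_TIMEOUT. Returns a list of sessions, each a list of pages.
    """
    sessions = []
    last_seen = {}  # visitor_id -> timestamp of that visitor's previous hit
    current = {}    # visitor_id -> index of the visitor's open session
    for visitor, ts, page in sorted(hits, key=lambda h: (h[0], h[1])):
        if visitor not in last_seen or ts - last_seen[visitor] > SESSION_TIMEOUT:
            sessions.append([])
            current[visitor] = len(sessions) - 1
        sessions[current[visitor]].append(page)
        last_seen[visitor] = ts
    return sessions

def bounce_rate(sessions):
    """Share of sessions that viewed only one page."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s) == 1) / len(sessions)

t0 = datetime(2020, 3, 1, 9, 0)
hits = [
    ("v1", t0, "/home"),
    ("v1", t0 + timedelta(minutes=5), "/search"),
    ("v1", t0 + timedelta(hours=2), "/home"),  # idle > 30 min: new session
    ("v2", t0 + timedelta(minutes=1), "/home"),
]
sessions = sessionize(hits)
unique_visitors = len({visitor for visitor, _, _ in hits})
print(len(sessions), unique_visitors, bounce_rate(sessions))
```

Here three visits come from two unique visitors, and two of the three visits bounced.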
BRINGING CLICKSTREAM IN-HOUSE
• Control/Consolidate Business Rules
  • Integration with Corporate Systems of Record
  • Single Version of the Truth
• Integration with Other Web Data Sources
  • Enable more “intelligent” metrics
  • Not all visits are a conversion opportunity
• Shift from “visit analysis” to “customer analysis”
  • Enable advanced statistical and predictive modeling
  • Multi-touch Attribution
  • Pay Per Click (PPC) Keyword Bid Optimization
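Multi-touch attribution assigns conversion credit to more than one marketing touchpoint. The talk does not say which rule is used; the sketch below uses linear attribution (equal credit to every touch) purely as an illustration, with hypothetical channel names and values. First-touch, last-touch, and time-decay rules are common alternatives.

```python
from collections import defaultdict

def linear_attribution(conversions):
    """Split each conversion's value equally across the touches before it.

    `conversions` is a list of (touchpoints, value) pairs, where
    touchpoints is the ordered list of channels seen before converting.
    """
    credit = defaultdict(float)
    for touchpoints, value in conversions:
        if not touchpoints:
            continue  # no recorded touches; nothing to attribute
        share = value / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)

conversions = [
    (["email", "ppc", "direct"], 300.0),
    (["ppc", "direct"], 200.0),
]
print(linear_attribution(conversions))
# {'email': 100.0, 'ppc': 200.0, 'direct': 200.0}
```

Only in-house data that links touches across visits to the same customer makes this kind of calculation possible, which is the point of the shift to customer analysis.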
CLICKSTREAM CHALLENGES
• “Clickstream data … is delightfully complex, ever changing, and full of mysterious occurrences.” Avinash Kaushik, Web Analytics: An Hour a Day
• Volume
  • Con: It’s big
  • Pro: It’s incremental
• Fairly Unstructured
  • Exceptions to every rule
  • Mobile App vs. Mobile Web vs. Desktop
• Rapidly Changing
• Most queries require trending YTD + 2 years’ history
• Few “natural” metrics; most require COUNT(DISTINCT)
• How do I model this data?
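The YTD + 2 years' history requirement means a typical trend query touches roughly three years of click rows. Reading it as "compare year-to-date with the same calendar span in each of the two prior years" (an assumption), the date windows can be computed as below; `ytd_windows` is a hypothetical helper.

```python
from datetime import date

def ytd_windows(as_of: date, prior_years: int = 2):
    """Return (start, end) ranges for YTD and the same span in prior years."""
    windows = []
    for offset in range(prior_years + 1):
        year = as_of.year - offset
        try:
            end = as_of.replace(year=year)
        except ValueError:
            # as_of is Feb 29 and the comparison year is not a leap year
            end = date(year, 2, 28)
        windows.append((date(year, 1, 1), end))
    return windows

for start, end in ytd_windows(date(2020, 3, 15)):
    print(start, end)
```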
DATA WAREHOUSE APPROACHES
Bill Inmon
• DW is the Central Repository of all Enterprise Data
• “Top Down”
• Relational Model (3NF)
• Feeds Functional Data Marts
• Huge Undertaking
Ralph Kimball
• DW is the “Virtual” Integration of Various Functional Data Marts
• “Bottom Up”
• Dimensional Model
• Quicker to Develop
• Siloed and Redundant
![Page 13: Web Analytics: Challenges in Data Modeling](https://reader033.vdocuments.mx/reader033/viewer/2022060108/555128acb4c905f1528b4a2f/html5/thumbnails/13.jpg)
RELATIONAL MODEL
Source: sqlservercentral.com
![Page 14: Web Analytics: Challenges in Data Modeling](https://reader033.vdocuments.mx/reader033/viewer/2022060108/555128acb4c905f1528b4a2f/html5/thumbnails/14.jpg)
DIMENSIONAL MODELS
(Diagrams: Star Schema; Snowflake Schema)
Source: Wikipedia
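A star schema places one fact table at the center, joined directly to denormalized dimension tables. The sketch uses Python's built-in `sqlite3` module as a stand-in for a warehouse database; every table and column name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes, one row per member
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT)")
cur.execute("CREATE TABLE dim_page (page_key INTEGER PRIMARY KEY, page_name TEXT, site_section TEXT)")
# Fact table: one row per page view, with a foreign key to each dimension
cur.execute("""CREATE TABLE fact_page_view (
    date_key INTEGER REFERENCES dim_date(date_key),
    page_key INTEGER REFERENCES dim_page(page_key),
    visitor_id TEXT)""")

cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20200301, "2020-03-01", "2020-03"), (20200302, "2020-03-02", "2020-03")])
cur.executemany("INSERT INTO dim_page VALUES (?, ?, ?)",
                [(1, "Home", "Landing"), (2, "Search Results", "Booking")])
cur.executemany("INSERT INTO fact_page_view VALUES (?, ?, ?)",
                [(20200301, 1, "v1"), (20200301, 2, "v1"), (20200302, 1, "v2"),
                 (20200302, 1, "v1"), (20200302, 2, "v3")])

# Star join: group by dimension attributes, aggregate the facts
rows = cur.execute("""
    SELECT d.month, p.site_section,
           COUNT(*) AS page_views,
           COUNT(DISTINCT f.visitor_id) AS unique_visitors
    FROM fact_page_view f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_page p ON f.page_key = p.page_key
    GROUP BY d.month, p.site_section
    ORDER BY p.site_section
""").fetchall()
print(rows)
```

In a snowflake schema, `site_section` would move into its own table referenced from `dim_page`, adding one more join to the same query.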
NORMALIZATION
• Removes redundancy and undesirable dependencies from data structures
• 1NF: Remove Repeating Groups
• 2NF: Remove Partial Key Dependencies
• 3NF: Remove Transitive Dependencies (non-key attributes that depend on other non-key attributes)
• Tutorial: http://phlonx.com/resources/nf3/
• Data Warehouses require some De-Normalization to improve query performance
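The 3NF rule can be illustrated with a flat booking extract in which `hotel_city` depends on `hotel_id` rather than on the booking key. This sketch, with hypothetical field names and data, decomposes the rows and then re-joins them, which is the de-normalization a warehouse performs for query speed.

```python
# Flat rows: hotel_city depends on hotel_id, not on booking_id
# (a transitive dependency, so the city repeats for every booking)
flat_bookings = [
    {"booking_id": 1, "hotel_id": "H1", "hotel_city": "Washington", "nights": 2},
    {"booking_id": 2, "hotel_id": "H1", "hotel_city": "Washington", "nights": 1},
    {"booking_id": 3, "hotel_id": "H2", "hotel_city": "Bethesda", "nights": 3},
]

# 3NF decomposition: hotel attributes move to a table keyed by hotel_id
hotels = {row["hotel_id"]: {"hotel_city": row["hotel_city"]} for row in flat_bookings}
bookings = [{k: row[k] for k in ("booking_id", "hotel_id", "nights")} for row in flat_bookings]

def denormalize(bookings, hotels):
    """Re-join hotel attributes onto each booking row (de-normalization)."""
    return [{**b, **hotels[b["hotel_id"]]} for b in bookings]

print(hotels)
print(denormalize(bookings, hotels) == flat_bookings)  # True: lossless round trip
```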
![Page 16: Web Analytics: Challenges in Data Modeling](https://reader033.vdocuments.mx/reader033/viewer/2022060108/555128acb4c905f1528b4a2f/html5/thumbnails/16.jpg)
ECOMMERCE DATA WAREHOUSE
Native Source Model → Fact Model → BI Model → Aggregate Model
NATIVE SOURCE MODEL
Plus
• In-database copy of the source data
• Stores data elements we are not yet ready to model further
• Maintains details for research purposes
• Prevents having to repeat historical data conversion
Minus
• Huge
• Unstructured
• Not normalized (at all)
• Not useful for analysis or reporting
![Page 18: Web Analytics: Challenges in Data Modeling](https://reader033.vdocuments.mx/reader033/viewer/2022060108/555128acb4c905f1528b4a2f/html5/thumbnails/18.jpg)
NATIVE SOURCE MODEL
FACT MODEL
Plus
• “Snow-relational”
• Nearly Normalized (optimized for load)
• Multiple Fact & Extension Tables (to manage I/O)
• Granular (click row)
• Contains keys to integrate with enterprise data
Minus
• Complex load, including propagation and look-back
• Use requires non-filtered joins of massive tables
• Difficult to use for analysis; cannot be used for reporting
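The talk lists propagation and look-back as part of the Fact Model load without detailing them. One plausible reading, assumed here, is that a visit-level attribute such as a campaign code arrives only on the landing hit and must be copied to every later click row, including rows that arrive in a later load batch. `propagate_campaign` and its fields are hypothetical.

```python
def propagate_campaign(clicks, carried_over=None):
    """Copy each visit's campaign code onto every click row of that visit.

    `clicks` are dicts with visit_id, seq (hit order) and an optional
    campaign, assumed to appear only on the landing hit. `carried_over`
    maps visit_id -> campaign for visits that began in an earlier load
    batch (the look-back). Returns enriched rows and the updated map.
    """
    state = dict(carried_over or {})
    out = []
    for click in sorted(clicks, key=lambda c: (c["visit_id"], c["seq"])):
        if click.get("campaign"):
            state[click["visit_id"]] = click["campaign"]
        out.append({**click, "campaign": state.get(click["visit_id"])})
    return out, state

batch1 = [{"visit_id": "A", "seq": 1, "campaign": "EMAIL_DEC"},
          {"visit_id": "A", "seq": 2}]
rows1, carry = propagate_campaign(batch1)

# Visit A continues into the next load; visit B has no campaign
batch2 = [{"visit_id": "A", "seq": 3}, {"visit_id": "B", "seq": 1}]
rows2, carry = propagate_campaign(batch2, carry)
print([r["campaign"] for r in rows2])  # ['EMAIL_DEC', None]
```

A production load would prune the carry-over map once a visit has been idle past the session timeout; otherwise it grows without bound.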
![Page 20: Web Analytics: Challenges in Data Modeling](https://reader033.vdocuments.mx/reader033/viewer/2022060108/555128acb4c905f1528b4a2f/html5/thumbnails/20.jpg)
FACT MODEL
BI MODEL
Plus
• “Star-flake” Model
• De-normalized (optimized for query)
• Pre-joined
• Granular (click row)
• Integrated with enterprise data at load time
• Useful for detailed analysis
Minus
• Complex load process
• It’s still big!
• Corrections to Fact Model data issues require a rebuild or complex conversion processes
• Difficult to use for reporting
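Integrating enterprise data at load time means each click row is stored already joined to attributes such as a customer's loyalty tier. A minimal sketch, assuming a hypothetical customer lookup keyed by member number and a `-1` surrogate key for anonymous or unmatched visitors; all names are illustrative.

```python
# Hypothetical enterprise customer dimension, keyed by member number
customers = {
    "M100": {"customer_key": 1, "tier": "Gold"},
    "M200": {"customer_key": 2, "tier": "Silver"},
}
# Default row for anonymous or unmatched visitors
UNKNOWN_CUSTOMER = {"customer_key": -1, "tier": "Unknown"}

def build_bi_rows(fact_rows, customers):
    """Pre-join each click row to customer attributes once, at load time."""
    return [{**row, **customers.get(row.get("member_id"), UNKNOWN_CUSTOMER)}
            for row in fact_rows]

fact_rows = [
    {"click_id": 1, "page": "/home", "member_id": "M100"},
    {"click_id": 2, "page": "/search", "member_id": None},
]
bi_rows = build_bi_rows(fact_rows, customers)
print([(r["click_id"], r["tier"]) for r in bi_rows])  # [(1, 'Gold'), (2, 'Unknown')]
```

Queries then need no joins, but if a customer attribute or a Fact Model row is later corrected, the affected rows must be rebuilt, which is the maintenance cost listed under Minus.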
![Page 22: Web Analytics: Challenges in Data Modeling](https://reader033.vdocuments.mx/reader033/viewer/2022060108/555128acb4c905f1528b4a2f/html5/thumbnails/22.jpg)
BI MODEL
AGGREGATE MODEL
Plus
• Star Schema (simple)
• De-normalized (optimized for query)
• Aggregated
• Fast query performance
• Great for pre-determined reports
Minus
• Corrections to Fact Model data issues and embedded dimensions require a rebuild
• Count distincts only available for pre-determined dimensions
• Limited use for analysis
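The count-distinct limitation follows from distinct counts not being additive. This sketch builds a hypothetical aggregate keyed by date and device: page views roll up correctly, but distinct visitors cannot be summed across devices, and a distinct count for any dimension left out of the aggregate cannot be derived from it at all. `build_aggregate` and the sample rows are illustrative.

```python
from collections import defaultdict

def build_aggregate(clicks, dims):
    """Aggregate click rows to one row per combination of `dims`.

    Page views are additive; distinct visitors are not, so the stored
    distinct count is only valid for exactly this set of dimensions.
    """
    views = defaultdict(int)
    visitors = defaultdict(set)
    for c in clicks:
        key = tuple(c[d] for d in dims)
        views[key] += 1
        visitors[key].add(c["visitor_id"])
    return {k: {"page_views": views[k], "unique_visitors": len(visitors[k])}
            for k in views}

clicks = [
    {"date": "2020-03-01", "device": "Desktop", "visitor_id": "v1"},
    {"date": "2020-03-01", "device": "Mobile Web", "visitor_id": "v1"},
    {"date": "2020-03-01", "device": "Desktop", "visitor_id": "v2"},
]
agg = build_aggregate(clicks, ["date", "device"])

# Page views roll up correctly across devices: 3
print(sum(r["page_views"] for r in agg.values()))
# Summing distinct visitors double-counts v1: 3, though the true daily count is 2
print(sum(r["unique_visitors"] for r in agg.values()))
```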
![Page 24: Web Analytics: Challenges in Data Modeling](https://reader033.vdocuments.mx/reader033/viewer/2022060108/555128acb4c905f1528b4a2f/html5/thumbnails/24.jpg)
AGGREGATE MODEL
QUESTIONS?
• Thank You!