McGraw-Hill/Irwin ©2007 The McGraw-Hill Companies, Inc. All rights reserved
Contemporary Database
• Gain competitive advantage – customer information systems
• data mining
• Develop and market new products
• micromarketing
Systems
• Database
– Personal, small business level
• On-Line Analytic Processing (OLAP)
– Ability to use many dimensions, reports & graphics
• Data Mart
– Usually temporary analysis
• Data Warehouse
– Usually permanent repository
Data Warehousing
Price Waterhouse definition:
A data warehouse is an orderly and accessible repository of known facts and related data that is used as a basis for making better management decisions. The data warehouse provides a unified repository of consistent data for decision making that is subject oriented, integrated, time variant, and nonvolatile.
Data Warehousing
• Provide business users views of data appropriate to mission
• Consolidate & reconcile data
• Give macro views of critical aspects
• Timely & detailed access to information
• Provide specific information to groups
• Ability to identify trends
Data Warehousing
Price Waterhouse:
Not just a technology;
an architecture and process designed to support decision making
special-purpose database systems to improve query performance significantly
index, partition, pre-aggregate data
Data Warehousing
Beyond OLAP: Data warehouse
OLAP                              On-Line Transactional Processing
summary data                      detailed operational data
few users                         many concurrent users
data driven                       transaction driven
effectiveness                     efficiency
use EIS, spreadsheets to access
Data Marts
• Intermediate-level database system
• Often used as temporary storage
– Gather data for study from data warehouse, other sources (including external)
– Clean & transform for data mining
OLAP
• Multidimensional spreadsheet
• Hypercube – term to reflect ability to sort on many dimensions
• Many forms
– MOLAP – multidimensional
– ROLAP – relational (uses SQL)
– DOLAP – desktop
– WOLAP – web enabled
– HOLAP – hybrid
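A minimal Python sketch of the hypercube idea, under stated assumptions: the fact rows, the three dimension names, and the `roll_up` helper are all made up for illustration and do not come from any OLAP product. It shows one set of facts being totalled along whichever dimensions the analyst keeps.

```python
from collections import defaultdict

# Hypothetical fact rows: (store, item, month, units sold)
FACTS = [
    ("north", "hose", "jan", 10),
    ("north", "hose", "feb", 12),
    ("north", "belt", "jan", 4),
    ("south", "hose", "jan", 7),
    ("south", "belt", "feb", 9),
]
DIMENSIONS = ("store", "item", "month")


def roll_up(facts, keep):
    """Sum units over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


by_store = roll_up(FACTS, ["store"])                 # one dimension
by_item_month = roll_up(FACTS, ["item", "month"])    # two dimensions
```

The same facts answer both questions; only the list of kept dimensions changes, which is the sense in which a cube can be viewed along many dimensions.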
Key Concepts
• Scalability
– Ability to accurately cope with changing conditions (especially magnitude of computing)
• Granularity
– Level of detail
• Data warehouse – tends to be fine granularity
• OLAP – tends to aggregate to coarse granularity
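The difference in granularity can be illustrated with a short Python sketch. The sale rows and the choice of month as the coarse level are assumptions made for illustration only.

```python
from collections import Counter
from datetime import date

# Hypothetical fine-grained rows, one per sale: (date, item, units)
sales = [
    (date(2007, 1, 3), "hose", 2),
    (date(2007, 1, 3), "hose", 1),
    (date(2007, 1, 17), "belt", 5),
    (date(2007, 2, 1), "hose", 4),
]

# Coarser view: total units per (year, month)
monthly = Counter()
for day, item, units in sales:
    monthly[(day.year, day.month)] += units
```

The monthly totals can always be rebuilt from the detailed rows, but the detailed rows cannot be recovered from the monthly totals, which is one reason to store the finest granularity in the warehouse.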
Data Warehouse Implementation
• Reliable, comprehensive source of clean data
– Accurate, complete, in correct format
• Processes
– System development
– Data acquisition
– Data extraction for use
Data Warehouse Generation
• Extract data from sources
• Transform
• Clean
• Load into data warehouse
– 60-80% of effort in operating data warehouse
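The four steps above can be sketched as a small Python pipeline. This is an illustration under assumptions, not a production design: the source rows, the `sales` table, and the four function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows as they might arrive from an operational system
SOURCE = [
    {"id": "1", "amount": " 19.50", "region": "north"},
    {"id": "2", "amount": "n/a", "region": "SOUTH"},
    {"id": "3", "amount": "7", "region": "South "},
]


def extract(source):
    """Copy rows out of the source so later steps never mutate it."""
    return [dict(row) for row in source]


def transform(rows):
    """Normalize text fields to one format."""
    for row in rows:
        row["region"] = row["region"].strip().lower()
        row["amount"] = row["amount"].strip()
    return rows


def clean(rows):
    """Drop rows whose id or amount is not a valid number."""
    kept = []
    for row in rows:
        try:
            kept.append((int(row["id"]), float(row["amount"]), row["region"]))
        except ValueError:
            continue
    return kept


def load(rows, conn):
    """Insert the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(clean(transform(extract(SOURCE))), conn)
loaded = conn.execute("SELECT id, amount, region FROM sales ORDER BY id").fetchall()
```

Keeping each step as its own function makes it easy to see where the effort goes: in this toy case, as in practice, most of the code sits before the load.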
Data Extraction Routines
• Interpret data formats
• Identify changed records
• Copy information to intermediate file
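One common way to identify changed records is to compare a content hash of each record against the previous extract. The sketch below assumes records are dictionaries with an `id` key; the `fingerprint` and `changed_records` helpers are hypothetical names, and the approach shown does not detect deletions.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's contents."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_records(previous, current, key="id"):
    """Return records in `current` that are new or differ from `previous`."""
    seen = {rec[key]: fingerprint(rec) for rec in previous}
    return [rec for rec in current if seen.get(rec[key]) != fingerprint(rec)]


old = [{"id": 1, "price": 5.0}, {"id": 2, "price": 8.0}]
new = [{"id": 1, "price": 5.0}, {"id": 2, "price": 9.0}, {"id": 3, "price": 1.0}]

# Only the changed and the new record are copied to the intermediate set
delta = changed_records(old, new)
```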
Data Transformation
• Consolidate data from multiple sources
• Filter to eliminate unnecessary details
• Clean data
– eliminate incorrect entries
– eliminate duplications
• Convert & translate data into proper format
• Aggregate data as designed
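A compact Python sketch of these transformation steps follows. The two source extracts, the field names, and the rules (a non-positive quantity counts as incorrect, an identical row counts as a duplicate) are assumptions chosen for illustration; a real system would deduplicate on a transaction key rather than on row contents.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems with inconsistent formats
store_a = [{"sku": "H-1", "qty": "3"}, {"sku": "B-2", "qty": "1"}]
store_b = [
    {"sku": "h-1", "qty": "2"},
    {"sku": "h-1", "qty": "2"},
    {"sku": "X-9", "qty": "-4"},
]


def consolidate_and_aggregate(*sources):
    merged = [row for source in sources for row in source]           # consolidate
    converted = [(r["sku"].upper(), int(r["qty"])) for r in merged]  # convert format
    valid = [r for r in converted if r[1] > 0]                       # drop incorrect entries
    unique = list(dict.fromkeys(valid))                              # drop exact duplicates
    totals = defaultdict(int)
    for sku, qty in unique:                                          # aggregate by SKU
        totals[sku] += qty
    return dict(totals)


totals = consolidate_and_aggregate(store_a, store_b)
```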
Data Management
• Retrieve information
• Extraction programs
• Problems:
– Required data not available
– Initial data warehouse scope too broad
– Not enough time to do prototyping, or needs analysis
– Insufficient senior direction
Meta Data
• Data to keep track of data
• Life cycle:
– Manage meta data
– Design data warehouse
– Ensure data quality
– Manage system during operations
Business Meta Data
• What data are available
• Source of each data element
• Frequency of data updates
• Location of specific data
• Predefined reports & queries
• Methods of data access
Technical Meta Data
• Data source
– (internal or external)
• Data preparation features
– (transformation & aggregation rules)
• Logical structure of data
• Physical structure & content
• Data ownership
• Security aspects
– (access rights, restrictions)
• System information
– (date of last update, retention policy, data usage)
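One way to picture technical meta data is as a catalog record kept for each data element. The Python sketch below is a hypothetical layout: the `TechnicalMetadata` class, its field names, and the example values are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TechnicalMetadata:
    """Hypothetical catalog entry for one warehouse data element."""
    name: str
    source: str                       # "internal" or "external"
    transformation_rule: str          # how the element is prepared
    owner: str                        # data ownership
    allowed_roles: set = field(default_factory=set)   # access rights
    last_updated: Optional[date] = None
    retention_days: Optional[int] = None

    def can_read(self, role):
        """Security check: is this role allowed to query the element?"""
        return role in self.allowed_roles


entry = TechnicalMetadata(
    name="weekly_sales",
    source="internal",
    transformation_rule="sum units by item, store, week",
    owner="merchandising",
    allowed_roles={"buyer", "forecaster"},
    last_updated=date(2007, 1, 1),
    retention_days=455,
)
```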
Wal-Mart’s Data Warehouse
• Heavy user of IT
• Core competency – supply chain distribution
– 2900 outlets
– Data warehouse of 101 terabytes ($4 billion)
– 65 million transactions per week
– Subject-oriented, integrated, time-variant, nonvolatile data
– 65 weeks of data by item, store, day
Wal-Mart
• Use data warehouse to:
– Support decision making
– Buyers, merchandisers, logistics, forecasters
– 3,500 vendor partners can query
– Can handle 35 thousand queries per week
• Benefit $12,000 per query
• Some users about 1 thousand queries per day
Summers Rubber Company
• Distribution firm
– 7 operating locations
– 10,000 items
– 3,000 customers
• Old system:
– OLAP
– Databases transactional & summarized, distributed
Summers Data Storage System
• Built in-house, PCs, Access database
• Visual Basic & Excel
• Distributed system
– Data warehouse server controlled queries, managed resources
• Security
– Passwords gave some protection
– To protect from leaving employees, used data marts with small versions of central database
Summers
• Move from transactional databases to new system
• Small prototype, iterative feedback from users
• Data came from many sources
• Scrubbing data
– Reformatting (time units, scales, currency measures, etc.)
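Reformatting of this kind can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the field names, the `scrub` helper, the unit table, and in particular the exchange rate, which is a made-up number.

```python
# Hypothetical conversion tables; real units and rates would come from the business
TO_DAYS = {"day": 1, "week": 7}
TO_USD = {"USD": 1.0, "CAD": 0.85}   # made-up exchange rate, for illustration only


def scrub(record):
    """Return a copy with lead time in days and price in USD."""
    return {
        "item": record["item"].strip().upper(),
        "lead_time_days": record["lead_time"] * TO_DAYS[record["lead_unit"]],
        "price_usd": round(record["price"] * TO_USD[record["currency"]], 2),
    }


raw = {
    "item": " hose-12 ",
    "lead_time": 2,
    "lead_unit": "week",
    "price": 10.0,
    "currency": "CAD",
}
clean = scrub(raw)
```

An unknown unit or currency raises a `KeyError` here, which surfaces bad source data instead of silently loading it.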
Summers – Negative features
• Too much disk space on user local drives
• Often difficult to understand & use
• Updating multiple data sites slow, limited access
• Summary data often wrong
• Couldn’t use data mining tools
– Problem was aggregated data stored
Comparison
Product     Use                 Duration     Granularity
Warehouse   Repository          Permanent    Finest
Mart        Specific study      Temporary    Aggregate
OLAP        Report & analysis   Repetitive   Summary
Examples of Data Uses
• Customer information systems
• Fingerhut
Customer Information Systems
• Massive databases
• Detailed information about individuals and households
• Use automated analysis
– identify focused market target
Micromarketing
• Target small groups of highly responsive customers
• Own niches like smaller competitors
• EXAMPLES:
– Great Atlantic & Pacific Tea Company (A&P)
• target customers, centralize buying
– Fingerhut
• sell on credit to households <$25,000 income
Media Companies
• R. R. Donnelley & Sons
– world’s largest printer
– provide consumer & life-style data
– customized individual publications
• Mass marketing has become less effective
• Profit in developing niche-oriented strategy
• Need marketing information technology
Information Overload
• Retail food (groceries)
– average store - 20,000 items
• larger stores 40,000 to 60,000
• with weights, flavors, etc., hundreds of thousands
– every year 10,000 new items
– 550 corporate and regional buying offices
– 100,000 salespeople
– several hundred thousand price changes/year
Information Overload
• Grocery data collection
– point-of-sale scanning
– used to allocate shelf space
– used to optimize product mix
– control inventories
– avoid shortages
Customer Information Systems
• tens of thousands of characters of information
• tens of millions of customers
• enormous data storage
– hundreds of gigabytes
• parallel computing
• YOU HAVE TO BE BIG TO AFFORD
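As a rough consistency check on these sizes, take the low end of each range and assume one byte per character (both choices are assumptions made for illustration):

```latex
10^{4}\ \tfrac{\text{characters}}{\text{customer}} \times 10^{7}\ \text{customers} \times 1\ \tfrac{\text{byte}}{\text{character}} = 10^{11}\ \text{bytes} = 100\ \text{GB}
```

Even the low end of both ranges lands at 100 GB, so larger customer bases or richer records quickly reach the hundreds of gigabytes quoted above.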
Customer Information Systems
• USES
– adjust prices
– see new product possibilities
– develop promotions
– personalized advertising
Customer Information Systems
• OPERATION
– artificial intelligence
• neural networks to wade through data
• identify shopping trends
• segment groups of customers
Customer Information Systems
• AIRLINE INDUSTRY
– 1980s - deregulation
– number of possible fares & rates skyrocketed
– SABRE - 45 million fares, 40 million changes/month
– industry now dominated by American (SABRE) & United (Apollo)
– cost - hundreds of millions of dollars
Own the Customer
• A&P
– point-of-sale scanning
– frequent shopper programs
• used to build customer database
• sign up, get free bonus saver cards, check cashing, hundreds of special discounts
• A&P gathers list of purchases, feeds database
– centralized buying, better inventory, advertising
Versioning
• Assemble hundreds of versions of the same ad
• Switch & reassemble products & prices
• Cigarette makers
– some of most advanced database marketing
– direct mail, discount coupons, freebies
– have built databases on smoker demographics
– anticipate market changes, target promotions
Versioning
• FINGERHUT
– 150 catalog mailings in 1992
– based on statistically predicted consumer response
– 13 million customers, 14% annual growth
– database captures 1400 pieces of information about a household
• demographics, purchasing histories
FINGERHUT
• identify your kids’ birthdays, send ideas
– FRONT-END programs
• get new customers (purchased from others)
– TRANSITION programs
• evaluate new purchasers, keep best
– BACK-END programs
• maximize profit
FINGERHUT
• FRONT-END
– newspaper, magazines, TV, postcards, catalogs
– predictive models – lists from other companies
– if you respond
• TRANSITION
– sort out good credit risks, good purchasers
FINGERHUT
• BACK-END
– 80% of revenue from repeat customers
– customers segmented
• 75 specialty catalogs
• personalized messages
Marketing Budgets
• Saturated advertising channels
– expenditures more than doubled in 1980s
– too much advertising, too little relevant
• Shift to
– promotional discounts
– slotting - buy shelf space
– undermines brand loyalty
Narrowcasting
• Cable TV
• In-store coupons
• Special monitors
– doctors’ offices, airport lounges
• Interactive kiosks
• Interactive home TV shopping
R.R. Donnelley & Sons
• Will manage customer’s database
• Supply consumer data
• Identify market segments
• Printing
– Farm Journal - 8000 different editions/month
– tailored editorial & advertising content