  • DATA WAREHOUSING FUNDAMENTALS

    A Comprehensive Guide for IT Professionals

    PAULRAJ PONNIAH

    A Wiley-Interscience Publication
    JOHN WILEY & SONS, INC.
    New York / Chichester / Weinheim / Brisbane / Singapore / Toronto

  • Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

    Copyright © 2001 by John Wiley & Sons, Inc. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.

    This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.

    This title is also available in print as ISBN 0-471-41254-6.

    For more information about Wiley products, visit our web site at www.Wiley.com.

  • To Vimala, my loving wife

    and to

    Joseph, David, and Shobi, my dear children

  • CONTENTS

    Foreword xxi

    Preface xxiii

    Part 1 OVERVIEW AND CONCEPTS

    1 The Compelling Need for Data Warehousing 1

    Chapter Objectives 1
    Escalating Need for Strategic Information 2
    The Information Crisis 3
    Technology Trends 4
    Opportunities and Risks 6
    Failures of Past Decision-Support Systems 7
    History of Decision-Support Systems 8
    Inability to Provide Information 9
    Operational Versus Decision-Support Systems 9
    Making the Wheels of Business Turn 10
    Watching the Wheels of Business Turn 10
    Different Scope, Different Purposes 10
    Data Warehousing—The Only Viable Solution 12
    A New Type of System Environment 12
    Processing Requirements in the New Environment 12
    Business Intelligence at the Data Warehouse 12
    Data Warehouse Defined 13
    A Simple Concept for Information Delivery 14

  • An Environment, Not a Product 14
    A Blend of Many Technologies 14
    Chapter Summary 15
    Review Questions 16
    Exercises 16

    2 Data Warehouse: The Building Blocks 19

    Chapter Objectives 19
    Defining Features 20
    Subject-Oriented Data 20
    Integrated Data 21
    Time-Variant Data 22
    Nonvolatile Data 23
    Data Granularity 23
    Data Warehouses and Data Marts 24
    How are They Different? 25
    Top-Down Versus Bottom-Up Approach 26
    A Practical Approach 27
    Overview of the Components 28
    Source Data Component 28
    Data Staging Component 31
    Data Storage Component 33
    Information Delivery Component 34
    Metadata Component 35
    Management and Control Component 35
    Metadata in the Data Warehouse 35
    Types of Metadata 36
    Special Significance 36
    Chapter Summary 36
    Review Questions 37
    Exercises 37

    3 Trends in Data Warehousing 39

    Chapter Objectives 39
    Continued Growth in Data Warehousing 40
    Data Warehousing is Becoming Mainstream 40
    Data Warehouse Expansion 41
    Vendor Solutions and Products 42
    Significant Trends 43
    Multiple Data Types 44
    Data Visualization 46
    Parallel Processing 48

  • Query Tools 49
    Browser Tools 50
    Data Fusion 50
    Multidimensional Analysis 51
    Agent Technology 51
    Syndicated Data 52
    Data Warehousing and ERP 52
    Data Warehousing and KM 53
    Data Warehousing and CRM 54
    Active Data Warehousing 56
    Emergence of Standards 56
    Metadata 57
    OLAP 57
    Web-Enabled Data Warehouse 58
    The Warehouse to the Web 59
    The Web to the Warehouse 59
    The Web-Enabled Configuration 60
    Chapter Summary 61
    Review Questions 61
    Exercises 62

    Part 2 PLANNING AND REQUIREMENTS

    4 Planning and Project Management 63

    Chapter Objectives 63
    Planning Your Data Warehouse 64
    Key Issues 64
    Business Requirements, Not Technology 66
    Top Management Support 67
    Justifying Your Data Warehouse 67
    The Overall Plan 68
    The Data Warehouse Project 69
    How is it Different? 70
    Assessment of Readiness 71
    The Life-Cycle Approach 71
    The Development Phases 73
    The Project Team 74
    Organizing the Project Team 75
    Roles and Responsibilities 75
    Skills and Experience Levels 77
    User Participation 78
    Project Management Considerations 80
    Guiding Principles 81

  • Warning Signs 82
    Success Factors 82
    Anatomy of a Successful Project 83
    Adopt a Practical Approach 84
    Chapter Summary 86
    Review Questions 86
    Exercises 87

    5 Defining the Business Requirements 89

    Chapter Objectives 89
    Dimensional Analysis 90
    Usage of Information Unpredictable 90
    Dimensional Nature of Business Data 90
    Examples of Business Dimensions 92
    Information Packages—A New Concept 93
    Requirements Not Fully Determinate 93
    Business Dimensions 95
    Dimension Hierarchies/Categories 95
    Key Business Metrics or Facts 96
    Requirements Gathering Methods 97
    Interview Techniques 99
    Adapting the JAD Methodology 102
    Review of Existing Documentation 103
    Requirements Definition: Scope and Content 104
    Data Sources 105
    Data Transformation 105
    Data Storage 105
    Information Delivery 105
    Information Package Diagrams 106
    Requirements Definition Document Outline 106
    Chapter Summary 106
    Review Questions 107
    Exercises 107

    6 Requirements as the Driving Force for Data Warehousing 109

    Chapter Objectives 109
    Data Design 110
    Structure for Business Dimensions 112
    Structure for Key Measurements 112
    Levels of Detail 113
    The Architectural Plan 113
    Composition of the Components 114

  • Special Considerations 115
    Tools and Products 118
    Data Storage Specifications 119
    DBMS Selection 120
    Storage Sizing 120
    Information Delivery Strategy 121
    Queries and Reports 122
    Types of Analysis 123
    Information Distribution 123
    Decision Support Applications 123
    Growth and Expansion 123
    Chapter Summary 124
    Review Questions 124
    Exercises 125

    Part 3 ARCHITECTURE AND INFRASTRUCTURE

    7 The Architectural Components 127

    Chapter Objectives 127
    Understanding Data Warehouse Architecture 127
    Architecture: Definitions 127
    Architecture in Three Major Areas 128
    Distinguishing Characteristics 129
    Different Objectives and Scope 130
    Data Content 130
    Complex Analysis and Quick Response 131
    Flexible and Dynamic 131
    Metadata-driven 132
    Architectural Framework 132
    Architecture Supporting Flow of Data 132
    The Management and Control Module 133
    Technical Architecture 134
    Data Acquisition 135
    Data Storage 138
    Information Delivery 140
    Chapter Summary 142
    Review Questions 142
    Exercises 143

    8 Infrastructure as the Foundation for Data Warehousing 145

    Chapter Objectives 145
    Infrastructure Supporting Architecture 145

  • Operational Infrastructure 147
    Physical Infrastructure 147
    Hardware and Operating Systems 148
    Platform Options 150
    Server Hardware 158
    Database Software 164
    Parallel Processing Options 164
    Selection of the DBMS 166
    Collection of Tools 167
    Architecture First, Then Tools 168
    Data Modeling 169
    Data Extraction 169
    Data Transformation 169
    Data Loading 169
    Data Quality 169
    Queries and Reports 170
    Online Analytical Processing (OLAP) 170
    Alert Systems 170
    Middleware and Connectivity 170
    Data Warehouse Management 170
    Chapter Summary 170
    Review Questions 171
    Exercises 171

    9 The Significant Role of Metadata 173

    Chapter Objectives 173
    Why Metadata is Important 173
    A Critical Need in the Data Warehouse 175
    Why Metadata is Vital for End-Users 177
    Why Metadata is Essential for IT 179
    Automation of Warehousing Tasks 181
    Establishing the Context of Information 183
    Metadata Types by Functional Areas 183
    Data Acquisition 184
    Data Storage 186
    Information Delivery 186
    Business Metadata 187
    Content Overview 188
    Examples of Business Metadata 188
    Content Highlights 189
    Who Benefits? 190
    Technical Metadata 190

  • Content Overview 190
    Examples of Technical Metadata 191
    Content Highlights 192
    Who Benefits? 192
    How to Provide Metadata 193
    Metadata Requirements 193
    Sources of Metadata 194
    Challenges for Metadata Management 196
    Metadata Repository 196
    Metadata Integration and Standards 198
    Implementation Options 199
    Chapter Summary 200
    Review Questions 201
    Exercises 201

    Part 4 DATA DESIGN AND DATA PREPARATION

    10 Principles of Dimensional Modeling 203

    Chapter Objectives 203

    From Requirements to Data Design 203

      Design Decisions 204
      Dimensional Modeling Basics 204
      E-R Modeling Versus Dimensional Modeling 209
      Use of CASE Tools 209

    The STAR Schema 210

      Review of a Simple STAR Schema 210
      Inside a Dimension Table 212
      Inside the Fact Table 214
      The Factless Fact Table 216
      Data Granularity 217

    STAR Schema Keys 218

      Primary Keys 218
      Surrogate Keys 219
      Foreign Keys 219

    Advantages of the STAR Schema 220

      Easy for Users to Understand 220
      Optimizes Navigation 221
      Most Suitable for Query Processing 222
      STARjoin and STARindex 223

    Chapter Summary 223

    Review Questions 224

    Exercises 224

  • 11 Dimensional Modeling: Advanced Topics 225

    Chapter Objectives 225

    Updates to the Dimension Tables 226

      Slowly Changing Dimensions 226
      Type 1 Changes: Correction of Errors 227
      Type 2 Changes: Preservation of History 228
      Type 3 Changes: Tentative Soft Revisions 230

    Miscellaneous Dimensions 231

      Large Dimensions 231
      Rapidly Changing Dimensions 233
      Junk Dimensions 235

    The Snowflake Schema 235

      Options to Normalize 235
      Advantages and Disadvantages 238
      When to Snowflake 238

    Aggregate Fact Tables 239

      Fact Table Sizes 241
      Need for Aggregates 242
      Aggregating Fact Tables 243
      Aggregation Options 247

    Families of STARS 249

      Snapshot and Transaction Tables 250
      Core and Custom Tables 251
      Supporting Enterprise Value Chain or Value Circle 251
      Conforming Dimensions 253
      Standardizing Facts 254
      Summary of Family of STARS 254

    Chapter Summary 255

    Review Questions 255

    Exercises 256

    12 Data Extraction, Transformation, and Loading 257

    Chapter Objectives 257

    ETL Overview 258

      Most Important and Most Challenging 259
      Time-consuming and Arduous 260
      ETL Requirements and Steps 260
      Key Factors 261

    Data Extraction 262

      Source Identification 263
      Data Extraction Techniques 263
      Evaluation of the Techniques 270

  • Data Transformation 271

      Data Transformation: Basic Tasks 272
      Major Transformation Types 273
      Data Integration and Consolidation 275
      Transformation for Dimension Attributes 277
      How to Implement Transformation 277

    Data Loading 279

      Applying Data: Techniques and Processes 280
      Data Refresh Versus Update 282
      Procedure for Dimension Tables 283
      Fact Tables: History and Incremental Loads 284

    ETL Summary 285

      ETL Tool Options 285
      Reemphasizing ETL Metadata 286
      ETL Summary and Approach 287

    Chapter Summary 288

    Review Questions 288

    Exercises 289

    13 Data Quality: A Key to Success 291

    Chapter Objectives 291

    Why is Data Quality Critical? 292

      What is Data Quality? 292
      Benefits of Improved Data Quality 295
      Types of Data Quality Problems 296

    Data Quality Challenges 299

      Sources of Data Pollution 299
      Validation of Names and Addresses 301
      Costs of Poor Data Quality 302

    Data Quality Tools 303

      Categories of Data Cleansing Tools 303
      Error Discovery Features 303
      Data Correction Features 303
      The DBMS for Quality Control 304

    Data Quality Initiative 304

      Data Cleansing Decisions 305
      Who Should be Responsible? 307
      The Purification Process 309
      Practical Tips on Data Quality 311

    Chapter Summary 311

    Review Questions 312

    Exercises 312

  • Part 5 INFORMATION ACCESS AND DELIVERY

    14 Matching Information to the Classes of Users 315

    Chapter Objectives 315

    Information from the Data Warehouse 316

      Data Warehouse Versus Operational Systems 316
      Information Potential 318
      User-Information Interface 321
      Industry Applications 323

    Who Will Use the Information? 323

      Classes of Users 323
      What They Need 326
      How to Provide Information 329

    Information Delivery 329

      Queries 331
      Reports 332
      Analysis 333
      Applications 334

    Information Delivery Tools 335

      The Desktop Environment 335
      Methodology for Tool Selection 335
      Tool Selection Criteria 338
      Information Delivery Framework 340

    Chapter Summary 341

    Review Questions 341

    Exercises 341

    15 OLAP in the Data Warehouse 343

    Chapter Objectives 343

    Demand for Online Analytical Processing 344

      Need for Multidimensional Analysis 344
      Fast Access and Powerful Calculations 345
      Limitations of Other Analysis Methods 347
      OLAP is the Answer 349
      OLAP Definitions and Rules 349
      OLAP Characteristics 352

    Major Features and Functions 353

      General Features 353
      Dimensional Analysis 353
      What are Hypercubes? 357
      Drill-Down and Roll-Up 360
      Slice-and-Dice or Rotation 362

  • Uses and Benefits 363

    OLAP Models 363

      Overview of Variations 364
      The MOLAP Model 365
      The ROLAP Model 366
      ROLAP Versus MOLAP 367

    OLAP Implementation Considerations 368

      Data Design and Preparation 368
      Administration and Performance 370
      OLAP Platforms 372
      OLAP Tools and Products 373
      Implementation Steps 374

    Chapter Summary 374

    Review Questions 374

    Exercises 375

    16 Data Warehousing and the Web 377

    Chapter Objectives 377

    Web-Enabled Data Warehouse 378

      Why the Web? 378
      Convergence of Technologies 380
      Adapting the Data Warehouse for the Web 381
      The Web as a Data Source 382

    Web-Based Information Delivery 383

      Expanded Usage 383
      New Information Strategies 385
      Browser Technology for the Data Warehouse 387
      Security Issues 389

    OLAP and the Web 389

      Enterprise OLAP 389
      Web-OLAP Approaches 390
      OLAP Engine Design 390

    Building a Web-Enabled Data Warehouse 391

      Nature of the Data Webhouse 391
      Implementation Considerations 393
      Putting the Pieces Together 394
      Web Processing Model 394

    Chapter Summary 396

    Review Questions 396

    Exercises 396

  • 17 Data Mining Basics 399

    Chapter Objectives 399

    What is Data Mining? 400

      Data Mining Defined 401
      The Knowledge Discovery Process 402
      OLAP Versus Data Mining 404
      Data Mining and the Data Warehouse 406

    Major Data Mining Techniques 408

      Cluster Detection 409
      Decision Trees 411
      Memory-Based Reasoning 413
      Link Analysis 415
      Neural Networks 417
      Genetic Algorithms 418
      Moving into Data Mining 419

    Data Mining Applications 422

      Benefits of Data Mining 423
      Applications in Retail Industry 424
      Applications in Telecommunications Industry 425
      Applications in Banking and Finance 426

    Chapter Summary 426

    Review Questions 426

    Exercises 427

    Part 6 IMPLEMENTATION AND MAINTENANCE

    18 The Physical Design Process 429

    Chapter Objectives 429

    Physical Design Steps 430

      Develop Standards 430
      Create Aggregates Plan 431
      Determine the Data Partitioning Scheme 431
      Establish Clustering Options 432
      Prepare an Indexing Strategy 432
      Assign Storage Structures 432
      Complete Physical Model 433

    Physical Design Considerations 433

      Physical Design Objectives 433
      From Logical Model to Physical Model 434
      Physical Model Components 435
      Significance of Standards 436

    Physical Storage 438

  • Storage Area Data Structures 439
      Optimizing Storage 440
      Using RAID Technology 442
      Estimating Storage Sizes 442

    Indexing the Data Warehouse 443

      Indexing Overview 443
      B-Tree Index 445
      Bitmapped Index 446
      Clustered Indexes 448
      Indexing the Fact Table 448
      Indexing the Dimension Tables 449

    Performance Enhancement Techniques 449

      Data Partitioning 449
      Data Clustering 450
      Parallel Processing 450
      Summary Levels 451
      Referential Integrity Checks 451
      Initialization Parameters 451
      Data Arrays 452

    Chapter Summary 452

    Review Questions 452

    Exercises 453

    19 Data Warehouse Deployment 455

    Chapter Objectives 455

    Major Deployment Activities 456

      Complete User Acceptance 456
      Perform Initial Loads 457
      Get User Desktops Ready 458
      Complete Initial User Training 459
      Institute Initial User Support 460
      Deploy in Stages 460

    Considerations for a Pilot 462

      When Is a Pilot Data Mart Useful? 462
      Types of Pilot Projects 463
      Choosing the Pilot 465
      Expanding and Integrating the Pilot 466

    Security 467

      Security Policy 467
      Managing User Privileges 468
      Password Considerations 469
      Security Tools 469

  • Backup and Recovery 470

      Why Back Up the Data Warehouse? 470
      Backup Strategy 471
      Setting Up a Practical Schedule 472
      Recovery 472

    Chapter Summary 473

    Review Questions 474

    Exercises 474

    20 Growth and Maintenance 477

    Chapter Objectives 477

    Monitoring the Data Warehouse 478

      Collection of Statistics 478
      Using Statistics for Growth Planning 480
      Using Statistics for Fine-Tuning 480
      Publishing Trends for Users 481

    User Training and Support 481

      User Training Content 482
      Preparing the Training Program 482
      Delivering the Training Program 484
      User Support 485

    Managing the Data Warehouse 487

      Platform Upgrades 487
      Managing Data Growth 488
      Storage Management 488
      ETL Management 489
      Data Model Revisions 489
      Information Delivery Enhancements 489
      Ongoing Fine-Tuning 490

    Chapter Summary 490

    Review Questions 491

    Exercises 491

    Appendix A. Project Life Cycle Steps and Checklists 493

    Appendix B. Critical Factors for Success 497

    Appendix C. Guidelines for Evaluating Vendor Solutions 499

    References 501

    Glossary 503

    Index 511

  • FOREWORD

    I am delighted to share my thoughts with information technology professionals about my faculty colleague Paulraj Ponniah’s textbook Data Warehousing Fundamentals. In the spring of 1998, Raritan Valley Community College decided to offer a course on data warehousing. This was mainly through the initiative of Dr. Ponniah, who had been teaching our database design and development course for several years. It was very difficult to find a good textbook for a college course on data warehousing. We had to settle for a book that was not quite suitable. In order to make the course effective, Paul had to supplement the book with his own data warehousing seminar materials. Our students, primarily IT professionals from local industries, received the course very well. Now this magnificent textbook on data warehousing comes to you through the foresight and diligent work of Dr. Ponniah, along with the insightful support of the publishers, John Wiley and Sons.

    This book has numerous features that make it a winner:

    • The order of topics is very logical.
    • The choice of topics is quite appropriate for a comprehensive introductory book. The coverage of topics is also very well balanced.
    • The subject matter is logically structured, with chapters covering essential components of the data warehousing field. The sequence of topics is well planned to provide a seamless transition from design to implementation.
    • Within each chapter, the continuity of topics is excellent.
    • None of the topics included in the textbook is superfluous to the basic objectives.
    • The material included is technically correct and up-to-date. The figures appropriately enhance and amplify the topics.
    • Ample review questions and exercises can be found at the end of each chapter. This is something lacking in most books on data warehousing. These review questions and exercises are pedagogically sound. They are designed to test the knowledge, not the ignorance, of the reader.

  • Dr. Ponniah’s writing style is clear and concise. Because of the simplicity and completeness of this book, I believe it will find a definite market niche, particularly among college students, not-so-technically savvy IT people, and data warehousing mavens.

    In spite of a plethora of books on data warehousing by luminaries such as Kimball, Inmon, Barquin, and Singh, this book fulfills a special purpose, and information technology professionals will definitely benefit from reading it. In addition, the book should be well received by college professors for use by students in their data warehousing courses. To put it succinctly, this book fills a void in the midst of plenty.

    In summary, Dr. Ponniah has produced a winner for both students and experienced IT professionals. As someone who has been in IT education for many years, I certainly recommend this book to college professors and seminar leaders for their data warehousing courses.

    PRATAP P. REDDY, Ph.D.

    Professor and Chair of CIS Department
    Raritan Valley Community College
    North Branch, New Jersey

  • PREFACE

    THIS BOOK IS FOR YOU

    Are you an information technology professional watching, with great interest, the massive unfolding of the data warehouse movement? Are you contemplating a move into this new area of opportunity? Are you a systems analyst, programmer, data analyst, database administrator, project leader, or software engineer eager to grasp the fundamentals of data warehousing? Do you wonder how many different books you may have to read to learn the basics? Are you lost in the maze of the literature and products on the subject? Do you wish for a single publication on data warehousing, clearly and specifically designed for IT professionals? Do you need a textbook that helps you learn the fundamentals in sufficient depth: not more, not less? If you answered “yes” to any of the above, this book is written specially for you.

    This is the one definitive book on data warehousing clearly intended for IT professionals. The organization and presentation of the book are specially tuned for IT professionals. This book does not presume to target anyone and everyone remotely interested in the subject for some reason or another, but is written to address the specific needs of IT professionals like you. It does not tend to emphasize certain aspects and neglect other critical ones. The book takes you over the entire landscape of data warehousing.

    How can this book be exactly suitable for IT professionals? As a veteran IT professional with wide and intensive industry experience, as a successful database and data warehousing consultant for many years, and as one who teaches data warehousing fundamentals in the college classroom and in public seminars, I have come to appreciate the precise needs of IT professionals, and in every chapter I have incorporated these requirements of the IT community.

  • THE SCENARIO

    Why are companies rushing into data warehousing? Why is there a tremendous surge in interest? Data warehousing is no longer a purely novel idea just for research and experimentation. It has become a mainstream phenomenon. True, the data warehouse is not in every doctor’s office yet, but neither is it confined to only high-end businesses. More than half of all U.S. companies and a large percentage of worldwide businesses have made a commitment to data warehousing.

    In every industry across the board, from retail chain stores to financial institutions, from manufacturing enterprises to government departments, and from airline companies to utility businesses, data warehousing is revolutionizing the way people perform business analysis and make strategic decisions. Every company that has a data warehouse is realizing the enormous benefits translated into positive results at the bottom line. These companies, now incorporating Web-based technologies, are enhancing the potential for greater and easier delivery of vital information.

    Over the past five years, hundreds of vendors have flooded the market with numerous data warehousing products. Vendor solutions and products run the gamut of data warehousing: data modeling, data acquisition, data quality, data analysis, metadata, and so on. The market is already large and continues to grow.

    CHANGED ROLE OF IT

    In this scenario, information technology departments of all progressive companies perceive a radical change in their roles. IT is no longer required to create every report and present every screen for providing information to the end-users. IT is now charged with the building of information delivery systems and letting the end-users themselves retrieve information in innovative ways for analysis and decision making. Data warehousing is proving to be just that type of successful information delivery system.

    IT professionals responsible for building data warehouses need to revise their mindsets about building applications. They have to understand that a data warehouse is not a one-size-fits-all proposition; they must get a clear understanding of the extraction of data from source systems, data transformations, data staging, data warehouse architecture, infrastructure, and the various methods of information delivery.

    In short, IT professionals, like you, must get a strong grip on the fundamentals of data warehousing.

    WHAT THIS BOOK CAN DO FOR YOU

    The book is comprehensive and detailed. You will be able to study every significant topic in planning, requirements, architecture, infrastructure, design, data preparation, information delivery, deployment, and maintenance. It is specially designed for IT professionals; you will be able to follow the presentation easily because it is built upon the foundation of your background as an IT professional, your knowledge, and the technical terminology familiar to you. It is organized logically, beginning with an overview of concepts, moving on to planning and requirements, then to architecture and infrastructure, on to data design, then to information delivery, and concluding with deployment and maintenance. This progression is typical of what you are most familiar with in your experience and day-to-day work.

    The book provides an interactive learning experience. It is not a one-way lecture. You participate through the review questions and exercises at the end of each chapter. For each chapter, the objectives set the theme and the summary provides a list of the topics covered. You can relate each concept and technique to the data warehousing industry and marketplace. You will notice a substantial number of industry examples. Although intended as a first course on fundamentals, this book provides sufficient coverage of each topic so that you can comfortably proceed to the next step of specialization for specific roles in a data warehouse project.

    Featuring all the significant topics in appropriate measure, this book is eminently suitable as a textbook for serious self-study, a college course, or a seminar on the essentials. It provides an opportunity for you to become a data warehouse expert.

    I acknowledge my indebtedness to the authors listed in the reference section at the end of the book. Their insights and observations have helped me cover the topics adequately. I must also express my appreciation to my students and professional colleagues. Our interactions have enabled me to shape this textbook according to the needs of IT professionals.

    PAULRAJ PONNIAH, Ph.D.
    Edison, New Jersey
    June 2001

  • CHAPTER 1

    THE COMPELLING NEED FOR DATA WAREHOUSING

    CHAPTER OBJECTIVES

    • Understand the desperate need for strategic information
    • Recognize the information crisis at every enterprise
    • Distinguish between operational and informational systems
    • Learn why all past attempts to provide strategic information failed
    • Clearly see why data warehousing is the viable solution

    As an information technology professional, you have worked on computer applications as an analyst, programmer, designer, developer, database administrator, or project manager. You have been involved in the design, implementation, and maintenance of systems that support day-to-day business operations. Depending on the industries you have worked in, you must have been involved in applications such as order processing, general ledger, inventory, in-patient billing, checking accounts, insurance claims, and so on.

    These applications are important systems that run businesses. They process orders, maintain inventory, keep the accounting books, service the clients, receive payments, and process claims. Without these computer systems, no modern business can survive. Companies started building and using these systems in the 1960s and have become completely dependent on them. As an enterprise grows larger, hundreds of computer applications are needed to support the various business processes. These applications are effective in what they are designed to do. They gather, store, and process all the data needed to successfully perform the daily operations. They provide online information and produce a variety of reports to monitor and run the business.

    In the 1990s, as businesses grew more complex, corporations spread globally, and competition became fiercer, business executives became desperate for information to stay competitive and improve the bottom line. The operational computer systems did provide information to run the day-to-day operations, but what the executives needed were different kinds of information that could be readily used to make strategic decisions. They wanted to know where to build the next warehouse, which product lines to expand, and which markets they should strengthen. The operational systems, important as they were, could not provide strategic information. Businesses, therefore, were compelled to turn to new ways of getting strategic information.

    Data warehousing is a new paradigm specifically intended to provide vital strategic information. In the 1990s, organizations began to achieve competitive advantage by building data warehouse systems. Figure 1-1 shows a sample of strategic areas where data warehousing is already producing results in different industries.

    We will now briefly examine a crucial question: why do enterprises really need data warehouses? This discussion is important because unless we grasp the significance of this critical need, our study of data warehousing will lack motivation. So, please pay close attention.

    ESCALATING NEED FOR STRATEGIC INFORMATION

    While we discuss the clamor by enterprises for strategic information, we need to look at the prevailing information crisis that is holding them back as well as the technology trends of the past few years that are working in our favor, enabling us to provide strategic information. Our discussion of the need for strategic information will not be complete unless we study the opportunities provided by strategic information and the risks facing a company without such information.

    Who needs strategic information in an enterprise? What exactly do we mean by strategic information? The executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions. They need information to formulate the business strategies, establish goals, set objectives, and monitor results.

    Here are some examples of business objectives:

    • Retain the present customer base
    • Increase the customer base by 15% over the next 5 years

    Figure 1-1 Organizations’ use of data warehousing. Organizations achieve competitive advantage:

    Retail: Customer Loyalty, Market Planning
    Financial: Risk Management, Fraud Detection
    Airlines: Route Profitability, Yield Management
    Manufacturing: Cost Reduction, Logistics Management
    Utilities: Asset Management, Resource Management
    Government: Manpower Planning, Cost Control

