The Data Warehouse eBusiness DBA Handbook 2003

  • 8/14/2019 The Data Warehouse eBusiness DBA Handbook 2003

    1/219

    The Data Warehouse eBusiness DBA Handbook

    Donald K. Burleson, Joseph Hudicka, William H. Inmon, Craig Mullins, Fabian Pascal

    The Data Warehouse eBusiness DBA Handbook

    By Donald K. Burleson, Joseph Hudicka, William H. Inmon, Craig Mullins, Fabian Pascal

    Copyright 2003 by BMC Software and DBAzine. Used with permission.

    Printed in the United States of America.

    Series Editor: Donald K. Burleson

    Production Manager: John Lavender

    Production Editor: Teri Wade

    Cover Design: Bryan Hoff

    Printing History:

    August, 2003 for First Edition

    Oracle, Oracle7, Oracle8, Oracle8i and Oracle9i are trademarks of Oracle Corporation.

    Many of the designations used by computer vendors to distinguish their products are claimed as Trademarks. All names known to Rampant TechPress to be trademark names appear in this text as initial caps.

    The information provided by the authors of this work is believed to be accurate and reliable, but because of the possibility of human error by our authors and staff, BMC Software, DBAZine and Rampant TechPress cannot guarantee the accuracy or completeness of any information included in this work and is not responsible for any errors, omissions or inaccurate results obtained from the use of information or scripts in this work.

    Links to external sites are subject to change; DBAZine.com, BMC Software and Rampant TechPress do not control or endorse the content of these external web sites, and are not responsible for their content.

    ISBN 0-9740716-2-5

    iii The Data Warehousing eBusiness DBA Handbook

    Table of Contents

    Conventions Used in this Book
    About the Authors
    Foreword

    Chapter 1 - Data Warehousing and eBusiness
        Making the Most of E-business by W. H. Inmon

    Chapter 2 - The Benefits of Data Warehousing
        The Data Warehouse Foundation by W. H. Inmon
        References

    Chapter 3 - The Value of the Data Warehouse
        The Foundations of E-Business by W. H. Inmon
        Why the Internet?
        Intelligent Messages
        Integration, History and Versatility
        The Value of Historical Data
        Integrated Data
        Looking Smarter

    Chapter 4 - The Role of the eDBA
        Logic, e-Business, and the Procedural eDBA by Craig S. Mullins
        The Classic Role of the DBA
        The Trend of Storing Process With Data
        Database Code Objects and e-Business
        Database Code Object Programming Languages
        The Duality of the DBA
        The Role of the Procedural DBA
        Synopsis

    Chapter 5 - Building a Solid Information Architecture
        How to Select the Optimal Information Exchange Architecture by Joseph Hudicka

        Addressing the Issues

    Chapter 11 - The Importance of Data Quality Strategy
        Develop a Data Quality Strategy Before Implementing a Data Warehouse by Joseph Hudicka
        Data Quality Problems in the Real World
        Why Data Quality Problems Go Unresolved
        Fraudulent Data Quality Problems
        The Seriousness of Data Quality Problems
        Data Collection
        Solutions for Data Quality Issues
            Option 1: Integrated Data Warehouse
            Option 2: Value Rules
            Option 3: Deferred Validation
        Periodic sampling averts future disasters
        Conclusion

    Chapter 12 - Data Modeling and eBusiness
        Data Modeling for the Data Warehouse by W. H. Inmon
        "Just the Facts, Ma'am"
            Modeling Atomic Data
            Through Data Attributes, Many Classes of Subject Areas Are Accumulated
        Other Possibilities -- Generic Data Models
        Design Continuity from One Iteration of Development to the Next

    Chapter 13 - Don't Forget the Customer
        Interacting with the Internet Viewer by W. H. Inmon
        IN SUMMARY

    Chapter 14 - Getting Smart
        Elasticity and Pricing: Getting Smart by W. H. Inmon
        Historically Speaking
        At the Price Breaking Point
        How Good Are the Numbers
        How Elastic Is the Price
        Conclusion

    Chapter 15 - Tools of the Trade: Java
        The eDBA and Java by Craig S. Mullins
        What is Java?
        Why is Java Important to an eDBA?
        How can Java improve availability?
        How Will Java Impact the Job of the eDBA?
        Resistance is Futile
        Conclusion

    Chapter 16 - Tools of the Trade: XML
        New Technologies of the eDBA: XML by Craig S. Mullins
        What is XML?
        Some Skepticism
        Integrating XML
        Defining the Future Web

    Chapter 17 - Multivalue Database Technology Pros and Cons
        MultiValue Lacks Value by Fabian Pascal
        References

    Chapter 18 - Securing your Data
        Data Security Internals by Don Burleson
        Traditional Oracle Security
        Concerns About Role-based Security
        Closing the Back Doors
        Oracle Virtual Private Databases
        Procedure Execution Security
        Conclusion

    Chapter 19 - Maintaining Efficiency
        eDBA: Online Database Reorganization by Craig S. Mullins
        Reorganizing Tablespaces
        Online Reorganization
        Synopsis

    Chapter 20 - The Highly Available Database
        The eDBA and Data Availability by Craig S. Mullins
        The First Important Issue is Availability
        What is Implied by e-vailability?
        The Impact of Downtime on an e-business
        Conclusion

    Chapter 21 - eDatabase Recovery Strategy
        The eDBA and Recovery by Craig S. Mullins
        eDatabase Recovery Strategies
        Recovery-To-Current
        Point-in-Time Recovery
        Transaction Recovery
        Choosing the Optimum Recovery Strategy
        Database Design
        Reducing the Risk

    Chapter 22 - Automating eDBA Tasks
        Intelligent Automation of DBA Tasks by Craig S. Mullins
        Duties of the DBA
        A Lot of Effort
        Intelligent Automation
        Synopsis

    Chapter 23 - Where to Turn for Help
        Online Resources of the eDBA by Craig S. Mullins
        Usenet Newsgroups
        Mailing Lists
        Websites and Portals
        No eDBA Is an Island

    Conventions Used in this Book

    It is critical for any technical publication to follow rigorous standards and employ consistent punctuation conventions to make the text easy to read.

    However, this is not an easy task. Within Oracle there are many types of notation that can confuse a reader. Some Oracle utilities such as STATSPACK and TKPROF are always spelled in CAPITAL letters, while Oracle parameters and procedures have varying naming conventions in the Oracle documentation. It is also important to remember that many Oracle commands are case sensitive, and are always left in their original executable form, and never altered with italics or capitalization.

    Hence, all Rampant TechPress books follow these conventions:

    Parameters - All Oracle parameters will be lowercase italics. Exceptions to this rule are parameter arguments that are commonly capitalized (KEEP pool, TKPROF); these will be left in ALL CAPS.

    Variables - All PL/SQL program variables and arguments will also remain in lowercase italics (dbms_job, dbms_utility).

    Tables & dictionary objects - All data dictionary objects are referenced in lowercase italics (dba_indexes, v$sql). This includes all v$ and x$ views (x$kcbcbh, v$parameter) and dictionary views (dba_tables, user_indexes).

    SQL - All SQL is formatted for easy use in the code depot, and all SQL is displayed in lowercase. The main SQL terms (select, from, where, group by, order by, having) will always appear on a separate line.

    Programs & Products - All products and programs that are known to the author are capitalized according to the vendor specifications (IBM, DBXray, etc). All names known by Rampant TechPress to be trademark names appear in this text as initial caps. References to UNIX are always made in uppercase.
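    As a brief illustration of the SQL convention listed above, the sketch below runs a lowercase query in which each main clause begins on its own line. The orders table, its columns and rows, and the use of Python's built-in sqlite3 module are assumptions made purely for illustration; they do not come from the book, whose scripts target Oracle.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (region text, amount real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 75.0), ("west", 20.0)],
)

# Lowercase SQL, with each main clause keyword on a separate line.
query = """
select
   region,
   sum(amount) as total
from
   orders
where
   amount > 50
group by
   region
having
   sum(amount) > 80
order by
   region
"""

rows = conn.execute(query).fetchall()
for region, total in rows:
    print(region, total)
conn.close()
```

    Keeping every clause keyword on its own line makes it easy to see at a glance which filters apply before grouping (where) and which apply after it (having).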

    About the Authors

    Bill Inmon is universally recognized as the "father of the data warehouse." He has more than 26 years of database technology management experience and data warehouse design expertise, and has published 36 books and more than 350 articles in major computer journals. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for many major computing associations. Inmon has consulted with a large number of Fortune 1000 clients, offering data warehouse design and database management services. For more information, visit www.BillInmon.com or call (303) 221-4000.

    Joseph Hudicka is the founder of the Information Architecture Team, an organization that specializes in data quality, data migration, and ETL. Winner of the ODTUG Best Speaker award for the Spring 1999 conference, Joseph is an internationally recognized speaker at ODTUG, OOW, IOUG-A, TDWI and many local user groups. Joseph coauthored Oracle8 Design Using UML Object Modeling for Osborne/McGraw-Hill & Oracle Press, and has also written or contributed to several articles for publication in DMReview, Intelligent Enterprise and The Data Warehousing Institute (TDWI).

    Craig S. Mullins is a director of technology planning for BMC Software. He has over 15 years of experience dealing with data and database technologies. He is the author of the book DB2 Developer's Guide (now available in a fourth edition that covers up to and includes the latest release of DB2 - Version 6) and is working on a book about database administration practices (to be published this year by Addison Wesley). Craig can be reached via his Website at www.craigsmullins.com or at [email protected].

    Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, and IRS. He is founder, editor and publisher of DATABASE DEBUNKINGS (http://www.dbdebunk.com/), a Web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry (Chris Date is a senior contributor). Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain, Setting Matters Straight, and for The Journal of Conceptual Modeling. His third book, Practical Issues in Database Management, serves as text for his seminars.

    Foreword

    With the advent of cheap disk I/O subsystems, it is finally possible for database professionals to have databases store multiple billions and even multiple trillions of bytes of information. As the size of these databases increases to behemoth proportions, it is the challenge of the database professionals to understand the correct techniques for loading, maintaining, and extracting information from very large database management systems. The advent of cheap disks has also led to an explosion in business technology, where even the most modest financial investment can bring forth an online system with many billions of bytes. It is imperative that the business manager understand how to manage and control large volumes of information while at the same time providing the consumer with high-volume throughput and sub-second response time.

    This book provides you with insight into how to build the foundation of your eBusiness application. You'll learn the importance of the Data Warehouse in your daily operations. You'll gain lots of insight into how to properly design and build your information architecture to handle the rapid growth that eCommerce business sees today. Once your system is up and running, it must be maintained. There is information in this text that goes through how to maintain online data systems to reduce downtime. Keeping your online data secure is another big issue with online business. To wrap things up, you'll get links to some of the best online resources on Data Warehousing.

    The purpose of this book is to give you significant insights into how you can manage and control large volumes of data. As the technology has expanded to support terabyte data capacity, the challenge to the database professionals is to understand effective techniques for the loading and maintaining of these very large database systems. This book brings together some of the world's foremost authors on data warehousing in order to provide you with the insights that you need to be successful in your data warehousing endeavors.

    CHAPTER 1

    Data Warehousing and eBusiness

    Making the Most of E-business

    Everywhere you look today, you see e-business. In the trade journals. On TV. In the Wall Street Journal. Everywhere. And the message is that if your business is not e-business enabled, you will be behind the curve.

    So what is all the fuss about? Behind the corporate push to get into e-business is a Web site. Or multiple Web sites. The Web site allows your corporation to have a reach into the marketplace that is direct and far reaching. Businesses that would never have entertained entry to foreign marketplaces and other marketplaces that are hard to access suddenly have easy and cheap presence. In a word, e-business opens up possibilities that previously were impractical or even impossible.

    So the secret to e-business is a Web site. Right? Well, almost. Indeed, a Web site is a wonderful delivery mechanism. The Web site allows you to go where you might not have ever been able to go before. But after all is said and done, a Web site is merely a delivery mechanism. To be effective, the delivery mechanism must be allied with the application of strong business propositions. There is a way of expressing this -- opportunity = delivery mechanism + business proposition.

    Figure 1: The web site is at the heart of e-Business

    To illustrate the limitations of a Web site, consider the personal Web sites that many people have created. If there were any inherent business advantage to having a Web site, then these personal sites would be achieving business results for their owners. But no one thinks that just putting up a Web site produces results. It is what you do with the Web site that counts.

    To exploit the delivery mechanism that is the Web environment, applications are necessary. There are many kinds of applications that can be adapted to the Web environment. But the most potent, most promising applications are a class called Customer Relationship Management (CRM) applications. CRM applications have the capability of producing very important business results. Executed properly, CRM applications:

    protect market share

    gain new market share

    increase revenues

    increase profits

    And there's not a business around that doesn't want to do these things.

    So what kind of applications are we talking about here? There are many different flavors. Typical CRM applications include:

    yield management

    customer retention

    customer segmentation

    cross selling

    up selling

    household selling

    affinity analysis

    market basket analysis

    fraud detection

    credit scoring, and so forth

    In short, there are many different ways that applications can be created to absolutely maximize the effectiveness of the Web. Stated differently, without these applications, the Web environment is just another Web site.

    And there are other related non-CRM applications that can improve the bottom line of business as well. These applications include:

    quality control

    profitability analysis

    destination analysis (for airlines)

    purchasing consolidation, and the like

    In short, once the Web is enabled by supporting applications, then very real business advantage occurs.

    But applications do not just happen by themselves. Applications such as CRM and others are built on a foundation of data called a data warehouse. The data warehouse is at the center of an infrastructure called the "corporate information factory." Figure 2 shows the corporate information factory and the Web environment.

    Figure 2: Sitting behind the web site is the infrastructure called the "corporate information factory"

    Figure 2 shows that the Web environment serves as a conduit into the corporate information factory. The corporate information factory provides a variety of important functions for the Web environment:

    the corporate information factory enables the Web environment to gather and manage an unlimited amount of data

    the corporate information factory creates an environment where sweeping business patterns can be detected and analyzed

    the corporate information factory provides a place where Web-based data can be integrated with other corporate data

    the corporate information factory makes edited and integrated data quickly available to the Web environment, and so forth

    In a word, the corporate information factory provides the background infrastructure that turns the Web from a delivery mechanism into a truly powerful tool. The different components of the corporate information factory are:

    the data warehouse

    the corporate ODS

    data marts

    the exploration warehouse

    alternative/near-line storage

    The heart of the corporate information factory is the data warehouse. The data warehouse is a structure that contains:

    detailed, granular data

    integrated data

    historical data

    corporate data

    A convenient way to think of the data warehouse is as a structure that contains very fine grains of sand. Different applications take those grains of sand and reshape them into the form and structure that is most familiar to the organization.
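    The grains-of-sand idea can be sketched in a few lines of Python. This is a minimal illustration and not anything taken from the book: the sales rows, their field names and the summarize helper are all hypothetical. The point is only that the same granular records can be rolled up in different ways for different applications.

```python
from collections import defaultdict

# Hypothetical granular rows: one record per individual sale.
sales = [
    {"customer": "C1", "product": "widget", "month": "2003-01", "amount": 40.0},
    {"customer": "C2", "product": "widget", "month": "2003-01", "amount": 60.0},
    {"customer": "C1", "product": "gadget", "month": "2003-02", "amount": 25.0},
    {"customer": "C2", "product": "gadget", "month": "2003-02", "amount": 75.0},
]


def summarize(rows, key):
    """Roll the granular rows up by the chosen attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


# Three different "shapes" built from the same grains.
by_customer = summarize(sales, "customer")
by_product = summarize(sales, "product")
by_month = summarize(sales, "month")

print(by_customer)
print(by_product)
print(by_month)
```

    Because the detail is kept at its finest grain, a new summary only needs a new grouping key; nothing has to be re-extracted from the source systems.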

    One of the issues that frequently arises with applications for the Web is whether it is necessary to have a data warehouse in support of the applications. Strictly speaking, it is not necessary to have a data warehouse in support of the applications that run on the Web. Figure 3 shows that different applications have been built from the legacy foundation.

    Figure 3: Building applications without a data warehouse

    In Figure 3, multiple applications have been built from the same supporting applications. Looking at Figure 3, it becomes clear that the same processing -- accessing data, gathering data, editing data, cleansing data, merging data and integrating data -- is done for every application. Almost all of the processing shown is redundant. There is no need for every application to repeat what every other application has done. Figure 4 shows that by building a data warehouse, the repetitive activities are done just once.

    Figure 4: Building a data warehouse for the different applications
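    The contrast between repeating the preparation work in every application and doing it once can be sketched as follows. This is a toy model under stated assumptions: the legacy rows, the prepare function (a stand-in for accessing, editing, cleansing and integrating data) and the two small applications are all hypothetical.

```python
# Hypothetical legacy records; the field names are invented for illustration.
LEGACY_ROWS = [
    {"cust": " alice ", "amount": "10"},
    {"cust": "BOB", "amount": "5"},
    {"cust": "Alice", "amount": "7"},
]

prep_runs = 0  # counts how often the shared preparation work is performed


def prepare(rows):
    """Stand-in for accessing, editing, cleansing and integrating legacy data."""
    global prep_runs
    prep_runs += 1
    return [
        {"cust": r["cust"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
    ]


def total_sales(clean_rows):
    return sum(r["amount"] for r in clean_rows)


def customer_count(clean_rows):
    return len({r["cust"] for r in clean_rows})


# Without a warehouse: every application repeats the preparation itself.
without = (total_sales(prepare(LEGACY_ROWS)), customer_count(prepare(LEGACY_ROWS)))
runs_without = prep_runs

# With a warehouse: prepare once, then let every application read the result.
prep_runs = 0
warehouse = prepare(LEGACY_ROWS)
with_wh = (total_sales(warehouse), customer_count(warehouse))
runs_with = prep_runs

print(runs_without, runs_with)
```

    Both approaches produce the same answers; the difference is that the preparation count grows with the number of applications in the first case and stays at one in the second.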

    CHAPTER 2

    The Benefits of Data Warehousing

    The Data Warehouse Foundation

    The Web-based e-business environment has tremendous potential. The Web is a tremendously powerful medium for delivery of information. But there is nothing intrinsically powerful about the Web other than its ability to deliver information. In order for the Web-based e-business environment to deliver its full potential, the Web-based environment requires an infrastructure in support of its information processing needs. The infrastructure that best supports the Web is called the corporate information factory. At the center of the corporate information factory is a data warehouse.

    Fig 1 shows the basic infrastructure supporting the Web-based e-business environment.

    Figure 1: the web environment and the supporting infrastructure

    The heart of the corporate information factory is the data warehouse. The data warehouse is the place where corporate granular integrated historical data resides.

    The data warehouse serves many functions, but the most important function it serves is that of making information available cheaply and quickly. Stated differently, without a data warehouse the cost of information goes sky high and the length of time required to get information is exceedingly long. If the Web-based e-business environment is to be successful, it is necessary to have information that is cheap to access and immediately available.

    How does the data warehouse lower the cost of getting information? And how does the data warehouse greatly accelerate the speed with which information is available? These issues are not immediately obvious when looking at the structure of the corporate information factory.

    In order to explain how the data warehouse accomplishes its important functions, consider the seemingly innocent request for information in a manufacturing environment where there is no data warehouse. A financial analyst wants to find out what corporate sales were for the last quarter. Is this a reasonable request for information? Absolutely. Now, what is required to get that information?

    Figure 2: getting information from applications

    Fig 2 shows that many different sources have to be accessed to get the desired information. Some of the data is in IMS; some is in VSAM. Yet other files are in ADABAS. The key structure of the European file is different from the key structure of the Asian file. The parts data uses different closing dates than the truck data. The body design for cars is called one thing in the cars file and another thing in the parts file. To get the required information takes lots of analysis, access to 10 programs and the ability to integrate the data. Moreover, it takes six months to deliver the information -- at a cost of $250,000.
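    To make the integration problem more concrete, here is a minimal Python sketch of reconciling two extracts whose key structures and naming differ. Every record layout, field name, value and mapping in it is an assumption invented for illustration; the book does not describe the actual file formats.

```python
# Hypothetical extracts from two regional files; every field name, key layout
# and value below is invented for illustration.
european_file = [{"key": "EU-0042", "body": "sedan", "sales": 1200.0}]
asian_file = [{"region": "AS", "seq": 42, "body_design": "saloon", "sales": 800.0}]

# Assumed mapping that reconciles two names for the same body design, a naming
# difference of the kind the text describes between files.
BODY_SYNONYMS = {"saloon": "sedan"}


def from_european(rec):
    """Split the combined 'EU-0042' key into a region and a numeric id."""
    region, seq = rec["key"].split("-")
    return {"region": region, "id": int(seq), "body": rec["body"], "sales": rec["sales"]}


def from_asian(rec):
    """The Asian file already stores region and id separately."""
    body = BODY_SYNONYMS.get(rec["body_design"], rec["body_design"])
    return {"region": rec["region"], "id": rec["seq"], "body": body, "sales": rec["sales"]}


# One common layout that downstream reports can rely on.
integrated = [from_european(r) for r in european_file] + [from_asian(r) for r in asian_file]
total_sales = sum(r["sales"] for r in integrated)

print(integrated)
print(total_sales)
```

    Each source needs its own small adapter, which hints at why the analysis effort grows quickly when ten programs and several storage technologies are involved.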

    These numbers are typical for a mid-sized to large corporation. In some cases these numbers are very much understated. But the real issue isn't the costs and length of time required for accessing data. The real issue is how many resources are needed for accessing many units of information.

    Fig 3 shows that seven different types of information have been requested.

    Figure 3: getting information from applications for seven different reports

    The costs that were described for Fig 2 now are multiplied by seven (or whatever number of units of data are required). As the analyst is developing the procedures for getting the unit of information required, no thought is given to getting information for other units of information. Therefore each time a new piece of information is required, the process described in Fig 2 begins all over again. As a result, the cost of information spikes dramatically.

    But suppose, for example, that this organization had a data warehouse. And suppose the organization had a request for seven units of information. What would it cost to get that information and how long would it take?

    Fig 4 illustrates this scenario.

    Figure 4: making a report from a data warehouse

    Once the data warehouse is built, it can serve multiple requests for information. The granular integrated data that resides in the data warehouse is ideal for being shaped and reshaped. One analyst can look at the data one way; another analyst can look at the same data in yet another way. And you only have to create the infrastructure once. The financial analyst may spend 30 minutes tracking down a unit of data, such as consolidated sales. Or if the data is difficult to calculate it may take a day to get the job done. Depending on the complexity and how costs are calculated, it may cost between $100 and $1,000 to access the data. Compare that price range to what it might cost at an organization with no data warehouse, and it becomes obvious why a data warehouse makes data available quickly and cheaply.

    Of course the real difference between having a data warehouse and not having one lies in not having to build the infrastructure required for accessing the data. With a data warehouse, you build the infrastructure only once. With no data warehouse, you have to build at least part of the infrastructure every time you want new data.
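    The arithmetic behind this comparison can be sketched in a few lines of Python. The $250,000 per request and the $100 to $1,000 range come from the text. The one-time cost of building the warehouse is not stated anywhere in the text, so WAREHOUSE_BUILD_COST below is a purely hypothetical figure chosen for illustration.

```python
# Cost figures quoted in the text.
COST_PER_REQUEST_NO_WAREHOUSE = 250_000   # dollars per unit of information
COST_PER_REQUEST_WAREHOUSE_MAX = 1_000    # upper end of the $100-$1,000 range

# ASSUMPTION: the text gives no build cost; this number is hypothetical.
WAREHOUSE_BUILD_COST = 1_000_000


def cost_without_warehouse(requests: int) -> int:
    """Every request repeats the full analysis-and-integration effort."""
    return requests * COST_PER_REQUEST_NO_WAREHOUSE


def cost_with_warehouse(requests: int) -> int:
    """The infrastructure is paid for once; each request is then cheap."""
    return WAREHOUSE_BUILD_COST + requests * COST_PER_REQUEST_WAREHOUSE_MAX


for n in (1, 7):
    print(n, cost_without_warehouse(n), cost_with_warehouse(n))
```

    Under that assumed build cost, a single request is cheaper without the warehouse, but seven requests cost 1,750,000 without it against 1,007,000 with it. A different build cost moves the break-even point but not the shape of the comparison.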

    In reality, however, no company goes looking for just one piece of data. In fact, it's quite the opposite - most companies require many forms of data. And the need for new forms and structures of data is recreated every day. When it comes to looking at the larger picture - not the cost of data for a single item, but the cost of data for all data - the data warehouse greatly eases the burden placed on the information systems organization. Fig 5 shows the difference between having a data warehouse and not having a data warehouse in the case of finding multiple types of data.


    Figure 5: making seven reports from a data warehouse

    Looking at Fig 5, it's obvious that a data warehouse really does lower the cost of getting information and greatly accelerates the rate at which data can be found.

    But organizations have a habit of not looking at the big picture, preferring instead to focus on immediate needs. They look only up to next Tuesday and not an hour beyond it. What do short-sighted organizations see? The comparison between the data warehouse infrastructure and the need for a single unit of information. Fig 6 shows this comparison.


    Figure 6: when all you are looking at is a single report, it appears less expensive to get it from applications directly and not build a data warehouse

    When looking at the diagram in Fig 6, the short-term approach of not building a data warehouse is attractive. The organization thinks only of the quick fix. And in the very short term, it is less expensive just to dive in and get data from applications without building a data warehouse. There are a hundred excuses the corporation has for not looking to the long term:

    The data warehouse is so big

    We heard that data warehouses don't really work

    All we need is some quick and dirty information

    I don't have time to build a data warehouse

    If I build a data warehouse and pay for it, one of my neighbors will use the data later on and they don't have to pay for it, and so forth.

    As long as a corporation insists on having nothing but a short-term focus, it will never build a data warehouse. But the minute the corporation takes a long-term look, the future becomes an entirely different picture. Fig 7 shows the long-term focus.


    Figure 7: when you look at the larger picture you see that building a data warehouse saves huge amounts of resources

    Fig 7 shows that when the long-term needs for information are considered, the data warehouse is far and away less expensive than the series of short-term efforts. And the length of time for access to information is an intangible whose worth is difficult to measure. No one disputes that information available right now is much more useful than information six months from now. In fact, six months from now I will have forgotten why I wanted the information in the first place. You simply cannot beat a data warehouse for speed and ease of access to information.

    The Web environment, then, is a most promising environment. But in order to unlock the potential of the Web, information must be freely and cheaply available. The supporting infrastructure of the data warehouse provides that foundation and is at the heart of the effectiveness of the Web environment.




    CHAPTER 3

    The Value of the Data Warehouse

    The Foundations of E-Business

    The basis for a long-term, sound e-business competitive advantage is the data warehouse.

    Why the Internet?

    Consider the Internet. When you get down to it, what is the Internet good for? It is good for connectivity, and with connectivity comes opportunity - the opportunity to sell somebody something, to help someone, to get a message across. But at the same time, connectivity is ALL the Internet provides. In order to take advantage of that connectivity, the real competitive advantage is found in the content and presentation of the messages that are passed along the lines of connectivity.

    Consider the telephone. Before the advent of the telephone, getting a message to someone was accomplished by mail or shouting. Then when the telephone appeared, it was possible to have cheap and instant access to someone. But merely making a quick call becomes a trite act. The important thing about making a telephone call quickly is what you say to the person, not the fact that you did it cheaply and quickly. The message delivered over the phone becomes the essence, not the phone itself.

    With the phone you can:


    ask your girlfriend out for Saturday night

    tell the county you aren't available for jury duty

    call in sick for work and go play golf

    find out if it snowed in Aspen last night

    call the doctor, and so forth.

    The real value of the phone is the communication of the message.

    The same is true of the Internet. Today, people are enamored of the novelty of the ability to communicate instantaneously. But where commercial advantage is concerned, the real value of the Internet lies in the messages that are passed through cyberspace, not in the novelty of the passage itself.

    Intelligent Messages

    To give your messages sent via the Internet some punch, you need intelligence behind them. And the basis of that intelligence is the information that is buried in a data warehouse.

    Why is the data warehouse the basis of business intelligence? Simple. With a data warehouse, you have two facets of information that have otherwise not been available: integration and history. In years past, application systems have been built in which each application considered only its own set of requirements. One application thought of a customer as one thing, another application thought of a customer as something else. There was no integration - no cohesive understanding of information - from one application to the next.


    looking at data internally doesn't really have anything to do with e-business or the Internet. And the data warehouse has tremendous advantages there.

    How do the Internet and the data warehouse work together to produce a business advantage? The Internet provides connectivity and the data warehouse produces continuity.

    The Value of Historical Data

    Consider the value of historical data when it comes to understanding a customer. When you have historical data about customers, you have the key to understanding their future behavior. Why? Because people are creatures of habit with predictable life patterns. The habits that we form early in our life stick with us throughout our life. The clothes we wear, the place we live, the food we eat, the cars we drive, how we pay our bills, how we invest, where we go on vacation - all of these features are set early in our adulthood. Understanding a customer's past history then becomes a tremendous predictor of the future.

    Customers are subject to patterns. In our youth, most of us don't have much money to invest. But as we get older, we have more disposable income. At mid-life, our children start looking for colleges. At late mid-life, we start thinking about retirement. In short, there are predictable patterns of behavior that practically everyone experiences. Knowing the history of your customer allows you to predict what the next pattern of behavior will be.

    What happens when you can predict your customer's behavior? Basically, you're in a position to package products and tailor them to your customers. Having historical data that resides in a data warehouse lets you do exactly that. Through the Internet, you reach the customer. Then, the data warehouse tells you what to say to the customer to get his or her attention. The information in the data warehouse allows you to craft a message that your customer wants to hear.

    Integrated Data

    Integrated data has a related but different effect. Suppose you are a salesperson wanting to sell something (it really doesn't matter what). Your boss gives you a list and says go to it. Here's your list:

    acct 123
    acct 234
    acct 345
    acct 456
    acct 567
    acct 678

    You start by making a few contacts, but you find that you're not having much success. Most everyone on your list isn't interested in what you're selling.

    Now somebody suggests that you get a little integrated data. You don't know exactly what that is, but anything is better than beating your head against a wall. So now you have a list of very basic integrated data:

    acct 123 - John Smith - male
    acct 234 - Mary Jones - female
    acct 345 - Tom Watson - male
    acct 456 - Chris Ng - female
    acct 567 - Pat Wilson - male
    acct 678 - Sam Freed - female

    This simple integrated data makes your life as a salesperson a little simpler. You know not to sell bras to a male or cigars to a female (or at least not to most females). Your sales productivity


    acct 567
    - Pat Wilson - male - 68 years old - married
    - profession - retired - income - 25,000
    - two sons

    acct 678
    - Sam Freed - female - 45 years old - married
    - profession - pilot - income - 150,000
    - son and daughter

    With the new infusion of integrated information, the salesperson can start to be very scientific about who to target. Trying to sell a new Ferrari to Pat Wilson is not likely to produce any good results at all. Pat simply does not have the income to warrant such a purchase. But trying to sell the Ferrari to Sam Freed or Tom Watson may produce some results because they can afford it.

    Adding even more integrated information produces the following results:

    acct 123
    - John Smith - male - 25 years old - single
    - profession - accountant - income - 35,000 - no family - owns home
    - net worth - 15,000 - drives - Ford
    - school - CU - degree - BS
    - hobbies - golf

    acct 234
    - Mary Jones - female - 58 years old - widow
    - profession - teacher - income - 40,000 - daughter and two sons - rents
    - net worth - 250,000 - drives - Chevrolet
    - school - NMSU - degree - BS
    - hobbies - mountain climbing

    acct 345
    - Tom Watson - male - 52 years old - married
    - profession - doctor - income - 250,000 - son and daughter - owns home
    - net worth - 3,000,000 - drives - Mercedes
    - school - Yale - degree - MBA
    - hobbies - stamp collecting

    acct 456
    - Chris Ng - female - 18 years old - single
    - profession - hair dresser - income - 18,000 - no family - rents
    - net worth - 0 - drives - Honda
    - school - none - degree - none
    - hobbies - hiking, tennis

    acct 567
    - Pat Wilson - male - 68 years old - married
    - profession - retired - income - 25,000 - two sons - rents
    - net worth - 25,000 - drives - nothing
    - school - U Texas - degree - PhD
    - hobbies - watching football

    acct 678
    - Sam Freed - female - 45 years old - married
    - profession - pilot - income - 150,000 - son and daughter - owns home
    - net worth - 750,000 - drives - Toyota
    - school - UCLA - degree - BS
    - hobbies - thimble collecting

    Now the salesperson is armed with even more information. Qualifying who will be a prospect to buy is now a reasonable task. More to the point, knowing who you are talking to on the Internet is no longer a hit-or-miss proposition. You can start to be very accurate about what you say and what you offer. Your message across the Internet becomes a lot more cogent.
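    The qualification step described above can be sketched in Python. The account numbers, names, and incomes are copied from the listing; the Customer class, the MIN_INCOME cutoff, and the function name are hypothetical and exist only to illustrate the idea of filtering prospects on integrated data.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    acct: int
    name: str
    income: int


# Only the fields needed for this check, copied from the listing above.
CUSTOMERS = [
    Customer(123, "John Smith", 35_000),
    Customer(234, "Mary Jones", 40_000),
    Customer(345, "Tom Watson", 250_000),
    Customer(456, "Chris Ng", 18_000),
    Customer(567, "Pat Wilson", 25_000),
    Customer(678, "Sam Freed", 150_000),
]

# ASSUMPTION: a hypothetical income cutoff for a luxury-car prospect.
MIN_INCOME = 100_000


def luxury_car_prospects(customers):
    """Return the names of customers whose income meets the cutoff."""
    return [c.name for c in customers if c.income >= MIN_INCOME]


print(luxury_car_prospects(CUSTOMERS))  # ['Tom Watson', 'Sam Freed']
```

    With only the bare account numbers from the first list, no such filter is possible; the integrated attributes are what make the selection workable.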

    Looking Smarter

    Stated differently, with integrated data you can be a great deal more accurate and efficient in your sales efforts. Integrated data saves huge amounts of time that would otherwise be wasted. With integrated customer data, your Internet messages start to make you look smart.

    But making sales isn't the only use for integrated information. Marketing can also make great use of this information. It probably doesn't make sense, for example, to market tennis equipment to Sam Freed. Chris Ng is a much better bet for that. And it probably doesn't make sense to market football jerseys to Tom Watson. Instead, marketing those things to Pat Wilson makes the most sense. Integrated information is worth its weight in gold when it comes to not wasting marketing dollars and opportunities.

    The essence of the data warehouse is historical data and integrated data. When the euphoria and the novelty of being able to communicate with someone via the Internet wears off, the fact remains that the message being communicated is much more important than the means. To create meaningful messages, the content of the data warehouse is ideal for commercial purposes.


    CHAPTER 4

    The Role of the eDBA

    Logic, e-Business, and the Procedural eDBA

    Until recently, the domain of a database management system was, appropriately enough, to store, manage, and access data. Although these core capabilities are still required of a modern DBMS, additional procedural functionality is becoming not just a nice-to-have feature, but a necessity.

    A modern DBMS lets you define business rules in the DBMS itself instead of in a separate application program. Specifically, all of the most popular RDBMS products support an array of complex features and components to facilitate procedural logic. Procedural DBMS facilities are being driven by organizations as they move to become e-businesses.

    As the DBMS adapts to support more procedural capabilities, organizations must modify and expand the way they handle database management and administration. Typically, as new features are added, the administration, design, and management of these features are assigned to the database administrator (DBA) by default. Simply dumping these new administrative burdens on the already overworked DBA staff may not be the best approach. But "DBA-like duties" are required to effectively manage these procedural elements.

    The Classic Role of the DBA

    Every database programmer has a favorite "curmudgeon DBA" story. You know, those famous anecdotes that begin with "I have a problem..." and end with "...and then he told me to stop bothering him and read the manual." DBAs simply do not have a "warm and fuzzy" image. This probably has more to do with the nature and scope of the job than anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization.

    To make matters worse, the role of the DBA has expanded over the years. In the pre-relational days, both database design and data access were complex. Programmers were required to code program logic to navigate through the database and access data. Typically, the pre-relational DBA was assigned the task of creating the hierarchic or network database design. This process usually consisted of both logical and physical database design, although it was not always recognized as such at the time. After the database was designed and created, and the DBA created backup and recovery jobs, little more than space management and reorganizations were required. I do not want to belittle these tasks. Pre-relational DBMS products (such as IMS and IDMS) require a complex series of utility programs to be run in order to perform backup, recovery, and reorganization. This can consume a large amount of time, energy, and effort.

    As RDBMS products gained popularity, the role of the DBA expanded. Of course, DBAs still designed databases, but increasingly these were generated from logical data models created by data administrators and data modelers. Now the DBA has become involved in true logical design and must be able to translate a logical design into a physical database implementation. Relational database design still requires physical implementation decisions such as indexing, denormalization, and partitioning schemes. But, instead of merely concerning themselves with physical implementation


    Stored procedures can be thought of as programs that are maintained, administered, and executed through the RDBMS. The primary reason for using stored procedures is to move application code off the client workstation and onto the database server to reduce overhead. A client can invoke the stored procedure and then the procedure invokes multiple SQL statements. This is preferable to the client executing multiple SQL statements directly because it minimizes network traffic, thereby enhancing performance. A stored procedure can access and/or modify data in one or more tables. Basically, stored procedures work like "programs" that "live" in the RDBMS.
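    The network-traffic argument can be illustrated with a toy model in Python. This is not a real database client: ToyServer, its methods, and the procedure name are all hypothetical, and the model only counts client-to-server round trips under the assumption that each client request costs exactly one.

```python
class ToyServer:
    """Stand-in for a database server that counts network round trips."""

    def __init__(self):
        self.round_trips = 0
        self.procedures = {}

    def execute(self, statement: str) -> None:
        """One SQL statement sent by the client costs one round trip."""
        self.round_trips += 1

    def register(self, name: str, statements: list) -> None:
        """One-time setup: store named statements on the server (not counted)."""
        self.procedures[name] = list(statements)

    def call(self, name: str) -> int:
        """One call costs one round trip; the statements run server-side."""
        self.round_trips += 1
        return len(self.procedures[name])  # statements run with no extra traffic


# Placeholder labels standing in for three real SQL statements.
STATEMENTS = ["statement 1", "statement 2", "statement 3"]

direct = ToyServer()
for s in STATEMENTS:
    direct.execute(s)          # client sends each statement itself

via_proc = ToyServer()
via_proc.register("process_order", STATEMENTS)
via_proc.call("process_order")  # client sends a single call

print(direct.round_trips, via_proc.round_trips)  # 3 1
```

    The saving grows with the number of statements bundled into the procedure, which is why the benefit is usually described in terms of network traffic rather than raw execution speed.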

    Triggers are event-driven specialized procedures that are stored in, and executed by, the RDBMS. Each trigger is attached to a single, specified table. Triggers can be thought of as an advanced form of "rule" or "constraint" written using procedural logic. A trigger cannot be directly called or executed; it is automatically executed (or "fired") by the RDBMS as the result of an action, usually a data modification to the associated table. Once a trigger is created it is always executed when its "firing" event occurs (update, insert, delete, time, etc.).
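    A minimal sketch of this behavior, using Python's standard sqlite3 module with an in-memory SQLite database as a stand-in for a full RDBMS. The table names, column names, and trigger name are hypothetical. The point to notice is that the application never calls the trigger; it only issues the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE account_audit (account_id INTEGER,
                                old_balance INTEGER,
                                new_balance INTEGER);

    -- The trigger is attached to one table and one firing event.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO account_audit
        VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100);
""")

# The application only modifies the table; the database fires the trigger.
conn.execute("UPDATE account SET balance = 250 WHERE id = 1")

audit_rows = conn.execute("SELECT * FROM account_audit").fetchall()
print(audit_rows)  # [(1, 100, 250)]
```

    SQLite supports insert, update, and delete triggers only, so the time-based firing event mentioned above is not shown here.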

    A user-defined function, or UDF, is procedural code that works within the context of SQL statements. Each UDF provides a result based on a set of input values. UDFs are programs that can be executed in place of standard, built-in SQL scalar or column functions. A scalar function transforms data for each row of a result set; a column function evaluates each value for a particular column in each row of the result set and returns a single value. Once written, and defined to the RDBMS, a UDF can be used in SQL statements just like any built-in function.
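    The scalar and column varieties can both be sketched with Python's standard sqlite3 module, which lets a program register its own functions with an SQLite connection. The function names initials and value_range, the Range class, and the sample table are hypothetical; other RDBMS products define UDFs with their own syntax.

```python
import sqlite3


def initials(name):
    """Scalar UDF: transforms one input value for each row."""
    return "".join(part[0].upper() for part in name.split())


class Range:
    """Column (aggregate) UDF: reduces a whole column to a single value."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row with that row's column value.
        self.values.append(value)

    def finalize(self):
        # Called once at the end to produce the single result.
        return max(self.values) - min(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_function("initials", 1, initials)
conn.create_aggregate("value_range", 1, Range)

conn.execute("CREATE TABLE customer (name TEXT, income INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("John Smith", 35000), ("Mary Jones", 40000), ("Tom Watson", 250000)],
)

# Once registered, both are used exactly like built-in SQL functions.
scalar_result = conn.execute(
    "SELECT initials(name) FROM customer ORDER BY name"
).fetchall()
column_result = conn.execute(
    "SELECT value_range(income) FROM customer"
).fetchone()
print(scalar_result, column_result)
```

    The scalar query returns one value per row, while the column query returns a single row for the whole table.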


    Stored procedures, triggers, and UDFs are just like other database objects such as tables, views, and indexes, in that they are controlled by the DBMS. These objects are often collectively referred to as database code objects, or DBCOs, because they are actually program code that is stored and maintained by a database server as a database object. Depending on the particular RDBMS implementation, these objects may or may not "physically" reside in the RDBMS. They are, however, always registered to, and maintained in conjunction with, the RDBMS.

    Database Code Objects and e-Business

    The drive to develop Internet-enabled applications has led to increased usage of database code objects. DBCOs can reduce development time and everyone knows that Web-based projects are tasked out in Web time - there is a lot to do but little time in which to do it. DBCOs help because they promote code reusability. Instead of replicating code on multiple servers or within multiple application programs, DBCOs enable code to reside in a single place: the database server. DBCOs can be automatically executed based on context and activity or can be called from multiple client programs as required. This is preferable to cannibalizing sections of program code for each new application that must be developed. DBCOs enable logic to be invoked from multiple processes instead of being re-coded into each new process every time the code is required.

    An additional benefit of DBCOs is increased consistency. If every user and every database activity (with the same requirements) is assured of using the DBCO instead of multiple, replicated code segments, then you can ensure that everyone is running the same, consistent code. If each individual user used his or her own individual and separate code, no assurance could be given that the same business logic was being used by everyone. Actually, it is almost a certainty that inconsistencies will occur. Additionally, DBCOs are useful for reducing the overall code maintenance effort. Because DBCOs exist in a single place, changes can be made quickly without requiring propagation of the change to multiple workstations.

    Another common reason to employ DBCOs is to enhance performance. A stored procedure, for example, may result in enhanced performance because it may be stored in parsed (or compiled) format, thereby eliminating parser overhead. Additionally, stored procedures reduce network traffic because multiple SQL statements can be invoked with a single execution of a procedure instead of sending multiple requests across the communication lines.

    UDFs in particular are used quite often in conjunction with multimedia data. And many e-business applications require multimedia instead of static text pages. UDFs can be coded to manipulate multimedia objects that are stored in the database. For example, UDFs are available that can play audio files, search for patterns within image files, or manipulate video files.

    Finally, DBCOs can be coded to support database integrity constraints, implement security requirements, and support remote data access. DBCOs are useful for creating specialized management functionality for the multimedia data types required of leading-edge e-business applications. Indeed, there are many benefits provided by DBCOs.


    Database Code Object Programming Languages

    Because they are application logic, most server code objects must be created using some form of programming language. Check constraints and assertions do not require procedural logic as they can typically be coded with a single predicate.
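    As a small illustration of the single-predicate case, the sketch below declares a check constraint in an in-memory SQLite database through Python's standard sqlite3 module. The employee table and the rule that a salary may not be negative are hypothetical examples, not taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The business rule is one predicate; no procedural code is involved.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " salary INTEGER CHECK (salary >= 0))"
)

conn.execute("INSERT INTO employee VALUES (1, 50000)")  # satisfies the rule

try:
    conn.execute("INSERT INTO employee VALUES (2, -1)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the database refused the row

print(rejected)  # True
```

    Anything that needs looping, branching, or access to other tables goes beyond what a single predicate can express, which is where the approaches described next come in.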

    Although different RDBMS products provide different approaches for DBCO development, there are three basic tactics employed:

    Use a proprietary dialect of SQL extended to include procedural constructs

    Use a traditional programming language (either a 3GL or a 4GL)

    Use a code generator to create DBCOs

    The most popular approach is to use a procedural SQL dialect. One of the biggest benefits derived from moving to an RDBMS is the ability to operate on sets of data with a single line of code. Using a single SQL statement, multiple rows can be retrieved, modified, or removed. But this very set-at-a-time capability limits the viability of using plain SQL to create server code objects. For that reason, all of the major RDBMS products support procedural dialects of SQL that add looping, branching, and flow of control statements. The Sybase and Microsoft language is known as Transact-SQL, Oracle provides PL/SQL, and DB2 uses a more ANSI-standard language simply called SQL Procedure Language. Procedural SQL has major implications for database design.

    Procedural SQL will look familiar to anyone who has ever written any type of SQL or coded using any type of programming language. Typically, procedural SQL dialects contain constructs to support looping (while), exiting (return), branching (goto), conditional processing (if...then...else), blocking (begin...end), and variable definition and usage. Of course, the procedural SQL dialects (Transact-SQL, PL/SQL, and SQL Procedure Language) are incompatible and cannot interoperate with one another.

    The second approach is one supported by DB2 for OS/390: using a traditional programming language to develop stored procedures. Once coded, the program is registered to DB2 and can be referenced by SQL procedure calls.

    A final approach is to use a tool to generate the logic for the server code object. Code generators can be used for any RDBMS that supports DBCOs, as long as the code generator supports the language required by the RDBMS product being used. Of course, code generators can be created for any programming language.

    Which is the best approach? Of course, the answer is "It depends!" Each approach has its strengths and weaknesses. Traditional programming languages are more difficult to use but provide standards and efficiency. Procedural SQL is easier to use and more likely to be embraced by non-programmers, but is non-standard from product to product and can result in sub-optimal performance.

    It would be nice if the developer had an implementation choice, but the truth of the matter is that the developer must live with the approach implemented by the RDBMS vendor.

    The Duality of the DBA

    Once DBCOs are coded and made available to the RDBMS, applications and developers will begin to rely on them.


    The Role of the Procedural DBA

    The procedural DBA should be responsible for those database management activities that require procedural logic support and/or coding. Of course, this should include primary responsibility for DBCOs. Whether DBCOs are actually programmed by the procedural DBA will differ from shop to shop. This will depend on the size of the shop, the number of DBAs available, and the scope of DBCO implementation. At a minimum, the procedural DBA should participate in and lead the review and administration of DBCOs. Additionally, the procedural DBA should be on call for DBCO failures.

    Other procedural administrative functions that should be allocated to the procedural DBA include application code reviews, access path review and analysis (from EXPLAIN or show plan), SQL debugging, complex SQL analysis, and rewriting queries for optimal execution. Off-loading these tasks to the procedural DBA will enable the traditional, data-oriented DBAs to concentrate on the actual physical design and implementation of databases. This should result in much better designed databases.

    The procedural DBA should still report through the same management unit as the traditional DBA and not through the application programming staff. This enables better skills sharing between the two distinct DBA types. Of course, there will need to be a greater synergy between the procedural DBA and the application programmer/analyst. In fact, the typical job path for the procedural DBA should come from the application programming ranks because this is where the coding skill-base exists.


    Synopsis

    As organizations begin to implement more procedural logic using the capabilities of the RDBMS, database administration will become increasingly complicated. The role of the DBA is rapidly expanding to the point where no single professional can be reasonably expected to be an expert in all facets of the job. It is high time that the job be explicitly divided into manageable components.


    CHAPTER 5

    Building a Solid Information Architecture

    How to Select the Optimal Information Exchange Architecture

    Introduction

    Over 80 percent of Information Technology (IT) projects fail. Startling? Maybe. Surprising? Not at all. In almost every IT project that fails, weakly documented requirements are typically the reason behind the failure. And nowhere is this more obvious than in data migration.

    As Jim Collins points out in his book, Good to Great, technology is at best an accelerator of a company's growth. The fact is, IT would not exist if not to improve a business and its ability to meet its demand efficiently.

    Data is the natural by-product of IT systems, which provide structure around the data as it moves through various levels of operational processing. But is the value of data purely operational? If that were the case, there would be no need for migration. Companies can conduct forecasting exercises based on ordering trends of recent or parallel time periods, project fulfillment limits based on historic capacity measurements, or detect fraudulent activity by analyzing insurance claim trends for anomalies.


    As more companies begin to understand the strategic value of data, the demands for accessing the data in new, innovative ways increase. This growth in information exchange requirements is precisely why a company must carefully deploy a solid information exchange architecture that can grow with the company's ever-changing information sharing needs.

    The Main Variables to Ponder

    The main variables you have to consider are throughput of data across the network and processing power for transformation and cleansing. These are formidable challenges fraught with potential danger, like the bubble that forms on the inside wall of a tire as the tread wears through, soon to give way to a blowout.

    First, get some diagnostics of the current environment:

    Data Volume: Determine how much data needs to move from point to point (or server to server) in the information exchange.

    Available System Resources: Determine how much processing power is available at each point. Take these measurements at both peak and non-peak intervals.

    Transformation Requirements: Estimate the amount of transformation and cleansing to be conducted.

    Frequency: Determine the frequency at which this volume of data will be transmitted.
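The diagnostics above can be combined into a rough feasibility check. The following is a minimal sketch, not a sizing tool: the function names, the decimal unit conversion, and the idea of an "available fraction" of the link are all assumptions made for illustration.

```python
def transfer_hours(volume_gb: float, link_mbps: float, available_fraction: float) -> float:
    """Estimate the hours needed to move volume_gb across a network link.

    link_mbps is the nominal link speed in megabits per second.
    available_fraction is the share of the link not already in use,
    for example 0.25 if the network is 75 percent saturated.
    """
    if not 0 < available_fraction <= 1:
        raise ValueError("available_fraction must be in (0, 1]")
    usable_mbps = link_mbps * available_fraction
    megabits = volume_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / usable_mbps / 3600  # seconds to hours


def fits_window(volume_gb: float, link_mbps: float,
                available_fraction: float, window_hours: float) -> bool:
    """Return True if the estimated transfer finishes inside the time window."""
    return transfer_hours(volume_gb, link_mbps, available_fraction) <= window_hours
```

Running the estimate with both peak and non-peak values for the available fraction shows whether a given exchange frequency is realistic before any transformation cost is considered.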

    Data Volume


    Understanding how much data must be moved from point to point will give you metrics against which you can compare your network bandwidth. If your network is nearly saturated already,


    Optimal Architecture Components

    The optimal information exchange architecture will include as many of the following components as warranted by the project's objectives:

    1. Data profiling

    2. Data cleansing

    3. System/network bandwidth resources

    4. ETL (Extraction, Transformation & Loading)

    5. Data monitoring

    Naturally, there are commercial products available for each of these components, but you can just as easily build utilities to address your specific objectives.
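As an illustration of how small a home-built utility for the first component can be, here is a minimal data profiling sketch. The function name and the statistics it reports are assumptions chosen for the example; a real profiling tool would cover far more.

```python
from collections import Counter


def profile_column(values):
    """Return basic profile statistics for one column of values.

    None is treated as a missing value.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        # The most frequent value, or None when the column is entirely missing
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }
```

A call such as profile_column(["NY", "NJ", None, "NY"]) reports how many rows are missing a value and how many distinct values occur, which is often enough to spot columns that need cleansing.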

    Conclusion

    While there is no single architecture that is ideal for all information exchange projects, the components laid out in this paper are the key criteria that successful information exchange projects address. Perhaps you can apply this five-tier architecture to a new information exchange project, or evaluate existing information exchange architectures in comparison to it, and see if there is room for improvement. It is never too late to improve the foundation of such a critical business tool.

    The more adept we become at sharing information electronically, the more rapidly our businesses can react to the daily changes that inevitably affect the bottom line. Rapid access to high quality information on demand is the name of the game, and the first step is implementing a solid, stable information architecture.


    Data 101

    CHAPTER 6

    Getting Down to Data Basics

    Well, this is the fourth eDBA column I have written for DBAzine and I think it's time to start over at the beginning. Up to this point we have focused on the transition from DBA to eDBA, but some e-businesses are brand new to database management. These organizations are implementing eDBA before implementing DBA. And the sad fact of the matter is that many are not implementing any formalized type of DBA at all.

    Some daring young enterprises embark on Web-enabled database implementation with nothing more than a bevy of application developers. This approach is sure to fail. If you take nothing else away from this article, make sure you understand this: every organization that manages data using a database management system (DBMS) requires a database administration group to ensure the effective use and deployment of the company's databases.

    In short, e-businesses that are brand new to database development need a primer on database design and administration. So, with that in mind, it's time to get back to data basics.

    Data Modeling and Database Design

    Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach


    conforms to the first rule, the data model is said to be in "first normal form," and so on.

    A database design in First Normal Form (1NF) will have no repeating groups, and each instance of an entity can be identified by a primary key. For Second Normal Form (2NF), every non-key data element must depend on the entire primary key for that entity, not on just part of it. Third Normal Form (3NF) removes data elements that depend on other non-key elements rather than on the primary key itself. If the contents of a group of data elements can apply to more than a single entity instance, those data elements belong in a separate entity.

    There are further levels of normalization that I will not discuss in this column to keep the discussion moving along. For an introductory discussion of normalization visit http://wdvl.com/Authoring/DB/Normalization.
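The normal forms above are stated in terms of which data elements depend on which. As a hedged illustration, the sketch below checks whether sample rows are consistent with such a dependency. The function name and the sample column names are hypothetical, and sample data can only suggest a dependency; it cannot prove that the business rule holds.

```python
def dependency_holds(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value.

    rows is a list of dicts; determinant and dependent are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault records the first value seen for this key
        if seen.setdefault(key, value) != value:
            return False
    return True


orders = [
    {"order_id": 1, "customer_id": 7, "customer_city": "Austin"},
    {"order_id": 2, "customer_id": 7, "customer_city": "Austin"},
    {"order_id": 3, "customer_id": 9, "customer_city": "Boston"},
]

# customer_city appears to depend on customer_id, a non-key element,
# which suggests it belongs in a separate customer entity.
city_follows_customer = dependency_holds(orders, ["customer_id"], ["customer_city"])
```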

    Physical Database Design

    But you cannot stop after developing a logical data model in 3NF. The logical model must be adapted to a physical database implementation. Contrary to popular belief, this is not a simple transformation of entities to tables. Many other physical design factors must be planned and implemented. These factors include:

    A relational table is not the same as a file or a data set. The DBA must design and create the physical storage structures to be used by the relational databases to be implemented.

    The order of columns may need to be different than that specified by the data model based on the functionality of the RDBMS being used. Column order and access may have an impact on database logging, locking, and organization. The DBA must understand these issues and transform the logical model appropriately.

    The logical data model needs to be analyzed to determine which relationships need to be physically implemented using referential integrity (RI). Not all relationships should be defined using RI due to processing and performance reasons.

    Indexes must be designed to ensure optimal performance. To create the proper indexes the DBA must examine the database design in conjunction with the proposed SQL to ensure that database queries are supported with the proper indexes.

    Database security and authorization must be defined for the new database objects and their users.

    These are not simple tasks that can be performed by individuals without database design and implementation skills. That is to say, DBAs are required.
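Examining the database design in conjunction with the proposed SQL, as the index item above describes, can be partly automated. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in DBMS; the table, index, and helper names are assumptions for illustration, and every DBMS has its own way of exposing access paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")


def plan_for(sql, params=()):
    """Return the detail text of each step in SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


# A query the application is expected to run against the new table
proposed_sql = "SELECT order_id, total FROM orders WHERE customer_id = ?"
plan = plan_for(proposed_sql, (42,))
uses_index = any("idx_orders_customer" in line for line in plan)
```

If uses_index is False for a frequently executed statement, either the index design or the SQL deserves another look before the database moves to production.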

    The DBA Management Discipline

    Database administration must be approached as a management discipline. The term discipline implies planning and implementation according to that plan. When database administration is treated as a management discipline, the treatment of data within your organization will improve. It is the difference between being reactive and proactive.

    All too frequently the DBA group is overwhelmed by requests and problems. This occurs for many reasons, including understaffing, overcommitment to application development projects, lack of repeatable processes, lack of budget and so on.


    When operating in this manner, the database administrator is being reactive. The reactive DBA functions more like a firefighter, focusing attention on whichever problem is currently the biggest. A proactive DBA can avoid many problems altogether by developing and implementing a strategic blueprint to follow when deploying databases within the organization.

    The 17 Skills Required of a DBA

    Implementing a DBA function in your organization requires careful thought and planning. The previous sections of this article are just a beginning. The successful eDBA will need to acquire and hone expertise in the following areas:

    Data modeling and database design. The DBA must possess the ability to create an efficient physical database design from a logical data model and application specifications. The physical database may not conform to the logical model 100 percent due to physical DBMS features, implementation factors, or performance requirements. If the data resource management discipline has not been created, the DBA also must be responsible for data modeling, normalization, and conceptual and logical design.

    Metadata management and repository usage. The DBA is required to understand the technical data requirements of the organization. But this is not a complete description of the DBA's duties. Metadata, or data about the data, also must be maintained. The DBA, or sometimes the Data Administrator (DA), must collect, store, and manage the organization's metadata and make it available for querying. Without metadata, the data stored in databases lacks true meaning.


    Database schema creation and management. A DBA must be able to translate a data model or logical database design into an actual physical database implementation and to manage that database once it has been implemented.

    Procedural skills. Modern databases manage more than merely data. The DBA must possess procedural skills to help design, debug, implement, and maintain stored procedures, triggers, and user-defined functions that are stored in the DBMS. For more on this topic check out www.craigsmullins.com/db2procd.htm.

    Capacity planning. Because data consumption and usage continue to grow, the DBA must be prepared to support more data, more users, and more connections. The ability to predict growth based on application and data usage patterns and to implement the necessary database changes to accommodate the growth is a core capability of the DBA.

    Performance management and tuning. Dealing with performance problems is usually the biggest post-implementation nightmare faced by DBAs. As such, the DBA must be able to proactively monitor the database environment and to make changes to data structures, SQL, application logic or the DBMS subsystem to optimize performance.

    Ensuring availability. Applications and data are increasingly required to be up and available 24 hours a day, seven days a week. The DBA must be able to ensure data availability using non-disruptive administration tactics.

    SQL code reviews and walk-throughs. Although application programmers usually write SQL, DBAs are usually blamed for poor performance. Therefore, DBAs must possess in-depth SQL knowledge so they can understand and review SQL and host language programs and recommend changes for optimization.


    Backup and recovery. Everyone owns insurance of some type because we want to be prepared in case something bad happens. Implementing robust backup and recovery procedures is the insurance policy of the DBA. The DBA must implement an appropriate database backup and recovery strategy based on data volatility and application availability requirements.

    Ensuring data integrity. DBAs must be able to design databases so that only accurate and appropriate data is entered and maintained. To do so, the DBA can deploy multiple types of database integrity including entity integrity, referential integrity, check constraints, and database triggers. Furthermore, the DBA must ensure the structural integrity of the database.

    General database management. The DBA is the central source of database knowledge in the organization. As such, the DBA must understand the basic tenets of relational database technology and be able to accurately communicate them to others.

    Data security. The DBA is charged with the responsibility to ensure that only authorized users have access to data. This requires the implementation of a rigorous security infrastructure for production and test databases.

    General systems management and networking skills. Because databases, once implemented, are accessed throughout the organization and interact with other technologies, the DBA must be a jack of all trades. Doing so requires the ability to integrate database administration requirements and tasks with general systems management requirements and tasks (like job scheduling, network management, transaction processing, and so on).


    ERP and business knowledge. For e-businesses doing Enterprise Resource Planning (ERP), the DBA must understand the requirements of the application users and be able to administer their databases to avoid interruption of business. This sounds easy, but most ERP applications (SAP, PeopleSoft, etc.) use databases differently than homegrown applications. So DBAs require an understanding of how the ERP packaged applications impact the e-business and how the databases used by those packages differ from traditional relational databases. Some typical differences include application-enforced RI, program locking, and the creation of database objects (tables, indexes, etc.) on the fly. These differences require different DBA techniques to manage the ERP package effectively.

    Extensible data type administration. The functionality of modern DBMSes can be extended using user-defined data types. The DBA must understand how these extended data types are implemented by the DBMS vendor and be able to implement and administer any extended data types implemented in their databases.

    Web-specific technology expertise. For e-businesses, DBAs are required to have knowledge of Internet and Web technologies to enable databases to participate in Web-based applications. Examples of this type of technology include HTTP, FTP, XML, CGI, Java, TCP/IP, Web servers, firewalls and SSL. Other DBMS-specific technologies include IBM's Net.Data for DB2 and Oracle Portal (formerly WebDB).

    Storage management techniques. The data stored in every database resides on disk somewhere (unless it is stored on one of the new Main Memory DBMS products). The DBA must understand the storage hardware and software available for use, and how it interacts with the DBMS being used. Storage technologies include raw devices, RAID, SANs and NAS.
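Of the skills listed above, capacity planning lends itself most readily to a worked calculation. The sketch below assumes a constant compound monthly growth rate, which real workloads rarely follow exactly; the function name and parameters are illustrative only.

```python
import math


def months_until_full(current_gb, capacity_gb, monthly_growth_rate):
    """Estimate how many whole months remain before data reaches capacity.

    Assumes the data grows by monthly_growth_rate (0.05 means 5 percent)
    compounded every month.
    """
    if current_gb <= 0 or monthly_growth_rate <= 0:
        raise ValueError("current_gb and monthly_growth_rate must be positive")
    if current_gb >= capacity_gb:
        return 0
    # Solve current * (1 + rate) ** n >= capacity for n, then round up
    months = math.log(capacity_gb / current_gb) / math.log(1 + monthly_growth_rate)
    return math.ceil(months)
```

Feeding the function growth rates measured from actual usage patterns, rather than guesses, is what turns the arithmetic into a planning aid.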

    Meeting the Demand

    The number of mission-critical Web-based applications that rely on back-end databases is increasing. Established and emerging e-businesses achieve enormous benefits from the Web/database combination, such as rapid application development, cross-platform deployment and robust, scalable access to data. E-business usage of database technology will continue to grow, and so will the demand for the eDBA. Make sure your organization is prepared to manage its Web-enabled databases before moving them to production. Or be prepared to encounter plenty of problems.


    Designing Efficient Databases

    CHAPTER 7

    Design and the eDBA

    Welcome to another installment in the ongoing saga of the eDBA. So far in this series of articles, we have discussed eDBA issues including availability and database recovery, new technologies such as Java and XML, and even sources of online DBA information.

    But for this installment we venture back to the very beginnings of a relational database: the design stage. In this article we will investigate the impact of e-business on the design process and discuss the basics of assuring proper database design.

    Living at Web Speed

    One of the biggest problems that an eDBA will encounter when moving from traditional development to e-business development is coping with the mad rush to "get it done NOW!" Industry pundits have coined the phrase "Internet time" to describe this phenomenon.

    Basically, when a business starts operating on "Internet time" things move faster. One "Web month" is said to be equivalent to about three standard months. The nugget of truth in this load of malarkey is that Web projects move very fast for a number of reasons:


    Because business executives want to conduct more and more business over the Web to save costs and to connect better with their clients.

    Because someone read an article in an airline magazine saying that Web projects should move fast.

    Because everyone else is moving fast so you'd better move fast, too, or risk losing business.

    Well, two of these three reasons are quite valid. I'm sure you may have heard other reasons for rapid application development (RAD). And sometimes RAD is required for certain projects. But RAD is bad for database design. Why?

    Applications are temporary, but the data is permanent. Organizations are forever coding and re-coding their applications. Sometimes the next incarnation of an application is being developed before the last one has even been moved to production.

    But when did you ever throw away data? Oh, sure, you may redesign a database or move from one DBMS to another. But what did you do? Chances are, you saved the data and migrated it from the old database to the new one. Some changes had to be made, maybe some external data was purchased to combine with the existing data, and most likely some parts of the database were not completely populated. But data lives forever.

    To better enable you to glean value from your data, it is wise to take care when designing the database. A well-designed database is easy to navigate, which makes it easier to retrieve meaningful data from it.


    Database Design Steps

    The DBA should create databases by transforming logical data models into physical implementations. It is not wise to dive directly into a physical design without first conducting an in-depth examination of the data needs of the business.

    Data modeling is the process of analyzing the things of interest to your organization and how these things are related to each other. The data modeling process results in the discovery and documentation of the data resources of your business. Data modeling asks the question, "What?" instead of the more common data processing question, "How?"

    Before implementing databases of any sort, a sound model of the data to be stored in the database should be developed. Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach database design from a programming perspective.

    That is, novices do not have experience with databases and data requirements gathering, so they attempt to design databases like the flat files they are accustomed to using. This is a mistake because problems inevitably occur after the databases and applications become operational in a production environment. At a minimum, performance will suffer and data may not be as readily available as required. At worst, data integrity problems may arise, rendering the entire application unusable.

    A proper database design cannot be thrown together quickly by novices. What is required is a practiced and formal approach to gathering data requirements and modeling data. This modeling effort requires a formal approach to discovering and identifying


    model is to perform a simple translation from logical terms to physical objects.

    Of course, this simple transformation will not result in a complete and correct physical database design; it is simply the first step. The transformation consists of the following:

    Identify and create the physical data structures to be used by the database objects (for example, table spaces, segments, partitions, and files)

    Transform logical entities in the data model to physical tables

    Transform logical attributes in the data model to physical columns

    Transform domains in the data model to physical data types and constraints

    Choose a primary key for each table from the list of logical candidate keys

    Examine column ordering to take advantage of the processing characteristics of the DBMS

    Build referential constraints for relationships in the data model

    Reexamine the physical design for performance

    Of course, the above discussion is a very quick introduction to and summary of data modeling and database design. Every DBA should understand these topics and make sure that all projects, even e-business projects operating on "Internet time," follow the tried and true steps to database design.
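Several of the transformation steps listed above can be seen in a small example: entities become tables, attributes become columns, a domain becomes a data type plus a constraint, and a relationship becomes a referential constraint. The sketch uses SQLite through Python's sqlite3 module only because it is self-contained; the customer and customer_order entities and the status domain are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces RI only when this is enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,          -- chosen from the candidate keys
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer (customer_id),   -- relationship as RI
    status      TEXT NOT NULL
                CHECK (status IN ('OPEN', 'SHIPPED', 'CLOSED'))  -- domain as constraint
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, 'OPEN')")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An order for a customer that does not exist violates the referential constraint
orphan_accepted = try_insert("INSERT INTO customer_order VALUES (11, 99, 'OPEN')")
# A status outside the domain violates the check constraint
bad_status_accepted = try_insert("INSERT INTO customer_order VALUES (12, 1, 'LOST')")
```

Storage structures, column ordering, and the performance review are DBMS-specific and are not represented here.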


    Database Design Traps

    Okay, so what if you do not practice data modeling and database design? Or what if you'd like to, but are forced to operate on "Internet time" for certain databases?

    Well, the answer, of course, is "it depends!" The best advice I can give you is to be aware of design failures that can result in a hostile database. A hostile database is difficult to understand, hard to query, and takes an enormous amount of effort to change.

    Of course, it is impossible to list every type of database design flaw that could be introduced to create a hostile database. But let's examine some common database design failures.

    Assigning inappropriate table and column names is a common design error made by novices. The names of database objects that store data should be as descriptive as possible so that the tables and columns are self-documenting, at least to some extent. Application programmers are notorious for creating database naming problems, such as using screen variable names for columns or coded jumbles of letters and numbers for table names.

    When pressed for time, some DBAs resort to designing the database with output in mind. This can lead to flaws such as storing numbers in character columns because leading zeroes need to be displayed on reports. This is usually a bad idea with a relational database. It is better to let the database system perform the edit-checking to ensure that only numbers are stored in the column.


    If the column is created as a character column, then the developer will need to program edit-checks to validate that only numeric data is stored in the column. It is better in terms of integrity and efficiency to store the data based on its domain. Users and programmers can format the data for display instead of forcing the data into display mode for storage in the database.
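The point about storing data by its domain and formatting it only for display can be sketched as follows. SQLite, used here through Python's sqlite3 module, is loosely typed, so the example adds an explicit typeof check to stand in for the type enforcement that most DBMSs apply from the column declaration alone. The branch table, the branch_code column, and the five-digit display width are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE branch (
        branch_code INTEGER NOT NULL
            CHECK (typeof(branch_code) = 'integer')  -- the database does the edit-check
    )
""")
conn.execute("INSERT INTO branch VALUES (?)", (42,))


def rejects(value):
    """Return True if the database refuses to store value as a branch code."""
    try:
        conn.execute("INSERT INTO branch VALUES (?)", (value,))
        return False
    except sqlite3.IntegrityError:
        return True


def display_code(code, width=5):
    """Format a numeric code with leading zeroes for reports."""
    return f"{code:0{width}d}"
```

The stored value stays numeric, so it sorts and compares as a number, while display_code(42) supplies the leading zeroes a report needs.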

    Another common database design problem is overstuffing columns. This actually is a normalization issue. Sometimes a single column is used for convenience to store what should be two or three columns. Such design flaws are introduced when the DBA does not analyze the data for patterns and relationships. An example of overstuffing would be storing a person's name in a single column instead of capturing first name, middle initial, and last name as individual columns.
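A short sketch shows why the overstuffed name column causes trouble. The Person structure and the sample names are hypothetical; the point is only that separate elements can be used directly, while recovering them from one combined string requires guesswork.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first_name: str
    middle_initial: str
    last_name: str


people = [Person("Mary", "A", "Van Dyke"), Person("John", "Q", "Adams")]

# With individual columns, ordering by last name needs no parsing
by_last_name = sorted(people, key=lambda p: p.last_name)


def naive_last_name(full_name):
    """Guess the last name from a single overstuffed column (unreliable)."""
    return full_name.split()[-1]
```

Applied to "Mary A Van Dyke", the naive parser returns "Dyke" rather than "Van Dyke", the kind of error that separate columns avoid.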

    Poorly designed keys can wreck the usability of a database. A

    primary key should be nonvolatile