The Data Warehouse and Technology - Building the Data Warehouse

Post on 03-Apr-2018

TRANSCRIPT

<ul><li><p>7/29/2019 The Data Warehouse and Technology - Building the Data Warehouse</p><p> 1/43</p><p>Building the Data Warehouse, by Inmon</p><p>Chapter 5: The Data Warehouse and Technology</p><p>http://it-slideshares.blogspot.com/</p></li><li><p>5.0 Overview</p><p>The data warehouse requires a simpler set of technological features than its operational predecessors:</p><p>Online updating: not needed.</p><p>Locking and integrity: needs are minimal.</p><p>Teleprocessing interface: only a very basic one is required.</p><p>This chapter outlines some of the technological requirements for the data warehouse.</p></li><li><p>MANAGING LARGE AMOUNTS OF DATA</p><p>1. Manage volumes</p><p>2. Manage multiple media technology</p><p>3. Index and monitor data</p><p>4. 
Interface to retrieve and pass data</p></li><li><p>Managing Multiple Media</p><p>Following is a hierarchy of storage of data in terms of speed of access and cost of storage:</p><table><tr><th>Storage medium</th><th>Speed of access</th><th>Cost of storage</th></tr><tr><td>Main memory</td><td>Very fast</td><td>Very expensive</td></tr><tr><td>Expanded memory</td><td>Very fast</td><td>Expensive</td></tr><tr><td>Cache</td><td>Very fast</td><td>Expensive</td></tr><tr><td>DASD</td><td>Fast</td><td>Moderate</td></tr><tr><td>Magnetic tape</td><td>Not fast</td><td>Not expensive</td></tr><tr><td>Near line</td><td>Not fast*</td><td>Not expensive</td></tr><tr><td>Optical disk</td><td>Not slow</td><td>Not expensive</td></tr><tr><td>Fiche</td><td>Slow</td><td>Cheap</td></tr></table><p>*Not fast to find first record sought; very fast to find all other records in the block.</p></li><li><p>Indexing and Monitoring Data</p><p>Monitoring data warehouse data determines such factors as the following:</p><p>If a reorganization needs to be done</p><p>If an index is poorly structured</p><p>If too much or not enough data is in overflow</p><p>The statistical composition of the access of the data</p><p>Available remaining space</p></li><li><p>Interfaces to Many Technologies</p><p>The interface to different technologies requires several considerations:</p><p>Does the data pass from one DBMS to another easily?</p><p>Does it pass from one operating system to another easily?</p><p>Does it change its basic format in passage (EBCDIC, ASCII, and so forth)?</p><p>Can passage into multidimensional processing be done easily?</p><p>Can selected increments of data, such as those identified by changed data capture (CDC), be passed rather than entire tables?</p><p>Is the context of data lost in translation as data is moved to other environments?</p></li><li><p>PROGRAMMER OR DESIGNER CONTROL OF DATA PLACEMENT</p><p>Place data at block/page 
level</p><p>Manage data in parallel</p><p>Solid metadata control</p><p>Rich language interface</p></li><li><p>Parallel Storage and Management of Data</p><p>Metadata Management</p><p>Data warehouse table structures</p><p>Data warehouse table attribution</p><p>Data warehouse source data (the system of record)</p><p>Mapping from the system of record to the data warehouse</p><p>Data model specification</p><p>Extract logging</p><p>Common routines for access of data</p><p>Definitions and/or descriptions of data</p><p>Relationships of one unit of data to another</p></li><li><p>Language Interface</p><p>Typically, the language interface to the data warehouse should do the following:</p><p>Be able to access data a set at a time</p><p>Be able to access data a record at a time</p><p>Specifically ensure that one or more indexes will be used in the satisfaction of a query</p><p>Have an SQL interface</p><p>Be able to insert, delete, or update data</p></li><li><p>EFFICIENT LOADING OF DATA</p><p>Load efficiently</p><p>Use indexes efficiently</p><p>Store data in a compact way</p><p>Support compound keys</p></li><li><p>Efficient Index Utilization</p><p>Technology can support efficient index access in several ways:</p><p>Using bit maps</p><p>Having multileveled indexes</p><p>Storing all or parts of an index in main memory</p><p>Compacting the index entries when the order of the data being indexed allows such compaction</p><p>Creating selective indexes and range indexes</p></li><li><p>Compaction of Data</p><p>Manage 
large amounts of data.</p><p>The programmer gets the most out of a given I/O when data is stored compactly.</p></li><li><p>Compound Keys</p><p>Compound keys arise from the time variancy of data warehouse data.</p><p>Key-foreign key relationships are quite common in the atomic data.</p></li><li><p>VARIABLE-LENGTH DATA</p><p>Manage variable-length data efficiently</p><p>Lock manager: explicit control at the programmer level</p><p>Able to do index-only processing</p><p>Restore data in bulk efficiently</p></li><li><p>Lock Management</p><p>The lock manager ensures that two or more people are not updating the same record at the same time.</p><p>The ability to turn the lock manager off and on is necessary.</p></li><li><p>Index-Only Processing</p><p>Looking in an index (or indexes) without going to the primary source of data</p></li><li><p>Fast Restore</p><p>The capability to quickly restore a data warehouse table from non-DASD storage</p></li><li><p>Other Technological Features</p><p>Some of those features include the following:</p><p>Transaction integrity</p><p>High-speed buffering</p><p>Row- or page-level locking</p><p>Referential integrity</p><p>VIEWs of data</p><p>Partial block loading</p></li><li><p>DBMS Types and the Data Warehouse</p><p>Data warehouses manage massive amounts of data because they contain:</p><p>Granular, atomic detail</p><p>Historical information</p><p>Summary as well as detailed data</p><p>Because 
record-level, transaction-based updates are a regular feature of the general-purpose DBMS, it must offer facilities for the following:</p><p>Locking</p><p>COMMITs</p><p>Checkpoints</p><p>Log tape processing</p><p>Deadlock</p><p>Backout</p></li><li><p>Changing DBMS Technology</p><p>Such a change may be in order for several reasons:</p><p>New DBMS technologies may be available.</p><p>The size of the warehouse has grown.</p><p>Use of the warehouse has escalated and changed.</p><p>The basic DBMS decision must be revisited from time to time.</p><p>Should the decision be made to go to a new DBMS technology, what are the considerations?</p><p>Will the new DBMS technology meet the foreseeable requirements?</p><p>How will the conversion from the older DBMS technology to the newer DBMS technology be done?</p></li><li><p>Multidimensional DBMS and the Data Warehouse</p></li><li><p>Multidimensional DBMS and the Data Warehouse (cont.)</p><p>The multidimensional DBMS:</p><p>1. holds at least an order of magnitude less data.</p><p>2. is geared for very heavy and unpredictable access and analysis of data.</p><p>3. holds a much shorter time horizon of data.</p><p>4. allows unfettered access.</p><p>5. enjoys a complementary relationship with the data warehouse.</p><p>The data warehouse:</p><p>1. holds massive amounts of data</p><p>2. is geared for a limited amount of flexible access</p><p>3. contains data with a very lengthy time horizon (from 5 to 10 years)</p><p>4. allows analysts to access its data in a constrained fashion</p><p>5. 
being housed in a multidimensional DBMS</p></li><li><p>Multidimensional DBMS and the Data Warehouse (cont.)</p><p>Following are the strengths and weaknesses of the relational foundation for multidimensional DBMS data marts:</p><p>Strengths:</p><p>Can support a lot of data.</p><p>Can support dynamic joining of data.</p><p>Has proven technology.</p><p>Is capable of supporting general-purpose update processing.</p><p>If there is no known pattern of usage of data, then the relational structure is as good as any other.</p><p>Weaknesses:</p><p>Has performance that is less than optimal.</p><p>Cannot be purely optimized for access </p></li><li><p>Multidimensional DBMS and the Data Warehouse (cont.)</p><p>Following are the strengths and weaknesses of the cube foundation for multidimensional DBMS data marts:</p><p>Strengths:</p><p>Performance that is optimal for DSS processing.</p><p>Can be optimized for very fast access of data.</p><p>If the pattern of access of data is known, then the structure of data can be optimized.</p><p>Can easily be sliced and diced.</p><p>Can be examined in many ways.</p><p>Weaknesses:</p><p>Cannot handle nearly as much data as a standard relational format.</p><p>Does not support general-purpose update processing.</p><p>May take a long time to load.</p><p>If access is desired on a path not supported by the </p></li><li><p>Multidimensional DBMS and the Data Warehouse (cont.)</p></li><li><p>Data Warehousing across Multiple Storage Media</p><p>A large amount of data is spread across more than one storage medium.</p><p>One processing environment is the DASD environment, where online, interactive processing is done.</p><p>The other processing environment is often a tape or mass store environment.</p></li><li><p>The Role of Metadata in the Data Warehouse Environment</p></li><li><p>Context and Content</p><p>The context of the reports is explained for the contents</p></li><li><p>Three Types of Contextual Information</p><p>Three levels of contextual information must be managed:</p><p>Simple contextual information</p><p>Complex contextual information</p><p>External contextual information</p><p>Simple contextual information relates to the basic structure of data itself, and includes such things as these:</p><p>The structure of data</p><p>The encoding of data</p><p>The naming conventions used for data</p><p>The metrics describing the data, such as:</p><p>How much data there is</p><p>How fast the data is growing</p><p>What sectors of the data are growing </p></li><li><p>Three Types of Contextual Information (cont.)</p><p>Complex contextual information addresses such aspects of data as these:</p><p>Product 
definitions</p><p>Marketing territories</p><p>Pricing</p><p>Packaging</p><p>Organization structure</p><p>Distribution</p></li><li><p>Three Types of Contextual Information (cont.)</p><p>Some examples of external contextual information include the following:</p><p>Economic forecasts: inflation, financial trends, taxation, economic growth</p><p>Political information</p><p>Competitive information</p><p>Technological advancements</p><p>Consumer demographic movements</p></li><li><p>Capturing and Managing Contextual Information</p><p>Complex and external contextual types of information are hard to capture and quantify because they are so unstructured.</p></li><li><p>Looking at the Past</p><p>Some of the shortcomings of past attempts to manage contextual information are as follows:</p><p>The information management attempts were aimed at the information systems developer, not the end user.</p><p>Attempts at contextual management were passive.</p><p>Attempts at contextual information management were in many cases </p></li><li><p>Refreshing the Data Warehouse</p><p>Reading a log tape is no small matter, however. Many obstacles are in the way, including the following:</p><p>The log tape contains much extraneous data.</p><p>The log tape format is often arcane.</p><p>The log tape contains spanned records.</p><p>The log tape often contains addresses instead of data values. 
</p></li><li><p>Testing</p><p>It is very unusual to find a test environment similar to the operational one in the world of the data warehouse, for the following reasons:</p><p>Data warehouses are so large that a corporation has a hard time justifying one of them, much less two of them.</p><p>The nature of the development life cycle for the data warehouse is iterative.</p><p>For the most part, programs are run in </p></li><li><p>Summary</p><p>Some technological features are required:</p><p>Robust language interface</p><p>Compound keys</p><p>Variable-length data</p><p>The abilities to do the following:</p><p>Manage large amounts of data</p><p>Manage data on diverse media</p><p>Easily index and monitor data</p><p>Interface with a wide number of technologies</p><p>Allow the programmer to place the data directly on the physical device</p><p>Store and access data in parallel</p><p>Have metadata control of the warehouse</p><p>Efficiently load the warehouse</p><p>Efficiently use indexes</p><p>Store data in a compact way</p><p>Support compound keys</p><p>Selectively turn off the lock manager</p><p>Do index-only processing</p><p>Quickly restore from bulk storage</p></li><li><p>Summary (cont.)</p><p>The data architect must recognize the differences between a transaction-based DBMS and a data warehouse-based DBMS.</p></li><li><p>Summary (cont.)</p><p>Multidimensional OLAP technology is suited for data mart processing and not data warehouse processing.</p><p>When the data mart approach is used, many problems become evident:</p><p>The number of extract programs grows large.</p><p>Each new multidimensional 
database must return to the legacy operational environment for its own data.</p><p>There is no basis for reconciliation of differences in analysis.</p><p>A tremendous amount of redundant data among different multidimensional DBMS environments exists.</p></li><li><p>Summary (cont.)</p><p>Metadata in the data warehouse environment plays a very different role than metadata in the operational legacy environment.</p></li></ul>
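
The slides on efficient index utilization and index-only processing mention bit maps and answering a query by looking only in indexes, without going to the primary source of data. The sketch below is one rough way to picture that combination, not the method of any particular DBMS. The `BitmapIndex` class, the `count_rows` helper, and the sample sales rows are all hypothetical names and data invented for illustration.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one integer bitmask per distinct column value."""

    def __init__(self, values):
        self.row_count = len(values)
        self.bitmaps = defaultdict(int)
        for row_id, value in enumerate(values):
            # Set bit `row_id` in the bitmap that belongs to this value.
            self.bitmaps[value] |= 1 << row_id

    def rows_equal(self, value):
        """Return the bitmask of rows whose column equals `value`."""
        return self.bitmaps.get(value, 0)


def count_rows(bitmask):
    """Count the set bits (matching rows) without reading any row."""
    return bin(bitmask).count("1")


# Hypothetical sales rows: (region, product). The rows are used only to
# build the indexes; the query below never touches them again.
rows = [
    ("east", "widget"),
    ("west", "widget"),
    ("east", "gadget"),
    ("east", "widget"),
    ("west", "gadget"),
]
region_index = BitmapIndex([r[0] for r in rows])
product_index = BitmapIndex([r[1] for r in rows])

# "How many east-region widget sales?" answered by AND-ing two bitmaps.
east_widgets = region_index.rows_equal("east") & product_index.rows_equal("widget")
east_widget_count = count_rows(east_widgets)
```

Because the count is derived entirely from the two bitmaps, this is an index-only answer in the sense of the slide: the base rows would be read only if the query needed columns that are not indexed.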