data warehouse process

Upload: andy-sweepstaker

Post on 04-Apr-2018


  • 7/29/2019 Data Warehouse Process


    Data Warehouse Process

    Description

    A Data Warehouse is not an individual repository product. Rather, it is an overall strategy, or process, for building decision support systems and a knowledge-based applications architecture and environment that supports both everyday tactical decision making and long-term business strategizing. The Data Warehouse environment positions a business to utilize an enterprise-wide data store to link information from diverse sources and make the information accessible for a variety of user purposes, most notably, strategic analysis. Business analysts must be able to use the Warehouse for such strategic purposes as trend identification, forecasting, competitive analysis, and targeted market research.

    Data Warehouses and Data Warehouse applications are designed primarily to support executives, senior managers, and business analysts in making complex business decisions. Data Warehouse applications provide the business community with access to accurate, consolidated information from various internal and external sources.

    The primary objective of Data Warehousing is to bring together information from disparate sources and put the information into a format that is conducive to making business decisions. This objective necessitates a set of activities that are far more complex than just collecting data and reporting against it. Data Warehousing requires both business and technical expertise and involves the following activities:

    - Accurately identifying the business information that must be contained in the Warehouse
    - Identifying and prioritizing subject areas to be included in the Data Warehouse
    - Managing the scope of each subject area which will be implemented into the Warehouse on an iterative basis
    - Developing a scalable architecture to serve as the Warehouse's technical and application foundation, and identifying and selecting the hardware/software/middleware components to implement it
    - Extracting, cleansing, aggregating, transforming and validating the data to ensure accuracy and consistency
    - Defining the correct level of summarization to support business decision making
    - Establishing a refresh program that is consistent with business needs, timing and cycles
    - Providing user-friendly, powerful tools at the desktop to access the data in the Warehouse
    - Educating the business community about the realm of possibilities that are available to them through Data Warehousing
    - Establishing a Data Warehouse Help Desk and training users to effectively utilize the desktop tools
    - Establishing processes for maintaining, enhancing, and ensuring the ongoing success and applicability of the Warehouse

    Until the advent of Data Warehouses, enterprise databases were expected to serve multiple purposes, including online transaction processing, batch processing, reporting, and analytical processing. In most cases, the primary focus of computing resources was on satisfying operational needs and requirements. Information reporting and analysis needs were secondary considerations. As the use of PCs, relational databases, 4GL technology and end-user computing grew and changed the complexion of information processing, more and more business users demanded that their needs for information be addressed. Data Warehousing has evolved to meet those needs without disrupting operational processing.

    In the Data Warehouse model, operational databases are not accessed directly to perform information processing. Rather, they act as the source of data for the Data Warehouse, which is the information repository and point of access for information processing. There are sound reasons for separating operational and informational databases, as described below.

    - The users of informational and operational data are different. Users of informational data are generally managers and analysts; users of operational data tend to be clerical, operational and administrative staff.

    - Operational data differs from informational data in context and currency. Informational data contains an historical perspective that is not generally used by operational systems.

    - The technology used for operational processing frequently differs from the technology required to support informational needs.

    - The processing characteristics for the operational environment and the informational environment are fundamentally different.

    The Data Warehouse functions as a Decision Support System (DSS) and an Executive Information System (EIS), meaning that it supports informational and analytical needs by providing integrated and transformed enterprise-wide historical data from which to do management analysis. A variety of sophisticated tools are readily available in the marketplace to provide user-friendly access to the information stored in the Data Warehouse.

    Data Warehouses can be defined as subject-oriented, integrated, time-variant, non-volatile collections of data used to support analytical decision making. The data in the Warehouse comes from the operational environment and external sources. Data Warehouses are physically separated from operational systems, even though the operational systems feed the Warehouse with source data.

    Subject Orientation

    Data Warehouses are designed around the major subject areas of the enterprise; the operational environment is designed around applications and functions. This difference in orientation (data vs. process) is evident in the content of the database. Data Warehouses do not contain information that will not be used for informational or analytical processing; operational databases contain detailed data that is needed to satisfy processing requirements but which has no relevance to management or analysis.

    Integration and Transformation

    The data within the Data Warehouse is integrated. This means that there is consistency among naming conventions, measurements of variables, encoding structures, physical attributes, and other salient data characteristics. An example of this integration is the treatment of codes such as gender codes. Within a single corporation, various applications may represent gender codes in different ways: male vs. female, m vs. f, and 1 vs. 0, etc. In the Data Warehouse, gender is always represented in a consistent way, regardless of the many ways by which it may be encoded and stored in the source data. As the data is moved to the Warehouse, it is transformed into a consistent representation as required.
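
    The gender-code example can be illustrated with a minimal sketch. The mapping table, the target codes "M", "F" and "U", the reading of 1 as male and 0 as female, and the source-system names are all assumptions made for illustration; they are not prescribed by the Process.

```python
# Hypothetical mapping from source-system encodings to one warehouse code.
# Assumption: 1 means male and 0 means female, following the order in the text.
GENDER_CODE_MAP = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}


def normalize_gender(raw_value):
    """Return the single warehouse code, or "U" (unknown) if unmapped."""
    key = str(raw_value).strip().lower()
    return GENDER_CODE_MAP.get(key, "U")


# Sample rows from four hypothetical source applications.
source_rows = [
    {"system": "billing", "gender": "male"},
    {"system": "crm", "gender": "F"},
    {"system": "payroll", "gender": 1},
    {"system": "legacy", "gender": "x"},
]

normalized = [normalize_gender(row["gender"]) for row in source_rows]
print(normalized)
```

    Routing unmapped values to an explicit "unknown" code, rather than passing them through, is one way to keep the warehouse representation consistent while still exposing data quality problems in the source.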

    Time Variance

    All data in the Data Warehouse is accurate as of some moment in time, providing an historical perspective. This differs from the operational environment in which data is intended to be accurate as of the moment of access. The data in the Data Warehouse is, in effect, a series of snapshots. Once the data is loaded into the enterprise data store and data marts, it cannot be updated. It is refreshed on a periodic basis, as determined by the business need. The operational data store, if included in the Warehouse architecture, may be updated.

    Non-Volatility

    Data in the Warehouse is static, not dynamic. The only operations that occur in Data Warehouse applications are the initial loading of data, access of data, and refresh of data. For these reasons, the physical design of a Data Warehouse optimizes the access of data, rather than focusing on the requirements of data update and delete processing.
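
    Time variance and non-volatility can be pictured together as an append-only series of dated snapshots. The sketch below is one possible in-memory model, not the Process's own design; the class name SnapshotStore, its methods, and the sample account data are hypothetical.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: each load adds a dated snapshot that is never changed."""

    def __init__(self):
        self._snapshots = {}  # snapshot date -> tuple of row dicts

    def load(self, as_of, rows):
        """Initial load or periodic refresh: add a new snapshot, keep the old ones."""
        if as_of in self._snapshots:
            raise ValueError(f"snapshot {as_of} already loaded; it cannot be updated")
        self._snapshots[as_of] = tuple(dict(row) for row in rows)

    def as_of(self, when):
        """Return copies of the rows in the latest snapshot on or before `when`."""
        eligible = [d for d in self._snapshots if d <= when]
        if not eligible:
            return ()
        return tuple(dict(row) for row in self._snapshots[max(eligible)])

    def history(self, key_field, key, value_field):
        """Trace one data element across all snapshots, oldest first."""
        trail = []
        for snap_date in sorted(self._snapshots):
            for row in self._snapshots[snap_date]:
                if row[key_field] == key:
                    trail.append((snap_date, row[value_field]))
        return trail


store = SnapshotStore()
store.load(date(2019, 1, 31), [{"account": "A1", "balance": 100}])
store.load(date(2019, 2, 28), [{"account": "A1", "balance": 140}])

print(store.as_of(date(2019, 2, 15)))   # still the January snapshot
print(store.history("account", "A1", "balance"))
```

    The store offers only load and read operations; there is deliberately no update or delete method, which mirrors the access-optimized design described above.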

    Data Warehouse Configurations

    A Data Warehouse configuration, also known as the logical architecture, includes the following components:

    - one Enterprise Data Store (EDS) - a central repository which supplies atomic (detail level) integrated information to the whole organization
    - (optional) one Operational Data Store - a "snapshot" of a moment in time's enterprise-wide data
    - (optional) one or more individual Data Mart(s) - summarized subset of the enterprise's data specific to a functional area or department, geographical region, or time period
    - one or more Metadata Store(s) or Repository(ies) - catalog(s) of reference information about the primary data. Metadata is divided into two categories: information for technical use, and information for business end-users.

    The EDS is the cornerstone of the Data Warehouse. It can be accessed for both immediate informational needs and for analytical processing in support of strategic decision making, and can be used for drill-down support for the Data Marts which contain only summarized data. It is fed by the existing subject area operational systems and may also contain data from external sources. The EDS in turn feeds individual Data Marts that are accessed by end-user query tools at the user's desktop. It is used to consolidate related data from multiple sources into a single source, while the Data Marts are used to physically distribute the consolidated data into logical categories of data, such as business functional departments or geographical regions. The EDS is a collection of daily "snapshots" of enterprise-wide data taken over an extended time period, and thus retains and makes available for tracking purposes the history of changes to a given data element over time. This creates an optimum environment for strategic analysis. However, access to the EDS can be slow, due to the volume of data it contains, which is a good reason for using Data Marts to filter, condense and summarize information for specific business areas. In the absence of the Data Mart layer, users can access the EDS directly.
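
    The relationship between atomic EDS data and a summarized Data Mart, including drill-down, might look like the following sketch. The sales rows, field names, and the choice of region as the mart's grain are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical atomic (detail-level) rows as they might sit in the EDS.
eds_sales = [
    {"day": "2019-07-01", "region": "East", "product": "P1", "amount": 120},
    {"day": "2019-07-01", "region": "East", "product": "P2", "amount": 80},
    {"day": "2019-07-02", "region": "West", "product": "P1", "amount": 50},
    {"day": "2019-07-02", "region": "West", "product": "P3", "amount": 200},
]


def build_region_mart(detail_rows):
    """Condense atomic rows into one summarized total per region."""
    totals = defaultdict(int)
    for row in detail_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def drill_down(detail_rows, region):
    """Return the atomic EDS rows that lie behind one summarized mart figure."""
    return [row for row in detail_rows if row["region"] == region]


region_mart = build_region_mart(eds_sales)
print(region_mart)
print(drill_down(eds_sales, "East"))
```

    The mart holds only the small summarized result, so routine queries avoid scanning the full volume of the EDS; the detail rows remain available in the EDS when a user needs to see what makes up a total.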

    Metadata is "data about data," a catalog of information about the primary data that defines access to the Warehouse. It is the key to providing users and developers with a road map to the information in the Warehouse. Metadata comes in two different forms: end-user and transformational. End-user metadata serves a business purpose; it translates a cryptic name code that represents a data element into a meaningful description of the data element so that end-users can recognize and use the data. For example, metadata would clarify that the data element "ACCT_CD" represents "Account Code for Small Business." Transformational metadata serves a technical purpose for development and maintenance of the Warehouse. It maps the data element from its source system to the Data Warehouse, identifying it by source field name, destination field code, transformation routine, business rules for usage and derivation, format, key, size, index and other relevant transformational and structural information. Each type of metadata is kept in one or more repositories that service the Enterprise Data Store.
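
    The two forms of metadata can be sketched as two small catalogs keyed by data element. Only "ACCT_CD" and its description come from the text; the record layouts, the source system "billing", the source field "ACCOUNT_CODE", the routine name, and the format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndUserMetadata:
    element: str       # cryptic name code used in the Warehouse
    description: str   # meaningful business description


@dataclass(frozen=True)
class TransformationalMetadata:
    element: str          # destination field code
    source_system: str    # hypothetical source application
    source_field: str     # hypothetical source field name
    transformation: str   # hypothetical transformation routine name
    data_format: str      # hypothetical physical format


end_user_catalog = {
    "ACCT_CD": EndUserMetadata("ACCT_CD", "Account Code for Small Business"),
}

technical_catalog = {
    "ACCT_CD": TransformationalMetadata(
        element="ACCT_CD",
        source_system="billing",
        source_field="ACCOUNT_CODE",
        transformation="strip_and_uppercase",
        data_format="CHAR(8)",
    ),
}


def describe(element):
    """End-user view: translate a name code into its business description."""
    entry = end_user_catalog.get(element)
    return entry.description if entry else element


def lineage(element):
    """Technical view: show where an element comes from and how it is derived."""
    meta = technical_catalog[element]
    return (f"{meta.source_system}.{meta.source_field} -> "
            f"{meta.element} via {meta.transformation}")


print(describe("ACCT_CD"))
print(lineage("ACCT_CD"))
```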

    While an Enterprise Data Store and Metadata Store(s) are always included in a sound Data Warehouse design, the specific number of Data Marts (if any) and the need for an Operational Data Store are judgment calls. Potential Data Warehouse configurations should be evaluated and a logical architecture determined according to business requirements.

    The Data Warehouse Process

    The james martin + co Data Warehouse Process does not encompass the analysis and identification of organizational value streams, strategic initiatives, and related business goals, but it is a prescription for achieving such goals through a specific architecture. The Process is conducted in an iterative fashion after the initial business requirements and architectural foundations have been developed, with the emphasis on populating the Data Warehouse with "chunks" of functional subject-area information each iteration. The Process guides the development team through identifying the business requirements, developing the business plan and Warehouse solution to business requirements, and implementing the configuration, technical, and application architecture for the overall Data Warehouse. It then specifies the iterative activities for the cyclical planning, design, construction, and deployment of each population project. The following is a description of each stage in the Data Warehouse Process. (Note: The Data Warehouse Process also includes conventional project management, startup, and wrap-up activities which are detailed in the Plan, Activate, Control and End stages, not described here.)

    Business Case Development

    A variety of kinds of strategic analysis, including Value Stream Assessment, have likely already been done by the customer organization at the point when it is necessary to develop a Business Case. The Business Case Development stage launches the Data Warehouse development in response to previously identified strategic business initiatives and "predator" (key) value streams of the organization. The organization will likely have identified more than one important value stream. In the long term it is possible to implement Data Warehouse solutions that address multiple value streams, but it is the predator value stream or highest priority strategic initiative that usually becomes the focus of the short-term strategy and first run population projects resulting in a Data Warehouse.

    At the conclusion of the relevant business reengineering, strategic visioning, and/or value stream assessment activities conducted by the organization, a Business Case can be built to justify the use of the Data Warehouse architecture and implementation approach to solve key business issues directed at the most important goals. The Business Case defines the outlying activities, costs, benefits, and critical success factors for a multi-generation implementation plan that results in a Data Warehouse framework of an information storage/access system. The Warehouse is an iteratively designed, developed, and refined solution to the tactical and strategic business requirements. The Business Case addresses both the short-term and long-term Warehouse strategies (how multiple data stores will work together to fulfill primary and secondary business goals) and identifies both immediate and extended costs so that the organization is better able to plan its short and long-term budget appropriation.

    Business Question Assessment

    Once a Business Case has been developed, the short-term strategy for implementing the Data Warehouse is mapped out by means of the Business Question Assessment (BQA) stage. The purpose of BQA is to:

    - Establish the scope of the Warehouse and its intended use
    - Define and prioritize the business requirements and the subsequent information (data) needs the Warehouse will address
    - Identify the business directions and objectives that may influence the required data and application architectures
    - Determine which business subject areas provide the most needed information; prioritize and sequence implementation projects accordingly
    - Drive out the logical data model that will direct the physical implementation model
    - Measure the quality, availability, and related costs of needed source data at a high level
    - Define the iterative population projects based on business needs and data validation

    The prioritized predator value stream or most important strategic initiative is analyzed to determine the specific business questions that need to be answered through a Warehouse implementation. Each business question is assessed to determine its overall importance to the organization, and a high-level analysis of the data needed to provide the answers is undertaken. The data is assessed for quality, availability, and cost associated with bringing it into the Data Warehouse. The business questions are then revisited and prioritized based upon their relative importance and the cost and feasibility of acquiring the associated data. The prioritized list of business questions is used to determine the scope of the first and subsequent iterations of the Data Warehouse, in the form of population projects. Iteration scoping is dependent on source data acquisition issues and is guided by determining how many business questions can be answered in a three to six month implementation time frame. A "business question" is a question deemed by the business to provide useful information in determining strategic direction. A business question can be answered through objective analysis of the data that is available.
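
    The prioritization and iteration scoping described above can be sketched as a simple ranking followed by a greedy fill of six-month iterations. The Process does not define a scoring formula; the importance-to-cost ratio, the 1 to 5 scales, the effort estimates, and the sample questions below are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class BusinessQuestion:
    text: str
    importance: int       # 1 (low) to 5 (high), hypothetical scale
    data_cost: int        # 1 (cheap, available) to 5 (costly, poor quality)
    effort_months: float  # rough estimate of the time needed to answer it


def prioritize(questions):
    """Rank by importance relative to data cost (an assumed scoring rule)."""
    return sorted(questions, key=lambda q: q.importance / q.data_cost, reverse=True)


def scope_iterations(questions, months_per_iteration=6.0):
    """Fill successive population projects in priority order."""
    iterations, current, used = [], [], 0.0
    for question in prioritize(questions):
        if current and used + question.effort_months > months_per_iteration:
            iterations.append(current)
            current, used = [], 0.0
        current.append(question)
        used += question.effort_months
    if current:
        iterations.append(current)
    return iterations


questions = [
    BusinessQuestion("Which customer segments are growing?", 5, 2, 3.0),
    BusinessQuestion("Which products are losing margin?", 4, 1, 2.0),
    BusinessQuestion("How do competitor prices affect sales?", 5, 5, 4.0),
    BusinessQuestion("Which regions miss forecast?", 3, 2, 2.0),
]

plan = scope_iterations(questions)
for number, iteration in enumerate(plan, start=1):
    print(number, [q.text for q in iteration])
```

    In practice the scores would come out of the BQA interviews and the high-level source data assessment; the point of the sketch is only that scope falls out of the ranked list once a time box is fixed.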

    Architecture Review and Design

    The Architecture is the logical and physical foundation on which the Data Warehouse will be built. The Architecture Review and Design stage, as the name implies, is both a requirements analysis and a gap analysis activity. It is important to assess what pieces of the architecture already exist in the organization (and in what form) and to assess what pieces are missing which are needed to build the complete Data Warehouse architecture.

    During the Architecture Review and Design stage, the logical Data Warehouse architecture is developed. The logical architecture is a configuration map of the necessary data stores that make up the Warehouse; it includes a central Enterprise Data Store, an optional Operational Data Store, one or more (optional) individual business area Data Marts, and one or more Metadata stores. In the metadata store(s) are two different kinds of metadata that catalog reference information about the primary data.

    Once the logical configuration is defined, the Data, Application, Technical and Support Architectures are designed to physically implement it. Requirements of these four architectures are carefully analyzed so that the Data Warehouse can be optimized to serve the users. Gap analysis is conducted to determine which components of each architecture already exist in the organization and can be reused, and which components must be developed (or purchased) and configured for the Data Warehouse.
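
    At its simplest, the gap analysis is a comparison of required components against existing ones for each of the four architectures. The component names below are hypothetical placeholders, and a real assessment would also record the form and fitness of each existing piece.

```python
# Hypothetical required and existing components per architecture.
required = {
    "data": {"enterprise data store", "metadata repository", "data marts"},
    "application": {"extraction", "cleansing", "load", "query tool"},
    "technical": {"dbms", "server platform", "network"},
    "support": {"backup/recovery", "performance monitoring"},
}

existing = {
    "application": {"extraction", "query tool"},
    "technical": {"dbms", "network"},
    "support": {"backup/recovery"},
}


def gap_analysis(required, existing):
    """Split each architecture's needs into reusable and missing components."""
    report = {}
    for architecture, needed in required.items():
        have = existing.get(architecture, set())
        report[architecture] = {
            "reuse": sorted(needed & have),     # already in the organization
            "acquire": sorted(needed - have),   # to be developed or purchased
        }
    return report


report = gap_analysis(required, existing)
for architecture, result in report.items():
    print(architecture, result)
```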

    The Data Architecture organizes the sources and stores of business information and defines the quality and management standards for data and metadata.

    The Application Architecture is the software framework that guides the overall implementation of business functionality within the Warehouse environment; it controls the movement of data from source to user, including the functions of data extraction, data cleansing, data transformation, data loading, data refresh, and data access (reporting, querying).

    The Technical Architecture provides the underlying computing infrastructure that enables the data and application architectures. It includes platform/server, network, communications and connectivity hardware/software/middleware, DBMS, client/server 2-tier vs. 3-tier approach, and end-user workstation hardware/software. Technical architecture design must address the requirements of scalability, capacity and volume handling (including sizing and partitioning of tables), performance, availability, stability, chargeback, and security.

    The Support Architecture includes the software components (e.g., tools and structures for backup/recovery, disaster recovery, performance monitoring, reliability/stability compliance reporting, data archiving, and version control/configuration management) and organizational functions necessary to effectively manage the technology investment.

    Architecture Review and Design applies to the long-term strategy for development and refinement of the overall Data Warehouse, and is not conducted merely for a single iteration. This stage develops the blueprint of an encompassing data and technical structure, software application configuration, and organizational support structure for the Warehouse. It forms a foundation that drives the iterative Detail Design activities. Where Detail Design tells you what to do, Architecture Review and Design tells you what pieces you need in order to do it.

    The Architecture Review and Design stage can be conducted as a separate project that runs mostly in parallel with the Business Question Assessment stage, because the technical, data, application and support infrastructure that enables and supports the storage and access of information is generally independent of the business requirements that determine which data is needed to drive the Warehouse. However, the data architecture is dependent on receiving input from certain BQA activities (data source system identification and data modeling), so the BQA stage must conclude before the Architecture stage can conclude.

    The Architecture will be developed based on the organization's long-term Data Warehouse strategy, so that future iterations of the Warehouse will have been provided for and will fit within the overall architecture.

    Tool Selection

    The purpose of this stage is to identify the candidate tools for developing and implementing the Data Warehouse data and application architectures, and for performing technical and support architecture functions where appropriate. Select the candidate tools that best meet the business and technical requirements as defined by the Data Warehouse architecture, and recommend the selections to the customer organization. Procure the tools upon approval from the organization.

    It is important to note that the process of selecting tools is often dependent on the existing technical infrastructure of the organization. Many organizations feel strongly for various reasons about using tools for the Data Warehouse applications that they already have in their "arsenal" and are reluctant to purchase new application packages. It is recommended that a thorough evaluation of existing tools and the feasibility of their reuse be done in the context of all tool evaluation activities. In some cases, existing tools can be form-fitted to the Data Warehouse; in other cases, the customer organization may need to be convinced that new tools would better serve their needs.

    It may even be feasible that this series of activities is skipped altogether, if the organization is insistent that particular tools be used (no room for negotiation), or if tools have already been assessed and selected in anticipation of the Data Warehouse project.

    Tools may be categorized according to the following data, technical, application, or support functions:

    - Source Data Extraction and Transformation
    - Data Cleansing
    - Data Load
    - Data Refresh
    - Data Access
    - Security Enforcement
    - Version Control/Configuration Management
    - Backup and Recovery
    - Disaster Recovery
    - Performance Monitoring
    - Database Management
    - Platform
    - Data Modeling
    - Metadata Management

    Iteration Project Planning

    The Data Warehouse is implemented (populated) one subject area at a time, driven by specific business questions to be answered by each implementation cycle. The first and subsequent implementation cycles of the Data Warehouse are determined during the BQA stage. At this point in the Process the first (or next if not first) subject area implementation project is planned. The business requirements discovered in BQA and, to a lesser extent, the technical requirements of the Architecture Design stage are now refined through user interviews and focus sessions to the subject area level. The results are further analyzed to yield the detail needed to design and implement a single population project, whether initial or follow-on. The Data Warehouse project team is expanded to include the members needed to construct and deploy the Warehouse, and a detailed work plan for the design and implementation of the iteration project is developed and presented to the customer organization for approval.

    Detail Design

    In the Detail Design stage, the physical Data Warehouse model (database schema) is developed, the metadata is defined, and the source data inventory is updated and expanded to include all of the necessary information needed for the subject area implementation project, and is validated with users. Finally, the detailed design of all procedures for the implementation project is completed and documented. Procedures to achieve the following activities are designed:

    - Warehouse Capacity Growth
    - Data Extraction/Transformation/Cleansing
    - Data Load
    - Security
    - Data Refresh
    - Data Access
    - Backup and Recovery
    - Disaster Recovery
    - Data Archiving
    - Configuration Management
    - Testing
    - Transition to Production
    - User Training
    - Help Desk
    - Change Management

    Implementation

    Once the Planning and Design stages are complete, the project to implement the current Data Warehouse iteration can proceed quickly. Necessary hardware, software and middleware components are purchased and installed, the development and test environment is established, and the configuration management processes are implemented. Programs are developed to extract, cleanse, transform and load the source data and to periodically refresh the existing data in the Warehouse, and the programs are individually unit tested against a test database with sample source data. Metrics are captured for the load process. The metadata repository is loaded with transformational and business user metadata. Canned production reports are developed and sample ad-hoc queries are run against the test database, and the validity of the output is measured. User access to the data in the Warehouse is established. Once the programs have been developed and unit tested and the components are in place, system functionality and user acceptance testing is conducted for the complete integrated Data Warehouse system. System support processes of database security, system backup and recovery, system disaster recovery, and data archiving are implemented and tested as the system is prepared for deployment. The final step is to conduct the Production Readiness Review prior to transitioning the Data Warehouse system into production. During this review, the system is evaluated for acceptance by the customer organization.
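
    A minimal sketch of an extract/cleanse/transform/load program that also captures load metrics is shown below, run against sample source data and an in-memory list standing in for the test database. The field names, the rejection rules, and the metric names are hypothetical; real programs would be shaped by the Detail Design and the selected tools.

```python
import time


def extract(source_rows):
    """Extract: iterate over in-memory sample source rows."""
    yield from source_rows


def cleanse(row):
    """Cleanse: reject rows with a missing key or a non-numeric amount."""
    customer_id = str(row.get("customer_id") or "").strip()
    if not customer_id:
        return None
    try:
        amount = float(row["amount"])
    except (KeyError, TypeError, ValueError):
        return None
    return {"customer_id": customer_id, "amount": amount}


def transform(row):
    """Transform: upper-case the key and store the amount as whole cents."""
    return {
        "customer_id": row["customer_id"].upper(),
        "amount_cents": round(row["amount"] * 100),
    }


def run_load(source_rows, target):
    """Load cleansed, transformed rows into `target` and return load metrics."""
    metrics = {"read": 0, "rejected": 0, "loaded": 0}
    started = time.perf_counter()
    for raw in extract(source_rows):
        metrics["read"] += 1
        clean = cleanse(raw)
        if clean is None:
            metrics["rejected"] += 1
            continue
        target.append(transform(clean))
        metrics["loaded"] += 1
    metrics["seconds"] = time.perf_counter() - started
    return metrics


sample_source = [
    {"customer_id": " c001 ", "amount": "19.99"},
    {"customer_id": "", "amount": "5"},
    {"customer_id": "c002", "amount": "n/a"},
    {"customer_id": "c003", "amount": 7},
]

test_database = []
load_metrics = run_load(sample_source, test_database)
print(load_metrics["read"], load_metrics["rejected"], load_metrics["loaded"])
print(test_database)
```

    Keeping each step as a separate function makes it straightforward to unit test the cleansing and transformation rules individually before the integrated system test.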

    Transition to Production

    The Transition to Production stage moves the Data Warehouse development project into the production environment. The production database is created, and the extraction/cleanse/transformation routines are run on the operations system source data. The development team works with the Operations staff to perform the initial load of this data to the Warehouse and execute the first refresh cycle. The Operations staff is trained, and the Data Warehouse programs and processes are moved into the production libraries and catalogs. Rollout presentations and tool demonstrations are given to the entire customer community, and end-user training is scheduled and conducted. The Help Desk is established and put into operation. A Service Level Agreement is developed and approved by the customer organization. Finally, the new system is positioned for ongoing maintenance through the establishment of a Change Management Board and the implementation of change control procedures for future development cycles.


    Data Warehousing: Similarities and Differences of Inmon and Kimball

    How do the two architectures differ? How great is the chasm? Is there a common ground? This article attempts to draw out the similarities and differences between the Inmon and Kimball approaches to the data warehouse.

    On the subject of what the data warehouse is and what the data marts are, both Kimball and Inmon have spoken:

    "The data warehouse is nothing more than the union of all the data marts." Ralph Kimball, Dec. 29, 1997.

    "You can catch all the minnows in the ocean and stack them together and they still do not make a whale." Bill Inmon, Jan. 8, 1998.

    The Corporate Information Factory (CIF) and the Kimball Data Warehouse Bus (BUS) are considered the two main types of data warehousing architecture. Accordingly, the two architectures have some elements in common.

    All enterprises require a means to store, analyze and interpret the data they generate and accumulate in order to implement critical decisions that range from continuing to exist to maximizing prosperity. Corporations must develop operating and feedback systems to use the underlying data means (the data warehouse) to achieve their goals.

    Both the CIF and BUS architectures satisfy these criteria.

    Another requirement of any data warehouse architecture is that the user can depend on the accuracy and timeliness of the data. The user must also be able to access the data according to his or her particular needs through an easily understandable and straightforward manner of making queries.

    The data that is extracted in this manner by one user should be compatible with and translatable to other operations and users within the same group or enterprise that rely on the same data.

    Both Inmon and Kimball share the opinion that stand-alone or independent data marts or data warehouses do not satisfy the needs for accurate and timely data and ease of access for users on an enterprise or corporate scale.

    In an article for the Business Intelligence Network, Mr. Inmon writes:

    "Independent data marts may work well when there are only a few data marts. But over time there are never only a few data marts ... Once there are a lot of data marts, the independent data mart approach starts to fall apart. There are many reasons why independent data marts built directly from a legacy/source environment fall apart:

    - There is no single source of data for analytical processing;
    - There is no easy reconcilability of data values;
    - There is no foundation to build on for new data marts. An independent data mart is rarely reusable for other purposes;
    - There are too many interface programs to be built and maintained;
    - There is a massive redundancy of detailed data in each data mart ... because there is no common place where that detailed data is collected and integrated;
    - There is no convenient place for historical data;
    - There is no low level of granularity guaranteed for all data marts to use;
    - Each data mart integrates data from the source systems in a unique way, which does not permit reconcilability or integrity of the data across the enterprise; and
    - The window for extracting data from the legacy environment is stretched with each independent data mart requiring its own window of time for extraction."

    In "Differences of Opinion" (previously cited), Mr. Kimball gives his opinion of independent data marts:

    "Finally stand-alone data marts or warehouses are problematic. These independent silos are built to satisfy specific needs, without regard to other existing or planned analytic data. They tend to be departmental in nature, often loosely dimensionally structured.

    "Although often perceived as the path of least resistance because no coordination is required, the independent approach is unsustainable in the long run. Multiple, uncoordinated extracts from the same operational sources are inefficient and wasteful.

    "They generate similar, but different variations with inconsistent naming conventions and business rules. The conflicting results cause confusion, rework and reconciliation. In the end, decision-making based on independent data is often clouded by fear, uncertainty and doubt."

    It appears from the above that both Inmon and Kimball are of the opinion that independent or stand-alone data marts are of marginal use.

    However, for the most part, this is where the perception of similarity stops. You may discern later, as I have, that there are more similarities, but each of our data warehouse architects expresses them in a very different way.

    Inmon believes that Kimballs star schema-only approach causes inflexibility and therefore leadsto a brittle structure. He writes this basic lack of flexibility is at the heart of the weakness ofthe star schema model as the basis of the data warehouse ... When there is an enterprise needfor data the star schema is not at all optimal.

    "Taken together, a series of star schemas and multi-dimensional tables are brittle ... [They] cannot change gracefully over time." Mr. Inmon believes his approach, which uses the dependent data mart as the source for star schema usage, solves the problem of enterprise-wide access to the same data, which can change over time.

    The relational data warehouse is best served by a relational [3NF] database design running on relational technology. This should be no surprise since the dbms technology the data warehouse runs on works the best with a relational database design.

    The Kimball BUS architecture expresses that raw data is transformed into presentable information in the staging area, ever mindful of throughput and quality. Staging begins with coordinated extracts from the operational source systems.

    Some staging kitchen activities are centralized, such as maintenance and storage of common reference data, while others may be distributed. (Data Warehouse Dining Experience, Intelligent Enterprise, Jan 1, 2004.)

    The above indicates to this author that Kimball has gone beyond the individual star schema approach criticized by Inmon and, in fact, has described his multi-dimensional data warehouse. In this approach, the model contains atomic data and the summarized data, but its construction is based on business measurements, which enable disparate business departments to query the data from a higher level of detail to the lowest level without reprogramming.

    Although this description appears to indicate that the Kimball staging area is very similar to the Inmon data warehouse, the Kimball approach does not recommend a real, physically implemented, data warehouse. His data warehouse is still the collection of data marts with their conformed dimensions.

    In Mastering Data Warehouse Design: Relational and Dimensional Techniques, by Claudia Imhoff, Nicholas Galemmo and Jonathan Geiger (Wiley, 2003), these authors analyze the Kimball approach as relying on star schemas for both atomic and aggregated storage.

    Summarizing this point of their research, the Data Warehouse Bus Architecture is said to consist of two types of data marts:

    - The Atomic Data Marts, which hold multi-dimensional data at the lowest level. These can also include aggregated data for improved query performance.

    - Aggregated Data Marts. These can store data according to a core business process.

    In both the Atomic and Aggregated Data Marts, the data is stored in a star schema design.
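
    As a rough illustration of the two mart types just described, the sketch below builds a tiny atomic star schema in an in-memory SQLite database and derives an aggregated mart from it. The table and column names (dim_date, dim_product, fact_sales, agg_sales_by_month) and the sample rows are hypothetical; they are not taken from Kimball or from Imhoff, Galemmo and Geiger. Sharing the dimension tables between both fact tables is only meant to suggest what conformed dimensions look like in practice.

```python
import sqlite3

def build_marts(conn):
    """Create an atomic star schema and an aggregated mart derived from it."""
    conn.executescript("""
        -- Shared dimensions (hypothetical names and columns).
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);

        -- Atomic mart: one row per sale, the lowest level of detail.
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (1, "2004-01-05", "2004-01"),
        (2, "2004-01-20", "2004-01"),
        (3, "2004-02-03", "2004-02"),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "widget"), (2, "gadget")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        (1, 1, 10.0), (2, 1, 15.0), (2, 2, 40.0), (3, 1, 5.0),
    ])
    # Aggregated mart: same dimensions, summarised to month and product.
    conn.execute("""
        CREATE TABLE agg_sales_by_month AS
        SELECT d.month AS month, f.product_key AS product_key,
               SUM(f.amount) AS amount
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.month, f.product_key
    """)

def monthly_sales(conn, product_name):
    """Query the aggregated mart through the shared product dimension."""
    rows = conn.execute("""
        SELECT a.month, a.amount
        FROM agg_sales_by_month AS a
        JOIN dim_product AS p ON p.product_key = a.product_key
        WHERE p.name = ?
        ORDER BY a.month
    """, (product_name,))
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
build_marts(conn)
print(monthly_sales(conn, "widget"))  # [('2004-01', 25.0), ('2004-02', 5.0)]
```

    Both the atomic and the aggregated table are joined to the same dimension tables, so a query can move between summary and detail without a separate central repository, which is the point the authors attribute to the Bus Architecture.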

    Their description of the Kimball Bus Architecture seems to indicate that the Kimball approach still neither recognizes a need for nor requires a central data warehouse repository.

    The next article will highlight the differences in the two models regarding relational vs. multidimensional data.

    Layers in data warehouse architecture

    George Albert

    The Data Warehouse Architecture (DWA) initially consisted of three layers which met most of an organisation's needs. However, DWAs are now getting more complex and sophisticated to meet the growing need for "intelligence" by decision makers in the organisation.

    As a decision maker for IT in your organisation, how would you know which components (old and new) of the architecture are required? A close look at what each component does is worthwhile here. But first, let us start at the basics.

    A DWA is a way of representing the overall structure of data, communication, processing and presentation that exists for end user computing within the enterprise. The architecture is made up of a number of inter-connected parts, which include the operational data base / external data base layer, information access layer, data access layer, data directory (metadata) layer, process management layer, application messaging layer, data warehouse layer and data staging layer.

    Initially, a data warehouse could be operated with the first three layers. However, as information became more complex, the need for meta data, among many other things, made more layers necessary in the DWA.

    Meta data is data describing other data. It is essentially a tag to describe what is in, say, a column of similar data, such as sales. Meta data can be stored using a COBOL program, but the latest tool is Extensible Markup Language (XML) in an Internet environment.

    With the explosion of data, the need for information and the desire by top management to write into data bases, it is ideal to have all the layers described below in the DWA.

    Operational data base / external data base

    Operational systems process data to support critical operational needs by processing a relatively small number of well-defined business transactions. But these systems, generally historic, have a limited focus and do not allow easy access to data. The data in these databases is also limited. Hence organisations are acquiring information on demographic, econometric, competitive and purchasing trends and blending it with the data they already have. The data acquired is stored in an external database layer.

    Information access

    The information access layer of the data warehouse architecture is the layer that the end-user deals with directly. In particular, it represents the tools that the end-user normally uses such as Excel, Access, browsers, and the like. This layer also includes the hardware and software involved in displaying and printing reports, spreadsheets, graphs and charts for analysis and presentation.

    Data access

    The data access layer of the data warehouse architecture allows the information access layer to talk to the operational layer. This is done by interfacing between information access tools and operational data bases. The language often used for interaction is SQL or ASP. One of the keys to a data warehousing strategy is to provide end-users with "universal data access".

    In theory, universal data access means end-users, regardless of location or information access tool used, should be able to access any or all of the data in the enterprise that is necessary for them to do their job. The access will also apply to suppliers and retailers in a B2B (business-to-business) scenario.

    Metadata

    In order to provide for universal data access, it is absolutely necessary to maintain some form of data directory or repository of meta-data information. This helps the end-user to access the data without having to know the location and form of the data. For instance, if an end-user types "sales", the system will know what sales refers to and where it is located by referring to the metadata layer, and will display it to the user.
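
    The lookup described above can be sketched as a small in-memory data directory. This is only an illustration under stated assumptions: the class names (MetadataEntry, DataDirectory), the location "warehouse.fact_sales" and the column "amount" are hypothetical, and a real metadata layer would be backed by a repository rather than a dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetadataEntry:
    business_name: str  # the term an end-user types, e.g. "sales"
    location: str       # hypothetical schema.table that holds the data
    column: str         # hypothetical column name
    description: str    # what the data means in business terms

class DataDirectory:
    """Maps business terms to where and how the data is stored."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Store under a normalised key so lookups ignore case.
        self._entries[entry.business_name.lower()] = entry

    def resolve(self, term):
        key = term.strip().lower()
        if key not in self._entries:
            raise KeyError(f"no metadata registered for {term!r}")
        return self._entries[key]

directory = DataDirectory()
directory.register(MetadataEntry(
    business_name="sales",
    location="warehouse.fact_sales",
    column="amount",
    description="Invoiced sales amount per transaction",
))

# The end-user types a business term; the directory supplies location and form.
entry = directory.resolve("Sales")
query = f"SELECT {entry.column} FROM {entry.location}"
print(query)  # SELECT amount FROM warehouse.fact_sales
```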

    Process management

    The process management layer is involved in scheduling the various tasks that must be accomplished to build and maintain the data warehouse and data directory information.
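
    One way to picture this scheduling role is as ordering tasks by their dependencies. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the task names and the dependencies between them are hypothetical and only meant to suggest the kind of work such a layer coordinates.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each maps to the set of tasks it depends on.
tasks = {
    "extract_operational": set(),
    "extract_external": set(),
    "stage_and_cleanse": {"extract_operational", "extract_external"},
    "load_warehouse": {"stage_and_cleanse"},
    "refresh_data_directory": {"load_warehouse"},
}

def run_order(dependencies):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())

order = run_order(tasks)
for step, name in enumerate(order, start=1):
    print(step, name)
```

    The two extract tasks may appear in either order, since neither depends on the other; everything else runs only after its prerequisites.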

    Application messaging

    Application messaging is the transport system in the DWA. It involves more than just networking protocols. It can, for instance, be used to collect transactions or messages and deliver them to a certain location at a certain time.
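
    The "collect now, deliver at a certain time" behaviour can be sketched as a small store-and-forward queue. This is a toy model, not a messaging product's API: the MessageStore class, the integer timestamps and the destination names are all assumptions made for illustration.

```python
import heapq
from itertools import count

class MessageStore:
    """Collect messages and release them once their delivery time arrives."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps insertion order for equal times

    def collect(self, deliver_at, destination, payload):
        heapq.heappush(self._heap, (deliver_at, next(self._seq), destination, payload))

    def deliver_due(self, now):
        """Pop every message whose delivery time is at or before `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, destination, payload = heapq.heappop(self._heap)
            due.append((destination, payload))
        return due

store = MessageStore()
store.collect(deliver_at=20, destination="warehouse", payload="order #2")
store.collect(deliver_at=10, destination="warehouse", payload="order #1")
store.collect(deliver_at=30, destination="data_mart", payload="order #3")

first_batch = store.deliver_due(now=20)   # the two messages due by time 20
second_batch = store.deliver_due(now=25)  # nothing new is due yet
print(first_batch, second_batch)
```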

    Data warehouse

    The core data warehouse is where the actual data used primarily for informational purposes resides. In some cases, one can think of the data warehouse simply as a logical or virtual view of data. In this layer, copies of operational and/or external data are actually stored in a form that is easy to access and is highly flexible. Data warehouses can be stored on mainframes but are being hosted on client/server platforms in the Internet world.

    Data staging

    This final layer includes the processes necessary to select, edit, summarise, combine and load data warehouse and information access data from operational and/or external data bases. The complex programming involved in this layer has been reduced with the availability of off-the-shelf tools. The layer may also include programs to identify patterns in the data stored or compiled every day.
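
    The five staging steps named above (select, edit, summarise, combine, load) can be sketched as plain functions over in-memory rows. The field names, the cleansing rules and the sample data are hypothetical; off-the-shelf staging tools perform the same kinds of steps at far larger scale.

```python
from collections import defaultdict

def select(rows, wanted_fields):
    """Keep only the fields the warehouse needs."""
    return [{f: r[f] for f in wanted_fields} for r in rows]

def edit(rows):
    """Cleanse: normalise the region name and drop rows with no amount."""
    cleaned = []
    for r in rows:
        if r["amount"] is None:
            continue
        cleaned.append({"region": r["region"].strip().upper(),
                        "amount": float(r["amount"])})
    return cleaned

def summarise(rows):
    """Total the amount per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def combine(*summaries):
    """Merge summaries from several sources into one."""
    merged = defaultdict(float)
    for summary in summaries:
        for region, amount in summary.items():
            merged[region] += amount
    return dict(merged)

def load(warehouse, summary):
    """Write the combined summary into the (here: in-memory) warehouse."""
    warehouse.update(summary)
    return warehouse

# Hypothetical rows from an operational and an external source.
operational = [
    {"region": " north ", "amount": "10.5", "clerk": "a"},
    {"region": "South", "amount": None, "clerk": "b"},
    {"region": "south", "amount": 4, "clerk": "c"},
]
external = [{"region": "NORTH", "amount": 2.0, "source": "survey"}]

fields = ["region", "amount"]
warehouse = load({}, combine(
    summarise(edit(select(operational, fields))),
    summarise(edit(select(external, fields))),
))
print(warehouse)  # {'NORTH': 12.5, 'SOUTH': 4.0}
```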

    http://www.ibm.com/developerworks/db2/library/techarticle/dm-0505cullen/index.html?ca=drs-