planning & project management for dwh


Upload: vesituniversity-of-mumbai

Post on 16-Apr-2017


3. Planning & Project Management
Orderly Approach for DWH Construction
2/23/2012 3.Planning & Project management/D.S.Jagli

Topics to be covered
How is it different?
Life-cycle approach
The development phases
Dimensional analysis
Dimensional modeling
  Star schema
  Snowflake schema

Reasons for DWH project failure
  Improper planning
  Inadequate project management
Planning for a data warehouse is necessary.
Key issues that need to be planned
  Value and expectations
  Risk assessment
  Top-down or bottom-up
  Build or buy
  Single vendor or best of breed
  Business requirements, not technology
  Top management support
  Justification

Example for DWH project: outline for overall plan
  Introduction
  Mission statement
  Scope
  Goals & objectives
  Key issues & options
  Value & expectations
  Justification
  Executive sponsorship
  Implementation strategy
  Tentative schedule
  Project authorization

3.1 How is it different?
A DWH project differs from an OLTP system project.
Distinguishing features of a DWH and challenges for project management:
  Data acquisition
  Data storage
  Information delivery

3.2 The life-cycle Approach

Fig: DW functional components and SDLC

DWH Project Plan: Sample outline


3.3 DWH Development Phases

Project plan
Requirements definition
Design
Construction
Deployment
Growth and maintenance

Interleaved within the design and construction phases are the three tracks (data acquisition, data storage, and information delivery), along with the definition of the architecture and the establishment of the infrastructure.

3.4 Dimensional Analysis

A data warehouse is an information delivery system.

It is not about technology, but about solving users' problems.

It provides strategic information to the user.

In the requirements definition phase, we need to concentrate on what information the users need, not on how we are going to provide it.

Dimensional Nature of DWH
Usage of Information Unpredictable
In providing information about the requirements for an operational system, the users are able to give you precise details of the required functions, information content, and usage patterns. For a data warehouse, by contrast, users generally cannot state their requirements this precisely, because they cannot predict how the information will be used.

Dimensional Nature of Business Data
Even though the users cannot fully describe what they want in a data warehouse, they can provide you with very important insights into how they think about the business.

Managers think in business dimensions: example

Examples of Business Dimensions


INFORMATION PACKAGES: A NEW CONCEPT
A novel idea is introduced for determining and recording information requirements for a data warehouse.

This concept helps us give a concrete form to the various insights, nebulous thoughts, and opinions expressed during the process of collecting requirements.

The information packages, put together while collecting requirements, are very useful for taking the development of the data warehouse to the next phases.

Requirements Not Fully Determinate
Information packages enable us to:
  Define the common subject areas
  Design key business metrics
  Decide how data must be presented
  Determine how users will aggregate or roll up
  Decide the data quantity for user analysis or query
  Decide how data will be accessed
  Establish data granularity
  Estimate data warehouse size
  Determine the frequency for data refreshing
  Determine how information must be packaged

An information package.

Business Dimensions
Business dimensions form the underlying basis of the new methodology for requirements definition.

Data must be stored to provide for the business dimensions.

The business dimensions and their hierarchical levels form the basis for all further phases.

Dimension Hierarchies/Categories
Examples:

Product: Model name, model year, package styling, product line, product category, exterior color, interior color, first model year

Dealer: Dealer name, city, state, single brand flag, date first operation

Customer demographics: Age, gender, income range, marital status, household size, vehicles owned, home value, own or rent

Payment method: Finance type, term in months, interest rate, agent

Time: Date, month, quarter, year, day of week, day of month, season, holiday flag
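Most of the Time attributes listed above can be derived directly from a calendar date. The following is a minimal Python sketch, not part of the original slides; the helper name `time_dimension_row` is hypothetical, quarters are assumed to be calendar quarters, and season and holiday flag are left out because they depend on a business-specific calendar.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def time_dimension_row(d: date) -> dict:
    """Derive time-dimension attributes from a calendar date (hypothetical helper)."""
    return {
        "date": d.isoformat(),
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,      # assumes calendar quarters
        "year": d.year,
        "day_of_week": DAY_NAMES[d.weekday()],  # weekday(): Monday is 0
        "day_of_month": d.day,
    }

row = time_dimension_row(date(2012, 2, 23))
print(row)
```

A fiscal calendar would only change the `quarter` expression; the rest of the row stays the same.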

Key Business Metrics or Facts
The numbers that users analyze are the measurements, or metrics, that gauge the success of their departments.

These are the facts that indicate to the users how their departments are doing in fulfilling their departmental objectives.

Example: automobile sales
The set of meaningful and useful metrics for analyzing automobile sales is as follows:
  Actual sale price
  MSRP sale price
  Options price
  Full price
  Dealer add-ons
  Dealer credits
  Dealer invoice
  Amount of down payment
  Manufacturer proceeds
  Amount financed
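An information package pairs the business dimensions (with their hierarchy levels) with the facts to be analyzed. One way to record it is sketched below in Python. The class name `InformationPackage` is hypothetical, and the ordering of levels within each dimension is an assumption chosen for illustration; only the dimension attributes and metric names come from the slides.

```python
from dataclasses import dataclass

@dataclass
class InformationPackage:
    """Hypothetical container for one subject's dimensions and facts."""
    subject: str
    dimensions: dict  # dimension name -> hierarchy levels (ordering is assumed)
    facts: list       # names of the key business metrics

automaker_sales = InformationPackage(
    subject="Automaker sales",
    dimensions={
        "Time": ["Year", "Quarter", "Month", "Date"],
        "Product": ["Product category", "Product line", "Model name"],
        "Dealer": ["State", "City", "Dealer name"],
        "Payment method": ["Finance type", "Term in months"],
        "Customer demographics": ["Income range", "Age", "Gender"],
    },
    facts=[
        "Actual sale price", "MSRP sale price", "Options price", "Full price",
        "Dealer add-ons", "Dealer credits", "Dealer invoice",
        "Amount of down payment", "Manufacturer proceeds", "Amount financed",
    ],
)

# Print each dimension as a roll-up path, followed by the number of facts.
for name, levels in automaker_sales.dimensions.items():
    print(f"{name}: {' > '.join(levels)}")
print(f"{len(automaker_sales.facts)} facts")
```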

3.5 DIMENSIONAL MODELING

Star Schema
Snowflake Schema

FROM REQUIREMENTS TO DATA DESIGN
The requirements definition completely drives the data design for the data warehouse.

A group of data elements forms a data structure.

Logical data design includes determining the various data elements and data structures, and establishing the relationships among the data structures.

The information package diagrams form the basis for the logical data design for the data warehouse.

The data design process results in a dimensional data model.


Dimensional Modeling Basics: Formation of the automaker sales fact table.

Formation of the automaker dimension tables.

Concept of Keys for Dimension Table
Surrogate Keys
A surrogate key is the primary key for a dimension table and is independent of any keys provided by source data systems. Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records. Automatically increasing integers make good surrogate keys. The original key for each record is carried in the dimension table but is not used as the primary key. Surrogate keys provide the means to maintain data warehouse information when dimensions change.

Concept of Keys for Dimension Table
Business Keys (Natural Keys)
A business key has a meaning. It can be generated from source system data or taken as is from a source system field.
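The relationship between surrogate and business keys can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' own implementation: the class `DimensionTable`, the business key `DLR-0042`, and the city values are all hypothetical. Adding a second row when an attribute changes is one possible way to keep history; the slides only say that surrogate keys make such maintenance possible.

```python
from itertools import count

class DimensionTable:
    """Toy dimension table keyed by a system-generated surrogate key."""

    def __init__(self):
        self._next_key = count(1)  # automatically increasing integers
        self.rows = {}             # surrogate key -> row

    def add(self, natural_key: str, **attributes) -> int:
        surrogate_key = next(self._next_key)
        # The source-system (business) key is carried as an ordinary attribute.
        self.rows[surrogate_key] = {"natural_key": natural_key, **attributes}
        return surrogate_key

dealers = DimensionTable()
k1 = dealers.add("DLR-0042", city="Mumbai")
# The dealer's city changes: a new row with a new surrogate key keeps the old one.
k2 = dealers.add("DLR-0042", city="Pune")
print(k1, k2)
```

Both rows share the same business key, so it could not serve as the primary key here, while the surrogate keys remain unique.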


The criteria for combining the tables into a dimensional model:
  The model should provide the best data access.
  The whole model must be query-centric.
  It must be optimized for queries and analyses.
  The model must show that the dimension tables interact with the fact table.
  It should also be structured in such a way that every dimension can interact equally with the fact table.
  The model should allow drilling down or rolling up along dimension hierarchies.

The Dimensional Model: a STAR Schema
With these requirements, we find that a dimensional model with the fact table in the middle and the dimension tables arranged around it satisfies the conditions.

Case study: STAR schema for automaker sales.
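A reduced version of such a STAR schema can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only: the table names, column names, and sample rows are assumptions, and only three dimensions and one fact are shown rather than the full automaker model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, model_name TEXT, product_line TEXT);
CREATE TABLE dealer   (dealer_key  INTEGER PRIMARY KEY, dealer_name TEXT, state TEXT);
CREATE TABLE time_dim (time_key    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE auto_sales (            -- fact table in the middle of the star
    product_key INTEGER REFERENCES product(product_key),
    dealer_key  INTEGER REFERENCES dealer(dealer_key),
    time_key    INTEGER REFERENCES time_dim(time_key),
    actual_sale_price REAL
);
INSERT INTO product  VALUES (1, 'Model A', 'Sedan'), (2, 'Model B', 'SUV');
INSERT INTO dealer   VALUES (1, 'Dealer X', 'MH'), (2, 'Dealer Y', 'GJ');
INSERT INTO time_dim VALUES (1, 2011, 4), (2, 2012, 1);
INSERT INTO auto_sales VALUES (1, 1, 1, 20000), (1, 2, 2, 21000), (2, 1, 2, 30000);
""")

# Analyze one fact along two dimensions: product line by year.
rows = conn.execute("""
    SELECT p.product_line, t.year, SUM(f.actual_sale_price)
    FROM auto_sales f
    JOIN product  p ON p.product_key = f.product_key
    JOIN time_dim t ON t.time_key = f.time_key
    GROUP BY p.product_line, t.year
""").fetchall()
print(sorted(rows))
```

Every dimension joins to the fact table in the same way, which is what lets each dimension interact equally with the facts.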

E-R Modeling Versus Dimensional Modeling
OLTP systems:
  OLTP systems capture details of events or transactions
  OLTP systems focus on individual events
  An OLTP system is a window into micro-level transactions
  A picture at detail level is necessary to run the business
  Suitable only for questions at transaction level
  Data consistency, non-redundancy, and efficient data storage are critical

Data warehouse:
  The DW is meant to answer questions on the overall process
  The DW focus is on how managers view the business
  The DW focuses on business trends
  Information is centered around a business process
  Answers show how the business measures the process
  The measures are to be studied in many ways along several business dimensions

In summary: E-R modeling for OLTP systems, dimensional modeling for the data warehouse.

THE STAR SCHEMA

Star Schemas
A data modeling technique to map multidimensional decision support data into a relational database.

Current relational modeling techniques do not serve the needs of advanced data requirements.

4 Components
  Facts
  Dimensions
  Attributes
  Attribute hierarchies

Facts
Numeric measurements (values) that represent a specific business aspect or activity.

Stored in a fact table at the center of the star schema.

The fact table contains facts that are linked through their dimensions.

Updated periodically with data from operational databases.

Dimensions
Qualifying characteristics that provide additional perspectives to a given fact.

DSS data is almost always viewed in relation to other data.

Dimensions are normally stored in dimension tables.

Attributes
Dimension tables contain attributes.

Attributes are used to search, filter, or classify facts.

Dimensions provide descriptive characteristics about the facts through their attributes.

We must define common business attributes that will be used to narrow a search, group information, or describe dimensions (e.g., time, location, product).

There is no mathematical limit to the number of dimensions, although three dimensions are easy to visualize and model.

Attribute Hierarchies
  Provide a top-down data organization
  Aggregation
  Drill-down / roll-up data analysis
  Attributes from different dimensions can be grouped to form a hierarchy
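Roll-up and drill-down amount to aggregating the same facts at different levels of an attribute hierarchy. A small Python sketch under assumed data: the year > quarter > month hierarchy, the function name `roll_up`, and the sample amounts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, quarter, month, sales amount)
sales = [
    (2012, 1, 1, 100.0),
    (2012, 1, 2, 150.0),
    (2012, 2, 4, 200.0),
    (2011, 4, 12, 50.0),
]

def roll_up(rows, level):
    """Aggregate sales at the 'year', 'quarter', or 'month' level of the hierarchy."""
    depth = {"year": 1, "quarter": 2, "month": 3}[level]
    totals = defaultdict(float)
    for row in rows:
        totals[row[:depth]] += row[3]  # group by the leading hierarchy levels
    return dict(totals)

by_quarter = roll_up(sales, "quarter")  # more detailed (drill-down) view
by_year = roll_up(sales, "year")        # more summarized (roll-up) view
print(by_quarter)
print(by_year)
```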

Concept of Keys for Star Schema
Surrogate Keys
Surrogate keys are simply system-generated sequence numbers and are independent of any keys provided by source data systems. They do not have any built-in meaning. Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records; automatically increasing integers make good surrogate keys. The original key for each record is carried in the dimension table but is not used as the primary key.
Business Keys
Natural keys that carry meaning and come from the source system.
Primary Keys
Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension.
Foreign Keys
Each dimension table is in a one-to-many relationship with the central fact table, so the primary key of each dimension table must be a foreign key in the fact table.

Fig: Star schema for sales, showing the fact table and the dimension tables

Star Schema Representation
Facts and dimensions are represented by physical tables in the data warehouse database.

Fact tables are related to each dimension table in a many-to-one relationship (primary/foreign key relationships).

The fact table is related to many dimension tables. The primary key of the fact table is a composite primary key made up of the keys of the dimension tables.

Each fact table is designed to answer a specific DSS question
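The key relationships described above can be checked with Python's `sqlite3` module. This is a sketch with hypothetical table names and data, reduced to two dimensions; a realistic fact table would normally include more dimension keys, such as a time key. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product(product_key),
    store_key   INTEGER NOT NULL REFERENCES store(store_key),
    units_sold  INTEGER,
    PRIMARY KEY (product_key, store_key)  -- composite key built from dimension keys
);
INSERT INTO product VALUES (1, 'Widget');
INSERT INTO store   VALUES (1, 'Downtown');
INSERT INTO sales_fact VALUES (1, 1, 5);
""")

def try_insert(row):
    """Insert one fact row; return 'ok' or the name of the error raised."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return type(exc).__name__

orphan = try_insert((99, 1, 3))    # no product 99 in the dimension table
duplicate = try_insert((1, 1, 7))  # same combination of dimension keys again
fact_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(orphan, duplicate, fact_count)
```

The first rejected row violates the foreign key to the product dimension, and the second violates the composite primary key.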

Star Schema
The fact table is typically the largest table in the star schema.

Each dimension record is related to thousands of fact records.

The star schema facilitates data retrieval functions.

The DBMS first searches the dimension tables before the larger fact table.
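The idea of searching the small dimension tables first and only then touching the large fact table can be illustrated in plain Python. This is a conceptual sketch with hypothetical data, not a description of how any particular DBMS implements its query plan.

```python
# Hypothetical small dimension table: product_key -> attributes
product_dim = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}
# Hypothetical fact rows (far more numerous in practice): (product_key, units_sold)
sales_fact = [(1, 10), (3, 2), (2, 4), (1, 6), (3, 1)]

# Step 1: search the small dimension table for the qualifying keys.
hardware_keys = {key for key, attrs in product_dim.items()
                 if attrs["category"] == "Hardware"}

# Step 2: scan the fact table once, keeping only rows with those keys.
hardware_units = sum(units for key, units in sales_fact if key in hardware_keys)
print(hardware_units)
```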

Star Schema: advantages
  Easy to understand
  Optimizes navigation
  Most suitable for query processing

THE SNOWFLAKE SCHEMA

Snowflaking is a method of normalizing the dimension tables in a STAR schema.

Sales: a simple STAR schema.

Product dimension: partially normalized

When to Snowflake
The principle behind snowflaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.

In a similar manner, some situations provide opportunities to separate out a set of attributes and form a subdimension.
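Removing a low cardinality attribute into its own table can be sketched as follows. The function name `snowflake`, the generated `category_key` column, and the sample product rows are assumptions made for illustration.

```python
# Denormalized product dimension rows, as in a STAR schema (hypothetical data).
product_dim = [
    {"product_key": 1, "name": "Widget", "category": "Hardware"},
    {"product_key": 2, "name": "Gadget", "category": "Hardware"},
    {"product_key": 3, "name": "Manual", "category": "Books"},
]

def snowflake(rows, attribute):
    """Move a low cardinality attribute into its own lookup table."""
    lookup = {}      # attribute value -> generated key
    normalized = []
    for row in rows:
        key = lookup.setdefault(row[attribute], len(lookup) + 1)
        slim = {k: v for k, v in row.items() if k != attribute}
        slim[f"{attribute}_key"] = key
        normalized.append(slim)
    sub_dimension = {key: value for value, key in lookup.items()}
    return normalized, sub_dimension

products, categories = snowflake(product_dim, "category")

# Reading a product's category now needs an extra lookup (an extra join in SQL).
names_with_category = [(p["name"], categories[p["category_key"]]) for p in products]
print(products)
print(categories)
```

The repeated value "Hardware" is stored once after the split, at the cost of one more lookup per query.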

Advantages and Disadvantages
Advantages
  Small savings in storage space
  Normalized structures are easier to update and maintain
Disadvantages
  The schema is less intuitive, and end-users are put off by the complexity
  Browsing through the contents is difficult
  Query performance is degraded because of additional joins

Thank you