
UNIT - 2

Prepared by Rajani Khushal K.

LOGICAL DESIGN FOR DATA WAREHOUSE

For a data warehouse, the client first defines the business requirements and the functionality of the business.

Once this stage is over, we need to design the logical and physical parts of the data warehouse.

During the logical design phase, we define a model for the data warehouse consisting of entities, attributes and relationships.



The process of logical design involves arranging data into a series of logical relationships called entities and attributes.

An entity represents a chunk of information.

An attribute is a component of an entity that helps define the uniqueness of the entity.



Our logical design should result in (1) a set of entities and attributes corresponding to fact tables and dimension tables, and (2) a model that maps operational data from the source into subject-oriented information in the target warehouse.

Data Warehouse schemas

A schema is a collection of database objects, including tables, views, indexes and synonyms.

We can arrange schema objects in the schema models designed for data warehousing in a variety of ways.

Most data warehouses use a dimensional model.

The model of the users' source data and the requirements of the users help us to design the warehouse schema.


The physical implementation of the logical data warehouse model may require some changes to adapt it to our system parameters: size of machine, number of users, storage capacity and type of network.

Star Schema

The star schema is the simplest data warehouse schema.

It is called a star schema because the diagram resembles a star, with points radiating from a center.

The center of the star consists of one or more fact tables and the points of the star are the dimension tables.

Usually the fact tables in a star schema are in third normal form (3NF), whereas the dimension tables are de-normalized.


The most natural way to model a data warehouse is as a star schema, where only one join establishes the relationship between the fact table and any one of the dimension tables.

A star schema optimizes performance by keeping queries simple and providing fast response time.
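
The single-join relationship between a fact table and each dimension table can be sketched with a small in-memory SQLite database. The table names (fact_sales, dim_product, dim_date), columns and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star (hypothetical names and data)
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);

-- Fact table: the center of the star, holding measures and foreign keys
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Laptop', 'Electronics');
INSERT INTO dim_date VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
INSERT INTO fact_sales VALUES (1, 1, 10, 50.0), (1, 2, 5, 25.0), (2, 1, 1, 900.0);
""")

# Each dimension is reached from the fact table with exactly one join.
rows = conn.execute("""
SELECT p.category, d.month, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
GROUP BY p.category, d.month
ORDER BY p.category, d.month
""").fetchall()

for row in rows:
    print(row)
```

In a snowflake schema the category would typically live in its own table, so the same question would need one more join.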

Snowflake schema

Example

Difference between Star and Snowflake schema

Criterion | SNOWFLAKE | STAR
Normalization | Third normal form | Second normal form
Joins | Higher number of joins | Fewer joins
Query performance | More foreign keys and more query execution time | Fewer foreign keys and less query execution time
Ease of maintenance / change | No redundancy and hence easier to maintain and change | Has redundant data and hence less easy to maintain
Dimension table | May have more than one dimension table for each dimension | Contains only a single dimension table for each dimension

Fact Constellation

Example

Granularity

Granularity means the level of detail of your data within the data structure.

In a data warehouse, granularity refers to the level of detail of the data stored in the fact tables. High granularity refers to detailed data that is at or near the transaction level (atomic level). Low granularity refers to data that is summarized or aggregated, usually from the atomic-level data.


In an operational system, data is usually kept at the lowest level of detail.

In an order entry system, the quantity ordered is captured and stored at the level of units of product per order received from the customer.

If we need to know how many units of a product were ordered in a month, all the orders entered for that product during the entire month must be read and added up.

An operational system usually does not keep summary data.
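
The monthly total described above can be sketched as an aggregation over order-level rows. The order lines and the helper name units_per_month are hypothetical, used only to show the roll-up from the atomic level to a monthly summary.

```python
from collections import defaultdict

# Hypothetical order lines at the atomic level:
# (order_id, product, order_month, units)
order_lines = [
    (101, "Pen", "2024-01", 10),
    (102, "Pen", "2024-01", 4),
    (103, "Book", "2024-01", 2),
    (104, "Pen", "2024-02", 7),
]

def units_per_month(lines):
    """Read every order line and add up units per (product, month)."""
    totals = defaultdict(int)
    for _order_id, product, month, units in lines:
        totals[(product, month)] += units
    return dict(totals)

monthly = units_per_month(order_lines)
print(monthly[("Pen", "2024-01")])
```

A warehouse that stores only the monthly summary answers this question without scanning the orders, but it can no longer answer questions about individual orders.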


Data in a warehouse is granular.

This means that data is carried in the data warehouse at the lowest, most detailed (atomic) level.

Granularity levels can be decided based on the data types and the expected query performance of the system.

Granularity is the extent to which a system is broken down into small parts.

Example: You can slice an hour into different granularities. A very rough (low) granularity would be the hour itself (1 data point). But one can also say 60 minutes (60 data points: 1st minute, 2nd minute, etc.). The finer (higher) your granularity goes, the more data you will have to store. So an hour can also be 3600 seconds or even 3600000 milliseconds.
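
The arithmetic of this example can be written out as a short sketch. The function name data_points and the lookup table are illustrative assumptions, not part of any library.

```python
# Number of data points in one hour at each grain
UNITS_PER_HOUR = {
    "hour": 1,
    "minute": 60,
    "second": 60 * 60,
    "millisecond": 60 * 60 * 1000,
}

def data_points(hours, grain):
    """Return how many data points are stored for `hours` at the given grain."""
    return hours * UNITS_PER_HOUR[grain]

for grain in UNITS_PER_HOUR:
    print(grain, data_points(1, grain))
```

The same reasoning applies to a fact table: moving from monthly to daily to per-transaction rows multiplies the number of rows to store.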

Physical Design of Data Warehouse

Logical design is what we draw with pen and paper before building our data warehouse, whereas physical design is the creation of the database with SQL statements.

During the physical design process, we convert the data gathered during the logical design phase into a description of the physical database structure.

Physical design decisions are mainly driven by query performance and database maintenance aspects.


During the logical design phase, we defined a model for our data warehouse consisting of entities, attributes and relationships.

The entities are linked together using relationships.

Attributes are used to describe the entities.

The UID (Unique Identifier) distinguishes between one instance of an entity and another.


During the physical design process, we translate the expected schemas into actual database structures. The terms map as follows:

1 Entities to tables

2 Relationships to foreign key constraints

3 Attributes to columns

4 PUI (Primary unique identifier) to primary key constraints

5 UI (Unique identifiers) to unique key constraints
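
The five mappings above can be sketched as DDL run against an in-memory SQLite database. The entities (customer, orders), their columns and the helper try_insert are hypothetical; foreign key enforcement is switched on explicitly because SQLite leaves it off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
-- 1 Entity -> table, 3 attributes -> columns
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- 4 primary unique identifier -> primary key
    email TEXT UNIQUE,                 -- 5 unique identifier -> unique key
    name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    quantity INTEGER,
    -- 2 relationship -> foreign key constraint
    FOREIGN KEY (customer_id) REFERENCES customer(customer_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'a@example.com', 'Asha')")
conn.execute("INSERT INTO orders VALUES (10, 1, 3)")

def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Order for a customer that does not exist: rejected by the foreign key.
fk_ok = try_insert("INSERT INTO orders VALUES (11, 99, 1)")
# Second customer with the same email: rejected by the unique key.
uq_ok = try_insert("INSERT INTO customer VALUES (2, 'a@example.com', 'Ravi')")
print(fk_ok, uq_ok)
```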

Physical design structures

Once we have converted our logical design to a physical one, we must create some or all of the following structures:

Tablespaces

A tablespace consists of one or more datafiles, which

are physical structures within the operating system you

are using.

A datafile is associated with only one tablespace.

From a design perspective, tablespaces are containers

for physical design structures.


Tablespaces should be separated according to the differences between the objects they hold.

For example, tables should be separated from their indexes and small tables should be separated from large tables.

In database terms:

A database is divided into one or more logical storage units called tablespaces. Tablespaces are divided into logical units of storage called segments, which are further divided into extents.

Tables and Partitioned Tables

Tables are the basic unit of data storage.

They are the container for the expected amount of raw data in your data warehouse.

Using partitioned tables instead of non-partitioned ones addresses the key problem of supporting very large data volumes by allowing you to divide them into smaller and more manageable pieces.


The main design criterion for partitioning is

manageability, though you also see performance

benefits in most cases because of partition pruning or

intelligent parallel processing.
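
The idea behind partition pruning can be illustrated conceptually in plain Python. This is not a database feature being called; it is a sketch, with hypothetical data and names, of how a query restricted to one month only needs to read that month's partition.

```python
from collections import defaultdict

# Hypothetical sales rows: (sale_month, amount)
sales = [("2024-01", 100), ("2024-01", 50), ("2024-02", 70), ("2024-03", 30)]

# Divide the rows into smaller pieces, one "partition" per month.
partitions = defaultdict(list)
for month, amount in sales:
    partitions[month].append(amount)

def total_for_months(months):
    """Sum amounts for the given months, touching only their partitions."""
    scanned = 0
    total = 0
    for month in months:
        rows = partitions.get(month, [])
        scanned += len(rows)
        total += sum(rows)
    return total, scanned

total, scanned = total_for_months(["2024-01"])
print(total, scanned, "of", len(sales), "rows scanned")
```

Manageability benefits in the same way: an old month can be dropped or archived by removing one partition instead of deleting rows from one large table.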

Views

A view is a tailored presentation of the data contained in one or more tables or other views.

A view takes the output of a query and treats it as a table.

A view stores only its defining query, so it does not require any space for data in the database.
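
A minimal sketch of this behavior, using SQLite and a hypothetical sales table and north_sales view: the view holds no rows of its own, so a new row in the base table shows up the next time the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 100.0), ('South', 40.0);

-- The view stores only this query, not its result.
CREATE VIEW north_sales AS
    SELECT region, amount FROM sales WHERE region = 'North';
""")

before = conn.execute("SELECT SUM(amount) FROM north_sales").fetchone()[0]

# Change the base table; the view is re-evaluated on the next query.
conn.execute("INSERT INTO sales VALUES ('North', 25.0)")
after = conn.execute("SELECT SUM(amount) FROM north_sales").fetchone()[0]

print(before, after)
```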

Integrity Constraints

Integrity constraints are used to enforce business rules associated with your database and to prevent having invalid information in the tables.

Integrity constraints in data warehousing differ from constraints in OLTP environments.

In OLTP environments, they primarily prevent the insertion of invalid data into a record, which is not a big problem in data warehousing environments because accuracy has already been guaranteed.


In data warehousing environments, constraints are only used for

query rewrite.

NOT NULL constraints are particularly common in data

warehouses.

Indexes and Partitioned Indexes

Indexes are optional structures associated with tables or

clusters. In addition to the classical B-tree indexes,

bitmap indexes are very common in data warehousing

environments. Bitmap indexes are optimized index

structures for set-oriented operations. Additionally, they

are necessary for some optimized data access methods

such as star transformations.


A bitmap index is a special kind of database index that uses bitmaps. ... Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema.


Indexes are just like tables in that you can partition them,

although the partitioning strategy is not dependent

upon the table structure. Partitioning indexes makes it

easier to manage the data warehouse during refresh

and improves query performance.

Bitmap with example

A bitmap index creates one bitmap for each unique value of a single column.

Each bitmap contains a single bit (0 or 1) for every row in the table.

1 indicates that the row has that value and 0 indicates that it does not.

Example: a company wants to hire a student whose MCA percentage is more than 60, who has a passport, and who is male.
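
The hiring query can be sketched with Python integers used as bitmaps, where bit i stands for row i. The student rows and the helper build_bitmap are hypothetical. As a simplification, one bitmap is built per condition; a real bitmap index keeps one bitmap per distinct column value and would combine several of them to answer "more than 60".

```python
# Hypothetical student rows: (name, mca_percent, has_passport, gender)
students = [
    ("Amit", 72, True, "M"),
    ("Bina", 81, True, "F"),
    ("Chirag", 55, True, "M"),
    ("Dev", 66, False, "M"),
    ("Eshan", 64, True, "M"),
]

def build_bitmap(rows, predicate):
    """Return an int whose bit i is 1 when row i satisfies the predicate."""
    bitmap = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bitmap |= 1 << i
    return bitmap

above_60 = build_bitmap(students, lambda r: r[1] > 60)
passport = build_bitmap(students, lambda r: r[2])
male = build_bitmap(students, lambda r: r[3] == "M")

# The three conditions are combined with a bitwise AND, one bit per row.
matches = above_60 & passport & male
selected = [row[0] for i, row in enumerate(students) if (matches >> i) & 1]
print(selected)
```

Only the rows whose bit survives the AND need to be fetched from the table, which is why bitmap indexes suit set-oriented queries.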

Materialized Views

A materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

From a physical design point of view, materialized views resemble tables or partitioned tables and behave like indexes in that they are used transparently and improve performance.


In data warehouses, materialized views can be used to precompute and store aggregated data such as the sum of sales.

Materialized views in these environments are typically referred to as summaries, since they store summarized data.

A view is created by combining data from different tables. Hence, a view does not have data of its own.

On the other hand, a materialized view, as commonly used in data warehousing, does store data. This data helps in decision making, performing calculations, etc.
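
SQLite has no materialized views, so the sketch below stands in for one with a summary table that is rebuilt on demand. The table names (sales, sales_summary) and the function refresh_summary are assumptions for illustration only; the point is that the stored result stays unchanged until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, amount REAL);
INSERT INTO sales VALUES ('Pen', 10.0), ('Pen', 5.0), ('Book', 20.0);
""")

def refresh_summary(conn):
    """Recompute and store the aggregated result (a stand-in for a refresh)."""
    conn.executescript("""
    DROP TABLE IF EXISTS sales_summary;
    CREATE TABLE sales_summary AS
        SELECT product, SUM(amount) AS total_amount
        FROM sales
        GROUP BY product;
    """)

pen_total = "SELECT total_amount FROM sales_summary WHERE product = 'Pen'"

refresh_summary(conn)
conn.execute("INSERT INTO sales VALUES ('Pen', 1.0)")

# The summary holds its own data, so it is stale until refreshed.
stale = conn.execute(pen_total).fetchone()[0]
refresh_summary(conn)
fresh = conn.execute(pen_total).fetchone()[0]
print(stale, fresh)
```

Queries against the summary read a few precomputed rows instead of aggregating the detail table each time, at the cost of storage and refresh work.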

Dimensions

A dimension is a schema object that defines hierarchical

relationships between columns or column sets.

“A dimension is a collection of reference information

about a measurable event”

A dimension is a container of logical relationships. A

typical dimension is city, state (or province), region, and

country.

DESIGN DIMENSION TABLE, FACT TABLE FOR DATA WAREHOUSE

Dimensional model is the design concept used by many

data warehouse designers to build their data

warehouse.

Dimensional model is the underlying data model used

by many of the commercial OLAP products available

today in the market.



A dimension table is a table in a star schema of a data warehouse.

Data warehouses are built using dimensional data models, which consist of fact and dimension tables. Dimension tables are used to describe dimensions; they contain dimension keys, values and attributes.

In a data warehouse, a dimension is a collection of reference information about a measurable event.

Dimensions categorize and describe data warehouse facts and measures in ways that support meaningful answers to business questions.



Dimension tables provide descriptive or contextual information for the measurements of a fact table.

A dimension may contain the following types of columns:

Keys : Used to identify an entity

Name columns : Used for human-readable names of an entity

Attributes : Used for pivoting analysis

Member properties : Used for labels in a report

Designing Fact Table

A fact table is the primary table in a dimensional model.

A fact table contains:

Measurements/facts

Foreign keys to dimension tables

A fact table is found at the center of a star schema or snowflake schema, surrounded by dimension tables.

The fact table contains business facts or measures, and foreign keys which refer to candidate keys or primary keys in the dimension tables.


Fact tables have the following column types:

Foreign keys

Measures

Business key columns from the primary source table
