UNIT - 2
Prepared by Rajani Khushal K.
LOGICAL DESIGN FOR DATA
WAREHOUSE
For a data warehouse, the client first defines the business requirements and the functionality of the business.
Once this stage is over, we need to design the logical and physical parts of the data warehouse.
During the logical design phase, we define a model for the data warehouse consisting of entities, attributes and relationships.
The process of logical design involves arranging data
into a series of logical relationships called entities and
attributes.
An entity represents a chunk of information.
An attribute is a component of an entity that helps
define the uniqueness of the entity.
Our logical design should result in a set of entities and
attributes corresponding to fact tables and dimension
tables, and a model of how operational data from the
source maps into subject-oriented information in the target
warehouse.
Data Warehouse schemas
A schema is a collection of database objects , including
tables , views , indexes and synonyms.
We can arrange schema objects in the schema models
designed for data warehousing in a variety of ways.
Most data warehouses use a dimensional model.
The model of the users' source data and the requirements of
the users help us to design the warehouse schema.
The physical implementation of the logical data
warehouse model may require some changes to adapt
it to our system parameters: size of machine,
number of users, storage capacity, type of network.
Star Schema
The star schema is the simplest data warehouse schema.
It is called a star schema because the diagram
resembles a star, with points radiating from a center.
The center of the star consists of one or more fact tables
and the points of the star are the dimension tables.
Usually the fact tables in a star schema are in third
normal form (3NF), whereas the dimension tables are
denormalized.
The most natural way to model a data warehouse is as a
star schema, where only one join establishes the
relationship between the fact table and each dimension table.
Star schemas are designed to optimize performance by keeping
queries simple and providing fast response times.
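The single-join property can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical sales star schema; the table names, column names and rows are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units_sold INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (10, 'Jan', 2024), (11, 'Feb', 2024);
INSERT INTO fact_sales VALUES (1, 10, 5, 50.0), (1, 11, 3, 30.0), (2, 10, 1, 40.0);
""")

# Each dimension is reached from the fact table with exactly one join.
rows = conn.execute("""
SELECT p.category, d.month, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month
""").fetchall()
print(rows)
```

Because the category is stored directly in the denormalized product dimension, no further join is needed to group by it.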
Snowflake schema
Difference between star and snowflake schema
Aspect                        SNOWFLAKE                                             STAR
Normalization                 Third normal form (3NF)                               Second normal form (2NF)
Joins                         Higher number of joins                                Fewer joins
Query performance             More foreign keys and more query execution time       Fewer foreign keys and less query execution time
Ease of maintenance / change  No redundancy, hence easier to maintain and change    Has redundant data, hence less easy to maintain
Dimension table               May have more than one dimension table per dimension  Contains only a single dimension table per dimension
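The "higher number of joins" row can be made concrete with a sketch in which the product dimension is snowflaked: the category is moved into its own table. The schema and data are hypothetical, and sqlite3 is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category is normalized into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
INSERT INTO fact_sales VALUES (1, 50.0), (2, 20.0), (3, 40.0);
""")

# Reaching the category now takes two joins instead of one.
snowflake_rows = conn.execute("""
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
print(snowflake_rows)
```

The category name is stored once per category rather than once per product, which removes redundancy at the cost of the extra join.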
Fact Constellation
Example
Granularity
Granularity means the level of detail of your data within
the data structure.
Granularity refers to the level of detail of the data stored in the fact tables of a data warehouse. High granularity
refers to detailed data that is at or near the transaction
level (atomic level). Low granularity refers to data that is
summarized or aggregated, usually from the atomic-level
data.
In an operational system, data is usually kept at the lowest
level of detail.
In an order entry system, the quantity ordered is
captured and stored at the level of units of product per
order received from the customer.
If we need to know how many units of a product were ordered
in a month, all the orders entered during that month
for that product must be read and then added up.
Operational systems usually do not keep summaries of the data.
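The monthly roll-up described above can be sketched as follows. The order lines and the function name `units_per_month` are hypothetical and serve only to show order-level detail being added up to a coarser grain.

```python
from collections import defaultdict

# Hypothetical order lines: (order_id, product, month, units_ordered)
orders = [
    (1001, "Pen", "2024-01", 5),
    (1002, "Pen", "2024-01", 3),
    (1003, "Lamp", "2024-01", 1),
    (1004, "Pen", "2024-02", 7),
]

def units_per_month(order_lines):
    """Add up order-level quantities to one total per (product, month)."""
    totals = defaultdict(int)
    for _order_id, product, month, units in order_lines:
        totals[(product, month)] += units
    return dict(totals)

monthly = units_per_month(orders)
print(monthly[("Pen", "2024-01")])
```

A warehouse that stores such monthly totals answers the question directly, while the operational system has to read every order each time.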
Data in the warehouse is granular.
This means that data is usually carried in the data warehouse at
the most detailed level (the finest grain), from which summaries can be built.
Granularity levels can be decided based on the data
types and the expected query performance of the system.
Granularity is the extent to which a system is broken
down into small parts.
Example
You can slice an hour at different levels of
granularity. A very rough (low) granularity would be the
hour itself (1 data point). The same hour can also be stored as 60 minutes (60
data points: 1st minute, 2nd minute, etc.). The finer (higher)
your granularity, the more data you will have to
store. So an hour can also be 3,600 seconds or
even 3,600,000 milliseconds.
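The arithmetic in this example can be checked with a few lines of Python. The dictionary of grain sizes is an illustrative assumption; only the hour, minute, second and millisecond counts come from the text.

```python
# Size of each grain expressed in milliseconds.
MS_PER_HOUR = 3_600_000
grain_ms = {"hour": 3_600_000, "minute": 60_000, "second": 1_000, "millisecond": 1}

# Number of data points needed to cover one hour at each grain.
points_per_hour = {name: MS_PER_HOUR // size for name, size in grain_ms.items()}

for name, count in points_per_hour.items():
    print(f"{name}: {count} data point(s) per hour")
```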
Physical Design Data warehouse
Logical design is what we draw with a pen and paper before building our data warehouse whereas physical design is the creation of the database with SQL commands or statements.
During the physical design process, we convert the data gathered during the logical design phase into a description of the physical database structure.
Physical design decisions are mainly driven by query performance and database maintenance aspects.
During the logical design phase , we defined a model
for our data warehouse consisting of entities , attributes
and also relationships.
The entities are linked together using relationships.
Attributes are used to describe the entities.
The UID (Unique Identifier) distinguishes between one
instance of an entity and another.
During the physical design process, we translate the
expected schemas into actual database structures.
The logical terms map to physical ones as follows:
1 Entities to tables
2 Relationships to foreign key constraints
3 Attributes to columns
4 PUI (Primary unique identifier) to primary key constraints
5 UI (Unique identifiers) to unique key constraints
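The five mappings above can be sketched with a small schema. The `customer` and `orders` tables are hypothetical, and SQLite (through Python's sqlite3 module) stands in for whatever database the warehouse actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Entity -> table, attributes -> columns
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary unique identifier -> primary key
    email TEXT UNIQUE,                 -- unique identifier -> unique key constraint
    name TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- relationship -> foreign key
    quantity INTEGER
);
INSERT INTO customer VALUES (1, 'a@example.com', 'Asha');
INSERT INTO orders VALUES (100, 1, 5);
""")

def violates(sql):
    """Return True when the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

fk_rejected = violates("INSERT INTO orders VALUES (101, 999, 2)")  # no such customer
unique_rejected = violates("INSERT INTO customer VALUES (2, 'a@example.com', 'Bina')")  # duplicate email
print(fk_rejected, unique_rejected)
```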
Physical design structures
Once we have converted our logical design to a physical one, we must
create some or all of the following structures:
Tablespaces
A tablespace consists of one or more datafiles, which
are physical structures within the operating system you
are using.
A datafile is associated with only one tablespace.
From a design perspective, tablespaces are containers
for physical design structures.
Objects with different characteristics should be placed in separate tablespaces.
For example, tables should be separated from their
indexes and small tables should be separated from large
tables.
In database terms:
A database is divided into one or more logical storage
units called tablespaces. Tablespaces are divided into
logical units of storage called segments, which are
further divided into extents.
Tables and Partitioned Tables
Tables are the basic unit of data storage.
They are the container for the expected amount of raw
data in your data warehouse.
Using partitioned tables instead of non partitioned ones
addresses the key problem of supporting very large data
volumes by allowing you to divide them into smaller and
more manageable pieces.
The main design criterion for partitioning is
manageability, though you also see performance
benefits in most cases because of partition pruning or
intelligent parallel processing.
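Partition pruning and partition-level maintenance can be illustrated with a toy in-memory model. The class `PartitionedSales` is hypothetical and is not how a database implements partitioning; it only shows that a query restricted to one month touches one partition and that an old month can be dropped as a unit.

```python
from collections import defaultdict

class PartitionedSales:
    """Toy table partitioned by month key (for example '2024-01')."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, month, amount):
        self.partitions[month].append(amount)

    def total_for_month(self, month):
        # Partition pruning: only the matching partition is scanned.
        rows = self.partitions.get(month, [])
        return sum(rows), len(rows)

    def drop_partition(self, month):
        # Manageability: old data is removed one partition at a time.
        self.partitions.pop(month, None)

table = PartitionedSales()
table.insert("2024-01", 10.0)
table.insert("2024-01", 20.0)
table.insert("2024-02", 5.0)

total, rows_scanned = table.total_for_month("2024-02")
table.drop_partition("2024-01")
remaining = sorted(table.partitions)
print(total, rows_scanned, remaining)
```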
Views
A view is a tailored presentation of the data contained
in one or more tables or other views.
A view takes the output of a query and treats it as a
table.
Views do not require any storage space in the database beyond their stored definition.
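A minimal sketch of this behaviour, using sqlite3 and a hypothetical `sales` table: the view stores only its query, so a row added to the base table is visible through the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('East', 10.0), ('West', 20.0);
-- The view is a stored query, not a copy of the rows.
CREATE VIEW east_sales AS SELECT region, amount FROM sales WHERE region = 'East';
""")

before = conn.execute("SELECT COUNT(*) FROM east_sales").fetchone()[0]
conn.execute("INSERT INTO sales VALUES ('East', 5.0)")
after = conn.execute("SELECT COUNT(*) FROM east_sales").fetchone()[0]
print(before, after)
```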
Integrity Constraints
Integrity constraints are used to enforce business rules
associated with your database and to prevent invalid
information from entering the tables.
Integrity constraints in data warehousing differ from constraints
in OLTP environments.
In OLTP environments, they primarily prevent the insertion of
invalid data into a record, which is not a big problem in data
warehousing environments because accuracy is normally
verified before the data is loaded.
In data warehousing environments, constraints are used mainly for
query rewrite.
NOT NULL constraints are particularly common in data
warehouses.
Indexes and Partitioned Indexes
Indexes are optional structures associated with tables or
clusters. In addition to the classical B-tree indexes,
bitmap indexes are very common in data warehousing
environments. Bitmap indexes are optimized index
structures for set-oriented operations. Additionally, they
are necessary for some optimized data access methods
such as star transformations.
A bitmap index is a special kind of database index that uses bitmaps. ... Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema.
Indexes are just like tables in that you can partition them,
although the partitioning strategy is not dependent
upon the table structure. Partitioning indexes makes it
easier to manage the data warehouse during refresh
and improves query performance.
Bitmap with example
A bitmap index creates one bitmap for each unique value of a single column.
Each bitmap contains a single bit (0 or 1) for every row in the table.
1 indicates that the row has that value and 0 indicates that it does not.
Example: a company wants to hire a student whose MCA percentage is more
than 60, who has a passport and who is male.
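The hiring query can be sketched with Python integers used as bitmaps. The student rows are invented for illustration, and each bitmap here encodes one condition of the query; a real bitmap index would keep one bitmap per distinct column value and combine them in the same way.

```python
# Hypothetical student rows: (name, mca_percentage, has_passport, gender)
students = [
    ("Amit", 72, True, "M"),
    ("Bina", 81, True, "F"),
    ("Chirag", 55, True, "M"),
    ("Dev", 65, False, "M"),
    ("Eshan", 68, True, "M"),
]

def build_bitmap(rows, predicate):
    """Return an int whose bit i is 1 when row i satisfies the predicate."""
    bitmap = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bitmap |= 1 << i
    return bitmap

above_60 = build_bitmap(students, lambda r: r[1] > 60)
has_passport = build_bitmap(students, lambda r: r[2])
is_male = build_bitmap(students, lambda r: r[3] == "M")

# The three conditions are combined with a bitwise AND, one bit per row.
matches = above_60 & has_passport & is_male
selected = [row[0] for i, row in enumerate(students) if (matches >> i) & 1]
print(selected)
```

Only the rows whose bit survives the AND need to be fetched from the table.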
Materialized Views
A materialized view is a database object that contains the
results of a query. For example, it may be a local copy
of data located remotely, or may be a subset of the
rows and/or columns of a table or join result, or may be
a summary using an aggregate function.
From a physical design point of view, materialized views
resemble tables or partitioned tables and behave like
indexes in that they are used transparently and improve
performance.
In data warehouses, materialized views can be used to precompute and store aggregated data such as sum of sales.
Materialized views in these environments are typically referred to as summaries, since they store summarized data.
A view is created by combining data from different tables. Hence, a view does not hold data of its own.
A materialized view, on the other hand, stores the result data. In data warehousing this stored data supports decision making, calculations, etc.
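The difference from an ordinary view can be sketched by emulating a materialized view with a stored summary table. SQLite has no materialized views, so the `sales_summary` table and the `refresh_summary` helper are assumptions made for illustration; databases that support materialized views provide their own refresh mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, amount REAL);
INSERT INTO sales VALUES ('Pen', 10.0), ('Pen', 5.0), ('Lamp', 40.0);
""")

def refresh_summary(connection):
    """Rebuild the stored summary from the base table (a full refresh)."""
    connection.executescript("""
    DROP TABLE IF EXISTS sales_summary;
    CREATE TABLE sales_summary AS
        SELECT product, SUM(amount) AS total_amount FROM sales GROUP BY product;
    """)

def pen_total(connection):
    query = "SELECT total_amount FROM sales_summary WHERE product = 'Pen'"
    return connection.execute(query).fetchone()[0]

refresh_summary(conn)
conn.execute("INSERT INTO sales VALUES ('Pen', 1.0)")
stale = pen_total(conn)   # stored result does not see the new row yet
refresh_summary(conn)
fresh = pen_total(conn)   # after refresh the precomputed sum is current
print(stale, fresh)
```

Queries read the precomputed sum instead of re-aggregating the base table, at the cost of keeping the summary refreshed.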
Dimensions
A dimension is a schema object that defines hierarchical
relationships between columns or column sets.
“A dimension is a collection of reference information
about a measurable event”
A dimension is a container of logical relationships. A
typical dimension hierarchy is city, state (or province), region, and
country.
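The city, state, region, country hierarchy can be sketched as a lookup structure that lets a measure be rolled up to any level. The cities, their region assignments and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical geography dimension: each city carries its parent levels.
geography = {
    "Rajkot": {"state": "Gujarat", "region": "West", "country": "India"},
    "Surat": {"state": "Gujarat", "region": "West", "country": "India"},
    "Pune": {"state": "Maharashtra", "region": "West", "country": "India"},
    "Chennai": {"state": "Tamil Nadu", "region": "South", "country": "India"},
}
sales_by_city = {"Rajkot": 100, "Surat": 150, "Pune": 200, "Chennai": 50}

def roll_up(level):
    """Aggregate city-level sales to the given hierarchy level."""
    totals = defaultdict(int)
    for city, amount in sales_by_city.items():
        totals[geography[city][level]] += amount
    return dict(totals)

print(roll_up("state"))
print(roll_up("region"))
print(roll_up("country"))
```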
DESIGN DIMENSION TABLE , FACT TABLE
FOR DATA WAREHOUSE
Dimensional model is the design concept used by many
data warehouse designers to build their data
warehouse.
Dimensional model is the underlying data model used
by many of the commercial OLAP products available
today in the market.
A Dimension Table is a table in a star schema of a data warehouse.
Data warehouses are built using dimensional data models which
consist of fact and dimension tables. Dimension tables are used to
describe dimensions; they contain dimension keys, values and
attributes.
In a data warehouse, a dimension is a collection of reference
information about a measurable event.
Dimensions categorize and describe data warehouse facts and
measures in ways that support meaningful answers to business
questions.
Dimension tables provide descriptive or contextual information for the
measurements of a fact table.
A dimension may contain the following types of columns:
Keys : Used to identify an entity
Name Columns : Used for human names of entity
Attributes : Used for pivoting analysis
Member properties : Used for labels in a report
Designing Fact Table
A fact table is a primary table in a dimensional model.
A Fact Table contains
Measurements/facts
Foreign key to dimension table
A fact table is found at the center of the star schema or snowflake schema,
surrounded by dimension tables.
The fact table contains business facts or measures, and foreign keys which
refer to candidate keys or primary keys in the dimension tables.
Fact tables have the following column types:
Foreign key
Measures
Business key column from the primary source table
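The three column types can be sketched with a small record type. The `SalesFact` class, the dimension dictionaries and the rows are hypothetical; the check at the end only illustrates that every foreign key in a fact row should resolve to a dimension row.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesFact:
    product_key: int    # foreign key to the product dimension
    date_key: int       # foreign key to the date dimension
    order_number: str   # business key carried over from the source system
    units_sold: int     # measure
    amount: float       # measure

# Hypothetical dimension tables keyed by their primary keys.
dim_product = {1: "Pen", 2: "Lamp"}
dim_date = {10: "2024-01", 11: "2024-02"}

facts = [
    SalesFact(1, 10, "ORD-1001", 5, 50.0),
    SalesFact(2, 10, "ORD-1002", 1, 40.0),
    SalesFact(1, 11, "ORD-1003", 3, 30.0),
]

def orphan_facts(rows):
    """Return fact rows whose foreign keys match no dimension row."""
    return [f for f in rows
            if f.product_key not in dim_product or f.date_key not in dim_date]

total_amount = sum(f.amount for f in facts)
orphans = orphan_facts(facts)
print(total_amount, orphans)
```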