datamodel session eltp
Post on 09-Apr-2018
8/8/2019 DataModel Session ELTP
http://slidepdf.com/reader/full/datamodel-session-eltp 1/128
© Mahindra Satyam 2009
Data Modeling
AGENDA
Time                  Topics to be Covered
9.30am to 11.00am     Overview of Data Model; Need of Data Model; Types of Data Model
                      Overview of Normalized Data Model and Case Study discussion
11.00am to 11.15am    Tea Break
1.00pm to 2.00pm      Lunch Break
2.00pm to 3.30pm      Dimensional Data Model (Cont…)
3.30pm to 3.45pm      Tea Break
3.45pm to 5.15pm      Dimension model with ERWin Demo
5.30pm to 6.30pm      ERWin Demo with Q & A Session
What happens if you don’t have one?
Individual Data Store
What happens if you don’t have one?
Corporate Data Store
Where Data Models are used
• Operational Systems: Traditional applications designed to run the day-to-day business of the Enterprise
• External Systems ***: Data used within an Enterprise that is obtained from outside sources
• Staging Areas ***: Created to aid in the collection and transformation of data that is targeted for a Data Warehouse
• Operational Data Store ***: W. H. Inmon and Claudia Imhoff definition: "A subject-oriented, integrated, volatile, current-valued data store containing only corporate detailed data".
• Data Warehouse (DW): W. H. Inmon definition: "A subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs".
• Data Mart (DM): TDWI definition: "A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers."
*** - Not discussed here
DATA MODELING TECHNIQUES
Entity Relationship Model (ERM)
Dimensional Data Model (DDM)
Where to use what?
Stages          Types of Model
OLTP            Normalized Data Model
Staging Area    Flat Table without constraints
ODS             Normalized model
Data marts      Dimensional model
DW and role of E/R Modeling
Ralph Kimball says…
• ER models are too complicated for end users to understand
• ER modeling/normalizing is only suitable for OLTP or in the data staging area, since it eliminates redundancy
• Results in too many tables to be easy to query
• ER models are optimized for update activity, not high-performance querying
Bill Inmon says…
• The ER model is suitable for data warehouses because it is stable, and supports consistency and flexibility
• Normalized data is the ideal basis for the design of the Data Warehouse and the ODS
• May not be suitable for the data mart, which deals heavily with regular query activity and time-variant analysis
Who is right?
Normalized Data Model
TOPICS TO BE COVERED…
ER Model Concepts
• ER Diagrams - Notation
• Entities and Attributes
• Weak Entity Types
• Entity Types, Value Sets, and Key Attributes
• Relationships and Relationship Types
• Roles and Attributes in Relationship Types
ER Diagram for COMPANY Schema
DATABASE DESIGN STEPS
ENTITIES
An entity is a principal data object about which information is to be collected.
Entities are recognizable concepts, either concrete or abstract, such as persons, places, things, or events which have relevance to the database.
Examples of entities are EMPLOYEES, PROJECTS, INVOICES.
An entity is analogous to a table in the relational model.
Student is an entity.
Student
WEAK ENTITY TYPES
An entity that does not have a key attribute
A weak entity must participate in an identifying relationship type with an owner or identifying entity type.
Entities are identified by the combination of:
• A partial key of the weak entity type
• The particular entity they are related to in the identifying entity type
ATTRIBUTES
Attributes are data objects that either identify or describe entities.
Attributes that identify entities are key attributes.
Attributes that describe an entity are non-key attributes.
Example: attributes of Student
• Name
  • Last Name
  • First Name
• Address
  • Street Address
  • City
  • State or Province
ATTRIBUTES
Attributes are properties used to describe an entity.
E.g.: An EMPLOYEE entity may have a Name, SSN, Address, Sex, Birthdate
A specific entity will have a value for each of its attributes
E.g.: A specific employee entity may have Name='John Smith', SSN='123456789',
Each attribute has a value set (or data type) associated with it
E.g.: integer, string, date, enumerated type, …
TYPES OF ATTRIBUTES
Simple Attributes
• Each entity has a single atomic value for the attribute.
E.g.: SSN or Sex
Composite Attributes
• The attribute may be composed of several components.
• Composition may form a hierarchy where some components are themselves composite.
E.g.: Address (Apt#, House#, Street, City, State, Zip_Code, Country) or Name (First_Name, Middle_Name, Last_Name).
Multi-valued Attributes
• An entity may have multiple values for the attribute.
E.g.: Color of a CAR or Previous Degrees of a STUDENT.
Nested Attributes
• In general, composite and multi-valued attributes may be nested arbitrarily to any number of levels, although this is rare.
E.g.: Previous Degrees of a STUDENT is a composite multi-valued attribute denoted by {Previous Degrees (College, Year, Degree, Field)}.
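The attribute types above can be illustrated with a small sketch. This is only one possible way to picture them, using Python dataclasses; the class names, field names and sample values are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Name:
    """Composite attribute: made of several components."""
    first_name: str
    middle_name: str
    last_name: str


@dataclass
class PreviousDegree:
    """One value of the composite, multi-valued attribute
    {Previous Degrees (College, Year, Degree, Field)}."""
    college: str
    year: int
    degree: str
    field_of_study: str


@dataclass
class Student:
    ssn: str    # simple attribute: a single atomic value
    name: Name  # composite attribute
    # multi-valued (and nested composite) attribute: zero or more values
    previous_degrees: List[PreviousDegree] = field(default_factory=list)


# Hypothetical sample data
student = Student(
    ssn="123456789",
    name=Name("John", "B", "Smith"),
    previous_degrees=[
        PreviousDegree("College A", 2001, "BSc", "Physics"),
        PreviousDegree("College B", 2003, "MSc", "Computer Science"),
    ],
)
```

In a relational design a multi-valued attribute such as `previous_degrees` would normally move to its own table, which is why the sketch keeps it as a separate type.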
RELATIONSHIP
Figure: BUILDING (strong entity) is related 1:N to APARTMENT (weak entity).
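A minimal sketch of how the BUILDING and APARTMENT example might be stored, assuming SQLite through Python's standard `sqlite3` module. The column names `building_id` and `apartment_no` are hypothetical. The weak entity's primary key combines the owner's key with its own partial key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY            -- key of the strong (owner) entity
);
CREATE TABLE apartment (
    building_id  INTEGER NOT NULL REFERENCES building(building_id),
    apartment_no INTEGER NOT NULL,             -- partial key of the weak entity
    PRIMARY KEY (building_id, apartment_no)    -- owner key + partial key
);
""")

conn.execute("INSERT INTO building VALUES (1)")
conn.execute("INSERT INTO building VALUES (2)")
# The same apartment number may repeat across buildings
conn.execute("INSERT INTO apartment VALUES (1, 101)")
conn.execute("INSERT INTO apartment VALUES (2, 101)")

# An apartment cannot exist without its owning building
try:
    conn.execute("INSERT INTO apartment VALUES (3, 101)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

apartment_count = conn.execute("SELECT COUNT(*) FROM apartment").fetchone()[0]
```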
CLASSIFYING RELATIONSHIPS
Relationships are classified by their:
• Degree
• Connectivity
• Cardinality
• Direction
• Existence
DEGREE OF A RELATIONSHIP
The degree of a relationship is the number of entities associated with the relationship.
Binary relationships (two entities) are the most common type in the real world.
A ternary relationship (three entities) is used when a binary relationship is inadequate.
DEGREE OF RELATIONSHIP
• Unary: one entity related to another of the same entity type
• Binary: entities of two different types related to each other
• Ternary: entities of three different types related to each other
CONNECTIVITY AND CARDINALITY
Connectivity describes the mapping of associated entity instances in the relationship.
The values of connectivity are "one" or "many".
Cardinality is the actual number of related occurrences for each of the two entities.
The basic types of connectivity are one-to-one, one-to-many, and many-to-many.
CARDINALITY…
CARDINALITY
CARDINALITY…
Many-to-many relationships cannot be directly translated to relational tables but instead must be transformed into two or more one-to-many relationships using associative entities.
Example: Employee, Emp_Proj (associative entity), Projects
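The transformation can be sketched as follows, assuming SQLite via Python's `sqlite3` module. The table names follow the slide (Employee, Emp_Proj, Projects); the column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Associative entity: one row per (employee, project) pairing.
-- It has a one-to-many relationship with each of the two parents.
CREATE TABLE emp_proj (
    emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);

INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project  VALUES (10, 'Billing'), (20, 'Payroll');
INSERT INTO emp_proj VALUES (1, 10), (1, 20), (2, 10);
""")

# Resolve the many-to-many relationship by joining through emp_proj
assignments = conn.execute("""
    SELECT e.name, p.name
    FROM employee e
    JOIN emp_proj ep ON ep.emp_id = e.emp_id
    JOIN project  p  ON p.proj_id = ep.proj_id
    ORDER BY e.name, p.name
""").fetchall()
```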
DIRECTION
The direction of a relationship indicates the originating entity of a binary relationship.
The entity from which a relationship originates is the parent entity.
The entity where the relationship terminates is the child entity.
Example: Patient (parent entity), Patient History (child entity)
EXISTENCE
Denotes whether the existence of an entity instance is dependent upon the existence of another, related, entity instance.
Either mandatory or optional.
Mandatory - "Every project must be managed by a single department".
Optional - "employees may be assigned to a BU".
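One common way to express the two cases in a relational schema is a NOT NULL foreign key for mandatory existence and a nullable foreign key for optional existence. The sketch below assumes SQLite via Python's `sqlite3` module; all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department    (dept_id INTEGER PRIMARY KEY);
CREATE TABLE business_unit (bu_id   INTEGER PRIMARY KEY);

-- Mandatory: every project must be managed by a department
CREATE TABLE project (
    proj_id INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);

-- Optional: an employee may (or may not) be assigned to a BU
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    bu_id  INTEGER REFERENCES business_unit(bu_id)
);

INSERT INTO department VALUES (1);
""")

# Optional participation: an unassigned employee is accepted
conn.execute("INSERT INTO employee VALUES (100, NULL)")

# Mandatory participation: a project without a department is rejected
try:
    conn.execute("INSERT INTO project VALUES (10, NULL)")
    project_without_dept_rejected = False
except sqlite3.IntegrityError:
    project_without_dept_rejected = True

conn.execute("INSERT INTO project VALUES (11, 1)")  # valid project
```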
CONSTRAINTS ON RELATIONSHIPS
Constraints on Relationship Types (also known as ratio constraints)
• Cardinality Constraints - the number of instances of one entity that can or must be associated with each instance of another entity.
• Minimum Cardinality (also called participation constraint or existence dependency constraint)
  If zero, then optional participation, not existence-dependent
  If one or more, then mandatory, existence-dependent
• Maximum Cardinality - the maximum number
  One-to-one (1:1)
  One-to-many (1:N) or Many-to-one (N:1)
  Many-to-many
CONCEPTUAL MODELING
A conceptual model shows data through business eyes.
Identify entities which have business meaning.
Identify important relationships.
Identify significant attributes in the entities.
CONCEPTUAL MODELING
Next step is to build the ER Diagram from the entities and data items identified in the requirements.
Determine if there are any relationships between the entities.
An entity that does not relate to any other entity may end up as a “stand alone” table with no defined relationships.
ER – DIAGRAM NOTATIONS
CASE STUDY
The XYZ Company wants Satyam to design and develop a database system for its regular operations.
The database should record information about the departments, projects, employees and their dependants. The company is organized into departments. Employees work for a department and may work on many projects. Departments control the projects which are being operated from that location. Each department has to be managed by someone.
There are managers who manage and monitor the work done by the employees. When an employee is assigned to a project, the hours are calculated based on the number of hours the employee is scheduled to work on the project.
Although most employees have managers, senior staff. The date on which a manager started managing the department could be stored as an attribute of the department.
A department may be spread over many locations. The department name and number are unique for the department. An employee may have a number of dependants.
IDENTIFYING ENTITIES
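Figure: ER diagram for the case study.
Entities and attributes:
• Employee: Name (Fname, Mname, Lname), SSNO, Bdate, Address, Sex, Salary
• Department: Dname, Dnumber, Dlocation, Number of employees
• Project: Pname, Pnumber, Plocation
• Dependant: Name, Sex, Bdate, Relationship
Relationships:
• WORKS_FOR: Employee (N) to Department (1)
• MANAGES: Employee (1) to Department (1), with attribute Startdate
• CONTROLS: Department (1) to Project (N)
• WORKS_ON: Employee (M) to Project (N), with attribute Hours
• SUPERVISION: Employee as supervisor (1) to Employee as supervisee (N)
• DEPENDANTS_OF: Employee (1) to Dependant (N)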
ONE-TO-MANY (1:N) RELATIONSHIP
Department (1) WORKFOR (N) Employees
The relationship between these two entities is 1 to Many because there can be 1 or more employees in each department.
Every department is required to have at least one employee, and no employee can belong to more than one department.
What kind of table design does this suggest?
A single table for each entity: the Department Table and Employee Table.
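A hedged sketch of the two-table design, assuming SQLite via Python's `sqlite3` module. Column names loosely follow the case study (Dnumber, Dname, SSNO); the sample rows are hypothetical. The foreign key on the "many" side guarantees that no employee belongs to more than one department, but it does not by itself guarantee that every department has at least one employee, so the sketch checks that rule with a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER PRIMARY KEY,
    dname   TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    ssno    TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    -- single NOT NULL FK column: exactly one department per employee
    dnumber INTEGER NOT NULL REFERENCES department(dnumber)
);
INSERT INTO department VALUES (1, 'Research'), (2, 'Sales');
INSERT INTO employee   VALUES ('111', 'Asha', 1), ('222', 'Ravi', 1);
""")

# Departments that violate "at least one employee"
empty_departments = [row[0] for row in conn.execute("""
    SELECT d.dname
    FROM department d
    LEFT JOIN employee e ON e.dnumber = d.dnumber
    WHERE e.ssno IS NULL
""")]
```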
MANY-TO-ONE (N:1) RELATIONSHIP
Dependants (N) DEPENDANT_OF (1) Employees
The relationship between these two entities is Many to 1 because there can be 1 or more dependants for each employee.
What kind of table design does this suggest?
A single table for each entity: the Dependants Table and Employee Table.
MANY TO MANY (N:M) RELATIONSHIP
Employee (N) Works On / Have (M) Project
These 2 entities have 2 relationships - 1 to many in each direction - resulting in a many-to-many relationship.
Employees are optionally assigned to one or more Projects, as appropriate. A Project must have at least 1 employee.
What kind of table design does this suggest?
2 tables plus a table with a column for each entity. (Employee, Project, Employee_Project)
RECURSIVE RELATIONSHIPS
We can also have a recursive relationship type.
Both participations are of the same entity type, in different roles.
E.g.: SUPERVISION (MANAGES) relationships between EMPLOYEE (in role of supervisor or boss) and (another) EMPLOYEE (in role of subordinate or worker).
In the ER diagram, role names need to be displayed to distinguish participations.
EMPLOYEE
MANAGES
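In a relational table, a recursive relationship is commonly represented by a foreign key that refers back to the same table. The sketch below assumes SQLite via Python's `sqlite3` module; the column `supervisor_id` and the sample rows are hypothetical. The table aliases `boss` and `worker` play the part of the role names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    -- self-referencing FK; NULL for an employee with no supervisor
    supervisor_id INTEGER REFERENCES employee(emp_id)
);
INSERT INTO employee VALUES (1, 'Meera', NULL), (2, 'Arun', 1), (3, 'Kiran', 1);
""")

# The same table appears twice, once per role
supervision = conn.execute("""
    SELECT boss.name, worker.name
    FROM employee AS worker
    JOIN employee AS boss ON worker.supervisor_id = boss.emp_id
    ORDER BY worker.name
""").fetchall()
```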
ATTRIBUTES OF RELATIONSHIP TYPES
Here, the date completed attribute pertains specifically to the employee’s completion of a course… it is an attribute of the relationship.
NOTATION
The (min, max) notation for relationship constraints
Figure labels: (1,1) and (0,1); (1,1) and (1,N)
PROBLEM WITH ER
The Entity Relationship model in its original form did not support:
• Specialization
• Generalization
Rationale for Dimensional Modeling
Dimensional Model
Definition
Logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
Dimensional Model = Star Schema
Serves as basis for the design of a relational database schema
Can easily translate into multi-dimensional database design if required
Overcomes OLTP design shortcomings
Dimensional Model Advantages
Understandable
Systematically represents history
Reliable join paths
High performance query
Enterprise scalability
Subject Area Models
Subject area E/R models:
• Operations
• Sales and Marketing
• Customer Services
• Product Development
Subject area dimensional models:
• Manufacturing and Process Control
• Sales Order Entry and Campaign Management
• Customer Support and Relationship Management
• Shipping and Inventory Management
Enterprise Models
• Enterprise scope E/R model
• Enterprise scope dimensional model
Star Schema Dimension Tables
Dimension tables
• Store dimension values
• Textual content
• Dimension tables usually referred to simply as 'dimensions'
• Spend extra effort to add dimensional attributes
Dimension Keys
Synthetic keys
• Each table assigned a unique primary key, specifically generated for the data warehouse
• Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema
Dimension Columns
Dimension attributes
• Specify the way in which measures are viewed: rolled up, broken out or summarized
• Often follow the word "by" as in "Show me Sales by Region and Quarter"
• Frequently referred to as 'Dimensions'
Figure: three Dimension tables, each with a Key and several attributes
Star Schema Fact Table
Process measures
• Start by assigning one fact table per business subject area
• Fact tables store the process measures (aka Facts)
• Compared to dimension tables, fact tables usually have a very large number of rows
Figure: Fact Table (fact1, fact2, fact3)
Fact Table Primary Key
Every fact table
• Multi-part primary key added
• Made up of foreign keys referencing dimensions
Figure: Fact Table (key, key, key, fact1, fact2, fact3)
Fact Table Grain
Grain
• The level of detail represented by a row in the fact table
• Must be identified early
• Cause of greatest confusion during design process
Example
• Each row in the fact table represents the daily item sales total
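To make the grain concrete, the sketch below rolls individual sales transactions up to the "daily item sales total" grain from the example. The transaction rows, dates and item names are hypothetical; the point is only that one fact row exists per (day, item) combination.

```python
from collections import defaultdict

# Hypothetical source transactions: (sale_date, item, amount)
transactions = [
    ("2001-01-31", "item_A", 10.0),
    ("2001-01-31", "item_A", 5.0),
    ("2001-01-31", "item_B", 7.5),
    ("2001-02-01", "item_A", 2.0),
]

# Fact table at the declared grain: one row per day per item
daily_item_sales = defaultdict(float)
for sale_date, item, amount in transactions:
    daily_item_sales[(sale_date, item)] += amount

fact_rows = sorted(daily_item_sales.items())
```

Four transactions collapse into three fact rows because the two `item_A` sales on the same day fall into the same grain cell.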
Designing a Star Schema
Five initial design steps
• Based on Kimball's six steps
• Start designing in order
• Re-visit and adjust over project life
Five initial design steps
1. Identify fact table
2. Identify fact table grain
3. Identify dimensions
4. Select facts
5. Identify dimensional attributes
EXERCISE 1
Scenario
Industry: Automobile manufacturing
Company: Millennium Motors
Value chain focus: Sales
Sample business questions:
• What are the top 10 selling car models this month?
• How do this month's top 10 selling models compare to the top 10 over the last six months?
• Show me dealer sales by region by model by day
• What is the total number of cars sold by month by dealer by state?
List facts and dimensions
Example Fact Table
Sales Facts
Keys: model_key, dealer_key, time_key
Facts: revenue, quantity
Example Fact Table Records
Sales Facts
time_key  model_key  dealer_key  revenue     quantity
1         1          1           75840.27    2
1         2          1           152260.37   3
1         3          1           28360.15    1
1         4          1           132675.22   4
1         5          1           43789.45    1
1         1          2           35678.98    1
1         3          2           57864.78    2
1         5          2           92876.67    2
Primary key: time_key, model_key, dealer_key. Facts: revenue, quantity.
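As an illustration of how such fact rows are used, the sketch below loads the eight records shown on the slide and totals the two facts per dealer. The aggregation code itself is an assumption about how one might query the data, not part of the slides; `Decimal` is used so that the currency sums are exact.

```python
from collections import defaultdict
from decimal import Decimal

# (time_key, model_key, dealer_key, revenue, quantity), copied from the slide
sales_facts = [
    (1, 1, 1, Decimal("75840.27"), 2),
    (1, 2, 1, Decimal("152260.37"), 3),
    (1, 3, 1, Decimal("28360.15"), 1),
    (1, 4, 1, Decimal("132675.22"), 4),
    (1, 5, 1, Decimal("43789.45"), 1),
    (1, 1, 2, Decimal("35678.98"), 1),
    (1, 3, 2, Decimal("57864.78"), 2),
    (1, 5, 2, Decimal("92876.67"), 2),
]

# "Show me sales by dealer": group by one key, sum the additive facts
revenue_by_dealer = defaultdict(Decimal)
quantity_by_dealer = defaultdict(int)
for time_key, model_key, dealer_key, revenue, quantity in sales_facts:
    revenue_by_dealer[dealer_key] += revenue
    quantity_by_dealer[dealer_key] += quantity
```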
Facts
Fully additive
• Can be summed across any and all dimensions
• Stored in fact table
• Examples: revenue, quantity
Star schema:
• Sales Facts: model_key, dealer_key, time_key, revenue, quantity
• Model: model_key, brand, category, line, model
• Time: time_key, year, quarter, month, date
• Dealer: dealer_key, region, state, city, dealer
Facts
Semi-additive
• Can be summed across most dimensions but not all
• Examples: Inventory quantities, account balances, or personnel counts
• Anything that measures a "level"
• Must be careful with ad-hoc reporting
• Often aggregated across the "forbidden dimension" by averaging
Star schema:
• Sales Facts: model_key, dealer_key, time_key, inventory
• Model: model_key, brand, category, line, model
• Time: time_key, year, quarter, month, date
• Dealer: dealer_key, region, state, city, dealer
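A small sketch of the semi-additive behaviour, assuming that time is the "forbidden dimension" for inventory (the usual case for level measures, though the slide does not name it). The rows are hypothetical. Summing across dealers for one day is meaningful; across days the daily totals are averaged instead.

```python
from collections import defaultdict

# Hypothetical inventory snapshots: (time_key, dealer_key, inventory)
inventory_facts = [
    (1, 1, 10), (1, 2, 20),
    (2, 1, 12), (2, 2, 18),
]

# Valid: sum across the dealer dimension within each day
total_by_day = defaultdict(int)
for time_key, dealer_key, inventory in inventory_facts:
    total_by_day[time_key] += inventory

# Across time, average the daily totals rather than summing them
average_inventory = sum(total_by_day.values()) / len(total_by_day)

# Summing across time counts the same stock once per snapshot
naive_total = sum(inventory for _, _, inventory in inventory_facts)
```

Here `naive_total` is twice the stock that ever existed on a single day, which is the kind of error the slide warns about for ad-hoc reporting.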
Unit Amounts
Unit price, Unit cost, etc.
• Are numeric, but not measures
• Store the extended amounts which are additive
• Unit amounts may be useful as dimensions for "price point analysis"
• May store unit values to save space
Factless Fact Table
• A fact table with no measures in it
• Nothing to measure... except the convergence of dimensional attributes
• Sometimes store a "1" for convenience
• Examples: Attendance, Customer Assignments, Coverage
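A factless fact table can be pictured with the Attendance example. In the sketch below each row holds only dimension keys; the key names and sample rows are hypothetical. Analysis reduces to counting rows.

```python
from collections import Counter

# Hypothetical attendance facts: (date_key, student_key, class_key).
# There is no measure column; the row itself records that the event happened.
attendance_facts = [
    (1, 101, 7),
    (1, 102, 7),
    (2, 101, 7),
]

# "How many students attended each day?" is a row count per date_key
attendance_by_day = Counter(date_key for date_key, _, _ in attendance_facts)

# Storing a constant 1 as a convenience measure gives the same answer via SUM
with_dummy = [(d, s, c, 1) for d, s, c in attendance_facts]
total_attendance = sum(row[3] for row in with_dummy)
```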
Slowly Changing Dimension Example
Example: A woman gets married
Possible changes to customer dimension
– Last Name
– Marriage Status
– Address
– Household Income
Existing facts need to remain associated with her single profile
New facts need to be associated with her married profile
Slowly Changing Dimension Types
Three types of slowly changing dimensions
Type 1
– Updates existing record with modifications
– Does not maintain history
Type 2
– Adds new record
– Does maintain history
– Maintains old record
Type 3
– Keep old and new values in the existing row
– Requires a design change
Designing Loads to Handle SCD
Design and implementation guidelines
Gather SCD requirements when designing data mapping and loading
SCD needs to be defined and implemented at the dimensional attribute level
Each column in a dimension table needs to be identified as a Type 1 or a Type 2 SCD
If one Type 1 column changes, then all Type 1 columns will be updated
If one Type 2 column changes, then a new record will be inserted into the dimension table
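The column-level rules above can be sketched as a small load routine. Everything named here is an assumption for illustration: which columns are Type 1 or Type 2, the `is_current` flag (the slides use a Status column instead), and the way the next surrogate key is generated.

```python
# Hypothetical classification of customer dimension columns
TYPE1_COLUMNS = {"home_income"}             # overwrite in place
TYPE2_COLUMNS = {"name", "marital_status"}  # keep history


def apply_change(dimension, cust_id, new_values):
    """Apply one source-system change to the customer dimension rows."""
    current = next(
        row for row in dimension
        if row["cust_id"] == cust_id and row["is_current"]
    )
    changed = {col for col, val in new_values.items() if current[col] != val}

    if changed & TYPE2_COLUMNS:
        # Type 2: retire the current row and insert a new one with a new key
        current["is_current"] = False
        new_row = dict(current, **new_values)
        new_row["cust_key"] = max(row["cust_key"] for row in dimension) + 1
        new_row["is_current"] = True
        dimension.append(new_row)
    elif changed & TYPE1_COLUMNS:
        # Type 1: update all Type 1 columns on the existing row, no history
        for col in TYPE1_COLUMNS:
            if col in new_values:
                current[col] = new_values[col]


customer_dim = [{
    "cust_key": 1, "cust_id": 123, "name": "Sue Jones",
    "marital_status": "S", "home_income": "$30K", "is_current": True,
}]

# Type 1 change only: row is overwritten
apply_change(customer_dim, 123, {"home_income": "$35K"})
# A Type 2 column changes: a second row is added
apply_change(customer_dim, 123,
             {"name": "Sue Smith", "marital_status": "M", "home_income": "$60K"})
```

New facts would then be loaded against the row whose `is_current` flag is set, while existing facts keep pointing at the retired row.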
Type 1 Example
Before the change (OLTP and Star Schema)

Customer OLTP
CustID  Name       Marital Status  Home Income
123     Sue Jones  S               $30K

Customer Dim
CustKey  CustID  Name       Marital Status  Home Income  Status
1        123     Sue Jones  S               $30K         0

Day Dim
DayKey  BusinessDate
1       1/31/01

Sales Facts
CustKey  DayKey  Sales
1        1       $40

Sue Gets Married 2/1/01 (OLTP and Star Schema)

Customer OLTP
CustID  Name       Marital Status  Home Income
123     Sue Smith  M               $60K

Customer Dim
CustKey  CustID  Name       Marital Status  Home Income  Status
1        123     Sue Smith  M               $60K         0

Day Dim
DayKey  BusinessDate
1       1/31/01
2       2/01/01

Sales Facts
CustKey  DayKey  Sales
1        1       $40
1        2       $50
Type 2 Example
Before the change (OLTP and Star Schema)

Customer OLTP
CustID  Name       Marital Status  Home Income
123     Sue Jones  S               $30K

Customer Dim
CustKey  CustID  Name       Marital Status  Home Income  Status
1        123     Sue Jones  S               $30K         0

Day Dim
DayKey  BusinessDate
1       1/31/01

Sales Facts
CustKey  DayKey  Sales
1        1       $40

Sue Gets Married 2/1/01 (OLTP and Star Schema)

Customer OLTP
CustID  Name       Marital Status  Home Income
123     Sue Smith  M               $60K

Customer Dim
CustKey  CustID  Name       Marital Status  Home Income  Status
1        123     Sue Jones  S               $30K         1
2        123     Sue Smith  M               $60K         0

Day Dim
DayKey  BusinessDate
1       1/31/01
2       2/01/01

Sales Facts
CustKey  DayKey  Sales
1        1       $40
2        2       $50
Aggregate Types
Separate Tables
• Separate fact table for every aggregate
• Separate dimension table for every aggregate dimension
• Same number of fact records as level field tables
Advantage
• Removes possibility of double counting
• Schema clarity
Separate Tables
One Way Aggregate
• Mthly Sales Facts Agg: month_key, product_key, market_key, Quantity, Amount
• Sales Facts: time_key, product_key, market_key, Quantity, Amount
• Product: product_key, Category, Brand, Product, Diet Indicator
• Month: month_key, Year, Fiscal Period, Month
• Market: market_key, Region, District, State, City
• Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
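A sketch of how the monthly aggregate could be derived from the base fact table. The mapping from `time_key` to `month_key` stands in for the Time and Month dimensions, and all key values and amounts are hypothetical. Only the time dimension is rolled up, which is what makes this a one-way aggregate.

```python
from collections import defaultdict

# Hypothetical lookup from the Time dimension: time_key -> month_key
time_to_month = {1: 200101, 2: 200101, 3: 200102}

# Base Sales Facts: (time_key, product_key, market_key, quantity, amount)
sales_facts = [
    (1, 100, 5, 2, 20),
    (2, 100, 5, 3, 30),
    (3, 100, 5, 1, 10),
    (1, 200, 5, 4, 80),
]

# Mthly Sales Facts Agg: keyed by (month_key, product_key, market_key)
monthly_agg = defaultdict(lambda: [0, 0])
for time_key, product_key, market_key, quantity, amount in sales_facts:
    key = (time_to_month[time_key], product_key, market_key)
    monthly_agg[key][0] += quantity
    monthly_agg[key][1] += amount

# The aggregate must reconcile with the base table
base_amount = sum(row[4] for row in sales_facts)
agg_amount = sum(totals[1] for totals in monthly_agg.values())
```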
CONFORMED DIMENSIONS
Definition
Dimensions are conformed when they are the same
-or-
when one dimension is a strict rollup of another
Same dimensions must:
1. ... have exactly the same set of primary keys
and
2. ... have the same number of records
Rolled up dimension
When one dimension is a strict rollup of another
Which means:
Two conformed dimensions can be combined into a single logical dimension by creating a union of the attributes
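The two cases can be sketched in a few lines. The helper names, the day and month rows, and the reading of "strict rollup" as "the distinct values of a subset of the base dimension's attributes" are assumptions made for illustration.

```python
# Hypothetical day-level dimension
day_dim = [
    {"time_key": 1, "date": "2001-01-31", "month": "2001-01", "year": 2001},
    {"time_key": 2, "date": "2001-02-01", "month": "2001-02", "year": 2001},
    {"time_key": 3, "date": "2001-02-02", "month": "2001-02", "year": 2001},
]


def same_dimension(dim_a, dim_b, key):
    """Same dimensions: same set of primary keys and same number of records."""
    keys_a = {row[key] for row in dim_a}
    keys_b = {row[key] for row in dim_b}
    return keys_a == keys_b and len(dim_a) == len(dim_b)


def roll_up(dimension, kept_attributes):
    """Rolled-up dimension: distinct values of the kept attributes,
    in first-seen order, taken directly from the base dimension."""
    rolled = []
    for row in dimension:
        values = tuple(row[attr] for attr in kept_attributes)
        if values not in rolled:
            rolled.append(values)
    return rolled


# A month dimension derived this way shares attribute values with day_dim
month_dim = roll_up(day_dim, ("month", "year"))
```

Because `month_dim` is built only from values already present in `day_dim`, month-level and day-level facts can be compared on the shared `month` and `year` attributes.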
Advantages
• Enables an incremental development approach
• Easier and cheaper to maintain
• Drastically reduces extraction and loading complexity
• Answers business questions that cross data marts
• Supports both centralized and distributed architectures
Erwin
AllFusion ERwin Data Modeler, commonly known as ERwin, is a powerful and leading data modeling tool from Computer Associates.
Has many powerful features that you can use to design entity relationship data models and dimensional models
Currently used version: 4.1.4
CA has recently released version ERwin Data Modeler r7
ERWIN BASIC FEATURES
Creating a Model
• Templates - To save time, you can also start working from a template that you or others in your workgroup have created. When you create a model from a template, all the objects and display settings in the template are automatically applied to the new model.
• Subject Areas - For each new model, ERwin also automatically creates a subject area (Main Subject Area). You can create additional subject areas.
• Stored Displays - Represent a different view of a subject area without the need to change settings repeatedly.
• Model Types - Logical, Physical, Logical/Physical or Logical/Dimensional
• Modeling Preferences - You can customize your working environment using ERwin's many display options and model preferences. You can also choose to create your model using IDEF1X or IE notation.
ERWIN FILE FORMATS
ER1 - Standard ERwin file format. ERwin version 3.5.2 and later are supported.
XML - ERwin metamodel saved as an Extensible Markup Language file. When you open an ERwin model saved in XML format, ERwin reads the data structure specified in the XML file and automatically reverse engineers the database and creates a matching data model diagram.
ERS, SQL - DDL (Data Definition Language) schema script text file. When you open a text file with this extension, ERwin reads the data structure specified in the text file and automatically reverse engineers the database and creates a matching data model.
DBF - A file name with this extension is a database file in dBASE format. When you open a DBF file, ERwin automatically reverse engineers the database and creates a matching data model.
MDB - A file name with this extension is a database file in Microsoft Access format. When you open an *.mdb file, ERwin automatically reverse engineers the database and creates a matching data model.
ERWIN WORKPLACE
Model Explorer & Toolbars
MODEL EXPLORER
LOGICAL AND PHYSICAL MODELS
NOTATIONS
DIMENSIONAL MODEL NOTATION
RELATIONSHIPS
DOMAINS
RELATIONSHIP
RELATIONSHIP
ROLENAMES
DISPLAY LEVEL
TRANSFORMS
NAMING STANDARDS
NAMING STANDARDS
FORWARD ENGINEERING
REPORTS
Are the expected benefits being realized?
There is no magic solution!
Are the expected benefits being realized?
The data model is required for good data management but it is only one of the elements.
Today's systems are tomorrow's legacy systems!
Barriers to good data management
Barriers to good data management
Data problems
– lack of resources, data hoarding, lack of data knowledge
System users – not committed, not convinced, lack of time
Legacy systems and data stores
Different business interests
Cost
Barriers to good data management
CASE STUDY - 1
PURPOSE:-
The aim of the case study is to introduce you to the concepts and principles involved in dimensional modeling design and development.
You are expected to produce a small dimensional model based on the scenario given in the following slides.
CASE STUDY - 1
PROBLEM STATEMENT
Telecom Sales Assignment (Star Schema) :-
A telecom company wants to develop a data warehouse system to computerize its sales management. Here are the details:-
The company is tracking the sales of its products (made in different manufacturing plants) to different customers.
The company is basically comprised of two broad operations:
– Manufacturing products in its manufacturing plants
– Sales of these products by its sales outlets to customers
The customers of the company are either big corporate companies or retailers who buy directly over the counter.
Each customer purchases one or more products through an order.
There are two types of seller outlets:
– Corporate sales office
– Retail stores
The products can be bought in the following two ways:
– In the case of retail (non-corporate) customers, the products are purchased over the counter from retail outlets.
– In the case of corporate customers, orders can be placed over the phone and goods are delivered directly from the plant to the particular corporate office.
CASE STUDY - 1
Business Questions to be answered :-
1. What are the total cost and revenue for each model sold today, summarized by outlet, outlet type, region?
2. What are the total cost and revenue for each model sold today, summarized by manufacturing plant and region?
3. For each month how much was the ordered revenue by customer region? How much was delivered?
4. What are the top five models sold last month by total revenue? By quantity sold? By total cost?
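One possible starting point for question 1 is sketched below in Python with the standard-library sqlite3 module. It is only an illustration, not the expected solution: the table names (dim_model, dim_outlet, dim_date, fact_sales), the column names, the grain (one fact row per model, outlet and day) and the sample rows are all assumptions made for this sketch.

```python
import sqlite3

# Hypothetical star schema for the telecom sales scenario.
# Every table and column name here is an assumption for illustration.
SCHEMA = """
CREATE TABLE dim_model  (model_key INTEGER PRIMARY KEY, model_name TEXT);
CREATE TABLE dim_outlet (outlet_key INTEGER PRIMARY KEY, outlet_name TEXT,
                         outlet_type TEXT, region TEXT);
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE fact_sales (
    date_key   INTEGER REFERENCES dim_date(date_key),
    model_key  INTEGER REFERENCES dim_model(model_key),
    outlet_key INTEGER REFERENCES dim_outlet(outlet_key),
    quantity   INTEGER,
    cost       REAL,
    revenue    REAL
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_model VALUES (?, ?)",
                     [(1, "M100"), (2, "M200")])
    conn.executemany("INSERT INTO dim_outlet VALUES (?, ?, ?, ?)",
                     [(1, "City Store", "Retail", "North"),
                      (2, "HQ Sales Office", "Corporate", "South")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20090601, "2009-06-01", "2009-06")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
                     [(20090601, 1, 1, 2, 80.0, 120.0),
                      (20090601, 1, 1, 1, 40.0, 60.0),
                      (20090601, 2, 2, 5, 500.0, 650.0)])
    return conn


def totals_by_outlet(conn, full_date):
    """Total cost and revenue per model for one day, by outlet, type and region."""
    query = """
        SELECT m.model_name, o.outlet_name, o.outlet_type, o.region,
               SUM(f.cost), SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_date   d ON d.date_key   = f.date_key
        JOIN dim_model  m ON m.model_key  = f.model_key
        JOIN dim_outlet o ON o.outlet_key = f.outlet_key
        WHERE d.full_date = ?
        GROUP BY m.model_name, o.outlet_name, o.outlet_type, o.region
        ORDER BY m.model_name, o.outlet_name
    """
    return conn.execute(query, (full_date,)).fetchall()
```

Calling totals_by_outlet(build_demo_db(), "2009-06-01") returns one row per model and outlet. Questions 2 and 3 would need further dimensions (manufacturing plant, customer) and an ordered-versus-delivered measure, which this sketch leaves out.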
CASE STUDY - 2
PURPOSE:-
The aim of the case study is to introduce you to the concepts and principles involved in dimensional modeling design and development.
You are expected to produce a small dimensional model based on the scenario given in the following slides.
CASE STUDY - 2
PROBLEM STATEMENT
Company Payroll Assignment (Star Schema) :-
A software company wants to develop a data warehouse system to computerize its payroll management. Here are the details:-
The company has 10000 employees on payroll, out of which 9000 are permanent employees and 1000 are contract employees.
The company has 20 divisions.
The company offices and development centers are in 50 locations (offshore and onsite both included).
The payroll cycle is monthly and payment is made on the first of every month.
The paychecks are made in local currency (depending upon the assignment of the employee).
The salary of an employee depends upon the employee's grade. For every grade there is a salary band with a lower and an upper limit.
CASE STUDY - 2
Business Questions to be answered :-
1. What is the total payroll cost for each division for each pay cycle?
2. What is the payroll cost employee grade wise as a percentage of total payroll cost per cycle?
3. What is location wise payroll cost every month?
4. Which are the top 5 divisions that have incurred maximum payroll cost?
5. What is the ratio of supporting divisions payroll cost to the total payroll cost?
6. What is the payroll cost of temporary employees as a ratio of total payroll cost?
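Questions 1 and 6 could be approached along the following lines, again using Python's sqlite3 module. This is a sketch under stated assumptions rather than a model answer: the table and column names are invented, "temporary" employees are assumed to be the contract employees from the problem statement, and amount is assumed to be already converted to a single reporting currency (the scenario pays in local currency, so a real design would need a currency dimension or a conversion step).

```python
import sqlite3


def build_payroll_db():
    """In-memory payroll star schema with made-up rows (all names hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_division (division_key INTEGER PRIMARY KEY,
                                   division_name TEXT);
        CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY,
                                   grade TEXT, employee_type TEXT);
        -- Grain: one row per employee per monthly pay cycle.
        CREATE TABLE fact_payroll (pay_cycle TEXT, employee_key INTEGER,
                                   division_key INTEGER, amount REAL);
    """)
    conn.executemany("INSERT INTO dim_division VALUES (?, ?)",
                     [(1, "Delivery"), (2, "Support")])
    conn.executemany("INSERT INTO dim_employee VALUES (?, ?, ?)",
                     [(1, "G1", "Permanent"),
                      (2, "G2", "Permanent"),
                      (3, "G1", "Contract")])
    conn.executemany("INSERT INTO fact_payroll VALUES (?, ?, ?, ?)",
                     [("2009-06", 1, 1, 3000.0),
                      ("2009-06", 2, 2, 5000.0),
                      ("2009-06", 3, 1, 2000.0)])
    return conn


def cost_by_division(conn):
    """Question 1: total payroll cost per division per pay cycle."""
    return conn.execute("""
        SELECT f.pay_cycle, d.division_name, SUM(f.amount)
        FROM fact_payroll f
        JOIN dim_division d ON d.division_key = f.division_key
        GROUP BY f.pay_cycle, d.division_name
        ORDER BY f.pay_cycle, d.division_name
    """).fetchall()


def contract_ratio(conn):
    """Question 6: contract payroll cost as a ratio of total cost per cycle."""
    return conn.execute("""
        SELECT f.pay_cycle,
               SUM(CASE WHEN e.employee_type = 'Contract'
                        THEN f.amount ELSE 0.0 END) / SUM(f.amount)
        FROM fact_payroll f
        JOIN dim_employee e ON e.employee_key = f.employee_key
        GROUP BY f.pay_cycle
        ORDER BY f.pay_cycle
    """).fetchall()
```

The same pattern (a conditional sum divided by the total) would also cover question 2 for grades and question 5 for supporting divisions, provided the division dimension carries a flag that marks supporting divisions.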
CASE STUDY - 3
PROBLEM STATEMENT
Automobile Finance Assignment (Star Schema) :-
An automobile company wants to develop a data warehouse system to computerize its finance management. Here are the details:-
The company has 1000 dealers (i.e. customers).
The company has 10 profit centers.
The revenue is accrued in local currency.
The company has 5 product groups. Each product group has several models.
The region for the sales person and customer is the same as that of the profit center.
CASE STUDY - 3
Business Questions to be answered :-
1. What is the total revenue for each profit center for each month?
2. How is the revenue growth for each profit center on a year-on-year basis?
3. Which are the top 10 customers by revenue?
4. Which are the top 10 products by revenue?
5. Which regions are not doing well revenue wise?
6. Who are the 5 best sales representatives by revenue accruals for this year?
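The year-on-year calculation behind question 2 can be sketched in plain Python. The row layout (profit center, "YYYY-MM" month, revenue) and the function names are assumptions for illustration, and revenue is assumed to be already expressed in one currency.

```python
from collections import defaultdict


def yearly_revenue(fact_rows):
    """Sum revenue per (profit_center, year).

    fact_rows: iterable of (profit_center, "YYYY-MM", revenue) tuples,
    a hypothetical flattened view of a revenue fact table.
    """
    totals = defaultdict(float)
    for profit_center, month, revenue in fact_rows:
        totals[(profit_center, int(month[:4]))] += revenue
    return dict(totals)


def yoy_growth(totals, profit_center, year):
    """Return (this year - last year) / last year.

    Returns None when there is no (or zero) prior-year revenue,
    because the growth rate is undefined in that case.
    """
    previous = totals.get((profit_center, year - 1))
    current = totals.get((profit_center, year), 0.0)
    if not previous:
        return None
    return (current - previous) / previous
```

For example, a profit center with revenue of 200 in 2008 and 250 in 2009 has a year-on-year growth of 0.25 for 2009.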
CASE STUDY - 4
PURPOSE:-
The aim of the case study is to introduce you to the concepts and principles involved in dimensional modeling design and development.
You are expected to produce a small dimensional model based on the scenario given in the following slides.
CASE STUDY - 4
PROBLEM STATEMENT
Automobile Inventory Assignment (Star Schema) :-
An automobile company (say Tata Motors) wants to develop a data warehouse system to computerize its inventory management. Here are the details:-
The company has 3 manufacturing plant units (Pune, Lucknow and JSR).
The company has 5 cost centers, categorized by product group (HCV, MCV, LCV, Tata Indica and Tata Safari).
Each plant has several store locations.
The company has 5 product groups. Each product group has several models.
CASE STUDY - 4
Business Questions to be answered :-
1. What is the total inventory quantity and amount for each plant location at opening of each month?
2. How much is the inventory cost for each cost center for each quarter end?
3. Which are the top 10 products that have maximum inventory cost at opening of each month?
4. What is the total inventory quantity and amount for each store location at opening of each month?
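Question 3 can be sketched in plain Python over a periodic snapshot, that is, one row per product, store and month holding the opening balance. The row layout, the product names and the function name are assumptions for illustration. Opening balances can be summed across plants and stores within a month, but not across months, which is why the sketch groups by month first.

```python
from collections import defaultdict


def top_products_by_opening_cost(snapshot_rows, n=10):
    """Rank products by total opening inventory amount within each month.

    snapshot_rows: iterable of
        (month, plant, store, product, quantity, amount)
    tuples, a hypothetical monthly opening-balance snapshot.
    Returns {month: [(product, total_amount), ...]} with at most n
    products per month, highest amount first (ties broken by name).
    """
    per_month = defaultdict(lambda: defaultdict(float))
    for month, _plant, _store, product, _quantity, amount in snapshot_rows:
        per_month[month][product] += amount
    return {
        month: sorted(products.items(),
                      key=lambda item: (-item[1], item[0]))[:n]
        for month, products in per_month.items()
    }
```

Grouping by plant or store instead of product in the same way gives the totals asked for in questions 1 and 4.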
Q & A
Thank you