DWH Simplified
DWH principles
![Page 1: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/1.jpg)
![Page 2: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/2.jpg)
Overview of the Components
• Source data component
  • Production data
  • Internal data
  • Archive data
  • External data
• Data staging component
  • Extraction
  • Transformation
    • Cleaning
    • Standardization
  • Loading
• Data storage component
• Information delivery component
• Metadata component
• Management and control component
![Page 3: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/3.jpg)
Architectural Framework
![Page 4: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/4.jpg)
Data Acquisition
You are the data analyst on the project team building a DW for an insurance company. List the possible data sources from which you will bring data into the DW.
• Production data: data from various operational systems
• External data: for finding trends and making comparisons against other organizations
• Internal data: private, confidential data important to the organization
• Archived data: for getting historical information
![Page 5: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/5.jpg)
Architectural Framework
![Page 6: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/6.jpg)
Data Staging
Performs ETL (extraction, transformation, loading)
• Extraction
  • Select data sources, determine filters
  • Automatic replication
  • Create intermediary files
• Transformation
  • Clean, merge, de-duplicate data
  • Convert data types
  • Calculate derived data
  • Resolve synonyms and homonyms
• Loading
  • Initial loading
  • Incremental loading
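The three staging steps can be sketched in a few lines of Python. This is only an illustrative sketch, not a real ETL tool: the source systems (`policy_system`, `crm_system`), the field names, and the `SYNONYMS` map are all hypothetical, and the "warehouse" is a plain dictionary keyed by customer.

```python
from datetime import date

# Hypothetical synonym map: two source systems name the same key differently.
SYNONYMS = {"cust_no": "customer_id", "client_id": "customer_id"}

def extract(rows, since=None):
    """Select source rows; with `since`, keep only rows updated on or after it."""
    return [r for r in rows if since is None or r["updated"] >= since]

def transform(rows):
    """Resolve synonyms, convert data types, calculate derived data, de-duplicate."""
    by_customer = {}
    for row in rows:
        rec = {SYNONYMS.get(k, k): v for k, v in row.items()}  # resolve synonyms
        rec["customer_id"] = int(rec["customer_id"])           # convert types
        rec["premium"] = float(rec["premium"])
        rec["annual_premium"] = rec["premium"] * 12            # derived data
        by_customer[rec["customer_id"]] = rec                  # later row wins
    return list(by_customer.values())

def load(warehouse, rows):
    """Insert or overwrite by key; serves both initial and incremental loads."""
    for rec in rows:
        warehouse[rec["customer_id"]] = rec
    return warehouse

# Hypothetical operational sources for the insurance example.
policy_system = [
    {"cust_no": "101", "premium": "50.0", "updated": date(2024, 1, 5)},
    {"cust_no": "102", "premium": "80.0", "updated": date(2024, 1, 6)},
]
crm_system = [
    {"client_id": "101", "premium": "55.0", "updated": date(2024, 2, 1)},
]

# Initial load: everything from both sources.
warehouse = load({}, transform(extract(policy_system + crm_system)))

# Incremental load: only rows changed since the cutoff date.
policy_system.append({"cust_no": "103", "premium": "20", "updated": date(2024, 3, 1)})
changed = extract(policy_system + crm_system, since=date(2024, 3, 1))
warehouse = load(warehouse, transform(changed))
```

The same `load` function covers both loading modes; what differs is how much `extract` hands over, which is the filter decision listed under Extraction.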
![Page 7: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/7.jpg)
Why is a separate data staging area required?
• Data is spread across various operational databases
• Data in the warehouse should be subject-oriented
• Data staging is therefore mandatory
![Page 8: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/8.jpg)
Architectural Framework
![Page 9: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/9.jpg)
Characteristics of data storage area
• Separate repository
• Data content
  • Read only
  • Integrated
  • High volumes
  • Grouped by business subjects
• Metadata driven
• Data from the DW is aggregated in multidimensional databases (MDDBs)
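As a rough illustration of aggregating warehouse data along dimensions, the sketch below sums a measure for each combination of dimension values. It is a toy stand-in for what an MDDB does, not how any particular product works; the `sales` rows, the dimension names, and `build_cube` are all made up for the example.

```python
from collections import defaultdict

def build_cube(fact_rows, dimensions, measure):
    """Sum `measure` for every combination of the chosen dimension values."""
    cube = defaultdict(float)
    for row in fact_rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)

# Hypothetical detail rows held in the warehouse.
sales = [
    {"region": "North", "product": "Auto", "quarter": "Q1", "amount": 100.0},
    {"region": "North", "product": "Auto", "quarter": "Q2", "amount": 150.0},
    {"region": "South", "product": "Home", "quarter": "Q1", "amount": 200.0},
    {"region": "North", "product": "Home", "quarter": "Q1", "amount": 50.0},
]

# Aggregate by two dimensions, then roll up to one.
by_region_product = build_cube(sales, ("region", "product"), "amount")
by_region = build_cube(sales, ("region",), "amount")
```

Dropping a dimension from the key (here, going from region and product to region alone) is the roll-up operation; the detail rows in the warehouse stay unchanged.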
![Page 10: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/10.jpg)
Architectural Framework
![Page 11: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/11.jpg)
Information delivery component
Depends on the user
• Novice user: prefabricated reports, preset queries
• Casual user: needs information once in a while
• Business analyst: complex analysis
• Power user: picks up interesting data
![Page 12: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/12.jpg)
Information delivery component
![Page 13: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/13.jpg)
Architectural Framework
![Page 14: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/14.jpg)
Metadata component
Data about data in the data warehouse
Metadata can be of 3 types
• Operational metadata: contains information about operational data sources
• Extraction and transformation metadata: details pertaining to extraction frequencies, extraction methods, and business rules for data extraction
• End-user metadata: navigational map of the DW
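One way to picture the three metadata types is as three kinds of records. The sketch below is an assumption-laden illustration: the class names, fields, and sample values (`policy_admin`, `fact_policy`, and so on) are hypothetical and do not come from any metadata standard or tool.

```python
from dataclasses import dataclass, field

@dataclass
class OperationalMetadata:
    """Describes an operational data source feeding the warehouse."""
    source_system: str
    source_table: str
    field_types: dict

@dataclass
class ExtractionTransformationMetadata:
    """Describes how and how often data is pulled from a source."""
    source_table: str
    extraction_frequency: str
    extraction_method: str
    business_rules: list = field(default_factory=list)

@dataclass
class EndUserMetadata:
    """Maps a business term to where it lives in the warehouse."""
    business_term: str
    warehouse_table: str
    description: str

# Hypothetical entries for the insurance example.
operational = OperationalMetadata(
    "policy_admin", "POLICY", {"cust_no": "CHAR(10)", "premium": "DECIMAL(9,2)"}
)
etl = ExtractionTransformationMetadata(
    "POLICY", "daily", "incremental", ["skip cancelled policies"]
)
end_user = [
    EndUserMetadata("Monthly premium", "fact_policy", "Premium billed per policy per month"),
    EndUserMetadata("Customer", "dim_customer", "One row per insured customer"),
]

def find_tables(entries, term):
    """Return warehouse tables whose business term contains `term` (case-insensitive)."""
    return [e.warehouse_table for e in entries if term.lower() in e.business_term.lower()]

tables = find_tables(end_user, "premium")
```

The lookup at the end is the "navigational map" idea in miniature: the user searches in business terms and is pointed to the warehouse table.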
![Page 15: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/15.jpg)
Why is metadata especially important in a data warehouse?
• It acts as the glue that connects all parts of the data warehouse.
• It provides information about the contents and structures to the developers.
• It opens the door to the end-users and makes the contents recognizable in their own terms.
![Page 16: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/16.jpg)
![Page 17: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/17.jpg)
Management and Control
• Sits on top of all components
• Coordinates the services and activities within the DW
• Controls the data transformation and transfer into the DW storage
![Page 18: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/18.jpg)
Summing up
• Data warehouse building blocks or components are: source data, data staging, data storage, information delivery, metadata, and management and control.
• In a data warehouse, metadata is especially significant because it acts as the glue holding all the components together and serves as a roadmap for the end-users.
![Page 19: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/19.jpg)
Doubts?
![Page 20: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/20.jpg)
![Page 21: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/21.jpg)
Case study 1
As a senior analyst on the DW project of a large retail chain, you are responsible for improving data visualization of the output results. Make a list of recommendations.
![Page 22: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/22.jpg)
![Page 23: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/23.jpg)
Parallel processing
Performance of a DW may be improved using parallel processing with appropriate hardware and software options.
Parallel processing options
• Symmetric multiprocessing
• Massively parallel processing
• Clusters
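All three options rest on the same software idea: split the work into partitions, process them at the same time, and merge the partial results. The sketch below shows only that partition-and-merge pattern; worker threads stand in for processors or nodes, so it says nothing about real SMP, MPP, or cluster hardware. The function names and the sample rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split rows into n roughly equal partitions (round-robin)."""
    return [rows[i::n] for i in range(n)]

def partial_sum(rows):
    """Work done independently on one partition."""
    return sum(r["amount"] for r in rows)

def parallel_total(rows, workers=4):
    """Aggregate each partition in its own worker, then merge the partial sums."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, parts))

# Hypothetical fact rows with amounts 1.0 through 100.0.
rows = [{"amount": float(i)} for i in range(1, 101)]
total = parallel_total(rows)
```

The merge step works here because a sum of partial sums equals the overall sum; aggregates such as averages need the partial results to carry both a sum and a count.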
![Page 24: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/24.jpg)
DW with ERP packages
![Page 25: DWH Simplified](https://reader031.vdocuments.mx/reader031/viewer/2022020118/577cc3fe1a28aba71197d260/html5/thumbnails/25.jpg)