© 2011, GAVS Technologies IBM Infosphere Suite Overview

Upload: kombs

Post on 06-Mar-2015

Page 1: IBM Info Sphere Suite Overview

© 2011, GAVS Technologies

IBM Infosphere Suite Overview

Page 2: IBM Info Sphere Suite Overview

Data Warehousing

► A data warehouse is a DBMS platform that stores an organization's historical data.

► The concepts and methods used to build a data warehouse are collectively known as data warehousing.

► Why use a data warehouse when we already have transactional databases?

– Transactional databases are designed for fast querying and editing of information, whereas warehouses are designed for analysis and reporting.

– Transactional databases typically hold only 3 to 6 months of data because of their design (E-R modeling), whereas warehouses can hold years of data (dimensional modeling).

– Data warehouses are also known as Decision Support Systems (DSS), since they help management make informed policy and product decisions.

► Subsets of a data warehouse are known as data marts.

► The tools used to build a warehouse fall into two categories: data modeling tools and ETL tools.

Page 3: IBM Info Sphere Suite Overview

ETL

► ETL stands for Extract, Transform, and Load.

► Extract is the process of getting data from various source systems and files.

► Transform is the stage where the data is checked for consistency, cleansed, and transformed as per the business requirements.

► Load is the process of updating or inserting the transformed data into the data warehouse.

► Many ETL tools are available on the market, such as Informatica and Ab Initio.

► The ETL tool selected for the IHS Newton project is IBM InfoSphere Suite 8.5.
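
The three ETL steps can be illustrated with a minimal sketch in plain Python. This is not DataStage code: the sample CSV, the `sales` table, and the function names are all assumptions made up for the illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or a source system.
SOURCE_CSV = """customer_id,name,country,amount
1, Alice ,us,100.50
2,Bob,US,not_a_number
3,Carol,in,75
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: check consistency, cleanse, and convert each row."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail the consistency check
        clean.append((
            int(row["customer_id"]),
            row["name"].strip(),             # cleanse stray whitespace
            row["country"].strip().upper(),  # standardize country codes
            amount,
        ))
    return clean

def load(conn, rows):
    """Load: insert new rows or update existing ones, keyed on customer_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM sales ORDER BY customer_id").fetchall()
print(loaded)
```

The row for Bob is dropped in the transform step because its amount is not numeric; a real job would more likely route such rows to a reject file than discard them silently.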

Page 4: IBM Info Sphere Suite Overview

Components

► The InfoSphere suite comprises the following software components:

– DataStage - ETL tool

– QualityStage - Standardizing and cleansing tool

– Information Analyzer - Analysis and understanding of the data structure

– Metadata Workbench - Centralized repository of metadata

– Business Glossary - Web-based tool to create, manage, and share an enterprise vocabulary and classification system

– FastTrack - ETL job creation assistance

– Blueprint Director - Project flow assistance

Page 5: IBM Info Sphere Suite Overview

Architecture

Page 6: IBM Info Sphere Suite Overview

DataStage

► DataStage provides a graphical framework for designing and running the jobs that transform data.

► DataStage delivers four core capabilities:

– Connectivity to a wide range of mainframe, legacy, and enterprise applications, databases, file formats, and external information sources.

– A prebuilt library of more than 300 functions, including data validation rules and very complex transformations.

– Maximum throughput using a parallel, high-performance processing architecture.

– Enterprise-class capabilities for development, deployment, maintenance, and high availability. It leverages metadata for analysis and maintenance, and it can operate in batch, in real time, or as a Web service.

Page 7: IBM Info Sphere Suite Overview

DataStage Cont...

► Data transformation and movement is the process by which source data is selected, converted, and mapped to the format required by target systems.

► The process manipulates data to bring it into compliance with business, domain, and integrity rules and with other data in the target environment.

► Data transformation can take some of the following forms:

– Aggregation

Consolidating or summarizing data values into a single value. Rolling up daily sales data to the weekly level is a common example of aggregation.

– Basic conversion

Ensuring that data types are correctly converted and mapped from source to target columns.
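
As a rough illustration of basic conversion and aggregation outside of DataStage, the sketch below converts string fields to typed values and then rolls daily sales up to ISO weeks. The sample records and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records as they might arrive from a source file:
# every value is still a string.
daily_sales = [
    {"day": "2011-03-07", "amount": "120.00"},
    {"day": "2011-03-08", "amount": "80.50"},
    {"day": "2011-03-14", "amount": "200.00"},
]

def convert(record):
    """Basic conversion: map source strings to the target column types."""
    return {
        "day": date.fromisoformat(record["day"]),
        "amount": float(record["amount"]),
    }

def aggregate_weekly(records):
    """Aggregation: sum daily amounts into one value per ISO (year, week)."""
    totals = defaultdict(float)
    for rec in records:
        iso = rec["day"].isocalendar()
        totals[(iso[0], iso[1])] += rec["amount"]
    return dict(totals)

weekly = aggregate_weekly(convert(r) for r in daily_sales)
print(weekly)
```

The first two records fall in the same ISO week, so they collapse into a single weekly total, while the third starts a new week.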

Page 8: IBM Info Sphere Suite Overview

DataStage Cont...

► Data Transformations Contd...

– Derivation

Transforming data from multiple sources by using a complex business rule or algorithm.

– Enrichment

Combining data from internal or external sources to provide additional meaning to the data.

– Normalizing

Reducing the amount of redundant and potentially duplicated data.

– Combining

The process of combining data from multiple sources via parallel Lookup, Join, or Merge operations.

– Pivoting

Converting records in an input stream to many records in the appropriate table in the data warehouse or data mart.

– Sorting

Grouping related records and sequencing data based on data or string values.
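
Three of these forms (combining, pivoting, and sorting) can be sketched in plain Python as follows. The order and customer data, the column names `q1` and `q2`, and the helper functions are assumptions for illustration only; in DataStage these operations are performed by dedicated stages rather than hand-written code.

```python
# Hypothetical inputs: an order stream and a customer reference table.
orders = [
    {"order_id": 2, "customer_id": 11, "q1": 5, "q2": 7},
    {"order_id": 1, "customer_id": 10, "q1": 3, "q2": 0},
]
customers = {10: "Acme", 11: "Globex"}

def lookup_join(rows, reference):
    """Combining: add the customer name found by looking up customer_id."""
    return [dict(row, customer=reference.get(row["customer_id"])) for row in rows]

def pivot(rows, columns):
    """Pivoting: turn one wide input record into one output record per column."""
    return [
        {
            "order_id": row["order_id"],
            "customer": row["customer"],
            "quarter": col,
            "qty": row[col],
        }
        for row in rows
        for col in columns
    ]

def sort_rows(rows):
    """Sorting: group records by order and sequence them by quarter."""
    return sorted(rows, key=lambda r: (r["order_id"], r["quarter"]))

result = sort_rows(pivot(lookup_join(orders, customers), ["q1", "q2"]))
for row in result:
    print(row)
```

Each of the two input orders yields two output records, one per quarter column, and the final sort brings the records for order 1 ahead of those for order 2.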

Page 9: IBM Info Sphere Suite Overview

DataStage – Stage Examples

Page 10: IBM Info Sphere Suite Overview

DataStage Sample Jobs

Page 11: IBM Info Sphere Suite Overview

DataStage Sample Jobs Contd...

Page 12: IBM Info Sphere Suite Overview

DataStage Sample Jobs Contd...

Page 13: IBM Info Sphere Suite Overview

QualityStage

► QualityStage is the data cleansing and standardizing tool of the InfoSphere suite.

► The main functions of QualityStage are:

– Investigation of source data to understand the nature, scope, and detail of data quality challenges.

– Standardization to ensure that data is formatted consistently and conforms to organization-wide specifications, including name and firm standards as well as address cleansing and verification.

– Matching of data to identify duplicate records within and across data sets.

– Survivorship to eliminate duplicate records and create the “best record view” of data.
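
The standardization, matching, and survivorship steps can be sketched in a few lines of Python. This is a simplified illustration, not how QualityStage works internally: the sample records are made up, matching here is an exact comparison on standardized name and phone (an assumption chosen to keep the example short), and the survivorship rule simply keeps the first non-empty value per field.

```python
from collections import defaultdict

# Hypothetical customer records containing one duplicate with different formatting.
records = [
    {"name": "  john SMITH ", "phone": "(555) 010-1234", "email": ""},
    {"name": "John Smith", "phone": "555-010-1234", "email": "john@example.com"},
    {"name": "Mary Jones", "phone": "555 010 9999", "email": "mary@example.com"},
]

def standardize(rec):
    """Standardization: one name format, digits-only phone numbers."""
    return {
        "name": " ".join(rec["name"].split()).title(),
        "phone": "".join(ch for ch in rec["phone"] if ch.isdigit()),
        "email": rec["email"].strip().lower(),
    }

def match(recs):
    """Matching: group records that share the same standardized name and phone."""
    groups = defaultdict(list)
    for rec in recs:
        groups[(rec["name"], rec["phone"])].append(rec)
    return list(groups.values())

def survive(group):
    """Survivorship: keep, per field, the first non-empty value in the group."""
    return {
        field: next((r[field] for r in group if r[field]), "")
        for field in group[0]
    }

best = [survive(g) for g in match([standardize(r) for r in records])]
for rec in best:
    print(rec)
```

The two John Smith records only become comparable after standardization, and the surviving record takes its email address from the second one because the first has none.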

Page 14: IBM Info Sphere Suite Overview

QualityStage Sample Jobs

Page 15: IBM Info Sphere Suite Overview

QualityStage Sample Jobs Contd...

Page 16: IBM Info Sphere Suite Overview

Information Analyzer

► Information Analyzer is used to understand the content, structure, and overall quality of the data at a given point in time.

► This analysis aids in understanding the inputs to the integration process, ranging from individual fields to high-level data entities.

► Information analysis also makes it possible to correct problems with structure or validity before they affect the project.
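
The kind of analysis described above is often called column profiling. The sketch below is a hand-rolled illustration in Python, not the Information Analyzer interface: the sample rows, the `profile` function, and the type labels are all assumptions.

```python
def infer_type(value):
    """Classify a raw string value as 'null', 'int', 'float', or 'string'."""
    if value is None or value.strip() == "":
        return "null"
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except ValueError:
            pass
    return "string"

def profile(rows):
    """Return per-column null count, distinct count, and inferred types."""
    report = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        types = [infer_type(v) for v in values]
        non_null = [v for v, t in zip(values, types) if t != "null"]
        report[column] = {
            "nulls": types.count("null"),
            "distinct": len(set(non_null)),
            "types": sorted(set(types) - {"null"}),
        }
    return report

# Hypothetical source rows, with every field read as a string.
rows = [
    {"id": "1", "age": "34", "city": "Chennai"},
    {"id": "2", "age": "", "city": "Chennai"},
    {"id": "3", "age": "unknown", "city": "Pune"},
]
report = profile(rows)
for column, stats in report.items():
    print(column, stats)
```

A column that reports more than one inferred type, such as `age` here, is a candidate validity problem to resolve before the data enters the integration process.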

Page 17: IBM Info Sphere Suite Overview

Information Analyzer Interface

Page 18: IBM Info Sphere Suite Overview

Metadata Workbench

► This tool provides a visual, Web-based exploration of metadata that is generated, used, and imported by the InfoSphere Information Server.

► InfoSphere Information Server components store design time, runtime, and glossary metadata in the metadata repository.

► Users can also import database and data file information into the metadata repository and create extended data sources and extension mappings that represent objects and processes that exist outside of InfoSphere Information Server.

► Metadata Workbench helps business and IT users explore and manage those metadata assets.

► Metadata Workbench reports on data flow, data lineage, and the impact of changes to data assets or physical assets.
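
Conceptually, data lineage and impact analysis are traversals of a graph of data flows. The following sketch shows the idea in Python under stated assumptions: the asset names and the `flows` mapping are hypothetical, and nothing here reflects how Metadata Workbench stores or queries its repository.

```python
from collections import deque

# Hypothetical data-flow edges: each asset maps to the assets that read from it.
flows = {
    "crm.customers": ["job_load_customers"],
    "job_load_customers": ["dw.dim_customer"],
    "dw.dim_customer": ["report_sales_by_customer"],
    "erp.orders": ["job_load_orders"],
    "job_load_orders": ["dw.fact_sales"],
    "dw.fact_sales": ["report_sales_by_customer"],
}

def downstream(graph, start):
    """Impact analysis: every asset reachable from `start`, breadth-first."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

def upstream(graph, target):
    """Data lineage: every asset that feeds `target`, found by reversing edges."""
    reverse = {}
    for src, dests in graph.items():
        for dest in dests:
            reverse.setdefault(dest, []).append(src)
    return downstream(reverse, target)

impact = downstream(flows, "crm.customers")
lineage = upstream(flows, "dw.fact_sales")
print("Impacted by a change to crm.customers:", impact)
print("Lineage of dw.fact_sales:", lineage)
```

Walking the edges forward answers "what is affected if this source changes?", while walking them backward answers "where did this table's data come from?".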

Page 19: IBM Info Sphere Suite Overview

Metadata Workbench Interface

Page 20: IBM Info Sphere Suite Overview

Business Glossary

► Business Glossary is an interactive, Web-based tool that enables users to create, manage, and share an enterprise vocabulary and classification system.

► A business glossary is designed to help users understand business language and the business meaning of information assets like databases, jobs, database tables and columns, and business intelligence reports.

► In addition to categories and terms, the business glossary also contains information about other assets such as database tables, jobs, and reports that are in the metadata repository.

Page 21: IBM Info Sphere Suite Overview

Page 22: IBM Info Sphere Suite Overview

FastTrack

► FastTrack automates multiple data integration tasks from analysis to code generation, while incorporating the business perspective and maintaining lineage and documented requirements.