Chapter 5: Using SAS® ETL Studio. Section 5.1: SAS ETL Studio Overview
3
What Is SAS ETL Studio?
SAS ETL Studio, a Java application, is a visual design tool that helps organizations quickly build, implement, and manage ETL processes from source to destination, regardless of the data sources or platforms.
Users can standardize metadata across the organization and perform in-depth transformations with minimal programming or manual work to meet enterprise data integration requirements and to support business and analytic intelligence.
4
What Is SAS ETL Studio?
SAS ETL Studio enables you to perform the following tasks:
- the Extraction of data from operational data stores
- the Transformation of this data
- the Loading of the extracted data into your data warehouse or data mart.
5
What Is SAS ETL Studio?
SAS ETL Studio is an application that enables you to manage ETL process flows by allowing
- specification of metadata for sources, such as tables in an operational system
- specification of metadata for targets – the tables and other data stores in a data warehouse
- creation of jobs that specify how data is extracted, transformed, and loaded from a source to a target.
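The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration of the general ETL idea, not code that SAS ETL Studio produces; the function names, the in-memory "tables", and the column names (order_id, quantity, unit_price, total) are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def extract(source: List[Row]) -> List[Row]:
    """Extract: copy rows out of the (hypothetical) operational data store."""
    return [dict(row) for row in source]

def transform(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """Transform: apply a row-level transformation to every extracted row."""
    return [fn(row) for row in rows]

def load(rows: List[Row], target: List[Row]) -> int:
    """Load: append the transformed rows to the target table; return the row count."""
    target.extend(rows)
    return len(rows)

def run_job(source: List[Row], fn: Callable[[Row], Row], target: List[Row]) -> int:
    """A 'job' ties one source, one transformation, and one target together."""
    return load(transform(extract(source), fn), target)

def add_total(row: Row) -> Row:
    """Hypothetical transformation: derive a total from quantity and unit price."""
    return {**row, "total": row["quantity"] * row["unit_price"]}

# Hypothetical operational table and an empty warehouse table.
orders = [
    {"order_id": 1, "quantity": 2, "unit_price": 9.5},
    {"order_id": 2, "quantity": 1, "unit_price": 20.0},
]
warehouse: List[Row] = []

loaded = run_job(orders, add_total, warehouse)
print(loaded, warehouse)
```

Keeping the three steps as separate functions mirrors the way a job is described by separate source, transformation, and target definitions.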
6
SAS ETL Studio: Change Management
In SAS ETL Studio, the change management facility enables multiple SAS ETL Studio users to work with the same metadata repository at the same time without overwriting each other’s changes.
7
SAS ETL Studio: Data Surveyor Wizards
Optional Data Surveyor wizards can be licensed that provide access to the metadata in enterprise applications, such as PeopleSoft, SAP R/3, Siebel, and Oracle Applications.
8
SAS ETL Studio: Metadata CWM Compliant
The metadata maintained by SAS ETL Studio is CWM (Common Warehouse Metamodel) compliant and portable to other CWM-compliant applications. Likewise, metadata from other CWM-compliant applications (for example, data modeling tools) can be imported easily into SAS ETL Studio.
9
SAS ETL Studio: Data Quality
SAS ETL Studio is fully integrated with the data quality software from DataFlux Corporation. Both products now use the same Quality Knowledge Base (QKB), which contains rules, routines, and schemes necessary to integrate data quality into the ETL process.
10
Extending SAS ETL Studio Functionality
The SAS ETL Studio functionality is extended by Java plug-ins packaged with the product.
Further extensions can be implemented by
- writing additional plug-ins (Java programming required)
- using the Transformation Generator Wizard (no Java programming required).
11
Server Connections and SAS ETL Studio
As a client, SAS ETL Studio must connect to a SAS Metadata Server to read or write metadata. It must connect to other servers to run SAS code, to connect to a third-party database management system, or to perform other tasks.
12
Interaction with SAS Application Servers
SAS ETL Studio can use different types of application servers:
- SAS Metadata Server: Required to read and write metadata in a SAS metadata repository.
- SAS Workspace Server: Required to execute SAS code and access data.
- SAS/CONNECT Server: Required to submit generated SAS code to machines that are remote to the default SAS application server.
14
SAS ETL Studio: The Interface
SAS ETL Studio is a Java client developed to control the ETL process. The interface has several “ease-of-use” features, including
- copy and paste in any text field
- multiple windows open at one time (including multiple process flow diagrams)
- a Windows look and feel
- wizard-driven interfaces.
15
Tools, Menus, and Online Help
SAS ETL Studio takes full advantage of toolbars and pull-down menus. The icons available on the toolbar depend on which window is active from within the interface.
Menus and Tools
16
The Shortcut Bar
One of the most significant features of SAS ETL Studio is the new process-driven functionality.
Processes are available via a Shortcut bar on the far left side of the main SAS ETL Studio window.
Shortcut Bar
17
The Shortcut Bar
The Shortcut bar is populated with icons for each task an ETL user would typically perform, including:
- Source Designer: defines metadata about the source(s) for a process.
- Metadata Importer: imports metadata from other applications.
- Metadata Exporter: exports metadata to be used by other applications.
- Process Designer: defines metadata about the ETL processes.
18
The Shortcut Bar
- Target Designer: defines metadata about the target table(s) to be created by the process.
- Options: provides numerous options for the SAS ETL Studio user to customize the look and feel of the application.
19
Tree View
The SAS ETL Studio Tree View enables you to
- view the metadata associated with the current metadata repository
- display different views or “trees” of the current repository.
Tree View
20
Tree View
There are several tabs available in the tree view area:
- Inventory Tree: lists the metadata objects in the default metadata repository (and any dependent repositories), organized by predetermined groupings.
21
Tree View
- Custom Tree: lists the metadata objects in the default metadata repository (and any dependent repositories), organized by user-defined groupings of objects.
22
Tree View
- Process Library Tree: lists the available data transformations to be used in the ETL process.
23
Process Library Tree
The Process Library tree displays a collection of transformation templates.
There are four collections (folders) of templates that are provided with SAS ETL Studio:
- Analysis
- Data Transforms
- Output
- Publish.
24
Process Designer View
The Process Designer window is the workspace for building ETL processes. The Process Designer view appears as a final step in the Process Designer wizard.
After the process is defined, the Process Designer view is populated with icons that represent the chosen processes.
The Process Designer window can be used to
- view SQL source code
- review the SAS log (from submitting jobs)
- view the resulting output from running a SAS job.
26
Overview Window
The Overview window shows you the complete process from the process view.
From within the Overview window, you can control which part of the process is displayed in the Process View window.
27
SAS ETL Studio Wizards
Shortcuts invoke wizards that help the user perform various tasks in SAS ETL Studio.
Some of these wizards are
- Source Designer
- Target Designer
- New Job.
28
Source Designer
The Source Designer is a wizard-driven interface that enables you to define the physical layout of existing tables using a data dictionary or metadata information from the source system.
The result of running the Source Designer successfully is a metadata registration that describes the data source.
29
Target Designer
The Target Designer is a wizard that allows metadata to be entered for a target.
In designing the target table, you can
- access any metadata about any source tables and columns registered in the metadata repository
- override any metadata that was imported from another source and add new columns to the target table
- create indexes on the target table being created.
30
Target Designer
The person designing the target table has full control over the type of table being built.
The types of targets that can be built include
- database types that are supported by the SAS/ACCESS products
- SAS data sets (including both data files and data views)
- SAS/SHARE data sets
- SPDE tables.
31
New Job Wizard
The New Job wizard enables you to define the metadata necessary to run an ETL process to load data into a target or targets.
32
Additional Wizards
Other wizards available to provide assistance with various tasks in SAS ETL Studio include
- Metadata Importer
- Metadata Exporter
- Cube Designer
- Transformation Generator wizard.
You can also install optional data surveyor wizards, which provide access to the metadata in enterprise applications, such as PeopleSoft, SAP R/3, Siebel, and Oracle.
33
Options Window
The Options window can be used to define standard settings for the SAS ETL Studio interface.
There are several tabs in the Options window: General Process Editor Metadata Tree SAS Server Data Quality.
34
Course Case Study Tasks
Recall the case study tasks diagram discussed earlier. Each of these tasks involves reading metadata, writing metadata, or both.
- Register Source Tables
- Define Data Libraries
- Create ETL Jobs
- Define Target Tables
- Create OLAP Cubes
- View and Analyze Data
- Create Stored Processes
- Create Reports
- Create Information Maps
- Use the Information Delivery Portal
Metadata
35
SAS ETL Studio Case Study Tasks
SAS ETL Studio will concentrate on the following four tasks:
- Register Source Tables
- Define Data Libraries
- Create ETL Jobs
- Define Target Tables
36
SAS ETL Studio Case Study
These tasks will be performed in sequence:
1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs
37
SAS ETL Studio Case Study – Setup Tasks
Step 1 of 4: Define Data Libraries (+)
- Build Custom Tree Groupings: Libraries, Jobs, Source Tables, Target Tables (Demo, Exercises)
- Define Additional Library Definitions: Target Tables Library, Source Tables Library (Demo, Exercises)
38
SAS ETL Studio Case Study – Define Sources
Step 2 of 4: Define Source Tables Metadata
The Source Designer defines metadata for the source tables.
Source tables: Orders, Order_Item, Product_List
Demo, Exercises
39
SAS ETL Studio Case Study – Define Targets
Step 3 of 4: Define Target Tables Metadata
The Target Designer defines metadata for the target tables.
Target tables: OrderFact, ProductDim
Demo*, Exercises
* Some derived columns for OrderFact are completed in the exercises.
40
SAS ETL Studio Case Study – Define Jobs
Step 4 of 4: Define and Run Jobs
The Process Designer defines metadata for jobs that contain the process flow diagrams necessary to load the target tables.
- Populate the OrderFact table
- Populate the ProductDim table
Demo, Exercises
41
Creating the OrderFact Table
The OrderFact table will be created from the Orders and Order_Item tables.
Target Table
Source Tables
42
Creating the OrderFact Table
The source tables, Orders and Order_Item, will be combined using the SQL Join transformation.
SQL Join
The SQL Join will be used to define computed columns.
43
Creating the OrderFact Table
The table that is the result of the SQL Join will then be loaded into the OrderFact table.
Loader
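The join-then-load flow for OrderFact can be illustrated outside of SAS with Python's built-in sqlite3 module. This is a rough sketch of the idea only: the table names come from the case study, but every column name (Order_ID, Customer_ID, Product_ID, Quantity, Unit_Price, Line_Total) and all sample values are assumptions, and the SQL here is not the code that SAS ETL Studio generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source and target table layouts with a few sample rows.
conn.executescript("""
    CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER);
    CREATE TABLE Order_Item (Order_ID INTEGER, Product_ID INTEGER,
                             Quantity INTEGER, Unit_Price REAL);
    CREATE TABLE OrderFact (Order_ID INTEGER, Customer_ID INTEGER,
                            Product_ID INTEGER, Quantity INTEGER, Line_Total REAL);
    INSERT INTO Orders VALUES (1, 100), (2, 200);
    INSERT INTO Order_Item VALUES (1, 10, 2, 5.0), (1, 11, 1, 3.5), (2, 10, 4, 5.0);
""")

# Join the two sources, define a computed column, and load the result
# into the target table in a single statement.
conn.execute("""
    INSERT INTO OrderFact
    SELECT o.Order_ID, o.Customer_ID, i.Product_ID, i.Quantity,
           i.Quantity * i.Unit_Price AS Line_Total
    FROM Orders AS o
    JOIN Order_Item AS i ON i.Order_ID = o.Order_ID
""")

fact_rows = conn.execute(
    "SELECT Order_ID, Product_ID, Line_Total FROM OrderFact "
    "ORDER BY Order_ID, Product_ID"
).fetchall()
print(fact_rows)
```

In the process flow diagram the join and the load are separate steps (SQL Join, then Loader); the sketch folds them into one INSERT ... SELECT for brevity.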
44
Creating the ProductDim Table
The ProductDim table will be created from the Product_List table.
Target Table
Source Table
45
Creating the ProductDim Table
The Extract transformation will be used so that a computed column can be defined.
SAS Extract
46
Creating the ProductDim Table
The results of the Extract transformation will then be loaded into the target table, ProductDim.
Loader
48
This demonstration shows how to define a logical grouping object and create a library definition to store in the new grouping.
Creating a Logical Grouping and Adding a Library Definition
49
This exercise creates logical grouping elements and defines two SAS libraries.
Exercises
50
Using the Source Designer
The Source Designer is a wizard that generates metadata for one or more selected tables, based on the physical structure of the table(s).
The Source Designer can be used to specify metadata for any existing table, not just tables used as data sources for ETL jobs.
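Deriving metadata from the physical structure of an existing table can be sketched as follows. The example uses SQLite's PRAGMA table_info as a stand-in for whatever the Source Designer reads from the source system; the register_table helper, the layout of the metadata record, and the Orders columns are all hypothetical.

```python
import sqlite3

def register_table(conn, table_name):
    """Build a metadata record for an existing table from its physical structure."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
    if not columns:
        raise ValueError(f"table {table_name!r} does not exist")
    return {
        "table": table_name,
        "columns": [
            {
                "name": name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in columns
        ],
    }

conn = sqlite3.connect(":memory:")
# Hypothetical physical layout for the Orders source table.
conn.execute(
    "CREATE TABLE Orders ("
    "Order_ID INTEGER PRIMARY KEY, Order_Date TEXT NOT NULL, Customer_ID INTEGER)"
)

orders_metadata = register_table(conn, "Orders")
print(orders_metadata)
```

The point of the sketch is that nothing is typed in by hand: the column names, types, and constraints in the metadata record all come from the table itself.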
53
This demonstration shows how to add a source table definition for the Orders table.
Add a Source Table Definition
54
These exercises add source table definitions for several source tables.
Exercises
55
Using the Target Designer
The Target Designer is a wizard that can create new metadata about a single table that might or might not already exist in physical storage.
It can also be used to create and edit metadata about an OLAP cube.
58
This demonstration shows how to create a target table definition for the OrderFact table.
Defining a Target Table
59
These exercises add target table definitions for several tables.
Exercises
60
Using the Process Designer
The Process Designer invokes the New Job wizard to create metadata about a job. That metadata is used to build a process flow diagram for the job.
A job is a metadata object that specifies processes that create output. SAS ETL Studio organizes sources, targets, and transformations into jobs that can be displayed in a process flow diagram.
SAS ETL Studio uses each job to generate and/or retrieve SAS code that reads sources and creates targets on a file system.
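The idea that a job is metadata from which code is generated can be illustrated with a small sketch. The job dictionary, the generate_code function, and the Product_Name column are hypothetical, and the output is plain SQL text chosen for readability; SAS ETL Studio itself generates SAS code, and its metadata model is far richer than this.

```python
def generate_code(job):
    """Render a job's metadata as a CREATE TABLE ... AS SELECT statement."""
    select_list = ",\n       ".join(
        f"{expr} AS {name}" for name, expr in job["columns"].items()
    )
    return (
        f"CREATE TABLE {job['target']} AS\n"
        f"SELECT {select_list}\n"
        f"FROM {job['source']};"
    )

# Hypothetical job metadata: one source, one target, and a column mapping
# in which Product_Name is a computed column.
job = {
    "name": "Load ProductDim",
    "source": "Product_List",
    "target": "ProductDim",
    "columns": {
        "Product_ID": "Product_ID",
        "Product_Name": "UPPER(Product_Name)",
    },
}

code = generate_code(job)
print(code)
```

Because the job is stored as metadata rather than as hand-written code, changing a column mapping and regenerating is enough to change what the job does.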
61
Using the Process Designer
The New Job wizard prompts for information that is used to build a template in the Process Designer.
63
This demonstration shows how to define a job for the OrderFact target table and enter information about the extraction and transformation of data.
Defining a Job
64
This exercise shows how to create a new job and enter information about the extraction and transformation of data.
Exercises
65
This demonstration shows how to specify the load process attributes and how to execute and verify a job.
Loading the Target Tables
66
This exercise shows how to specify the load process attributes and how to execute and verify a job.
Exercises
68
Advanced Features
This section introduces several advanced features to review on your own, including:
- Data Quality Plug-Ins
- Importing and Exporting Metadata
- Change Management.
69
Data Quality Plug-Ins
SAS ETL Studio contains two data quality transformation templates in the Process Library tree:
- Create Match Code: Used to create a job that creates match codes and cluster numbers for a specified source column, based on a set of criteria.
- Apply Lookup Standardization: Used to create a job that standardizes the values of a source column according to the contents of a specified standardization scheme.
These templates increase the value of data through data analysis and data cleansing.
70
Data Quality Plug-Ins
To use the data quality transformation templates,
- the SAS Data Quality Server software must be installed
- a SAS application server must be configured to access a Quality Knowledge Base
- the Quality Knowledge Base must contain the locales needed to reference data quality jobs.
When the prerequisites have been met, the data quality transformations can be dragged into process flow diagrams.
71
Create Match Code Plug-In
The Create Match Code plug-in is a tabbed dialog box that
- reads the Quality Knowledge Base for the specified locale
- creates match codes based on the user-specified criteria.
The match code can then be used to de-duplicate data or join data as part of the transformation step in defining the target table.
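How match codes support de-duplication can be shown with a deliberately simplified sketch. The match_code rule below (uppercase, keep letters only, drop vowels after the first letter) is a toy stand-in invented for this example; it is not the algorithm used by the Quality Knowledge Base, and the sample names are made up.

```python
import re

def match_code(value):
    """Toy match code: uppercase, keep letters only, drop vowels after the first letter."""
    letters = re.sub(r"[^A-Z]", "", value.upper())
    if not letters:
        return ""
    return letters[0] + re.sub(r"[AEIOU]", "", letters[1:])

def cluster(values):
    """Assign the same cluster number to every value that shares a match code."""
    cluster_numbers = {}
    result = []
    for value in values:
        code = match_code(value)
        # First time a code is seen, it gets the next cluster number.
        number = cluster_numbers.setdefault(code, len(cluster_numbers) + 1)
        result.append((value, code, number))
    return result

names = ["Smith, John", "SMITH JOHN", "Smyth, Jon", "Jones, Mary"]
rows = cluster(names)
for row in rows:
    print(row)
```

Rows that share a cluster number are candidates for de-duplication, and the match code column can serve as a join key between tables whose raw values differ only in formatting. A stricter or looser rule changes which values land in the same cluster, which is the role the user-specified criteria play.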
72
Apply Lookup Standardization Plug-In
The Apply Lookup Standardization plug-in is a tabbed dialog box that
- reads the Quality Knowledge Base for the specified locale
- loads all of the available standardization schemes.
You can then apply the scheme to one of the source columns as part of the transformation step in defining the target table.
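Applying a standardization scheme to a source column amounts to a lookup from variant spellings to one standard value. The sketch below assumes a scheme can be modeled as a plain dictionary and that matching ignores case and surrounding whitespace; both are simplifications for illustration, and the scheme contents are invented.

```python
def standardize(values, scheme):
    """Replace each value with its standard form if the scheme lists it; otherwise keep it."""
    # Normalize the scheme keys once so lookups ignore case and surrounding spaces.
    normalized = {key.strip().lower(): standard for key, standard in scheme.items()}
    return [normalized.get(value.strip().lower(), value) for value in values]

# Hypothetical scheme: variant spellings mapped to one standard value.
scheme = {
    "NC": "North Carolina",
    "N. Carolina": "North Carolina",
    "N Carolina": "North Carolina",
}

states = ["nc", "N. Carolina ", "North Carolina", "Texas"]
cleaned = standardize(states, scheme)
print(cleaned)
```

Values that the scheme does not mention pass through unchanged, so an incomplete scheme never destroys data.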
73
Metadata Import Wizard
The Metadata Import Wizard is an interface for importing metadata files that are compliant with the Common Warehouse Metamodel (CWM) standard.
By using the Import Wizard to import the metadata from a previously defined data model (source tables or target tables), you do not have to enter the metadata for each table individually. You simply reference a location for the model file, which was created by a third-party modeling tool.
74
Which Metadata Can Be Imported?
The CWM standard for metadata was developed by Object Management Group (OMG).
More information about OMG and the CWM metadata standard can be obtained from: http://www.omg.org
More information about Meta Integration Technology, Inc., and the purchase of Meta Integration Model Bridges (MIMBs), can be obtained from the following location: http://www.metaintegration.net
75
Metadata Export Wizard
The Metadata Export Wizard is an interface for exporting metadata from within SAS ETL Studio to third-party CWM-compliant applications.
The user can specify the path and the file to create when the metadata is exported.
After the user completes the Metadata Exporter wizard, a confirmation window verifies all of the selections the user has made for the export of the metadata. Upon exiting this window, the metadata is written to the external file that was specified in the wizard.
76
Change Management
SAS ETL Studio enables you to create metadata objects that define sources, targets, and the transformations that connect them. These objects are saved to one or more metadata repositories.
The change management feature (or more specifically, metadata source control) enables multiple SAS ETL Studio users to work with the same metadata repository at the same time without overwriting each other's changes.
77
Change Management
Change management features in SAS ETL Studio include
- menus that support change management operations such as check out and check in
- the Inventory tree and the Custom tree for working with metadata that is contained in a change-managed repository
- the Project tree for working with metadata that is contained in a project repository
- an audit history for each metadata object.
78
Change Management
After an object has been checked out by one person, it is locked so that it cannot be updated by another person until the object has been checked back in.
The only people who can change the metadata in a change-managed repository are
- the person who started the metadata server
- administrators who have write access to the repository
- any users who are authorized to use a project repository for the change-managed repository.
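The check-out and check-in locking described above can be modeled with a small class. This is a conceptual sketch only: the ChangeManagedRepository class, its method names, and the user names are invented for illustration and do not correspond to any SAS API.

```python
class ChangeManagedRepository:
    """Minimal model of check-out locking with an audit history."""

    def __init__(self):
        self._locks = {}    # object name -> user who has it checked out
        self.history = []   # audit trail of (action, object name, user)

    def check_out(self, name, user):
        owner = self._locks.get(name)
        if owner is not None and owner != user:
            # Someone else holds the lock, so this user cannot update the object.
            raise PermissionError(f"{name} is checked out by {owner}")
        self._locks[name] = user
        self.history.append(("check_out", name, user))

    def check_in(self, name, user):
        if self._locks.get(name) != user:
            raise PermissionError(f"{name} is not checked out by {user}")
        del self._locks[name]
        self.history.append(("check_in", name, user))

repo = ChangeManagedRepository()
repo.check_out("OrderFact", "alice")

try:
    repo.check_out("OrderFact", "bob")   # blocked while alice holds the lock
    bob_blocked = False
except PermissionError:
    bob_blocked = True

repo.check_in("OrderFact", "alice")
repo.check_out("OrderFact", "bob")       # succeeds after the check-in
print(bob_blocked, repo.history)
```

The history list plays the role of the audit history: every successful check-out and check-in is recorded, while the blocked attempt is not.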