Oracle Data Integrator for Big Data: Hands-on Lab


Hands on Lab - Oracle Data Integrator for Big Data

Abstract: This lab highlights, for Developers, DBAs, and Architects, some of the best practices for implementing a Big Data Reservoir using E-LT techniques to improve performance and reduce data integration costs with Oracle Data Integrator. Participants will walk through the steps needed to load data from sources into a Hadoop cluster, transform it there, and load it into a relational target. The following lessons cover the Oracle Data Integrator mappings and packages and the Oracle GoldenGate processes required to load and transform the data.

Contents

Hands on Lab - Oracle Data Integrator for Big Data
Architecture Overview
Overview
  Time to Complete
  Prerequisites
Task 0: Preparation Steps
Task 1: Review Topology and Model Setup
Task 2: Load Hive Tables using Sqoop
Task 3: Transforming Data within Hive
Task 4: Load Oracle from Hive Tables using Oracle Loader for Hadoop
Task 5: Creating a new ODI Package to execute an End-to-End Load
Task 6: Replicating New Records to Hive using Oracle GoldenGate
Summary


Architecture Overview

This hands-on lab is based on a fictional movie streaming company that provides online access to movie media. The goal of this lab is to load customer activity data, which includes movie rating actions, as well as a movie database sourced from a MySQL DB into Hadoop Hive; aggregate and join average ratings per movie; and load this data into an Oracle DB target.

The work is divided into six tasks:

1. Review the prepared ODI topology and models connecting to MySQL, Hadoop, and Oracle DB.

2. Create a mapping that uses Apache Sqoop to load movie data from MySQL to Hive tables.

3. Create a mapping that joins data from customer activity with movie data and aggregates average movie ratings into a target Hive table.

4. Load the movie rating information from Hive to Oracle DB using Oracle Loader for Hadoop.

5. Create a package workflow that orchestrates the mappings of tasks 2, 3, and 4 in one end-to-end load.

6. Create Oracle GoldenGate processes that will detect inserts in the MySQL movie database and add them to the Hive movie table in real time.

Overview

Time to Complete

Perform all 6 tasks – 60 Minutes

Prerequisites

Before you begin this tutorial, you should:

- Have a general understanding of RDBMS and Hadoop concepts.

- Have a general understanding of ETL concepts.

[Figure: Lab architecture. A Flume log-stream delivers activity logs to an HDFS file, which is exposed in Hive as the external table Activity. Task 1: review topology and models. Task 2: a Sqoop mapping loads MySQL Movie into Hive movie. Task 3: a Hive mapping computes average movie ratings into Hive movie_rating. Task 4: an OLH mapping loads Hive movie_rating into Oracle MOVIE_RATING. Task 5: an ODI package orchestrates the end-to-end load. Task 6: OGG replicates new MySQL records to Hive.]


Task 0: Preparation Steps

In these steps you will clean and set up the environment for this exercise.

1. Double-click Start/Stop Services on the desktop.

2. In the Start/Stop Services window, use the arrow keys to move to ORCL Oracle Database 12c and select it. Please verify that the following services are also started: Zookeeper, HDFS, Hive, YARN (you may need to scroll). Press OK.
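Info: If you also want to verify the Hadoop-side services from a terminal, the JDK's jps utility lists the running Java processes. The output below is only illustrative of a typical single-node sandbox; exact process names vary by distribution, and this check is not part of the lab.

jps
# typical output (illustrative):
# 2351 NameNode
# 2462 DataNode
# 2573 ResourceManager
# 2684 NodeManager
# 2795 QuorumPeerMain   (Zookeeper)
# 2906 RunJar           (Hive services often appear as RunJar)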


Task 1: Review Topology and Model Setup

The connectivity information has already been set up for this hands-on lab. This information is defined in the Topology Navigator of ODI. The next steps will walk you through how to review it.

1. Start ODI Studio: on the toolbar, single-click (no double-click!) the ODI Studio icon.

2. Go to the Topology Navigator and press Connect to Repository…

3. In the ODI Login dialog press OK.

4. Within the Physical Architecture accordion on the left, expand the Technologies folder.

Note: You might see more technologies than shown on this screenshot. This has no effect on this tutorial and is controlled with the setting “Hide Unused Technologies” in the Topology menu.


5. For this HOL the connectivity information has already been set up for the Hive, MySQL, and Oracle sources and targets. Please expand these technologies to see the configured data servers.

Info: A technology is a type of data source that can be used by ODI as a source, target, or other connection. A data server is an individual server of a given technology, for example a database server. A data server can have multiple schemas. ODI uses the concept of logical and physical schemas to allow execution of the same mapping in different environments, for example development, QA, and production.

6. Double-click on the Hive data server to review its settings.


7. Click on the JDBC tab on the left to view Hive connection information.
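Info: A HiveServer2 JDBC definition generally takes the form below. The driver class is the standard Apache Hive driver, while host, port, and database here are placeholders to compare against the actual values shown on this tab.

Driver: org.apache.hive.jdbc.HiveDriver
URL: jdbc:hive2://localhost:10000/default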

8. Switch to the Designer navigator and open the Models accordion. Expand all models.


Info: A model is a set of metadata definitions regarding a source such as a database schema or a set of files. A model can contain multiple datastores, which follow the relational concept of columns and rows and can be database tables, structured files, or XML elements within an XML document.


Task 2: Load Hive Tables using Sqoop

In this task we use Apache Sqoop to load data from an external DB into Hive tables. Sqoop starts parallel MapReduce processes in Hadoop to load chunks of the DB data with high performance. ODI can generate Sqoop code transparently from a mapping by selecting the correct Knowledge Module.
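Info: For orientation, the code generated by the Sqoop IKM corresponds roughly to a command-line invocation like the sketch below. The database credentials follow the MySQL setup used later in this lab, while host, port, and mapper count are assumptions; ODI assembles and runs the real command for you based on the KM options.

sqoop import \
  --connect jdbc:mysql://localhost:3306/odidemo \
  --username root --password welcome1 \
  --table MOVIE \
  --hive-import --hive-table movie \
  --hive-overwrite \
  --num-mappers 4

Here --hive-overwrite mirrors the TRUNCATE=true IKM option set later in this task, and --num-mappers controls how many parallel MapReduce load tasks Sqoop starts.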

1. The first mapping to be created will load the MySQL MOVIE table into the Hive movie table. To create a new mapping, open the Project accordion within the Designer navigator.



2. Expand the Big Data Hands-On Lab > First Folder folder.

3. Right-click on Mappings and click New Mapping.

Info: A mapping is a data flow to move and transform data from sources into targets. It contains declarative and graphical rules about data joining and transformation.

4. In the New Mapping dialog change the name to A - Sqoop Movie Load and press OK.


5. For this mapping we will load the table MOVIE from model MySQL to the table movie within the model HiveMovie. To view the models, open the Models accordion.

6. Drag the datastore MOVIE from model MySQL as a source and the datastore movie from model HiveMovie as a target onto the mapping diagram panel.


7. Drag from the output port of the source MOVIE to the input port of the target movie.

8. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from source to target.

9. The logical flow has now been set up. To set the physical implementation, click on the Physical tab of the editor.


10. The Physical tab shows the actual systems involved in the transformation, in this case the MySQL source and the Hive target. In the Physical tab users can choose the Load Knowledge Module (LKM) that controls data movement between systems, as well as the Integration Knowledge Module (IKM) that controls the transformation of data. Select the access point MOVIE_AP to select an LKM.

Note: The KMs that will be used have already been imported into the project.

Info: A knowledge module (KM) is a template that represents best practices to perform an action in an interface, such as loading from/to a certain technology (Load Knowledge Module or LKM), integrating data into the target (Integration Knowledge Module or IKM), checking data constraints (Check Knowledge Module or CKM), and others. Knowledge modules can be customized by the user.


11. Go to the Properties Editor underneath the Mapping editor. There is a section Loading Knowledge Module; you might have to scroll down to see it. Open this section and pick the LKM SQL Multi-Connect.GLOBAL. This LKM allows the IKM to perform the loading activities. Note: If the Property Editor is not visible in the UI, go to the menu Window > Properties to open it. Depending on the available size of the Property Editor, the sections within it (such as "General") might be shown as titles or tabs on the left.

12. Select the target datastore MOVIE.


13. In the Property Editor open the section Integration Knowledge Module and pick IKM SQL to Hive-HBase-File (SQOOP).GLOBAL. Note: If this IKM is not visible in the list, make sure that you performed the previous tutorial step and chose the LKM SQL Multi-Connect.

14. Review the list of IKM options for this KM. These options are used to configure and tune the Sqoop process that loads the data. Change the option TRUNCATE to true.


15. The mapping is now complete. Press the Run button on the toolbar above the mapping editor. When asked to save your changes, press Yes.

16. Click OK for the run dialog. We will use all defaults and run this mapping on the local agent that is embedded in the ODI Studio UI. After a moment a Session started dialog will appear; press OK there as well.

17. To review the execution, go to the Operator navigator and expand the All Executions node to see the current execution. If the execution has not finished, it will show the icon for an ongoing task. You can refresh the view once with the refresh button, or use the auto-refresh button to refresh automatically every 5 seconds.


18. Once the load is complete, a warning icon will be displayed. A warning icon is expected for this run and still means the load was successful. You can expand the Execution tree to see the individual tasks of the execution.

19. Go to the Designer navigator and the Models accordion, and right-click HiveMovie.movie. Select View Data from the menu to see the loaded rows.


20. A Data editor appears with all rows of the movie table in Hive.


Task 3: Transforming Data within Hive

In this task we will design a transformation in an ODI mapping that will be executed in Hive. Please note that with ODI you can create logical mappings declaratively without considering any implementation details; those can be added later in the physical design.

For this mapping we will use the Hive tables movie and movieapp_log_avro as sources and the Hive table movie_rating as the target.

1. To create a new mapping, open the Project accordion within the Designer navigator, expand the Big Data Hands-On Lab > First Folder folder, right-click on Mappings, and click New Mapping.

2. In the New Mapping dialog change the name to B - Hive Calc Ratings and press OK.



3. Open the Models accordion and expand the model HiveMovie. Drag the datastores movie and movieapp_log_avro as sources and movie_rating as the target into the new mapping.

4. First we would like to filter the movie activities to include only rating activities (activity ID 1). To do this, drag a Filter from the Component Palette and drop it after the movieapp_log_avro source.

5. Drag the attribute activity from movieapp_log_avro onto the FILTER component. This will connect the components and use the attribute activity in the filter condition.


6. Select the FILTER component and go to the Property Editor. Expand the section Condition and complete the condition to movieapp_log_avro.activity = 1


7. We now want to aggregate all activities based on the movie watched and calculate an average rating. Drag an Aggregate component from the palette onto the mapping.

8. Drag and drop the attributes movieid and rating from movieapp_log_avro directly onto AGGREGATE in order to map them. They are automatically routed through the filter.


9. Select the attribute AGGREGATE.rating and go to the Property Editor. Expand the section Target and complete the expression to AVG(movieapp_log_avro.rating).

Note: The Expression Editor (the icon to the right of the Expression field) can be used to edit expressions and provides lists of available functions.

10. Now we would like to join the aggregated ratings with the movie table to obtain enriched movie information. Drag a Join component from the Component Palette to the mapping.


11. Drop the attributes movie.movie_id and AGGREGATE.movieid onto the JOIN component. These two attributes will be used to create an equijoin condition. Note: The join condition can also be changed in the Property Editor.

12. Highlight the JOIN component and go to the Property Editor. Expand the Condition section and check the property "Generate ANSI Syntax".


13. Drag from the output port of JOIN to the input port of the target movie_rating.

14. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from source to target.

15. Drag the remaining unmapped attribute AGGREGATE.rating over to movie_rating.avg_rating.


16. The logical flow has now been set up. Compare your mapping with the intended flow to spot any differences. To set the physical implementation, click on the Physical tab of the editor.

17. The Physical tab shows that in this mapping everything is performed in the same system, the Hive server. Because of this, no LKM is necessary. Select the target MOVIE_RATING to select an IKM.


18. Go to the Property Editor and expand the section Integration Knowledge Module. The correct IKM Hive Control Append.GLOBAL has already been selected by default; no change is necessary. In the IKM options change TRUNCATE to True and leave all other options at their defaults.

19. The mapping is now complete. Press the Run button on the toolbar above the mapping editor. When asked to save your changes, press Yes.

20. Click OK for the run dialog. After a moment a Session started dialog will appear; press OK there as well.

21. To review the execution, go to the Operator navigator and expand the All Executions node to see the current execution.


22. Once the load is complete, expand the Execution tree to see the individual tasks of the execution. Double-click on Task 50 – Insert (new) rows to see details of the execution.

23. In the Session Task Editor that opens, click on the Code tab on the left. The generated SQL code will be shown. The code is generated from the mapping logic and contains WHERE, JOIN, and GROUP BY clauses that relate directly to the mapping components.
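Info: Stripped of the KM's housekeeping steps, the generated code boils down to a single HiveQL insert along the lines of the sketch below. Table and column names follow the mapping, but the column list is abbreviated and the exact generated syntax may differ.

INSERT INTO TABLE movie_rating
SELECT movie.movie_id,
       movie.title,
       AVG(movieapp_log_avro.rating) AS avg_rating
FROM movieapp_log_avro
JOIN movie ON (movie.movie_id = movieapp_log_avro.movieid)
WHERE movieapp_log_avro.activity = 1
GROUP BY movie.movie_id, movie.title;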


24. Go to the Designer navigator and the Models accordion, and right-click HiveMovie.movie_rating. Select View Data from the menu to see the loaded rows.

25. A data view editor appears with all rows of the movie_rating table in Hive.


Task 4: Load Oracle from Hive Tables using Oracle Loader for Hadoop

In this task we load the results of the prior Hive transformation from its Hive table into the Oracle DB data warehouse. We are using the Oracle Loader for Hadoop (OLH) bulk data loader, which uses load mechanisms specifically optimized for Oracle DB.
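Info: Outside of ODI, Oracle Loader for Hadoop is submitted as a Hadoop job driven by a configuration file, roughly as sketched below. The configuration file name and connection URL are illustrative placeholders; the IKM assembles the equivalent configuration and submits the job for you.

hadoop jar ${OLH_HOME}/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
  -conf movie_rating_olh.xml \
  -D oracle.hadoop.loader.connection.url=jdbc:oracle:thin:@//localhost:1521/orcl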

1. To create a new mapping, open the Project accordion within the Designer navigator, expand the Big Data Hands-On Lab > First Folder folder, right-click on Mappings, and click New Mapping.



2. In the New Mapping dialog change the name to C - OLH Load Oracle and press OK.

3. Open the Models accordion and expand the model HiveMovie. Drag the datastore movie_rating as source into the new mapping. Then open model OracleMovie and drag in the datastore MOVIE_RATING_ODI as a target.

4. Drag from the output port of the source movie_rating to the input port of the target MOVIE_RATING_ODI.


5. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from source to target.

6. The logical flow has now been set up. To set the physical implementation, click on the Physical tab of the editor.


7. Select the access point MOVIE_RATING_AP (the name may be truncated to MOVIE_RA) to select an LKM. Go to the Property Editor and choose LKM SQL Multi-Connect.GLOBAL, because the IKM will perform the load.

8. Select the target datastore MOVIE_RATING_ODI, then go to the Property Editor to select an IKM. Choose IKM File-Hive to Oracle (OLH-OSCH).GLOBAL. Note: If this IKM is not visible in the list, make sure that you performed the previous tutorial step and chose the LKM SQL Multi-Connect.


9. Review the list of IKM options for this KM. These options are used to configure and tune the OLH or OSCH process to load data. We will use the default setting of OLH through JDBC. Change the option TRUNCATE to true.

10. The mapping is now complete. Press the Run button on the toolbar above the mapping editor. When asked to save your changes, press Yes.

11. Click OK for the run dialog. We will use all defaults and run this mapping on the local agent that is embedded in the ODI Studio UI. After a moment a Session started dialog will appear; press OK there as well.


12. To review the execution, go to the Operator navigator and expand the All Executions node to see the current execution. Wait until the execution has finished; check by refreshing the view.

13. Go to the Designer navigator and the Models accordion, and right-click OracleMovie.MOVIE_RATING_ODI. Select View Data from the menu to see the loaded rows.


14. A data view editor appears with all rows of the table MOVIE_RATING_ODI in Oracle.


Task 5: Creating a new ODI Package to execute an End-to-End Load

Now that the mappings have been created, we can create a package within ODI that will execute all of them in order.

1. To create a new package, open the Designer navigator and the Project accordion on Big Data Hands-On Lab > First Folder, then right-click on Packages and select New Package.

Info: A package is a task flow to orchestrate execution of multiple mappings and define additional logic, such as conditional execution and actions such as sending emails, calling web services, uploads/downloads, file manipulation, event handling, and others.



2. Name the package Big Data Load and press OK.

3. Click the Diagram tab. Drag and drop the mappings from the left onto the diagram panel, starting with the mapping A - Sqoop Movie Load. Notice the green arrow on this mapping, which means it is the first step.

4. Drag the mappings B - Hive Calc Ratings and C - OLH Load Oracle onto the panel.

5. Click the OK arrow toolbar button to set the order of precedence.


6. Drag and drop from A - Sqoop Movie Load to B - Hive Calc Ratings to set the link. Then drag and drop from B - Hive Calc Ratings to C - OLH Load Oracle.

Note: If you need to rearrange steps, switch back to the select mode.

7. The package is now set up and can be executed. To execute the package, click the Execute button in the toolbar. When prompted to save, click Yes.

8. Click OK in the Run dialog. After a moment a Session started dialog will appear; press OK there as well.

9. To review the execution, go to the Operator navigator and open the latest session execution. The 3 steps are shown separately and contain the same tasks as the mapping executions in the prior tutorials.


Task 6: Replicating New Records to Hive using Oracle GoldenGate

Oracle GoldenGate allows the capture of completed transactions from a source database and the replication of these changes to a target system. In this tutorial we will replicate inserts into the MOVIE table in MySQL to the corresponding movie table in Hive. Oracle GoldenGate provides this capability through the GoldenGate Adapters, with implemented examples for Hive, HDFS, and HBase.

In detail, the GoldenGate processes are as follows:

[Figure: GoldenGate replication flow. MySQL table MOVIE -> Capture Extract EMOV -> Trail File TM -> Pump Extract PMOV -> Java Adapter (myhivehandler.jar) -> HDFS file ogg_movie -> Hive table movie. Parameter and property files: EMOV.prm, PMOV.prm, PMOV.properties.]
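Info: For orientation, a capture parameter file such as EMOV.prm contains entries along these lines. This is a sketch only; the actual files live under /u01/ogg/dirprm and may differ.

EXTRACT EMOV
SOURCEDB odidemo, USERID root, PASSWORD welcome1
EXTTRAIL ./dirdat/tm
TABLE odidemo.MOVIE;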



1. Start a terminal window from the menu bar by single-clicking the Terminal icon.

2. In the terminal window, execute the commands:

cd /u01/ogg
ggsci

3. Start the GoldenGate manager process by executing start mgr

4. Add and start the GoldenGate extract processes by executing obey dirprm/bigdata.oby. Note: Ignore any errors shown by the stop and delete commands at the beginning.

5. See the status of the newly added processes by executing info all
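Info: An obey file simply batches GGSCI commands. Based on the process and trail names shown above, bigdata.oby contains commands along these lines; this is a sketch, not a copy of the actual file. The stop/delete pairs are what produce the ignorable errors mentioned in step 4.

STOP EXTRACT EMOV
DELETE EXTRACT EMOV
STOP EXTRACT PMOV
DELETE EXTRACT PMOV
ADD EXTRACT EMOV, TRANLOG, BEGIN NOW
ADD EXTTRAIL ./dirdat/tm, EXTRACT EMOV
ADD EXTRACT PMOV, EXTTRAILSOURCE ./dirdat/tm
START EXTRACT EMOV
START EXTRACT PMOV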


6. Start a second terminal window from the menu bar and enter the command:

mysql --user=root --password=welcome1 odidemo

7. Insert a new row into the MySQL table MOVIE by executing the following command:

insert into MOVIE (MOVIE_ID,TITLE,YEAR,BUDGET,GROSS,PLOT_SUMMARY) values (1, 'Sharknado 2', 2014, 500000, 20000000, 'Flying sharks attack city');

Note: Alternatively you can execute the following command:

source ~/movie/moviework/ogg/mysql_insert_movie.sql;
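Note: If you prefer a command-line check over the ODI data viewer used in the next steps, the replicated row can also be queried with the Hive CLI, assuming hive is on the path and using the column names from the lab's movie table:

hive -e "SELECT movie_id, title FROM movie WHERE movie_id = 1;"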

8. Go to the ODI Studio and open the Designer navigator and Models accordion. Right-click on datastore HiveMovie.movie and select View Data.


9. In the View Data window choose the Move to last row toolbar button. The inserted row with movie_id 1 should be the last row. You might have to scroll all the way down to see it. Refresh the screen if you don't see the entry.

Summary

You have now completed the hands-on lab and successfully performed an end-to-end load through a Hadoop Data Reservoir using Oracle Data Integrator and Oracle GoldenGate. The strength of these products is providing an easy-to-use approach to developing performant data integration flows that use the strengths of the underlying environments without adding proprietary transformation engines. This is especially relevant in the age of Big Data.