
Integrating SAP BusinessObjects with Hadoop
Using a Multi-Node Hadoop Cluster

May 17, 2013


SAP BO – HADOOP INTEGRATION

Contents

1. Installing a Single Node Hadoop Server
2. Configuring a Multi-Node Hadoop Cluster
3. Configuring Hive Data Warehouse
4. Integrating SAP BusinessObjects with Hadoop


1. Installing a Single Node Hadoop Server

Installing a single-node Hadoop server involves the following steps:

1. Install a stable Linux OS (preferably CentOS) with ssh, rsync, and a recent JDK from Oracle.

2. Download the Hadoop .rpm package (the Linux equivalent of a Windows .exe installer) from the Apache website.

3. Install the downloaded package with the rpm or yum package manager.

4. Apache provides generic configuration options (mentioned below) that can be deployed by executing the scripts packaged with the .rpm file.

5. Run the configuration process by executing the hadoop-setup-conf.sh script with root privileges. Select the "default" option for the config, log, pid, NameNode, DataNode, JobTracker, and TaskTracker directories, and provide the system name for the NameNode and DataNode hosts.

6. To install the single-node server .conf files, run the hadoop-setup-single-node.sh script with root privileges and select the default option for all categories.

7. Set up the single node and start the Hadoop services by running the hadoop-setup-hdfs.sh script with root privileges. The .rpm file ships with some basic examples such as wordcount, pi, and teragen; these can be used to test whether all the services are working (see the sketch after this list).

8. Hadoop requires the following six services to be running for full functionality:

(a) Hadoop NameNode

(b) Hadoop DataNode

(c) Hadoop JobTracker

(d) Hadoop TaskTracker

(e) Hadoop Secondary NameNode

(f) Hadoop History Server

9. If all services are running, the single-node cluster is ready for operation.

10. The status of the Hadoop services can be checked as root with the following Linux command (the service scripts are located in the /etc/init.d directory):

service hadoop-namenode status

11. Similarly, the service command can be used to start or stop services:

service hadoop-datanode start
service hadoop-jobtracker stop
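A condensed sketch of the install-and-verify flow above. It assumes the Apache Hadoop 1.x RPM layout described in these steps, init scripts named hadoop-<component> as in step 10, and an examples jar under /usr/share/hadoop; adjust file names and paths to match your installation.

# steps 3, 5-7: install the RPM and run the bundled setup scripts as root
rpm -ivh hadoop-1.*.rpm
hadoop-setup-conf.sh
hadoop-setup-single-node.sh
hadoop-setup-hdfs.sh

# steps 10-11: check each of the six services
for svc in namenode datanode jobtracker tasktracker secondarynamenode historyserver; do
    service hadoop-$svc status
done

# step 7 smoke test: run a bundled example job (jar path is an assumption)
hadoop jar /usr/share/hadoop/hadoop-examples-*.jar pi 2 10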

For more detailed information on Hadoop services: http://www.cloudera.com , http://www.wikipedia.org

For more installation options: http://hadoop.apache.org


The running Hadoop services can be monitored through their web interfaces.

[Screenshot: NameNode web interface]

[Screenshot: DataNode web interface]


[Screenshot: JobTracker web interface]

[Screenshot: TaskTracker web interface]
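For reference, the default Hadoop 1.x web UI ports are listed below; this assumes the ports were not changed during configuration.

# Default Hadoop 1.x web UI ports (assumed unchanged from the defaults)
#   NameNode     http://<namenode-host>:50070
#   JobTracker   http://<jobtracker-host>:50030
#   DataNode     http://<datanode-host>:50075
#   TaskTracker  http://<tasktracker-host>:50060
# quick check from the command line on the master node
curl -s http://localhost:50070/dfshealth.jsp | head
curl -s http://localhost:50030/jobtracker.jsp | head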


[Screenshot: Hadoop basic commands]
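The screenshot of basic commands is not reproduced here; the following is a representative set of HDFS and job-control commands for Hadoop 1.x (the paths used are examples only):

hadoop fs -mkdir /user/demo                 # create a directory in HDFS
hadoop fs -put localfile.txt /user/demo     # copy a local file into HDFS
hadoop fs -ls /user/demo                    # list directory contents
hadoop fs -cat /user/demo/localfile.txt     # print a file stored in HDFS
hadoop fs -rm /user/demo/localfile.txt      # delete a file from HDFS
hadoop dfsadmin -report                     # HDFS capacity and DataNode report
hadoop job -list                            # list running MapReduce jobs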


2. Configuring a Multi-Node Hadoop Cluster

A single-node Hadoop server can be expanded into a Hadoop cluster. In cluster mode, the NameNode has many live DataNodes and the JobTracker has many TaskTrackers reporting to it.

Steps involved in installing a multi-node Hadoop cluster (a command-line sketch follows the list):

1. Install a stable Linux OS (preferably CentOS) on all machines (master and slaves).

2. Install Hadoop on all machines using the Hadoop RPM from Apache.

3. Update the /etc/hosts file on each machine so that every node in the cluster knows the IP addresses of all other nodes.

4. In the /etc/hadoop directory on the master node, update the masters and slaves files with the host names of the master node and the slave nodes respectively.

5. Generate an SSH key pair for the master node and place the public key on all slave nodes. This enables password-less SSH login from the master to all slaves.

6. Run the hadoop-setup-conf.sh script on all nodes. On the master, let all URLs point to the master node. On the slaves, update the NameNode and JobTracker URLs to point to the master node; the other URLs point to localhost.

7. Open the firewall ports required for communication on both the master and the slave nodes.

8. On the master, run the start-dfs.sh command; this starts the NameNode (on the master) and the DataNodes (on both the master and the slaves).

9. On the master, run the start-mapred.sh command; this starts the JobTracker (on the master) and the TaskTrackers (on both the master and the slaves).

10. The NameNode and JobTracker will now report more active nodes than the single-node server.
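A sketch of steps 3-9 on a small cluster with one master and two slaves; the host names master, slave1, and slave2 and their IP addresses are examples only, and root access is assumed:

# step 3: make every node resolvable on every machine
cat >> /etc/hosts <<'EOF'
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
EOF

# step 4: masters and slaves files on the master node
echo "master" > /etc/hadoop/masters
printf "slave1\nslave2\n" > /etc/hadoop/slaves

# step 5: password-less SSH from the master to the slaves
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id root@slave1
ssh-copy-id root@slave2

# steps 8-9: start the HDFS and MapReduce daemons from the master
start-dfs.sh
start-mapred.sh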

For more configuration options, refer to:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ , http://hadoop.apache.org/docs/stable/cluster_setup.html


Some Screenshots of the Multi-node Hadoop Cluster at work

NameNode

DataNode


List of DataNodes

List of TaskTrackers


JobTracker Job Status

TaskTracker Task Status


3. Configuring Hive Data Warehouse

The Hive data warehousing environment runs on top of Hadoop. It performs ETL at run time and makes the data available for reporting. Hive has to be installed first and then hosted as a service using the Hive Server option.

Steps involved in configuring Hive (a command-line sketch follows the list):

1. Install and configure Hadoop on all machines and make sure all the services are running.

2. Download Hive from the Apache website.

3. Install MySQL for Hive metadata storage, or simply configure the default Derby database. Any RDBMS can be used for the Hive metastore; this is done by placing the appropriate JDBC connector in the Hive lib directory. For detailed information on connectivity, follow this link: https://ccp.cloudera.com/display/CDHDOC/Hive+Installation#HiveInstallation-HiveConfiguration

4. Copy the required .jar files to the appropriate directories as per the instructions in the above link.

5. Go to the bin directory in the Hive package folder and execute the hive command.

6. Queries can now be executed in the shell.

7. The Hive Web Interface can be started by running: hive --service hwi

8. The Hive Thrift server can be started by running: hive --service hiveserver

9. Open the Hive server port (default 10000) in the firewall to allow connections through JDBC.

10. If security is needed for the Hive server, configure Kerberos network authentication and bind it to the Hive server. For more information, refer to http://www.cloudera.com.
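A sketch of steps 2-9, assuming Hive is unpacked under /usr/local/hive and a MySQL metastore is used; the paths and connector version are examples only:

# steps 2-4: unpack Hive and add the MySQL JDBC connector to its lib directory
mkdir -p /usr/local/hive
tar -xzf hive-0.*.tar.gz -C /usr/local/hive --strip-components=1
cp mysql-connector-java-*.jar /usr/local/hive/lib/
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin

# steps 5-6: run a query from the Hive shell
hive -e "SHOW TABLES;"

# steps 7-8: start the web interface and the Thrift server
hive --service hwi &
hive --service hiveserver &

# step 9: open the default Hive server port (iptables example)
iptables -I INPUT -p tcp --dport 10000 -j ACCEPT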

For more config options: http://hive.apache.org

For the Hive JDBC connection: https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC


Screenshots of the Hive Server

Hive Web Interface

Hive Command Line


4. Integrating SAP BusinessObjects with Hadoop

Universe Design Using IDT

Steps Involved in Configuring SAP BusinessObjects for use with Hadoop

1. Configure SAP BusinessObjects with the Hive JDBC drivers if the server version is lower than BO 4.0 SP5. From BO 4.0 SP5 onwards, SAP provides Hive connectivity by default. To configure the JDBC drivers in earlier versions, refer to page 77 of this document: http://help.sap.com/businessobject/product_guides/boexir4/en/xi4sp4_data_acs_en.pdf

2. Create a BO universe.

1. Open SAP IDT and create a user session with your login credentials.

2. Under the session, open the Connections folder and create a new relational connection.


3. In the driver selection menu, select Apache -> Hadoop Hive -> JDBC Drivers.

4. In the next tab, enter the database URL and port, user name, and password, and click Test Connectivity (the expected URL format is sketched after this list). If the test is successful, save the connection by clicking Finish.

5. Create a new project in IDT and create a shortcut for the above connection in the project.


6. Create a new data foundation layer and bind the connection to it.

7. This connection will be used by the data foundation layer to import data from the Hive server.

8. From the data foundation layer, drag and drop the tables needed by the universe. Create views in the data foundation if required.

9. Create a new business layer and bind the data foundation layer to it.

10. Attributes can be set as measures with suitable aggregation functions in the business layer.

11. Right-click the business layer and select Publish -> Publish to Repository. Run an integrity check before publishing to verify dependencies.

12. Log on to the CMC and set the universe access policy for users.

13. Open the WebI Launch Pad or Rich Client and select a universe as the source. The published universe should be listed.
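The exact connection parameters depend on the Hive deployment; the sketch below assumes the legacy Hive Thrift server (hive --service hiveserver) from section 3 listening on its default port 10000, and is for illustration only:

# Connection details for the IDT relational connection (HiveServer1 assumed)
#   JDBC URL     : jdbc:hive://<hive-server-host>:10000/default
#   Driver class : org.apache.hadoop.hive.jdbc.HiveDriver
#   User/password: as configured on the Hive server
# quick reachability check from the BO server before Test Connectivity
telnet <hive-server-host> 10000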

For detailed information, refer to http://scn.sap.com and http://help.sap.com


Some Screenshots of Universe Design

Data Foundation Layer

Business Layer


Convert To Measure

Publish Universe


3. Create reports

The published universe can be accessed through WebI, Dashboards, or Crystal Reports. Select the Hive universe as the data source and build queries using the query panel. The universe converts the user queries into HiveQL statements and returns the results to the report.
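As an illustration of the kind of HiveQL such a query might translate into, the statement below could also be run directly from the Hive shell; the table word_counts and its columns are hypothetical and not part of the original setup:

# hypothetical example: a word-count report query executed directly in Hive
hive -e "SELECT word, SUM(cnt) AS total FROM word_counts GROUP BY word ORDER BY total DESC LIMIT 20;"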

Some Screenshots of Text Processing Reports

WEBI Mobile Report on Word Count