
Oracle® Cloud
Using Oracle Big Data Manager

Release 19.3.1
E91848-20
July 2019


Oracle Cloud Using Oracle Big Data Manager, Release 19.3.1

E91848-20

Copyright © 2018, 2019, Oracle and/or its affiliates. All rights reserved.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.


Contents

Preface
  Audience
  Documentation Accessibility
  Related Resources
  Conventions

1 Overview of Oracle Big Data Manager
  Storage Providers Supported by Oracle Big Data Manager

2 Get Started with Oracle Big Data Manager
  Open a Port for Oracle Big Data Manager
  Open the Oracle Big Data Manager Console
  Navigate the Oracle Big Data Manager Console
  Manage Oracle Big Data Manager Users, Roles, and Access
    Add Oracle Big Data Manager Users
    Edit User Details and Manage Roles
    Control Access to Specific Providers
  Register Storage Providers with Oracle Big Data Manager
  View Oracle Big Data Manager Settings

3 View Data in Oracle Big Data Manager
  Display and Navigate Storage Providers
  Preview Content from GitHub
  View Data Properties

4 Transfer and Compare Data
  Copy Data (Including Drag and Drop)
  Copy Data (Including from Multiple Sources)
  Copy Data Via HTTP
  Upload Files from a Local Computer to HDFS
  Move Data in HDFS
  Import Data into Apache Hive
  Compare Data Sets

5 Manage Jobs in Oracle Big Data Manager
  View Execution History of All Jobs
  View Summary Information About a Job
  View the Arguments for a Job
  View Comparison Results for a Job
  View Job Output

6 Manage Pipelines in Oracle Big Data Manager
  Create a Pipeline
  Configure Pipeline Properties
  Enable a Pipeline
  Run a Pipeline
  View Pipeline Details
  Edit a Pipeline
  Copy a Pipeline
  Delete a Pipeline

7 Configure the Oracle Big Data Cloud Service Firewall
  Create Firewall Rules
  Edit Firewall Rules
  Refresh Firewall Rules
  Remove Firewall Rules

8 Monitor
  Monitor the Status of Services
  Monitor the Status of Clusters
  Monitor YARN Queues in Clusters

9 Analyze Data Interactively With Notes
  Work with Notes
  Use the Commands on the Note Toolbar
  Use the Commands on the Paragraph Toolbar
  Import a Note
  Export a Note
  Create a Note
  Rename a Note
    Rename a Note Without Displaying the Note
    Rename a Note That's Currently Displayed
  Clear the Output from Paragraphs in a Note
  Delete a Note
  View and Edit a Note
  Run a Note
  Organize Notes
  Monitor and Troubleshoot Notes
  Manage Notebook Interpreters Settings

10 Manage Data and Copy Jobs With the Oracle Big Data Manager SDKs

11 Use the Oracle Big Data Manager bdm-cli Utility
  Install the bdm-cli Utility
  Usage
  Options
  Subcommands
    bdm-cli abort_job
    bdm-cli copy
    bdm-cli create_job
    bdm-cli create_job_template
    bdm-cli get_data_source
    bdm-cli get_job
    bdm-cli get_job_log
    bdm-cli list_all_jobs
    bdm-cli list_template_executions
    bdm-cli ls

A Keyboard Shortcuts for Oracle Big Data Manager


Preface

This document describes how to use Oracle Big Data Manager to move data between Hadoop Distributed File System (HDFS) and other storage providers. It also describes how to analyze data by using Apache Zeppelin notes.

Topics:

• Audience

• Documentation Accessibility

• Related Resources

• Conventions

Audience

This document is intended for administrators who want to manage access to Oracle Big Data Manager and for users who want to transfer and analyze data.

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Related Resources

For related information, see these Oracle resources:

• Get Started with Oracle Big Data Cloud Service in Using Oracle Big Data Cloud Service

• Oracle Public Cloud

http://cloud.oracle.com


Conventions

The following text conventions are used in this document:

Convention      Meaning

boldface        Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.

italic          Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.

monospace       Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.


1 Overview of Oracle Big Data Manager

Oracle Big Data Manager is a browser-based tool that gives you broad capabilities to manage data across your enterprise. Oracle Big Data Manager is installed and configured by default when you create an Oracle Big Data Cloud Service cluster.

You can use Oracle Big Data Manager to connect to and interconnect a range of supported Oracle and non-Oracle data storage providers. After you register storage providers with Oracle Big Data Manager, you can preview data and (depending upon the accessibility of each storage provider) compare, copy, and move data between them. With a Hadoop storage provider, you can also move data internally within HDFS, do data import/export and analytics with Apache Zeppelin, and import data into Apache Hive tables. You can also upload data from your local computer to HDFS.

Oracle Big Data Manager provides several methods for data transfer. You can use the console, which includes drag and drop data selection. Python and Java SDKs are available for building data management scripts and applications. There's also a CLI for performing copy operations and for creating and administering data management jobs.

The Oracle Big Data Manager administrator can create other user accounts and selectively grant access privileges to storage providers.

By default, Oracle Big Data Manager is started on port 8888 on the third node of a cluster, the same node as Cloudera Manager.

Storage Providers Supported by Oracle Big Data Manager

You can use Oracle Big Data Manager to connect to and interconnect a range of Oracle and non-Oracle data storage providers.

The storage providers supported by Oracle Big Data Manager are:

• Oracle Cloud Infrastructure Object Storage Classic

• Oracle Cloud Infrastructure Object Storage

• Amazon Simple Storage Service (S3)

• Hadoop Distributed File System (HDFS)

• GitHub

• Oracle Database

• Oracle MySQL

Storage providers must be registered with a cluster before they can be accessed through Oracle Big Data Manager.

• You can register all providers (except Oracle Cloud Infrastructure Object Storage) through the Oracle Big Data Manager console. See Register Storage Providers with Oracle Big Data Manager.


• You must register Oracle Cloud Infrastructure Object Storage at the command line. See Use bda-oss-admin to Manage Storage Resources in Using Oracle Big Data Cloud Service.


2 Get Started with Oracle Big Data Manager

Oracle Big Data Manager is installed and configured by default when you create an Oracle Big Data Cloud Service cluster. Before using it, you must configure a few more details, as described in the following topics.

Note:

You must have administrative privileges to perform some of these tasks. User bigdatamgr is created by default when a cluster is created, and that user has the appropriate administrative privileges.

Topics:

• Open a Port for Oracle Big Data Manager

• Open the Oracle Big Data Manager Console

• Navigate the Oracle Big Data Manager Console

• Manage Oracle Big Data Manager Users, Roles, and Access

• Register Storage Providers with Oracle Big Data Manager

• View Oracle Big Data Manager Settings

Open a Port for Oracle Big Data Manager

Oracle Big Data Manager is included with Oracle Big Data Cloud Service, and it's installed, configured, and started when you create a cluster. However, you must open a port to make it available for use.

To open the port:

1. Open a command shell and use SSH to connect to the first node of the cluster.

2. Run this command:

sudo /opt/oracle/bda/bin/bdacli firewall add src-ipaddr[/mask] third-node-name 8888 tcp --comment comment

where:

• src-ipaddr[/mask] is the IP address from which the connection will come, in the format AA.BB.CC.DD in dot-decimal notation. A whole network can be added by adding the prefix mask with the notation AA.BB.CC.DD/EE.

• third-node-name is the host name of the node that's running Oracle Big Data Manager (the third node).


For example:

sudo /opt/oracle/bda/bin/bdacli firewall add 198.51.100 myhost 8888 tcp --comment "I can connect to BDM from host 198.51.100"

sudo /opt/oracle/bda/bin/bdacli firewall add 203.0.113/24 myhost.us2.example.com 8888 tcp --comment "I can connect to BDM from class C network 203.0.113"
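To confirm that the rule took effect, you can list the node's firewall rules with a standard tool. This is only an illustrative check, assuming bdacli writes the rule to the host's iptables configuration:

sudo iptables -L -n | grep 8888    # look for an ACCEPT rule for TCP port 8888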

Open the Oracle Big Data Manager Console

Oracle Big Data Manager is integrated with Oracle Big Data Cloud Service and can be accessed from the Oracle Big Data Cloud Service console or by using a URL.

Note:

Oracle Big Data Manager credentials are assigned separately from those for Oracle Big Data Cloud Service. See Manage Oracle Big Data Manager Users, Roles, and Access.

To open the Oracle Big Data Manager console from Oracle Big Data Cloud Service:

1. Open your Oracle Big Data Cloud Service console.

2. On the Instances page, on the row for the service instance you want to use, click Manage this instance and select Big Data Manager console.

3. Sign in to Oracle Big Data Manager with your Oracle Big Data Manager credentials.

To open the Oracle Big Data Manager console by using a URL:

By default, Oracle Big Data Manager is started on port 8888 on the third node of an Oracle Big Data Cloud Service cluster, the same node as Cloudera Manager. Use a URL with the following format to open the Oracle Big Data Manager console:

https://host_name.oraclecloud.com:port

where:

• host_name is the host name of the node that's running Oracle Big Data Manager (the third node by default).

• port is the port used for Oracle Big Data Manager (port 8888 by default).

For example, if the host name of the third node is node03, and the port is left at the default, the URL used to open the Oracle Big Data Manager console would be this:

https://node03.oraclecloud.com:8888
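To verify that the port is reachable from your network before signing in, you can send a simple HTTPS request with a generic client such as curl. This is only an illustrative check, reusing the hypothetical node03 host name from the example above (-k skips certificate verification in case the cluster uses a self-signed certificate):

curl -k -I https://node03.oraclecloud.com:8888    # any HTTP response means the port is open

If the request times out, the port probably hasn't been opened yet (see Open a Port for Oracle Big Data Manager).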


Navigate the Oracle Big Data Manager Console

The Oracle Big Data Manager console is displayed in a web browser and contains graphical tools for transferring and analyzing data and managing data providers, and for managing users and roles.

The console has the following main sections, which you can access by clicking the links in the tab bar at the top of the page.

Many tutorials are also available that show how to use Oracle Big Data Manager.

Home

Home has two pages that provide different ways to quickly access tools and controls for viewing and working with data. The Quick Start page directs you to the pages for managing data, registering storage providers, creating pipelines to automate data movement, and analyzing data. It also provides a quick link for asking questions.

Use the Dashboard to build and access custom views of data, activity, status, and history.


Data

Use the Data pages to compare, copy, and move files and containers between data sources, including HDFS, on your Oracle Big Data Cloud Service instance. You can also upload files, extract the contents from a ZIP archive, import data to Apache Hive, and import and export Apache Zeppelin notes, among other tasks. The counters in the banner show the number of registered storage providers of a given type in Oracle Big Data Manager.


Jobs

Use the Jobs pages to review, manage, and create data transfer jobs.

Users with the user role will see just the jobs they've created.

Pipelines

Use the Pipelines pages to chain jobs into pipelines to define workflows.

Users with the user role will see just the pipelines they've created.


Monitoring

Use the Monitoring pages to monitor the performance of your cluster and the services running on it.

Administration

Use the Administration pages to manage users, roles, storage providers, and settings.

Users with the admin role, such as the default bigdatamgr user created when a cluster is provisioned, have full administrative privileges and see all of the Administration tabs. Users with the user role have access to just the storage provider pages. These users can register new storage providers and manage the storage providers they've added.


Notebook

Use the Notebook pages to process and analyze data by using Apache Zeppelin notes.


When a note is open in the console, you can click Home in the page banner to return to the Notebook home page.

Manage Oracle Big Data Manager Users, Roles, and Access

An administrator must create Oracle Big Data Manager users at the command line. Once they've been created, you can edit user details and manage access in the Oracle Big Data Manager console.

Note:

By default, the bigdatamgr user is created and granted the administrator role in Oracle Big Data Manager. This user should be used to grant roles and register providers. The bigdatamgr user has the same password as the Cloudera Manager administrator that was defined in the Create Instance wizard when creating the cluster.


Add Oracle Big Data Manager Users

An administrator must create Oracle Big Data Manager users at the command line. Once they've been created, you can edit user details and manage access in the Oracle Big Data Manager console.

To add a user:

1. Open a command shell and use SSH to connect to a cluster node as the bigdatamgr user (or another user with administration privileges).

2. Open a root shell:

su -

3. Export the new user’s password to a password file:

echo user_password > user_password_file
chmod 600 user_password_file

where:

• user_password is the password for the new user.

• user_password_file is the password file for the new user. This file must have permissions 600.

Note:

It is a safer practice to define the user password as an environment variable and then pass that value to the command. When the value is passed as the value of the environment variable, the actual value won't be visible in the bash history. In this case, use the following, instead of the commands listed above.

echo ${USER_PASSWORD} > ${USER_PASSWORD_FILE}
chmod 600 ${USER_PASSWORD_FILE}

where:

• USER_PASSWORD is the environment variable containing the value of the password for the new user. The name of the environment variable can be any valid environment variable name.

• USER_PASSWORD_FILE is the environment variable containing the value of the password file for the new user. The name of the environment variable can be any valid environment variable name. This file must have permissions 600.

4. Add the user and create a home directory for the user in the cluster's HDFS file system:

/usr/bin/bdm-add-user --create-hdfs-home new_user user_password_file


where new_user is the new user name.

5. On the node where Oracle Big Data Manager runs, enter the following command to restart Oracle Big Data Manager. This reloads the user configuration from the database.

service bigdatamanager restart

Note:

By default, Oracle Big Data Manager is started on port 8888 on the third node of a cluster, the same node as Cloudera Manager.
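Putting the steps together, a complete session might look like the sketch below. This is illustrative only: the host name, user name, and file path are hypothetical placeholders, and the read prompt is just one way to supply the password without it appearing in the shell history.

ssh bigdatamgr@node03.example.com                    # connect to a cluster node (hypothetical host)
su -                                                 # open a root shell
read -s -p "Password for new user: " USER_PASSWORD   # prompt silently; value stays out of history
USER_PASSWORD_FILE=/tmp/newuser.pass
echo "${USER_PASSWORD}" > "${USER_PASSWORD_FILE}"
chmod 600 "${USER_PASSWORD_FILE}"                    # the file must have permissions 600
/usr/bin/bdm-add-user --create-hdfs-home newuser "${USER_PASSWORD_FILE}"
rm -f "${USER_PASSWORD_FILE}"                        # clean up the password file afterward
service bigdatamanager restart                       # reload the user configuration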

Edit User Details and Manage Roles

A user with administrator privileges can edit user details and manage roles in the Oracle Big Data Manager console.

Note:

User accounts can't be permanently deleted, but they can be disabled. Users with disabled accounts won't be able to sign in to Oracle Big Data Manager.

To access and modify user details and manage user roles:

1. Sign in to the Oracle Big Data Manager console as the bigdatamgr user, or as another user with administrator privileges.

2. Click the Administration tab at the top of the page, and then click Users on the left side of the page to show the list of users that have been added.

3. Click the menu icon in the row for the user, and then select Edit. To disable the user account, select Disable account.

4. Edit user details as desired. Click inside the Roles field to see and select from available roles.

The Roles page on the Administration tab lists the roles available for users and the permissions associated with each. The permissions for each role are configured by default in Oracle Big Data Manager and cannot be changed.

Control Access to Specific Providers

A user with administrator privileges can control access to storage providers.

Note:

When an Oracle Big Data Cloud Service cluster is created, the bigdatamgr user is created and granted the Oracle Big Data Manager Administrator role.


To control access to storage providers:

1. Sign in to the Oracle Big Data Manager console as the bigdatamgr user, or another user with administrator privileges.

2. Click the Administration tab at the top of the page, and then click the Storage providers tab on the left side of the page to show a list of registered storage providers.

3. Click the menu icon to the right of the storage provider you are providing access to, and select Manage users.

4. Use the arrows to move users from the left panel to the right panel to create an access list of users who will be able to see that provider in the web application. This doesn't give Write access to the storage. Users must have appropriate permissions to work with data in the provider.

Register Storage Providers with Oracle Big Data Manager

You must register storage providers with Oracle Big Data Manager to be able to see and use them in the console.

Note:

When an Oracle Big Data Cloud Service instance is created, an Oracle Cloud Infrastructure Object Storage Classic instance can be associated with it. When that's done, that storage provider is automatically registered with Oracle Big Data Manager.

To register a new provider:

1. Sign in to the Oracle Big Data Manager console as the bigdatamgr user, or as another user with administrator privileges.

2. Click the Administration tab at the top of the page, and then click the Storage providers tab on the left side of the page to show a list of registered storage providers.

3. Click the Register a new storage provider button.

4. On the General page of the Register storage providers wizard, enter a name and description for the storage provider, select the storage type, and then click Next.

5. On the Storage Details page, provide details for accessing the storage provider. You can test connection details by clicking the Test access to storage button.

• Amazon S3

– Access Key: Enter the access key ID.

– Secret Key: Enter the secret key.

See About Access Keys in the Amazon S3 documentation.

• GitHub


– Access token: Enter the access token for the GitHub repository. The token should have a scope of repo or public_repo. See Creating a personal access token for the command line in the GitHub documentation.

• HDFS

No settings are required because HDFS is part of the cluster where Oracle Big Data Manager is running and is registered automatically. Registration of remote HDFS (HDFS running on a different cluster than where Oracle Big Data Manager is running) is not supported.

• Oracle Cloud Infrastructure Object Storage

– Authentication type: Select the authentication type to be used for data access. Choices are Credentials or Oracle Cloud Instance Principal. See Calling Services from an Instance in the Oracle Cloud Infrastructure documentation. If you use an instance principal, you'll need to specify just the Storage URL and Compartment settings listed below.

– Storage URL: Enter the Oracle Cloud Infrastructure Object Storage URL or select one from the list.

– Tenant: Enter the Tenancy OCID.

– User: Enter the User OCID. See Where to Get the Tenancy's OCID and User's OCID in the Oracle Cloud Infrastructure documentation.

– Access to: The Compartment option is selected by default and is the only option that's currently available.

– Compartment: Enter the OCID of the compartment you want to register. The compartment OCID can be found in the Oracle Cloud Infrastructure console. Once this storage provider is registered, the compartment and its contents (buckets) can be accessed from the Oracle Big Data Manager console.

– Public key fingerprint: Enter the fingerprint of the public key. See Required Keys and OCIDs in the Oracle Cloud Infrastructure documentation.

– Private key: From the drop-down list, choose how you want to provide the private key. Depending on your choice, either paste the private key value in the text box, or upload the private key file. This is the private key in PEM format that corresponds to the public key whose fingerprint you entered above. See Required Keys and OCIDs in the Oracle Cloud Infrastructure documentation.

– Key passphrase: (Optional) If the private key is encrypted with a password, enter that password here so the private key file can be decrypted and used for authentication.

– Automatic proxy detection: Select this check box if the cluster is running behind a proxy server and you want to use automatic proxy detection.

• Oracle Cloud Infrastructure Object Storage Classic

– Storage URL: Enter the authentication URL of the Oracle Cloud Infrastructure Object Storage Classic service instance. For example: https://storage.us2.oraclecloud.com/auth/v2.0/tokens

– Tenant: Enter the tenant name for your Oracle Cloud Infrastructure Classic service instance.


– Username: Enter a user name that corresponds to a user account associated with the storage service.

– Password: Enter the password associated with the user name.

• Oracle Database

– Username: Enter a user name that corresponds to a user account for the database. This is the user name that should be used to connect to the database.

– Password: Enter the password associated with the user name.

– JDBC URL: Enter the Oracle database connect string in the following format: jdbc:oracle:thin:@//host:port/service_name, where host and port are the database host and port and service_name is the database service name. For example: jdbc:oracle:thin:@//myhost:1521/myservicename

• Oracle MySQL

– Username: Enter a user name that corresponds to a user account for the database. This is the user name that should be used to connect to the database.

– Password: Enter the password associated with the user name.

– JDBC URL: Enter the MySQL database connect string in the following format: jdbc:mysql://host:port/[database_name], where host and port are the database host and port and database_name is optional. For example: jdbc:mysql://myhost:3306 or jdbc:mysql://myhost:3306/mydatabase

6. On the Access page, use the arrows to specify which users can see this storage from within Oracle Big Data Manager. This setting only affects whether users can see the storage. Whether they can access data on a given storage depends on the settings of the service that provides the storage.

7. Review the details on the Confirmation page, and then click the Register button.

View Oracle Big Data Manager Settings

Oracle Big Data Manager is installed and configured by default when you create an Oracle Big Data Cloud Service cluster. Users with administrator privileges can manage some configuration settings if necessary (such as for troubleshooting and so on), but in general the settings should be left at the defaults.

To view and manage configuration settings:

1. Sign in to the Oracle Big Data Manager console as the bigdatamgr user, or as another user with administrator privileges.

2. Click the Administration tab at the top of the page, and then click the Settings tab on the left side of the page to show configuration settings.

3. Manage settings as desired. You can:

• Use the text box to filter settings

• Double-click values to edit settings

• Click the Restore icon to restore the default value of a single setting

• Click the Restore all button to restore the default values of all settings


3 View Data in Oracle Big Data Manager

You can view data sources, data, and data properties in the Oracle Big Data Manager console.

Topics:

• Display and Navigate Storage Providers

• Preview Content from GitHub

• View Data Properties

Display and Navigate Storage Providers

You can display and navigate through storage providers in the Data section of the Oracle Big Data Manager console.

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

The Data explorer page contains two panels, each of which shows a data source. (One way to copy data is to drag items from one panel to the other.)

To display and navigate through a data provider:

• Display a storage provider by clicking the Storage list at the top of the panel and selecting the storage provider.

• Drill down by double-clicking items (folders, and so on) under Name in the panel.

• Navigate back up the hierarchy by clicking on an item in the breadcrumbs below the toolbar, for example: / user hive warehouse.


Preview Content from GitHub

In the Data section of the Oracle Big Data Manager console, you can preview the contents of data in GitHub, including table data presented in a table viewer.

To preview content:

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. Right-click the item in GitHub whose content you want to preview and select the command to show the data. The menu command varies depending on the type of data; for example, Show file content or Show table data.

View Data Properties

In the Data section of the Oracle Big Data Manager console, you can view properties of data objects and containers.

To view the properties:

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. Navigate to the object or container, right-click it, and select Properties from the menu.

Depending on what kind of item you examined, properties such as the following are displayed:

• Location

• Size

• Modified date

• Owner

• Roles


• Read/write permission


4 Transfer and Compare Data

In the Oracle Big Data Manager console, you can create jobs to copy, move, and compare data. You can run the jobs once or repeatedly, on a set schedule. You can also upload files from your local computer to HDFS and upload data into Apache Hive.

Topics:

• Copy Data (Including Drag and Drop)

• Copy Data (Including from Multiple Sources)

• Copy Data Via HTTP

• Upload Files from a Local Computer to HDFS

• Move Data in HDFS

• Import Data into Apache Hive

• Compare Data Sets

Copy Data (Including Drag and Drop)

In the Data section of the Oracle Big Data Manager console, you can copy data between storage providers by creating copy jobs.

To copy data from one storage provider to another:

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. In one panel, select a destination storage provider for the copy job from the Storage drop-down list, and then navigate to a folder or container by selecting a location in the breadcrumbs or by drilling down in the list below it.

3. In the other panel, select a source storage provider from the Storage drop-down list, and then navigate to the folder or container containing the file, folder, or container you want to copy.


4. Do any of the following:

a. Drag the source file, folder, or container from the source and drop it on the target.

b. Right-click the item you want to copy and select Copy from the menu.

c. Select the item you want to copy and click Copy in the toolbar.

5. In the New copy data job dialog box, provide values as described below.

General tab

• Job name: A name is provided for the job, but you can change it if you want.

• Job type: This read-only field describes the type of job. In this case, it's Data transfer - copy.

• CPU utilization: Use the slider to specify CPU utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. If you set this to a higher value, you'll have more CPUs for the job, which can mean better performance when you're copying a large number of files. But assigning more CPUs to a job also means there will be fewer CPUs in the cluster available for other tasks.

• Memory utilization: Use the slider to specify memory utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. Assigning more memory to a job can increase its performance, but also leaves less free memory available for other tasks. If the job is given too little memory, it will crash. If the job is given more memory than what's currently available, it remains in a PENDING state until the requested amount of memory becomes available.

• Synchronize destination with sources: Select this option to synchronize the destination with sources so files or parts of files that have already been copied aren't copied again. This check box is deselected by default. Generally, if you're copying data that's not in the destination location at all, it's better to leave this option deselected so faster data transfer takes place. If you're copying data that's already in the destination location and you just want to update some of it, it's better to select this check box so only new and updated data is detected and transferred.

• Overwrite existing files: Select this option to overwrite existing files of the same name in the target destination. This is selected by default.

• Run immediately: Select this option to run the job immediately and only once. This is selected by default.

• Repeated execution: Select this option to schedule the time and frequency of repeated executions of the job. You can specify a simplified entry, or click Advanced entry to enter a cron expression (an illustrative expression appears after this procedure).

Advanced tab

• Block size: Select the file chunk size in HDFS from the drop-down list. This setting defaults to the default block size in Hadoop.

• Number of executors per node: Specify the number of CPU cores. The default is 5. If you want to execute this job in parallel with other Spark or MapReduce jobs, decrease the number of cores to increase performance.


• Memory allocated for driver: Select the memory limit from the drop-down list. Memory allocated for the driver is memory allocated for the Application Driver responsible for task scheduling. The default is 1 GB.

• Custom logging level: Select this option to log the job's activity and to select the logging level. The default logging level is INFO.

6. Click Create.

The Data copy job job_number created dialog box shows minimal status information about the job. When the job completes, click View more details to show more details about the job in the Jobs section of the console. You can also click this link while the job is running.

7. Review the job results. The tabs on the left provide different types of information. You can also stop or remove running jobs and rerun or remove completed jobs from the menu for the job on each tab.

• The Summary tab shows summary information for the job.

• The Arguments tab shows the parameters passed to the job.

• The Job output tab shows job output, which you can also download.

Also see Manage Jobs in Oracle Big Data Manager.
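As one illustration of the Repeated execution option above: assuming the advanced entry accepts standard five-field cron syntax (the exact dialect accepted by the scheduler isn't specified here), an expression that runs the job at 2:00 a.m. every day would be

0 2 * * *

and 0 2 * * 0 would run it at 2:00 a.m. every Sunday.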

Copy Data (Including from Multiple Sources)

In the Data section of the Oracle Big Data Manager console, you can create, schedule, and run jobs that include multiple sources. You can also copy via HTTP(S).

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. In either panel, select a destination storage provider for the copy job from the Storage drop-down list.

3. On the toolbar for that panel, click Copy here from HTTP(S).

4. In the New copy data job dialog box, select the source to be copied:

a. Click the Select file or directory button next to Sources.

b. From the Location drop-down list, select the storage provider from which you are copying. The list shows the storage providers registered with Oracle Big Data Manager.

c. If desired, select Use file as link to list of files to link to a list of files instead of linking directly to a single file.

If you select this option, the file must be a .csv file, and each line of the file must satisfy the following schema: link_to_file[,http_headers_encoded_in_Base64], where http_headers_encoded_in_Base64 is optional. For example:

http://172.16.253.111/public/big.file
https://172.16.253.111/public/small.file
http://172.16.253.111/private/secret.file,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
https://oracle:H4ppyF0x@172.16.253.111/private/small.file


d. Navigate to and select the item you're copying. You can also specify a path manually in the Path field. For HTTP(S), enter the URI of the source in the URI field.

e. Click Select.

5. If you want to copy from multiple sources in the same copy job, click the Add source button and repeat the tasks in the previous step.

6. If you want to change the destination for the copy job, click inside the Destination field and edit the current location.

7. In the New copy data job dialog box, provide values as described below.

General tab

• Job name: A name is provided for the job, but you can change it if you want.

• Job type: This read-only field describes the type of job. In this case, it's Data transfer - import from HTTP.

• CPU utilization: Use the slider to specify CPU utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. If you set this to a higher value, you'll have more CPUs for the job, which can mean better performance when you're copying a large number of files. But assigning more CPUs to a job also means there will be fewer CPUs in the cluster available for other tasks.

• Memory utilization: Use the slider to specify memory utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. Assigning more memory to a job can increase its performance, but also leaves less free memory available for other tasks. If the job is given too little memory, it will crash. If the job is given more memory than what's currently available, it remains in a PENDING state until the requested amount of memory becomes available.

• Overwrite existing files: Select this option to overwrite existing files of the same name in the target destination. This is selected by default.

• Run immediately: Select this option to run the job immediately and only once. This is selected by default.

• Repeated execution: Select this option to schedule the time and frequency of repeated executions of the job. You can specify a simplified entry, or click Advanced entry to enter a cron expression.

Advanced tab

• Block size: Select the file chunk size in HDFS from the drop-down list.

• Number of executors per node: Specify the number of cores. The default is 5. If you want to execute this job in parallel with other Spark or MapReduce jobs, decrease the number of cores to increase performance.

• Memory allocated for driver: Select the memory limit from the drop-down list. Memory allocated for the driver is memory allocated for the Application Driver responsible for task scheduling. The default is 1 GB.

• Custom logging level: Select this option to log the job's activity and to select the logging level. The default logging level is INFO. If this check box is not selected, the logging level for the job defaults to whatever is configured in the cluster.


• HTTP proxy: If this data transfer type is HTTP(S) and if you have HTTP(S) header information stored in a file, you can use that header information in the HTTP(S) request header. Click Select file, select the storage provider from the Location drop-down list, and navigate to and select the file that contains the HTTP(S) header information. For HTTP(S), enter the URI for the file in the URI field. Click Select file.

The structure of the files with HTTP(S) headers is regex_pattern,http_headers. For example, the following file applies specific HTTP(S) headers for files that contain image in their path or name. Note that HTTP(S) headers must be Base64 encoded:

.*image.*,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
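To produce the Base64-encoded header portion used in these files, you can encode the raw header text with a standard utility. This is an illustrative sketch with hypothetical credentials, using the common coreutils base64 tool:

# encode the user:password pair for HTTP Basic authentication
CREDS=$(printf '%s' 'myuser:mypassword' | base64)
# then encode the complete header line for use in the header or .csv file
printf '%s' "Authorization: Basic ${CREDS}" | base64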

8. Click Create.

The Data copy job job_number created dialog box shows minimal status information about the job. When the job completes, click View more details to show more details about the job in the Jobs section of the console. You can also click this link while the job is running.

9. Review the job results. The tabs on the left provide different types of information. You can also stop or remove running jobs and rerun or remove completed jobs from the menu for the job on each tab.

• The Summary tab shows summary information for the job.

• The Arguments tab shows the parameters passed to the job.

• The Job output tab shows job output, which you can also download.

Also see Manage Jobs in Oracle Big Data Manager.

Copy Data Via HTTP

In the Data section of the Oracle Big Data Manager console, you can create, schedule, and run jobs that copy data from a source on a web server by using the HTTP protocol.

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. In either panel, select a destination storage provider for the copy job from the Storage drop-down list.

3. On the toolbar for that panel, click Copy here from HTTP(S).

4. In the New copy data job dialog box, select the source to be copied:

a. Click the Select file or directory button next to Sources.

b. From the Location drop-down list, select the storage provider from which you are copying. The list shows the storage providers registered with Oracle Big Data Manager.

c. If desired, select Use file as link to list of files to link to a list of files instead of linking directly to a single file.

If you select this option, the file must be a .csv file, and each line of the file must satisfy the following schema: link_to_file[,http_headers_encoded_in_Base64], where http_headers_encoded_in_Base64 is optional. For example:

http://172.16.253.111/public/big.file
https://172.16.253.111/public/small.file
http://172.16.253.111/private/secret.file,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK
https://oracle:H4ppyF0x@172.16.253.111/private/small.file

d. Navigate to and select the item you're copying. You can also specify a path manually in the Path field. For HTTP(S), enter the URI of the source in the URI field.

e. Click Select.

5. If you want to copy from multiple sources in the same copy job, click the Add source button and repeat the tasks in the previous step.

6. If you want to change the destination for the copy job, click inside the Destination field and edit the current location.

7. In the tabs of the New copy data job dialog box, provide values as described below.

General tab

• Job name: A name is provided for the job, but you can change it if you want.

• Job type: This read-only field describes the type of job. In this case, it's Data transfer - import from HTTP.

• CPU utilization: Use the slider to specify CPU utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. If you set this to a higher value, you'll have more CPUs for the job, which can mean better performance when you're copying a large number of files. But assigning more CPUs to a job also means there will be fewer CPUs in the cluster available for other tasks.

• Memory utilization: Use the slider to specify memory utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. Assigning more memory to a job can increase its performance, but also leaves less free memory available for other tasks. If the job is given too little memory, it will crash. If the job is given more memory than what's currently available, it remains in a PENDING state until the requested amount of memory becomes available.

• Overwrite existing files: Select this option to overwrite existing files of the same name in the target destination. This is selected by default.

• Run immediately: Select this option to run the job immediately and only once. This is selected by default.

• Repeated execution: Select this option to schedule the time and frequency of repeated executions of the job. You can specify a simplified entry, or click Advanced entry to enter a cron expression.

Advanced tab

• Block size: Select the file chunk size in HDFS from the drop-down list.


• Number of executors per node: Specify the number of cores. The default is 5. If you want to execute this job in parallel with other Spark or MapReduce jobs, decrease the number of cores to increase performance.

• Memory allocated for driver: Select the memory limit from the drop-down list. Memory allocated for the driver is memory allocated for the Application Driver responsible for task scheduling. The default is 1 GB.

• Custom logging level: Select this option to log the job's activity and to select the logging level. The default logging level is INFO. If this check box is not selected, the logging level for the job defaults to whatever is configured in the cluster.

• HTTP proxy: If this data transfer type is HTTP(S) and if you have HTTP(S) header information stored in a file, you can use that header information in the HTTP(S) request header. Click Select file, select the storage provider from the Location drop-down list, and navigate to and select the file that contains the HTTP(S) header information. For HTTP(S), enter the URI for the file in the URI field. Click Select file.

The structure of the files with HTTP(S) headers is regex_pattern,http_headers. For example, the following file applies specific HTTP(S) headers for files that contain image in their path or name. Note that HTTP(S) headers must be Base64 encoded:

.*image.*,QXV0aG9yaXphdGlvbjogQmFzaWMgYjNKaFkyeGxPa2cwY0hCNVJqQjQK

8. Click Create.

The Data copy job job_number created dialog box shows minimal status information about the job. When the job completes, click View more details to show more details about the job in the Jobs section of the console. You can also click this link while the job is running.

9. Review the job results. The tabs on the left provide different types of information. You can also stop or remove running jobs and rerun or remove completed jobs from the menu for the job on each tab.

• The Summary tab shows summary information for the job.

• The Arguments tab shows the parameters passed to the job.

• The Job output tab shows job output, which you can also download.

Also see Manage Jobs in Oracle Big Data Manager.

Upload Files from a Local Computer to HDFS

In the Data section of the Oracle Big Data Manager console, you can upload files and directories from a local computer to HDFS.

To upload files from a local computer to HDFS:

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. From the Storage drop-down list in either panel, select HDFS storage (hdfs) and navigate to the destination for the uploaded files.

3. On the toolbar of the panel you chose above, click File upload.


4. In the Files upload dialog box, click Select files and then Files upload or Directory upload to select the files or directories from your computer's file system. Alternatively, you can drag files or directories from your computer's file system to the Or drop files or directories here box.

You can upload multiple files or directories at one time by using either or both of the above methods.

5. Click Upload to upload the selected files or directories. You can expand Details to see details of the running or completed job, or click Close to close the dialog box. If you click Close while the job is running, you also have the option to cancel the job.

Move Data in HDFS

In the Data section of the Oracle Big Data Manager console, you can move data from one HDFS location to another.

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. From the Storage drop-down list in one of the panels, select HDFS storage (hdfs) and navigate to the target location.

3. From the Storage drop-down list in the other panel, select HDFS storage (hdfs), navigate to the item you want to move, and select it.

4. On the toolbar for the panel containing the item to be moved, click Move.

5. When prompted, click Move, and then click Close to close the dialog box.

Import Data into Apache Hive

In the Data section of the Oracle Big Data Manager console, you can import .csv files, Apache Avro files, and Apache Parquet files from HDFS to the HiveServer2 that's on an Oracle Big Data Cloud Service cluster.

To import one of the supported files into Hive:

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. From the Storage drop-down list in one of the panels, select HDFS storage (hdfs).

Apache Hive import might not work, depending on the access rights of the file and its parent directories. If so, you can copy or move the file to the /tmp directory and import from there.

3. Navigate to the file you want to import, right-click it, select Import to Apache Hive, and select how to import it: Import as CSV, Import as Apache Avro, or Import as Apache Parquet.

4. Provide import details.

• For Import as CSV, provide values on each tab of the Create a new job wizard and then click Create.


• For Import as Apache Avro and Import as Apache Parquet, specify the Hive table in the Table name field, and select the Hive database from the Database name drop-down list. The table name defaults to the name of the file you selected to import. The database defaults to the default Hive database. If other databases have been created in your Hive metastore, those databases are also listed in the drop-down list. Click Import to Hive.
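Oracle Big Data Manager generates the necessary Hive statements for you, but for reference, a CSV import of this kind corresponds roughly to creating a Hive table over the data and loading the file into it. The sketch below is illustrative only; the connection URL, table name, columns, and file path are hypothetical:

# Illustrative only; not what Oracle Big Data Manager literally runs.
# Assumes a hypothetical file /tmp/sales.csv with two comma-separated columns.
beeline -u jdbc:hive2://localhost:10000 \
  -e "CREATE TABLE sales (item STRING, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;" \
  -e "LOAD DATA INPATH '/tmp/sales.csv' INTO TABLE sales;"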

Compare Data Sets

In the Oracle Big Data Manager console, you can create, schedule, and run jobs that compare large data sets in different storage providers.

A compare job uses the Oracle Big Data Cloud Service odiff utility, and the computation runs as a distributed Spark application.

1. Click the Data tab at the top of the page, and then click the Explorer tab on the left side of the page.

2. Navigate to and select an item in the left panel and an item in the right panel to compare. You can only compare like items, for example file to file or directory to directory.

3. On the toolbar, click Compare.

4. In the New compare data job dialog box, provide values as described below.

General tab

• Job name: A name is provided for the job, but you can change it if you want.

• Job type: This read-only field describes the type of job. In this case, it’s Oracle Distributed Diff - compare.

• CPU utilization: Use the slider to specify CPU utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. If you set this to a higher value, you'll have more CPUs for the job, which can mean better performance when you're copying a large number of files. But assigning more CPUs to a job also means there will be fewer CPUs in the cluster available for other tasks.

• Memory utilization: Use the slider to specify memory utilization for the job. The proper job configuration will be calculated based on the cluster's shape. This is set to 30 percent by default. Assigning more memory to a job can increase its performance, but also leaves less free memory available for other tasks. If the job is given too little memory, it will crash. If the job is given more memory than what's currently available, it remains in a PENDING state until the requested amount of memory becomes available.

• Run immediately: Select this option to run the job immediately and only once. This is selected by default.

• Repeated execution: Select this option to schedule the time and frequency of repeated executions of the job. You can specify a simplified entry, or click Advanced entry to enter a cron expression (an example appears after these steps).

Advanced tab

• Diff file block size: Select a value from the drop-down list to specify the level of comparison. The lower the value, the more detailed the comparison will be and the more time it will take. The default value is 512 MB. This option does not apply to files stored in Oracle Cloud Infrastructure Object Storage Classic.

• Number of executors per node: Specify the number of CPU cores. The default is 5. If you want to execute this job in parallel with other Spark or MapReduce jobs, decrease the number of cores to leave capacity for those jobs.

• Memory allocated for driver: Select the memory limit from the drop-down list. This is the memory allocated to the application driver, which is responsible for task scheduling. The default is 1 GB.

• Custom logging level: Select this option to log the job’s activity and to select the logging level. The default logging level is INFO. If this check box is not selected, the logging level for the job defaults to whatever is configured in the cluster.

5. Click Create.

The Data compare job job_number created dialog box shows minimal status information about the job. When the job completes, click View more details to show more details about the job in the Jobs section of the console. You can also click this link while the job is running.

6. Review the job results. In particular, click the Comparison results tab on the left side of the page to display what’s the same and what’s different about the compared items.
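For reference, the advanced scheduling entry in the Repeated execution option takes a Quartz-style cron expression with a seconds field, the same format shown for bdm-cli job schedules later in this document. For example:

# Fields: seconds minutes hours day-of-month month day-of-week
# Run the job every day at 08:56 UTC:
0 56 8 * * ?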


5 Manage Jobs in Oracle Big Data Manager

Copying and comparing data in Oracle Big Data Manager is handled by creating jobs.

Topics

• View Execution History of All Jobs

• View Summary Information About a Job

• View the Arguments for a Job

• View Comparison Results for a Job

• View Job Output

View Execution History of All Jobs

You can view the execution history of all jobs in the Jobs section of the Oracle Big Data Manager console.

To view the execution history of all jobs:

1. Click the Jobs tab at the top of the page.

2. Click Executions history on the left side of the page.

View Summary Information About a Job

You can view summary information about running and completed jobs in the Jobs section of the Oracle Big Data Manager console.

To view summary information about a job:

1. Click the Jobs tab at the top of the page, and then click Executions history on the left side of the page.

2. In the row for the job you want to review, click the menu, and then select View Details. You can also rerun and remove completed jobs from this menu, or stop and remove running jobs.

3. Click the Summary tab on the left.

The information shown can include the following:

• Name

• Source and destination

• Description

• Schedule

• Status

• Progress


• Start and stop times

• Duration

View the Arguments for a Job

You can view the parameters that were passed to a job in the Jobs section of the Oracle Big Data Manager console.

To view the parameters for a job:

1. Click the Jobs tab at the top of the page, and then click Executions history on the left side of the page.

2. In the row for the job you want to review, click the menu, and then select View Details. You can also rerun and remove completed jobs from this menu, or stop and remove running jobs.

3. Click the Arguments tab on the left.

The parameters defined for the job are shown, for example, number of executors, block size, and so on.

View Comparison Results for a Job

You can view comparison results for a compare job in the Jobs section of the Oracle Big Data Manager console.

To view comparison results for a compare job:

1. Click the Jobs tab at the top of the page, and then click Executions history on the left side of the page.

2. In the row for the compare job you want to review, click the menu, and then select View Details. You can also rerun and remove completed jobs from this menu, or stop and remove running jobs.

3. Click the Comparison results tab on the left.

Job details are shown.

View Job Output

You can view output for a job in the Jobs section of the Oracle Big Data Manager console.

To view job output:

1. Click the Jobs tab at the top of the page, and then click Executions history on the left side of the page.

2. In the row for the job you want to review, click the menu, and then select View Details. You can also rerun and remove completed jobs from this menu, or stop and remove running jobs.

3. Click the Job output tab on the left.

The output of the job is shown. You can also download the output.


6 Manage Pipelines in Oracle Big Data Manager

You can use pipelines to automate data movement and chain jobs into workflows, so one job can automatically trigger another without the need for manual intervention. Pipelines are created and managed in the Pipelines section of the Oracle Big Data Manager console.

Oracle Big Data Manager has a graphical pipeline editor that makes it easy to build and schedule data movement pipelines. You can start with an empty pipeline, or use a pipeline template and then customize it to fit your needs. Pipelines in Oracle Big Data Manager are built on top of the Apache Oozie workflow scheduler for Hadoop and use Oozie in the background.

Tutorial

Topics:

• Create a Pipeline

• Configure Pipeline Properties

• Enable a Pipeline

• Run a Pipeline

• View Pipeline Details

• Edit a Pipeline

• Copy a Pipeline

• Delete a Pipeline

Create a Pipeline

You can use the graphical pipeline editor to build and run a pipeline. You can create and add new jobs to a pipeline, or add existing jobs. You can also schedule when you want the pipeline to run, and configure other pipeline properties.

To create a pipeline:

1. Click the Pipelines tab in the Oracle Big Data Manager console.

The Pipelines page displays. What you see on the page depends on whether a pipeline has already been created.

2. Do one of the following:

• If a pipeline has not yet been created, click Create new empty pipeline or Create pipeline from template. These options are displayed only if no pipelines exist. If you select the template option, the pipeline is populated with two jobs that you'll need to configure. To do so, double-click or right-click the job icons and specify job properties.


• If a pipeline has already been created, click the Create a new pipeline button.

The pipeline editor opens. The editor has two tabs on the left that are used to add jobs to the pipeline. Use the New component tab to create and add new jobs, and the Existing job tab to add existing jobs.

3. In the New pipeline field, enter a pipeline name or accept the default.

You can also click the Pipeline properties tab on the right and enter a name and description that way, as well as configure other pipeline properties. See Configure Pipeline Properties.

4. Add a job to the pipeline:

• To create and add a new job:

a. Click the New component tab.

b. Drag and drop the icon for the desired job type onto the plus sign (+), or click the plus sign (+) and select a job type from the Job list. The Edit job type properties dialog box opens, where job type is the type of job.


c. Provide details for the job, and then click Update. You'll see the job in the pipeline. If required information is missing, you'll see an exclamation mark in the job icon, indicating more configuration is needed. Double-click the job icon, or right-click the icon and select Properties, to open the edit dialog box and provide job properties.

• To add an existing job:

a. In the pipeline editor, click the Existing job tab.

b. Drag and drop a job onto the plus sign (+), or click the plus sign (+), expand Existing job, and select a job from the list. The existing job is added to the pipeline.

You can remove a job from the pipeline by right-clicking the job and selecting Remove from pipeline.

5. Repeat the previous step for all jobs you want to add to the pipeline.

Click the Save icon in the editor's toolbar (or use Ctrl + s) to save your pipeline periodically as you're creating it. You can also use the toolbar to perform other actions, such as close the editor (you'll be prompted to save changes), and undo or redo your changes.

6. Enable the pipeline. A pipeline must be enabled before it can run.

a. In the pipeline editor, click the Pipeline properties tab on the right. The Pipeline properties dialog box opens.


b. Click Yes next to Pipeline enabled (it's set to No by default), and then click Save. This saves changes but doesn't run the pipeline.

7. To run the pipeline, click the Save and run pipeline icon in the toolbar, and then confirm the action. If the pipeline hasn't been enabled, you'll be prompted to enable, save, and run the pipeline.

The page for the pipeline displays and shows execution status (you may need to refresh the page to see this).

Configure Pipeline Properties

You can configure many different properties for your pipeline, including whether it's enabled or disabled, how it's displayed in the pipeline editor, and when you want it to run. You can configure pipeline properties when you're creating a pipeline, or after it's been created.

To configure pipeline properties:

1. Click the Pipelines tab in the Oracle Big Data Manager console.

2. Do one of the following:

• For an existing pipeline:

a. In the row for the pipeline for which you want to configure properties, click the menu and select Edit. The edit page for the pipeline displays.

b. Click the Pipeline properties tab on the right, or right-click the Start icon in your pipeline and select Edit pipeline properties. The Pipeline properties dialog box opens.


• For a new pipeline, when you're creating it:

– In the pipeline editor, click the Pipeline properties tab on the right. The Pipeline properties dialog box opens.

3. In the Pipeline properties dialog box, configure properties as desired:

• On the General tab, you can:

– Provide a name and description for the pipeline.

– Enable or disable the pipeline. It's disabled (set to No) by default.

– Configure how you want the pipeline to display in the pipeline editor.

– Specify whether you want the job properties dialog box to open automatically when you add a new job to the pipeline. If you select No, you'll need to double-click or right-click the job icon in the pipeline to open the dialog box.

• On the Schedule tab, you can configure when the pipeline should run. You can specify a simplified entry, or click Advanced entry to enter a cron expression. You can also specify a data trigger. To run as scheduled, the pipeline must be enabled.


If a schedule is set for a pipeline, you'll see a clock icon at the start of the pipeline. To remove a schedule for a pipeline, deselect it in the Pipeline properties dialog box.

4. When you're done with your changes, click Save in the Pipeline properties dialog box. You can also exit the dialog box by clicking outside of it, and then click the Save icon in the editor's toolbar (or use Ctrl + s). Both options save changes without running the pipeline.

Enable a Pipeline

To run a pipeline you must first enable it. You can enable a pipeline when you're first creating it, or do it later.

To enable a pipeline:

1. Click the Pipelines tab in the Oracle Big Data Manager console.

2. Do one of the following:

• For an existing pipeline:

– In the row for the pipeline you want to enable, click the menu and select Enable. Conversely, if you want to disable a pipeline that's enabled, you'd select Disable from this menu.

• For a new pipeline, when you're creating it:

a. In the pipeline editor, click the Pipeline properties tab on the right.

b. Click Yes next to Pipeline enabled. This is set to No by default, which means the pipeline is disabled.

c. Click Save in the Pipeline properties dialog box. This saves changes but doesn't run the pipeline.


Run a Pipeline

A pipeline can be run on demand, or on a set schedule. A pipeline must be enabled before you can run it.

To run a pipeline:

1. Click the Pipelines tab in the Oracle Big Data Manager console.

2. Do one of the following:

• For an existing pipeline:

– In the row for the pipeline you want to run, click the menu, select Run now, and confirm the action. If the pipeline isn't enabled, select Enable in the menu, and then Run now.

• For a new pipeline, when you're creating it:

– In the pipeline editor, click the Save and run pipeline icon in the toolbar, and then confirm the action. If the pipeline isn't enabled, you'll be prompted to enable it.

You can also configure a pipeline to run automatically, on a set schedule. You do this on the Schedule tab in the Pipeline properties dialog box. See Configure Pipeline Properties.

View Pipeline Details

You can view many details about your pipeline.

To view pipeline details:

1. Click the Pipelines tab in the Oracle Big Data Manager console.

2. In the row of the pipeline for which you want to view details, click the pipeline name, or click the menu for the pipeline and select View Details.

The Pipeline pipeline name page displays, where pipeline name is the name of the pipeline.


3. Explore pipeline details:

• Expand Pipeline overview to get a graphical view of the pipeline and its jobs.

• In the Execution history section, click the link for an execution (say Execution #1) to see detailed information about the execution. You can also click the menu for the execution and select View Details to get to the same information.

Edit a Pipeline

You can edit many properties of an existing pipeline.

To edit a pipeline:

1. Click the Pipelines tab in the Oracle Big Data Manager console.

2. In the row for the pipeline you want to edit, click the menu and select Edit.

The edit page for the pipeline displays.

3. Make the desired changes, and then save them when you're ready. You can:

• Edit the pipeline name.

• Add a job to the pipeline.

• Remove a job from the pipeline (right-click the icon for the job and select Remove from pipeline).

• Edit a job's label and description (right-click the icon for the job and select Properties).

• Edit job properties (right-click the icon for the job, select Job details, and then select Edit from the menu to the right of the job name).

• Change pipeline properties (click the Pipeline properties tab).

Copy a Pipeline

You can copy a pipeline.

To copy a pipeline:

1. Click the Pipelines tab in the Oracle Big Data Manager console.

2. In the row for the pipeline you want to copy, click the menu and select Copy.

A copy of the pipeline is created with (copy) in the name.

Delete a Pipeline

You can delete a pipeline. Note that deleting a pipeline doesn't delete the jobs used in the pipeline.

To delete a pipeline:

1. Click the Pipelines tab in the Oracle Big Data Manager console.


2. In the row for the pipeline you want to delete, click the menu, select Remove, and then confirm the action.


7 Configure the Oracle Big Data Cloud Service Firewall

An administrator can configure the Oracle Big Data Cloud Service firewall for a cluster hosted in an availability domain by using graphical tools in the Oracle Big Data Manager console.

When You Can Use Oracle Big Data Manager to Configure the Firewall

You can use Oracle Big Data Manager to manage the Oracle Big Data Cloud Service firewall only for clusters hosted in availability domains. See How Network Access Control is Determined by the Host Region in Using Oracle Big Data Cloud Service.

By default, the only port open in a new Oracle Big Data Cloud Service cluster is port 22, which is used for SSH access. To use the Oracle Big Data Manager console to configure firewall rules in a new cluster, you must first open port 8888 on the third node of the cluster to allow connections to Oracle Big Data Manager. See Open a Port for Oracle Big Data Manager.

Who Can Configure the Firewall in Oracle Big Data Manager

A user with administrative credentials for Oracle Big Data Cloud Service and for Oracle Big Data Manager can configure the firewall through the Oracle Big Data Manager console.

Create Firewall Rules

Create firewall rules for clusters hosted in availability domains by setting options on the Administration Firewall page of the Oracle Big Data Manager console.

To create a firewall rule:

1. Sign in to Oracle Big Data Manager as the bigdatamgr user, or another user with administrative privileges.

2. Click the Administration tab at the top of the page, click the Firewall tab on the left side of the page, and then click the Create new rule button.

3. Enter values to configure one firewall rule, as shown below, and then click the Create button. Repeat this step for any additional rules you want to add.


Field Description

Port: Choose one of these options to open a port or ports for incoming requests. (Only requests that meet the other restrictions defined in this rule are allowed access through the specified ports.)
• Port number—Select a port number to allow connections on that port.
• Port range—Select beginning and ending port numbers to allow connections on that range of ports.
• SSH (22)—Select to allow connections on port 22, which is the port used for SSH access. (This port is open by default.)
• Hadoop User Experience (Hue) / Oracle Big Data Manager (8888)—Select to allow connections on port 8888, which is the port used for both Apache Hue and Oracle Big Data Manager.
• Cloudera Manager (7183)—Select to allow connections on port 7183, which is the port used for Cloudera Manager.
• All ports—Select to allow connections on all ports.

Protocol: Choose one of these options to restrict connections to requests received via the specified protocol:
• TCP—Select to allow only requests received via the Transmission Control Protocol (TCP).
• UDP—Select to allow only requests received via the User Datagram Protocol (UDP).
• ICMP—Select to allow only requests received via the Internet Control Message Protocol (ICMP).
• All—Select to allow requests received via any protocol.

Source: Enter the IP address of the incoming connection in dotted-decimal notation, in the format AA.BB.CC.DD. A whole network can be added by adding the prefix mask, with the notation AA.BB.CC.DD/EE. (See the example after this table.)

Destination: Choose one of these options to specify which node or nodes of the cluster can receive the incoming connection:
• All—Select to allow connections on all nodes.
• hostname1—Select to allow a connection on the given node.
• hostname2—Select to allow a connection on the given node.
• other hostnames...—Select to allow connections on the given nodes.

Comment: Optionally enter a comment.
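For example, the following Source values (illustrative addresses drawn from a documentation-reserved range) admit a single host and a whole network, respectively:

203.0.113.25       allows connections from one host only
203.0.113.0/24     allows connections from any host in the 203.0.113.x network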

Edit Firewall Rules

Edit firewall rules for the cluster on the Firewall page of the Oracle Big Data Manager console.

To edit a firewall rule:

1. On the Firewall page, find the row for the rule you want to edit, click the menu on the right side of the row, and select Edit.

2. Change any values, and then click the Update button. Repeat this step for any additional rules you want to edit. For descriptions of the fields, see Create Firewall Rules, above.


Refresh Firewall Rules

Refresh firewall settings for the cluster on the Firewall page of the Oracle Big Data Manager console.

To refresh the firewall rules:

• On the Firewall page, click the Apply rules button.

Remove Firewall Rules

Remove firewall rules for the cluster on the Firewall page of the Oracle Big Data Manager console.

To remove a firewall rule:

• On the Firewall page, find the row for the rule you want to remove, click the menu on the right side of the row, and select Remove.


8 Monitor

In the Oracle Big Data Manager console, you can monitor the status of services, clusters, and YARN queues in clusters.

Topics:

• Monitor the Status of Services

• Monitor the Status of Clusters

• Monitor YARN Queues in Clusters

Monitor the Status of Services

To monitor the status of services:

1. Click Monitoring in the Oracle Big Data Manager console menu bar.

2. Click Service Monitoring on the left side of the page.

In the Service Monitoring page:

• All services are listed with the current status of each - either OK or Down.

• The Service URL(s) for each service are listed along with the last update day and time.

3. Optionally, enable the Receive notifications option to receive background notifications for finished jobs, pipelines, and cluster monitoring.

Monitor the Status of Clusters

To monitor the status of clusters:

1. Click Monitoring in the Oracle Big Data Manager console menu bar.

2. If it’s not already selected, click Cluster Monitoring on the left side of the page.

A graphic and numeric display of cluster status is displayed, including:

• Memory usage (Used and Available)

• Nodes (the number of nodes and current status: Healthy or Unhealthy)

• CPU Cores (Used and Available)

3. Optionally, enable the Receive notifications option to receive background notifications for finished jobs, pipelines, and cluster monitoring.

Monitor YARN Queues in Clusters

To monitor YARN queues:

1. Click Monitoring in the Oracle Big Data Manager console menu bar.


At the bottom of the Monitoring page, the YARN queue for root is displayed.

2. In the Yarn Queues section, drill down on the root queue.

3. Then, drill down on root.users to display YARN queues for all root users.

The following YARN queue information is provided for each user:

• Minimum and maximum resources for CPU and memory

• Used resources for CPU and memory

• Cluster resources for CPU and memory

• The number of containers that are pending, allocated, and reserved

4. Optionally, enable the Receive notifications option to receive background notifications for finished jobs, pipelines, and cluster monitoring.


9 Analyze Data Interactively With Notes

Use notes to explore and visualize data iteratively.

Oracle Big Data Manager uses Apache Zeppelin as its notebook interface and coding environment. The following topics describe how to do some of the most common tasks with notes in Oracle Big Data Manager. For detailed information about Zeppelin, see the Zeppelin documentation. Not all Zeppelin features are supported in Oracle Big Data Manager.

Topics:

• Work with Notes

• Import a Note

• Export a Note

• Create a Note

• Rename a Note

• Clear the Output from Paragraphs in a Note

• Delete a Note

• View and Edit a Note

• Run a Note

• Organize Notes

• Monitor and Troubleshoot Notes

• Manage Notebook Interpreters Settings

Work with Notes

Import, create, and run notes in the Notebook section of the Oracle Big Data Manager console.

The Notebook home page lists your existing notes, along with controls for importing and creating new notes. When you open a note, it’s displayed in its own Note page as a collection of paragraphs that contain snippets of code for accessing services, running jobs, and displaying results. You can define and run the code quickly and iteratively, which provides flexibility for analyzing and visualizing your data.

Commands for performing actions on the entire note are on the toolbar at the top of each Note page.

Commands for performing actions on individual paragraphs are on the toolbar on the right side of each paragraph on the Note page. Each paragraph contains a code section, where you enter your source code, and an output section, which displays the output from executing that code.


Use the Commands on the Note Toolbar

Use the toolbar at the top of the Note page to perform actions on the entire note.

Item Action

Run all paragraphs Executes all paragraphs in the note sequentially, in the order they’re displayed.

Show/hide the code Shows or hides the code sections of all paragraphs in the note.

Show/hide the output Shows or hides the output sections of all paragraphs in the note.

Clear output Clears the output sections of all paragraphs in the note.

Clone this note Makes a copy of the note.

Export this note Exports the code and output sections of all paragraphs in the note to a JSON file in your web browser’s default download directory. If the output sections are long, consider clearing the output before exporting the note, to save space.

Switch to personal mode Enables multiple users to run the same note with slightly different input and see the output for their input, as opposed to seeing output of the latest execution regardless of who executed it.

Version control Commits the content of the note to the current repository. When you click this icon, you’re prompted for a commit message. The message you enter here is displayed in the Head list, described below.

Set revision Sets the head to the current revision.

Compare revisions Enables you to compare two different commits made using the Version control icon. See Revisions comparator in the Zeppelin documentation.

Head (revision) drop-down list Displays a list of previously committed revisions of the note, if any. By default, the head revision is selected. If you want to view a previous revision, select it from the list.

Search code Enables you to search code. Use the fields that display to find and replace code.

Move this note to trash Deletes the note.


Run scheduler Schedules the execution of all paragraphs in the note with a cron scheduler. (Note: This feature is disabled by default, but can be enabled. See Enable cron in the Zeppelin documentation.) When you select this option, a pop-up window displays the following options:

• Preset—A list of preset intervals. If one of the presets is adequate for your needs, click the link for the interval. It’s added as an expression to the cron expression field. Options are None, 1m, 5m, 1h, 3h, 6h, 12h, and 1d. Select None to remove any expressions that were added.

• Cron expression—Enter a custom cron expression, if you need something other than the above presets.

• Cron executing user—Enter the name of the user for running the cron job, if other than root.

• Auto-restart interpreter on cron execution—Terminates the interpreter after the cron job completes running. The interpreter restarts automatically the next time a paragraph is run.

List of shortcuts Displays keyboard shortcuts for notes.

Interpreter binding Enables you to configure interpreter settings for this note. Use this to enable and disable interpreters for this note, and to select the default interpreter that should be used when a paragraph doesn't specify an interpreter.

Note permissions Enables note owners to change note permissions. Enter the full user name for Oracle Big Data Manager users.

Note display mode drop-down list Controls the icons that are shown on the paragraph and note menus. Options are:
• default: All toolbars are always shown.
• simple: Toolbars are hidden and only shown when you hover over the paragraph.
• report: Toolbars are never shown.

Use the Commands on the Paragraph Toolbar

Use the toolbar on the right side of a paragraph panel to perform actions on that paragraph only.

Item Action

Spark job details Displays the Spark job details in your cluster's Spark web UI. This icon is displayed only for paragraphs that trigger a Spark job. To view these details, the cluster's firewall must be configured to allow access to the Spark web UI. If the firewall isn't configured to allow access to the Spark web UI, clicking the icon displays a blank page.


Status Shows the status of the paragraph. It can be one of the following:
• READY
• FINISHED
• ABORT
• ERROR
• PENDING
• RUNNING

Run this paragraph (Shift+Enter) Executes the code in the code section of the paragraph.

Show/hide editor (Ctrl+Alt+E) Shows or hides the code section of the paragraph.

Show/hide output (Ctrl+Alt+O) Shows or hides the output section of the paragraph.

Menu Opens a menu with the following options that apply to the current paragraph:
• nnnnnnnnn-nnnnnn_nnnnnnnnnn—The paragraph ID. Click the ID to copy it to the clipboard.
• Font size—Select a number from the drop-down list to set the font size for paragraph text.
• Width—Select a number from the drop-down list to set a width for the paragraph on a grid of 12 units. This allows you to organize the paragraphs in the grid.
• Move down—Move the paragraph one level down.
• Insert new—Insert a new paragraph below the current one.
• Run all below—Run the current paragraph and all paragraphs below it.
• Clone paragraph—Create and show a copy of the current paragraph.
• Show/Hide title—Show or hide the title of the paragraph. You can edit the title when it’s shown.
• Show/Hide line numbers—Show or hide line numbers in the code section of the paragraph.
• Disable run—Disable the Run button for this paragraph.
• Link this paragraph—Export the paragraph as an iframe and open the iframe in a new window.
• Clear output—Clear the output section for this paragraph.
• Remove—Delete the paragraph.

Import a Note

You can import a note in the Notebook section of the Oracle Big Data Manager console.

To import a note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.


2. On the Notebook home page, click the Import note link.

3. In the Import New Note dialog box, do the following:

a. Leave the Import As field blank to keep the original name of the note, or enter a new name to replace the original name.

b. Click Select JSON File to upload a file from your local computer, or click Add from URL to upload from a location on the internet.

Export a Note

You can export a note from the Notebook section of the Oracle Big Data Manager console.

To export a note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. On the Notebook home page, click the name of the note you want to export. The note opens.

3. On the toolbar next to the note’s title, click the Export this note icon. The note is exported to a JSON file in your web browser’s default download directory. The exported note has the same name as the original note.

Create a Note

You can create a note in the Notebook section of the Oracle Big Data Manager console.

To create a note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. Click the Create new note link, or click the Notebook drop-down list from the banner at the top of the page and select Create new note.

3. In the Create New Note dialog box, enter a name in the Note Name field. If you want to save the note to a different location, you can specify a path to a folder. If the folder doesn’t exist, Oracle Big Data Manager will create it.

For example, to create a note named my_note in a new or existing directory named my_notes_dir, enter the following in the Note Name field:

my_notes_dir/my_note

4. Select an interpreter from the Default Interpreter drop-down list. The available choices are spark, md, sh, python, and jdbc.

5. Click Create. The note displays with an empty paragraph. Each note is composed of one or more paragraphs.


Rename a Note

You can rename a note in the Notebook section of the Oracle Big Data Manager console.

Rename a Note Without Displaying the Note

To rename a note without displaying the note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. In the list of notes on the home page, hover the mouse pointer over the note you want to rename, and then click the Rename note icon.

3. In the Rename note dialog box, enter the new name for the note, and then click Rename. If you want to save the note to a different location, you can specify a path to a folder. If the folder doesn’t exist, Oracle Big Data Manager will create it.

For example, to rename a note named my_note to my_note_001 and move it to a directory named project_notes, enter the following:

project_notes/my_note_001

Renaming a Note That’s Currently Displayed

To rename a note that’s currently displayed:

1. Click the name of the note under the banner at the top of the page, and edit as needed. If you want to save the note to a different location, you can specify a path to a folder, as described above.

2. Click anywhere in the note or press the Enter key to accept the changes.

Clear the Output from Paragraphs in a Note

You can clear the output from a note in the Notebook section of the Oracle Big Data Manager console.

When you run the paragraphs in a note, the results are displayed beneath the code in each paragraph. To clear that output from all paragraphs in a note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. In the Notebook home page, hover over the note for which you want to clear the output, click Clear output, and then click OK.

3. Reopen the note and confirm that the output is cleared from the result section of all paragraphs.


Delete a Note

You can delete a note in the Notebook section of the Oracle Big Data Manager console.

To delete a note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. On the Notebook home page, hover over the note that you want to delete, click the Move note to Trash icon, and then click OK.

View and Edit a Note

You can view and edit a note in the Notebook section of the Oracle Big Data Manager console.

To view and edit a note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. The Notebook home page lists all existing notes. Select the note from that list. Alternatively, from anywhere in the Notebook section, click the Notebook drop-down list from the banner at the top of the page, and then select the name of the note.

3. Edit the note as desired. You can modify, add, remove, and run paragraphs. You can also perform other actions on the note and its paragraphs by using the toolbars on the page. When you make changes to a note or a paragraph, the changes are automatically saved. See Work with Notes.

Run a Note

You can run a note in the Notebook section of the Oracle Big Data Manager console. You can run an entire note or individual paragraphs in the note.

To run the note:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. On the Notebook home page, click the name of the note you want to run. The note opens.

3. Click the Run all paragraphs icon in the toolbar at the top of the page to execute all paragraphs in the note sequentially, in the order they’re displayed. If a paragraph contains code in the code section, the output of the code section is displayed beneath it.


To run an individual paragraph, click the Run this paragraph icon in the toolbar for the paragraph.

Organize Notes

You can organize notes into directories in the Notebook section of the Oracle Big Data Manager console.

You give a name to a note when you create it, and you can change the name of an existing note. See Create a Note and Rename a Note.

To specify that the note should be contained in a directory, add a qualifying path to the name. For example, to put a note named note1 into the Demo directory, specify its name as Demo/note1. To move that note to the Test directory, rename it as Test/note1. If the directory doesn’t exist, Oracle Big Data Manager creates it.

Monitor and Troubleshoot Notes

You can monitor the status of notes in the Notebook section of the Oracle Big Data Manager console.

To monitor the status of notes:

1. Click the Notebook tab at the top of the page, and then click the Job link in the banner.

The Job page displays, listing notes and their status.

2. Do any of the following:

• View the list of notes and their status

• Search notes

• Click a note name to open the note

• Mouse over the circle icons displayed for a note to quickly see the status of specific paragraphs

• Click the circle icons displayed for a note to easily open a note and go to specific paragraphs

• Run all paragraphs in a note

Manage Notebook Interpreters Settings

You can configure interpreters for running notes in the Notebook section of the Oracle Big Data Manager console.

Interpreters are bindings for how code should be interpreted and where it should be submitted for execution. The Zeppelin interpreter allows any language and data processing back end to be plugged into Zeppelin. Oracle Big Data Manager supports the following interpreters:

• JDBC

• Markdown language (md)

• Python


• Unix shell (sh)

• Spark

To configure interpreters:

1. If you’re not already in the Notebook section of the console, click the Notebook tab at the top of the page. If you’re already in the Notebook section, click the Home icon in the banner near the top of the page.

2. On the Notebook home page, click the Menu drop-down list, and then select Interpreter.

3. Use the Interpreters page to manage the available interpreters’ settings. You can create, edit, and remove settings. You can also restart interpreters.

Every interpreter belongs to a single interpreter group; however, an interpreter group can contain several interpreters. For example, the Spark interpreter group includes several individual interpreters.


10 Manage Data and Copy Jobs With the Oracle Big Data Manager SDKs

You can use the Oracle Big Data Manager SDKs to manage data and copy jobs from applications.

The Oracle Big Data Manager SDKs are available from GitHub:

• Python SDK

• Java SDK


11 Use the Oracle Big Data Manager bdm-cli Utility

Use the bdm-cli (Oracle Big Data Manager Command Line Interface) utility to copy data and manage copy jobs at the command line.

bdm-cli has several commands that duplicate odcp commands, but bdm-cli also includes additional commands for scheduling and managing copy jobs and other administrative tasks.

You have to download and install bdm-cli yourself, either on a node of the cluster or on a remote operating system. If you install it on your cluster, you must use SSH to connect to the cluster. If you install it on a remote system, you can run the commands without SSH. See Install the bdm-cli Utility.

There are no special requirements for using bdm-cli when it’s installed outside the cluster.

Install the bdm-cli Utility

The bdm-cli (Big Data Command Line Interface) is a command line utility for copying data and managing copy jobs. You can download and install bdm-cli from GitHub. You can install it on a remote operating system, so you don’t have to use SSH to connect to the cluster.

To install bdm-cli:

1. If you use a proxy server, first call:

export http_proxy="your_proxy_server"
export https_proxy="your_proxy_server"

2. Then call:

curl -L "https://github.com/jazeman/bdm-python-cli/blob/1.0/install-rpm?raw=true" | bash

Usage

You can use bdm-cli at the command line to create and manage copy jobs.

Syntax

bdm-cli [global_options] subcommand [options] [arguments]...


Supported Storage Protocols and Paths

The protocols and paths to the file systems and storage services supported by bdm-cli are:

• HDFS:

hdfs:///

• Oracle Cloud Infrastructure Object Storage Classic (formerly known as Oracle Storage Cloud Service):

swift://container.provider/

• Oracle Cloud Infrastructure Object Storage (formerly known as Oracle Bare Metal Cloud Object Storage Service):

oss:///container

For operations with Oracle Cloud Infrastructure Object Storage, you must specify the provider by using the --src-provider and --dst-provider options. For example, those options are used with bdm-cli create_job when used with Oracle Cloud Infrastructure Object Storage.

Finding a Job’s UUID

A number of bdm-cli subcommands require that you identify a job by its Universally Unique Identifier (UUID). To find UUIDs, execute bdm-cli list_all_jobs.
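For example, using the same placeholder variables as the examples later in this chapter:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} list_all_jobs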

Specifying Source and Destination Paths

When specifying sources and destinations, fully qualify the paths:

• source ...

File name qualified by protocol and full path, for example: hdfs:///user/oracle/test.raw

• destination

Directory name qualified by protocol and full path, for example: swift://container.storagename/test-dir

Setting Environment Variables

You can set some bdm-cli options as environment variables. For example, you can set the Oracle Big Data Manager URL and the user password file, as follows:

export BDM_URL=https://hostname:8888/bdcs/api && export BDM_PASSWORD=/tmp/password_file

All the bdm-cli options that can be set as environment variables are documented in the sections below.
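With those variables set, the corresponding options can be omitted from subsequent commands. A minimal sketch (the hdfs:/// path here is a placeholder):

# BDM_URL and BDM_PASSWORD are read from the environment,
# so --bdm-url and --bdm-passwd are omitted.
bdm-cli --no-check-certificate ls hdfs:///user/oracle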


Getting Help

To get help for bdm-cli, use:

bdm-cli --help

To get help for a specific command, use:

bdm-cli command --help

For example:

bdm-cli edit_job_template --help

Options

Options that can be used by all bdm-cli commands are explained below.

Option Description

--bdm-passwd path_to_password_file

Path to the Oracle Big Data Manager user password file.

Environment variable: BDM_PASSWORD

--bdm-url bdm_url Oracle Big Data Manager server URL.

Environment variable: BDM_URL

--bdm-username username Oracle Big Data Manager server user name.

Default value: oracle

Environment variable: BDM_USERNAME

-f [table|csv|json] Specify the output format:
• table (default)—Each field is displayed in a separate column.
• csv—Each record is displayed as a comma-separated list on a single line.
• json—The output is displayed in JavaScript Object Notation (JSON) format.

--fields fields Specifies comma-separated fields depending on the type of object.

-h, --help Show this message and exit.

--no-check-certificate Don't validate the server's certificate.

--proxy proxy Proxy server.

--tenant-name tenant_name Name of the tenant.

Default value: admin

-v Print the REST request body.

--version Show the Oracle Big Data Manager version and exit.


Subcommands

The following table summarizes the bdm-cli subcommands. For more details on each, click the name of the command.

Command Description

bdm-cli abort_job Abort a running job.

bdm-cli copy Execute a job to copy sources to destination.

bdm-cli create_job Execute a new job from an existing template.

bdm-cli create_job_template Create a new job template.

bdm-cli get_data_source Find a data source by name.

bdm-cli get_job Get a job by UUID.

bdm-cli get_job_log Get a job log.

bdm-cli list_all_jobs List all jobs from the execution history.

bdm-cli list_template_executions List all jobs from the execution history for the given template.

bdm-cli ls List files from a specific location.

bdm-cli abort_job

Abort a running job.

Syntax

bdm-cli abort_job [options] job_uuid

Options

Option Description

--force Force abort job.

-h, --help Show this message and exit.

Example

Abort a job.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} abort_job 24ef30e8-913b-4402-baf8-74b99c211f50


bdm-cli copy

Execute a job to copy sources to destination.

Syntax

bdm-cli copy [options] source... destination

Options

Option Description

--block-size block_size Specify the block size in bytes.

--description description Data source description.

--driver-memory-size driver_memory_size Specify the maximum amount of memory for the Oracle Storage Cloud Service driver.

--dst-provider oss_destination_provider Specify the provider of the destination, when using an Oracle Cloud Infrastructure Object Storage Classic destination.

-h, --help Show this message and exit.

--memory-size-per-node memory_size_per_node Specify the Spark executors memory limit in GB per node, for example, 40GB.

--number-of-executor-nodes number_of_executors_per_node Specify the maximum number of Spark executors per node, for example, 10.

--number-of-threads-per-node number_of_threads_per_node Specify the maximum number of threads per node.

--part-size part_size Specify the part size in bytes.

--recursive, --no-recursive Recursively copy (enabled by default).

--retry, --no-retry Retry data transfer in case of failure.

--src-provider oss_source_provider Specify the provider of the source, when using an Oracle Cloud Infrastructure Object Storage Classic source.

--sync, --no-sync Synchronize the source with the destination.

Example

Copy a file from HDFS to Oracle Storage Cloud Service:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} copy hdfs:///user/${DATA_USER}/1MFile.raw oss:///${DATA_USER} --dst-provider ${OSS_PROVIDER}


bdm-cli create_job

Execute a new job from an existing template.

Syntax

bdm-cli create_job [options] job_template_name

Options

Option Description

--run-now Execute the job immediately if job scheduling is set. Ignored otherwise.

--source source Source file, for example:

hdfs:///user/oracle/test.raw

--destination destination The destination directory, for example: swift://container.storagename/test-dir.

--driver-memory-size driver_memory_size Specify the maximum amount of memory for an Oracle Storage Cloud Service driver.

--memory-size-per-node memory_size_per_node Specify the Spark executors memory limit in GB per node, for example: 40G.

--number-of-executor-nodes number_of_executors_per_node Specify the maximum number of Spark executors per node, for example: 10.

--number-of-threads-per-node number_of_threads_per_node Specify the maximum number of threads per node.

--block-size block_size Specify the block size in bytes.

--part-size part_size Specify the part size in bytes.

--retry, --no-retry Retry data transfer in case of failure.

--sync, --no-sync Synchronize the source with the destination.

--recursive, --no-recursive Recursively copy (enabled by default).

--job-executable-class job_executable_class Main Java class used for the Spark job execution.

--src-provider oss_source_provider Specify the provider of the source, when using an Oracle Cloud Infrastructure Object Storage Classic source.

--dst-provider oss_destination_provider Specify the provider of the destination, when using an Oracle Cloud Infrastructure Object Storage Classic destination.

-h, --help Show this message and exit.
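Example

Run a job immediately from an existing template. (This sketch follows the style of the other examples in this chapter; the template name and variables are placeholders.)

/usr/bin/bdm-cli --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} create_job --run-now my_copy_template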


bdm-cli create_job_template

Create a new job template.

Syntax

bdm-cli create_job_template [options] job_template_name source ... destination

Options

Option Description

--abort-running-job

--no-abort-running-job

Abort an already running execution if the next scheduled execution is started.

--block-size block_size Specify block size in bytes.

--data-source-name data_source_name

Job's data source name.

--description description Job template description.

--dst-provider destination_provider

Specify the provider of the destination when using an oss:/// destination.

--environment environment Environment in JSON format:

{"envName1": "envValue2", "envName2": "env-Value2"}

-h

--help

Show this message and exit.

--history-size history_size Number of executions to keep in the history log.

--job-executable-class job_executable_class

Main Java class used for the Spark job execution.

--job-schedule job_schedule Specify a cron-like job schedule, for example:

"0 56 8 * * ?" means run every day at 08h 56m UTC.

--job-template-type job_template_type

Specify job template type. Allowed values are:

• DATA_MOVEMENT_COPY
• GENERAL

--libraries libraries Hadoop libraries, for example: OdcpLibraries.

This option can have multiple values, for example:

--libraries OdcpLibraries --libraries OdcpLibraries

--memory-size-per-node memory_size_per_node

Specify the Spark executors' memory limit in GB per node, for example: 40G.

--number-of-executor-nodes number_of_executors_per_node

Specify the maximum number of Spark executors per node, for example: 10.


--number-of-threads-per-node number_of_threads_per_node

Specify the maximum number of threads per node.

--part-size part_size Specify part size in bytes.

--recursive

--no-recursive

Recursively copy (enabled by default).

--retry

--no-retry

Retry data transfer in case of failure.

--src-provider oss_source_provider

Specify the provider of the source when using an Oracle Bare Metal Cloud Object Storage Service source.

--sync

--no-sync

Synchronize source with destination.

--tags tags User-defined tag. This option can have multiple values, for example:

--tags system --tags datamovement --tags copy
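
Example

Create a job template that copies a file from HDFS to Oracle Cloud Infrastructure Object Storage Classic. This is a representative sketch: the template name my_copy_template is an illustrative placeholder, and the source, destination, and connection options follow the pattern of the copy example earlier in this chapter:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} create_job_template my_copy_template hdfs:///user/${DATA_USER}/1MFile.raw oss:///${DATA_USER} --dst-provider ${OSS_PROVIDER}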

bdm-cli get_data_source

Find a data source by name.

Syntax

bdm-cli get_data_source [options] data_source_name

Options

Option Description

-h

--help

Show this message and exit.
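
Example

Look up a data source by name. This is a representative sketch: ${DATA_SOURCE_NAME} is an illustrative placeholder for the name of an existing data source:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} get_data_source ${DATA_SOURCE_NAME}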

bdm-cli get_job

Get a job by UUID.

Syntax

bdm-cli get_job [options] job_uuid


Options

Option Description

-h

--help

Show this message and exit.

Example

Get information on a job.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} get_job ${JOB_UUID}

bdm-cli get_job_log

Get a job log.

Syntax

bdm-cli get_job_log [options] job_uuid

Options

Option Description

-h

--help

Show this message and exit.
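
Example

Get the log for a job, following the same pattern as the get_job example above; ${JOB_UUID} holds the UUID of the job whose log you want:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} get_job_log ${JOB_UUID}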

bdm-cli list_all_jobs

List all jobs from the execution history.

Syntax

bdm-cli list_all_jobs [options]

Options

Option Description

-h

--help

Show this message and exit.

--limit limit Specify the size of the page.

--offset offset Specify the paging offset.


Example

List all jobs.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} list_all_jobs

Use the --offset and --limit options to restrict the results. For example, to get the eighth page when there are 20 rows per page, do the following:

bdm-cli list_all_jobs --offset 8 --limit 20

bdm-cli list_template_executions

List all jobs from the execution history for the given template.

Syntax

bdm-cli list_template_executions [options] job_uuid

Options

Option Description

-h

--help

Show this message and exit.
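
Example

List the execution history for a given template. This is a representative sketch: per the syntax above the command takes a UUID, and ${JOB_UUID} is an illustrative placeholder for it:

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} list_template_executions ${JOB_UUID}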

bdm-cli ls

List files from a specific location.

Syntax

bdm-cli ls [options] path_1 ... path_n

Options

Option Description

-h

--human-readable

Human-readable file sizes.

-d

--dirs-only

List directories only.

--provider oss_provider Specify the provider for Oracle Bare Metal Cloud Object Storage Service paths.


-h

--help

Show this message and exit.

Examples

List HDFS content under the selected user.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username ${DATA_USER} --bdm-passwd ${USER_PASSWORD_FILE} ls hdfs:///user/${DATA_USER}/integration_in --provider hdfs

List Oracle Cloud Infrastructure Object Storage Classic content under the selected user.

/usr/bin/bdm-cli -f json --no-check-certificate --bdm-url ${DATA_HOST}:8888/bdcs/api --bdm-username test20170324113533 --bdm-passwd ${USER_PASSWORD_FILE} ls oss:///${OSS_CONTAINER}/ --provider ${OSS_PROVIDER}


A Keyboard Shortcuts for Oracle Big Data Manager

You can use keyboard shortcuts to perform actions in the Oracle Big Data Manager console, as described below.

Table A-1 Keyboard Shortcuts in the Oracle Big Data Manager Console

Task Keyboard Shortcut

Change the currently selected item Up/Down/Left/Right Arrow

Open the selected directory/container Enter

Navigate back to the parent directory/container Backspace

Select the first item in the list Home or PageUp

Select the last item in the list End or PageDown

Switch between the left and right panels in the Data Explorer Tab

Deselect the currently selected item Esc

Open the Rename dialog (supported only on HDFS) F2

Reload the content of the current panel (same as the Refresh button) F5 or Ctrl+R

Invoke copy/move/paste actions Ctrl+C/X/V

If you’re using a Mac, use the Command key instead of the Control (Ctrl) key.
