PUBLIC

SAP Vora 2.0

Document Version: 1.2 – 2017-11-14

SAP Vora Installation and Administration Guide

Content

1 Introduction to SAP Vora
    1.1 SAP Vora Component Overview

2 Installation
    2.1 SAP Vora Installation Prerequisites
    2.2 Preparing for Installation
        Validate the Hadoop Cluster
        Collect Hadoop Cluster Information
    2.3 SAP Vora Software Downloads
    2.4 Installing SAP Vora on the Kubernetes Cluster
        Install SAP Vora on Kubernetes
        Install SAP Vora on OpenShift 3.6 Kubernetes
        Update the SAP Vora Configuration
        Deployment Configuration Parameters
        SAP Vora Directory Structure (Kubernetes)
        Command Line Parameters (Kubernetes)
        Migrate from SAP Vora 1.4.3
        Uninstall SAP Vora from the Kubernetes Cluster
        Troubleshooting
    2.5 Installing SAP Vora on the Hadoop Cluster
        SAP Vora Installer Phases
        Get Connection Details from the Kubernetes Dashboard
        Install SAP Vora on Hortonworks, Cloudera, or MapR
        SAP Vora Directory Structure (Hadoop)
        Command Line Parameters (Hadoop)
    2.6 Validate the SAP Vora Installation

3 Administration
    3.1 Enable Spark Auto-Registration
    3.2 Manage Users for SAP Vora Tools
    3.3 Check the SAP Vora Tools Connection Status
    3.4 SAP Vora Logs
    3.5 SAP Vora Diagnostic Tools
    3.6 Accessing SAP Vora from SAP HANA
    3.7 Best Practices: Administration and Operations
        HDFS
        Choosing a Cluster Manager
        Example Cluster Configuration Including a Client Machine (Jump Box)

4 Security
    4.1 Introduction to SAP Vora Security Features and Security Operator
    4.2 Enabling Authentication for SAP Vora Services and SAP Vora Users
    4.3 Enabling Authentication for SAP Vora Diagnostic Services
    4.4 Enabling TLS for SAP Vora services
    4.5 SAP Vora with Kerberos-Enabled Hadoop Clusters
        Prerequisites for Creating a Kerberized Hadoop Security Context
    4.6 SAP Vora with MapR
    4.7 Security of SAP HANA Connections to SAP Vora with VORAODBC
    4.8 SAP Vora on Kubernetes Security
    4.9 Data Protection in SAP Vora

1 Introduction to SAP Vora

SAP Vora is a distributed database system for big data processing. SAP Vora can run on a cluster of commodity hardware compute nodes and is built to scale with the size of the data by scaling up the compute cluster.

SAP Vora Engines

SAP Vora comes with support for various data types, such as relational data, graph data, collections of JSON (JavaScript Object Notation) documents, and time series. Each of these data types is managed by a specialized engine, which has tailored internal data structures and algorithms to natively support and efficiently process that type of data.

● The relational in-memory store allows you to load relational data into main memory for fast access using code generation for query processing.

● The relational disk engine provides relational data processing for data sets that do not fit into main memory.

● The time series engine allows you to compress time series data using various compression techniques, while it provides algorithms like cross correlation or histogram computation on the compressed data.

● The graph engine lets you perform commonly used graph operations on its data and is particularly suited for handling complex read-only analytical queries on very large graphs.

● The document store supports rich query processing over JSON data.

SAP Vora can load and index data from external distributed data stores, such as HDFS and Amazon S3. The data is either kept in main memory for fast processing, or, in the case of the relational disk engine, it is indexed and stored on the hard disks which are locally attached to the compute nodes. Data loaded to SAP Vora can be partitioned by user-defined partitioning schemes, such as range, block, or hash partitioning. SAP Vora contains a distributed query processor, which can evaluate queries on the partitioned data. Metadata (that is, table schemas, partition schemes, and so on) is stored in SAP Vora's own catalog, which persists the catalog entries using SAP Vora's Distributed Log (DLog) infrastructure.

Apache Spark Integration

The SAP Vora application programming interface (API) has a two-fold integration. Firstly, it provides an SAP Vora client that can be used independently of Spark. This client allows you to create, update, and delete database objects (tables, collections, and graphs). The second part of the API consists of an integration with Apache Spark, which enables you to query data stored in SAP Vora from Spark.

Spark queries are either evaluated using Spark mechanisms (after transferring data from the SAP Vora engines to Spark workers) or, wherever possible using the Spark Data Sources API, pushed into the respective SAP Vora engines for fast processing. The SAP Vora Spark Extension takes care of analyzing Spark plans and pushing the query (or parts of it) into the SAP Vora engines.


Spark applications can interact with SAP Vora by referencing the SAP Vora Spark Extension JAR file delivered with SAP Vora. External tools that make use of the Java Database Connectivity (JDBC) protocol can interact with SAP Vora through the Spark/Hive Thriftserver shipped with SAP Vora.

SAP Vora - SAP HANA Integration

SAP Vora integrates with SAP HANA. There are two approaches to combine or federate queries across SAP HANA and SAP Vora:

1. From SAP Vora to SAP HANA via the SAP HANA data source Spark extension

The SAP Vora Spark extension contains a Spark data source implementation that allows you to interact with SAP HANA systems. Therefore, Spark applications can combine data from both SAP HANA and SAP Vora.

2. From SAP HANA to SAP Vora via SAP HANA SDA and the VoraODBC protocol

SAP HANA SDA allows you to register SAP Vora as a remote source. Tables stored in SAP Vora can be exposed to the SAP HANA catalog as virtual tables. SAP HANA queries that involve virtual tables from the SAP Vora system lead to query execution on the SAP Vora system and an exchange of intermediate data with the SAP HANA system.

SAP Vora Tools

SAP Vora comes with a graphical UI that contains an SQL console, a graphical view modeler, and a data browser.

Kubernetes and Hadoop

SAP Vora is deployed to and runs in Kubernetes clusters. All SAP Vora services are containerized using Docker. Based on a cluster description specification (defined in a configuration file), SAP Vora clusters can be booted up in a set of Docker containers and run in a Kubernetes environment. Hardware and software failures are mitigated by failover mechanisms. The Hadoop-related part of SAP Vora (that is, the SAP Vora Spark Extension) is deployed to Hadoop clusters.

1.1 SAP Vora Component Overview

The SAP Vora landscape architecture consists of a Kubernetes cluster and Hadoop cluster set up side by side on separate servers. The SAP Vora distributed runtime, which includes the SAP Vora engines, is located on the Kubernetes cluster. Spark runs on the Hadoop cluster.

Note that the landscape architecture has changed significantly as of SAP Vora 2.0. In SAP Vora 1.4, SAP Vora was co-deployed on the Hadoop cluster. As of SAP Vora 2.0, SAP Vora runs on a separate Kubernetes cluster.

A high-level overview of the SAP Vora component architecture is shown below:

The components outlined in blue are included in the SAP Vora installation software:

Component                 Description
Operator                  The Vora operator keeps track of events for StatefulSets and pods, and manages garbage collection when host volumes are used.
Consul                    SAP Vora uses HashiCorp Consul for service discovery and as a key-value store. Consul manages the service endpoints in the cluster and provides embedded health checks.
Transaction Coordinator   Controls the execution of queries on the graph, disk, document, and time series engines.
Monitoring                A diagnostic framework providing metrics (in Grafana) and logs (in Kibana).
Vora Tools                A web-based user interface with a data browser, SQL editor, and OLAP modeler.
Engines                   Provide specialized storage and processing capabilities for relational, graph, time series, and document data.
Catalog                   Provides a distributed metadata store. It stores changes to the metadata in the DLog Server.
DLog                      A distributed commit log providing persistence for the Vora Catalog.
Spark Extensions          The Spark extension library enhances Spark with additional functionality, such as DDL/SQL parsers, hierarchies, and OLAP modeling, and adds the semantics for persistent tables managed by the SAP Vora engines.
Thriftserver              Apache Hive Thrift Server with Vora extensions for connecting external tools such as Tableau and Zeppelin.


Note: You need to provide Kubernetes, Docker Registry, NFS (on-premise), Hadoop, SAP HANA, and so on yourself.

2 Installation

Before installing SAP Vora, review the installation prerequisites and perform the preparatory steps to ensure your Kubernetes and Hadoop clusters are properly configured. Then download the SAP Vora installation packages and install them on your clusters.

Complete the individual tasks in the following order:

Task                                                                               See
Ensure your clusters meet the installation requirements for SAP Vora               SAP Vora Installation Prerequisites [page 8]
Ensure your clusters are correctly set up                                          Preparing for Installation [page 11]
Download the SAP Vora installation packages from the SAP Software Download Center  SAP Vora Software Downloads [page 12]
Install the SAP Vora distributed runtime on Kubernetes                             Installing SAP Vora on the Kubernetes Cluster [page 13]
Install the SAP Vora Spark extensions on Hadoop                                    Installing SAP Vora on the Hadoop Cluster [page 30]
Verify that SAP Vora is correctly installed                                        Validate the SAP Vora Installation [page 36]

2.1 SAP Vora Installation Prerequisites

A Kubernetes cluster and a Hadoop cluster are prerequisites for installing SAP Vora. Review the prerequisites below to ensure your cluster setup meets the requirements for using SAP Vora.

Kubernetes (K8s)

Kubernetes must already be installed and must meet the following requirements:

● It must contain an overlay network
● It can access the Docker registry (see below)
● A Docker daemon socket (/var/run/docker.sock) must be mountable from the pods
● The Kubernetes DNS extension must be installed

The following Kubernetes platforms are supported:

● OpenShift 3.6
● CaaSP 1.0


Docker

You need to have set up a Docker infrastructure. You require Docker 1.12.6.

If you are behind a proxy server, Docker must be set up to use the proxy. This is necessary to build the Docker images during installation.

For information about how to configure your Docker proxy, see Control and configure Docker with systemd (https://docs.docker.com/engine/admin/systemd/).

Docker Registry

Your Docker registry must meet the following requirements:

● It must be accessible during installation.
● It must be accessible by the Kubernetes cluster.
● We recommend that you use a secure registry.
● You can see how to quickly run a registry (not production ready) at https://hub.docker.com/_/registry/.

Kubernetes Helm

The Kubernetes package manager, Kubernetes Helm v2.4.2, is required. See https://github.com/kubernetes/helm. For installation instructions, see https://github.com/kubernetes/helm/blob/master/docs/install.md.

Kubernetes kubectl

The Kubernetes command-line tool, kubectl, is required. See https://kubernetes.io/docs/tasks/kubectl/install/ .

Jumpbox Node

The node for running the SAP Vora Kubernetes Installer must meet the following requirements:

● A Linux machine with Bash installed
● It is connected to the Internet
● Docker is installed and able to push to the internal registry
● kubectl is installed and properly configured
● Helm is installed and properly configured
● Python is installed
● The Python YAML package (PyYAML) is installed
● If NFS Persistent Volumes (PVs) are to be created with the installer (see the Install SAP Vora on Kubernetes procedure), NFS should already be mounted on this node.

Hadoop Distributions

SAP Vora can be used with Hortonworks Data Platform (HDP), Cloudera Distribution Including Apache Hadoop (CDH), and MapR.

Operating Systems

For information about the supported operating systems, see the Product Availability Matrix (PAM).

Cluster Sizing

To enable efficient cluster computation using the SAP Vora extensions, the cluster nodes should have at least the following:

● 4 cores
● 16 GB of RAM
● 20 GB of free disk space for HDFS data
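
The minimums above can be checked with simple integer comparisons. The following is a minimal shell sketch, not part of the SAP Vora tooling: the function name meets_vora_sizing and the sample values are illustrative assumptions, and the three inputs (cores, RAM in GB, free disk space in GB) must be collected for each node by whatever means your environment provides.

```shell
# Minimum per-node resources taken from the sizing list above.
MIN_CORES=4
MIN_RAM_GB=16
MIN_HDFS_FREE_GB=20

# meets_vora_sizing CORES RAM_GB FREE_DISK_GB   (hypothetical helper)
# Returns 0 if all three values reach the minimums, non-zero otherwise.
meets_vora_sizing() {
    [ "$1" -ge "$MIN_CORES" ] &&
    [ "$2" -ge "$MIN_RAM_GB" ] &&
    [ "$3" -ge "$MIN_HDFS_FREE_GB" ]
}

# Example with made-up values for a single node.
if meets_vora_sizing 8 32 100; then
    echo "node meets the minimum sizing"
else
    echo "node is below the minimum sizing"
fi
```

The check only covers the three listed minimums; it says nothing about whether a node is sized adequately for a particular workload.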

Required Components

You require Spark 1.6.x or Spark 2.1.x on the Hadoop cluster running in YARN mode.

Kubernetes-Hadoop Communication

The Kubernetes cluster and Hadoop cluster must be able to communicate with each other through TCP. If your Hadoop cluster is configured using domain names, the Kubernetes pods must be able to resolve these.


2.2 Preparing for Installation

Perform these preparatory steps to ensure your clusters are correctly set up.

2.2.1 Validate the Hadoop Cluster

To ensure that your Hadoop cluster is working correctly, run a sample Spark application on the cluster, such as SparkPi, which calculates the approximate value of Pi.

Prerequisites

● SPARK_HOME has been set correctly. For example:

export SPARK_HOME=/usr/hdp/current/spark-client/

● You are able to access HDFS

Procedure

Execute the following:

Sample Code

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 2 --queue default $SPARK_HOME/lib/spark-examples*.jar 10 2>/dev/null

You should see something like this:

Pi is roughly 3.140292

For more information, see Spark Examples (http://spark.apache.org/examples.html).
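
If you script this validation, one option is to capture the output of the spark-submit call and look for the result line. The sketch below assumes that a successful run prints a line beginning with "Pi is roughly", as in the example output above. The function name spark_pi_succeeded is a hypothetical helper, and the sample string stands in for real captured output.

```shell
# spark_pi_succeeded OUTPUT   (hypothetical helper)
# Returns 0 if the captured text contains the SparkPi result line.
spark_pi_succeeded() {
    printf '%s\n' "$1" | grep -q 'Pi is roughly [0-9]'
}

# Stand-in for the text captured from the spark-submit call.
sample_output='Pi is roughly 3.140292'

if spark_pi_succeeded "$sample_output"; then
    echo "Hadoop cluster validation passed"
else
    echo "SparkPi result line not found"
fi
```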


2.2.2 Collect Hadoop Cluster Information

Before proceeding with the installation of SAP Vora on Hadoop, collect and document the following information about your Hadoop cluster. You will need to have this information at hand during the installation.

Procedure

Make a note of the following information:

○ SSH user and password if passwordless SSH is not enabled
○ List of host names in a text file if you want to install the Spark extensions on a subset of nodes. Otherwise this file will be generated automatically.
○ HDFS user
○ Kubernetes connection details (Transaction Coordinator and Catalog): available after the installation of SAP Vora on Kubernetes
○ For Hortonworks clusters:
    ○ Ambari user and password, cluster name, and UI host name
    ○ Hortonworks version
○ For Cloudera clusters:
    ○ Cloudera Manager user and password, cluster name, and UI host name
○ For MapR clusters:
    ○ CLDB address
    ○ Path to the Spark configuration directory
○ For extensions for Spark 1.6.x:
    ○ Paths to datanucleus JAR files
○ If authentication is enabled in SAP Vora on Kubernetes:
    ○ Authentication credentials
    ○ Path to a directory containing an authentication configuration file

2.3 SAP Vora Software Downloads

Download the two installation packages from the SAP Software Download Center.

The names of the installation packages are:

● SAP VORA 2 (containing the distributed runtime on Kubernetes)
● SAP VORA SPARK EXT 2 (containing the Spark integration running on Hadoop)

You must install the runtime package first.

You can find the packages in the SAP Software Download Center as follows:

1. Open the SAP Support Launchpad (https://launchpad.support.sap.com/).
2. Choose Software Downloads.
3. Locate the installation packages by searching directly using "vora" name combinations, for example, "vora 2.0". Then choose Available To Download or scroll down in the results list to the Available To Download section.

2.4 Installing SAP Vora on the Kubernetes Cluster

Use the SAP Vora Installer command-line tool to install SAP Vora on the Kubernetes cluster.

You execute the installer on one of the nodes of the Kubernetes cluster or on a jumpbox node. The selected node must meet the Jumpbox Node requirements listed in SAP Vora Installation Prerequisites.

The SAP Vora distributed runtime package contains the following:

● The SAP Vora Installer command line tool (written in Bash)
● Kubernetes deployment scripts for Consul and the SAP Vora services (Helm charts)
● Dockerfiles and binaries for the SAP Vora engines (technical name: Vora DQP (Distributed Query Processing)), Vora Tools, and Vora Thriftserver
● Dockerfiles and binaries for SAP Vora Operator and SAP Vora Security Operator
● Dockerfile for Consul
● The SAP Vora password generator
● Client tools for validation

2.4.1 Install SAP Vora on Kubernetes

Use the command-line tool to install SAP Vora on the Kubernetes cluster.

Prerequisites

● You have configured the kubectl command line tool to use the cluster on which you intend to install SAP Vora.

● You have configured your Docker registry and Docker daemon on the installation node such that you can build and push images to your Docker registry.

● To create a security context for a Kerberized Hadoop cluster, make sure that the required resources are located on the node where you run the SAP Vora installer. For more information, see SAP Vora with Kerberos-Enabled Hadoop Clusters [page 50] and Prerequisites for Creating a Kerberized Hadoop Security Context [page 51].

● If you are using OpenShift 3.6, consider the additional information in Install SAP Vora on OpenShift 3.6 Kubernetes [page 17].


Procedure

1. Select a node for the installation. Ensure that it meets the Jumpbox Node requirements listed in SAP Vora Installation Prerequisites.

○ The selected node can be one of the nodes in the cluster or a jumpbox node that is not a member of the Kubernetes cluster, as long as it satisfies the requirements.

○ If you are behind a proxy server, make sure the following environment variables have been set in the shell in which you run the installer:

http_proxy
https_proxy
no_proxy

2. Download the SAP Vora distributed runtime package to the selected node.

3. Extract the package into a directory of your choice.

For an example of how the directory structure should look, see SAP Vora Directory Structure (Kubernetes).

4. Set the following environment variables:

export DOCKER_REGISTRY=<DOCKER_REGISTRY_HOST>:<DOCKER_REGISTRY_PORT>
export NAMESPACE=<K8S_NAMESPACE>

5. Optional: If you want to change the default configuration for any of the SAP Vora services, edit the appropriate deployment/helm/<vora-deployment-name>/values.yaml file. See Deployment Configuration Parameters for parameter details.

6. Provide the names of the nodes on which the Vora stateful components will run in the stateful-replica-conf.yaml file:

a. Either manually edit the stateful-replica-conf.yaml file with the help of the kubectl get nodes command or run the following command to let the installer make an initial assignment:

./install.sh --assign-nodes

b. Verify that the assigned nodes are ready to be scheduled (that is, they are not cordoned, tainted, or in the NotReady status) by checking the content of the stateful-replica-conf.yaml file.

7. Start the installer by running the following command:

./install.sh

8. Enter a as the deployment type to select an on-premise installation.

Caution: For on-premise installations, host paths are used by default. In this case, the SAP Vora Operator deletes the data of Consul, DLog, and the disk engine if their StatefulSets are deleted (either manually or by the installer when deleting the SAP Vora deployment).

Caution: For on-premise installations, host paths are used by default. In this case, the data of an SAP Vora installation can be accessed by other applications in another namespace. This access is not restricted by Kubernetes.


9. To enable authentication for SAP Vora, enter true.

This enables the following security features: Username and password authentication for SAP Vora services and SAP Vora UIs, CSRF protection on SAP Vora UIs, deployment of the SAP Vora Kubernetes security operator for deploying security context definitions.

a. Provide a user name and password for the Vora admin user.

b. Specify the context configuration:

Output Code

Security context configuration:
[0] Hadoop Default
[1] Kerberized Hadoop Default
Please enter index of context type to add or any other input to cancel:

For a target Hadoop that is Kerberos enabled, enter 1 and press ENTER. Then enter the following information about the resource paths, when prompted:

Parameter                                          Example
Path to Kerberos configuration                     /kerberized-hadoop-conf/krb5.conf
Principal name                                     vora
Path to keytab file                                /kerberized-hadoop-conf/vora.keytab
Key renewal period in seconds                      60
Path to Hadoop configuration ($HADOOP_CONF_DIR)    /kerberized-hadoop-conf/hadoop
Path to Spark configuration ($SPARK_CONF_DIR)      /kerberized-hadoop-conf/spark

If Hadoop is not Kerberos enabled, you can still create a Hadoop default context. The SAP Vora engines, however, do not need a specific Hadoop context when Kerberos is not enabled.

Note: The installer temporarily copies files to an installation directory, automatically creates the relevant Kubernetes configmaps and secrets, and deletes the copied files when the installation finishes. If the installation ends prematurely, make sure that you delete these files yourself, since they contain sensitive information. This applies to the following paths:

./deployment/helm/vora-security-context/localized
./deployment/htpasswd

10. Specify whether you want to provision NFS PersistentVolumes (yes/no).

Enter no if you have already created persistent volumes (not necessarily with NFS). Otherwise enter yes and provide the following information when prompted:

a. Address of the NFS server
b. Remote NFS path: exported NFS path on the NFS server
c. Local NFS path: local mount path of the NFS server

11. Optional: Configure the SAP Vora Tools for SAP HANA connections.

12. Optional: Configure HDFS.

Enter the HDFS namenode in the following format: <namenode-host>:<namenode-port>
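
Two values entered in this procedure share the <host>:<port> form: DOCKER_REGISTRY in step 4 and the HDFS namenode in step 12. The following minimal shell sketch checks that form before the installer is started. The function name is_host_port and the example host names and ports are illustrative assumptions, not part of the SAP Vora installer, and the check does not verify that the host is reachable.

```shell
# is_host_port VALUE   (hypothetical helper)
# Returns 0 if VALUE has the form <host>:<port> with a non-empty host
# and an all-digit port; returns 1 otherwise.
is_host_port() {
    case "$1" in
        *:*) ;;                 # contains a colon, keep checking
        *) return 1 ;;
    esac
    hp_host=${1%:*}             # text before the last colon
    hp_port=${1##*:}            # text after the last colon
    [ -n "$hp_host" ] || return 1
    case "$hp_port" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    return 0
}

# Example values (made up for illustration).
for candidate in registry.example.com:5000 namenode.example.com:8020 namenode.example.com; do
    if is_host_port "$candidate"; then
        echo "ok: $candidate"
    else
        echo "not in <host>:<port> form: $candidate"
    fi
done
```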

Results

The installation process now starts. The installer builds the required Docker images and pushes them to the Docker registry. It then starts the Kubernetes deployment using Helm. If the installer fails, see the Troubleshooting section for more information.

On completion, the installer validates the installation and displays information about the deployed SAP Vora services, including their internal cluster IP addresses and ports. This information is also available on the Kubernetes dashboard. Note that SAP Vora services are NodePort services. By default, Kubernetes assigns the ports 30000-32767 to services with the type NodePort.

Next Steps

Open the Kubernetes dashboard and select your namespace on the left. Verify that the deployed SAP Vora services are shown under Deployments on the right.

To find the port numbers of the services from the Kubernetes dashboard, choose Services and discovery > Services:

● You require the Vora transaction coordinator and catalog port numbers when installing the SAP Vora Spark extensions on the Hadoop cluster.

● You can open the Vora Tools UI by adding the port number to the IP address of any node in the cluster:

http://<node-ip>:<port>
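
As a small illustration of the URL pattern and the default NodePort range mentioned above, the following shell sketch builds the Vora Tools address and warns if the port lies outside 30000-32767. The function names and the sample IP address and port are assumptions made for this example. A cluster may also be configured with a different NodePort range, in which case the warning does not apply.

```shell
# Default NodePort range as stated above.
NODEPORT_MIN=30000
NODEPORT_MAX=32767

# in_default_nodeport_range PORT   (hypothetical helper)
# Returns 0 if PORT is a number within the default NodePort range.
in_default_nodeport_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$NODEPORT_MIN" ] && [ "$1" -le "$NODEPORT_MAX" ]
}

# vora_tools_url NODE_IP PORT   (hypothetical helper)
# Prints http://<node-ip>:<port>; warns on stderr for unexpected ports.
vora_tools_url() {
    if ! in_default_nodeport_range "$2"; then
        echo "warning: port $2 is outside $NODEPORT_MIN-$NODEPORT_MAX" >&2
    fi
    printf 'http://%s:%s\n' "$1" "$2"
}

# Example with a made-up node IP and port.
vora_tools_url 10.0.0.12 31234
```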

Related Information

SAP Vora Software Downloads [page 12]
SAP Vora Directory Structure (Kubernetes) [page 22]
Deployment Configuration Parameters [page 19]


2.4.2 Install SAP Vora on OpenShift 3.6 Kubernetes

In general, an SAP Vora installation on OpenShift is the same as on normal Kubernetes, but due to some additional features in OpenShift you need to make a few adaptations.

Prerequisites

● Ensure that you talk to your OpenShift support contact first, because OpenShift will apply some changes on their side to enable SAP Vora deployment.

● Disk requirements:
    ○ /var/ - 130 GB, used in:
        ○ /var/lib/docker
        ○ /var/local/vora
        ○ /var/local/db
        ○ /var/log/dlog
    ○ 100 GB of free LVM storage for the docker-pool, required to support OpenShift (see Use the Device Mapper storage driver, https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#manage-devicemapper)

Procedure

1. Prepare your own namespace.

a. Log in to the OpenShift cluster once to make sure your account is in the system:

$ oc login -u <your username>
[...]
Password: <your password>

b. Create a project.

This creates your namespace and prepares an SAP Vora OpenShift environment for you (for example, with an OpenShift-specific RBAC).

$ oc new-project <projectname>

2. Grab the kubeconfig in either of the following ways:

○ By exporting it:

$ export KUBECONFIG=/path/to/kube.config

○ By copying it to ~/.kube/config.

3. Start the SAP Vora installation.

Ensure that the installer is called with the following parameters, which are important for OpenShift (these are additional to the parameters you normally require):

○ The Docker registry setting:

--docker-registry=docker-registry.default.<your-cluster-domain>:5000


# example docker-registry.default.cluster.local:5000

○ The Docker domain must be set to your project name (=namespace), since the OpenShift Docker registry checks both registry authorization and project authorization on your Docker images:

--docker-repository-domain=<projectname>

○ RBAC needs to be disabled, since OpenShift provides the service accounts:

--enable-rbac=no

○ The Tiller namespace needs to be set to your project name:

--tiller-namespace=<projectname>
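
The four OpenShift-specific parameters above can be assembled from two inputs, the project name and the cluster domain. The following shell sketch shows one way to do that. The function name openshift_install_args and the sample values myproject and cluster.local are illustrative assumptions; only the parameter names themselves come from the list above.

```shell
# openshift_install_args PROJECT CLUSTER_DOMAIN   (hypothetical helper)
# Prints the OpenShift-specific installer parameters, one per line.
openshift_install_args() {
    printf '%s\n' \
        "--docker-registry=docker-registry.default.$2:5000" \
        "--docker-repository-domain=$1" \
        "--enable-rbac=no" \
        "--tiller-namespace=$1"
}

# Example with a made-up project name and cluster domain.
openshift_install_args myproject cluster.local
```

The printed parameters would then be passed to ./install.sh together with any other parameters your installation requires.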

2.4.3 Update the SAP Vora Configuration

Upgrade an existing release on Kubernetes with an updated configuration.

Procedure

1. Go to the deployment/helm/<vora-deployment-name> directory.

2. Update the appropriate values.yaml file.

3. Run the installer with the update flag as well as the name of the service to be upgraded:

./install.sh --update-installation --limit-service=<vora-deployment-name>

Related Information

Deployment Configuration Parameters [page 19]


2.4.4 Deployment Configuration Parameters

The following parameters are used to configure SAP Vora for deployment on the on-premise Kubernetes cluster. The parameters are contained in the Helm values.yaml file for each Helm chart. The files are located in the deployment/helm/<component> directory.

Vora Consul

Parameter Name Default Value Description

volumeClaimAnnotations {} Annotation for PersistentVolumeClaim resources

dontUseExternalStorage true If set to true, external storage is not used. Only host paths (see the useHostPath parameter) and temporary directories can be used.

useHostPath true If set to true, host paths of nodes are used. If set to false when dontUseExternalStorage is true, temporary directories will be attached, which will cause data loss.

hostPath /var/local/vora/vora-consul Path used by Consul for storing persistent data
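The interaction between dontUseExternalStorage and useHostPath can be summarized as a small decision function. This is an illustrative sketch only: the function name and the returned labels are hypothetical, and the "external" branch (what happens when dontUseExternalStorage is false) is an assumption, since the table does not describe that case in detail.

```python
def consul_storage_mode(dont_use_external_storage=True, use_host_path=True):
    """Return a label for the storage that Consul would use.

    Defaults mirror the documented defaults (both true).
    """
    if not dont_use_external_storage:
        # Assumption: external storage is requested through
        # PersistentVolumeClaim resources in this case.
        return "external"
    if use_host_path:
        # Documented: host paths of the nodes are used.
        return "hostPath"
    # Documented: temporary directories are attached, which causes data loss.
    return "temporary"


if __name__ == "__main__":
    for flags in [(True, True), (True, False), (False, True)]:
        print(flags, "->", consul_storage_mode(*flags))
```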

Vora DQP

Note: Vora DQP (Distributed Query Processing) is the technical name for the SAP Vora engines.


docStore.replicas 1 Number of document store engine instances

graph.replicas 1 Number of graph engine instances

timeseries.replicas 1 Number of time series engine instances

relational.replicas 1 Number of relational engine instances

disk.hostPath /var/local/db Path used by the disk engine for storing persistent data

disk.dbSpaceSize 10 Gi Initial size of the disk engine dbspaces. Three dbspaces are used per instance. The minimum supported value is 10 Gi for a dbspace.

disk.replicas 1 Number of disk engine instances



disk.storageSize 50 Gi Storage size of the disk engine volumes: (disk.storageSize > 3 * disk.dbSpaceSize)

disk.volumeClaimAnnotation {} Annotation for PersistentVolumeClaim resources for the disk engine

disk.networkDriversList none Comma-separated list of network drivers to run (TCP/IP)

disk.temporaryCacheMemoryLimit 3000 Temporary buffer cache size in megabytes

disk.mainCacheMemoryLimit 3000 Main buffer cache size in megabytes

disk.largeMemoryLimit 3000 Maximum amount of heap memory in megabytes that can be used as the large memory pool

dlog.hostPath /var/local/vora/vora-dlog Path used by the distributed log for storing persistent data

dlog.StorageSize 10 Gi Storage size of the distributed log volumes

dlog.volumeClaimAnnotation {} Annotation for PersistentVolumeClaim resources for the distributed log

traceLevel.global warning Trace level for all components, optional values: debug, info, warning, error, fatal. It can be overridden when traceLevel.COMPONENT is set.

traceLevel.[component] null Trace level for component [component], optional values: debug, info, warning, error, fatal.

hanaWire.enabled true HANA Wire activation flag. When set to true, the connection is enabled on port 3<hanaWire.instanceNumber>15 (for example, port 30115 for the default instance number 01).

hanaWire.instanceNumber "01" HANA Wire connection instance number. It is valid when hanaWire.enabled is set to true.
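Two of the rules in the table above are simple arithmetic: the HANA Wire port is formed as 3, followed by the two-digit instance number, followed by 15, and disk.storageSize must be greater than three times disk.dbSpaceSize, with a minimum dbspace size of 10 Gi. The sketch below checks both; the function names are hypothetical and the sizes are plain numbers in Gi.

```python
def hana_wire_port(instance_number="01"):
    """Return the HANA Wire port for a two-digit instance number string."""
    if (
        len(instance_number) != 2
        or not instance_number.isascii()
        or not instance_number.isdigit()
    ):
        raise ValueError("instance number must be two digits, for example '01'")
    return int(f"3{instance_number}15")


def disk_sizes_are_valid(storage_gi=50, dbspace_gi=10):
    """Check disk.storageSize > 3 * disk.dbSpaceSize and dbspace >= 10 Gi."""
    return dbspace_gi >= 10 and storage_gi > 3 * dbspace_gi


if __name__ == "__main__":
    print(hana_wire_port("01"))
    print(disk_sizes_are_valid(50, 10))
```

The instance number is handled as a string because the leading zero is significant.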

Vora Thriftserver

A standard SAP Vora installation has one Thriftserver instance running internally, which is used by the SAP Vora Tools to connect to the SAP Vora Transaction Coordinator. This Thriftserver instance uses local Spark, and is therefore limited by node resources (such as CPU and RAM).

If you want to connect to SAP Vora with an external JDBC client like HiveServer2 JDBC or Simba JDBC, you need to modify the vora-thriftserver Helm chart before installing the distributed runtime.

Access the deployment/helm/vora-thriftserver/values.yaml file and change the property thriftserver.service.expose to true. This will deploy an additional Thriftserver instance that can be accessed through a special port on all Kubernetes workers. To get the connection port, you can visit the Kubernetes dashboard page Services and discovery and locate the service vora-thriftserver-http.

Unlike in previous versions of SAP Vora, the Thriftserver uses the HTTP protocol. Thus, it can be further exposed in a cloud deployment using an Ingress, Application Gateway, or similar Layer 7 load balancing technique. The JDBC client needs to be configured to use the HTTP protocol and the httpPath must be set to cliservice. An example JDBC URL for the Apache Hive driver is given below:

jdbc:hive2://<kubernetes_agent>:<assigned_port>/db;transportMode=http;httpPath=cliservice

If authentication is enabled on the cluster, clients need to authenticate with their username and password. Some drivers might need additional configuration to use SASL-PLAIN as the authentication mechanism. For example, the Simba driver needs to be configured as shown below:

jdbc:hive2://<kubernetes_agent>:<assigned_port>;AuthMech=3;transportMode=http;httpPath=cliservice;UID=<user>;PWD=<password>;
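The two URL patterns above differ only in a few segments, so they can be generated from the same inputs. The following is a minimal sketch with hypothetical helper names; the host, port, and credentials in the usage lines are placeholders, and special characters in user names or passwords are not escaped here.

```python
def hive_jdbc_url(host, port, database="db"):
    """Build the Apache Hive driver URL in HTTP transport mode."""
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        ";transportMode=http;httpPath=cliservice"
    )


def simba_jdbc_url(host, port, user, password):
    """Build the Simba driver URL using AuthMech=3 (user name and password)."""
    return (
        f"jdbc:hive2://{host}:{port};AuthMech=3;transportMode=http;"
        f"httpPath=cliservice;UID={user};PWD={password};"
    )


if __name__ == "__main__":
    # Placeholder values, not taken from a real cluster.
    print(hive_jdbc_url("k8s-worker-1", 30123))
    print(simba_jdbc_url("k8s-worker-1", 30123, "alice", "secret"))
```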


auth.enable false Enable or disable authentication for the HTTP endpoint

thriftserver.service.expose false Enable or disable the HTTP endpoint

thriftserver.args Add additional startup arguments for the Thriftserver

hdfs.configure false Enable or disable passing a HDFS namenode address that can be used by SAP Vora Tools

hdfs.args.namenode Actual HDFS namenode passed to SAP Vora Tools

hana.plain.configure false Enable or disable passing SAP HANA credentials that can be used by SAP Vora Tools

hana.plain.args.host Host of the SAP HANA machine

hana.plain.args.port Port of the SAP HANA machine

hana.plain.args.instance Instance number of the SAP HANA machine

hana.plain.args.tenantdatabase Tenant database of the SAP HANA machine

hana.plain.secrets.user User that is used to connect to the SAP HANA machine

hana.plain.secrets.password Password that is used to connect to the SAP HANA machine

Install MapR Client

If you want to use SAP Vora with a Hadoop cluster that is managed by MapR and read data from MapR-FS, you should execute the installer with the --install-mapr-client flag enabled.

Related Information

Simba JDBC: http://help.sap.com/disclaimer?site=https://www.simba.com/products/Hive/doc/JDBC_InstallGuide/content/jdbc/intro.htm

HiveServer2 JDBC: http://help.sap.com/disclaimer?site=https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC

2.4.5 SAP Vora Directory Structure (Kubernetes)

The directory structure of the SAP Vora installation package for Kubernetes is shown below.

.
├── deployment
│   └── helm
│       ├── README.md
│       ├── vora-consul
│       ├── vora-diagnostic
│       ├── vora-dqp
│       ├── vora-nfs
│       ├── vora-operator
│       ├── vora-security-context
│       ├── vora-security-operator
│       ├── vora-textanalysis
│       ├── vora-thriftserver
│       ├── vora-tools
│       └── vora-vflow
├── images
│   ├── consul
│   │   ├── docker-entrypoint.sh
│   │   └── Dockerfile
│   ├── vora-dqp
│   │   ├── bin
│   │   ├── common
│   │   ├── deps
│   │   ├── Dockerfile
│   │   ├── download_and_install_3rd_party_software.sh
│   │   ├── download_and_install_os_packages.sh
│   │   ├── etc
│   │   ├── IQ-16_1
│   │   ├── lib
│   │   ├── local_deps
│   │   ├── proto
│   │   ├── python
│   │   └── scripts
│   ├── vora-operator
│   │   ├── Dockerfile
│   │   └── vora-k8s-operator
│   ├── vora-security-operator
│   │   ├── init-container
│   │   └── operator
│   ├── vora-textanalysis
│   │   ├── bin
│   │   ├── common
│   │   ├── deps
│   │   ├── Dockerfile
│   │   ├── download_and_install_3rd_party_software.sh
│   │   ├── download_and_install_os_packages.sh
│   │   ├── helm
│   │   ├── lexicon
│   │   ├── lib
│   │   ├── local_deps
│   │   ├── protocol
│   │   ├── python
│   │   └── scripts
│   ├── vora-thriftserver
│   │   ├── Dockerfile
│   │   ├── sapjvm
│   │   ├── spark
│   │   └── vora-spark
│   ├── vora-tools
│   │   ├── Dockerfile
│   │   ├── hl-lib
│   │   └── vora-tools
│   └── vora-vflow
│       ├── bin
│       ├── copy-stdlib.sh
│       ├── Dockerfile
│       ├── Dockerfile-stdlib
│       └── stdlib
├── install.sh
├── license_agreement
└── tools
    ├── import.sh
    ├── log_collector.py
    ├── v2catalog_client
    ├── v2client
    └── vorapwd

2.4.6 Command Line Parameters (Kubernetes)

The SAP Vora Installer provides the following command line parameters:

# ./install.sh --help
Usage: ./install.sh [options]
End to end SAP Vora installation including preparing images and Kubernetes deployment
Options:
  -h, --help    print this help message
  -v, --version    print version information for the package and individual components
  -s, --show-summary    show deployment summary in the namespace, prints available services
  -t, --validate    validate installation by running queries
  -b, --build-images    build and push images into docker registry
  -i, --install    install vora services on Kubernetes
  -u, --update    build image(s) and upgrade helm deployment(s)
      consider using "--force-rebuilding-images" and/or "--image-tag-suffix" options
  -ui, --update-installation    upgrade helm deployment(s) without building image(s)
  -d, --delete    delete vora deployments
  -p, --purge    delete vora deployments including the security configurations
  -ls, --limit-service=SERVICE_NAME    install/upgrade/delete only specific service(s), available services:
      vora-operator vora-security vora-consul vora-dqp vora-tools vora-thriftserver vora-diagnostic
  -ss, --skip-service=SERVICE_NAME    skip installation of specific service(s), available services:
      vora-operator vora-security vora-consul vora-dqp vora-tools vora-thriftserver vora-diagnostic
  -a, --accept-license    accept license agreement
  -dt, --deployment-type=[onpremise/cloud]    specify deployment type
  -n, --namespace=NAMESPACE    Kubernetes namespace to install SAP Vora
  --enable-rbac=[yes/no]    do you want to enable RBAC? defaults to "yes"
  --docker-registry=HOST:PORT    docker registry to push the SAP Vora images
  --docker-repository-domain=DOMAIN    domain for docker images, defaults to "vora"
  --force-deletion    force deleting deployment and volumes
  --skip-preflight-checks    skip preflight checks
  --enable-authentication=[yes/no]    enable user/password authentication?
  --enable-security-operator=[yes/no]    enable security operator?
  --vora-admin-username=USERNAME    username for vora admin
  --vora-admin-password=PASSWORD    password for vora admin
  --provision-persistent-volumes=[yes/no]    do you want to provision persistent nfs volumes?
  --nfs-address=ADDR    address of nfs server
  --nfs-path=PATH    path on nfs
  --local-nfs-path=PATH    local path that nfs is mounted
  --tiller-namespace=NAMESPACE    tiller-namespace which defaults to "kube-system"
  --force-rebuilding-images    docker images will be forced for rebuilding with "--no-cache" option to docker build command
  --image-tag-suffix=SUFFIX    suffix to append to the end of docker image tags
  --vora-thriftserver-spark-version=VERSION    spark version to use in vora-thriftserver image, defaults to "1.6.1"
  --configure-hdfs=[yes/no]    do you want to configure HDFS?
  --hdfs-namenode=HOST:PORT    hdfs namenode as <namenode-host>:<namenode-port>
  --configure-hana=[yes/no]    do you want to configure HANA?
  --hana-host=HOST    hana host
  --hana-port=POST    hana port
  --hana-instance=INSTANCE    hana instance
  --hana-user=USER    hana user
  --hana-passwd=PASSWD    hana password
  --hana-tenantdatabase=TENANTDATABASE    hana tenant database
  --image-pull-secret=SECRETNAME    image pull secret name for private registry access
  --jce-policy-zip=JCE_ZIP_FILEPATH    path for zip file of Java Cryptography Extension (JCE) Jurisdiction Policy Files for SAP JVM
  --install-mapr-client    include installation of Mapr client library in vora-dqp docker image
  --use-hostpath-for-consul=[yes/no]    use hostpath for consul, defaults to "yes" for "onpremise" and to "no" for "cloud"
  --use-hostpath-for-dqp=[yes/no]    use hostpath for disk/dlog, defaults to "yes" for "onpremise" and to "no" for "cloud"

2.4.7 Migrate from SAP Vora 1.4.3

For backup and migration purposes, SAP Vora provides an export tool that allows you to export all schemas, tables, and data sources from an SAP Vora instance. It is available with SAP Vora 1.4.3.

Procedure

1. Before upgrading, log in to one of your nodes as the vora user and execute the export tool from the SAP Vora binary directory, as follows:

cd /opt/vora/lib/vora-catalog/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/vora/lib/vora-v2server/lib/
./v2catalog_client -i 2 --export --discovery localhost:8500 > export.sql



The SQL statements are written to export.sql and can be used to re-create all objects in a new SAP Vora instance.

2. After the installation of SAP Vora 2.0, use the import script that can be found in the installation package under tools to execute all statements at once:

./import.sh export.sql

Please note that the data source files need to be accessible in the new system under the old system path and that you have to issue LOAD TABLE [tablename] statements manually to populate the tables from the data sources. In Spark 1.6, you also have to register the SAP Vora tables to make them accessible. Use the SAP Vora Tools to do so.

2.4.8 Uninstall SAP Vora from the Kubernetes Cluster

Use the SAP Vora installer to uninstall SAP Vora from the Kubernetes cluster.

Context

Caution: Do not manually delete Kubernetes objects (namespace, deployments, pods, and so on) associated with your SAP Vora deployment. This will cause issues with the next installation in the same namespace. See the Troubleshooting section for more details.

Procedure

1. Run the installer with the --delete or --purge option:

./install.sh --delete # This will delete all SAP Vora deployments in the namespace.

./install.sh --purge # This will delete all SAP Vora deployments in the namespace and also remove security configurations if there are any.

Note: PersistentVolumeClaims (if there are any) are not deleted in this step. You therefore still have the chance to back up your data before deleting it permanently in the next step.

2. Use the following command to delete all persistent data:

kubectl delete -n <NAMESPACE> pvc --all


Note that if you are using host paths for persistent data (which is the default for on-premise installations), this step is not necessary. The SAP Vora operator has already removed the persistent data from the nodes in the previous step.

3. On the Kubernetes dashboard, check that your namespace no longer contains any objects. Sometimes objects are not properly deleted by Kubernetes so you need to delete them manually.

2.4.9 Troubleshooting

Installer failed in the middle of a deployment

1. Check the installer logs.

2. Open the Kubernetes dashboard and check the logs of the failing deployments.

3. After fixing the issue, you can rerun the installer without deleting the previous deployment. The installer should skip the existing deployments and only install the missing ones.

Installer fails at the validation step

If the installer fails or is stuck in the validation step, simply exit from the installer ( CTRL + C ) and rerun the validation step:

./install.sh --validate

If the issue still persists, first look for an error in the installer logs. Then check on the Kubernetes dashboard whether all pods are up and running. If everything is green, you need to check the logs of the DQP components that start with tx-coordinator.

Can I just delete the pods or namespace instead of running ./install.sh --delete/purge?

No, this will cause issues with the next deployments in the same namespace.

The installer deletes the components in the correct order. Because the vora-operator is deleted last, it can clean old data from the host paths. The installer also uses helm delete, since helm install is used for installation.

If you directly delete the Kubernetes resources (namespace, deployments, pods, and so on) before deleting SAP Vora, you will probably end up with host paths with old data and Tiller (a Helm server component) will not know that the deployment no longer exists. This will definitely cause issues with the next deployment in the same namespace.

If the Kubernetes resources are mistakenly deleted, you should either use another namespace or delete all Helm deployments in the applicable namespace and clean old data on the hosts.



Installer failed while building the Consul image

The following error occurred:

rm: can't remove '/root/.gnupg/S.gpg-agent.extra': No such file or directory The command '/bin/sh -c apk add --no-cache ca-certificates curl gnupg libcap openssl wget && if [ $http_proxy != "" ]; then gpg --keyserver-options http-proxy=$http_proxy --keyserver pgp.mit.edu --recv-keys 91A6E7F85D05C65630BEF18951852D87348FFC4C; else gpg --keyserver pgp.mit.edu --recv-keys 91A6E7F85D05C65630BEF18951852D87348FFC4C; fi && mkdir -p /tmp/build && cd /tmp/build && wget ${HASHICORP_RELEASES}/docker-base/${DOCKER_BASE_VERSION}/docker-base_${DOCKER_BASE_VERSION}_linux_amd64.zip && wget ${HASHICORP_RELEASES}/docker-base/${DOCKER_BASE_VERSION}/docker-base_${DOCKER_BASE_VERSION}_SHA256SUMS && wget ${HASHICORP_RELEASES}/docker-base/${DOCKER_BASE_VERSION}/docker-base_${DOCKER_BASE_VERSION}_SHA256SUMS.sig && gpg --batch --verify docker-base_${DOCKER_BASE_VERSION}_SHA256SUMS.sig docker-base_${DOCKER_BASE_VERSION}_SHA256SUMS && grep ${DOCKER_BASE_VERSION}_linux_amd64.zip docker-base_${DOCKER_BASE_VERSION}_SHA256SUMS | sha256sum -c && unzip docker-base_${DOCKER_BASE_VERSION}_linux_amd64.zip && cp bin/gosu bin/dumb-init /bin && wget ${HASHICORP_RELEASES}/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_linux_amd64.zip && wget ${HASHICORP_RELEASES}/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_SHA256SUMS && wget ${HASHICORP_RELEASES}/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_SHA256SUMS.sig && gpg --batch --verify consul_${CONSUL_VERSION}_SHA256SUMS.sig consul_${CONSUL_VERSION}_SHA256SUMS && grep consul_${CONSUL_VERSION}_linux_amd64.zip consul_${CONSUL_VERSION}_SHA256SUMS | sha256sum -c && unzip -d /bin consul_${CONSUL_VERSION}_linux_amd64.zip && cd /tmp && rm -rf /tmp/build && apk del gnupg openssl && rm -rf /root/.gnupg' returned a non-zero code: 1 Docker build failed !

This is a sporadic error involving the Consul Dockerfile.

Try rebuilding the Consul image as follows:

./install.sh -b --limit-service=vora-consul --force-rebuilding-images

"Error: transport is closing"

This is a sporadic error with Helm. Just rerun the installer.

"Error: could not find a ready tiller pod"

This is a Helm error. Run helm list to check whether Helm can reach Tiller. See Helm list does not work for a workaround for an issue regarding RBAC.


In a security-enabled cluster, almost all pods are stuck in the "Waiting: PodInitializing" phase

The pods are waiting for the vora-security-operator service.

1. First check the logs of the vora-security-operator from the Kubernetes dashboard. If there are any issues with the security configuration, purge and reinstall with ./install.sh --purge and ./install.sh --install.

2. If there are no errors in the logs in step 1, delete the vora-security-operator pod (not the deployment) from the Kubernetes dashboard and check again after waiting for some time.

Validation step shows: error on server response: "could not handle api call, failure reason : Cannot create DLog Client: No projection for log id 2"

Make sure that you cleaned up old data prior to the installation.

Installer did not ask for security configuration

There is an existing security configuration in the namespace and the installer reuses it. Make sure that you removed all security configurations prior to the installation with ./install.sh --purge.

Updating existing security configurations is currently not supported.

Update with ./install.sh --update failed

This option should not be used to update the SAP Vora packages. It only allows you to change certain parameters (for example, the log level) of an existing deployment.

For SAP Vora version updates, please run the following:

./install.sh --purge # with old package
./install.sh # with new package

There might be some limitations depending on your Kubernetes version. For example, if you are running a Kubernetes version lower than 1.7, pods in a StatefulSet will not be automatically updated. Instead, you need to manually delete the existing pods to make your changes effective. For more information, see Updating StatefulSets (http://help.sap.com/disclaimer?site=https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets).

The installer calls helm upgrade behind the scenes. Helm/Kubernetes itself has some limitations regarding updates to existing objects. The error message should indicate which change is not allowed in an update. For example:

Output Code

Error: UPGRADE FAILED: StatefulSet.apps "vora-dlog" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas' and 'containers' are forbidden. && StatefulSet.apps "vora-disk" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas' and 'containers' are forbidden

Failing pod logs show "Consul 500 internal server error"

Make sure that there is no data left from previous installations.

Prior to the installation, make sure you clean the following folders on all nodes:

● /var/local/vora/vora-consul/<namespace>
● /var/local/vora/vora-dlog/<namespace>
● /var/local/db/<namespace>
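To avoid mistakes when cleaning several nodes, the folder names can be derived from the namespace. The sketch below only computes the paths for the default host path locations listed above; the helper name is hypothetical, and deleting the folders on each node is left to you.

```python
from pathlib import PurePosixPath

# Default host path locations listed above.
BASE_DIRS = (
    "/var/local/vora/vora-consul",
    "/var/local/vora/vora-dlog",
    "/var/local/db",
)


def stale_data_dirs(namespace):
    """Return the per-namespace folders to clean on every node."""
    if not namespace or "/" in namespace or namespace in (".", ".."):
        # Guard against an empty or path-like value that would point
        # at a parent directory instead of a namespace folder.
        raise ValueError("namespace must be a plain, non-empty name")
    return [str(PurePosixPath(base) / namespace) for base in BASE_DIRS]


if __name__ == "__main__":
    for path in stale_data_dirs("vora"):
        print(path)
```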

Helm list does not work

Symptoms

The installer script returns either of the following errors:

● Error: the server does not allow access to the requested resource (get configmaps)
● Error: User "system:serviceaccount:kube-system:default" cannot list configmaps in the namespace "kube-system". (get configmaps)

Root Cause

The issue is a Helm bug with RBAC authentication (milestone version 2.4.2). For more information, see Helm 2.2.3 not working properly with kubeadm 1.6.1 default RBAC rules.

Workaround

Run the following commands:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
helm init --service-account tiller --upgrade


http://help.sap.com/disclaimer?site=https://github.com/kubernetes/helm/issues/2224


2.5 Installing SAP Vora on the Hadoop Cluster

Use the SAP Vora Installer command-line tool to install the SAP Vora Spark extension library on a Hadoop cluster.

You execute the installer on the node running the cluster provisioning manager. For example, when you are running Hortonworks Ambari on your cluster, execute the install script on the node running the Hortonworks Ambari UI.

The SAP Vora downloadable package for Hadoop contains the following:

● The SAP Vora Spark Extensions Installer command line tool (written in Python)
● The SAP Vora Spark Extensions
● A README file

Note: If your Hadoop cluster requires an HTTP(S) proxy to access content through the HTTP(S) protocol, make sure that the proxy is configured before starting SAP Vora.

2.5.1 SAP Vora Installer Phases

The SAP Vora Installer has distinct phases, in which it performs the following tasks:

1. Detecting and collecting configuration information
2. Distributing the SAP Vora Spark extensions to cluster nodes
3. Uploading the SAP Vora Spark extensions JAR file to HDFS
4. Configuring Spark
5. Restarting the Spark clients

Detecting and Collecting Configuration Information

In the first phase, the SAP Vora Installer detects and collects information from you that is needed for the installation. You can also customize the installation according to your needs. For example, you are asked for connection and authentication parameters for the SAP Vora Kubernetes cluster. You can suppress the prompts by adding the --no-prompt flag, so that the installer detects and uses default values.

Distributing the SAP Vora Spark Extensions to Cluster Nodes

The SAP Vora Spark extensions are distributed to nodes in the cluster using SSH and SCP.

The installer uses a file containing the host names of the nodes where the Spark extensions should be installed. This file is automatically generated by the installer and contains all nodes in the cluster. If you want to distribute the Spark extensions to only a subset of the nodes, you need to manually create this file and add the specific host names, one host name per line. For an example, see the example configuration file config/hosts_example.txt.
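If you maintain the subset of nodes in a script, the hosts file can be generated rather than edited by hand. This is a minimal sketch assuming only the format stated above (one host name per line); the helper name and the host names in the usage lines are hypothetical.

```python
def hosts_file_text(hosts):
    """Render host names as file content, one host name per line.

    Blank entries are dropped and duplicates are removed while the
    original order is preserved.
    """
    cleaned = [h.strip() for h in hosts if h.strip()]
    unique = list(dict.fromkeys(cleaned))
    return "".join(host + "\n" for host in unique)


if __name__ == "__main__":
    # Placeholder host names; write the result to config/hosts.txt if desired.
    print(hosts_file_text(["node1.example.com", "node2.example.com"]), end="")
```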

You need to provide the SSH user name and password for every command that is executed on a remote node. To simplify SSH-based installation, the Vora Installer supports the Python Paramiko SSH tool (http://www.paramiko.org/). Paramiko only requires you to specify your SSH user name and password once and then provides them throughout the installation. Note that Paramiko is optional and has to be installed manually. Follow the Paramiko documentation to install Paramiko on the node where the Vora Installer is invoked, and make sure that you use an up-to-date version of Paramiko (2.2.1 is known to work). An alternative to Paramiko is to set up the cluster with password-less SSH.

Uploading the SAP Vora Spark Extensions JAR File to HDFS

The JAR files with the Spark 1.6.x and/or Spark 2.1.x support are uploaded to HDFS so that any Spark job that wants to use SAP Vora can add these JARs as a dependency. The JAR files are uploaded by default to /user/vora/lib.

Note that there are different JAR files depending on whether you want to use SAP Vora with Spark 1.6.x or Spark 2.1.x.

Prior to running the installer in a Kerberized Hadoop cluster, you have to make sure that a valid Kerberos TGT for the HDFS superuser exists in the Kerberos credential cache of the user executing the installer.

Configuring Spark

You can specify the connection and authentication parameters for the SAP Vora Kubernetes cluster so that they are added to the Spark default configuration. This makes it easier to connect to SAP Vora when submitting Spark applications or writing Spark SQL code in the Spark shell.

Restarting Spark Clients

In the fifth phase, the Spark clients need to be restarted so that they can pick up the new Spark default configuration. This applies only to Ambari and Cloudera.

2.5.2 Get Connection Details from the Kubernetes Dashboard

The Spark extensions installer allows you to configure Spark so that you can use SAP Vora out of the box. When prompted, you need to enter the connection and authentication details of the Transaction Coordinator and Catalog that are running on your Kubernetes cluster.

Context

The Kubernetes installer prints out these connection and authentication details after a successful installation. If you don't have this information at hand, you can get the same information from the Kubernetes dashboard, as described below.

Procedure

1. Make a note of the Kubernetes host name.

The host name of the Transaction Coordinator and Catalog is always the same as that used for Kubernetes.

2. Open the Kubernetes dashboard.

3. On the left, select the namespace in which SAP Vora is running (if you have used a namespace).

4. Choose Discovery and Load Balancing > Services.

5. Locate the vora-tx-coordinator service.

There are usually four internal endpoints listed. The first one is an internal endpoint with the port 10002. Use the port of the second endpoint when the installer prompts you to enter the Transaction Coordinator port. For example:

6. Locate the vora-catalog service.

The connection details for the Catalog are only needed for Spark 1.6.x support, not for Spark 2.1.x support.

As in the previous step, ignore the endpoint with the port 10002. When prompted for the Catalog port, enter the port of the second endpoint. For example:
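The port selection rule in steps 5 and 6 can be expressed as a small function. This is a sketch under stated assumptions: it takes the endpoint ports in the order shown on the dashboard, skips the internal port 10002, and returns the next one. The function name and the port numbers in the usage line are hypothetical.

```python
INTERNAL_PORT = 10002  # internal endpoint port named in the steps above


def port_for_installer(endpoint_ports):
    """Return the first listed port that is not the internal port 10002."""
    candidates = [port for port in endpoint_ports if port != INTERNAL_PORT]
    if not candidates:
        raise ValueError("no endpoint other than the internal port was found")
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical endpoint list copied from the dashboard.
    print(port_for_installer([10002, 31234]))
```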

2.5.3 Install SAP Vora on Hortonworks, Cloudera, or MapR

Use the command-line tool to install the SAP Vora Spark extensions on Ambari, Cloudera, or MapR.

Prerequisites

You have installed SAP Vora on the Kubernetes cluster and know your Kubernetes connection and authentication details. For more information, see Get Connection Details from the Kubernetes Dashboard [page 31].

Context

During the installation, the SAP Vora Spark extensions are distributed to the nodes in the cluster using SSH and SCP. Consider setting up Paramiko or password-less SSH to avoid repeated password requests during the installation procedure.



Procedure

1. Download the SAP Vora Spark integration package to the cluster manager node.

2. Extract the package to the /tmp folder.

For an example of how the directory structure should look, see SAP Vora Directory Structure (Hadoop).

3. Log in as root.

4. If you don't want the installer to automatically determine the list of host names for the installation, configure the list of cluster hosts in the config/hosts.txt file.

5. Execute the install script:

cd SAPVora-SparkIntegration
./install.sh

For more information about the installation parameters, see Command Line Parameters.

6. Specify ambari, cloudera, or mapr as the cluster manager.

7. Specify whether you want to install Spark 1.6.x and/or Spark 2.1.x support. Ambari and Cloudera support running both Spark versions side by side, whereas MapR only supports a single Spark version at a time.

8. Specify the folder where the SAP Vora Spark extensions should be installed. It must start with /opt/.

9. Specify the HDFS folder where the SAP Vora Spark extensions JAR files should be installed.

If support for Spark 1.6.x is enabled, the JAR file spark-sap-datasources-spark1.6.jar is copied to HDFS. If Spark 2.1.x support is enabled, the JAR file spark-sap-datasources-spark2.jar is copied to HDFS.

10. Specify the user to be used to upload the files to HDFS.

11. For Spark 1.6.x support, specify the path to the datanucleus JARs. They are usually in the Spark lib folder.

12. Make sure the HDFS user has access to the files in /tmp/vora-spark/lib.

13. Specify the connection parameters for the SAP Vora Kubernetes cluster. You can get this information from the Kubernetes dashboard or the Kubernetes installer.

14. Specify the authentication parameters for the SAP Vora Kubernetes cluster. You can get this information during the installation of SAP Vora on the Kubernetes cluster.

15. For Ambari and Cloudera: Specify the cluster name, address, user, and password that are used to authenticate against the cluster manager web interface.

For Cloudera, you might need to set the correct Spark service name if it differs from the default. You need to do this with the command line parameter --cloudera-spark-1.6-service-name or --cloudera-spark-2-service-name.

16. Specify the path to the hosts file (for example, config/hosts.txt).

If you choose the proposed default, the file will be automatically populated with all nodes in the cluster that are managed by the cluster manager.

17. Accept the information shown on the overview page.

18. For Ambari and Cloudera: Accept the Spark client restart so that all nodes are updated with the new Spark configuration.

The installation of the SAP Vora Spark extensions has now been completed.


Related Information

SAP Vora Software Downloads [page 12]
SAP Vora Directory Structure (Hadoop) [page 34]
Command Line Parameters (Hadoop) [page 34]

2.5.4 SAP Vora Directory Structure (Hadoop)

The directory structure of the SAP Vora installation package for Hadoop is shown below.

SAPVora-SparkIntegration
+-- config
|   +-- hosts_example.txt
+-- install.sh
+-- lib
|   +-- install.py
+-- README.md

2.5.5 Command Line Parameters (Hadoop)

The SAP Vora Installer provides the following command line parameters:

./install.sh --help
Usage: install.py [options]
SAP Vora 2.0 Spark Integration Installer
Options:
  -h, --help    show this help message and exit
  --no-prompt    installer attempts to run without prompting the user (all arguments need to be passed via command line)
  --cluster-manager=CLUSTER_MANAGER    cluster manager: Ambari, Cloudera, MapR, None
  --ssh-username=SSH_USERNAME    SSH user name
  --ssh-password=SSH_PASSWORD    SSH password
  --ssh-options=SSH_OPTIONS    specify any options the ssh command might need
  --vora-folder=VORA_FOLDER    Folder for vora spark extensions, must start with /opt/
  --hdfs-folder=HDFS_FOLDER    HDFS folder for vora spark extensions
  --hdfs-user=HDFS_USER    OS user to access HDFS
  --host-file=HOST_FILE    path to file with hosts
  --dry-run    Print the possible execution trace without making changes
  --txcoordinator-host=TXCOORDINATOR_HOST    Host for SAP Vora Transaction Coordinator
  --txcoordinator-port=TXCOORDINATOR_PORT    Port for SAP Vora Transaction Coordinator
  --catalog-host=CATALOG_HOST    Host for SAP Vora Catalog, only Spark 1.6.x
  --catalog-port=CATALOG_PORT    Port for SAP Vora Catalog, only Spark 1.6.x
  --catalog-timeout=CATALOG_TIMEOUT    Time out for connecting to the SAP Vora Catalog in seconds, only Spark 1.6.x
  --ambari-cluster-name=AMBARI_CLUSTER_NAME    Ambari cluster name
  --ambari-user-id=AMBARI_USER_ID    Ambari UI user
  --ambari-password=AMBARI_PASSWORD    Ambari UI password
  --ambari-cluster-address=AMBARI_CLUSTER_ADDRESS    Ambari UI address, for example http://localhost:8080
  --hortonworks-version=HORTONWORKS_VERSION    Hortonworks version
  --cloudera-cluster-name=CLOUDERA_CLUSTER_NAME    Cloudera cluster name
  --cloudera-cluster-address=CLOUDERA_CLUSTER_ADDRESS    Cloudera UI address, for example http://localhost:7180
  --cloudera-user-id=CLOUDERA_USER_ID    Cloudera UI user
  --cloudera-password=CLOUDERA_PASSWORD    Cloudera UI password
  --cloudera-spark-1.6-service-name=CLOUDERA_SPARK_1_6_SERVICE_NAME    Cloudera name for the Spark 1.6.x service. Usually spark or spark_on_yarn
  --cloudera-spark-2-service-name=CLOUDERA_SPARK_2_SERVICE_NAME    Cloudera name for the Spark 2.x service. Usually spark or spark_on_yarn
  --mapr-spark-conf-dir-path-spark-1.6=MAPR_SPARK_CONF_DIR_PATH_SPARK_1_6    Path to Spark 1.6.x conf dir in MapR
  --mapr-spark-conf-dir-path-spark-2=MAPR_SPARK_CONF_DIR_PATH_SPARK_2    Path to Spark 2.x conf dir in MapR
  --mapr-cldb-master=MAPR_CLDB_MASTER    Address of MapR CLDB master
  --bare-spark-conf-dir-path-spark-1.6=BARE_SPARK_CONF_DIR_PATH_SPARK_1_6    Path to Spark 1.6.x conf dir when installing on a bare Hadoop cluster
  --bare-spark-conf-dir-path-spark-2=BARE_SPARK_CONF_DIR_PATH_SPARK_2    Path to Spark 2.x conf dir when installing on a bare Hadoop cluster
  --datanucleus-path=DATANUCLEUS_PATH    Path to folder containing datanucleus jars
  --datanucleus-api-jdo-path=DATANUCLEUS_API_JDO_PATH    Path to datanucleus api-jdo jar
  --datanucleus-core-path=DATANUCLEUS_CORE_PATH    Path to datanucleus core jar
  --datanucleus-rdbms-path=DATANUCLEUS_RDBMS_PATH    Path to datanucleus rdbms jar
  --spark-1.6    Install for Spark 1.6.x, not supported for FusionINsight
  --spark-2    Install for Spark 2.x
  --authentication    Create and distribute authentication files
  --authentication-username=AUTHENTICATION_USERNAME    Authentication username
  --authentication-password=AUTHENTICATION_PASSWORD    Authentication password
  --authentication-v2-conf-dir=AUTHENTICATION_V2_CONF_DIR    Path to directory that contains the v2auth.conf file
  --authentication-v2-conf-file-owner=AUTHENTICATION_V2_CONF_FILE_OWNER    OS user that owns the v2auth.conf file
  --authentication-v2-conf-file-group=AUTHENTICATION_V2_CONF_FILE_GROUP    OS group that owns the v2auth.conf file


2.6 Validate the SAP Vora Installation

To check that the SAP Vora engines and Spark extension library have been correctly installed and that you can use the SAP Vora features in Spark, create a table and load data into it from a file stored in HDFS.

Prerequisites

● You have already successfully deployed SAP Vora on the Kubernetes cluster.
● You have already installed Spark on the Hadoop cluster.

Context

The SAP Vora Spark extension is located in the vora-spark directory, which contains the following folders:

● lib/: Contains the spark-sap-datasources-spark1.6.jar and spark-sap-datasources-spark2.jar files with all necessary dependencies (excluding Spark).

● bin/: Contains scripts for ease of use.

● META-INF/: Contains the pom.properties and pom.xml files.

Procedure

1. Create a file in HDFS:

Sample Code

echo "1,2,Hello" > test.csv
hadoop fs -put test.csv /path/to/test.csv
hadoop fs -cat /path/to/test.csv
1,2,Hello

2. Make sure the VORA_SPARK_HOME environment variable points to the installation directory of the SAP Vora Spark extension (default: /opt/vora-spark), for example:

export VORA_SPARK_HOME=/opt/vora-spark

3. Open a Spark shell by using, for example, the following shell script (make sure that SPARK_HOME has been set before starting the Spark shell):

$VORA_SPARK_HOME/bin/start-spark-shell.sh

4. Enter the following statements in the Spark shell to create a table and check that it has been successfully created:



○ For Spark 1.6.x:

scala> import org.apache.spark.sql.SapSQLContext
scala> val vc = new SapSQLContext(sc)
scala> val testsql = """
CREATE TABLE table001 (a1 double, a2 int, a3 string)
USING com.sap.spark.engines.relational
OPTIONS (
files "/path/to/test.csv" )"""
scala> vc.sql(testsql)
scala> vc.sql("show tables").show
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
| table001|      false|
+---------+-----------+
scala> vc.sql("SELECT * FROM table001").show
+---+--+-----+
| a1|a2|   a3|
+---+--+-----+
|1.0| 2|Hello|
+---+--+-----+
scala> <Ctrl-D to quit>

○ For Spark 2.1.x, there are several ways to query a table:

scala> import sap.spark.vora._
scala> val client = PublicVoraClientUtils.createClient(spark)
scala> client.execute("CREATE TABLE TABLE001 (a1 double, a2 int, a3 varchar(*)) STORE IN MEMORY")
scala> client.execute("ALTER TABLE TABLE001 ADD DATASOURCE HDFS('hdfs://<hdfs_namenode_address>/user/vora/test.csv')")
scala> client.execute("LOAD TABLE TABLE001")
scala> client.query("SELECT * FROM TABLE001").foreach(println)
VoraRow(Some(1.0), Some(2), Some(Hello))
scala> spark.read.format("sap.spark.vora").option("table", "TABLE001").load().show()
+---+---+-----+
| A1| A2|   A3|
+---+---+-----+
|1.0|  2|Hello|
+---+---+-----+
scala> spark.read.format("sap.spark.vora").option("query", "SELECT a1 from TABLE001").load().show()
+---+
| A1|
+---+
|1.0|
+---+
scala> spark.sql("""CREATE TEMPORARY VIEW t1 USING sap.spark.vora OPTIONS (table "TABLE001")""")
scala> spark.sql("SELECT * FROM t1").show()
+---+---+-----+
| A1| A2|   A3|
+---+---+-----+
|1.0|  2|Hello|
+---+---+-----+
scala> <Ctrl-D to quit>


Results

You have now successfully validated the SAP Vora extension and can use it as follows:

● The JAR file in the lib folder (spark-sap-datasources.jar) can be provided to Spark using the --jars option. For example, assuming the spark-shell command is on the user's path:

$ spark-shell --jars $VORA_SPARK_HOME/lib/spark-sap-datasources-VERSION.jar

● Alternatively, the shell scripts in the bin folder can be used to run a Spark shell with the SAP Vora extension library. To do so, the SPARK_HOME environment variable needs to point to the Spark folder on the jump box. You can then start the Spark shell in Yarn client mode as follows:

$ ./start-spark-shell.sh --master yarn-client



3 Administration

This section describes the standard administration tasks you need to perform and the best practices for the ongoing operation of your SAP Vora services and Hadoop cluster.

See the following topics:

Topic | Description
Enable Spark Auto-Registration [page 39] | Automatically load data sources on startup
Check the SAP Vora Tools Connection Status [page 41] | Check the status of the connections between SAP Vora and other components and systems
Manage Users for SAP Vora Tools [page 40] | Manage users for the SAP Vora Tools
SAP Vora Logs [page 42] | Check the locations of the SAP Vora logs
SAP Vora Diagnostic Tools [page 42] | Use the Grafana and Kibana UIs for monitoring and troubleshooting purposes
Accessing SAP Vora from SAP HANA [page 43] | Connect from SAP HANA to SAP Vora using SAP HANA smart data access (SDA)
Best Practices: Administration and Operations [page 44] | Achieve higher performance on your cluster by observing some basic best practices

3.1 Enable Spark Auto-Registration

The spark.sap.autoregister option is a Spark configuration parameter that specifies the data sources that should be automatically loaded on startup. The value of the parameter is a comma-separated string of data sources.

Context

The parameter allows all tables that were previously loaded and saved in the SAP Vora catalog to be re-registered in the Spark context automatically.

When you run the Thriftserver, for example, all tables will be automatically registered at startup if Spark auto-registration is enabled.

Procedure

To enable Spark auto-registration, set the Spark auto-registration option in the Spark defaults configuration file or when executing spark-submit:


○ Set the spark.sap.autoregister parameter in the spark-defaults.conf file:

Sample Code

spark.sap.autoregister com.sap.spark.engines.disk,com.sap.spark.engines.relational

○ Set the spark.sap.autoregister parameter when executing spark-submit:

Sample Code

spark-submit --conf spark.sap.autoregister=com.sap.spark.engines.relational
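To confirm which data sources are currently configured for auto-registration, the value can be read back from the Spark defaults file. The following is a minimal sketch only: autoregister_value is a hypothetical helper name, and the file path in the usage comment is an assumption about where your spark-defaults.conf is located.

```shell
# Print the value of spark.sap.autoregister from a Spark defaults file.
# autoregister_value is a hypothetical helper, not an SAP Vora tool.
autoregister_value() {
    # Key and value may be separated by whitespace or "=".
    # Comment lines start with "#" and therefore never match the key.
    awk -F'[ \t=]+' '$1 == "spark.sap.autoregister" { print $2 }' "$1"
}

# Usage (the path is an assumption):
#   autoregister_value "$SPARK_HOME/conf/spark-defaults.conf"
```

If the function prints nothing, the parameter is not set in that file and auto-registration is only active when it is passed to spark-submit.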

3.2 Manage Users for SAP Vora Tools

You can create, edit, and delete users for the SAP Vora Tools.

Context

SAP Vora supports two types of users, admin and member:

● Admin: An admin user can create new users, delete users, and change their own and other users' passwords. They can also promote member users to admin users and demote admin users to member users.

● Member: As a member, you can only edit your own user.

The service user _vora_service is created by default and is used for internal authentication between the SAP Vora services. It has a random password.

Procedure

1. Open the SAP Vora Tools and choose User Management.
2. Choose the appropriate option:

Option | Description
Create a new user | 1. Choose Create. 2. In the Create User dialog box, enter the new user's name. 3. Enter the new user's password twice. 4. Choose OK to save your entries.
Change a user's password | 1. Select the user and choose Edit. 2. Enter the new password twice. 3. Choose OK to save your entries.
Delete a user | 1. Select the user and choose Delete. 2. Choose OK to confirm.
Promote a user | 1. Select the user and choose Promote. 2. Choose OK to confirm.
Demote a user | 1. Select the user and choose Demote. 2. Choose OK to confirm.

Note that it can take a while for new users and changes to be distributed to other pods.

3.3 Check the SAP Vora Tools Connection Status

The connection status indicates whether there are active connections to the components and systems used by the SAP Vora Tools.

Procedure

1. In the top right corner of the SAP Vora Tools, choose the Connection: <status> button.

2. Check the information displayed in the Connection Status dialog box:

Component | Description
Client Version | The client version of the SAP Vora Tools
Vora | The status of the connection to the Thrift server, shown in the form host name and port, together with the user used to connect to the Thrift server. For example, vora@thriftserverhost:19123. The status of the connection is indicated by the displayed icon and can have one of the following values: OK (Connected), Disconnected, Error. The not configured text indicates that the UI is not yet connected with the Vora Tools backend.
HANA | The status of the connection to SAP HANA. The status not configured indicates that a connection to SAP HANA has not been defined (connections are configured in the Spark defaults configuration file spark-defaults.conf).


3.4 SAP Vora Logs

You can access the logs for the SAP Vora components from the Kubernetes dashboard.

On the Kubernetes dashboard, find the appropriate pod and choose the Logs icon to display its logs.
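If you also have kubectl access to the cluster, the same logs can usually be read from the command line. The sketch below is not taken from this guide: it assumes kubectl is configured for the cluster, the pod and namespace are placeholders you must supply, and pod_logs_command is a hypothetical helper that only prints the command so that you can review it before running it.

```shell
# Print (do not run) the kubectl command that shows the end of a pod's log.
# pod_logs_command is a hypothetical helper; pod and namespace are placeholders.
# Arguments: <pod> <namespace> [number of lines, default 100]
pod_logs_command() {
    printf 'kubectl logs %s --namespace %s --tail=%s\n' "$1" "$2" "${3:-100}"
}

# To find the pod name first, list the pods in the namespace:
#   kubectl get pods --namespace <namespace>
```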

3.5 SAP Vora Diagnostic Tools

The SAP Vora diagnostic tools provide both monitoring and troubleshooting features, based on system and application metrics, as well as a consolidated set of SAP Vora logs.

● System and application metrics: System metrics are generated by Kubernetes, Docker, and the operating system, while the application metrics are provided by the SAP Vora engines. Grafana provides the user interface for visualizing these metrics.

● Trace logs: The SAP Vora trace logs are consolidated in a single data store and can be filtered and visualized in a variety of ways using Kibana.

Grafana

SAP Vora uses Grafana for visualizing metrics. Grafana is a widely used tool for visualizing time series diagnostic data. For more information about Grafana, see the Grafana web site at http://www.grafana.org/.

Grafana is exposed from Kubernetes through the Kubernetes NodePort. You can log into it by appending the external service port of Grafana to the URL of any node in the SAP Vora Kubernetes cluster. You can find this port on the Kubernetes dashboard in the Grafana service properties.

When Grafana is open in a browser, use the following credentials for the initial login:

● username: admin
● password: admin

After logging in, you will see that several dashboards are immediately available in the main menu. There should be two dashboards representing the system metrics and several dashboards with the metrics of specific SAP Vora engines.

Using these dashboards, it is fairly easy to check the memory and CPU consumption per node or pod, as well as various application metrics provided by SAP Vora.

For advanced users, we recommend that you develop your own custom visualizations to monitor the overall health of the SAP Vora cluster.



Kibana

SAP Vora uses Kibana in order to access the SAP Vora application trace logs. For more information about Kibana, see https://www.elastic.co/products/kibana.

Note that before logs can be browsed, filtered, and visualized in Kibana, they are collected from all the Kubernetes nodes and consolidated in a data backend.

Kibana is exposed from Kubernetes through the Kubernetes NodePort. To launch Kibana in your browser, you need to obtain the external port of the Kibana service from the Kubernetes dashboard. It is shown in the Kibana service properties. Then append it to the URL of any SAP Vora Kubernetes cluster node, for example, http://mycould.mycompany.com/node123:32456.
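Composing the address from a node host name and the external port can be sketched as follows. This is an illustration under assumptions: nodeport_url is a hypothetical helper, the host name and port are placeholders, and the kubectl call in the comment presumes that you have kubectl access and know the name of the Kibana service in your deployment.

```shell
# Build the browser URL for a service exposed through a Kubernetes NodePort.
# nodeport_url is a hypothetical helper; host and port are placeholders.
# Arguments: <node host name> <external port>
nodeport_url() {
    case $2 in
        ''|*[!0-9]*)
            echo "port must be numeric: $2" >&2
            return 1 ;;
    esac
    printf 'http://%s:%s\n' "$1" "$2"
}

# With kubectl access, the external port can also be read from the service
# definition (service and namespace names are placeholders):
#   kubectl get service <kibana-service> --namespace <namespace> \
#       -o jsonpath='{.spec.ports[0].nodePort}'
```

The same approach applies to the Grafana service, which is also exposed through a NodePort.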

In Kibana, go to the Discover panel (the default one), where you can see all available indexes on the left, the time line at the top, and the search/filter bar directly under it.

You can get started by filtering only "error" messages, or by selecting a certain index or time interval.

For more information, see the Kibana Data Discovery documentation at https://www.elastic.co/guide/en/kibana/5.5/tutorial-discovering.html.

For advanced users, we recommend that you develop your own custom visualizations to monitor the log data of the SAP Vora cluster.

3.6 Accessing SAP Vora from SAP HANA

You can connect to and access data in SAP Vora from SAP HANA using SAP HANA smart data access (SDA).

You can create an SDA remote source connection directly to the SAP Vora cluster using the SAP Vora remote source adapter voraodbc. For more information, see Accessing SAP Vora from SAP HANA in the SAP Vora Developer Guide.

Note

● The voraodbc SDA adapter is delivered with SAP HANA 1.0 SPS 12 revision 122.05 and higher.
● You can currently only create virtual tables based on tables in the SAP Vora disk engine or relational in-memory engine.

Related Information

SAP Vora Guides: https://help.sap.com/hana_vora

3.7 Best Practices: Administration and Operations

By observing some basic best practices, you can achieve higher performance on your Hadoop cluster.

A Hadoop cluster typically involves a very large number of relatively similar computers so, in general, a good way to install a cluster is by distinguishing between four types of machines:

1. Cluster provisioning system with Ambari, Cloudera, or MapR installed
2. Master cluster nodes that contain systems such as HDFS NameNodes and central cluster management tools (such as the Yarn resource manager)
3. Worker nodes that do the actual computing and contain HDFS data
4. Jump boxes that contain only client components. These machines allow users to start their jobs.

If you have a very specific setup where you have, for example, divided compute nodes and HDFS data nodes, be aware that this might not be the best choice.

3.7.1 HDFS

By default HDFS stores three replicas of each data block on different machines. Besides providing the necessary fault tolerance, this also increases data locality.

Be aware that if the data that is used for SQL processing is not evenly distributed, this might lead to longer loading times for tables and therefore affect the performance of the cluster when used in combination with SAP Vora. This might be the case if you delete a large amount of data (it will be unbalanced) or if you also use HDFS for data that is not used for processing with SAP Vora.

Note

It is important to keep the data that you use in SAP Vora/Spark as evenly distributed as possible on HDFS to increase speed. There are a number of HDFS tools available to re-balance the data.
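One way to judge how evenly data is distributed is to compare the utilization of each DataNode with the cluster average. The following sketch assumes that you have collected one utilization percentage per DataNode, for example from the per-node DFS Used% lines of hdfs dfsadmin -report; max_deviation is a hypothetical helper name and is not part of HDFS or SAP Vora.

```shell
# Read one "DFS Used%" value per DataNode from standard input and print the
# largest deviation from the mean utilization, in percentage points.
# max_deviation is a hypothetical helper name.
max_deviation() {
    awk '{ v[NR] = $1; sum += $1 }
         END {
             if (NR == 0) exit 1
             mean = sum / NR
             max = 0
             for (i = 1; i <= NR; i++) {
                 d = v[i] - mean
                 if (d < 0) d = -d
                 if (d > max) max = d
             }
             print max
         }'
}

# Example: printf '40\n55\n70\n' | max_deviation    # prints 15
```

If the deviation is large, the standard HDFS balancer can redistribute blocks, for example with hdfs balancer -threshold 10, where the threshold is the permitted deviation from the average utilization in percent.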

3.7.2 Choosing a Cluster Manager

The cluster manager is responsible for distributing tasks throughout the compute nodes of the cluster. Each node that assumes computation tasks is managed by a cluster manager.

In order to run, an application requests resources from the cluster manager. If this is successful, the cluster manager transfers the actual application to the nodes in question and starts it.

The cluster manager therefore serves as an abstraction layer for the application, allowing it to be developed independently of the cluster setup. This means that Spark, as well as all its extensions for SAP Vora, can be installed on a single node and will then be automatically transferred to the compute nodes. Note, however, that Spark itself also includes a cluster manager, called Spark standalone mode. Logically, it is an independent system that is not related to the computational capabilities of Spark.



The system provided by SAP Vora is completely independent of the cluster manager. If you are deploying a test and development environment with a small number of nodes, we recommend that you choose Spark’s standalone cluster manager. For information about how to install it, see the Spark manual.

Your Hadoop distribution usually comes with a built-in cluster manager. In most cases, this is Yarn. Yarn distinguishes between Node Managers, which are responsible for a compute node, and the Resource Manager, which keeps track of the overall workload of the cluster and distributes tasks to the Node Managers.

Note

If your cluster manager has central components, such as the Resource Manager, you should put them on separate machines that do not compute jobs.

Related Information

Spark Standalone Mode: https://spark.apache.org/docs/latest/spark-standalone.html

3.7.3 Example Cluster Configuration Including a Client Machine (Jump Box)

This example shows how a small Hadoop system consisting of 60 nodes in total can be configured.

Each node is quite small and contains 32 GB of RAM. Yarn is used as the cluster manager. The nodes are configured as follows:

● 1 Ambari server
● 2 master nodes (Resource Manager and NameNodes)
● 56 worker/compute nodes
● 1 jump box containing client components
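As a quick sanity check of this layout, the node counts and the installed memory can be added up. The figures come from the example; the variable names are illustrative, and the aggregate value is raw installed RAM on the worker nodes, before any memory reserved for the operating system or Yarn is subtracted.

```shell
# Node counts and RAM per node, taken from the example configuration.
ambari=1
masters=2
workers=56
jump_boxes=1
ram_per_node_gb=32

total_nodes=$((ambari + masters + workers + jump_boxes))
worker_ram_gb=$((workers * ram_per_node_gb))

echo "total nodes: $total_nodes"                  # total nodes: 60
echo "aggregate worker RAM: $worker_ram_gb GB"    # aggregate worker RAM: 1792 GB
```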

All components are provisioned by Ambari with the standard settings. Particularly noteworthy is the way the jump box is configured to enable a user to easily deploy applications and use the platform.

The SAP Vora Spark Extensions installer copied the spark-sap-datasources.jar file to every node in the cluster, including the client machine. The default location is /opt/vora-spark/lib/spark-sap-datasources.jar and every user can access it there.

Each user is assigned a separate Linux user, including a home directory containing Spark binaries. Each user then has the following directory structure:

● /home/user/spark: Symlink to the current Spark installation
● Each user also has a home directory on HDFS

For convenience, the environment variables are configured as follows in the .profile file:

# Include spark home
export SPARK_HOME="$HOME/spark"
# include Vora home
export VORA_SPARK_HOME="/opt/vora-spark"
# Hadoop conf dir
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export YARN_CONF_DIR="/etc/hadoop/conf"
export JAVA_HOME="/usr/jdk64/jdk1.7.0_67/"
export PATH="$PATH:$SPARK_HOME/bin:$VORA_SPARK_HOME/bin"

To use the SAP Vora Spark extensions, certain system-specific variables need to be configured in Spark. Some of them have already been set at install time. See the developer manual for more details. For convenience, these are configured in the spark-defaults.conf file so that all system-specific variables are located in one place:

spark.driver.extraJavaOptions -XX:MaxPermSize=256m
# Uncomment the following line and enter your Amazon S3 secret access key, if
# you have one
# spark.vora.engines.s3secretaccesskeyid <S3 secret access key>

Based on this configuration, users can easily start a shell or deploy an application with the following commands:

spark-shell --num-executors 3 --driver-memory 4g --executor-memory 2g --master yarn-client --jars /opt/vora-spark/lib/spark-sap-datasources.jar

spark-submit --class com.sap.spark.vora.examples.LoadDataIntoVora --master yarn-client --jars /opt/vora-spark/lib/spark-sap-datasources.jar SparkVoraTrialProject-0.0.1.jar

For convenience, the Spark extensions provide a wrapper script to run the Spark shell that automatically includes the spark-sap-datasources.jar file:

start-spark-shell.sh --num-executors 3 --driver-memory 4g --executor-memory 2g --master yarn-client



4 Security

When using a distributed system, you need to be sure that your data and processes support your business needs without allowing unauthorized access to critical information. User errors, negligence, or attempted manipulation of your system should not result in loss of information or processing time.

These demands on security apply likewise to SAP Vora.

4.1 Introduction to SAP Vora Security Features and Security Operator

Leveraging Kubernetes and the microservice infrastructure, SAP Vora security features are isolated from the standard service features and are managed by the SAP Vora Kubernetes security operator.

Once the security operator has been enabled during installation, two additional Kubernetes services are deployed to the cluster:

● Vora security operator
● Vora security context

The Vora security operator manages the security environment for all SAP Vora services, while the Vora security context contains the configmaps and secrets to be distributed to all SAP Vora services.

If the security operator is enabled, a container named service-vora-security is mapped to all service nodes. This container manages all security-related configurations, including but not limited to:

● Synchronizing updated credentials throughout the Vora ecosystem
● Renewing Kerberos tickets
● Providing the necessary configuration files to the SAP Vora services to access any security configuration needed by external interfaces (that is, Kerberized Hadoop)

A security context, on the other hand, refers to a set of security configurations. By using different security contexts, SAP Vora components can select the right configuration during runtime, allowing them to access different environments. Please note that the SAP Vora engines just use the default context if there is one provided.


4.2 Enabling Authentication for SAP Vora Services and SAP Vora Users

SAP Vora supports username/password authentication for all services. You can enable authentication during the SAP Vora installation. By enabling authentication, you automatically enable the SAP Vora security operator.

The user created during installation is the first "vora user". It is enabled on all SAP Vora endpoints, including all UIs and programmatic endpoints.

Note

Synchronizing this user for all SAP Vora services can take up to 5 minutes.

Note

The SAP Vora diagnostic framework has its own users. For more information, see Enabling Authentication for SAP Vora Diagnostic Services.

Additional users can be created and managed through the SAP Vora Tools user management component.

Note that the SAP Vora installer also creates a service user (_vora_service) with a random password that is used for internal authentication between SAP Vora services.

The following table summarizes the SAP Vora endpoints that are reachable by the outside world (exposed from Kubernetes by default) and the authentication methods that apply to them:

Endpoint | Authentication Type and Proposed Security Measures
Transaction coordinator | Vora users with username/password authentication via a proprietary Vora protocol
Thrift server external endpoint | Vora users with plain SASL authentication
Vora Tools UI | Vora users with a UI authentication form
HANA wire on transaction coordinator | Does not support any authentication. Please see Accessing SAP Vora from SAP HANA with VORAODBC.
Diagnostic service - Grafana | Grafana user with a UI authentication form
Diagnostic service - Kibana | Please see Enabling Authentication for SAP Vora Diagnostic Services
Vora catalog* | Vora users with username/password authentication via a proprietary Vora protocol

* This port is used for internal SAP Vora purposes only.

Related Information

Install SAP Vora on Kubernetes [page 13]
Manage Users for SAP Vora Tools [page 40]



4.3 Enabling Authentication for SAP Vora Diagnostic Services

The SAP Vora diagnostic services, Kibana and Grafana, do not use "vora users" for authentication.

For Kibana, please consult your SAP partner to enable authentication on Kubernetes (for example, through an Ingress) when accessing Kibana endpoints.

For Grafana, as the administrator, you need to change the default user and password to something safe before the installation. You can find the relevant parts in vora-diagnostic/values.yaml, where the following keys need to be changed:

adminUser: "admin"
adminPassword: "admin"
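A replacement password should be long and random. The sketch below shows one way to generate such a value and print the two keys in the form shown above. It is an illustration only: random_password is a hypothetical helper and not part of the SAP Vora installer, and the user name grafana-admin is merely an example value.

```shell
# Generate a random alphanumeric string of the requested length (default 24).
# random_password is a hypothetical helper, not part of the SAP Vora installer.
random_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-24}"
}

# Print the two keys in the form used in vora-diagnostic/values.yaml.
# "grafana-admin" is only an example user name.
printf 'adminUser: "%s"\n' "grafana-admin"
printf 'adminPassword: "%s"\n' "$(random_password 24)"
```

Copy the printed values into vora-diagnostic/values.yaml before you start the installation, and store the password in a safe place.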

Note

SAP strongly recommends that you enable SSL/TLS and authentication for the Kibana service, since logs can contain sensitive information.

4.4 Enabling TLS for SAP Vora services

The SAP Vora services do not natively support SSL/TLS on a per-service basis. However, for the services listed below, whose ports are exposed to the outside world from the Kubernetes environment, SAP strongly recommends that you enable HTTPS on Kubernetes.

Please consult your consultant or SAP partner to determine the best way to enable HTTPS on the following services, depending on your environment:

● Vora Tools UI
● Thrift server
● Diagnostic service - Grafana
● Diagnostic service - Kibana

Note

Not enabling HTTPS on the above services can cause major security issues.

Please note that the following services do not currently support enabling TLS. Although plain text passwords are not exchanged in these environments, the data transferred is nevertheless not encrypted:

● Transaction coordinator
● HANA wire
● Vora catalog


4.5 SAP Vora with Kerberos-Enabled Hadoop Clusters

To be able to access Hadoop, you need to create a security context. As the administrator, you do this when you run the SAP Vora installer.

Note that SAP Vora can access Hadoop from the SAP Vora engines.

Creating a security context involves the following high-level steps:

1. Before installation: Before running the installer, you need to ensure that the required resources are located on the node where the SAP Vora installer runs. For more information, see Prerequisites for Creating a Kerberized Hadoop Security Context.

2. During installation: During the installation, the installer prompts you to specify the context configuration, as follows:

Output Code

Security context configuration:
[0] Hadoop Default
[1] Kerberized Hadoop Default
Please enter index of context type to add or any other input to cancel:

For a target Hadoop that is Kerberos enabled, you need to enter 1 and press ENTER. After that, you will be prompted for information about the resource paths. For more information about this procedure, see Install SAP Vora on Kubernetes.

Note

The installer temporarily copies files to an installation directory, automatically creates the relevant Kubernetes configmaps and secrets, and deletes the copied files when the installation finishes. If the installation ends prematurely, make sure that you delete the following paths manually, since they contain sensitive information:

./deployment/helm/vora-security-context/localized
./deployment/htpasswd
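The manual cleanup after an interrupted installation can be sketched as follows. This is only an illustration: cleanup_vora_install_tmp is a hypothetical helper name, and the sketch assumes that the two paths are relative to the directory from which the installer was started, which you pass as the argument.

```shell
# Remove the temporary paths listed above if an interrupted installation
# left them behind. cleanup_vora_install_tmp is a hypothetical helper name.
# Argument: installer root directory (defaults to the current directory).
cleanup_vora_install_tmp() {
    base=${1:-.}
    for leftover in \
        "$base/deployment/helm/vora-security-context/localized" \
        "$base/deployment/htpasswd"
    do
        if [ -e "$leftover" ]; then
            rm -rf -- "$leftover"
            echo "removed $leftover"
        fi
    done
}
```

Calling the function when nothing was left behind is harmless, because it only removes paths that exist.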

Caution

Please make sure that the files in the Hadoop configuration directory contain only ASCII characters.

Also make sure that the XML configuration files in the Hadoop configuration directory do not start with extra spaces. Even though they will be correct XML files, the SAP Vora installer will not parse them.

If the above criteria are not met, the SAP Vora installer will end with the following issue: "Error: error converting YAML to JSON: yaml: line 284: did not find expected key"
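Both criteria can be checked before you run the installer. The following is a minimal sketch and not part of the SAP Vora installer: check_hadoop_conf is a hypothetical helper that reports files containing non-ASCII bytes, and XML files whose first character is not "<". The second test is an assumption about how best to detect leading whitespace.

```shell
# Report files that would violate the two criteria above.
# check_hadoop_conf is a hypothetical helper, not part of the SAP Vora installer.
# Argument: Hadoop configuration directory. Returns non-zero if a problem is found.
check_hadoop_conf() {
    rc=0
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        # Delete every 7-bit ASCII byte; anything left over is non-ASCII.
        if [ -n "$(LC_ALL=C tr -d '\000-\177' < "$f")" ]; then
            echo "non-ASCII characters: $f"
            rc=1
        fi
        # An XML file should begin directly with "<" (no leading whitespace).
        case $f in
            *.xml)
                if [ "$(head -c 1 "$f")" != "<" ]; then
                    echo "does not start with '<': $f"
                    rc=1
                fi ;;
        esac
    done <<EOF
$(find "$1" -type f)
EOF
    return $rc
}

# Usage (the path is an example): check_hadoop_conf /etc/hadoop/conf
```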

Related Information

Install SAP Vora on Kubernetes [page 13]



4.5.1 Prerequisites for Creating a Kerberized Hadoop Security Context

Make sure that the following resources are located on the node where you run the SAP Vora installer. If the installation is done on a Hadoop edge node, then simply pointing to the correct paths (directories and files) is sufficient.

The following directories and files are required:

● Hadoop configuration directory (/etc/hadoop/conf/)
● Spark configuration directory (/etc/spark/conf/)
● Kerberos configuration (krb5.conf)
● Keytab for the client principal (suggested principal: vora)

If the installation is not done on a Hadoop edge node, you might find it easiest to copy all sources to one directory and then copy that directory to the node where the installation will take place.

An example of the necessary files in a directory looks something like this:

kerberized-hadoop-conf
├── hadoop
│   ├── __cloudera_generation__
│   ├── __cloudera_metadata__
│   ├── core-site.xml
│   ├── hadoop-env.sh
│   ├── hdfs-site.xml
│   ├── log4j.properties
│   ├── ssl-client.xml
│   ├── topology.map
│   └── topology.py
├── krb5.conf
├── spark
│   ├── classpath.txt
│   ├── __cloudera_generation__
│   ├── __cloudera_metadata__
│   ├── log4j.properties
│   ├── spark-defaults.conf
│   ├── spark-env.sh
│   └── yarn-conf
│       ├── core-site.xml
│       ├── hadoop-env.sh
│       ├── hdfs-site.xml
│       ├── mapred-site.xml
│       ├── ssl-client.xml
│       ├── topology.map
│       ├── topology.py
│       └── yarn-site.xml
└── vora.keytab

3 directories, 25 files
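Collecting the resources into a single directory of this shape can be sketched as follows. This is an illustration under assumptions: collect_kerberos_conf is a hypothetical helper name, the source locations are passed as arguments because they differ between distributions, and the target file names krb5.conf and vora.keytab simply follow the example layout.

```shell
# Collect the required resources into one directory that can be copied to the
# installation node. collect_kerberos_conf is a hypothetical helper name.
# Arguments: <hadoop-conf-dir> <spark-conf-dir> <krb5.conf> <keytab> <target-dir>
collect_kerberos_conf() {
    target=$5
    mkdir -p "$target/hadoop" "$target/spark" || return 1
    # -L follows symbolic links so that the real files are copied.
    cp -RL "$1/." "$target/hadoop/" || return 1
    cp -RL "$2/." "$target/spark/" || return 1
    cp "$3" "$target/krb5.conf" || return 1
    cp "$4" "$target/vora.keytab" || return 1
    # The keytab is a credential; restrict access to the owner.
    chmod 600 "$target/vora.keytab"
}

# Usage (all paths are examples):
#   collect_kerberos_conf /etc/hadoop/conf /etc/spark/conf /etc/krb5.conf \
#       /path/to/vora.keytab kerberized-hadoop-conf
```

Because the resulting directory contains the keytab, treat it as sensitive and delete any copies once the installation has finished.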

Newer Kerberos environments use stronger ciphers, which require the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files to be installed. If those ciphers are used in your environment, please download the JCE files for Java 8 from Oracle: http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html



4.6 SAP Vora with MapR

SAP Vora cannot access security-enabled MapR clusters.

4.7 Security of SAP HANA Connections to SAP Vora with VORAODBC

The SAP HANA wire connection does not currently support any authentication.

Although username/password authentication is used when a HANA wire connection is created using the voraodbc driver, please note that this endpoint does not check the correctness of the password, but just accepts it.

Please consider preventing access by other means, such as by enabling perimeter security between SAP HANA and SAP Vora.

4.8 SAP Vora on Kubernetes Security

SAP recommends that you enable security features on your Kubernetes cluster. Please consult SAP partners to enable the appropriate Kubernetes features for your environment.

For more information, see the following:

● Users in Kubernetes: https://kubernetes.io/docs/admin/authentication/#users-in-kubernetes
● Configuring Service Accounts for Pods: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
● Security Best Practices for Kubernetes Deployment: http://blog.kubernetes.io/2016/08/security-best-practices-kubernetes-deployment.html

Caution

Please note that, particularly for on-premise Kubernetes installations, applications deployed on the same Kubernetes cluster can access data of SAP Vora instances. While this can mainly happen if hostPath is used, NFS scenarios also require you, as an administrator, to take the necessary precautions so that external applications cannot access Vora files on the same Kubernetes cluster.

4.9 Data Protection in SAP Vora

SAP Vora provides the technical enablement and infrastructure to allow you to run applications on SAP Vora to conform to the legal requirements of data protection in the different scenarios in which SAP Vora is used.

Introduction to Data Protection

Data protection is associated with numerous legal requirements and privacy concerns. In addition to compliance with general data privacy acts, it is necessary to consider compliance with industry-specific legislation in different countries. This section describes the specific features and functions that SAP Vora provides to support compliance with the relevant legal requirements and data privacy.

This section and any other sections in this Security Guide do not give any advice on whether these features and functions are the best method to support company, industry, regional or country-specific requirements. Furthermore, this guide does not give any advice or recommendations with regard to additional features that would be required in a particular environment; decisions related to data protection must be made on a case-by-case basis and under consideration of the given system landscape and the applicable legal requirements.

Note: In the majority of cases, compliance with data privacy laws is not a product feature. SAP software supports data privacy by providing security features and specific functions relevant to data protection, such as functions for the simplified blocking and deletion of personal data. SAP does not provide legal advice in any form. The definitions and other terms used in this guide are not taken from any given legal source.

Term                    Definition

Personal data           Information about an identified or identifiable natural person

Business purpose        A legal, contractual, or otherwise justified reason for the processing of personal data. The assumption is that any purpose has an end that is usually already defined when the purpose starts.

Blocking                A method of restricting access to data for which the primary business purpose has ended

Deletion                Deletion of personal data so that the data is no longer usable

Retention period        The time period during which data must be available

End of purpose (EoP)    A method of identifying the point in time for a data set when the processing of personal data is no longer required for the primary business purpose. After the EoP has been reached, the data is blocked and can only be accessed by users with special authorization.
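The relationship between end of purpose, blocking, retention period, and deletion can be illustrated with a short Python sketch. It is not SAP Vora functionality. It assumes that the application records an end-of-purpose date and a retention end date for each data set, and that data becomes eligible for deletion once the retention period has ended. The function name and the state labels are hypothetical.

```python
from datetime import date


def lifecycle_state(today, end_of_purpose, retention_end):
    """Classify a data set as 'active', 'blocked', or 'deletable'.

    Assumption: the retention period ends on or after the end of purpose (EoP).
    """
    if retention_end < end_of_purpose:
        raise ValueError("retention period must not end before the end of purpose")
    if today < end_of_purpose:
        # The primary business purpose still applies: regular access.
        return "active"
    if today < retention_end:
        # EoP reached, but the data must remain available: access is
        # restricted to users with special authorization.
        return "blocked"
    # The retention period has ended: the data can be deleted.
    return "deletable"


eop = date(2018, 1, 1)
retention = date(2028, 1, 1)
print(lifecycle_state(date(2017, 11, 14), eop, retention))
print(lifecycle_state(date(2020, 6, 1), eop, retention))
print(lifecycle_state(date(2028, 1, 1), eop, retention))
```

Whether a given data set follows this simple sequence depends on the applicable legal requirements and must be decided by the application on a case-by-case basis.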


SAP Vora Approach to Data Protection

Many data protection requirements depend on how the business semantics or context of the data stored and processed in SAP Vora are understood.

Note: Because SAP Vora can communicate with other data sources, it may also be used to process data that is stored in other systems and accessed through virtual tables.

In SAP Vora installations, the business semantics of data are part of the application definition and implementation. SAP Vora provides the features for working with technical database objects, such as tables. It is therefore the application that "knows", for example, which tables in the database contain sensitive personal data, or how business level objects, such as sales orders, are mapped to technical objects in the database. Applications built on top of SAP Vora need to make use of the features provided by SAP Vora to implement compliance requirements for their specific use case.

Caution: Database trace and dump files may expose personal data, for example, when a trace is set to a very high trace level, such as DEBUG or FINE.

SAP Vora provides a variety of security-related features to implement general security requirements that are also required for data protection and privacy:

Aspect of Data Protection and Privacy          More Information

Access control                                 Enabling Authentication for SAP Vora Services and SAP Vora Users [page 48]

Access logging                                 SAP Vora Logs [page 42]

Transmission control/communication security    Enabling TLS for SAP Vora services [page 49]

Separation by purpose                          Enabling Authentication for SAP Vora Services and SAP Vora Users [page 48]

Caution: The extent to which data protection is ensured depends on secure system operation. Network security, security note implementation, adequate logging of system changes, and appropriate usage of the system are the basic technical requirements for compliance with data privacy legislation and other legislation.

Important Points

For information about shared usage of Kubernetes clusters and possible impacts on data privacy, see SAP Vora on Kubernetes Security [page 52].



Important Disclaimers and Legal Information

Coding Samples
Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP intentionally or by SAP's gross negligence.

Accessibility
The information contained in the SAP documentation represents SAP's current view of accessibility criteria as of the date of publication; it is in no way intended to be a binding guideline on how to ensure accessibility of software products. SAP in particular disclaims any liability in relation to this document. This disclaimer, however, does not apply in cases of willful misconduct or gross negligence of SAP. Furthermore, this document does not result in any direct or indirect contractual obligations of SAP.

Gender-Neutral Language
As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as "sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible.

Internet Hyperlinks
The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for transparency (see: https://help.sap.com/viewer/disclaimer).


go.sap.com/registration/contact.html

© 2017 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.

Please see https://www.sap.com/corporate/en/legal/copyright.html for additional trademark information and notices.

