sap hana vora sitmty 20160707

SAP HANA Vora
Bridging the gap between Corporate and Big Data
Henrique Pinto, Global HANA COE
July 2016

© 2016 SAP SE or an SAP affiliate company. All rights reserved. 2

The Five Megatrends Driving Our Digitized World
And Their Implications for Distributed Big Data Management

• Hyper Connectivity: Everybody has access
• Super Computing: Super computers power everywhere
• Cloud Computing: The cloud is where we compute
• Smart World: Your fridge knows what you want for dinner
• Cyber-Security: High-powered security is now the norm


Hadoop and Spark at a Glance

What is it?
• Scalable, fault-tolerant, distributed file system
• Sits on top of a native file system
• HDFS (Hadoop Distributed File System) is an append-only file system, designed for batch, not real-time
• Splits files into blocks and distributes them to data nodes
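The block splitting described above can be illustrated with a toy sketch. This is plain Python, not the HDFS API: the function names, the tiny block size, and the round-robin placement are all assumptions made for illustration (real HDFS uses much larger blocks and its own placement policy).

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(blocks, data_nodes, replication=1):
    """Hypothetical round-robin placement: block index -> list of node names."""
    node_count = len(data_nodes)
    placement = {}
    for index in range(len(blocks)):
        # Each replica goes to the next node in the ring.
        placement[index] = [
            data_nodes[(index + replica) % node_count]
            for replica in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    print(blocks)
    print(assign_blocks(blocks, ["node1", "node2"], replication=2))
```

Because every block is placed on more than one node when `replication` is greater than 1, losing a single node does not lose the file, which is the fault-tolerance idea the slide refers to.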

Why?
• Organizations want more business value from Big Data
• Hadoop configurations scale and perform at very low cost
• Hadoop complements Data Warehouses, Data Integration and Analytics, but doesn’t replace them

Data Processing
• MapReduce was invented to query data residing in a Hadoop file system
• MapReduce was not designed for interactive queries but for long-running batch jobs
• For more details see http://hortonworks.com/hadoop/mapreduce/
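As a rough illustration of the MapReduce model mentioned above, the following sketch runs the map, shuffle, and reduce phases of a word count in a single Python process. It is not Hadoop code; the function names are hypothetical and the "cluster" is just a list of lines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

In a real cluster each phase reads from and writes to disk across many nodes, which is one reason the model suits long-running batch jobs better than interactive queries.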

Spark
• An open source in-memory analytics execution engine for fast, large-scale data processing
• Used on top of Hadoop
• Does not replace Hadoop
• Built to replace MapReduce



What’s Stopping Us?
The Digital Divide between Enterprise and Big Data


• Too Complex
• Too Slow
• Unable to Work Together

ENTERPRISE | BIG DATA


Bridging the Digital Divide

ENTERPRISE | BIG DATA

Introducing SAP HANA Vora



SAP HANA Vora
What’s Inside and What Does It Do?

SAP HANA Vora is an in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.

• Make Precision Decisions: Drill Downs on HDFS, Compiled Queries, HANA-Spark Controller
• Democratize Data Access: Mashup API Enhancements, Open Programming
• Simplify Big Data Ownership: Unified Landscape, Any Hadoop Clusters


Enable Precision Decisions
With Contextual Insights In Enterprise Systems

• Spark Controller: HANA-Spark Controller for improved performance between distributed systems
• Gain business coherence with business data and big data
• Compiled Queries: Compiled queries enable applications & data analysis to work more efficiently across nodes
• Drill Downs: Familiar OLAP experience on Hadoop to derive business insights from big data, such as drill-down into HDFS data

[Diagram: SAP HANA in-memory platform (Application Services, Database Services, Integration Services, Processing Services, In-Memory Store) connected through HANA Smart Data Access and the Spark Controller to Vora/Spark nodes running on YARN over HDFS files, alongside other apps]
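To make the drill-down idea concrete, here is a small sketch in plain Python. It does not use Vora or Spark; the sample rows, the `region`/`country` hierarchy, and the function names are assumptions chosen only to show the pattern of aggregating at a coarse level and then descending into one member.

```python
from collections import defaultdict

# Hypothetical fact rows, standing in for files on HDFS.
SALES = [
    {"region": "EMEA", "country": "DE", "revenue": 120.0},
    {"region": "EMEA", "country": "FR", "revenue": 80.0},
    {"region": "EMEA", "country": "DE", "revenue": 30.0},
    {"region": "APJ", "country": "JP", "revenue": 50.0},
]


def aggregate(rows, dims, measure):
    """Sum a measure grouped by the given dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[dim] for dim in dims)
        totals[key] += row[measure]
    return dict(totals)


def drill_down(rows, parent_filter, dims, measure):
    """Restrict to one parent member, then aggregate at the finer level."""
    selected = [
        row for row in rows
        if all(row[column] == value for column, value in parent_filter.items())
    ]
    return aggregate(selected, dims, measure)


if __name__ == "__main__":
    print(aggregate(SALES, ["region"], "revenue"))
    print(drill_down(SALES, {"region": "EMEA"}, ["country"], "revenue"))
```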


Democratize Data Access for Data Science Discovery

• Open Programming: Extensive programming support for Scala, Python, C, C++, R, and Java allows data scientists to use their tool of choice
• Pursue new inquiries without compromise on data and easily integrate these insights with all data
• Mashup Enhancements (Spark Data-source API enhancement): Enable data scientists and developers who prefer SparkR or Spark ML to mash up corporate data with Hadoop/Spark data easily
• Optional use of SAP HANA for delegated, multi-engine pre-processing: Optionally, leverage HANA’s multiple data processing engines for developing new insights from business and contextual data

[Diagram: SAP HANA Platform (Application Services, Database Services, Integration Services, Processing Services, In-Memory Store) next to three Vora/Spark nodes running on YARN over HDFS files]


Vora Modeling Tool

• Vora Tools use the Thriftserver to provide access to the Modeler under http://<DNS_NAME_OF_JUMPBOX_NODE>:9225
• Perspectives:
  • Data Browser
  • SQL Editor
  • Modeler


SAP HANA Vora: Use Cases

Fraud Detection
Get access to all your data, including historical and contextual trends and current business data, to analyze anomalies

Risk Mitigation
Be assured of more precise data to perform Monte Carlo simulations to produce distributions of possible outcome values with more precise context

Targeted Marketing Campaigns
React rapidly to customer sentiment and pinpoint targeting for sales and marketing campaigns with a more complete view of customer needs and wants

360° Customer Service
Ensure a more complete picture of the customer with analysis of unstructured customer data, such as social media profiles, emails, calls, complaint logs, discussion forums, and website history
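The Monte Carlo simulations named under Risk Mitigation can be sketched as follows. This is a minimal example under stated assumptions: losses are drawn from a normal distribution with hypothetical parameters, and the function names are invented for illustration. A real risk model would be calibrated on the historical and contextual data the slide describes.

```python
import random
import statistics


def simulate_losses(n_trials, mean_loss, stdev_loss, seed=0):
    """Draw n_trials hypothetical loss outcomes from a normal distribution."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    return [rng.gauss(mean_loss, stdev_loss) for _ in range(n_trials)]


def summarize(outcomes):
    """Describe the distribution of simulated outcomes."""
    ordered = sorted(outcomes)
    p95_index = int(0.95 * (len(ordered) - 1))
    return {
        "min": ordered[0],
        "mean": statistics.fmean(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    outcomes = simulate_losses(10_000, mean_loss=100.0, stdev_loss=15.0)
    print(summarize(outcomes))
```

The output is a distribution of possible outcomes rather than a single forecast, which is what makes the quality of the input data matter.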


Use Cases of Existing Vora Customers

DW / Tiering
• Challenges:
  • Current DW with more than 100 TB of data at end of life and no longer cost effective
  • Regulatory requirement to retain data for 10 years
• Solution:
  • SAP HANA for most recent data, Hadoop for historical data
  • SAP HANA Vora accesses and queries data across all tiers
• Why Vora:
  • SAP HANA Vora provides enterprise analytics and an OLAP-like experience across data warehouse and HDFS

IoT
• Challenges:
  • Perform detailed predictive analytics throughout the manufacturing processes based on sensor data
  • More than 1 PB of data
• Solution:
  • SAP HANA Vora rapidly processes sensor data in HDFS and combines it with data in SAP HANA for predictive analytics
• Why Vora:
  • SAP HANA Vora processing of HDFS data combined with HANA data reduced query runtime dramatically

DataLake
• Challenges:
  • Demand forecast accuracy for flu related products is relatively low
  • Difficult to detect and react quickly/intelligently to anticipate demand spikes created by outbreaks
• Solution:
  • Data Lake using Vora combining internal and external data sources (internal: shipment; external: weather, Twitter, Google Search, Centers for Disease Control)
• Why Vora:
  • SAP HANA Vora enables fast analysis and forecasting of all types of data in HDFS


SAP HANA Multi-temperature Data Management
Big Data: HANA In-Memory + HANA Dynamic Tiering + Hadoop

Hot Data: HANA In-Memory
• Modern in-memory platform
• Transact/analyze data in real-time
• Native predictive, text, graph and spatial algorithms
• Real time analytics on top of streaming data

Warm Data: HANA DT
• Disk backed, smart column store
• High performance and efficient compression
• Transparent for all operations. No changes required for BW operations
• Excels at queries on structured data from terabyte to petabyte scale
• No data duplication

Cold Data: Hadoop
• Hadoop virtualization possible with Smart Data Access (read only), via Hive or Spark (SP10+)
• Also possible to access HDFS & MR jobs directly via HANA vUDFs, which can be embedded in SQL queries
• Future roadmap and new functionalities available on top of SAP HANA Vora:
  • Native bi-directional communication between HANA & Hadoop via Spark for fast analytical scenarios
  • Added “BI-like” features on top of Hadoop (hierarchies, UoM & currency conversions, etc.)
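One way to picture the hot, warm, and cold split above is an age-based routing rule. The sketch below is only an assumption about how such a rule could look: the thresholds, tier labels, and function name are hypothetical and are not taken from any SAP product setting.

```python
from datetime import date

# Hypothetical age thresholds; a real landscape would tune these per workload.
HOT_MAX_AGE_DAYS = 90
WARM_MAX_AGE_DAYS = 730


def choose_tier(record_date, today):
    """Return 'hot', 'warm', or 'cold' depending on the age of a record."""
    age_days = (today - record_date).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"   # e.g. HANA in-memory
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"  # e.g. HANA Dynamic Tiering
    return "cold"      # e.g. Hadoop


if __name__ == "__main__":
    today = date(2016, 7, 7)
    for record_date in (date(2016, 6, 1), date(2015, 7, 7), date(2010, 1, 1)):
        print(record_date, choose_tier(record_date, today))
```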


Data Tiering w/ HANA & Vora
Comparison of the different strategies

HANA In-Memory
• Cost Factor: $$$$ (4 out of 4)
• Volume: Up to several TBs (no technical limit)
• Processing: ACID compliant; SQL, SQLScript, predictive, time series, spatial, text, …

HANA Dynamic Tiering
• Cost Factor: $$$ (3 out of 4)
• Volume: 100s of TB, natively integrated in HANA
• Processing: ACID compliant; SQL

Hadoop: Vora
• Cost Factor: $$ (2 out of 4)
• Volume: 100s of TB (depending on available memory in Hadoop cluster)
• Processing: In-memory OLAP engine for Hadoop; compiled SQL code

Hadoop: Spark
• Cost Factor: $ (1 out of 4)
• Volume: 100s of PB or more
• Processing: General purpose in-memory engine; transformations and actions

[Performance column shown as graphics on the slide; extracted ratings: 4 out of 4, 3 out of 4, 1 out of 4, 2 out of 4]
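The "transformations and actions" model listed for Spark above can be mimicked in a few lines of plain Python. This sketch is not the Spark API: `LazyDataset` and its methods are hypothetical names that only imitate the idea that transformations are recorded lazily and nothing runs until an action is called.

```python
class LazyDataset:
    """Toy dataset: transformations are recorded, actions trigger execution."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    # Transformations: return a new dataset, compute nothing yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _iterate(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: run the recorded pipeline and return a result.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())


if __name__ == "__main__":
    dataset = LazyDataset([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 20)
    print(dataset.collect())
    print(dataset.count())
```

Deferring execution lets an engine see the whole pipeline before running it, which is what makes optimizations such as compiled queries possible.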


Vora 1.3 Highlights (Beta Program)

• Simplified installation
• Enhanced modeler
• New engines (graph, time-series, doc store, disk store)
• Kerberos support
• UoM conversion, currency conversion


SAP HANA Vora – Latest innovations

• Graph engine: SAP HANA Vora embeds an in-memory graph database for real-time graph analysis. The primary focus is on complex read-only analytical queries on very large graphs.

• Time Series: SAP HANA Vora provides a highly distributed time series analysis engine which supports storing and analyzing time series data. By enabling efficient (memory and speed) time series compression and supporting features like standard aggregation, granularization, and advanced analysis, SAP HANA Vora allows you to join relational data with series data to build efficient SQL models in Hadoop and other Big Data environments.

• Document Store: SAP HANA Vora introduces NoSQL features like storing JSON documents using the new Document Store as part of the SAP HANA Vora 1.3 release. The new DocStore supports schema-less tables, allowing you to flexibly add or remove fields from any document, and helps scale horizontally.

• Disk store: SAP HANA Vora provides relational capabilities without loading all the data into memory, for data sets too large to fit in memory.
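The time series features described above (aggregation, granularization, and joining series data with relational data) can be illustrated with a plain Python sketch. It does not use the Vora time series engine; the function names, the station row, and the sample readings are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def granularize(points, bucket="hour"):
    """Average (timestamp, value) points per hour or per day."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        if bucket == "hour":
            key = timestamp.replace(minute=0, second=0, microsecond=0)
        elif bucket == "day":
            key = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError("bucket must be 'hour' or 'day'")
        buckets[key].append(value)
    return {key: fmean(values) for key, values in sorted(buckets.items())}


def join_station(series, station_row):
    """Attach attributes from a (hypothetical) relational station row."""
    return [
        {"city": station_row["city"], "bucket": key, "avg_temp_c": value}
        for key, value in series.items()
    ]


if __name__ == "__main__":
    readings = [
        (datetime(2016, 1, 1, 10, 5), -10.0),
        (datetime(2016, 1, 1, 10, 35), -20.0),
        (datetime(2016, 1, 1, 11, 0), 0.0),
    ]
    hourly = granularize(readings, "hour")
    print(join_station(hourly, {"station_id": 1, "city": "Halifax"}))
```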

[Chart: Temperature (°C) for Halifax and Waterloo]

The SAP focus: End-to-end value chain

[Diagram: SAP HANA Platform value chain with the layers SOURCE, INGEST, STORAGE, COMPUTE, and CONSUME]

• Source: Logs, Text, OLTP, Social, Machine, Geo, ERP, Sensor
• Ingest: Transformations & Cleansing (Smart Data Integration, Smart Data Quality), Stream Processing (Smart Data Streaming), Store & forward
• Storage and compute: In-Memory (data model & data), Calculation engine (fast computing), Column Storage (high performance analytics), Series Data Storage (store time-series data), Dynamic Tiering (aged data on disk), Smart Data Access (Virtual Tables, User Defined Functions)
• Engines: spatial processing; analytics, text, graph, predictive; stream processing
• Consume: Application Development Environment, mobile applications and BI, Reporting & Dashboards, High Performance Applications, Data Exploration & Visualization, Ad hoc & OLAP Analytics, Predictive Analysis, Business Planning & Forecasting, Lumira / BI
• Hadoop / Vora: MapReduce, YARN, HDFS

But there is more work to do…

