Analyze Big Data Faster and Store it Cheaper

Dominick Huang, CenterPoint Energy
Russell Hull, SAP

ABOUT CENTERPOINT ENERGY, INC.

• Publicly traded on the New York Stock Exchange
• Headquartered in Houston, Texas
• Over 5,000 square miles of electric transmission and distribution service area
• Assets total more than $22 billion
• More than 8,700 employees
• CNP and its predecessor companies have been in business for over 130 years
• Domestic Energy Delivery
• Operate, Serve, and Grow
• Smart Grid Enabled
• Twenty-Eight State Geography
• Over Five Million Metered Customers
• 2.3 million Smart Meters
• 4,000 Miles of Transmission
• 47,000 Miles of Distribution
• Electric Transmission & Distribution
• Natural Gas Distribution
• Competitive Natural Gas Sales and Services

CenterPoint Energy Proprietary and Confidential

AGENDA

• Key Drivers and Strategy of HANA Initiative
• Use Case – Smart Meter Big Data Analytics
• Technology Overview
• POC Results
• Value and Comparison

KEY DRIVERS FOR HANA INITIATIVES

• SAP HANA as CNP strategic platform for critical transactional applications and analytics
• Cost effective solution to manage and contain data storage growth
• Analytics platform simplification and consolidation to HANA
• Key technology enabler for future business solutions
• Maximize CNP investment on HANA license (40TB)
• Enable business resiliency implementation for CRM/ECC/BPC
• Leverage HANA in-memory capability for real time analytics

STRATEGY – 3 YEAR HANA ROADMAP

• Technical Migration and Consolidation
  • Migrate critical business applications (SAP and Mainframe)
  • Consolidate Analytics solutions (BW, ISAS, eMA, etc.) onto HANA
• HANA Platform Optimization
  • Enhance performance of core business process and mass business functions
  • Enable real-time reporting from the HANA (in-memory) database
• HANA Platform Innovation
  • Innovative solutions to align with long-term business strategy and roadmap
  • SIMPLE Finance, Predictive Asset Health Analytics, Situational Awareness, Internet of Things, Predictive Analytics for customer services, etc.

USE CASE – SMART METER BIG DATA ANALYTICS

BUSINESS CHALLENGE

• 1+ PB of SmartMeter data
• 2.3MM SmartMeters taking readings every 15 minutes, creating 225MM readings per day, or over 80 billion readings in a year (roughly 800 billion over the 10-year retention period).

• Regulations require historical readings to be available for 10 years.
• Uncompressed data growth of 8TB per month, and over 1PB in a 10-year period.
• Current DW technology is approaching end of life.
• Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a high total cost of ownership.
• Need a cost-effective solution for today's analytics, regulatory requirements and preparation for future use cases.
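The volumes quoted in this list can be sanity-checked with simple arithmetic. The sketch below uses only the inputs stated on the slide (meter count, 15-minute interval, 8TB per month, 10-year retention); the variable names are illustrative. Under these inputs the daily count comes out near 221 million, the yearly count near 80 billion, and the 800-billion mark is reached over the 10-year retention window.

```python
# Back-of-the-envelope check of the smart meter volumes quoted on the slide.
METERS = 2_300_000           # 2.3MM smart meters
READ_INTERVAL_MIN = 15       # one reading every 15 minutes
GROWTH_TB_PER_MONTH = 8      # uncompressed data growth
RETENTION_YEARS = 10         # regulatory retention period

readings_per_meter_per_day = 24 * 60 // READ_INTERVAL_MIN   # 96 intervals a day
readings_per_day = METERS * readings_per_meter_per_day
readings_per_year = readings_per_day * 365
readings_in_retention = readings_per_year * RETENTION_YEARS
growth_tb_in_retention = GROWTH_TB_PER_MONTH * 12 * RETENTION_YEARS

print(f"{readings_per_day / 1e6:.1f} million readings per day")
print(f"{readings_per_year / 1e9:.1f} billion readings per year")
print(f"{readings_in_retention / 1e9:.1f} billion readings in {RETENTION_YEARS} years")
print(f"{growth_tb_in_retention} TB of uncompressed growth in {RETENTION_YEARS} years")
```

The 960 TB result is in line with the "1PB in 10 years" planning figure, assuming the monthly growth stays flat.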

DATA TIER SOLUTION
DATA VOLUME MANAGEMENT: MULTI TEMPERATURE DATA APPROACH

Non-Active Data Concept: providing lower TCO by optimized data volume management

• Hot: data is read and/or written frequently
  • In memory
  • No restrictions, all features available
• Warm: infrequent access
  • On disk, no need to keep in memory all the time
  • No restrictions, all features available
• Cold: sporadic access
  • Not stored in HANA DB; stored in Near-line Storage (NLS)
  • Restricted to NLS capabilities
  • NLS management for read-only data

BUSINESS CASE – CAPEX & OPEX SAVINGS

Smart Meter Data grows more than 100TB/year, 1PB+ in 10 years

75% Capex and Opex saving

[Chart: Projected Total Spend (Cumulative & Estimated), 2016 to 2026. Series: HANA O&M, HANA Capital, NZ O&M, NZ Capital. Compares "Business as usual" with "Move to HANA/Hadoop"; annotated with Capex Saving and O&M Saving.]

[Chart: Projected Data Capacity (TB), 2016 to 2025. Shows Projected Growth and Projected Savings.]

SOLUTION BENEFITS

• Cost effective HOT+WARM+COLD data management strategy leveraging HANA data compression and data tiering technology
• Simplified Big Data ownership by combining SAP HANA, Dynamic Tiering and Hadoop into a single landscape
• Single database experience: query execution utilizes Smart Data Access (SDA) and automatically accesses data stored in HANA, Dynamic Tiering and Hadoop/Vora, depending on the location of the data
• Data movement between storage tiers is automated using the Data Lifecycle Manager (DLM)
• Foundation for advanced predictive analytics and future business capabilities
• Instant real-time analytics via HANA
• 75% savings in storage cost compared to the current solution
• Data tiering technology (Dynamic Tiering, Hadoop) to manage data size and growth
• Seamless Hadoop integration allows data scientists to use the HANA toolset to access and manage Hadoop data
• Ability to charge the business based on the data being stored and performance requirements

TECHNOLOGY REVIEW

SAP Big Data Platform

Tier 0 (Memory): Speed Layer

Tier 1 (SAN,..)

Tier 2 (Hadoop): Batch Layer

NEW SMART METER ANALYTICS ARCHITECTURE

Current Architecture
• Netezza

Planned Architecture
• 36TB HANA EDW: 13 months of data are stored in HANA for fast analytics
• 50TB Dynamic Tiering Extended Storage: 26 months of data are stored in DT (Sybase IQ)
• Hadoop (Vora): 10 years of meter data are stored in Hadoop. The plan is to use SAP HANA Vora to access the data

Application
• Business Objects / SAS / Custom Application
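The retention windows on this slide (13 months in HANA, 26 months in DT, 10 years in Hadoop) imply an age-based placement rule. The following is a minimal sketch of such a rule, assuming the windows are cumulative, non-overlapping age thresholds; the slide does not say whether the tiers overlap, and the function and tier names are hypothetical, not part of DLM.

```python
from datetime import date

# Retention windows from the slide; treating them as cumulative age cut-offs
# is an assumption made for this illustration.
HOT_MONTHS = 13       # HANA (in memory)
WARM_MONTHS = 26      # Dynamic Tiering extended storage
COLD_YEARS = 10       # Hadoop


def months_between(older: date, newer: date) -> int:
    """Whole calendar months elapsed from `older` to `newer`."""
    months = (newer.year - older.year) * 12 + (newer.month - older.month)
    if newer.day < older.day:
        months -= 1   # the last month is not yet complete
    return months


def tier_for_reading(reading_date: date, today: date) -> str:
    """Return the (hypothetical) tier label for a reading of the given age."""
    age = months_between(reading_date, today)
    if age < HOT_MONTHS:
        return "HANA"
    if age < WARM_MONTHS:
        return "DYNAMIC_TIERING"
    if age < COLD_YEARS * 12:
        return "HADOOP"
    return "EXPIRED"  # older than the 10-year retention requirement


today = date(2016, 6, 15)
for d in (date(2016, 1, 1), date(2015, 1, 1), date(2010, 1, 1), date(2005, 1, 1)):
    print(d, "->", tier_for_reading(d, today))
```

In the landscape described here, the actual movement between tiers is performed by DLM rules; the sketch only illustrates the age thresholds.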

DYNAMIC TIERING

• SAP Dynamic Tiering is a warm store: a traditional disk-based database system fully integrated into HANA.
• Based upon Sybase IQ: column store and disk based
• Reduced TCO by lowering HANA memory footprint
• All HANA functions are available: read/write/update
• Single database experience: all DB access requests are managed through the HANA platform.
• Centralized operation control: all administration tasks are handled through the HANA interface.

SAP HANA DYNAMIC TIERING DISK-BACKED COLUMN STORE EXTENSION TO HANA FOR WARM DATA MANAGEMENT

WHAT IS APACHE HADOOP?

HADOOP TECHNICAL ARCHITECTURE – HADOOP CLUSTER

SAP VORA - HANA/HADOOP INTEGRATION WHAT’S INSIDE AND WHAT DOES IT DO?

Democratize Data Access

Make Precision Decisions

Simplify Big Data Ownership

SAP HANA Vora is an in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.

• Drill Downs on HDFS
• Mashup API Enhancements
• Compiled Queries
• HANA-Spark Adapter
• Unified Landscape
• Open Programming
• Any Hadoop Clusters

SAP DATA LIFECYCLE MANAGER (DLM)

SAP DATA TIERING ARCHITECTURE

[Diagram: SAP data tiering architecture. Components shown on the HANA side: Index Server, Processing Engines, In-Memory Stores, Dynamic Tiering, XS Engine, Data Lifecycle Manager (DLM), SDA (Virtual Table), Extended Storage, Files. Components shown on the Hadoop side: HANA Spark Controller, Spark SQL, Files. Data flow labels: DLM Reads Data from HANA; DLM Writes Data; DLM Writes Data to ORC File; Upload Table into Vora.]

POC REVIEW

POC OBJECTIVES

• Research and test SAP HANA Data Tiering technology, i.e. DLM (Data Lifecycle Manager), Dynamic Tiering, Vora Hadoop integration
• Evaluate Hadoop technology, understand the Hadoop ecosystem and TCO
• Test SAP VORA - HANA and Hadoop integration technology
• Develop and validate solution options for several critical 2016 projects: Smart Meter Analytics, customer document repository for Mainframe Migration
• Build CNP in-house expertise in Hadoop and SAP HANA/Hadoop integration technology
• Identify use case and innovation opportunities at CNP

POC ENVIRONMENT AND TEST CASES

• POC Team
  • CenterPoint Energy (Lead and Architects); SAP (CoE, PE, Global ITP); HP (Hardware); IBM (IBM Hadoop and Cloud)
• Environment
  • Hardware
    • HP Lab: 12-node Hadoop cluster, CS500 HANA 2TB, HANA Dynamic Tiering node
    • IBM BigInsights Cloud
  • Software
    • SAP HANA SPS 10, DLM, Dynamic Tiering, VORA
    • Hortonworks HDP Hadoop, RedHat Linux
    • IBM Apache Hadoop with BigSQL
• Test Cases
  • Data Load: extract 800GB, 7 billion Smart Meter records from Netezza and ISAS, load data into HANA (meter data scrambled to protect data security)
  • DLM: use the DLM tool to move data from HANA to Dynamic Tiering Extended Storage and Hadoop
  • Run queries across all data tiers and measure performance
  • Load, query and display 19 million PDFs of Customer Bills (dummy PDF files used, no real customer data)

POC SUCCESS CRITERIA

• Data Tiering: move data among different tiers including HANA, DT and Hadoop
• Run SQL queries within and across data tiers
• Performance: measure response time for each data tier
• Data Compression: evaluate compression ratio of HANA, DT and Hadoop
• SAP DLM: utilize the tool to move data from Hot to Warm and Cold tier
• Customer document storage: store and retrieve PDF documents within one second
• Comparison of storage costs: HANA, DT (Dynamic Tiering Extended Storage) and Hadoop

POC TEST RESULTS

Hadoop HANA / DT / Spark/ Vora DLM

HDP Customer Bill Store and Retrieval
→ 40ms response time to search and display a document from 19 million PDFs

HDP Batch data load via SQOOP into Hadoop
→ 4 min 24s to load 2.5 million records (single thread); 1 min 10s (10 threads)

Data load from HANA to HDP Hadoop via VORA
→ Total of 6.2GB ORC files stored in HDFS against original size of 172GB
→ Compression Rate: 9 (3 copies in HDFS)
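The Sqoop and ORC figures above can be related with a few lines of arithmetic. This is a sketch using only the numbers reported in the POC; it assumes the 6.2GB is the size of a single copy of the ORC files, so that three HDFS replicas explain the quoted compression rate of about 9.

```python
# Sqoop batch load timings from the POC (2.5 million records).
RECORDS = 2_500_000
SINGLE_THREAD_S = 4 * 60 + 24    # 4 min 24 s
TEN_THREADS_S = 1 * 60 + 10      # 1 min 10 s

single_rate = RECORDS / SINGLE_THREAD_S      # records per second
parallel_rate = RECORDS / TEN_THREADS_S
speedup = SINGLE_THREAD_S / TEN_THREADS_S

# ORC compression figures from the POC.
ORIGINAL_GB = 172.0
ORC_GB = 6.2                     # assumed: one copy, before HDFS replication
HDFS_REPLICAS = 3

raw_ratio = ORIGINAL_GB / ORC_GB             # compression of a single copy
replicated_gb = ORC_GB * HDFS_REPLICAS       # disk footprint with 3 copies
effective_ratio = ORIGINAL_GB / replicated_gb

print(f"Sqoop: {single_rate:,.0f} rec/s single thread, "
      f"{parallel_rate:,.0f} rec/s with 10 threads ({speedup:.2f}x)")
print(f"ORC: {raw_ratio:.1f}x per copy, {effective_ratio:.1f}x after replication")
```

Ten threads give a speedup well below 10x in this test, which suggests the load was limited by something other than thread count; the source does not say what.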

Move data from HANA to DT
→ 289 million records moved from HANA to DT
→ 670K records per minute

Move data from HANA to Hadoop via VORA into HDFS
→ 1.57 billion records moved from HANA to Hadoop
→ 22 million records per minute
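To put the two DLM movement rates in perspective, the sketch below converts them into elapsed time. It assumes the quoted rates are sustained averages over the whole move, which the source does not state; the helper name is illustrative.

```python
def minutes_to_move(records: float, records_per_minute: float) -> float:
    """Elapsed minutes to move `records` at a constant rate."""
    return records / records_per_minute


dt_minutes = minutes_to_move(289e6, 670e3)       # HANA -> Dynamic Tiering
hadoop_minutes = minutes_to_move(1.57e9, 22e6)   # HANA -> Hadoop via Vora
rate_ratio = 22e6 / 670e3                        # Hadoop rate relative to DT rate

print(f"HANA -> DT:     {dt_minutes:.0f} min ({dt_minutes / 60:.1f} h)")
print(f"HANA -> Hadoop: {hadoop_minutes:.0f} min ({hadoop_minutes / 60:.1f} h)")
print(f"Hadoop path moved records about {rate_ratio:.0f}x faster per minute")
```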

Run aggregation query across SAP HANA, HDP Hadoop & DT (~4 billion records):

[Chart: Query Response Time [s] by data tier. Recoverable data labels: 0.2 and 2.6; axis ticks run from 50 to 400.]

VALUE AND COMPARISON BETWEEN DATA TIERS

COMPARISON BETWEEN DATA TIERS

Columns: Component, Performance, Cost Factor, Volume, Processing

SAP HANA
• Volume: up to 10s of TBs (no technical limit)
• Processing: ACID compliant; SQL, SQLScript, graph, time series, spatial, text, …

Dynamic Tiering or Sybase IQ
• Cost factor: $$
• Volume: 100s of TB integrated in HANA; several PBs with Sybase IQ
• Processing: ACID compliant; SQL

Hadoop – Spark/Vora
• Performance: can be improved significantly by increasing compute nodes and using SSD, at higher cost
• Cost factor: $; 15 times less expensive than T1 storage
• Volume: 100s of PB or more
• Processing: ANSI SQL compliant; read-only SQL when used from HANA via SDA; transformations and actions

Hadoop – Vora in Memory
• Performance: data loaded in memory to achieve better performance
• Cost factor: $$
• Volume: 100s of TB (depending on available memory in Hadoop cluster)
• Processing: read-only SQL when used from HANA via SDA

RECOMMENDED USE CASES – SHORT TERM

Component Recommended Use Case

SAP HANA
• Managing up to several TBs of high value data
• Very high processing performance required
• SAP HANA native processing features (PAL,..) required
• OLTP with many fine-granular updates needed

Dynamic Tiering
• Managing up to several PBs of data at T2/T3 storage cost
• High performance for complex queries required
• Deep SAP HANA integration required (single database experience)
• Updates and deletes required

Hadoop - Spark
• Managing up to 100s of PBs of data at T4 storage cost, 15 times less expensive than T1 storage
• Read-only sufficient (bulk load, no fine-granular writes)
• Comparatively low-cost storage important
• Loose integration of administration and life-cycle management acceptable

Hadoop - Vora
• High OLAP query performance on Hadoop
• Additional query features (hierarchies)
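The recommendations above can be read as a rough decision guide. The sketch below is one possible encoding and not an SAP or CenterPoint tool: the `Workload` fields, the numeric cut-offs standing in for "several TBs" and "several PBs", and the order of the checks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    volume_tb: float
    needs_updates: bool            # updates and deletes required
    needs_hana_native: bool        # e.g. PAL or fine-granular OLTP updates
    needs_fast_olap: bool = False  # high OLAP query performance on Hadoop


# Illustrative cut-offs only; the slide gives no exact numbers.
HANA_MAX_TB = 10        # stand-in for "several TBs"
DT_MAX_TB = 5_000       # stand-in for "several PBs"


def recommend(w: Workload) -> str:
    """Map a workload to one of the components named on the slide."""
    if w.needs_hana_native and w.volume_tb <= HANA_MAX_TB:
        return "SAP HANA"
    if w.needs_updates and w.volume_tb <= DT_MAX_TB:
        return "Dynamic Tiering"
    if w.needs_updates or w.needs_hana_native:
        return "no direct match"   # writable or native features beyond the cut-offs
    if w.needs_fast_olap:
        return "Hadoop - Vora"
    return "Hadoop - Spark"        # read-only, low-cost storage


print(recommend(Workload(volume_tb=2, needs_updates=True, needs_hana_native=True)))
print(recommend(Workload(volume_tb=500, needs_updates=True, needs_hana_native=False)))
print(recommend(Workload(volume_tb=50_000, needs_updates=False, needs_hana_native=False)))
```

A real placement decision would also weigh cost targets and integration needs, which the slide lists but this sketch leaves out.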

THANK YOU

• Contact information:

• Dominick Huang

• Sr. Manager, Enterprise Technology & Architecture

• CenterPoint Energy

• Yong.huang@centerpointenergy.com Tel 713-207-6659

Russell Hull

Chief Support Architect

SAP America

Russell.hull@sap.com

Thank you for your time. Follow us at @ASUG365

APPENDIX

CNP HANA LANDSCAPE - ANALYTICS (BW + OW)

[Diagram: CNP HANA landscape for Analytics (BW + OW). Legend: existing blade, new HP node. Labels shown: HIP (PRD) 36TB (memory); HIQ (QA); HID (DEV); HIS (SBX); 2 TB failover blade; node sizes of 0.5TB and 0.25TB; 4.5TBs; Situation Awareness, MfM Testing & other apps; ES Extended Storage (NLS/DT/Hadoop).]

HADOOP ARCHITECTURE
