ANALYZE BIG DATA FASTER AND STORE IT CHEAPER
Posted on 15-Mar-2018
ABOUT CENTERPOINT ENERGY, INC.
• Publicly traded on the New York Stock Exchange
• Headquartered in Houston, Texas
• Over 5,000 square miles of electric transmission and distribution service area
• Total assets of more than $22 billion
• Over 8,700 employees
• CNP and its predecessor companies have been in business for over 130 years
• Domestic Energy Delivery
• Operate, Serve, and Grow
• Smart Grid Enabled
• Twenty-Eight-State Geography
• Over Five Million Metered Customers
• 2.3 Million Smart Meters
• 4,000 Miles of Transmission
• 47,000 Miles of Distribution
• Electric Transmission & Distribution
• Natural Gas Distribution
• Competitive Natural Gas Sales and Services
CenterPoint Energy Proprietary and Confidential
AGENDA
• Key Drivers and Strategy of HANA Initiative
• Use Case – Smart Meter Big Data Analytics
• Technology Overview
• POC Results
• Value and Comparison
KEY DRIVERS FOR HANA INITIATIVES
• Establish SAP HANA as CNP's strategic platform for critical transactional applications and analytics
• Cost-effective solution to manage and contain data storage growth
• Simplify and consolidate analytics platforms onto HANA
• Key technology enabler for future business solutions
• Maximize CNP's investment in its HANA license (40TB)
• Enable business resiliency implementation for CRM/ECC/BPC
• Leverage HANA in-memory capability for real-time analytics
STRATEGY – 3-YEAR HANA ROADMAP
• Technical Migration and Consolidation
  - Migrate critical business applications (SAP and Mainframe)
  - Consolidate analytics solutions (BW, ISAS, eMA, etc.) onto HANA
• HANA Platform Optimization
  - Enhance performance of core business processes and mass business functions
  - Enable real-time reporting from the HANA (in-memory) database
• HANA Platform Innovation
  - Innovative solutions aligned with the long-term business strategy and roadmap
  - Simple Finance, Predictive Asset Health Analytics, Situational Awareness, Internet of Things, Predictive Analytics for customer services, etc.
BUSINESS CHALLENGE
• 1+ PB of SmartMeter data
• 2.3MM SmartMeters take readings every 15 minutes, creating about 225MM readings per day: roughly 80 billion readings per year, or over 800 billion readings over the 10-year retention period.
• Regulations require historical readings to remain available for 10 years.
• Uncompressed data grows by 8TB per month, close to 1PB over a 10-year period.
• The current data warehouse (DW) technology is approaching end of life.
• Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a very high total cost of ownership.
• A cost-effective solution is needed for today's analytics, for regulatory requirements and to prepare for future use cases.
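The volume figures above can be checked with simple arithmetic. The sketch below assumes exactly 96 readings per meter per day (one every 15 minutes) and a 365-day year; under those assumptions the totals come to about 80 billion readings per year and a little over 800 billion across the 10-year retention period, while 8TB/month adds up to 960TB in 10 years.

```python
# Back-of-the-envelope check of the smart meter figures quoted above.
# Assumptions: exactly one reading per meter every 15 minutes, 365-day years.

METERS = 2_300_000              # 2.3MM smart meters
INTERVAL_MINUTES = 15           # one reading every 15 minutes
RETENTION_YEARS = 10            # regulatory retention period
GROWTH_TB_PER_MONTH = 8         # uncompressed growth quoted on the slide

readings_per_meter_per_day = 24 * 60 // INTERVAL_MINUTES    # 96
readings_per_day = METERS * readings_per_meter_per_day       # 220.8 million
readings_per_year = readings_per_day * 365
readings_retained = readings_per_year * RETENTION_YEARS

growth_tb_per_year = GROWTH_TB_PER_MONTH * 12
growth_tb_retained = growth_tb_per_year * RETENTION_YEARS

print(f"Readings per day:     {readings_per_day / 1e6:,.1f} million")
print(f"Readings per year:    {readings_per_year / 1e9:,.1f} billion")
print(f"Readings in 10 years: {readings_retained / 1e9:,.0f} billion")
print(f"Growth per year:      {growth_tb_per_year} TB")
print(f"Growth in 10 years:   {growth_tb_retained} TB")
```

The slide's 225MM readings per day is slightly above the 220.8 million computed here, so the deck may count a few additional reading types; the order of magnitude is the same either way.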
DATA TIER SOLUTION – DATA VOLUME MANAGEMENT: MULTI-TEMPERATURE DATA APPROACH
Non-Active Data Concept: providing lower TCO through optimized data volume management
• Hot: data is read and/or written frequently; kept in memory; no restrictions, all features available
• Warm: infrequent access; on disk, no need to keep it in memory all the time; no restrictions, all features available
• Cold: sporadic access; not stored in the HANA DB but in near-line storage (NLS), with NLS management for read-only data; restricted to NLS capabilities
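A minimal sketch of the hot/warm/cold rule above, assuming data sets are classified purely by observed accesses per day. The `DataSetProfile` type and both thresholds are hypothetical illustrations, not values from the POC or from any SAP tool.

```python
from dataclasses import dataclass
from enum import Enum


class Temperature(Enum):
    HOT = "hot"    # in memory, all features available
    WARM = "warm"  # on disk, all features available
    COLD = "cold"  # near-line storage, read-only, NLS capabilities only


@dataclass
class DataSetProfile:
    """Hypothetical access profile of a data set."""
    name: str
    accesses_per_day: float  # observed reads + writes per day
    read_only: bool


# Hypothetical thresholds; real values would come from workload analysis.
HOT_MIN_ACCESSES_PER_DAY = 100.0
WARM_MIN_ACCESSES_PER_DAY = 1.0


def classify(profile: DataSetProfile) -> Temperature:
    """Map an access profile to a storage temperature."""
    if profile.accesses_per_day >= HOT_MIN_ACCESSES_PER_DAY:
        return Temperature.HOT
    if profile.accesses_per_day >= WARM_MIN_ACCESSES_PER_DAY:
        return Temperature.WARM
    # Cold storage is read-only, so rarely used data that still gets writes stays warm.
    return Temperature.COLD if profile.read_only else Temperature.WARM


samples = [
    DataSetProfile("current_billing_cycle", accesses_per_day=5000, read_only=False),
    DataSetProfile("readings_last_year", accesses_per_day=3, read_only=True),
    DataSetProfile("readings_2009", accesses_per_day=0.01, read_only=True),
]
for p in samples:
    print(f"{p.name}: {classify(p).value}")
```

The read-only check reflects the slide's point that cold (NLS) data is restricted to read-only access, so anything still being updated is kept in a tier where all features remain available.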
BUSINESS CASE – CAPEX & OPEX SAVINGS
• Smart meter data grows more than 100TB/year, reaching 1PB+ in 10 years
• 75% Capex and Opex saving from moving to HANA/Hadoop compared with business as usual
[Chart: Projected Total Spend (Cumulative & Estimated), 2016-2026, in $ millions; series: HANA O&M, HANA Capital, NZ (Netezza) O&M, NZ Capital; annotations: Capex Saving, O&M Saving]
[Chart: Projected Data Capacity (TB), 2016-2025; series: Projected Growth, Projected Savings]
SOLUTION BENEFITS
• Cost-effective HOT+WARM+COLD data management strategy leveraging HANA data compression and data tiering technology
• Simplified big data ownership by combining SAP HANA, Dynamic Tiering and Hadoop into a single landscape
• Single database experience: query execution uses Smart Data Access (SDA) and automatically accesses data stored in HANA, Dynamic Tiering and Hadoop/Vora, depending on where the data is located
• Data movement between storage tiers automated with the Data Lifecycle Manager (DLM)
• Foundation for advanced predictive analytics and future business capabilities
• Instant real-time analytics via HANA
• 75% savings in storage cost compared to the current solution
• Data tiering technology (Dynamic Tiering, Hadoop) to manage data size and growth
• Seamless Hadoop integration lets data scientists use the HANA toolset to access and manage Hadoop data
• Ability to charge the business based on the data being stored and its performance requirements
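DLM automates age-based movement of data between tiers. The plain-Python sketch below only imitates that idea with in-memory lists; it does not use the DLM tool or any SAP API, and the `Reading` record and `relocate_aged` helper are hypothetical. The 13-month cut-off mirrors the retention planned for HANA in the smart meter architecture.

```python
from __future__ import annotations

from datetime import date
from typing import NamedTuple


class Reading(NamedTuple):
    """Hypothetical smart meter reading record."""
    meter_id: str
    read_date: date
    kwh: float


def months_between(newer: date, older: date) -> int:
    """Whole calendar months from `older` to `newer` (day of month ignored)."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def relocate_aged(source: list[Reading], target: list[Reading],
                  today: date, max_age_months: int) -> int:
    """Move readings older than `max_age_months` from source to target.

    Illustrates an age-based relocation rule between two tiers.
    Returns the number of rows moved.
    """
    keep, move = [], []
    for r in source:
        if months_between(today, r.read_date) > max_age_months:
            move.append(r)
        else:
            keep.append(r)
    target.extend(move)
    source[:] = keep  # update the source list in place so callers see the change
    return len(move)


hot = [
    Reading("M-001", date(2016, 3, 1), 1.2),
    Reading("M-001", date(2014, 12, 1), 0.9),
]
warm: list[Reading] = []
moved = relocate_aged(hot, warm, today=date(2016, 3, 15), max_age_months=13)
print(moved, len(hot), len(warm))
```

A real lifecycle tool also has to handle transactional consistency and scheduling; the sketch only shows the selection rule and the fact that rows leave one tier as they enter the next.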
NEW SMART METER ANALYTICS ARCHITECTURE
Current architecture: Netezza, z/OS
Planned architecture (storage tiers ordered by cost and performance):
1. Tier 0 (Memory), speed layer: 36TB HANA EDW; 13 months of data are stored in HANA for fast analytics
2. Tier 1 (SAN, ...): 50TB Dynamic Tiering extended storage; 26 months of data are stored in DT (Sybase IQ)
3. Tier 2 (Hadoop), batch layer: 750TB Hadoop (Vora); 10 years of meter data are stored in Hadoop, and the plan is to use SAP HANA Vora to access the data
Applications: Business Objects / SAS / custom applications
Data moves between the tiers through aggregation, aging and DLM.
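With the retention windows above, a query's date range determines which tiers it touches, which is what the single database experience hides from the user. The sketch assumes that HANA holds the most recent 13 months, Dynamic Tiering the following 26 months and Hadoop the rest of the 10-year history; the slide does not say whether the windows overlap, so this non-overlapping layout and the `tiers_for_query` helper are assumptions.

```python
from __future__ import annotations

# Hypothetical, non-overlapping layout assumed for this sketch.
# Each entry: (tier name, first age in months, end age in months, exclusive).
TIERS = [
    ("HANA", 0, 13),              # most recent 13 months
    ("Dynamic Tiering", 13, 39),  # the following 26 months
    ("Hadoop/Vora", 39, 120),     # remainder of the 10-year history
]


def tiers_for_query(min_age_months: int, max_age_months: int) -> list[str]:
    """Return the tiers a query over ages [min_age, max_age] must read."""
    if min_age_months > max_age_months:
        raise ValueError("min_age_months must not exceed max_age_months")
    touched = []
    for name, lo, hi in TIERS:
        # Half-open tier range [lo, hi) overlaps closed query range [min, max].
        if min_age_months < hi and max_age_months >= lo:
            touched.append(name)
    return touched


print(tiers_for_query(0, 3))    # last quarter only
print(tiers_for_query(10, 20))  # spans the HANA/DT boundary
print(tiers_for_query(0, 119))  # full 10-year history
```

In the planned landscape this routing is done by the database through SDA virtual tables rather than by application code; the sketch only makes the partition-pruning idea concrete.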
DYNAMIC TIERING
• SAP Dynamic Tiering is a warm-store, traditional disk-based database system fully integrated into HANA.
• Based on Sybase IQ: column store, disk based
• Reduces TCO by lowering the HANA memory footprint
• All HANA functions are available: read, write and update
• Single database experience: all DB access requests are managed through the HANA platform
• Centralized operation control: all administration tasks are handled through the HANA interface
SAP VORA - HANA/HADOOP INTEGRATION: WHAT'S INSIDE AND WHAT DOES IT DO?
• Democratize Data Access
• Make Precision Decisions
• Simplify Big Data Ownership
SAP HANA Vora is an in-memory query engine that leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
• Drill-downs on HDFS
• Mashup API enhancements
• Compiled queries
• HANA-Spark adapter
• Unified landscape
• Open programming
• Any Hadoop cluster
SAP DATA TIERING ARCHITECTURE
[Diagram: HANA (processing engines, index server, in-memory stores, Dynamic Tiering, XS Engine, Data Lifecycle Manager (DLM), SDA virtual tables, extended storage files) alongside Hadoop (HDFS files, Spark, Spark SQL, HANA Spark Controller, Vora)]
• DLM reads data from HANA
• DLM writes the data out of HANA; for Hadoop, it writes ORC files to HDFS
• The table is uploaded into Vora
POC OBJECTIVES
• Research and test SAP HANA data tiering technology, i.e. DLM (Data Lifecycle Management), Dynamic Tiering and Vora Hadoop integration
• Evaluate Hadoop technology and understand the Hadoop ecosystem and its TCO
• Test SAP Vora HANA/Hadoop integration technology
• Develop and validate solution options for several critical 2016 projects: Smart Meter Analytics and the customer document repository for the mainframe migration
• Build CNP in-house expertise in Hadoop and SAP HANA/Hadoop integration technology
• Identify use cases and innovation opportunities at CNP
POC ENVIRONMENT AND TEST CASES
• POC team: CenterPoint Energy (lead and architects); SAP (CoE, PE, Global ITP); HP (hardware); IBM (IBM Hadoop and cloud)
• Environment
  - Hardware: HP Lab with a 12-node Hadoop cluster, CS500 HANA 2TB and a HANA Dynamic Tiering node; IBM BigInsights Cloud
  - Software: SAP HANA SPS 10, DLM, Dynamic Tiering, Vora; Hortonworks HDP Hadoop, Red Hat Linux; IBM Apache Hadoop with Big SQL
• Test cases
  - Data load: extract 800GB (7 billion smart meter records) from Netezza and ISAS and load the data into HANA (meter data scrambled to protect data security)
  - DLM: use the DLM tool to move data from HANA to Dynamic Tiering extended storage and to Hadoop
  - Run queries across all data tiers and measure performance
  - Load, query and display 19 million PDFs of customer bills (dummy PDF files used, no real customer data)
POC SUCCESS CRITERIA
• Data tiering – move data among the different tiers, including HANA, DT and Hadoop
• Run SQL queries within and across data tiers
• Performance – measure the response time of each data tier
• Data compression – evaluate the compression ratios of HANA, DT and Hadoop
• SAP DLM – use the tool to move data from the hot tier to the warm and cold tiers
• Customer document storage – store and retrieve PDF documents within one second
• Comparison of storage costs: HANA, DT (Dynamic Tiering extended storage) and Hadoop
POC TEST RESULTS
(Areas tested: Hadoop, HANA/DT, Spark/Vora, DLM)
• HDP customer bill store and retrieval: 40ms response time to search for and display a document among 19 million PDFs
• HDP batch data load into Hadoop via Sqoop: 4 min 24 s to load 2.5 million records (single thread); 1 min 10 s (10 threads)
• Data load from HANA to HDP Hadoop via Vora: a total of 6.2GB of ORC files stored in HDFS against an original size of 172GB; compression rate: 9 (3 copies in HDFS)
• Move data from HANA to DT: 289 million records moved, at 670K records per minute
• Move data from HANA to Hadoop via Vora into HDFS: 1.57 billion records moved, at 22 million records per minute
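Several derived figures follow from these measurements. The sketch recomputes them from the slide's numbers; it assumes the 6.2GB figure is the size of a single copy of the ORC files, which is the reading under which 3 HDFS replicas yield the quoted compression rate of about 9.

```python
# Derived figures from the POC measurements quoted above.
# Assumption: 6.2 GB is the size of ONE copy of the ORC files; HDFS keeps 3.

# Sqoop batch load of 2.5 million records
SQOOP_RECORDS = 2_500_000
SINGLE_THREAD_S = 4 * 60 + 24   # 4 min 24 s = 264 s
TEN_THREADS_S = 1 * 60 + 10     # 1 min 10 s = 70 s

rate_single = SQOOP_RECORDS / SINGLE_THREAD_S
rate_ten = SQOOP_RECORDS / TEN_THREADS_S
speedup = SINGLE_THREAD_S / TEN_THREADS_S

# HANA -> HDFS via Vora: 172 GB of source data, 6.2 GB of ORC files
SOURCE_GB = 172
ORC_GB = 6.2
HDFS_REPLICAS = 3
ratio_one_copy = SOURCE_GB / ORC_GB
ratio_effective = SOURCE_GB / (ORC_GB * HDFS_REPLICAS)

# Elapsed time implied by the measured movement rates
dt_hours = 289e6 / 670e3 / 60       # HANA -> DT at 670K records/min
hadoop_minutes = 1.57e9 / 22e6      # HANA -> Hadoop at 22M records/min

print(f"Sqoop: {rate_single:,.0f} rec/s (1 thread), "
      f"{rate_ten:,.0f} rec/s (10 threads), {speedup:.2f}x speed-up")
print(f"ORC compression: {ratio_one_copy:.1f}x per copy, "
      f"{ratio_effective:.1f}x with {HDFS_REPLICAS} replicas")
print(f"289M records to DT: {dt_hours:.1f} h; "
      f"1.57B records to Hadoop: {hadoop_minutes:.0f} min")
```

With these inputs, 10 Sqoop threads give roughly a 3.8x speed-up (not 10x), the effective compression including replication is about 9.2x, moving 289 million records to DT takes about 7.2 hours, and moving 1.57 billion records to Hadoop takes about 71 minutes.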
Run an aggregation query across SAP HANA, HDP Hadoop and DT (~4 billion records):
[Chart: Query Response Time [s]; bar values shown: 0.2, 2.6, 360 and 19 seconds]
COMPARISON BETWEEN DATA TIERS
• HANA: cost factor $$$$; volume up to 10s of TBs (no technical limit); processing: ACID compliant; SQL, SQLScript, graph, time series, spatial, text, …
• Dynamic Tiering or Sybase IQ: cost factor $$; volume 100s of TB integrated in HANA, several PBs with Sybase IQ; processing: ACID compliant; SQL
• Hadoop – Spark/Vora: cost factor $ (15 times less expensive than T1 storage); volume 100s of PB or more; processing: ANSI SQL compliant; read-only SQL when used from HANA via SDA; transformations and actions; performance can be improved significantly by adding compute nodes and using SSDs, at higher cost
• Hadoop – Vora in memory: cost factor $$; volume 100s of TB (depending on the memory available in the Hadoop cluster); processing: data loaded in memory to achieve better performance; read-only SQL when used from HANA via SDA
RECOMMENDED USE CASES – SHORT TERM
• HANA: managing up to several TBs of high-value data; very high processing performance required; SAP HANA native processing features (PAL, ...) required; OLTP with many fine-grained updates needed
• Dynamic Tiering: managing up to several PBs of data at T2/T3 storage cost; high performance for complex queries required; deep SAP HANA integration required (single database experience); updates and deletes required
• Hadoop - Spark: managing up to 100s of PBs of data at T4 storage cost, 15 times less expensive than T1 storage; read-only access sufficient (bulk load, no fine-grained writes); comparatively low-cost storage important; loose integration of administration and life-cycle management acceptable
• Hadoop - Vora: high OLAP query performance on Hadoop; additional query features (hierarchies)
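The short-term recommendations above can be read as a rule of thumb. The sketch below encodes them as ordered checks; the `Workload` fields and the 10TB cut-off standing in for "several TBs" are hypothetical simplifications, and a real placement decision would also weigh cost and integration needs.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a data set's requirements."""
    volume_tb: float
    fine_grained_oltp: bool            # many small transactional updates
    needs_updates: bool                # updates/deletes after the initial load
    needs_very_high_performance: bool
    needs_olap_on_hadoop: bool         # interactive OLAP / hierarchies on Hadoop


# Hypothetical cut-off standing in for "up to several TBs" on the slide.
HANA_MAX_TB = 10.0


def recommend_tier(w: Workload) -> str:
    """Pick a tier by checking the recommendations in priority order."""
    # OLTP with many fine-grained updates is a HANA use case.
    if w.fine_grained_oltp:
        return "HANA"
    # Very high performance on a few TBs of high-value data also favors HANA.
    if w.needs_very_high_performance and w.volume_tb <= HANA_MAX_TB:
        return "HANA"
    # Updates and deletes rule out Hadoop, which is read-only from HANA via SDA.
    if w.needs_updates:
        return "Dynamic Tiering"
    # Read-only data: Vora when OLAP performance on Hadoop matters, else Spark.
    if w.needs_olap_on_hadoop:
        return "Hadoop - Vora"
    return "Hadoop - Spark"


example = Workload(volume_tb=400, fine_grained_oltp=False, needs_updates=True,
                   needs_very_high_performance=False, needs_olap_on_hadoop=False)
print(recommend_tier(example))
```

Checking the HANA conditions first reflects that the slide reserves HANA for the highest-value, most performance-sensitive data, while the cheaper tiers are chosen by elimination.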
THANK YOU
Contact information:
• Dominick Huang, Sr. Manager, Enterprise Technology & Architecture, CenterPoint Energy; Yong.huang@centerpointenergy.com; Tel 713-207-6659
• Russell Hull, Chief Support Architect, SAP America; Russell.hull@sap.com
CNP HANA LANDSCAPE - ANALYTICS (BW + OW)
[Diagram: HANA landscape built from existing 0.5TB blades and new 2TB HP nodes. Labels shown: HIP (PRD), 36TB (memory); Analytics (BW + OW); Situation Awareness, MfM testing & other apps, 4.5TB; 2TB failover blade; HIS (SBX), 0.25TB; HIQ (QA) and HID (DEV); 12TB; ES Extended Storage (NLS/DT/Hadoop)]