ScaleOut hServer V2: Enabling Real-Time Analytics Using Hadoop Map/Reduce
DESCRIPTION
Welcome to real-time analytics for Hadoop! ScaleOut hServer V2 is the world's first in-memory execution engine for Hadoop MapReduce. Now you can analyze live data using standard Hadoop MapReduce code, in memory and in parallel, without the need to install and manage the Hadoop software stack. (Only one small change is needed to your Hadoop program.) Gone are disk I/O latencies, slow start-up times, and software environment management headaches. Benchmark tests have demonstrated 20x faster execution than the Apache Hadoop distribution. Now you can use Hadoop MapReduce in live applications in financial services, e-commerce, logistics, and countless other scenarios where results are needed in seconds instead of minutes or hours.
Learn more: http://www.scaleoutsoftware.com/products/scaleout-hserver
Watch the presentation video: http://inside-bigdata.com/2013/10/15/enabling-real-time-analytics-using-hadoop-mapreduce/
TRANSCRIPT
Enabling Real-Time Analytics Using Hadoop Map/Reduce
Copyright © 2013 by ScaleOut Software, Inc.
Briefing on New Product Release: ScaleOut hServer™ V2
October 14, 2013
Bill Bain, CEO; David Brinker, COO
2 ScaleOut Software, Inc.
ScaleOut hServer V2:
• World’s first Hadoop MapReduce engine integrated with a scalable, in-memory data grid
• Full Hadoop MapReduce support for “live” fast-changing data
• 20x performance improvement in benchmark tests
• Significant new technology to simplify development and maximize ease of use
What’s New Today
• Develops and markets software middleware for scaling application performance and performing real-time analytics using in-memory data storage and computing
• Executive Team:
• Dr. William Bain, Founder & CEO
• Career focused on parallel computing – Bell Labs, Intel, Microsoft
• 3 prior start-ups, last acquired by Microsoft and product now ships as Network Load Balancing in Windows Server
• David Brinker, COO
• 25 years software business and executive management experience
• Mentor Graphics, Cadence, Webridge
• Eight years market experience in Windows & Linux; 400 customers
About ScaleOut Software
• ScaleOut StateServer®
• In-Memory Data Grid for Windows and Linux
• Scales application performance.
• Industry-leading performance and ease of use
• ScaleOut GeoServer® adds:
• WAN-based data replication for DR
• Breakthrough technology for global data access
• ScaleOut Analytics Server® adds:
• Real-time data analysis for “live” data
• Comprehensive management tools
• Introducing ScaleOut hServer™ V2:
• Full Hadoop Map/Reduce engine (20X faster*)
• Hadoop Map/Reduce on live, in-memory data
ScaleOut Software Products
[Diagram: ScaleOut StateServer In-Memory Data Grid, shown as four GridService nodes]
*in benchmark testing
ScaleOut Analytics Server stores and analyzes “live” data:
• In-memory storage holds live data sets which are continuously updated and accessed within operational systems.
• Examples: stock ticker data, business rules, order & inventory data
• Integrated analytics engine tracks important patterns & trends.
• Data-parallel analysis delivers results in msec. to seconds.
IMDGs Perform Real-Time Analytics
Integrate analysis into a stock trading platform:
• The IMDG holds market data and hedging strategies.
• Updates to market data continuously flow through the IMDG.
• The IMDG performs repeated map/reduce analysis on hedging strategies and alerts traders in real time.
• IMDG automatically and dynamically scales its throughput to handle new hedging strategies by adding servers.
Example in Financial Services
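The repeated map/reduce analysis described in this example can be pictured with a small sketch. It is illustrative only and is not ScaleOut's API: the dictionaries stand in for data held in the IMDG, and every name here (`market_data`, `strategies`, `map_strategy`, `reduce_alerts`, `analyze`) is hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory state standing in for the IMDG's contents.
market_data = {"ACME": 101.5, "GLOBEX": 48.2, "INITECH": 12.9}

# Each hypothetical strategy watches one symbol and wants an alert
# when the price leaves the [low, high] band.
strategies = [
    {"trader": "alice", "symbol": "ACME", "low": 95.0, "high": 100.0},
    {"trader": "alice", "symbol": "GLOBEX", "low": 40.0, "high": 55.0},
    {"trader": "bob", "symbol": "INITECH", "low": 15.0, "high": 20.0},
]


def map_strategy(strategy, prices):
    """Map step: emit a (trader, alert) pair if the strategy's band is breached."""
    price = prices[strategy["symbol"]]
    if price < strategy["low"] or price > strategy["high"]:
        yield strategy["trader"], f'{strategy["symbol"]} at {price}'


def reduce_alerts(pairs):
    """Reduce step: group the emitted alerts by trader."""
    grouped = defaultdict(list)
    for trader, alert in pairs:
        grouped[trader].append(alert)
    return dict(grouped)


def analyze(strategies, prices):
    """Run one map/reduce pass over all strategies against current prices."""
    pairs = (pair for s in strategies for pair in map_strategy(s, prices))
    return reduce_alerts(pairs)


alerts = analyze(strategies, market_data)

# A market update arrives; the same analysis is simply rerun on the new state.
market_data["GLOBEX"] = 57.0
alerts_after_update = analyze(strategies, market_data)
```

Because the data already sits in memory, rerunning the pass after each update involves no load step, which is the property the slide relies on for real-time alerting.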
Example Uses
Online loan apps & banking
Portfolio management
Trading systems
Reservations systems
Ecommerce shopping
Customer service sites
Streaming entertainment
Configuration engines
Gaming
Customers
• 400 unique customers
• 35 Fortune 500 customers
• 32 countries
• 9,000 servers licensed
• 50% have multiple deployments
Breakdown by segment (% in $$s):
• Entertain. & Commun.: 13%
• Financial & Insurance: 26%
• Ecommerce Sales: 17%
• Ecommerce Services: 19%
• Travel & Transport.: 4%
• Gov't & Education: 10%
• Software: 8%
• Other: 3%
• In-Memory Data Grids have become key in several fast-growth markets.
• Drivers:
• Cloud computing / virtualization
• Hardware enablement
• Competitive pressure
• Exploding workloads
• Big data analysis
• ScaleOut addresses scalability and analytics.
IMDGs Seeing Wide Adoption
Market sizes:
• Big Data Analytics: $18B [1]
• Enterprise Software: $292B [2]
• HPC / Grid Computing: $25B [3]
• In-Memory Data Grids: $355M [4]
Sources: [1] Wikibon 2013; [2] Gartner 2010, rolled forward to 2013; [3] Market Research Media 2015, rolled back to 2013; [4] Gartner 2011, rolled forward to 2013
Analytics Market
Big Data Analytics: $18B
Batch “Business Intelligence”:
• Static data sets
• Petabytes
• Disk storage
• Hours to minutes
• Best uses: analyzing warehoused data; mining for long-term trends
Real-time “Operational Intelligence”:
• Live data sets
• Gigabytes to terabytes
• In-memory storage
• Minutes to seconds
• Best uses: tracking live data; immediately identifying trends and capturing opportunities
[Figure: Analytics Server, hServer, Hadoop, IBM, Teradata, SAS, and SAP placed along an axis from Real-Time to Batch]
ScaleOut hServer Targeted Use Cases
Run continuous Hadoop on live data, while it’s being updated:
• “Capture perishable business opportunities and identify issues.”
• Real-time risk analysis
• Credit card fraud detection
• ...
Accelerate Hadoop on static data with a one-line code change:
• “Speed-up Hadoop execution by >10X for faster business insights.”
• Process simulations
• Financial modeling
• ...
Quickly prototype Hadoop code:
• “Validate your Hadoop code before it goes into batch processing.”
• Fast-turn debug and tuning
• No need to install Hadoop stack
• ...
• Hadoop is typically used for very large, static, offline datasets.
• Data must be copied from disk-based storage (e.g., HDFS) into memory for analysis.
• Hadoop Map/Reduce adds lengthy batch scheduling overhead.
Problem: Hadoop Cannot Efficiently Perform Real-Time Analytics
Benefits:
• Enables real-time analysis using Hadoop M/R APIs.
• Accelerates data access by staging data in memory.
• Eliminates batch scheduling and data shuffling overheads of standard Hadoop distributions.
• Analyzes “live” data.
• Allows Hadoop M/R programs to run with only a one-line code change.
• Eliminates complexity in Hadoop deployment.
• Enables rapid prototyping.
Solution: Integrate Hadoop M/R into In-Memory Data Grid
Enables Hadoop Map/Reduce to perform real-time analysis:
• Adds a full Map/Reduce engine to the SOAS (ScaleOut Analytics Server) IMDG.
• Delivers results in msec. to seconds instead of minutes or hours.
• Benchmark results show 20X speedup.
• Has flexible options for data storage/access:
• Hadoop programs can access/store key/value pairs using either IMDG or HDFS.
• Automatically caches HDFS data in IMDG for fast access.
• Allows dynamic updates to key/value pairs during analysis to support “live” data.
• Ships as open source Java library combined with SOAS IMDG.
Introducing ScaleOut hServer™ V2
• ScaleOut hServer adds Grid Record Reader for accessing key/value pairs held in the IMDG.
• Hadoop programs optionally can output results to IMDG with Grid Record Writer.
• Grid Record Reader optimizes access to key/value pairs to eliminate network overhead.
• Applications can access and update key/value pairs as operational data during analysis.
Enabling Access to IMDG Data
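One way to picture how a record reader can avoid network overhead is for each server to feed the mapper only the key/value pairs it already holds, while applications keep updating those pairs between runs. The sketch below is a toy model under that assumption and is not the Grid Record Reader's actual implementation; `GridPartition`, `InMemoryGrid`, `local_records`, and `run_job` are hypothetical names.

```python
from collections import Counter


class GridPartition:
    """Hypothetical stand-in for the key/value pairs held on one grid server."""

    def __init__(self):
        self.store = {}


class InMemoryGrid:
    """Toy grid that spreads keys across a fixed number of partitions."""

    def __init__(self, num_partitions):
        self.partitions = [GridPartition() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Deterministic placement based on the key's bytes.
        return self.partitions[sum(key.encode()) % len(self.partitions)]

    def put(self, key, value):
        self._partition_for(key).store[key] = value

    def get(self, key):
        return self._partition_for(key).store[key]


def local_records(partition):
    """Plays the role of a record reader: yields only locally held pairs."""
    # Iterate over a snapshot so concurrent puts cannot break the loop.
    yield from list(partition.store.items())


def run_job(grid, mapper):
    """Apply the mapper per partition over local data, then merge partial sums."""
    total = Counter()
    for partition in grid.partitions:  # sequential here; one task per server in a grid
        partial = Counter()
        for key, value in local_records(partition):
            for out_key, out_value in mapper(key, value):
                partial[out_key] += out_value
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


def units_by_status(order_id, order):
    """Example mapper: emit (status, units) for each order."""
    yield order["status"], order["units"]


grid = InMemoryGrid(num_partitions=3)
grid.put("order-1", {"status": "open", "units": 5})
grid.put("order-2", {"status": "shipped", "units": 2})
grid.put("order-3", {"status": "open", "units": 7})

before = run_job(grid, units_by_status)

# The operational system updates an order; the next run sees the change.
grid.put("order-1", {"status": "shipped", "units": 5})
after = run_job(grid, units_by_status)
```

Only the small per-partition totals are merged across partitions in this model; the records themselves never leave the partition that stores them.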
• ScaleOut hServer adds Dataset Record Reader (wrapper) to cache HDFS data during program execution.
• Hadoop automatically retrieves data from ScaleOut IMDG on subsequent runs.
• Dataset Record Reader stores and retrieves data with minimum network and memory overheads.
• Tests with the Terasort benchmark have demonstrated 11X lower access latency than HDFS without the IMDG.
Enabling Fast Access to HDFS Data
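The wrapper behavior described here resembles a read-through cache: the first run reads from the file system and keeps a copy in memory, and later runs are served from that copy. The following sketch shows the pattern only and makes no claim about how the Dataset Record Reader is built; `CachingRecordReader`, `read_from_file_system`, and the plain dictionary used as the cache are hypothetical.

```python
class CachingRecordReader:
    """Hypothetical wrapper: read a slow source once, then serve from memory."""

    def __init__(self, dataset_id, slow_reader, cache):
        self.dataset_id = dataset_id
        self.slow_reader = slow_reader  # callable returning an iterable of (key, value)
        self.cache = cache              # dict standing in for the in-memory grid

    def records(self):
        cached = self.cache.get(self.dataset_id)
        if cached is not None:
            # Subsequent runs: no file system access at all.
            yield from cached
            return
        collected = []
        for record in self.slow_reader():
            collected.append(record)
            yield record
        # Cache only after a complete pass, so a partial read is never reused.
        self.cache[self.dataset_id] = collected


stats = {"file_reads": 0}


def read_from_file_system():
    """Stand-in for an HDFS read; counts how often it is actually used."""
    stats["file_reads"] += 1
    yield from [(0, "alpha"), (1, "beta"), (2, "gamma")]


grid_cache = {}
reader = CachingRecordReader("input/part-0", read_from_file_system, grid_cache)

first_run = list(reader.records())   # reads the file system and fills the cache
second_run = list(reader.records())  # served from the cache
```

Both runs return the same records, but the slow reader is invoked only once, which is the effect the slide attributes to caching HDFS data in the IMDG.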
ScaleOut hServer Editions
• Offered in community and commercial editions
• Community Edition can be used for evaluation or production
• Hybrid open source / proprietary licensing
Editions                 Community          Commercial
# Servers                Up to 4            100s
Expected data set size   256GB (max)        GB - TBs
Pricing                  Free               Subscription & perpetual
Support                  Community Forum    Full support
• IMDGs help scale application performance and analyze “live” data in real-time.
• Hadoop focuses on analyzing large, static (offline) datasets held in file systems.
• ScaleOut hServer V2 introduces breakthrough technology enabling Hadoop applications to perform real-time analytics:
• Integrates Hadoop Map/Reduce engine with SOAS’s IMDG.
• Accelerates Map/Reduce execution by 20X in benchmark tests.
• Enables Hadoop applications to analyze “live,” in-memory data.
• Offers flexible access to both in-memory and file-based data.
• Eliminates complex Hadoop deployment and tuning.
• Offers a fast, easy-to-use platform for rapid prototyping.
Summary
A few examples:
• Equity trading: to minimize risk during a trading day
• Ecommerce: to optimize real-time shopping activity
• Reservations systems: to identify issues, reroute, etc.
• Credit cards: to detect fraud in real time
• Smart grids: to optimize power distribution & detect issues
Online Systems Need Real-Time Analysis
• ScaleOut Software conducted an informal survey at the Strata 2013 Conference (Santa Clara).
• Based on 150 responses:
• 78% of organizations generate fast-changing data.
• 60% use Hadoop and 78% plan to expand usage of Hadoop within 12 months.
• Only 42% consider Hadoop to be an effective platform for real-time analysis, but…
• 93% would benefit from real-time data analytics.
• 71% consider a 10X improvement in performance meaningful.
• Take-away: Hadoop users need real-time analytics.
Hadoop Users Need Real-Time Analytics