Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit 2014)
DESCRIPTION
Presented by David Smith, Chief Community Officer, Revolution Analytics, at the Gartner Business Intelligence and Analytics Summit, April 2014. In this presentation, I'll introduce the open source R language (the modern standard for Data Science) and the enhanced performance, scalability and ease-of-use capabilities of Revolution R Enterprise. Customer case studies will illustrate Revolution R Enterprise as a component of the real-time analytics deployment process, via integration with Hadoop, data warehousing systems and Cloud platforms, to implement data-driven end-user applications.
TRANSCRIPT
Big Data Predictive Analytics with Revolution R Enterprise
David Smith
Gartner BI Conference, April 2014
Chief Community Officer, @revodavid
OUR COMPANY
The leading provider of advanced analytics software and services based on open source R, since 2007
OUR SOFTWARE
The only Big Data, Big Analytics software platform based on the data science language R
KUDOS
Visionary, Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
What is R?
Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
Create beautiful and unique data visualizations
• As seen in New York Times, Twitter and Flowing Data
Thriving open-source community
• Leading edge of analytics research
Fills the talent gap
• New graduates prefer R
R is Hot: bit.ly/r-is-hot
WHITE PAPER
Exploding growth and demand for R
• R is the highest-paid IT skill
• R is the most-used data science language after SQL
• R is used by 70% of data miners
• R is #15 of all programming languages
• R is growing faster than any other data science language
• R is the #1 Google search for advanced analytics software
• R has more than 2 million users worldwide
R Usage Growth: Rexer Data Miner Survey, 2007-2013
70% of data miners report using R
R is the first choice of more data miners than any other software
Source: www.rexeranalytics.com
Technical Support for Open Source R: AdviseR™ from Revolution Analytics
Technical support for open source R, from the R experts.
• 24x7 email and phone support
• On-line case management and knowledgebase
• Access to technical resources, documentation and user forums
• Exclusive on-line webinars from community experts
• Guaranteed response times
Also available: expert hands-on and on-line training for R, from Revolution Analytics AcademyR.
www.revolutionanalytics.com/AdviseR
www.revolutionanalytics.com/AcademyR
Revolution R Enterprise is… the only big data big analytics platform based on open source R
• High performance, scalable analytics
• Portable across enterprise platforms
• Easier to build & deploy analytics
Enhancing Open Source R for the Enterprise
(Open Source R → Revolution R Enterprise: benefit)
• Big Data: In-memory bound → Hybrid memory & disk scalability: Operates on bigger volumes & factors
• Speed of Analysis: Single threaded → Parallel threading: Shrinks analysis time
• Enterprise Readiness: Community support → Commercial support: Delivers full service production support
• Analytic Breadth & Depth: 5000+ innovative analytic packages → Leverage open source packages plus Big Data ready packages: Supercharges R
• Commercial Viability: Risk of deployment of open source → GPL-compatible licensing: Eliminate risk with open source
Powering Next Generation Analytics: Parallel External Memory Algorithms
[Diagram: Combine intermediate results]
Unique PEMAs: Parallel, external-memory algorithms
• High-performance, scalable replacements for R/SAS analytic functions
• Parallel/distributed processing eliminates CPU bottleneck
• Data streaming eliminates memory size limitations
• Works with in-memory and disk-based architectures
Eliminates Performance and Capacity Limits of Open Source R and Legacy SAS
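The PEMA slides describe the approach only at a high level: stream the data in chunks, process the chunks in parallel, and combine intermediate results. As a minimal illustration of that idea, and not Revolution Analytics' implementation, the base-R sketch below computes a mean and a sample variance from small per-chunk intermediate results. The helper names `chunk_stats`, `combine_stats` and `finalize_stats` are hypothetical, and the in-memory `split()` stands in for blocks that would really be read from disk.

```r
# Hypothetical sketch of a parallel external-memory style computation.
# Each chunk produces a small intermediate result (count, sum, sum of squares);
# the intermediate results are combined, then finalized into mean and variance.

chunk_stats <- function(x) {
  # Intermediate result for one chunk of data
  c(n = length(x), sum = sum(x), sumsq = sum(x^2))
}

combine_stats <- function(a, b) {
  # Intermediate results are additive, so combining is element-wise addition
  a + b
}

finalize_stats <- function(s) {
  n <- s[["n"]]
  mu <- s[["sum"]] / n
  # Sample variance with denominator n - 1, matching var()
  variance <- (s[["sumsq"]] - n * mu^2) / (n - 1)
  list(n = n, mean = mu, var = variance)
}

set.seed(42)
x <- rnorm(10000, mean = 5, sd = 2)

# Ten chunks of 1000 values, standing in for blocks streamed from disk
chunks <- split(x, rep(1:10, each = 1000))

# Each chunk is processed independently (lapply here; the work could be
# spread over cores or machines because chunks do not depend on each other)
partials <- lapply(chunks, chunk_stats)
total <- Reduce(combine_stats, partials)
result <- finalize_stats(total)

# Compare against the in-memory answers
print(all.equal(result$mean, mean(x)))
print(all.equal(result$var, var(x)))
```

Because the intermediate results are small and additive, chunks can be processed in any order or on different machines. The sum-of-squares formula is used for brevity; it can lose precision when the variance is tiny relative to the mean, so a production version would likely use a more numerically stable update.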
All of Open Source R plus:
• Big Data scalability
• High-performance analytics
• Development and deployment tools
• Data source connectivity
• Application integration framework
• Multi-platform architecture
• Support, Training and Services
Revolution R Enterprise is the Big Data Big Analytics Platform
Components: R+CRAN, RevoR, DistributedR, ScaleR, ConnectR, DeployR, DevelopR
DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
• In the Cloud: Amazon AWS
• Workstations & Servers: Windows; Red Hat and SUSE Linux
• Clustered Systems: IBM Platform LSF; Microsoft HPC
• EDW: IBM Netezza; Teradata
• Hadoop: Hortonworks; Cloudera
Write Once. Deploy Anywhere.
Set the desired compute context for code execution:
```r
rxSetComputeContext("local") # Local system (DEFAULT)
rxSetComputeContext(RxHadoopMR(<data, server environment arguments>))
rxSetComputeContext(RxHpcServer(<data, server environment arguments>))
rxSetComputeContext(RxLsfCluster(<data, server environment arguments>))
rxSetComputeContext(RxTeradata(<data, server environment arguments>))
```
The same code then runs in any of these contexts:
```r
# Summarize and calculate descriptive statistics from the airDS data set
adsSummary = rxSummary(~ArrDelay + CRSDepTime + DayOfWeek, data = airDS)

# Fit a linear regression model
arrDelayLm1 = rxLinMod(ArrDelay ~ DayOfWeek, data = airDS)
summary(arrDelayLm1)
```
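The `rxLinMod` call above fits a regression without requiring the whole data set in memory. One standard way to do that, shown here as an assumption and not as how `rxLinMod` is actually implemented, is to accumulate the cross-products X'X and X'y chunk by chunk and solve the normal equations once at the end. The sketch uses a small simulated data frame, `air_sim`, with the same two column names as `airDS`; `air_sim`, `chunk_crossprod` and `combine_crossprod` are hypothetical names.

```r
# Hypothetical sketch: a linear model fit from chunk-wise sufficient statistics.
set.seed(1)
n <- 6000
day <- factor(sample(c("Mon", "Tue", "Wed"), n, replace = TRUE))
effect <- c(Mon = 0, Tue = 3, Wed = 8)  # simulated day-of-week effects
air_sim <- data.frame(
  DayOfWeek = day,
  ArrDelay = 5 + unname(effect[as.character(day)]) + rnorm(n, sd = 15)
)

chunk_crossprod <- function(chunk) {
  # Design matrix for this chunk; factor levels are kept by split(),
  # so every chunk yields the same columns
  X <- model.matrix(ArrDelay ~ DayOfWeek, data = chunk)
  y <- chunk$ArrDelay
  list(XtX = crossprod(X), Xty = crossprod(X, y))
}

combine_crossprod <- function(a, b) {
  # Cross-products are additive across row chunks
  list(XtX = a$XtX + b$XtX, Xty = a$Xty + b$Xty)
}

# Six chunks of 1000 rows, standing in for data read block by block
chunks <- split(air_sim, rep(1:6, each = 1000))
total <- Reduce(combine_crossprod, lapply(chunks, chunk_crossprod))

# Solve the normal equations (X'X) b = X'y once, after all chunks are seen
coef_chunked <- drop(solve(total$XtX, total$Xty))

# Reference fit on the full in-memory data
coef_lm <- coef(lm(ArrDelay ~ DayOfWeek, data = air_sim))
print(all.equal(unname(coef_chunked), unname(coef_lm)))
```

Only a p-by-p matrix and a p-vector pass between chunks, whatever the row count. Solving the normal equations is less numerically robust than the QR decomposition used by `lm()`, which matters when predictors are nearly collinear.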
In-Hadoop Big Data Big Analytics
• Eliminate data movement latency
• Speed model development
• Use commodity Hadoop nodes as analytics engine
[Diagram: Hadoop cluster with a Name Node and Job Tracker coordinating five Data Nodes, each running a Task Tracker; MapReduce over HDFS]
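The diagram shows analytics running on the Hadoop data nodes through MapReduce. Purely as an illustration, the base-R sketch below mimics that pattern on one machine: five partitions (standing in for HDFS blocks on five data nodes) are each mapped to per-key partial sums and counts, and a reduce step merges them into a mean arrival delay per day. The data are simulated, and `flights`, `map_partition` and `reduce_partials` are hypothetical names; this is not RevoScaleR's Hadoop code.

```r
# Hypothetical sketch of the MapReduce pattern using base R's Map and Reduce.
set.seed(7)
flights <- data.frame(
  DayOfWeek = sample(c("Mon", "Tue", "Wed"), 3000, replace = TRUE),
  ArrDelay = round(rnorm(3000, mean = 10, sd = 20)),
  stringsAsFactors = FALSE
)

# Five partitions, standing in for blocks stored on five data nodes
partitions <- split(flights, rep(1:5, length.out = nrow(flights)))

map_partition <- function(part) {
  # Map: per-key partial sum and count, computed locally on one partition
  sums <- tapply(part$ArrDelay, part$DayOfWeek, sum)
  counts <- tapply(part$ArrDelay, part$DayOfWeek, length)
  data.frame(key = names(sums), sum = as.numeric(sums),
             count = as.numeric(counts), stringsAsFactors = FALSE)
}

reduce_partials <- function(a, b) {
  # Reduce: stack two partial results and add them up key by key
  aggregate(cbind(sum, count) ~ key, data = rbind(a, b), FUN = sum)
}

mapped <- Map(map_partition, partitions)
reduced <- Reduce(reduce_partials, mapped)
reduced$mean_delay <- reduced$sum / reduced$count
print(reduced)
```

Passing (sum, count) pairs rather than per-partition means is what keeps the reduce step exact: averaging partition means would overweight small partitions.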
Revolution Analytics coupled with the Teradata Unified Data Architecture accelerates big data analytics with the R language.
In-Database Analytics:
• Parallel R in-database for big data analytics on Teradata
• Build parallel R models completely in R
• Use Teradata appliance as analytics engine
• No need to move data
Teradata 14.10 + Revolution R Enterprise V7
RRE7 in the Cloud
Revolution R Enterprise 7, on the industry-leading cloud platform
• Pay as you go, priced by cores x hours
  – No long-term commitment required
• Launch Windows and Linux servers on demand
  – Windows Server 2008 R2 with DevelopR
  – RHEL 6 with RStudio Server Professional
  – Server instances from 2 to 32 cores
  – Analyze data sets up to 2 TB
• Convenient, consistent and reliable
  – Available globally, accessible anywhere
  – Forum-based support with registration
• Free 14-day trial available
CLOUD SERVERS: $0.70 per core/hour, plus AWS infrastructure costs
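The cloud pricing combines a software charge per core-hour with separately billed AWS infrastructure costs. For a hypothetical job on a 16-core instance (within the 2 to 32 core range above) running for 10 hours, the software portion works out as follows; the AWS term depends on the instance chosen and is not stated on the slide.

$$
\text{software cost} = 16 \text{ cores} \times 10 \text{ hours} \times \$0.70/\text{core-hour} = 160 \text{ core-hours} \times \$0.70 = \$112
$$

$$
\text{total cost} = \$112 + \text{AWS infrastructure costs}
$$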
Revolution R Enterprise Ecosystem: Integration with the Big Data Analytics Stack
[Diagram: ecosystem layers labeled Deployment / Consumption; Data / Infrastructure; Advanced Analytics; ETL; SI / Service; MSP / DSP]
How Customers Revolutionize their Business
Power
“We’ve combined Revolution R Enterprise and Hadoop to build and deploy customized exploratory data analysis and GAM survival models for our marketing performance management and attribution platform. Given that our data sets are already in the terabytes and are growing rapidly, we depend on Revolution R Enterprise’s scalability and power – we saw about a 4x performance improvement on 50 million records. It works brilliantly.” - CEO, John Wallace, DataSong
4x performance; 50M records scored daily
Scalability
“We’ve been able to scale our solution to a problem that’s so big that most companies could not address it. If we had to go with a different solution we wouldn’t be as efficient as we are now.” - SVP Analytics, Kevin Lyons, eXelate
TBs of data from 200+ data sources; tens of thousands of attributes; hundreds of millions of scores daily
2x data, 2x attributes: no impact on performance
Performance
“We need a high-performance analytics infrastructure because marketing optimization is a lot like financial trading. By watching the market constantly for data or market condition updates, we can now identify opportunities for our clients that would otherwise be lost.” - Chief Analytics Officer, Leon Zemel, [x+1]
Why Revolution R Enterprise?
• Platform Independence
• Take Big Cost Out of Big Data
• Supercharge R for Massive Data
• Power R for the Enterprise
Thank You
David Smith, Chief Community Officer