predictive analytics: extracting big value from big data
TRANSCRIPT
#1 Modern Platform to Turn Data into a Strategic Asset
©2016 RapidMiner, Inc. All rights reserved.
May 24, 2016
Featuring Howard Dresner
Predictive Analytics: Extracting Big Value from Big Data
Speakers
• Howard Dresner, Chief Research Officer, Dresner Advisory Services
• Lars Bauerle, Chief Product Officer, RapidMiner
Housekeeping
• Recording will be available within 1-2 business days; a link will be emailed to you
• You may type your questions in the Questions panel on the screen at any time
• We will leave time at the end for a Q&A session
Dresner Advisory Services
Advanced and Predictive Analytics and Big Data
Copyright 2016 Dresner Advisory Services, LLC
www.dresneradvisory.com
Definitions
Advanced and predictive analytics includes statistics, modeling, machine learning, and data mining to analyze facts and make predictions about future, or otherwise unknown, events.
We define big data analytics as systems that enable end-user access to and analysis of data contained and managed within the broader Hadoop ecosystem.
Technologies and Initiatives Strategic to Business Intelligence
[Bar chart, 0% to 100%. Rating scale: Critical, Very important, Important, Somewhat important, Not important. Items as listed on the slide:]
• Dashboards
• End-user "self service"
• Data warehousing
• Advanced visualization
• Integration with operational processes
• Data discovery
• Enterprise planning/budgeting
• Data mining, advanced algorithms, predictive
• Embedded BI (contained within an application, portal, etc.)
• Mobile device support
• End-user data "blending" (data mashups)
• Location intelligence/analytics
• Collaborative support for group-based analysis
• In-memory analysis
• Software-as-a-service and cloud computing
• Search-based interface
• Pre-packaged vertical/functional analytical applications
• Big data (e.g., Hadoop)
• Ability to write to transactional applications
• Text analytics
• Social media analysis (Social BI)
• Open source software
• Complex event processing (CEP)
• Internet of Things (IoT)
• Cognitive BI (e.g., artificial intelligence-based BI)
Importance of Advanced and Predictive Analytics and Big Data
[Bar chart, 0% to 100%. Legend as listed on the slide: Not important, Somewhat important, Important, Very important, Critical. Data labels in slide order:]
• Advanced and predictive analytics: 20%, 30%, 24%, 16%, 11%
• Big data: 9%, 18%, 22%, 23%, 28%
Current Deployment of Advanced and Predictive Analytics
[Bar chart, 0% to 40%. Category labels are missing from the slide text. Data labels in slide order: 27%, 17%, 37%, 19%]
Deployment Plans for Advanced and Predictive Analytics
• Will adopt in 2015: 17%
• Will adopt in 2016: 23%
• Will adopt beyond 2016: 33%
• No plan: 27%
Current Deployment of Big Data
[Bar chart, 0% to 50%. Data labels paired with categories in slide order:]
• Yes. We use big data today: 17%
• We may use big data in the future: 47%
• No. We have no plans to use big data at all: 36%
Deployment Plans for Big Data
• Will adopt in 2015: 4%
• Will adopt in 2016: 27%
• Will adopt beyond 2016: 69%
Features for Advanced and Predictive Analytics
[Bar chart, 0% to 100%. Rating scale: Critical, Very important, Important, Somewhat important, Unimportant. Items as listed on the slide:]
• Range of regression models, from linear and logistic to nonlinear
• Hierarchical clustering, expectation maximization, k-Means, and variants of self-organizing maps
• Textbook statistical functions for descriptive statistics
• Geospatial analysis
• Text analytic functions and sentiment analysis
• Bayesian methods, including Naïve Bayes and Bayesian Networks
• Recommendation engine included
• Automatic feature selection like principal component analysis (PCA)
• Support vector machine (SVM) approaches for classification and estimation
• Neural networks supported
• Various approaches to CART (e.g. ID3, C4.5, CHAID, MARS, random forests, gradient boosting)
• Ensemble learning
Big Data Use Cases
[Bar chart, 0% to 100%. Rating scale: Critical, Very important, Important, Somewhat important, Not important. Items as listed on the slide:]
• Data warehouse optimization
• Customer/social analysis
• Internet of Things
• Clickstream analytics
• Fraud detection
Usability for Advanced and Predictive Analytics
[Bar chart, 0% to 100%. Rating scale: Critical, Very important, Important, Somewhat important, Unimportant. Items as listed on the slide:]
• Fast cycle time for analysis with data preparation functions
• Access to advanced analytics for predictive and temporal analysis
• Simple process for continuous modification of models
• Support for easy iteration
• Support for entire process in a single application/user interface
• Support/guidance in preparing data analytical models
• Pre-built drag and drop macros and tools from R that require no scripting or programming
• Automatic creation of models from data
• A specialist *NOT* required to create analytical models, test and run them
Usability 2014 to 2015
[Bar chart comparing 2015 and 2014; axis from 2.9 to 3.7. Items as listed on the slide:]
• A specialist *NOT* required to create analytical models, test and run them
• Automatic creation of models from data
• Support for entire process in a single application/user interface
• Support for easy iteration
• Pre-built drag and drop macros and tools from R that require no scripting or programming
• Simple process for continuous modification of models
• Support/guidance in preparing data analytical models
• Fast cycle time for analysis with data preparation functions
• Access to advanced analytics for predictive and temporal analysis
Data Preparation for Advanced and Predictive Analytics
[Bar chart, 0% to 100%. Rating scale: Critical, Very important, Important, Somewhat important, Unimportant. Items as listed on the slide:]
• Set operations, e.g., joins, aggregations or pivot tables
• Detection of duplicates or outliers
• Cleansing and enrichment of source data
• Complex filtering
• Support for data type conversions
• Support for cutting, merging, and replacing of values
Data Preparation 2014 to 2015
[Bar chart comparing 2015 and 2014; axis from 3 to 4.2. Items as listed on the slide:]
• Complex filtering
• Support for data type conversions
• Detection of duplicates or outliers
• Cleansing and enrichment of source data
• Support for cutting, merging, and replacing of values
• Set operations, e.g., joins, aggregations or pivot tables
Advanced and Predictive Analytics Vendor Ratings
[Chart with series Features, Data Prep, Usability, Scale & Integration, and Total score; axis from 2 to 64. Vendors as listed on the slide:]
• SAP (1st)
• SAS (1st)
• RapidMiner (2nd)
• Dell Software (3rd)
• IBM (4th)
• Birst (5th)
• Oracle (5th)
• TIBCO (5th)
• Information Builders
• Microsoft
• Pentaho
• OpenText
• MicroStrategy
Big Data Analytics Vendor Ratings
[Chart with series Infrastructure, Data Access, Search, Distributions, Machine Learning, and Total; axis from 1 to 100. Vendors as listed on the slide:]
• RapidMiner
• Datameer
• Clearstory
• MicroStrategy
• Pentaho
• Domo
• Platfora
• OpenText
• SAP
• Tableau
• Logi
• Qlik
• JinfoNet
• Dundas
• Information Builders
• Microsoft
• Gooddata
• IBM
RapidMiner for Advanced/Predictive Analytics and Big Data
Lars Bauerle, Chief Product Officer
RapidMiner is #1 OPEN SOURCE
• Leader (2016, 2015 & 2014): Gartner Magic Quadrant for Advanced Analytics Platforms
• Strong Performer (2015): Forrester Wave on Big Data Predictive Analytics
• Innovation Winner (2015): Wisdom of Crowds for Advanced & Predictive Analytics, Big Data Analytics & End-User Data Preparation
• #1 Open-Source Platform (2015, 2014, 2013): Data Mining & Analytics Software Poll
RapidMiner is UNIQUE
• Open-Source Innovation: Cutting-edge data science platform designed for the Big Data era
• Frictionless Operationalization: Prescriptive analytics closes the loop between insight & action
• Lightning-Fast Data Science: Seamless orchestration accelerates predictive analytics lifecycle
• Self-Service Predictive Analytics: Effortless & guided design democratizes data science
ACCELERATES Time-to-Value
• DATA PREP: Speed & optimize ALL data exploration, blending & cleansing tasks
• MODEL & VALIDATE: Rapidly prototype and confidently validate predictive models
• OPERATIONALIZE: Easily deploy & maintain models and embed analytic results
• CONNECT TO ANY DATA SOURCE, ANY FORMAT, AT ANY SCALE
• SUPPORT FOR ALL MAJOR BI, DATA VISUALIZATION & BUSINESS APPLICATIONS
STREAMLINED Data Preparation
Speed & optimize ALL data exploration, blending & cleansing tasks
• A powerful chart engine offers statistical overviews, graphs & charts for data exploration
• Rapidly import, combine and transform structured & unstructured data for deeper predictive insights
• Accelerate advanced data blending tasks with powerful feature weighting, selection & generation
• Expertly cleanse data with anomaly & outlier detection, missing value handling and normalization
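As a rough illustration of the cleansing steps named on this slide (missing value handling, outlier detection, normalization), here is a minimal plain-Python sketch. It is not RapidMiner's implementation: the median fill, the 1.5 × IQR rule, and min-max scaling are common textbook choices assumed here for illustration, and the sample values are made up.

```python
from statistics import median, quantiles

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up column with one missing entry and one obvious outlier.
raw = [12.0, 14.0, None, 13.0, 15.0, 120.0, 11.0, 14.0]
filled = fill_missing(raw)
outlier_idx = iqr_outliers(filled)
cleaned = [v for i, v in enumerate(filled) if i not in outlier_idx]
scaled = min_max_normalize(cleaned)
```

The order matters in this sketch: outliers are dropped before scaling, otherwise a single extreme value would compress every other value toward zero.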
POWERFUL Modeling & Validation
Rapidly prototype and confidently validate predictive models
• Breadth of machine learning functions enhances supervised & unsupervised learning
• Automatic techniques for model building, selection & optimization simplify each step in the process
• Prescriptive algorithms, optimization loops & guided recommendations reveal optimal actions
• Modular cross-validation & honest performance calculations ensure that results will deliver the expected outcome
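To make the cross-validation point concrete, the sketch below shows k-fold cross-validation in plain Python: every row is scored by a model that never saw it during training. The threshold "model", the data, and all function names are hypothetical stand-ins chosen for illustration, not RapidMiner operators.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) for k contiguous folds.
    Assumes the rows are already in random order."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size

def fit_threshold(xs, ys):
    """Toy learner: the midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def accuracy(threshold, xs, ys):
    """Share of rows where 'x above threshold' matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)

def cross_validate(xs, ys, k):
    """Train on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        scores.append(accuracy(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores

# Made-up, well-separated data: low values are class 0, high values class 1.
xs = [1.0, 5.0, 1.5, 5.5, 0.5, 6.0, 2.0, 4.5, 1.2, 5.2]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
scores = cross_validate(xs, ys, k=5)
```

Averaging `scores` gives a performance estimate based only on held-out rows, which is the sense in which the estimate is "honest".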
FRICTIONLESS Operationalization: the details
Easily deploy & maintain models and embed analytic results
• Scheduled or event-driven model execution supports human decisions & automated actions
• Embed results into data visualizations, business applications & web services
• Dynamically manage models to ensure continued updates and accuracy, including tuning, versioning & alerting
• Support for cloud, big data/Hadoop & server-based infrastructure for separation of design and execution
Demo
Big Data - Hadoop
Big Data – Hadoop Challenges
• How to EXTRACT VALUE from Hadoop
  – What to actually do with all the data being collected
  – There is opportunity to improve the business in there
  – But, how do we do it?
• SKILLS GAP is a major adoption inhibitor
  – Lots of technology
  – Rapidly changing
  – Very technical - programming
Different Approaches to Big Data Analytics
• Sampling
• Grid Computing
• Native Distributed Algorithms
Approach 1: Sampling
• Data Movement & Processing
  • Pulls sample data from HDFS/Hive/Impala
  • In the analytics tool (DV, PA, programming)
• When to use it
  + Only data exploration / data understanding
  + Early prototyping on prepared and clean data
  + Machine Learning modeling with very few and basic patterns (e.g. only a handful of columns and binary prediction target)
• When NOT to use it
  − Large number of columns in the data
  − Need to blend large data sets (e.g. large-scale joins)
  − Complex Machine Learning models
[Diagram: pieces of data are pulled out of Hadoop; the analytics tool performs the calculations]
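One way to picture the sampling approach is reservoir sampling, which draws a fixed-size uniform sample from a stream of rows without holding the full data set in memory. This is a sketch under assumptions: the slide does not say how the sample is drawn, and the generator below is a made-up stand-in for rows streamed out of HDFS/Hive/Impala, not a real connection.

```python
import random

def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k rows from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# Hypothetical stand-in for a large table streamed out of the cluster.
rows = ({"id": i, "amount": i % 7} for i in range(100_000))
sample = reservoir_sample(rows, k=1000)
```

The analytics tool would then explore or prototype on `sample` locally, which fits the "when to use it" cases on the slide and also shows why wide tables and large joins are a poor fit: the sample never sees the rows it dropped.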
Approach 2: Grid computing
• Data Movement and Processing
  • Only results are moved, data remains in Hadoop
  • Custom single-node application running on multiple Hadoop nodes
• When to use it
  + Task can be performed on smaller, independent data subsets
  + Complex data pre-processing
• When NOT to use it
  – Complex Machine Learning models
  – Lots of interdependencies between data subsets
[Diagram labels: Analytics Tool, Application, Results, Calculations, and an App on each Hadoop node]
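The grid pattern can be sketched as the same single-node function applied to independent data subsets, with only small results travelling back. In this illustration local threads stand in for Hadoop nodes and the partitions are made-up in-memory lists; a real deployment would ship the application to the cluster instead.

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_partition(partition):
    """Single-node job: reads one data subset and returns only a small result."""
    amounts = [row["amount"] for row in partition]
    return {"rows": len(amounts), "total": sum(amounts)}

def run_on_grid(partitions, workers=4):
    """Run the same job on every partition; each worker stands in for a node."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarize_partition, partitions))

# Four made-up partitions of five rows each.
partitions = [
    [{"amount": float(n + 10 * p)} for n in range(5)]
    for p in range(4)
]
results = run_on_grid(partitions)
grand_total = sum(r["total"] for r in results)
```

This works because each partition can be summarized without looking at any other partition. A task with many interdependencies between subsets breaks that assumption, which matches the "when NOT to use it" list on the slide.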
Approach 3: Native distributed algorithms
• Data Movement and Processing
  • Only results are moved, data remains in Hadoop
  • Executed by native Hadoop tools: Hive, Spark, H2O, Pig, MapReduce, etc.
• When to use it
  + Complex Machine Learning models needed
  + Lots of interdependencies inside the data (e.g. graph analytics)
  + Need to blend and cleanse large data sets (e.g. large-scale joins)
• When NOT to use it
  − Data is not that large
  − Sample would reveal all interesting patterns
  − You don’t want to do a lot of programming in multiple languages
[Diagram: the analytics tool pushes instructions to Hadoop, the calculations run there, and the results come back]
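"Instructions pushed to Hadoop" can be illustrated by generating a HiveQL aggregate instead of pulling rows: the cluster does the grouping and only a summary comes back. This is a minimal sketch, not how any particular product generates its instructions. The helper name, table, and column names are hypothetical, and the query is only built here, not sent to a cluster.

```python
def build_pushdown_query(table, group_by, metric):
    """Build a HiveQL aggregate so the cluster returns a summary, not raw rows."""
    for name in (table, group_by, metric):
        # Accept plain identifiers only, so the generated text stays predictable.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT {group_by}, COUNT(*) AS n, AVG({metric}) AS avg_{metric} "
        f"FROM {table} GROUP BY {group_by}"
    )

# Hypothetical table and columns, for illustration only.
query = build_pushdown_query("web_events", "country", "session_seconds")
```

However large `web_events` is, the result has one row per country, so the data movement stays small while the full data set is still analyzed.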
Different Approaches to Big Data Analytics
Which one to use for a given use case?
Typical projects need all three to succeed
• Sampling
• Grid Computing
• Native Distributed Algorithms
RapidMiner Predictive Analytics Platform
Single Analytics Platform to support all three
• Sampling: Pull data from Hive/Impala; use 1500+ operators
• Grid Computing: SparkRM, PySpark, SparkR
• Native Distributed Algorithms: Spark, Hive, Impala, custom UDFs, Mahout, Pig
RapidMiner
• Capabilities for all use cases
• In a GUI environment
• In a single platform
What it looks like
RapidMiner for Big Data (Hadoop)
RapidMiner Radoop extends predictive analytics to Hadoop and Spark
• We speak Hadoop so you don’t have to
  Translates predictive analytics into native Hadoop – you concentrate on creating analytics, not Hadoop programming
• COMPLETE insights into your Big Data
  Pushes analytic instructions into Hadoop for computation, so you can analyze the full breadth and variety of your Big Data
• Use your favorite Hadoop scripts, too!
  Incorporates SparkR, PySpark, Pig and HiveQL
• Safe and sound
  Integrates with Kerberos authentication, supports data access authorization – seamless for users, easy administration for IT
TRANSFORMATIONAL Business Impact
• Build Better Predictive Models Faster: Accelerate the creation of high-value predictive analytics while streamlining low-value tasks
• Easily Use Predictive Analytics: Confidently extract the hidden value from your data using intuitive predictive analytics
• Operationalize Competitive Advantage: Leverage prescriptive analytics in all your decisions to achieve better outcomes
• Bridge the Data Science Skills Gap: Empower data scientists and citizen data scientists to feed the insatiable demand for predictive insights
Roles shown on the slide: CHIEF ANALYTICS OFFICER, CHIEF EXECUTIVE OFFICER, DATA SCIENTIST, BUSINESS ANALYST
#1 Agile Predictive Analytics Platform for Today’s Modern Analysts
Q & A
Download RapidMiner Today @ www.rapidminer.com