Copyright © 2015, SAS Institute Inc. All rights reserved.
SAS® and big data – an analytics perspective
Snurre Jensen,
Principal Business Solutions Manager,
SAS Global Technology Practice
“Big data” – just a marketing term?
Big data – volume (data at rest)
• 30 TB data transmitted by the Maersk Line fleet over satellite link every month
• 2 TB data generated every 100 days by a modern vessel
• 2 GB data stored every day from the main control system of a Triple E vessel
Source: Mærsk Group Annual Magazine – “TAKING THE LEAD IN A WORLD OF CHANGE”
Volume – is this a challenge?
• Do you need to sample data to run your analysis?
• Are you forced to split data preparation into sections in order to run?
• Do you only have time to try out a few model variations?
• Is valuable data available that you do not use due to sizing or timing considerations?
• Does the data you want to use fit into the structure in your existing data environment?
Volume – a possible solution
SAS® and Hadoop
SAS & Hadoop intersect in many ways:
SAS can treat Hadoop just as any other data source, pulling data FROM Hadoop, when it is most convenient.
SAS can work WITH Hadoop, lifting data into a purpose-built advanced analytics in-memory environment.
SAS can work directly IN Hadoop, leveraging the distributed processing capabilities of Hadoop.
SAS Big Data Analytics – visualize, explore, interact
SAS Visual Analytics
• Open to the masses
• No statistics skills required
• No programming knowledge required
• Focus on consuming analytics
SAS Visual Statistics
• Reduced number of users
• Some statistics skills required
• No programming knowledge required
• Focus on producing analytics
SAS In-Memory Statistics
• Lower number of users
• Statistics skills required
• Programming knowledge required
• Focus on producing analytics
SAS Big Data Analytics – industrialize, operationalize, deploy
SAS High Performance Data Mining Nodes in SAS Enterprise Miner
SAS Factory Miner
SAS Big Data Analytics – High-Performance procedures
High-Performance Statistics: HPLOGISTIC, HPREG, HPLMIXED, HPNLMOD, HPSPLIT, HPGENSELECT, HPFMM, HPCANDISC, HPPRINCOMP, HPPLS, HPQUANTSELECT, GAMPL
High-Performance Data Mining: HPREDUCE, HPNEURAL, HPFOREST, HP4SCORE, HPDECIDE, HPCLUS, HPSVM, HPBNET
High-Performance Econometrics: HPCOUNTREG, HPSEVERITY, HPQLIM, HPPANEL, HPCOPULA, HPCDM
High-Performance Optimization: OPTLSO; select features in OPTMILP, OPTLP, OPTMODEL, OPTGRAPH
High-Performance Text Mining: HPTMINE, HPTMSCORE, HPBOOLRULE
Common Set: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR
Simple code examples
proc hplogistic data=getStarted;
class C;
model y = C x1-x10;
selection method=forward details=all;
run;
proc logistic data=Neuralgia;
class Treatment Sex;
model Pain= Treatment Sex Treatment*Sex Age Duration / expb;
run;
Customer case – SAS® High-Performance Data Mining
SAS Big Data Analytics - overview
Side note #1: SAS does Machine Learning
SAS’ definition:
“Machine learning is a branch of artificial intelligence which automates the building of systems that learn, identify patterns and predict future results – with minimal human intervention.”
Selection of methods available in SAS:
Neural networks, decision trees, random forests, association and sequence analysis, gradient boosting and bagging, support vector machines, nearest-neighbor mapping, k-means clustering, self-organizing maps, local search optimization techniques (e.g., genetic algorithms), expectation maximization, multivariate adaptive regression splines, Bayesian networks, kernel density estimation, principal component analysis, singular value decomposition, Gaussian mixture models, recommendation engine algorithms etc.
Side note #2: SAS can co-exist with open source analytics
Base SAS Java Object
data _null_;
  length rtn_val 8;
  * Python program takes working directory as first argument;
  python_pgm = "&WORK_DIR.\digitsdata_svm.py";
  python_arg1 = "&WORK_DIR";
  python_call = cat('"', trim(python_pgm), '" "', trim(python_arg1), '"');
  declare javaobj j("dev.SASJavaExec", "&PYTHON_EXEC_COMMAND", python_call);
  j.callIntMethod("executeProcess", rtn_val);
run;
PROC IML
proc iml;
  submit / R;
    coplot(lat ~ long | depth, data = quakes)
  endsubmit;
quit;
SAS Enterprise Miner
Side note #2: SAS can co-exist with open source analytics
• Running SAS from Python
• Running SAS from R
Big data – velocity (data in motion)
• 0.05 seconds between each motion measurement on a ship
• 200 sensors in a modern main engine room measuring temperature, pressure and operations
• 7,000 channels monitored on the Triple E for situational awareness and alarms
• 2,800 sensors hardwired into the Triple E vessel’s main control system
• 5,000 data tags on a modern vessel
Source: Mærsk Group Annual Magazine – “TAKING THE LEAD IN A WORLD OF CHANGE”
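To put the quoted figures in perspective, the short Python sketch below converts two of them into rates. It is back-of-the-envelope arithmetic only, and it assumes a 30-day month and decimal terabytes (1 TB = 10^12 bytes); neither assumption comes from the source.

```python
# Back-of-the-envelope rates derived from the figures quoted in the deck.
# Assumptions (not from the source): 30-day month, 1 TB = 10**12 bytes.
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds: float) -> int:
    """Number of measurements per day for a fixed sampling interval."""
    return round(SECONDS_PER_DAY / interval_seconds)


def average_bytes_per_second(total_bytes: float, days: float) -> float:
    """Average sustained throughput needed to move total_bytes in the given days."""
    return total_bytes / (days * SECONDS_PER_DAY)


# One motion measurement every 0.05 seconds, i.e. 20 per second.
motion_samples = samples_per_day(0.05)

# 30 TB transmitted by the fleet per month.
fleet_rate = average_bytes_per_second(30e12, 30)

print(f"{motion_samples:,} motion measurements per day from one stream")
print(f"{fleet_rate / 1e6:.1f} MB/s average fleet satellite throughput")
```

Under these assumptions a single motion stream yields roughly 1.7 million measurements per day, and the fleet-wide satellite link averages a little under 12 MB/s.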
Velocity – when is this a challenge?
• Do you struggle with acting in due time?
• Are there benefits in reacting faster?
• Is all your incoming data valuable?
• And how do you decide which data needs to be acted on now?
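One simple way to decide which incoming readings need attention now is to compare each reading with a short window of recent history. The Python sketch below illustrates that idea only; it is not how SAS Event Stream Processing works, and the function name `flag_urgent`, the window size, the threshold `k` and the sample temperatures are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def flag_urgent(
    readings: Iterable[float], window: int = 5, k: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings that deviate more than k standard
    deviations from the mean of the preceding `window` readings."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        # Only judge a reading once a full window of history exists.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > k * sigma:
                yield i, value
        recent.append(value)


# Hypothetical engine-room temperatures; the seventh value is a spike.
temps = [80.0, 80.5, 79.8, 80.2, 80.1, 80.3, 95.0, 80.0]
urgent = list(flag_urgent(temps))
print(urgent)
```

Everything that is not flagged can be stored or aggregated for later analysis, while flagged readings are candidates for immediate action.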
Streaming analytics – application areas
E-Commerce Optimization
Connected Devices (IoT)
Decision Management
Fraud Detection
Telecommunications
Capital Markets
Supply chain
SAS® Event Stream Processing
How to get started
• Have a vision and create a plan towards that vision
• Identify relevant use cases
• Evaluate their impact and their ease of completion
• Establish success early by prioritizing high impact, easy to implement cases first
• What type of organization do you work in?
  • C-level pull or innovation lab push?
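The advice to evaluate impact and ease, then start with high-impact, easy cases, can be sketched as a simple ranking. The Python below is only an illustration: the 1 to 5 scoring scale, the impact-times-ease score and the three use cases are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    name: str
    impact: int  # hypothetical scale: 1 (low) to 5 (high)
    ease: int    # hypothetical scale: 1 (hard) to 5 (easy)


def prioritize(cases: List[UseCase]) -> List[UseCase]:
    """Rank use cases by impact * ease; ties go to the higher-impact case."""
    return sorted(cases, key=lambda c: (c.impact * c.ease, c.impact), reverse=True)


# Hypothetical candidates, for illustration only.
cases = [
    UseCase("Predictive engine maintenance", impact=5, ease=2),
    UseCase("Fuel consumption dashboard", impact=4, ease=4),
    UseCase("Sensor data archive migration", impact=2, ease=5),
]
ranked = [c.name for c in prioritize(cases)]
print(ranked)
```

A multiplicative score favors cases that do well on both dimensions; an organization could just as well weight impact more heavily or plot the cases on a two-by-two grid.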
Pitfalls to avoid
• Accept that Big Data is a new paradigm
  • It involves more than just “lift and shift”
• Do not test “to-be” problems on “as-is” infrastructure
• Focus on the entire process - do not be blinded by data science magic
  • Yes – algorithms are cool.
  • But if results are not deployed, they provide no value
What is wrong with this picture?
Key takeaways
“Big Data” is not a marketing stunt. It is happening – and it is happening now!
SAS provides technology that enables your organization to benefit from Big Data in the manner most suitable for you.
“Failing to plan is planning to fail” – understand where you want to go and identify initiatives that will take you there.