Copyright © 2015, SAS Institute Inc. All rights reserved. SAS ® and big data – an analytics perspective Snurre Jensen, Principal Business Solutions Manager, SAS Global Technology Practice


Page 1:

Copyright © 2015, SAS Institute Inc. All rights reserved.

SAS®

and big data – an analytics perspective

Snurre Jensen,

Principal Business Solutions Manager,

SAS Global Technology Practice

Page 2:

“Big data” – just a marketing term?

Page 3:

Big data – volume (data at rest)

• 30 TB data transmitted by the Maersk Line fleet over satellite link every month

• 2 TB data generated every 100 days by a modern vessel

• 2 GB data stored every day from the main control system of a Triple E vessel
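To compare the three volume figures, it can help to put them on a common daily scale. The following is a minimal sketch, not part of the original slides. It assumes decimal units (1 TB = 1000 GB) and a 30-day month, neither of which the source states.

```python
# Assumptions (not from the slide): 1 TB = 1000 GB and a month of 30 days.
GB_PER_TB = 1000
DAYS_PER_MONTH = 30

# 30 TB per month transmitted by the whole fleet over satellite link
fleet_gb_per_day = 30 * GB_PER_TB / DAYS_PER_MONTH

# 2 TB generated every 100 days by one modern vessel
vessel_gb_per_day = 2 * GB_PER_TB / 100

# 2 GB stored per day from the main control system of one Triple E vessel
control_system_gb_per_day = 2.0

rates = {
    "Fleet, transmitted": fleet_gb_per_day,
    "One vessel, generated": vessel_gb_per_day,
    "Triple E control system, stored": control_system_gb_per_day,
}
for name, gb in rates.items():
    print(f"{name}: {gb:.0f} GB/day")
```

Under these assumptions the fleet transmits about 1,000 GB per day, while a single vessel generates about 20 GB per day.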

Source: Mærsk Group Annual Magazine – “TAKING THE LEAD IN A WORLD OF CHANGE”

Page 4:

Volume – is this a challenge?

• Do you need to sample data to run your analysis?

• Are you forced to split data preparation into sections in order to run?

• Do you only have time to try out a few model variations?

• Is valuable data available that you do not use due to sizing or timing considerations?

• Does the data you want to use fit into the structure of your existing data environment?

Page 5:

Volume – a possible solution

Page 6:

SAS® and Hadoop

SAS & Hadoop intersect in many ways:

SAS can treat Hadoop just as any other data source, pulling data FROM Hadoop, when it is most convenient.

SAS can work WITH Hadoop, lifting data into a purpose-built, in-memory advanced analytics environment.

SAS can work directly IN Hadoop, leveraging the distributed processing capabilities of Hadoop.

Page 7:

SAS Big Data Analytics – visualize, explore, interact

SAS Visual Analytics

• Open to the masses

• No statistics skills required

• No programming knowledge required

• Focus on consuming analytics

SAS Visual Statistics

• Reduced number of users

• Some statistics skills required

• No programming knowledge required

• Focus on producing analytics

SAS In-Memory Statistics

• Lower number of users

• Statistics skills required

• Programming knowledge required

• Focus on producing analytics

Page 8:

SAS Big Data Analytics – industrialize, operationalize, deploy

SAS High Performance Data Mining Nodes in SAS Enterprise Miner

SAS Factory Miner

Page 9:

SAS Big Data Analytics – High-Performance procedures

High-Performance Statistics: HPLOGISTIC, HPREG, HPLMIXED, HPNLMOD, HPSPLIT, HPGENSELECT, HPFMM, HPCANDISC, HPPRINCOMP, HPPLS, HPQUANTSELECT, GAMPL

High-Performance Data Mining: HPREDUCE, HPNEURAL, HPFOREST, HP4SCORE, HPDECIDE, HPCLUS, HPSVM, HPBNET

High-Performance Econometrics: HPCOUNTREG, HPSEVERITY, HPQLIM, HPPANEL, HPCOPULA, HPCDM

High-Performance Optimization: OPTLSO; select features in OPTMILP, OPTLP, OPTMODEL, OPTGRAPH

High-Performance Text Mining: HPTMINE, HPTMSCORE, HPBOOLRULE

Common Set: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR

Page 10:

Simple code examples

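/* High-performance logistic regression with forward variable selection */
proc hplogistic data=getStarted;
   class C;
   model y = C x1-x10;
   selection method=forward details=all;
run;

/* Traditional logistic regression, shown for comparison */
proc logistic data=Neuralgia;
   class Treatment Sex;
   model Pain = Treatment Sex Treatment*Sex Age Duration / expb;
run;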

Page 11:

Customer case – SAS® High-Performance Data Mining

Page 12:

SAS Big Data Analytics - overview

Page 13:

Side note #1: SAS does Machine Learning

SAS’ definition:

“Machine learning is a branch of artificial intelligence which automates the building of systems that learn, identify patterns and predict future results – with minimal human intervention.”

Selection of methods available in SAS:

Neural networks, decision trees, random forests, association and sequence analysis, gradient boosting and bagging, support vector machines, nearest-neighbor mapping, k-means clustering, self-organizing maps, local search optimization techniques (e.g. genetic algorithms), expectation maximization, multivariate adaptive regression splines, Bayesian networks, kernel density estimation, principal component analysis, singular value decomposition, Gaussian mixture models, recommendation engine algorithms etc.

Page 14:

Side note #2: SAS can co-exist with open source analytics

Base SAS Java Object

data _null_;
   length rtn_val 8;
   * Python program takes working directory as first argument;
   python_pgm = "&WORK_DIR.\digitsdata_svm.py";
   python_arg1 = "&WORK_DIR";
   python_call = cat('"', trim(python_pgm), '" "', trim(python_arg1), '"');
   declare javaobj j("dev.SASJavaExec", "&PYTHON_EXEC_COMMAND", python_call);
   j.callIntMethod("executeProcess", rtn_val);
run;

PROC IML

proc iml;
   submit / R;
      coplot(lat ~ long | depth, data = quakes)
   endsubmit;

SAS Enterprise Miner

Page 15:

Side note #2: SAS can co-exist with open source analytics

Running SAS from Python

Running SAS from R

Page 16:

Big data – velocity (data in motion)

• 0.05 seconds between each motion measurement on a ship

• 200 sensors in a modern main engine room measuring temperature, pressure and operations

• 7,000 channels monitored on the Triple E for situational awareness and alarms

• 2,800 sensors hardwired into the Triple E vessel’s main control system

• 5,000 data tags on a modern vessel
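The 0.05-second interval translates into a sampling rate and a daily event count. The arithmetic below is a minimal sketch for a single motion signal only. The slides do not say how many signals are sampled at this rate, so no total is attempted.

```python
# From the slide: one motion measurement every 0.05 seconds.
SAMPLE_INTERVAL_S = 0.05
SECONDS_PER_DAY = 24 * 60 * 60

# round() guards against floating-point noise in 1 / 0.05
samples_per_second = round(1 / SAMPLE_INTERVAL_S)
samples_per_day = samples_per_second * SECONDS_PER_DAY

print(f"{samples_per_second} measurements per second")
print(f"{samples_per_day:,} measurements per day for one signal")
```

A single signal at this rate yields 20 measurements per second, or 1,728,000 per day.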

Source: Mærsk Group Annual Magazine – “TAKING THE LEAD IN A WORLD OF CHANGE”

Page 17:

Velocity – when is this a challenge?

• Do you struggle with acting in due time?

• Are there benefits in reacting faster?

• Is all your incoming data valuable?

• And how do you decide which data needs to be acted on now?
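One simple way to decide which incoming data needs action now is to compare each new reading with a rolling baseline and flag only large deviations. The sketch below illustrates that idea in plain Python. It is not how SAS Event Stream Processing is implemented, and the function name, window size, threshold and sample temperatures are all hypothetical.

```python
from collections import deque
from statistics import mean


def urgent_events(readings, window=5, threshold=10.0):
    """Yield (index, value) for readings that differ from the mean of the
    previous `window` readings by more than `threshold`."""
    recent = deque(maxlen=window)  # holds only the latest `window` readings
    for i, value in enumerate(readings):
        # Only judge a reading once a full baseline window is available.
        if len(recent) == window and abs(value - mean(recent)) > threshold:
            yield i, value
        recent.append(value)


# Hypothetical engine-room temperatures with one spike at index 6.
temps = [80.0, 81.0, 80.5, 79.5, 80.0, 80.2, 95.0, 80.1]
print(list(urgent_events(temps)))
```

Because the function is a generator, it processes one reading at a time and never needs the full stream in memory, which is the property that matters for data in motion.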

Page 18:

Streaming analytics – application areas

E-Commerce Optimization

Connected Devices (IoT)

Decision Management

Fraud Detection

Telecommunications

Capital Markets

Supply chain

Page 19:

SAS® Event Stream Processing

Page 20:

How to get started

• Have a vision and create a plan towards that vision

• Identify relevant use cases

• Evaluate their impact and their ease of completion

• Establish success early by prioritizing high-impact, easy-to-implement cases first

• What type of organization do you work in?

  • C-level pull or innovation lab push?
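The advice to weigh impact against ease of completion can be turned into a simple ranking. This is a minimal sketch, not a method from the slides: the 1 to 5 scales, the product score and the three example use cases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int  # hypothetical scale: 1 (low) to 5 (high)
    ease: int    # hypothetical scale: 1 (hard) to 5 (easy)


def prioritize(cases):
    """Rank use cases by impact * ease, highest first; ties go to higher impact."""
    return sorted(cases, key=lambda c: (c.impact * c.ease, c.impact), reverse=True)


# Made-up examples for illustration only.
cases = [
    UseCase("Predictive engine maintenance", impact=5, ease=2),
    UseCase("Fuel consumption dashboard", impact=4, ease=4),
    UseCase("Sensor data archive migration", impact=2, ease=3),
]

for case in prioritize(cases):
    print(f"{case.impact * case.ease:>2}  {case.name}")
```

With these scores the dashboard ranks first: it is not the highest-impact case, but it combines solid impact with easy delivery, which is what establishes success early.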

Page 21:

Pitfalls to avoid

• Accept that Big Data is a new paradigm

  • It involves more than just “lift and shift”

• Do not test “to-be” problems on “as-is” infrastructure

• Focus on the entire process – do not be blinded by data science magic

  • Yes – algorithms are cool.

  • But if results are not deployed, they provide no value

Page 22:

What is wrong with this picture?

Page 23:

Key takeaways

“Big Data” is not a marketing stunt. It is happening – and it is happening now!

SAS provides technology that enables your organization to benefit from Big Data in the manner most suitable for you.

“Failing to plan is planning to fail” – understand where you want to go and identify initiatives that will take you there.
