sap hana sps08 predictive analysis library
SAP HANA SPS 08 - What’s New? Predictive Analysis Library
SAP HANA Product Management May, 2014
(Delta from SPS 07 to SPS 08)

© 2014 SAP AG or an SAP affiliate company. All rights reserved. Public
Agenda
Release Theme
List of Algorithms
New Algorithms
• Distribution Fit
• Cumulative Distribution Function
• Quantile Function
• Random Distribution Sampling
• ARIMA
• FP-Growth
• CART
• K-Medoid Clustering
Enhancements
Documentation

Release Theme

HANA Predictive Analysis Library – What’s New in SPS 08? Release Theme
The SPS 08 version of the Predictive Analysis Library (PAL) adds eight new algorithms and several enhancements to existing algorithms.
These features were prioritized based on requests from customers and other stakeholders.

List of Algorithms

SAP HANA In-Memory Predictive Analytics: Predictive Analysis Library (PAL) - Algorithms Supported
(* New in SPS 08)
• Association Analysis: Apriori; Apriori Lite; FP-Growth *
• Classification Analysis: CART *; C4.5 Decision Tree Analysis; CHAID Decision Tree Analysis; K Nearest Neighbour; Logistic Regression; Naïve Bayes; Support Vector Machine
• Regression: Multiple Linear Regression; Polynomial Regression; Exponential Regression; Bi-Variate Geometric Regression; Bi-Variate Logarithmic Regression
• Outlier Detection: Inter-Quartile Range Test (Tukey’s Test); Variance Test; Anomaly Detection
• Statistic Functions (Univariate): Mean, Median, Variance, Standard Deviation; Kurtosis; Skewness
• Statistic Functions (Multivariate): Covariance Matrix; Pearson Correlations Matrix; Chi-squared Tests (Test of Quality of Fit, Test of Independence); F-test (variance equal test)
• Link Prediction: Common Neighbors; Jaccard’s Coefficient; Adamic/Adar; Katz β
• Data Preparation: Sampling; Random Distribution Sampling *; Binning; Scaling; Partitioning
• Cluster Analysis: ABC Classification; DBSCAN; K-Means; K-Medoid Clustering *; Kohonen Self Organized Maps; Agglomerate Hierarchical; Affinity Propagation
• Time Series Analysis: Single Exponential Smoothing; Double Exponential Smoothing; Triple Exponential Smoothing; Forecast Smoothing; ARIMA *
• Probability Distribution: Distribution Fit *; Cumulative Distribution Function *; Quantile Function *
• Other: Weighted Scores Table; Substitute Missing Values

New Algorithms

Distribution Fit
Distribution fitting finds the probability distribution that best describes a variable, based on a series of measurements of that variable.
In PAL, the user chooses one distribution type from the supported list (Normal, Gamma, Weibull, and Uniform), and PAL calculates the parameters of that distribution that best fit the observed data.
There are two distribution fitting interfaces: DISTRFIT fits uncensored data, and DISTRFITCENSORED fits censored data.
Two methods are provided for estimating the parameters: Maximum-Likelihood and Median-Rank. In SPS 08, the Maximum-Likelihood method supports all distribution types in the supported list for uncensored data. The Median-Rank method supports Weibull distribution fitting for both censored and uncensored data.
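To make the two estimation methods concrete, here is a minimal Python sketch for uncensored data. It is not PAL code and does not call DISTRFIT. The function names are hypothetical, the maximum-likelihood example covers only the Normal case (where the estimates have a closed form), and the median-rank example assumes Benard's approximation for the median ranks and a least-squares regression on the linearized Weibull CDF. PAL's actual implementation may differ.

```python
import math


def fit_normal_mle(xs):
    """Closed-form maximum-likelihood estimates (mean, standard deviation)."""
    n = len(xs)
    mu = sum(xs) / n
    # The MLE of the variance divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, math.sqrt(var)


def fit_weibull_median_rank(xs):
    """Median-rank regression for uncensored Weibull data.

    Assumption: median ranks use Benard's approximation (i - 0.3) / (n + 0.4).
    The Weibull CDF is linearized as ln(-ln(1 - F)) = k*ln(x) - k*ln(scale),
    and the line is fitted by ordinary least squares.
    """
    xs = sorted(xs)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        rank = (i - 0.3) / (n + 0.4)
        points.append((math.log(x), math.log(-math.log(1.0 - rank))))
    mean_u = sum(u for u, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in points)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in points)
    shape = sxy / sxx                      # slope of the fitted line
    intercept = mean_v - shape * mean_u    # equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return shape, scale


if __name__ == "__main__":
    print(fit_normal_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
    print(fit_weibull_median_rank([0.8, 1.3, 1.9, 2.4, 3.1, 4.0]))
```

The regression here fits the transformed ranks against ln(x). Some references regress in the other direction, which gives slightly different estimates.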

Cumulative Distribution Function
For a given probability distribution, this PAL function evaluates at a value x either the cumulative distribution function (CDF), which gives the probability that the variable is at most x, or the complementary cumulative distribution function (CCDF), which gives the probability that the variable exceeds x.
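As an illustration of what is being evaluated, the following Python sketch computes the CDF and CCDF for two of the supported distribution types using their standard closed forms. The function names and parameter names are assumptions chosen for this example and are not part of the PAL interface.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def weibull_cdf(x, shape, scale):
    """P(X <= x) for a Weibull variable with the given shape and scale."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def ccdf(cdf_value):
    """Complementary CDF: P(X > x) = 1 - P(X <= x)."""
    return 1.0 - cdf_value


if __name__ == "__main__":
    p = normal_cdf(1.96)
    print(p, ccdf(p))
    print(weibull_cdf(2.0, shape=1.5, scale=2.0))
```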

Quantile Function
In PAL, the quantile function evaluates the inverse $F^{-1}(p)$ of the cumulative distribution function (CDF) or the inverse $\bar{F}^{-1}(p)$ of the complementary cumulative distribution function (CCDF) for a given probability p and probability distribution.
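The inverse can be sketched in Python as follows. The Weibull case uses the closed-form inverse of its CDF, and the Normal case uses `statistics.NormalDist.inv_cdf` from the standard library (Python 3.8 or later). The function names and the `complementary` flag are hypothetical and only illustrate the CDF/CCDF choice; they do not reflect PAL's parameters.

```python
import math
from statistics import NormalDist


def weibull_quantile(p, shape, scale, complementary=False):
    """Return x with CDF(x) = p, or with CCDF(x) = p if complementary is True."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # CDF:  1 - exp(-(x/scale)^shape) = p  ->  upper-tail mass is 1 - p
    # CCDF:     exp(-(x/scale)^shape) = p  ->  upper-tail mass is p
    tail = p if complementary else 1.0 - p
    return scale * (-math.log(tail)) ** (1.0 / shape)


def normal_quantile(p, mu=0.0, sigma=1.0, complementary=False):
    """Return x with CDF(x) = p, or with CCDF(x) = p if complementary is True."""
    q = 1.0 - p if complementary else p
    return NormalDist(mu, sigma).inv_cdf(q)


if __name__ == "__main__":
    print(weibull_quantile(0.5, shape=1.0, scale=2.0))
    print(normal_quantile(0.975))
    print(normal_quantile(0.025, complementary=True))
```

Inverting the CCDF at p gives the same value as inverting the CDF at 1 - p, which is why both functions share one code path.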

Random Distribution Sampling
This function generates random samples from a given distribution (Normal, Gamma, Weibull, or Uniform).
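For comparison, the same four distributions can be sampled with Python's standard `random` module. This is only a sketch of the idea. The `sample` helper and its parameter names (`mean`, `sd`, `shape`, `scale`, `low`, `high`) are assumptions for this example and do not correspond to PAL's parameter table.

```python
import random


def sample(distribution, n, seed=None, **params):
    """Draw n values from one of the four distribution types named above."""
    rng = random.Random(seed)  # private generator, reproducible when seeded
    generators = {
        "normal": lambda: rng.gauss(params["mean"], params["sd"]),
        # gammavariate takes (shape, scale)
        "gamma": lambda: rng.gammavariate(params["shape"], params["scale"]),
        # weibullvariate takes (scale, shape), in that order
        "weibull": lambda: rng.weibullvariate(params["scale"], params["shape"]),
        "uniform": lambda: rng.uniform(params["low"], params["high"]),
    }
    draw = generators[distribution]
    return [draw() for _ in range(n)]


if __name__ == "__main__":
    print(sample("normal", 3, seed=1, mean=0.0, sd=1.0))
    print(sample("uniform", 3, seed=1, low=-1.0, high=1.0))
```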

ARIMA
The autoregressive integrated moving average (ARIMA) algorithm is widely used in econometrics, statistics, and time series analysis. An ARIMA model is written as ARIMA(p, d, q), where p is the autoregressive order, d is the order of integration (the number of times the series is differenced), and q is the moving average order. The model helps to understand time series data and to predict future values in the series. PAL provides both training and forecast functions.
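The roles of p and d, and the split into a training step and a forecast step, can be illustrated with the simplest non-trivial case, ARIMA(1, 1, 0) without a constant term. The Python sketch below is an assumption-laden toy: it differences the series once, estimates the single autoregressive coefficient by conditional least squares, and then integrates the forecasts back. The function names are hypothetical, and PAL's estimation method is not described in this document and may well differ.

```python
def difference(series):
    """First-order differencing (d = 1): w[t] = y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(w):
    """Least-squares estimate of phi in w[t] = phi * w[t-1] + noise (p = 1)."""
    numerator = sum(cur * prev for prev, cur in zip(w, w[1:]))
    denominator = sum(prev * prev for prev in w[:-1])
    return numerator / denominator


def forecast_arima_110(series, steps):
    """Train on the series, then forecast the next values of the original series."""
    w = difference(series)
    phi = fit_ar1(w)
    level, change = series[-1], w[-1]
    forecasts = []
    for _ in range(steps):
        change = phi * change      # AR(1) forecast of the next difference
        level += change            # undo the differencing
        forecasts.append(level)
    return phi, forecasts


if __name__ == "__main__":
    phi, future = forecast_arima_110([10.0, 18.0, 22.0, 24.0, 25.0, 25.5], steps=2)
    print(phi, future)
```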

FP-Growth
FP-Growth is an algorithm that finds frequent patterns in transactions without generating candidate itemsets. In PAL, the FP-Growth algorithm is extended to find association rules. In the first step, the algorithm converts the transactions into a compressed frequent pattern tree (FP-Tree). In the second step, it recursively finds frequent patterns from the FP-Tree. In the last step, PAL generates association rules based on the frequent patterns found in the second step.
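The three steps can be sketched compactly in Python. This is a teaching sketch of the textbook algorithm, not PAL's implementation: all names are hypothetical, support is expressed as an absolute transaction count, and rules are filtered by confidence only.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted, min_count):
    """Step 1: compress (items, count) pairs into an FP-Tree."""
    freq = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = Node(None, None)
    links = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in weighted:
        # Keep frequent items only, ordered by descending frequency.
        kept = sorted((i for i in set(items) if i in freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                links[item].append(child)
            child.count += count
            node = child
    return freq, links


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Step 2: recursively mine frequent patterns; returns {itemset: count}."""
    freq, links = build_tree(weighted, min_count)
    patterns = {}
    for item, nodes in links.items():
        pattern = suffix | {item}
        patterns[pattern] = freq[item]
        base = []  # conditional pattern base: prefix paths leading to item
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns


def association_rules(patterns, min_confidence):
    """Step 3: derive rules (lhs, rhs, confidence) from the frequent patterns."""
    rules = []
    for pattern, count in patterns.items():
        for size in range(1, len(pattern)):
            for lhs in map(frozenset, combinations(sorted(pattern), size)):
                confidence = count / patterns[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, pattern - lhs, confidence))
    return rules


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    found = fp_growth([(t, 1) for t in transactions], min_count=3)
    for lhs, rhs, conf in association_rules(found, min_confidence=0.9):
        print(sorted(lhs), "->", sorted(rhs), conf)
```

No candidate itemsets are enumerated in step 2: every pattern is read off the tree by following prefix paths, which is the property the description above refers to.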

CART
Classification and Regression Tree (CART) was introduced by Breiman et al. (1984). It is a recursive partitioning method, similar to C4.5, that supports only binary splits and can be used for classification or regression. It uses the Gini index or the twoing criterion for classification, and least-squares error for regression. Surrogate splits are used to handle missing values when creating the tree model.
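A single binary split chosen by the Gini index can be sketched in Python as follows. The sketch covers one numeric feature and one split only. It omits recursion, twoing, regression, and surrogate splits, and the function names are hypothetical rather than taken from PAL.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, impurity of the unsplit node) if no split is possible.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best_threshold is None or weighted < best_impurity:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_impurity = weighted
    return best_threshold, best_impurity


if __name__ == "__main__":
    ages = [22, 25, 31, 47, 52, 58]
    churned = ["no", "no", "no", "yes", "yes", "yes"]
    print(best_binary_split(ages, churned))
```

A full tree builder would apply the same search to every feature, pick the best one, and recurse on the two resulting partitions.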

K-Medoid Clustering
K-Medoids is a clustering algorithm related to K-Means. Both algorithms partition n observations into k clusters, assigning each observation to the cluster with the closest center. In contrast to K-Means, K-Medoids does not calculate means but uses medoids as the new cluster centers. A medoid is the object in a cluster whose average dissimilarity to all the objects in the cluster is minimal. Compared to K-Means, K-Medoids is considered more robust to noise and outliers.
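The difference between a medoid and a mean shows up clearly in a small Python sketch. It uses a simple alternating scheme (assign points to the nearest medoid, then re-pick each cluster's medoid) on one-dimensional data with absolute distance. The function names, the explicit initial medoids, and the choice of this particular iteration scheme are assumptions for illustration. PAL's K-Medoids implementation is not described here and may use a different procedure.

```python
def medoid(points, distance):
    """The member of points with the smallest total distance to all others."""
    return min(points, key=lambda c: sum(distance(c, p) for p in points))


def k_medoids(points, initial_medoids, distance, max_iterations=100):
    """Alternate between assignment and medoid update until nothing changes."""
    medoids = list(initial_medoids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest medoid's cluster.
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)),
                          key=lambda j: distance(p, medoids[j]))
            clusters[nearest].append(p)
        # Update step: the new center must be an actual data point.
        updated = [medoid(c, distance) if c else m
                   for c, m in zip(clusters, medoids)]
        if updated == medoids:
            break
        medoids = updated
    return medoids, clusters


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 20.0, 21.0, 22.0, 60.0]  # 60.0 is an outlier
    centers, groups = k_medoids(data, [1.0, 20.0], lambda a, b: abs(a - b))
    print(centers)
    print(sum(groups[1]) / len(groups[1]))  # the mean is pulled toward 60.0
```

In this data set the second cluster's medoid stays among the bulk of the points, while its mean is dragged toward the outlier, which illustrates the robustness claim above.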

Enhancements

Enhancements (1 of 2)
Logistic regression
• Support cancellation at runtime.
• Support multinomial classification. Many business scenarios require a classifier with more than two classes. Multi-class logistic regression (also referred to as multinomial logistic regression) extends the binary logistic regression algorithm (two classes) to multi-class cases. The input and output of multi-class logistic regression are similar to those of logistic regression.
K-Means
• Determine the best k within a given range according to the slight silhouette.
Apriori
• Add a prefix tree implementation for potential performance improvements in memory consumption and run time.
• Add a rule filter to restrict specified items to the left-hand or right-hand side of the association rules.
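To show how multinomial logistic regression extends the binary case mentioned under Logistic regression above, the following Python sketch turns per-class linear scores into probabilities with the softmax function and checks that two classes reduce to the familiar sigmoid. Only the prediction step is sketched. The function names are hypothetical, and nothing here reflects PAL's training procedure or interface.

```python
import math


def softmax(scores):
    """Convert one linear score per class into class probabilities."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Binary logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Three classes: the predicted class is the one with the highest probability.
    probabilities = softmax([2.0, 0.5, -1.0])
    print(probabilities, probabilities.index(max(probabilities)))
    # Two classes with scores (z, 0) give the binary logistic probability.
    print(softmax([1.2, 0.0])[0], sigmoid(1.2))
```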

Enhancements (2 of 2)
Forecast smoothing
• Auto-detect the best model among the single, double, and triple exponential smoothing models.
Hierarchical clustering
• Support categorical attributes as input features.
Univariate statistics
• Support population variance and standard deviation.
• Calculate the lower and upper quartiles of the data.
Decision tree
• Treat missing values as a separate class, rather than only replacing NULL values.
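The univariate statistics items above distinguish population from sample statistics. A short Python sketch with the standard `statistics` module (Python 3.8 or later for `quantiles`) shows the difference. Quartile definitions vary between tools, so the quartile values printed here follow Python's default "exclusive" method and are not necessarily the values PAL would return.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population statistics divide by n; sample statistics divide by n - 1.
population_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)
population_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)

# Three cut points: lower quartile, median, upper quartile.
lower_quartile, median, upper_quartile = statistics.quantiles(data, n=4)

if __name__ == "__main__":
    print(population_variance, sample_variance)
    print(population_sd, sample_sd)
    print(lower_quartile, median, upper_quartile)
```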

Disclaimer
This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP.
SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice.
This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.

Documentation

Important Note
SAP Note 2022080 was created to address the missing EXECUTE privilege required to call AFL_WRAPPER_GENERATOR and AFL_WRAPPER_ERASER during the HANA SPS 08 upgrade.

How to find SAP HANA documentation on this topic?
SAP HANA Platform SPS
• What’s New – Release Notes
• Installation – SAP HANA Server Installation Guide
• Administration – SAP HANA Administration Guide
• Development – SAP HANA Predictive Analysis Library (PAL) Reference, SAP HANA Developer Guide
• References – SAP HANA SQL Reference
In addition to this learning material, you can find SAP HANA documentation in the SAP Help Portal knowledge center at http://help.sap.com/hana_platform.
The knowledge center is structured according to the product lifecycle: installation, security, administration, and development. For example, the SAP HANA Predictive Analysis Library (PAL) Reference is in the Development section.

Thank you
Contact information
Mark Hourani
SAP HANA Product Management
[email protected]
To get the best overview of what’s new in SAP HANA SPS 08, read this blog.

© 2014 SAP AG or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG or an SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP AG or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP AG or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP AG or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
In particular, SAP AG or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP AG’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP AG or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.