DMDW Lesson 08 - Further Data Mining Algorithms

State-recognized university of applied sciences: study and take off.
Author I: Dipl.-Inf. (FH) Johannes Hoppe
Author II: M.Sc. Johannes Hofmeister
Author III: Prof. Dr. Dieter Homeister
Date: 13.05.2011

Upload: johannes-hoppe

Post on 11-May-2015


Page 1: DMDW Lesson 08 - Further Data Mining Algorithms

Page 2: DMDW Lesson 08 - Further Data Mining Algorithms

Further Data Mining Algorithms

Page 3: DMDW Lesson 08 - Further Data Mining Algorithms

01 Data Mining Algorithms - Regression Analysis

Page 4: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression Analysis

Regression Analysis

› Also known as function approximation
› Includes any technique for modeling and analyzing several variables
› Models the relationship between one or more variables you are trying to predict (dependent variables) and the predictive variables (independent variables)

Page 5: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression Analysis

SSAS built-in

› MS Linear Regression Analysis
› MS Logistic Regression Analysis
› MS Time Series Algorithm

http://msdn.microsoft.com/en-us/library/ms170993(SQL.90).aspx

Page 6: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Linear Regression

Linear Regression

› Analyzes two continuous columns
› The relationship is expressed as an equation
› The equation is a line (linear equation): f(x) = m*x + b
› Error = distance of a data point from the regression line

http://msdn.microsoft.com/en-us/library/ms174824(SQL.90).aspx
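
The line f(x) = m*x + b can be fitted with ordinary least squares. The following is a minimal sketch in plain Python, not the SSAS implementation. It assumes that "distance from the regression line" means the vertical distance, as in ordinary least squares. The function names and the sample data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line f(x) = m*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


def residuals(xs, ys, m, b):
    """Vertical distance of each point from the line (the per-point error)."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical data: four points that lie exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
m, b = fit_line(xs, ys)
errors = residuals(xs, ys, m, b)
```

Because the sample points lie exactly on a line, the fit recovers that line and every error is zero. Real data leaves nonzero errors, and least squares picks the m and b that minimize the sum of their squares.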

Page 7: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Linear Regression

Example

[Figure: scatter plot of Sales (y-axis, 0 to 6000; x-axis, 0 to 600) with a linear trend line: f(x) = 7.81381138497918 x + 866.585289444156, R² = 0.701037764746929]

Page 8: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Linear Regression

Explanation

The diagram shows the relationship between sales and advertising, along with the regression equation. The goal is to predict sales from the amount spent on advertising. The graph shows a largely linear relationship between sales and advertising. A key measure of the strength of the relationship is the R-square, which measures how much of the overall variation in the data is explained by the model. This regression analysis results in an R-square of 70%, which implies that 70% of the variation in sales can be explained by the variation in advertising.
[Source: Olivia Parr Rud et al., Data Mining Cookbook]
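
The R-square described above can be computed as 1 minus the ratio of unexplained variation to total variation. The sketch below assumes that standard definition. The data and the fitted line f(x) = 0.7x + 2.0 are hypothetical and are not the sales figures from the slide.

```python
def r_squared(ys, predictions):
    """Share of the variation in ys that the predictions explain."""
    y_bar = sum(ys) / len(ys)
    # Variation left unexplained by the model (squared residuals).
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    # Total variation of the observed values around their mean.
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations and the predictions of a least-squares
# line f(x) = 0.7x + 2.0 fitted to them.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 4]
predictions = [0.7 * x + 2.0 for x in xs]
r2 = r_squared(ys, predictions)  # roughly 0.52 for this data
```

An R-square near 1 means the line accounts for almost all of the variation. The 70% on the slide sits between that and the roughly 52% of this toy data.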

Page 9: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Logistic Regression

Logistic regression

› Dependent variables have values between 0 and 1
› The function describes the probability of a given event
› Instead of creating a straight line, logistic regression analysis creates an "S"-shaped curve that contains maximum and minimum constraints
› Wikipedia algorithm != MSDN algorithm

http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx
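
The "S"-shaped curve is usually written as the logistic function 1 / (1 + e^(-z)). The sketch below shows that textbook form only. It is not the algorithm documented on MSDN, and the names `weight` and `bias` are hypothetical parameters chosen for illustration.

```python
import math


def logistic(z):
    """Textbook logistic function: maps any real z into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, weight, bias):
    """Probability of the event for input x, given hypothetical parameters."""
    return logistic(weight * x + bias)


# The curve flattens toward 0 and 1 and passes through 0.5 at z = 0.
low, mid, high = logistic(-10), logistic(0), logistic(10)
```

The output never leaves the range between 0 and 1, which is what allows it to be read as a probability.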

Page 10: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Logistic Regression

Logistic regression

Page 11: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Time-Series

MS Time-Series Algorithm

› Trend analysis
› Optimized for analyzing continuous values
  › e.g. product sales over time
› Train, then predict
› Cross-predictions possible! *

* cool!
http://msdn.microsoft.com/en-us/library/ms174923(SQL.90).aspx

Page 12: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Time-Series

MS Time-Series Algorithm

Page 13: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Regression / Time-Series

› Combination of two algorithms; their results are mixed
› ARTxp
  › Autoregressive Tree method
  › Developed by Microsoft Research
  › Based on Microsoft Decision Trees
  › For short-term predictions
› ARIMA
  › Autoregressive Integrated Moving Average
  › Developed by Box and Jenkins
  › For long-term predictions

http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/bb677216.aspx
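
Both ARTxp and ARIMA build on the autoregressive idea: the next value is predicted from earlier values of the same series. The sketch below fits only a plain first-order autoregressive model, y_t = c + phi * y_(t-1), by least squares. It is neither ARTxp nor ARIMA, and it does not show how the Microsoft algorithm mixes their results. The function names and the series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of c and phi in y_t = c + phi * y_(t-1)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    p_bar = sum(prev) / n
    c_bar = sum(curr) / n
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(prev, curr))
    sxx = sum((p - p_bar) ** 2 for p in prev)
    phi = sxy / sxx
    c = c_bar - phi * p_bar
    return c, phi


def forecast(series, c, phi, steps):
    """Predict the next values by feeding each prediction back in."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Hypothetical series generated by y_t = 10 + 0.5 * y_(t-1), starting at 4.
series = [4.0, 12.0, 16.0, 18.0, 19.0, 19.5]
c, phi = fit_ar1(series)
next_values = forecast(series, c, phi, steps=2)
```

Each forecast step reuses the previous prediction, so errors accumulate as the horizon grows. This may be one reason the slide assigns different methods to short-term and long-term predictions.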

Page 14: DMDW Lesson 08 - Further Data Mining Algorithms

02 Data Mining Algorithms - Neural Networks

Page 15: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Neural Networks

Page 16: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Neural Networks

Neural Networks (NN or ANN)

› Better term: artificial neural networks (ANN), as opposed to biological neural networks
› Sometimes called neuronal networks
› By the way…
http://code.google.com/p/clustered-neuronal-network/wiki/ProjektInfos

Page 17: DMDW Lesson 08 - Further Data Mining Algorithms

Page 18: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Neural Networks

Definition

› A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use.
› It resembles the brain in two respects:
  › Knowledge is acquired by the network through a learning process.
  › Interneuron connection strengths known as synaptic weights are used to store the knowledge.

[Source: Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan. ]

Page 19: DMDW Lesson 08 - Further Data Mining Algorithms

DM Algorithms - Neural Networks

› Most NNs are composed of several layers of neurons
› The direction of most connections is from input to output
› Often used: backpropagation networks
› A single neuron has several inputs with individual weights and one output
› In the basic form, the output is activated if the sum of inputs*weights exceeds a given threshold
› Learning is done with a target value at an additional training input plus a training mode signal.
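
The basic neuron described above can be sketched in a few lines of Python. The learning step uses the classic perceptron rule for a single neuron. That rule is a deliberately simple stand-in, not backpropagation, and the "training mode signal" from the slide is not modeled. All names, the learning rate, and the AND example are assumptions made for illustration.

```python
def neuron_output(inputs, weights, threshold):
    """Fire (1) if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def train(samples, weights, threshold, rate=1, epochs=20):
    """Perceptron rule: nudge weights and threshold toward each target value."""
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron_output(inputs, weights, threshold)
            # Strengthen or weaken only the connections that were active.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            # Lower the threshold when the neuron should have fired, raise it otherwise.
            threshold -= rate * error
    return weights, threshold


# Hypothetical training set: the logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train(and_samples, weights=[0, 0], threshold=0)
outputs = [neuron_output(inputs, weights, threshold) for inputs, _ in and_samples]
```

After training, the knowledge of the AND function is stored entirely in the two weights and the threshold, which matches the role the definition assigns to synaptic weights. A single neuron of this kind can only learn linearly separable functions. Multi-layer networks trained with backpropagation lift that restriction.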

Page 20: DMDW Lesson 08 - Further Data Mining Algorithms

THANK YOU FOR YOUR ATTENTION