Chapter 8: Introduction to Pattern Discovery


Upload: nigel-horn

Post on 02-Jan-2016

Page 1: Chapter 8: Introduction to Pattern Discovery

1

Chapter 8: Introduction to Pattern Discovery

8.1 Introduction

8.2 Cluster Analysis

8.3 Market Basket Analysis (Self-Study)

Page 2: Chapter 8: Introduction to Pattern Discovery

2

Chapter 8: Introduction to Pattern Discovery

8.1 Introduction

8.2 Cluster Analysis

8.3 Market Basket Analysis (Self-Study)

Page 3: Chapter 8: Introduction to Pattern Discovery

3

Pattern Discovery

The Essence of Data Mining?

“…the discovery of interesting, unexpected, or valuable structures in large data sets.”

– David Hand

...

Page 4: Chapter 8: Introduction to Pattern Discovery

4

Pattern Discovery

The Essence of Data Mining?

“…the discovery of interesting, unexpected, or valuable structures in large data sets.”

– David Hand

“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.”

– Herb Edelstein

Page 5: Chapter 8: Introduction to Pattern Discovery

5

Pattern Discovery Caution

Poor data quality

Opportunity

Interventions

Separability

Obviousness

Non-stationarity

Page 6: Chapter 8: Introduction to Pattern Discovery

6

Pattern Discovery Applications

Data reduction

Novelty detection

Profiling

Market basket analysis

Sequence analysis

Page 7: Chapter 8: Introduction to Pattern Discovery

7

Pattern Discovery Tools

Data reduction

Novelty detection

Profiling

Market basket analysis

Sequence analysis

Page 9: Chapter 8: Introduction to Pattern Discovery

9

Chapter 8: Introduction to Pattern Discovery

8.1 Introduction

8.2 Cluster Analysis

8.3 Market Basket Analysis (Self-Study)

Page 10: Chapter 8: Introduction to Pattern Discovery

10

Unsupervised Classification

Unsupervised classification: grouping of cases based on similarities in input values.

[Figure: inputs are grouped into cluster 1, cluster 2, and cluster 3.]

Page 12: Chapter 8: Introduction to Pattern Discovery

12

k-means Clustering Algorithm

1. Select inputs.

2. Select k cluster centers.

3. Assign cases to closest center.

4. Update cluster centers.

5. Re-assign cases.

6. Repeat steps 4 and 5 until convergence.
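The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular tool: the function name `kmeans`, the random choice of starting centers, the Euclidean distance, and the sample data are all assumptions made for the example.

```python
import math
import random


def kmeans(cases, k, max_iter=100, seed=0):
    """Cluster `cases` (a list of equal-length numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Step 2: pick k distinct cases as the initial cluster centers (an assumption;
    # the slides do not say how the centers are selected).
    centers = rng.sample(cases, k)
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every case to its closest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(case, centers[j]))
            for case in cases
        ]
        # Step 6: stop when no case changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of the cases assigned to it.
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Step 1: two hypothetical inputs per case, forming two visibly separate groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = kmeans(data, k=2)
print(labels)
```

The result can depend on the starting centers, so in practice the algorithm is often run from several starting points.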

Page 27: Chapter 8: Introduction to Pattern Discovery

27

Segmentation Analysis

When no clusters exist, use the k-means algorithm to partition cases into contiguous groups.
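As a rough illustration of this point, the sketch below runs k-means on one input whose values are evenly spread, so there are no natural clusters. The function name `segment_1d`, the evenly spaced starting centers, and the data are assumptions made for the example.

```python
def segment_1d(values, k, max_iter=100):
    """Partition 1-D values into k contiguous segments with k-means."""
    values = sorted(values)
    # Assumption: spread the initial centers evenly across the observed range.
    lo, hi = values[0], values[-1]
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each value to the nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the mean of its segment.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return values, labels


# Hypothetical input with no natural clusters: the integers 0 through 29.
values, labels = segment_1d(range(30), k=3)
print(labels)
```

Each segment is an unbroken range of values, which is the sense in which the groups are contiguous.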

Page 29: Chapter 8: Introduction to Pattern Discovery

29

8.01 Multiple Choice Poll

For a k-means clustering analysis, which of the following statements is true about input variables?

a. Input variables should be limited in number and be relatively independent.

b. Input variables should be of interval measurement level.

c. Input variables should have distributions that are somewhat symmetric.

d. Input variables should be meaningful to analysis objectives.

e. All of the above.

Page 30: Chapter 8: Introduction to Pattern Discovery

30

8.01 Multiple Choice Poll – Correct Answer

For a k-means clustering analysis, which of the following statements is true about input variables?

a. Input variables should be limited in number and be relatively independent.

b. Input variables should be of interval measurement level.

c. Input variables should have distributions that are somewhat symmetric.

d. Input variables should be meaningful to analysis objectives.

e. All of the above.

Page 31: Chapter 8: Introduction to Pattern Discovery

31

Demographic Segmentation Demonstration

Analysis goal:

Group geographic regions into segments based on income, household size, and population density.

Analysis plan:

Select and transform segmentation inputs.

Select the number of segments to create.

Create segments with the Cluster tool.

Interpret the segments.
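The slide does not say which transformation is applied to the segmentation inputs. One common choice, shown here purely as an assumption, is to standardize each input so that an input measured in large units (such as income) does not dominate the distance calculation. The region values below are hypothetical.

```python
import statistics


def standardize(column):
    """Rescale one input to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    # A constant input carries no distance information; map it to zeros.
    if sd == 0:
        return [0.0 for _ in column]
    return [(x - mean) / sd for x in column]


# Hypothetical regions: (median income, household size, population density).
regions = [
    (42000.0, 2.4, 1200.0),
    (58000.0, 3.1, 300.0),
    (35000.0, 2.0, 5400.0),
    (61000.0, 2.9, 150.0),
]
# Standardize column by column, then turn the columns back into rows.
columns = [standardize(col) for col in zip(*regions)]
scaled = list(zip(*columns))
print(scaled[0])
```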

Page 32: Chapter 8: Introduction to Pattern Discovery

32

Segmenting Census Data

This demonstration introduces SAS Enterprise Miner tools and techniques for cluster and segmentation analysis.

Page 33: Chapter 8: Introduction to Pattern Discovery

33

Exploring and Filtering Analysis Data

This demonstration introduces SAS Enterprise Miner tools and techniques that explore and filter analysis data, particularly data source exploration and case filtering.

Page 34: Chapter 8: Introduction to Pattern Discovery

34

Setting Cluster Tool Options

This demonstration illustrates how to use the Cluster tool to segment the cases in the CENSUS2000 data set.

Page 35: Chapter 8: Introduction to Pattern Discovery

35

Creating Clusters with the Cluster Tool

This demonstration illustrates how the Cluster tool determines the number of clusters in the data.

Page 36: Chapter 8: Introduction to Pattern Discovery

36

Specifying the Segment Count

This demonstration illustrates how you can change the number of clusters created by the Cluster node.

Page 37: Chapter 8: Introduction to Pattern Discovery

37

Exploring Segments

This demonstration illustrates how to use graphical aids to explore the segments.

Page 38: Chapter 8: Introduction to Pattern Discovery

38

Profiling Segments

This demonstration illustrates using the Segment Profile tool to interpret the composition of clusters.

Page 39: Chapter 8: Introduction to Pattern Discovery

39

Exercises

This exercise reinforces the concepts discussed previously.

Page 40: Chapter 8: Introduction to Pattern Discovery

40

Chapter 8: Introduction to Pattern Discovery

8.1 Introduction

8.2 Cluster Analysis

8.3 Market Basket Analysis (Self-Study)

Page 41: Chapter 8: Introduction to Pattern Discovery

41

Market Basket Analysis

Transactions:

A B C
A C D
B C D
A D E
B C E

Rule          Support   Confidence
A ⇒ D         2/5       2/3
C ⇒ A         2/5       2/4
A ⇒ C         2/5       2/3
B & C ⇒ D     1/5       1/3
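The support and confidence figures on this slide can be checked with a short calculation. The sketch assumes that the fifteen letters form five transactions of three items each (A B C, A C D, B C D, A D E, B C E) and that the four rules are A ⇒ D, C ⇒ A, A ⇒ C, and B & C ⇒ D; this reading reproduces the slide's numbers. The function names are illustrative.

```python
from fractions import Fraction

# Assumed reading of the slide: five transactions of three items each.
transactions = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]


def support(lhs, rhs):
    """Share of all transactions that contain every item in lhs and rhs."""
    both = set(lhs) | set(rhs)
    hits = sum(1 for t in transactions if both <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs):
    """Share of transactions containing lhs that also contain rhs."""
    lhs = set(lhs)
    lhs_hits = sum(1 for t in transactions if lhs <= t)
    both_hits = sum(1 for t in transactions if (lhs | set(rhs)) <= t)
    # Raises ZeroDivisionError if lhs never occurs in the data.
    return Fraction(both_hits, lhs_hits)


# Each rule is (left-hand items, right-hand items); "BC" means B & C.
rules = [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]
for lhs, rhs in rules:
    # Fraction reduces automatically, so 2/4 is displayed as 1/2.
    print(" & ".join(lhs), "=>", rhs, support(lhs, rhs), confidence(lhs, rhs))
```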

...

Page 43: Chapter 8: Introduction to Pattern Discovery

43

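Implication?

[Table: Checking Account (No, Yes) by Savings Account (No, Yes). Surviving totals: Savings Account No = 4,000; Savings Account Yes = 6,000; all customers = 10,000.]

Support(SVG ⇒ CK) = 50%

Confidence(SVG ⇒ CK) = 83%

Expected Confidence(SVG ⇒ CK) = 85%

Lift(SVG ⇒ CK) = 0.83/0.85 < 1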
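The arithmetic behind these figures can be sketched as follows. Only the totals 4,000, 6,000, and 10,000 survive from the slide's table; the counts of 5,000 customers with both accounts and 8,500 customers with a checking account are not read from the slide but are the values implied by the stated 50% support and 85% expected confidence.

```python
total = 10_000     # all customers (from the slide)
savings = 6_000    # customers with a savings account (from the slide)
both = 5_000       # assumption: implied by Support = 50% of 10,000
checking = 8_500   # assumption: implied by Expected Confidence = 85% of 10,000

# Support: share of all customers holding both accounts.
support = both / total
# Confidence: share of savings customers who also hold a checking account.
confidence = both / savings
# Expected confidence: share of all customers holding a checking account.
expected_confidence = checking / total
# Lift: confidence relative to expected confidence.
lift = confidence / expected_confidence

print(f"support={support:.0%} confidence={confidence:.0%} "
      f"expected={expected_confidence:.0%} lift={lift:.2f}")
```

A lift below 1 means that savings customers are slightly less likely than customers overall to hold a checking account, even though the rule's confidence looks high.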

Page 44: Chapter 8: Introduction to Pattern Discovery

44

Barbie Doll ⇒ Candy

1. Put them closer together in the store.

2. Put them far apart in the store.

3. Package candy bars with the dolls.

4. Package Barbie + candy + poorly selling item.

5. Raise the price on one, and lower it on the other.

6. Offer Barbie accessories for proofs of purchase.

7. Do not advertise candy and Barbie together.

8. Offer candies in the shape of a Barbie doll.

Page 45: Chapter 8: Introduction to Pattern Discovery

45

Data Capacity

A A B C D A

D A A B BA

Page 46: Chapter 8: Introduction to Pattern Discovery

46

Association Tool Demonstration

Analysis goal:

Explore associations between retail banking services used by customers.

Analysis plan:

Create an association data source.

Run an association analysis.

Interpret the association rules.

Run a sequence analysis.

Interpret the sequence rules.

Page 47: Chapter 8: Introduction to Pattern Discovery

47

Market Basket Analysis

This demonstration illustrates how to conduct market basket analysis.

Page 48: Chapter 8: Introduction to Pattern Discovery

48

Sequence Analysis

This demonstration illustrates how to conduct a sequence analysis.

Page 49: Chapter 8: Introduction to Pattern Discovery

49

Exercise

This exercise reinforces the concepts discussed previously.

Page 50: Chapter 8: Introduction to Pattern Discovery

50

Pattern Discovery Tools: Review

Generate cluster models using automatic settings and segmentation models with user-defined settings.

Compare within-segment distributions of selected inputs to overall distributions. This helps you understand segment definition.

Conduct market basket and sequence analysis on transactions data. The data source must have one target, one ID, and (if desired) one sequence variable.
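To make the data layout concrete, the sketch below shows one way transaction rows with an ID, an optional sequence variable, and a target (the item) could be grouped into one ordered basket per ID. The rows, the service codes, and the function name `build_baskets` are hypothetical and are not taken from the course data.

```python
from collections import defaultdict

# Hypothetical transaction rows: (ID, sequence, target).
rows = [
    (1, 1, "CK"),
    (1, 2, "SVG"),
    (2, 1, "CK"),
    (1, 3, "ATM"),
    (2, 2, "ATM"),
]


def build_baskets(rows):
    """Group targets by ID and order them by the sequence variable."""
    by_id = defaultdict(list)
    for case_id, seq, item in rows:
        by_id[case_id].append((seq, item))
    # Sorting the (sequence, item) pairs puts each basket in sequence order.
    return {
        case_id: [item for _, item in sorted(pairs)]
        for case_id, pairs in by_id.items()
    }


baskets = build_baskets(rows)
print(baskets)
```

Market basket analysis needs only the set of items per ID; sequence analysis also uses their order.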