Chapter 8: Introduction to Pattern Discovery

8.1 Introduction

8.2 Cluster Analysis

8.3 Market Basket Analysis (Self-Study)

8.1 Introduction

Pattern Discovery

The Essence of Data Mining?

“…the discovery of interesting, unexpected, or valuable structures in large data sets.”

– David Hand

“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.”

– Herb Edelstein

Pattern Discovery Caution

Poor data quality

Opportunity

Interventions

Separability

Obviousness

Non-stationarity

Pattern Discovery Applications

Data reduction

Novelty detection

Profiling

Market basket analysis

Sequence analysis

Pattern Discovery Tools

Data reduction

Novelty detection

Profiling

Sequence analysis

8.2 Cluster Analysis

Unsupervised Classification

Unsupervised classification: grouping of cases based on similarities in input values.

[Figure: cases plotted by their inputs and grouped into cluster 1, cluster 2, and cluster 3]

k-means Clustering Algorithm

1. Select inputs.

2. Select k cluster centers.

3. Assign cases to closest center.

4. Update cluster centers.

5. Reassign cases.

6. Repeat steps 4 and 5 until convergence.
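
The six steps above can be sketched in plain Python. This is a minimal illustration, not the Cluster tool's implementation: the function names, the sample cases, and the choice of the first k cases as starting centers are all assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_center(point, centers):
    """Index of the center nearest to the point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(cases, k, max_iter=100):
    """Cluster cases (tuples of the selected inputs, step 1) into k groups."""
    # Step 2: select k cluster centers. Assumption for this sketch:
    # simply take the first k cases as the starting centers.
    centers = list(cases[:k])
    # Step 3: assign each case to its closest center.
    labels = [closest_center(c, centers) for c in cases]
    for _ in range(max_iter):  # Step 6: repeat steps 4 and 5.
        # Step 4: move each center to the mean of its assigned cases.
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
        # Step 5: reassign cases to the updated centers.
        new_labels = [closest_center(c, centers) for c in cases]
        if new_labels == labels:  # no case moved, so the run has converged
            break
        labels = new_labels
    return labels, centers


# Hypothetical training data with two inputs per case.
cases = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
         (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centers = k_means(cases, k=2)
print(labels)
print(centers)
```

Starting centers strongly influence the result, so practical tools usually choose them more carefully than this sketch does.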

Segmentation Analysis

When no clusters exist, use the k-means algorithm to partition cases into contiguous groups.
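
A small sketch of this idea, assuming a single input whose values are evenly spread so that no natural clusters exist. The function name `segment_1d`, the starting centers, and the data are hypothetical; the point is only that k-means still returns segments, each covering a contiguous range of values.

```python
def segment_1d(values, k, max_iter=100):
    """Partition one-dimensional values into k segments with k-means."""
    ordered = sorted(values)
    # Assumed starting centers: k values spread through the sorted data.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every value to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                      for v in values]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each center to the mean of its segment.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


# Evenly spaced values: there is no cluster structure to find.
values = [float(v) for v in range(1, 13)]
labels, centers = segment_1d(values, k=3)
print(labels)
print(centers)
```

Because the input values are already sorted here, contiguous segments show up as a label list that never decreases.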

8.01 Multiple Choice Poll

For a k-means clustering analysis, which of the following statements is true about input variables?

a. Input variables should be limited in number and be relatively independent.

b. Input variables should be of interval measurement level.

c. Input variables should have distributions that are somewhat symmetric.

d. Input variables should be meaningful to analysis objectives.

e. All of the above.

8.01 Multiple Choice Poll – Correct Answer

For a k-means clustering analysis, which of the following statements is true about input variables?

e. All of the above.

Demographic Segmentation Demonstration

Analysis goal:

Group geographic regions into segments based on income, household size, and population density.

Analysis plan:

Select and transform segmentation inputs.

Select the number of segments to create.

Create segments with the Cluster tool.

Interpret the segments.
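
The slides do not say which transformation the demonstration applies, so the following is only one common possibility: standardizing each input to mean 0 and standard deviation 1 so that an input measured on a large scale, such as income, does not dominate the distance calculation. The region values and the `standardize` helper are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical regions: (median income, household size, population density).
regions = [
    (42000.0, 2.4, 150.0),
    (68000.0, 3.1, 2300.0),
    (55000.0, 2.8, 900.0),
    (39000.0, 2.2, 60.0),
]


def standardize(rows):
    """Rescale every input column to mean 0 and standard deviation 1.

    Assumes no column is constant; a constant column has a standard
    deviation of 0 and would cause a division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stdevs))
            for row in rows]


scaled = standardize(regions)
for row in scaled:
    print(tuple(round(x, 2) for x in row))
```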

Segmenting Census Data

This demonstration introduces SAS Enterprise Miner tools and techniques for cluster and segmentation analysis.

Exploring and Filtering Analysis Data

This demonstration introduces SAS Enterprise Miner tools and techniques that explore and filter analysis data, particularly data source exploration and case filtering.

Setting Cluster Tool Options

This demonstration illustrates how to use the Cluster tool to segment the cases in the CENSUS2000 data set.

Creating Clusters with the Cluster Tool

This demonstration illustrates how the Cluster tool determines the number of clusters in the data.

Specifying the Segment Count

This demonstration illustrates how you can change the number of clusters created by the Cluster node.

Exploring Segments

This demonstration illustrates how to use graphical aids to explore the segments.

Profiling Segments

This demonstration illustrates using the Segment Profile tool to interpret the composition of clusters.
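
One simple way to read the composition of a segment is to compare its inputs with the overall population. The sketch below compares means only, which is a simplification of comparing full distributions, and it is not the Segment Profile tool's method. The cases, input names, and `profile` function are hypothetical.

```python
from statistics import mean

# Hypothetical cases: (segment id, income in thousands, household size).
cases = [
    (1, 40.0, 2.0), (1, 44.0, 2.2),
    (2, 70.0, 3.0), (2, 74.0, 3.4),
    (3, 52.0, 2.6), (3, 56.0, 2.8),
]
input_names = ["income", "household_size"]


def profile(cases, input_names):
    """Return, per segment, each input's segment mean minus the overall mean."""
    overall = [mean(c[i + 1] for c in cases) for i in range(len(input_names))]
    result = {}
    for seg in sorted({c[0] for c in cases}):
        members = [c for c in cases if c[0] == seg]
        result[seg] = {
            name: mean(m[i + 1] for m in members) - overall[i]
            for i, name in enumerate(input_names)
        }
    return result


deviations = profile(cases, input_names)
for seg, diffs in deviations.items():
    print(seg, {name: round(d, 2) for name, d in diffs.items()})
```

Inputs with large positive or negative differences are the ones that most distinguish a segment from the population as a whole.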

Exercises

This exercise reinforces the concepts discussed previously.

8.3 Market Basket Analysis (Self-Study)

Market Basket Analysis

Transactions:

A B C
A C D
B C D
A D E
B C E

Rule           Support    Confidence
A => D         2/5        2/3
C => A         2/5        2/4
A => C         2/5        2/3
B & C => D     1/5        1/3
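
The support and confidence figures can be checked against the five transactions with a short sketch, reading the rules as A => D, C => A, A => C, and B & C => D. Support is the share of all transactions that contain every item in the rule; confidence is the share of transactions containing the left-hand side that also contain the right-hand side. The function names are illustrative.

```python
from fractions import Fraction

# The five transactions from the slide, one set of items per basket.
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def support(lhs, rhs, transactions):
    """Fraction of all transactions containing every item in lhs and rhs."""
    items = set(lhs) | set(rhs)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions):
    """Among transactions containing lhs, the fraction also containing rhs.

    Assumes lhs occurs in at least one transaction.
    """
    lhs_items = set(lhs)
    lhs_count = sum(1 for t in transactions if lhs_items <= t)
    both_count = sum(1 for t in transactions if (lhs_items | set(rhs)) <= t)
    return Fraction(both_count, lhs_count)


rules = [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]
for lhs, rhs in rules:
    print(" & ".join(lhs), "=>", rhs,
          "support", support(lhs, rhs, transactions),
          "confidence", confidence(lhs, rhs, transactions))
```

`Fraction` reduces its results, so the confidence shown on the slide as 2/4 prints as 1/2.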

Implication?

[Table: Checking Account (No/Yes) crossed with Savings Account, 10,000 customers in total]

Support(SVG => CK) = 50%

Confidence(SVG => CK) = 83%

Expected Confidence(SVG => CK) = 85%

Lift(SVG => CK) = 0.83/0.85 < 1
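
The arithmetic behind these figures can be sketched as follows. Only the 10,000 total and the percentages appear in the source; the individual counts below are assumptions chosen to reproduce those percentages, and `rule_metrics` is a hypothetical helper.

```python
def rule_metrics(both, lhs_count, rhs_count, total):
    """Support, confidence, expected confidence, and lift for lhs => rhs."""
    support = both / total                    # share holding both products
    confidence = both / lhs_count             # share of lhs holders with rhs
    expected_confidence = rhs_count / total   # share of everyone with rhs
    lift = confidence / expected_confidence
    return support, confidence, expected_confidence, lift


# Assumed counts consistent with the slide's percentages.
total = 10_000      # stated on the slide
both = 5_000        # assumed: savings and checking, from support = 50%
savings = 6_000     # assumed: gives confidence of about 83%
checking = 8_500    # assumed: from expected confidence = 85%

support, confidence, expected_confidence, lift = rule_metrics(
    both, savings, checking, total)
print(f"support={support:.2f} confidence={confidence:.2f} "
      f"expected={expected_confidence:.2f} lift={lift:.2f}")
```

A lift below 1 means that customers with a savings account are slightly less likely to hold a checking account than customers overall, so the high confidence on its own does not indicate a positive association.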

Barbie Doll => Candy

1. Put them closer together in the store.

2. Put them far apart in the store.

3. Package candy bars with the dolls.

4. Package Barbie + candy + poorly selling item.

5. Raise the price on one, and lower it on the other.

6. Offer Barbie accessories for proofs of purchase.

7. Do not advertise candy and Barbie together.

8. Offer candies in the shape of a Barbie doll.

Data Capacity

[Figure: item labels A, B, C, and D]

Association Tool Demonstration

Analysis goal:

Explore associations between retail banking services used by customers.

Analysis plan:

Create an association data source.

Run an association analysis.

Interpret the association rules.

Run a sequence analysis.

Interpret the sequence rules.

This demonstration illustrates how to conduct market basket analysis.

Sequence Analysis

This demonstration illustrates how to conduct a sequence analysis.

Exercise

This exercise reinforces the concepts discussed previously.

Pattern Discovery Tools: Review

Generate cluster models using automatic settings and segmentation models with user-defined settings.

Compare within-segment distributions of selected inputs to overall distributions. This helps you understand segment definition.

Conduct market basket and sequence analysis on transactions data. The data source must have one target, one ID, and (if desired) one sequence variable.
