Dexin Zhou – Bard College (Presenter) Ralph Abbey – North Carolina State U. Jeremy Diepenbrock – Washington U. at St. Louis Project Advisor: Dr. Carl Meyer Additional Advising: Dr. Amy Langville Graduate Assistant: Shaina Race

Post on 01-Jan-2016

Page 1

Dexin Zhou – Bard College (Presenter)
Ralph Abbey – North Carolina State U.
Jeremy Diepenbrock – Washington U. at St. Louis

Project Advisor: Dr. Carl Meyer
Additional Advising: Dr. Amy Langville
Graduate Assistant: Shaina Race

Page 2

What is Data Clustering?

• Clustering is the partitioning of a data set into subsets (clusters).
• We are interested in creating good clusters that allow us to reorganize disordered data into a block structure so that useful information can be extracted.

Page 3

A Visible Example

[Figure, two panels: Before Clustering, After Clustering]

Page 4

What are we clustering?

• A set of 86 mini-documents that we created, with 13 topics
• A set of 185 documents used in Daniel Boley’s paper, with 10 topics
• The SAS grocery store dataset

Page 5

Preparing the data

• Each entry A_ij has the following form
• The g term is a function of term i; it downplays the terms that appear frequently globally
• The l term is a function of the raw frequency of a certain term in document j (e.g., log)
• The d term is a normalization factor
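The weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the three factors are multiplied (A_ij = l_ij · g_i · d_j), the local weight is a log, the global weight is an inverse document frequency, and the normalization scales each document to unit length. The product form, these specific choices for g and d, and the name `weight_matrix` are assumptions rather than something the slides state.

```python
import math

def weight_matrix(counts):
    """counts[i][j] is the raw frequency of term i in document j."""
    n_terms, n_docs = len(counts), len(counts[0])
    # Local weight l_ij: log-dampened raw frequency.
    local = [[math.log(1 + f) for f in row] for row in counts]
    # Global weight g_i (assumed IDF): small for terms found in many documents.
    glob = []
    for row in counts:
        df = sum(1 for f in row if f > 0)
        glob.append(math.log(n_docs / df) if df else 0.0)
    weighted = [[local[i][j] * glob[i] for j in range(n_docs)]
                for i in range(n_terms)]
    # Normalization d_j (assumed): scale each document column to unit length.
    for j in range(n_docs):
        norm = math.sqrt(sum(weighted[i][j] ** 2 for i in range(n_terms)))
        if norm > 0:
            for i in range(n_terms):
                weighted[i][j] /= norm
    return weighted

counts = [
    [2, 1, 3],  # appears in every document, so its global weight is 0
    [4, 0, 0],  # specific to document 0
    [0, 0, 1],  # specific to document 2
]
A = weight_matrix(counts)
```

In this toy input the first term occurs in all three documents, so its row is weighted down to zero, while the two document-specific terms carry all the weight of their columns.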

Page 6

How?

• Principal Direction Divisive Partitioning
• Principal Direction Gap Partitioning
• Non-Negative Matrix Factorization
• Clustering Aggregation

Page 7

Singular Value Decomposition

Page 8

Principal Direction Divisive Partitioning

Page 9

PDDP

Page 10

PDDP
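A single PDDP split might look like the sketch below. It assumes a term-by-document matrix whose columns are documents, centers the documents on their mean, and separates them by the sign of their projection onto the leading singular direction. The function name `pddp_split` and the toy matrix are illustrative assumptions, and the code uses NumPy's `np.linalg.svd`.

```python
import numpy as np

def pddp_split(A):
    """Split the documents (columns of A) once along the principal direction."""
    # Center the documents by subtracting the mean document (the centroid).
    C = A - A.mean(axis=1, keepdims=True)
    # U[:, 0] is the leading left singular vector: the principal direction.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # Projection of each centered document onto the principal direction.
    proj = U[:, 0] @ C
    # Documents on opposite sides of the centroid go to different clusters.
    return np.flatnonzero(proj <= 0), np.flatnonzero(proj > 0)

# Toy data: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
A = np.array([
    [1.0, 0.9, 0.0, 0.1],
    [0.8, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
])
left, right = pddp_split(A)
```

The sign of a singular vector is arbitrary, so which group is returned first can vary; only the grouping itself is meaningful. A full divisive scheme would apply this split repeatedly to the resulting clusters.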

Page 11

Principal Direction Gap Partitioning

[Figure, two panels: Plot of the First Right Singular Vector, Plot of the Second Right Singular Vector. x-axis: Sorted Indices; y-axis: Sorted Value]
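The slides do not spell out the partitioning rule. The sketch below assumes, based on the sorted-value plots, that the entries of a right singular vector are sorted and the documents are cut at the largest gap between consecutive values. That rule, the function name `gap_split`, and the parameter `k` are all assumptions made for illustration.

```python
import numpy as np

def gap_split(A, k=0):
    """Cut the documents (columns of A) at the largest gap in the sorted
    entries of the k-th right singular vector (assumed rule)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[k]
    order = np.argsort(v)            # document indices, sorted by value
    gaps = np.diff(v[order])         # gaps between consecutive sorted values
    cut = int(np.argmax(gaps)) + 1   # first position after the largest gap
    return order[:cut], order[cut:]

# Toy data: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
A = np.array([
    [1.0, 0.9, 0.0, 0.1],
    [0.8, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
])
first_a, first_b = gap_split(A, k=0)    # first right singular vector
second_a, second_b = gap_split(A, k=1)  # second right singular vector
```

Unlike a split at zero, a gap-based cut does not require the two sides to straddle the origin, so it can be applied to an uncentered matrix as done here.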

Page 12

A Comparison of PDGP w/ PDDP

Page 13

Centering Vs. Non-Centering

Page 14

Non-Negative Matrix Factorization

Page 15

NMF Clustering
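One common way to cluster with NMF is sketched below. It factors the term-by-document matrix as A ≈ WH with nonnegative W and H, then assigns each document to the row of H with the largest entry in its column. The multiplicative update rule, the iteration count, the random initialization, and the name `nmf_cluster` are assumptions; the slides do not say which NMF algorithm was used.

```python
import numpy as np

def nmf_cluster(A, k, iters=500, seed=0):
    """Factor A (terms x documents) as W @ H and label each document
    by its dominant row of H."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return H.argmax(axis=0), W, H

# Toy data: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
A = np.array([
    [1.0, 0.9, 0.0, 0.1],
    [0.8, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
])
labels, W, H = nmf_cluster(A, k=2)
```

The label numbers themselves depend on the random start; what matters is which documents share a label.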

Page 16

Cluster Aggregation

Page 17

Cluster Aggregation
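The slides give no detail on how the clusterings are combined. One widely used approach, shown here purely as an assumed illustration, is a co-association matrix: entry (i, j) is the fraction of the input clusterings that place documents i and j in the same cluster. The name `co_association` and the toy label lists are hypothetical.

```python
def co_association(clusterings):
    """clusterings is a list of label lists, one per clustering method.
    Returns M with M[i][j] = fraction of clusterings grouping i with j."""
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    r = len(clusterings)
    return [[c / r for c in row] for row in counts]

# Three clusterings of four documents; label names need not agree across methods.
clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
M = co_association(clusterings)
```

Here documents 0 and 1 are grouped together by all three clusterings, so M[0][1] is 1, while documents 2 and 3 agree in only two of the three. The matrix can then itself be clustered to obtain a consensus partition.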

Page 18

Metrics

• Entropy Method
  A standard measurement based on our prior knowledge of the data file.
• Density Method
  Does not require prior knowledge of the data file.
  Less accurate.
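An entropy score of the kind described above can be sketched as follows, assuming the usual definition: for each cluster, compute the entropy of the known topic labels inside it, then average over clusters weighted by cluster size. Lower is better, and 0 means every cluster contains a single topic. The exact formula used in the project is not given in the slides, so this definition and the name `clustering_entropy` are assumptions.

```python
import math
from collections import Counter

def clustering_entropy(true_topics, cluster_labels):
    """Size-weighted average of the topic entropy (in bits) of each cluster."""
    n = len(true_topics)
    clusters = {}
    for topic, label in zip(true_topics, cluster_labels):
        clusters.setdefault(label, []).append(topic)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        # Entropy of the topic distribution inside this cluster.
        h = -sum((c / size) * math.log2(c / size)
                 for c in Counter(members).values())
        total += (size / n) * h
    return total

topics = ["a", "a", "b", "b"]
perfect = clustering_entropy(topics, [0, 0, 1, 1])  # each cluster is pure
mixed = clustering_entropy(topics, [0, 1, 0, 1])    # each cluster is half a, half b
```

The score needs the true topic of every document, which is why it relies on prior knowledge of the data.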

Page 19

Mini-document dataset

Page 20

Mini-document dataset Result

Page 21

Boley’s J1 Dataset

Page 22

Boley’s Dataset Result

Page 23

SAS Grocery Dataset

Page 24

SAS Grocery Dataset Results

Page 25

SAS Grocery Dataset Result

Page 26

Conclusion

Page 27

For Additional Information

Please Visit

http://meyer.math.ncsu.edu/Meyer/REU/REU.html