june 11, 2005interface / csna meeting st. louis1 estimating the cluster tree of a density werner...

71
June 11, 2005 Interface / CSNA Meeting St. Louis 1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University of Washington

Upload: pranav-tarleton

Post on 31-Mar-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 1

Estimating the Cluster Treeof a Density

Werner StuetzleRebecca NugentDepartment of StatisticsUniversity of Washington

Page 2: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 2

Page 3: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 3

Page 4: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 4

Groups K-means with k = 2

Page 5: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 5

Page 6: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 6

• Detect that there are 5 or 6 distinct groups.

• Assign group labels to observations.

40 45 50 55

74

76

78

80

82

84

Page 7: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 7

Page 8: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 8

Page 9: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 9

-2 0 2 4 6 8 100

10

02

00

30

0

Feature histogram

Page 10: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 10

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

(a) (b) (c)

(d) (e) (f)

Page 11: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 11

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

(a) (b) (c)

(d) (e) (f)

Page 12: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 12

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 13: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 13

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 14: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 14

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 15: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 15

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 16: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 16

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 17: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 17

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 18: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 18

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 19: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 19

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 20: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 20

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2

Page 21: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 21

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 22: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 22

Note large tree fragments in bottom left panel

-2 0 2 4 6

-3-2

-10

12

3

Weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST for weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .bimodal2[runts.bimodal2 > 1]

Histogram of runt sizes

Page 23: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 23

-2 -1 0 1 2 3

-3-2

-10

12

3

Unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST for unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .unimodal[runts.unimodal > 1]

Histogram of runt sizes

Page 24: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 24

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 25: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 25

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 26: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 26

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 27: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 27

leve

l

10

12

14

16

A 8

A 7 A 9

A 8

A 2

A 3A 3

A 5A 6

Page 28: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 28

Page 29: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 29

cc1

cc2

-4 -2 0 2 4 6

-4-2

02

4

11

1

111

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

22

22

22

2

2

2

2

2

2

2

2

2222

2 2

2 22

2

2

2

2

222

2

2

2

2222

2

2

22

2

2

2

2 22 22

22

2

2

2

2

2

33

3

3

33

3

3

3

3

33

3

3

3

3

3

33

33

3

3

3

3

3

3

3

33

33 3

3

33

3

3

3

3

33

333

33

3

3

3

3

3

3 33

3

33

3

33

3 3

3

3

3

3333

3 33

3

3

3

3

33

3

33

3

33

33

3

3

3

33

3 33

3

3

333

33

33

3

33

3

3

3

3

3

3

3 33

3

33

3

3

3

3

33

3

3

333

3

3

3

3 33

3

3

3

33

33

33 3

33

3

33

3

3

3

3

33

33

33

33

3

3

3

3

33

3

33

3

3

33

33

3

33

3

3

3

33

33

3 3 3

3

333

3

3

3

3

3

3 3

3

3

3

3

4

44

4

4

44

4

4

4

4

44

44

4

4

4

4

4

4

4

44

4

4

44

4

44

4

44

4 4

Areas 1-4, projected on crimcords

Page 30: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 30

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 31: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 31

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 32: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 32

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 33: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 33

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 34: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 34

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 35: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 35

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 36: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 36

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 37: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 37

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 38: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 38

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 39: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 39

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 40: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 40

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 41: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 41

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 42: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 42

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 43: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 43

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 44: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 44

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 45: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 45

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 46: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 46

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 47: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 47

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 48: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 48

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2

Page 49: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 49

Runt Pruning algorithm:

generate_cluster_tree_node (mst, runt_size_threshold) { node = new_cluster_tree_node node.leftson = node.rightson = NULL node.obs = leaves (mst) cut_edge = longest_edge_with_large_runt_size (mst, runt_size_threshold) if (cut_edge) { node.leftson = generate_cluster_tree_node (left_subtree (cut_edge), runt_size_threshold) node.rightson = generate_cluster_tree_node (right_subtree (cut_edge), runt_size_threshold) } return (node)}

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2

Page 50: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 50

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 51: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 51

Note large tree fragments in bottom left panel

-2 0 2 4 6

-3-2

-10

12

3

Weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST for weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .bimodal2[runts.bimodal2 > 1]

Histogram of runt sizes

Page 52: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 52

-2 -1 0 1 2 3

-3-2

-10

12

3

Unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST for unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .unimodal[runts.unimodal > 1]

Histogram of runt sizes

Page 53: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 53

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 54: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 54

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 55: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 55

leve

l

10

12

14

16

A 8

A 7 A 9

A 8

A 2

A 3A 3

A 5A 6

Page 56: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 56

Page 57: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 57

cc1

cc2

-4 -2 0 2 4 6

-4-2

02

4

11

1

111

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

22

22

22

2

2

2

2

2

2

2

2

2222

2 2

2 22

2

2

2

2

222

2

2

2

2222

2

2

22

2

2

2

2 22 22

22

2

2

2

2

2

33

3

3

33

3

3

3

3

33

3

3

3

3

3

33

33

3

3

3

3

3

3

3

33

33 3

3

33

3

3

3

3

33

333

33

3

3

3

3

3

3 33

3

33

3

33

3 3

3

3

3

3333

3 33

3

3

3

3

33

3

33

3

33

33

3

3

3

33

3 33

3

3

333

33

33

3

33

3

3

3

3

3

3

3 33

3

33

3

3

3

3

33

3

3

333

3

3

3

3 33

3

3

3

33

33

33 3

33

3

33

3

3

3

3

33

33

33

33

3

3

3

3

33

3

33

3

3

33

33

3

33

3

3

3

33

33

3 3 3

3

333

3

3

3

3

3

3 3

3

3

3

3

4

44

4

4

44

4

4

4

4

44

44

4

4

4

4

4

4

4

44

4

4

44

4

44

4

44

4 4

Areas 1-4, projected on crimcords

Page 58: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 58

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 59: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 59

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 60: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 60

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 61: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 61

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 62: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 62

Page 63: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 63

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

Page 64: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 64

Page 65: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 65

Page 66: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 66

Page 67: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 67

Page 68: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 68

-10 -5 0 5 10

-50

51

0

PC1

PC

2

Page 69: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 69

-10 -5 0 5 10

-50

51

0

PC1

PC

2

-10 -5 0 5 10

-50

51

0

PC1

PC

2

Page 70: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 70

Page 71: June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner Stuetzle Rebecca Nugent Department of Statistics University

June 11, 2005 Interface / CSNA Meeting St. Louis 71

Runt Pruning algorithm:

generate_cluster_tree_node (mst, runt_size_threshold) { node = new_cluster_tree_node node.leftson = node.rightson = NULL node.obs = leaves (mst) cut_edge = longest_edge_with_large_runt_size (mst, runt_size_threshold) if (cut_edge) { node.leftson = generate_cluster_tree_node (left_subtree (cut_edge), runt_size_threshold) node.rightson = generate_cluster_tree_node (right_subtree (cut_edge), runt_size_threshold) } return (node)}

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2