visualization and data mining

Post on 21-Mar-2016

27 Views

Category:

Documents

3 Downloads

Preview:

Click to see full reader

DESCRIPTION

Visualization and Data Mining. Napoleon Invasion of Russia, 1812. Napoleon. Marley, 1885. Snow’s Cholera Map, 1855. Asia at night. South and North Korea at night. North Korea Notice how dark it is. Seoul, South Korea. Visualization Role. Support interactive exploration - PowerPoint PPT Presentation

TRANSCRIPT

Visualization and

Data Mining

22

Napoleon Invasion of Russia, 1812

Napoleon

33

Marley, 1885

44

55

Snow’s Cholera Map, 1855

66

Asia at night

77

South and North Korea at night

Seoul,South Korea

North KoreaNotice how darkit is

88

Visualization Role Support interactive exploration Help in result presentation Disadvantage: requires human eyes Can be misleading

99

Bad Visualization: Spreadsheet with misleading Y -axisYear Sales

1999 21102000 21052001 21202002 21212003 2124

Sales

20952100210521102115212021252130

1999 2000 2001 2002 2003

Sales

Y-Axis scale gives WRONGimpression of big change

1010

Better VisualizationYear Sales

1999 21102000 21052001 21202002 21212003 2124

Sales

0

500

1000

1500

2000

2500

3000

1999 2000 2001 2002 2003

Sales

Axis from 0 to 2000 scale gives correct impression of small change

1111

Lie Factor=14.8

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

1212

Lie Factor

dataineffectofsizegraphicinshowneffectofsizeFactorLie

8.14528.0833.7

18)0.185.27(

6.0)6.03.5(

Tufte requirement: 0.95<Lie Factor<1.05

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

1313

Tufte’s Principles of Graphical Excellence Give the viewer

the greatest number of ideas in the shortest time with the least ink in the smallest space.

Tell the truth about the data!

(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

1414

Visualization Methods Visualizing in 1-D, 2-D and 3-D

well-known visualization methods Visualizing more dimensions

Parallel Coordinates Other ideas

1515

1-D (Univariate) Data Representations

7

5

3

1

0 20

Mean

low highMiddle 50%

Tukey box plot

Histogram

1616

2-D (Bivariate) Data Scatter plot, …

price

mileage

1717

3-D Data (projection)

price

1818

3-D image (requires 3-D blue and red glasses)

Taken by Mars Rover Spirit, Jan 2004

1919

Visualizing in 4+ Dimensions Scatterplots Parallel Coordinates Chernoff faces …

2020

Multiple Views

Give each variable its own display

A B C D E1 4 1 8 3 52 6 3 4 2 13 5 7 2 4 34 2 6 3 1 5

A B C D E

1

2

3

4

Problem: does not show correlations

2121

Scatterplot Matrix

Represent each possiblepair of variables in theirown 2-D scatterplot (car data)

Q: Useful for what? A: linear correlations (e.g. horsepower & weight)

Q: Misses what? A: multivariate effects

2222

Parallel Coordinates • Encode variables along a horizontal row• Vertical line specifies values

Dataset in a Cartesian coordinates

Same dataset in parallel coordinates

Invented by Alfred Inselberg while at IBM, 1985

2323

Example: Visualizing Iris Data

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

4.9 3 1.4 0.2

... ... ... ...

5.9 3 5.1 1.8

Iris setosa

Iris versicolor

Iris virginica

2424

Flower PartsPetal, a non-reproductive part of the flower

Sepal, a non-reproductive part of the flower

2525

Parallel Coordinates Sepal Length

5.1

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

2626

Parallel Coordinates: 2 DSepal Length

5.1

Sepal Width

3.5

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

2727

Parallel Coordinates: 4 DSepal Length

5.1

Sepal Width

Petal length

Petal Width

3.5

sepal length

sepal width

petal length

petal width

5.1 3.5 1.4 0.2

1.4 0.2

2828

5.1

3.5

1.4 0.2

Parallel Visualization of Iris data

2929

Parallel Visualization Summary Each data point is a line Similar points correspond to similar lines Lines crossing over correspond to negatively correlated attributes

Interactive exploration and clustering

Problems: order of axes, limit to ~20 dimensions

3030

Chernoff FacesEncode different variables’ values in characteristicsof human face

http://www.cs.uchicago.edu/~wiseman/chernoff/http://hesketh.com/schampeo/projects/Faces/chernoff.html

Cute applets:

3131

Interactive Face

3232

Chernoff faces, example

3333

Visualization Summary Many methods Visualization is possible in more than 3-D Aim for graphical excellence

top related