multivariate data & representations

32
Multivariate Data & Representations CS 7450 - Information Visualization Jan. 15, 2004 John Stasko Spring 2004 CS 7450 2 Agenda Data forms and representations Basic representation techniques Multivariate (>3) techniques

Upload: others

Post on 01-Oct-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multivariate Data & Representations

1

Multivariate Data &Representations

CS 7450 - Information VisualizationJan. 15, 2004John Stasko

Spring 2004 CS 7450 2

Agenda

• Data forms and representations• Basic representation techniques• Multivariate (>3) techniques

Page 2: Multivariate Data & Representations

2

Spring 2004 CS 7450 3

Data Sets

• Data comes in many different forms• Typically, not in the way you want it

• How is stored (in the raw)?

Spring 2004 CS 7450 4

Example

• Cars− make− model− year− miles per gallon− cost− number of cylinders− weights− ...

Page 3: Multivariate Data & Representations

3

Spring 2004 CS 7450 5

Example

• Web pages

Spring 2004 CS 7450 6

Data Tables

• Often, we take raw data and transform it into a form that is more workable

• Main idea:− Individual items are called cases− Cases have variables (attributes)

Page 4: Multivariate Data & Representations

4

Spring 2004 CS 7450 7

Data Table Format

Case1 Case2 Case3 ...

Variable1

Variable2

Variable3

...

Value11 Value21 Value31

Value12 Value22 Value32

Value13 Value23 Value33

Think of as a functionf(case1) = <Val11, Val12,…>

Spring 2004 CS 7450 8

Example

People in class

Mary Jim Sally Mitch ...

SSN

Age

Hair

GPA

...

145 294 563 823

23 17 47 29

brown black blonde red

2.9 3.7 3.4 2.1

Page 5: Multivariate Data & Representations

5

Spring 2004 CS 7450 9

Example

Baseballstatistics

Spring 2004 CS 7450 10

Variable Types

• Three main types of variables− N-Nominal (equal or not equal to other

values)Example: gender

− O-Ordinal (obeys < relation, ordered set)Example: fr,so,jr,sr

− Q-Quantitative (can do math on them)Example: age

Page 6: Multivariate Data & Representations

6

Spring 2004 CS 7450 11

Metadata

• Descriptive information about the data− Might be something as simple as the type of

a variable, or could be more complex− For times when the table itself just isn’t

enough− Example: if variable1 is “l”, then variable3

can only be 3, 7 or 16

Spring 2004 CS 7450 12

How Many Variables?

• Data sets of dimensions 1, 2, 3 are common

• Number of variables per class− 1 - Univariate data− 2 - Bivariate data− 3 - Trivariate data− >3 - Hypervariate data

Page 7: Multivariate Data & Representations

7

Spring 2004 CS 7450 13

Representation

• What’s a common way of visually representing multivariate data sets?

• Graphs!

Spring 2004 CS 7450 14

Good Example

www.nationmaster.com

Page 8: Multivariate Data & Representations

8

Spring 2004 CS 7450 15

Basic Symbolic Displays

• Graphs • Charts• Maps• Diagrams

From:S. Kosslyn, “Understanding chartsand graphs”, Applied CognitivePsychology, 1989.

Spring 2004 CS 7450 16

1. Graph

Showing the relationships between variables’values in a data table

0

20

40

60

80

100

1stQtr

2ndQtr

3rdQtr

4thQtr

EastWestNorth

Page 9: Multivariate Data & Representations

9

Spring 2004 CS 7450 17

Properties

• Graph− Visual display that illustrates one or more

relationships among entities− Shorthand way to present information− Allows a trend, pattern or comparison to be

easily comprehended

Spring 2004 CS 7450 18

Issues

• Critical to remain task-centric− Why do you need a graph?− What questions are being answered?− What data is needed to answer those

questions?− Who is the audience?

time

money

Page 10: Multivariate Data & Representations

10

Spring 2004 CS 7450 19

Graph Components

• Framework− Measurement types, scale

• Content− Marks, lines, points

• Labels− Title, axes, ticks

Spring 2004 CS 7450 20

Other Symbolic Displays

• Chart• Map• Diagram

Page 11: Multivariate Data & Representations

11

Spring 2004 CS 7450 21

2. Chart

• Structure is important, relates entities to each other• Primarily uses lines, enclosure, position to link entities

Examples: flowchart, family tree, org chart, ...

Spring 2004 CS 7450 22

3. Map

• Representation of spatial relations• Locations identified by labels

Page 12: Multivariate Data & Representations

12

Spring 2004 CS 7450 23

Choropleth Map

Areas are filledand coloreddifferently toindicate someattribute of thatregion

Spring 2004 CS 7450 24

Cartography

• Cartographers and map-makers have a wealth of knowledge about the design and creation of visual information artifacts− Labeling, color, layout, …

• Information visualization researchers should learn from this older, existing area

Page 13: Multivariate Data & Representations

13

Spring 2004 CS 7450 25

4. Diagram

• Schematic picture of object or entity• Parts are symbolic

Examples: figures, steps in a manual, illustrations,...

Spring 2004 CS 7450 26

Details

• What are the constituent pieces of these four symbolic displays?

• What are the building blocks?

Page 14: Multivariate Data & Representations

14

Spring 2004 CS 7450 27

Visual Structures

• Composed of− Spatial substrate− Marks− Graphical properties of marks

Spring 2004 CS 7450 28

Space

• Visually dominant• Often put axes on space to assist• Use techniques of

composition, alignment, folding,recursion, overloading to 1) increase use of space2) do data encodings

Page 15: Multivariate Data & Representations

15

Spring 2004 CS 7450 29

Marks

• Things that occur in space− Points− Lines− Areas− Volumes

Spring 2004 CS 7450 30

Graphical Properties

• Size, shape, color, orientation...

Spatial properties Object properties

Expressingextent

Differentiatingmarks

PositionSize Grayscale

Orientation ColorShapeTexture

Page 16: Multivariate Data & Representations

16

Spring 2004 CS 7450 31

Back to Data

• What were the different types of data sets?

• Number of variables per class− 1 - Univariate data− 2 - Bivariate data− 3 - Trivariate data− >3 - Hypervariate data

Spring 2004 CS 7450 32

Univariate Data

• Representations

7

5

3

1

Bill

0 20

Mean

low highMiddle 50%

Tukey box plot

Page 17: Multivariate Data & Representations

17

Spring 2004 CS 7450 33

What goes where

• In univariate representations, we often think of the data case as being shown along one dimension, and the value in another

Linegraph

Bargraph

Y-axis is quantitativevariable

See changes overconsecutive values

Y-axis is quantitativevariable

Compare relative pointvalues

Spring 2004 CS 7450 34

Alternative View

• We may think of graph as representing independent (data case) and dependent (value) variables

• Guideline:− Independent vs. dependent variables

Put independent on x-axisSee resultant dependent variables along y-axis

Page 18: Multivariate Data & Representations

18

Spring 2004 CS 7450 35

Bivariate Data

• Representations

Scatter plot is common

price

mileage

Two variables, want tosee relationship

Is there a linear, curved orrandom pattern?

Each mark is nowa data case

Spring 2004 CS 7450 36

Trivariate Data

• Representations

3D scatter plot is possible

horsepower

mileage

price

Page 19: Multivariate Data & Representations

19

Spring 2004 CS 7450 37

Alternative Representation

Still use 2D but havemark propertyrepresent thirdvariable

Spring 2004 CS 7450 38

Alternative Representation

Represent each variablein its own explicit way

Page 20: Multivariate Data & Representations

20

Spring 2004 CS 7450 39

Hypervariate Data

• Ahhh, the tough one• Number of well-known visualization

techniques exist for data sets of 1-3 dimensions− line graphs, bar graphs, scatter plots OK− We see a 3-D world (4-D with time)

• What about data sets with more than 3 variables?− Often the interesting, challenging ones

Spring 2004 CS 7450 40

Multiple Views

Give each variable its own display

A B C D E1 4 1 8 3 52 6 3 4 2 13 5 7 2 4 34 2 6 3 1 5

A B C D E

1

2

3

4

Page 21: Multivariate Data & Representations

21

Spring 2004 CS 7450 41

Scatterplot Matrix

Represent each possiblepair of variables in theirown 2-D scatterplot

Useful for what?Misses what?

Spring 2004 CS 7450 42

Chernoff Faces

Encode different variables’ values in characteristicsof human face

http://www.cs.uchicago.edu/~wiseman/chernoff/http://hesketh.com/schampeo/projects/Faces/chernoff.html

Cute applets:

Page 22: Multivariate Data & Representations

22

Spring 2004 CS 7450 43

Star Plots

Var 1

Var 2

Var 3Var 4

Var 5

Value

Space out the nvariables at equalangles around a circle

Each “spoke” encodesa variable’s value

Spring 2004 CS 7450 44

Star Plot examples

http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html

Page 23: Multivariate Data & Representations

23

Spring 2004 CS 7450 45

Star Coordinates

E. Kandogan, “Star Coordinates: A Multi-dimensional VisualizationTechnique with Uniform Treatment of Dimensions”, InfoVis 2000Late-Breaking Hot Topics, Oct. 2000 Demo

Spring 2004 CS 7450 46

Parallel Coordinates

• What are they?− Explain…

Page 24: Multivariate Data & Representations

24

Spring 2004 CS 7450 47

Parallel Coordinates

V1 V2 V3 V4 V5

Encode variables alonga horizontal row

Vertical line specifiesvalues

Spring 2004 CS 7450 48

Parallel Coords Example

Basic

Grayscale

Color

Page 25: Multivariate Data & Representations

25

Spring 2004 CS 7450 49

Application

• System that uses parallel coordinates for information analysis and discovery

• Interactive tool− Can focus on certain data items− Color

Taken from:

A. Inselberg, “Multidimensional Detective”InfoVis ‘97, 1997.

Spring 2004 CS 7450 50

Discuss

• What was their domain?• What was their problem?• What were their data sets?

Page 26: Multivariate Data & Representations

26

Spring 2004 CS 7450 51

The Problem

• VLSI chip manufacture• Want high quality chips (high speed) and

a high yield batch (% of useful chips)• Able to track defects• Hypothesis: No defects gives desired chip

types• 473 batches of data

Spring 2004 CS 7450 52

The Data

• 16 variables− X1 - yield− X2 - quality− X3-X12 - # defects (inverted)− X13-X16 - physical parameters

Page 27: Multivariate Data & Representations

27

Spring 2004 CS 7450 53

Parallel Coordinate Display

Yikes!But notthat bad

yield &quality defects parameters

Distributionsx1 - normalx2 - bipolar

Spring 2004 CS 7450 54

Top Yield & Quality

defects

Have some defects

split

Page 28: Multivariate Data & Representations

28

Spring 2004 CS 7450 55

Minimal Defects

Not thehighestyields andquality

Spring 2004 CS 7450 56

Best Yields

Appears thatsome defectsare necessaryto produce the best chips

Non-intuitive!

Page 29: Multivariate Data & Representations

29

Spring 2004 CS 7450 57

Xmdv

Toolsuite createdby Matthew Wardof WPI

Includes parallelcoordinate views

Demo

Spring 2004 CS 7450 58

Parallel Coordinate Tree

D. Brodbeck and L. Girardin, "Visualization of Large-Scale Customer Satisfaction Surveys Using a Parallel Coordinate Tree", InfoVis ’03.

Demo

Page 30: Multivariate Data & Representations

30

Spring 2004 CS 7450 59

Parallel Coordinates

• Technique− Strengths?− Weaknesses?

Spring 2004 CS 7450 60

Sliding Rods

T. Lanning, K. Wittenburg, et al, "Multidimensional Information Visualization through Sliding Rods", Proceedings of AVI 2000

Page 31: Multivariate Data & Representations

31

Spring 2004 CS 7450 61

Administratia

• Computer accounts• HW 2 due Tuesday

Spring 2004 CS 7450 62

Upcoming

• Multivariate vis tools− Reading:

Eick paper

• Visual perception• Tufte (please be reading)

Page 32: Multivariate Data & Representations

32

Spring 2004 CS 7450 63

Sources Used

CMS bookReferenced articlesMarti Hearst SIMS 247 lecturesKosslyn ‘89 articleA. Marcus, Graphic Design for Electronic Documents

and User InterfacesM. Monmonier, How to Lie with MapsW. Cleveland, The Elements of Graphing DataC. H. Yu, Visualization Techniques of Different Dimensionshttp://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.htmlhttp://www.csc.ncsu.edu/faculty/healey/PP/PP.html