A Comparison of Graphical Techniques for the Display of Co-Occurrence Data Jan W. Buzydlowski, Xia Lin, Howard D. White College of Information Science and Technology Drexel University Philadelphia, PA 19104 USA

A Comparison of Graphical Techniques for the Display of Co-Occurrence Data

Jan W. Buzydlowski, Xia Lin, Howard D. White

College of Information Science and Technology

Drexel University

Philadelphia, PA 19104

USA

Information Visualization

(Data) Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way. [Cleveland, 1993]

(Information) Visualization has two aspects, structural modeling and graphic representation. [C. Chen, 1999]

– data - model - display

Visualization Overview

Model - Display
– Co-Occurrence Model
– 3 Graphical Displays

Data
– Co-citation counts from the Institute for Scientific Information, Philadelphia, PA
  • Obtained from a 10-year Arts & Humanities Citation Index database given to Drexel by ISI for research purposes

Co-Occurrence Model

Examples
Derivation
Metrics

Co-Occurrence Data - Example 1

Market Basket Analysis
– a shopping cart holds items purchased
  • e.g., milk, bread, razor blades, newspaper

Over all the sales for one day
– what items are purchased together
  • how can we arrange the items in the store
    – Pampers and beer on Thursdays...

Co-Occurrence Data - Example 2

Author Co-citation Analysis (ACA)
– Bibliographic data on a given article holds, e.g.,
  • title, keywords, abstract, citations to other documents
– An article might cite, e.g.:
  • Plato, Aristotle, Smith, Brown

Over a given set of many citing articles
– Count how many times each pair of authors was cited together
– Resulting co-citation count shows common intellectual interest

Co-Occurrence Derivation

For a given data set (N = 4 unique terms)
– Article 1: Plato, Aristotle, Smith
– Article 2: Plato, Smith
– Article 3: Plato, Aristotle, Smith, Brown

The following co-citations (C(4,2) = 6) are found

COMBINATION            COUNT   ARTICLES
Plato and Smith        3       1, 2, 3
Plato and Aristotle    2       1, 3
Plato and Brown        1       3
Aristotle and Smith    2       1, 3
Aristotle and Brown    1       3
Smith and Brown        1       3
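The derivation above can be reproduced in a few lines. This is a minimal sketch, not the authors' code: the names `ARTICLES` and `cocitation_counts` are made up for illustration, and each pair is stored in alphabetical order.

```python
from collections import Counter
from itertools import combinations

# The three citing articles from the slide, keyed by article number.
ARTICLES = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}


def cocitation_counts(articles):
    """Count, for every pair of authors, the articles that cite both."""
    counts = Counter()
    for cited in articles.values():
        # sorted(set(...)) makes each pair unique and order-independent.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


counts = cocitation_counts(ARTICLES)
for (a, b), n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{a} and {b}: {n}")
```

With 4 unique authors the counter holds C(4,2) = 6 pairs, matching the table.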

Co-Occurrence Measures

Raw counts

Additional information
– Correlations
  • Replace each cell by correlation measure of each pair-wise column
– Conditional Probability
  • Compute each cell by dividing each unique combination by total occurring

Co-Occurrence Structure - Example

            Plato   Aristotle   Smith   Brown
Plato       3       2           3       1
Aristotle   2       2           2       1
Smith       3       2           3       1
Brown       1       1           1       1
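As an illustration of the conditional-probability measure, the matrix above can be normalized row by row. This sketch assumes one reading of the slides: the diagonal cell holds the number of articles citing that author, so cell (i, j) divided by cell (i, i) estimates P(j is cited | i is cited). The names `COUNTS` and `conditional_probabilities` are hypothetical.

```python
AUTHORS = ["Plato", "Aristotle", "Smith", "Brown"]

# Co-occurrence matrix from the slide; the diagonal is read here as the
# number of articles citing each author (an assumption, see the lead-in).
COUNTS = [
    [3, 2, 3, 1],
    [2, 2, 2, 1],
    [3, 2, 3, 1],
    [1, 1, 1, 1],
]


def conditional_probabilities(counts):
    """Return a matrix whose cell (i, j) is count(i, j) / count(i, i)."""
    return [[cell / row[i] for cell in row] for i, row in enumerate(counts)]


probs = conditional_probabilities(COUNTS)
for name, row in zip(AUTHORS, probs):
    print(name, [round(p, 2) for p in row])
```

Unlike the raw counts, the result is not symmetric: every article citing Aristotle also cites Plato, but only two of the three articles citing Plato cite Aristotle.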

Graphical Techniques

Three Methodologies

–Multi-dimensional scaling

–Self-organizing maps

–Pathfinder networks

MDS

                Baltimore   Denver   Detroit   Miami   Seattle   San Francisco
Baltimore       0           1621     503       1080    2681      2796
Denver          1621        0        1274      2077    1303      1257
Detroit         503         1274     0         1389    2359      2411
Miami           1080        2077     1389      0       3334      3131
Seattle         2681        1303     2359      3334    0         2840
San Francisco   2796        1257     2411      3131    2840      0

[Figure: two-dimensional plot of the six cities (San Francisco, Seattle, Miami, Detroit, Denver, Baltimore); horizontal axis from -2.0 to 2.0, vertical axis from -0.8 to 0.6]

MDS Methodology

Given original distances (similarities), estimate coordinates that could give those distances

The computed distances should correspond to the original distances
– Stress
  • Added dimensions
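One way to make "estimate coordinates, then check stress" concrete is the sketch below. The slides do not say which MDS algorithm was used; this example swaps in a SMACOF-style iterative update (a standard stress-minimizing scheme) and a simple normalized stress, both chosen here for illustration. `DIST` is the city table from the earlier slide, and all function names are hypothetical.

```python
import math
import random

# Inter-city distances copied from the table on the earlier slide.
DIST = [
    [0, 1621, 503, 1080, 2681, 2796],
    [1621, 0, 1274, 2077, 1303, 1257],
    [503, 1274, 0, 1389, 2359, 2411],
    [1080, 2077, 1389, 0, 3334, 3131],
    [2681, 1303, 2359, 3334, 0, 2840],
    [2796, 1257, 2411, 3131, 2840, 0],
]


def normalized_stress(coords, dist):
    """Mismatch between map distances and original distances (0 = perfect)."""
    num = den = 0.0
    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            num += (math.dist(coords[i], coords[j]) - dist[i][j]) ** 2
            den += dist[i][j] ** 2
    return math.sqrt(num / den)


def smacof_step(coords, dist):
    """One majorization update (Guttman transform with unit weights)."""
    n = len(dist)
    updated = []
    for i in range(n):
        acc = [0.0] * len(coords[i])
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            if i == j or d == 0.0:
                continue
            ratio = dist[i][j] / d
            for k in range(len(acc)):
                acc[k] += ratio * (coords[i][k] - coords[j][k])
        updated.append([a / n for a in acc])
    return updated


def mds(dist, dims=2, iterations=300, seed=0):
    """Start from random coordinates and refine them; also return the stress history."""
    rng = random.Random(seed)
    scale = max(max(row) for row in dist)
    coords = [[rng.uniform(-scale, scale) for _ in range(dims)] for _ in dist]
    history = [normalized_stress(coords, dist)]
    for _ in range(iterations):
        coords = smacof_step(coords, dist)
        history.append(normalized_stress(coords, dist))
    return coords, history


coords, history = mds(DIST)
print(f"stress before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Calling `mds(DIST, dims=3)` is one way to explore the "added dimensions" point: more dimensions give the coordinates more room to match the original distances.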

SOM

Self-Organizing Maps (SOMs)

Also known as Kohonen Maps

Based on Neural Networks
– Related to wetware
  • robust techniques
– If categories are known
  • supervised technique
    – backpropagation learning
– If categories are sought
  • unsupervised technique
    – competitive learning

SOMs

Given a 2-D grid of nodes
– each node has N weights
– each vector (row) has N terms
– map each input vector to a node

Similar to vector quantization (VQ)

SOMs

Generation
– nodes initially given random weights
– randomly sample an input vector
  • row of co-occurrence matrix
  • with replacement
– find a node closest to vector
  • Euclidean distance
– update node weights
  • node weight = node weight + gain term * distance
  • update “neighborhood”
– “cool” gain term and neighborhood
– repeat…
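The generation steps above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the grid size, the linear cooling schedule, and the starting gain of 0.5 are assumptions, and "distance" in the update rule is read as the difference between the input vector and the node's weights.

```python
import math
import random

# Rows of the small co-occurrence matrix (Plato, Aristotle, Smith, Brown).
ROWS = [
    [3, 2, 3, 1],
    [2, 2, 2, 1],
    [3, 2, 3, 1],
    [1, 1, 1, 1],
]


def best_node(weights, vec):
    """Grid position of the node whose weights are closest (Euclidean) to vec."""
    return min(weights, key=lambda node: math.dist(weights[node], vec))


def train_som(rows, width=3, height=3, steps=500, seed=0):
    rng = random.Random(seed)
    dim = len(rows[0])
    top = max(max(row) for row in rows)
    # Nodes initially get random weights.
    weights = {
        (x, y): [rng.uniform(0, top) for _ in range(dim)]
        for x in range(width)
        for y in range(height)
    }
    for t in range(steps):
        remaining = 1.0 - t / steps              # linear "cooling" (assumed)
        gain = 0.5 * remaining
        radius = max(width, height) / 2 * remaining
        vec = rng.choice(rows)                   # sample with replacement
        winner = best_node(weights, vec)
        for node, w in weights.items():
            if math.dist(node, winner) <= radius:   # winner's neighborhood
                for k in range(dim):
                    w[k] += gain * (vec[k] - w[k])
    return weights


som = train_som(ROWS)
for name, row in zip(["Plato", "Aristotle", "Smith", "Brown"], ROWS):
    print(name, "->", best_node(som, row))
```

Because the Plato and Smith rows are identical, they always map to the same node, which is the kind of grouping the display is meant to reveal.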

PF Nets

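Pathfinder Networks

Uses graph notation
– nodes = authors
– edges = co-citation counts

Co-occurrence is a complete network (weighted, undirected)

[Figure: graph of Plato, Aristotle, and Smith with edges weighted by the co-citation counts in the matrix below (Plato-Aristotle 2, Plato-Smith 3, Aristotle-Smith 2)]

            Plato   Aristotle   Smith   Brown
Plato       3       2           3       1
Aristotle   2       2           2       1
Smith       3       2           3       1
Brown       1       1           1       1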

Pathfinder Networks Generation

Pathfinder Network is generated by varying the parameters:
– distance (r)
– triangle inequality (q)

Pathfinder Distance

Uses Minkowski metric:

$d = \left( \sum_i e_i^r \right)^{1/r}$

Example
– e1 = 3, e2 = 4
– r = 1 => d: 7 = 3 + 4
  • Driving distance / ratio data
– r = 2 => d: 5 = (9 + 16)^(1/2)
  • Euclidean Distance
– r (approaches) infinity => d: 4 = max(3, 4)
  • ordinal data
  • rank rather than value
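The three cases in the example can be checked with a small helper. The function name `path_length` is made up for this sketch; the limit r = infinity is handled as the maximum edge weight, as on the slide.

```python
import math


def path_length(edges, r):
    """Minkowski length of a path whose edge weights are given in `edges`."""
    if math.isinf(r):
        return max(edges)          # limiting case: only the largest edge counts
    return sum(e ** r for e in edges) ** (1.0 / r)


for r in (1, 2, 50, math.inf):
    print(f"r = {r}: d = {path_length([3, 4], r):.4f}")
```

The r = 50 line is included only to show the value approaching max(3, 4) as r grows.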

Pathfinder Triangle Inequality

A required property of a metric definition

d(i,j) ≤ d(i,k) + d(k,j)

But may not be justified
– in personal judgments
  • If a is similar to b, and b is similar to c, there may be no transitive judgment of similarity from a to c
– in set intersections
  • Even though Smith and Jones appear together 12 times, and Jones and Brown appear together 5 times, the overlap between Smith and Brown cannot be predicted

Pathfinder Triangle Inequality

Defines q-triangular
– check paths of length q to determine if inequality is met
  • minimum is 2
  • maximum is n - 1
    – full compliance
– the longer the length, the fewer the connections

Pathfinder Example

Pathfinder Network Creation

PFNet (r, q)
– Examine all paths of length q or less.
– Use Minkowski Metric with parameter r to compute path length.
– If a path of less weight is found, then remove the edge.

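Pathfinder - Example

[Figure: triangle graph of Smith, Jones, and Brown; the Smith - Jones edge has weight 5 and the two edges through Brown have weights 3 and 4]

q = 2

r = 1 => Smith - Jones is kept

r = 2 => Smith - Jones is kept

r = infinity => Smith - Jones is removed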
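The PFNet (r, q) rule and the Smith - Jones example can be sketched in code. This is a simplified illustration rather than the published Pathfinder procedure: edge weights are treated as distances, the assignment Smith - Brown = 3 and Brown - Jones = 4 is assumed (the slide does not say which is which, and the result is the same either way), and the names `combine` and `pfnet` are hypothetical.

```python
import math


def combine(a, b, r):
    """Minkowski length of a path made of two parts with lengths a and b."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)


def pfnet(distances, r, q):
    """Keep an edge unless some path of at most q edges is strictly shorter."""
    nodes = sorted({name for edge in distances for name in edge})

    def direct(u, v):
        return distances.get(frozenset((u, v)), math.inf)

    # best[(u, v)] = shortest path found so far using a bounded number of edges.
    best = {(u, v): direct(u, v) for u in nodes for v in nodes if u != v}
    for _ in range(q - 1):                       # allow one more edge per pass
        longer = dict(best)
        for (u, v) in best:
            for mid in nodes:
                if mid not in (u, v):
                    candidate = combine(best[(u, mid)], direct(mid, v), r)
                    longer[(u, v)] = min(longer[(u, v)], candidate)
        best = longer

    kept = set()
    for edge, weight in distances.items():
        u, v = tuple(edge)
        if weight - best[(u, v)] <= 1e-9:        # no strictly shorter path
            kept.add(edge)
    return kept


EXAMPLE = {
    frozenset(("Smith", "Jones")): 5,
    frozenset(("Smith", "Brown")): 3,   # assumed assignment of 3 and 4
    frozenset(("Brown", "Jones")): 4,
}

for r in (1, 2, math.inf):
    kept = frozenset(("Smith", "Jones")) in pfnet(EXAMPLE, r, q=2)
    print(f"r = {r}: Smith - Jones is {'kept' if kept else 'removed'}")
```

The path through Brown has length 7 for r = 1 and exactly 5 for r = 2, neither of which is less than 5, so the edge stays; for r = infinity the path length is max(3, 4) = 4, which is less than 5, so the edge is removed.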

Comparison of Techniques

MDS
– Reduces dimensions / reveals clusters
  • 2D may be insufficient
  • measurement may not be Euclidean

SOM
– robust
  • no guarantee of convergence/unique solution

Pathfinder
– does not assume ratio data/triangle inequality
  • connections rather than positions are important
  • additional methodology needed for display

Comparison of Techniques

Similarities
– Spatial models

Differences
– use of visual space
– semantic meaning
  • as related to data
– research in progress

Graphical Display of Methodologies

MDS
– assume that 2 dimensions are sufficient
  • x, y for each point already defined

SOM
– grid defines the 2D surface
  • plot each label with the appropriate node

Pathfinder
– only defines the nodes and links
  • need additional methodologies
    – Spring-embedder models
      » Kamada and Kawai (1989)
      » Fruchterman and Reingold (1991)
      » Davidson and Harel (1996)
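To show why a Pathfinder network needs a separate layout step, here is a rough force-directed sketch in the spirit of spring-embedder models. It is a simplified toy and does not reproduce any of the three cited algorithms: the repulsion and attraction formulas, the step cap, and the cooling factor are all assumptions chosen for brevity, and `spring_layout` is a hypothetical name.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, seed=0):
    """Assign x, y positions: all nodes repel, linked nodes attract."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))      # preferred spacing in a unit square
    max_step = 0.1                       # movement cap, reduced each round

    def apply(u, v, strength, force):
        """Push u away from v (strength > 0) or pull it closer (strength < 0)."""
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = strength(d)
        force[u][0] += dx / d * f
        force[u][1] += dy / d * f
        force[v][0] -= dx / d * f
        force[v][1] -= dy / d * f

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                apply(u, v, lambda d: k * k / d, force)      # repulsion
        for u, v in edges:
            apply(u, v, lambda d: -(d * d) / k, force)       # attraction
        for n in nodes:
            fx, fy = force[n]
            size = math.hypot(fx, fy) or 1e-9
            step = min(size, max_step)
            pos[n][0] += fx / size * step
            pos[n][1] += fy / size * step
        max_step *= 0.97
    return pos


layout = spring_layout(
    ["Smith", "Jones", "Brown"],
    [("Smith", "Brown"), ("Brown", "Jones")],
)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The links passed in here are the ones a PFNet would keep; the layout only decides where to draw them, which is the "additional methodology" the slide refers to.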

Graphical Comparison of Three Methods

Data
– Institute for Scientific Information
– Arts and Humanities Database (AHCI)
  • 1988 - 1997
  • 1.26 million records

Example:
– Given Plato, find related authors
  • Interface described in IV 2000 Paper
  • CSNA 2000 Paper
    – (Lin, Buzydlowski, White)

25 Authors Co-cited with Plato

PLATO (4928) ARISTOTLE (1861) PLUTARCH (838) CICERO (699) HOMER (627) BIBLE (552) EURIPIDES (515) ARISTOPHANES (474) XENOPHON (459) AUGUSTINE (432) HERODOTUS (425) KANT-I (385) AESCHYLUS (374)

SOPHOCLES (363) THUCYDIDES (363) OVID (334) HESIOD (325) DIOGENES-LAERTIUS (317) HEIDEGGER-M (312) DERRIDA-J (304) PINDAR (292) NIETZSCHE-F (278) HEGEL-GWF (264) VERGIL (259) AQUINAS-T (255)

300 Pair-wise co-citations

1: PLATO AND ARISTOTLE - 1940 docs
2: PLATO AND PLUTARCH - 872 docs
...
300: VERGIL AND AQUINAS-T - 38 docs

0 1940 872 742 664 566 532 532 499 442 444 391 385 371 374 350 346 339 317 316 306 308 279 279 269
1940 0 864 917 531 554 409 446 473 544 423 595 317 312 374 253 271 326 413 290 229 284 383 223 744
872 864 0 1046 535 395 427 448 503 220 582 28 276 264 447 444 265 314 20 25 265 28 27 372 38
742 917 1046 0 329 431 218 180 213 476 217 85 118 125 156 645 162 264 46 59 115 58 61 615 142
664 531 535 329 0 233 552 333 250 105 394 24 400 416 224 446 513 121 33 65 404 60 35 568 20
566 554 395 431 233 0 126 83 133 908 141 173 86 90 62 257 98 81 118 118 59 126 112 266 405
532 409 427 218 552 126 0 402 258 52 314 11 475 491 224 257 277 90 13 24 307 51 12 207 5
532 446 448 180 333 83 402 0 307 45 297 6 270 264 263 140 193 107 4 11 202 21 11 110 5
499 473 503 213 250 133 258 307 0 40 367 11 176 180 320 101 136 132 6 8 131 18 11 92 14
442 544 220 476 105 908 52 45 40 0 57 152 39 34 32 217 49 51 104 109 23 105 105 214 499
444 423 582 217 394 141 314 297 367 57 0 5 245 243 398 145 210 122 6 19 230 19 11 133 5
391 595 28 85 24 173 11 6 11 152 5 0 14 20 11 22 13 20 552 372 2 382 752 8 201
385 317 276 118 400 86 475 270 176 39 245 14 0 391 166 127 220 57 19 32 240 40 18 124 2
371 312 264 125 416 90 491 264 180 34 243 20 391 0 166 137 197 55 34 37 219 51 46 118 3
374 374 447 156 224 62 224 263 320 32 398 11 166 166 0 66 98 75 10 14 121 22 7 65 9
350 253 444 645 446 257 257 140 101 217 145 22 127 137 66 0 204 53 14 49 152 30 19 924 39
346 271 265 162 513 98 277 193 136 49 210 13 220 197 98 204 0 70 15 20 241 23 15 173 9
339 326 314 264 121 81 90 107 132 51 122 20 57 55 75 53 70 0 16 14 57 24 26 45 17
317 413 20 46 33 118 13 4 6 104 6 552 19 34 10 14 15 16 0 776 9 525 453 9 120
316 290 25 59 65 118 24 11 8 109 19 372 32 37 14 49 20 14 776 0 12 532 346 38 44
306 229 265 115 404 59 307 202 131 23 230 2 240 219 121 152 241 57 9 12 0 18 8 128 5
308 284 28 58 60 126 51 21 18 105 19 382 40 51 22 30 23 24 525 532 18 0 346 29 60
279 383 27 61 35 112 12 11 11 105 11 752 18 46 7 19 15 26 453 346 8 346 0 13 114
279 223 372 615 568 266 207 110 92 214 133 8 124 118 65 924 173 45 9 38 128 29 13 0 38
269 744 38 142 20 405 5 5 14 499 5 201 2 3 9 39 9 17 120 44 5 60 114 38 0

Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way...

[Figure: 2D MDS map of 25 authors co-cited with Plato]

[Figure: PFNet of 25 authors co-cited with Plato]

Conclusion

Slides available at:
– faculty.cis.drexel.edu/~jbuzydlo/
– [email protected]

Bibliography

Chen, Chaomei, Information Visualization and Virtual Environments, 1999.

Cleveland, William S., Visualizing Data, Hobart Press, 1993.

Davidson, R, Harel, D, Drawing Graphs Nicely Using Simulated Annealing, ACM Transactions on Graphics, 15(4): 301-31 (1996).

Fruchterman, TMJ, Reingold, EM, Graph Drawing by Force-Directed Placement, Software Practice and Experience, 21: 1129-64 (1991).

Kamada, T, Kawai, S, An Algorithm for Drawing General Undirected Graphs, Information Processing Letters, 31(1): 7-15 (1989).