Dynamic Visualization of Transient Data Streams P. Wong, et al The Pacific Northwest National Laboratory Presented by John Sharko Visualization of Massive Datasets

Upload: cody-walters

Post on 04-Jan-2016


Dynamic Visualization of Transient Data Streams

P. Wong, et al.

The Pacific Northwest National Laboratory

Presented by John Sharko

Visualization of Massive Datasets

Characteristics of Data Streams

• Arrives continuously

• Arrives unpredictably

• Arrives unboundedly

• Arrives without persistent patterns

Examples of Data Streams

• Newswires

• Internet click streams

• Network resource management

• Phone call records

• Remote sensing imagery

Visualization Problem

• Fusing a large amount of previously analyzed information with a small amount of new information

• Reprocessing the whole dataset in full detail

First Objective

• Achieve the best understanding of transient data when the influx rate exceeds the processing rate

Approach: Data stratification to reduce data size

Second Objective

• Incremental visualization technique

Approach: Project new information incrementally onto previous data

Primary Visualization Output: Multidimensional Scaling

[Figure labels: OJ Simpson trial, French elections, Oklahoma bombing]
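As a rough illustration of how multidimensional scaling turns document vectors into scatterplot points, here is a minimal sketch of classical (Torgerson) MDS. The slides do not say which MDS variant or distance measure is used, so the Euclidean distances, the function name `classical_mds`, and the random input data are all assumptions made for this example.

```python
import numpy as np

def classical_mds(vectors, n_components=2):
    """Classical (Torgerson) MDS of row vectors into n_components dimensions."""
    X = np.asarray(vectors, dtype=float)
    # Squared Euclidean distances between all pairs of rows (assumed metric).
    sq_norms = np.sum(X * X, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Double-center the squared distances to obtain a Gram matrix.
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

rng = np.random.default_rng(0)
docs = rng.random((30, 200))   # hypothetical: 30 documents, 200 terms each
coords = classical_mds(docs)   # one 2-D scatterplot point per document
```

Documents with similar vectors land close together in `coords`, which is what lets topic clusters appear in the scatterplot.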

Adaptive Visualization Using Stratification

Methods for Adaptive Visualization

• Vector dimension reduction

• Vector sampling

Vector Dimension Reduction

Approach: dyadic wavelets (Haar)

[Figure labels: 200 terms, 100 terms, 50 terms]
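One way to picture the dyadic Haar reduction is that each level replaces adjacent pairs of terms with a single approximation coefficient, halving the vector length (200 to 100 to 50). The sketch below keeps only those approximation coefficients; the orthonormal scaling by the square root of 2 and the name `haar_reduce` are assumptions, since the slides give no formula.

```python
import numpy as np

def haar_reduce(vectors, levels=1):
    """Halve the length of each row vector `levels` times with Haar averaging."""
    X = np.asarray(vectors, dtype=float)
    for _ in range(levels):
        if X.shape[1] % 2:
            raise ValueError("vector length must be even at each level")
        # Haar approximation coefficients: scaled sums of adjacent pairs.
        X = (X[:, 0::2] + X[:, 1::2]) / np.sqrt(2.0)
    return X

rng = np.random.default_rng(0)
docs_200 = rng.random((10, 200))              # hypothetical 200-term vectors
docs_100 = haar_reduce(docs_200, levels=1)    # 100 terms per document
docs_50 = haar_reduce(docs_200, levels=2)     # 50 terms per document
```

The detail coefficients are simply dropped here, which is the lossy step that trades accuracy for speed.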

Results of Vector Dimension Reduction

[Figure panels by number of dimensions: 200, 100, 50]

Results of Vector Sampling

[Figure panels by number of documents: 3298, 1649, 824]
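The document counts above halve at each level (3298, 1649, 824). A minimal sketch of vector sampling that reproduces those counts is shown below. The slides do not state how documents are chosen, so the uniform random selection and the name `sample_half` are assumptions.

```python
import numpy as np

def sample_half(vectors, rng):
    """Keep a random half of the row vectors, preserving their original order."""
    X = np.asarray(vectors)
    keep = rng.choice(X.shape[0], size=X.shape[0] // 2, replace=False)
    return X[np.sort(keep)]

rng = np.random.default_rng(0)
all_docs = rng.random((3298, 4))         # hypothetical corpus, 4 terms for brevity
half_docs = sample_half(all_docs, rng)   # 1649 documents
quarter_docs = sample_half(half_docs, rng)  # 824 documents
```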

Scatterplot Similarity Matching

Scatterplot Similarity Matching

Procrustes Analysis Results

Documents    Dimensions
sampled      200          100      50
All          0.0 (self)   0.022    0.084
1/2          0.016        0.051    0.111
1/4          0.033        0.062    0.141
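To make the similarity scores above more concrete, here is a minimal sketch of a Procrustes comparison between two scatterplots with the same points. It centers both layouts, scales them to unit size, finds the best rotation or reflection, and returns the remaining sum of squared differences (0 means identical shapes). This normalization follows a common convention and is an assumption; the slides do not define the exact measure, and `procrustes_disparity` is a hypothetical name.

```python
import numpy as np

def procrustes_disparity(A, B):
    """Sum of squared differences after optimally aligning layout B to layout A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Remove translation and overall scale from both layouts.
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    # Orthogonal Procrustes: with B^T A = U S V^T, the best rotation is U V^T
    # and the best scale factor is sum(S).
    U, S, Vt = np.linalg.svd(B.T @ A)
    B_fit = S.sum() * (B @ (U @ Vt))
    return float(np.sum((A - B_fit) ** 2))

rng = np.random.default_rng(0)
full = rng.random((50, 2))                  # hypothetical full-resolution layout
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
moved = 3.0 * full @ rot + 5.0              # same shape: rotated, scaled, shifted
print(procrustes_disparity(full, full))     # close to 0, like the "self" entry
print(procrustes_disparity(full, moved))    # also close to 0
```

Because rotation, scale, and translation are factored out, the score reflects only how much the relative arrangement of points has changed.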

Incremental Visualization Using Fusion

• Reprocessing by projecting new items onto existing visualization

• Feature: reprocessing the entire dataset is often not required

Hyperspectral Image Processing

• Apply MDS to scale pixel vectors

• K-means process to assign unique colors

• Stratify the vectors progressively
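The color-assignment step can be sketched as clustering the scaled pixel coordinates and giving each cluster its own color. The version below is a plain Lloyd's-algorithm K-means written for illustration; the number of clusters, the random palette, and the names `kmeans_labels` and `palette` are assumptions not taken from the slides.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=20, seed=0):
    """Assign each point to one of k clusters with basic Lloyd's iterations."""
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):          # leave an empty cluster's center in place
                centers[j] = members.mean(axis=0)
    return labels

rng = np.random.default_rng(1)
pixel_coords = rng.random((500, 2))          # hypothetical MDS output per pixel
labels = kmeans_labels(pixel_coords, k=4)
palette = rng.integers(0, 256, size=(4, 3))  # one hypothetical RGB color per cluster
pixel_colors = palette[labels]               # shape (500, 3)
```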

Robust Eigenvectors

Generate three MDS scatterplots, one for each third of the image

Robust Eigenvectors (cont’d)

Generate MDS scatterplot for entire dataset

Robust Eigenvectors (cont’d)

Extract points from cropped areas

Using Multiple Sliding Windows

Eigenvectors are determined by the long window

New vectors are projected using the eigenvectors of the long window

[Figure labels: Data Stream, Long Window, Short Window, Sliding Direction]
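The two-window idea can be sketched as follows: compute projection axes once from the long window, then reuse them to place newly arrived vectors without recomputing anything. This sketch relies on the fact that classical MDS with Euclidean distances gives the same layout as principal component analysis, so it uses the principal axes of the long window. That formulation, the window sizes, and the names `fit_long_window` and `project` are assumptions for illustration.

```python
import numpy as np

def fit_long_window(window, n_components=2):
    """Return the mean and leading principal axes of the long window."""
    X = np.asarray(window, dtype=float)
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data are the covariance eigenvectors.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def project(vectors, mean, axes):
    """Project vectors onto axes that were computed from the long window."""
    return (np.asarray(vectors, dtype=float) - mean) @ axes

rng = np.random.default_rng(0)
stream = rng.random((500, 50))        # hypothetical stream of 50-term vectors
long_window = stream[:400]            # older data that defines the eigenvectors
short_window = stream[400:420]        # newly arrived vectors
mean, axes = fit_long_window(long_window)
base_points = project(long_window, mean, axes)
new_points = project(short_window, mean, axes)   # no eigen-decomposition needed
```

Projecting the short window costs only a matrix product, which is why this path can keep up when full MDS cannot.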

Dynamic Visualization Steps

1. When influx rate < processing rate, use MDS

2. When influx rate > processing rate, halt MDS

3. Use multiple sliding windows for a predefined number of steps

4. Use the stratification approach for a fast overview

5. Check for accumulated error using Procrustes analysis

6. If the error threshold is not reached, go to step 3; if it is reached, go to step 1
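The steps above can be read as a small two-mode controller: run full MDS while processing keeps up, switch to the incremental path when the stream outpaces it, and return to full MDS once the accumulated Procrustes error reaches a threshold. The sketch below captures only that switching logic. The mode names, the function `next_mode`, and treating "reached" as greater than or equal to the threshold are assumptions.

```python
def next_mode(mode, influx_rate, processing_rate, error, error_threshold):
    """Return the next mode, either 'full_mds' or 'incremental'."""
    if mode == "full_mds":
        # Steps 1 and 2: keep full MDS until the stream outpaces processing.
        return "incremental" if influx_rate > processing_rate else "full_mds"
    # Steps 5 and 6: stay incremental until the accumulated error is too large.
    return "full_mds" if error >= error_threshold else "incremental"

# Hypothetical trace: (influx rate, accumulated error) observed at each check.
mode = "full_mds"
history = []
for influx, error in [(5, 0.0), (20, 0.0), (20, 0.05), (20, 0.12), (5, 0.0)]:
    mode = next_mode(mode, influx, processing_rate=10,
                     error=error, error_threshold=0.1)
    history.append(mode)
```

In incremental mode, the caller would run the sliding-window projection and the stratified overview (steps 3 and 4) before measuring the error again.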

Conclusions

• The data stratification approach can substantially accelerate the visualization process

• The data fusion approach can provide instant updates
