DICON: Visual Analysis On Multidimensional Clusters Nan Cao, David Gotz, Jimeng Sun, Huamin Qu


Cluster Analysis

Topic: Cluster Analysis

Link: http://en.wikipedia.org/wiki/Cluster_analysis

Applications:
• Biology
• Medicine
• Market research
• Education research
• Other applications


Cluster Analysis

[Figure: a dataset and its cluster analysis results for K = 3 and K = 5]


Cluster Analysis

Ground Truth: The data contains 6 clusters

• Problems of cluster analysis

– The clustering result does not always precisely reveal the ground truth of the data

– Cluster analysis depends heavily on the experience of the analyst. It is very unlikely to find the ground truth within a single iteration

– For a multidimensional dataset, it is difficult to explain the meaning of the clusters
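
The dependence on the analyst's choices can be illustrated with a small k-means run. This is a minimal sketch, not the clustering used in the slides: the toy dataset (six well-separated blobs), the farthest-first seeding and all helper names are assumptions made for illustration.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point that is farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; returns (centers, inertia)."""
    centers = farthest_first(points, k)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    inertia = sum(min(sqdist(p, c) for c in centers) for p in points)
    return centers, inertia

# Hypothetical toy data: 6 blobs of 4 points each, blob centers 10 units apart.
blob_centers = [(x, y) for y in (0, 10) for x in (0, 10, 20)]
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

inertia = {k: kmeans(points, k)[1] for k in (3, 5, 6)}
for k, value in inertia.items():
    print(f"K = {k}: within-cluster squared error = {value:.1f}")
```

With K = 6 every blob gets its own center. With K = 3 or K = 5 some blobs have to share a center, so the within-cluster error is larger and the result no longer matches the ground truth.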


How can information visualization aid cluster analysis?


Challenges

• How can we interpret the multidimensional cluster results?

• How can we make comparisons among multidimensional clusters?

• How can we refine the clustering results and detect multidimensional patterns?


Solution

• Goal:

– Design a novel visualization for multidimensional cluster analysis that facilitates cluster interpretation, quality evaluation, comparison and manipulation

• Approach:

– A multidimensional cluster icon design that encodes multiple data attributes as well as derived statistical information for cluster interpretation

– A stabilized icon layout algorithm that generates similar icons for similar clusters for cluster comparison

– New visual cues that evaluate cluster quality and highlight information patterns, as well as intuitive user interactions driven by these cues to support cluster refinement via direct manipulation of icons


How can we interpret the multidimensional cluster results in detail?

Using an iconic design to visualize multidimensional clusters at multiple granularities:

• Encoding the single entity

• Packing entities into clusters

• Global layout


Visual Encoding

Design Guideline 1: Intuitively share the same visual encodings at the feature level, the entity level and the cluster level

E.g. the patient dataset

Entity: 0.3 0.2 0.1 0.1 0.2 0.1

Features: cancer, diabetes, kidney disorder, heart disease, fever, high blood pressure

[Figure: three stages (encoding data items in detail, packing entities into clusters, global layout) shown at the feature, entity and cluster levels]
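
One way to read the patient example is that each entity is drawn as a set of cells, one per feature, and that the same cells are then aggregated per feature at the cluster level. The sketch below illustrates that reading only. It assumes that cell area is proportional to the feature value and that the six values 0.3, 0.2, 0.1, 0.1, 0.2, 0.1 pair with the six feature names in the order listed. The function names and the second patient are hypothetical.

```python
FEATURES = [
    "cancer", "diabetes", "kidney disorder",
    "heart disease", "fever", "high blood pressure",
]

def entity_cells(values, entity_area=1.0):
    """Split one entity's area into per-feature cells.
    ASSUMPTION: cell area is proportional to the feature value."""
    total = sum(values)
    return {
        name: entity_area * value / total
        for name, value in zip(FEATURES, values)
        if value > 0
    }

def cluster_profile(entities):
    """Sum the cell areas per feature over all entities in a cluster,
    so the cluster level reuses the encoding of the entity level."""
    profile = {name: 0.0 for name in FEATURES}
    for values in entities:
        for name, area in entity_cells(values).items():
            profile[name] += area
    return profile

patient = [0.3, 0.2, 0.1, 0.1, 0.2, 0.1]        # values from the slide
other_patient = [0.5, 0.0, 0.0, 0.5, 0.0, 0.0]  # made up for illustration

cells = entity_cells(patient)
profile = cluster_profile([patient, other_patient])
for name in FEATURES:
    print(f"{name:20s} entity cell {cells[name]:.2f}   cluster total {profile[name]:.2f}")
```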


DEMO


How can we make comparisons among multidimensional clusters?

Design Guideline 2: Similar clusters should be represented by similar icons

– Overview: Similar clusters have similar data distributions

– Details: Similar clusters must be laid out in a similar way

[Figure: the encoding stages (encoding data items in detail, packing entities into clusters, global layout) for an entity 0.3 0.2 0.1 0.1 0.2 0.1 with features cancer, diabetes, kidney disorder, heart disease, HIV, high blood pressure]

Design Guideline 2

– Overview: Statistical Embedding

– Detail: Stabilized icon Layout

Statistical Embedding

• Kurtosis

• Skewness
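
Both statistics summarize the shape of a distribution, which is what lets two clusters be compared at the overview level. The slide does not say which estimators are used or how the values are embedded into the icon, so the sketch below simply uses the standard population (biased) moment definitions. The sample data is made up.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Third standardized moment: 0 for symmetric data,
    positive when the right tail is longer."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth standardized moment (3 for a normal distribution;
    subtract 3 to get excess kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 1, 6]

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    print(f"{name:13s} skewness = {skewness(data):+.3f}  kurtosis = {kurtosis(data):.3f}")
```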

Stabilized Layout

1. Initial Spiral layout
2. Weighted Centroidal Voronoi Tessellation
3. Random Layout for features
4. Optimization

[Equation: an objective minimized over the positions X_i, with three terms labeled Centroid (X_i and c_i), Similarity (X_i and X_j, with d_ij) and Smoothness (X_i and pre X_i)]
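
Step 1 can be sketched as follows. The slides do not specify which spiral is used, so this example assumes a sunflower (Fermat) spiral, which spreads seed points roughly evenly around the center. The function name and the `spacing` parameter are hypothetical. The later steps (weighted Voronoi tessellation and optimization) are not covered here.

```python
import math

# Golden angle in radians; consecutive seeds are rotated by this amount.
GOLDEN_ANGLE = math.pi * (3 - math.sqrt(5))

def spiral_layout(n, spacing=1.0):
    """Place n seed points on a sunflower (Fermat) spiral around the origin.
    ASSUMPTION: the spiral type is not given in the slides."""
    points = []
    for k in range(n):
        radius = spacing * math.sqrt(k)   # radius grows with sqrt(k): near-uniform density
        angle = k * GOLDEN_ANGLE
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

seeds = spiral_layout(50)
for x, y in seeds[:5]:
    print(f"({x:+.3f}, {y:+.3f})")
```

Because the position of the k-th seed depends only on k, two clusters of similar size start from the same initial arrangement, which fits the goal of generating similar icons for similar clusters.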

Global Layout

Design Guideline 3: Fit in multiple scales and can be embedded into various other visualizations

– Both color and shape are highly scalable and remain distinguishable even in a very small area

How can we refine the clustering results and detect interesting patterns within the multidimensional clusters?

Interactive visual analysis driven by visual cues


Cluster Quality Cue

Cluster Quality: Defined by the signed variances of the entities the cluster contains

f: the feature vector of a single entity

μ: the mean feature vector of the cluster C that contains f

σ: the variance between f and μ

Signed variance: sign(f) · σ, where sign(f) is either 1 or −1

High-quality clusters have a homogeneous representation

Low-quality clusters have a heterogeneous representation
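
The idea of a per-entity signed variance can be sketched as below. The exact definitions do not survive in this transcript, so two loudly stated assumptions are made: the "variance between f and the cluster mean" is taken to be the mean squared difference of their components, and the sign is +1 when the entity's total feature value is at or above that of the mean vector and −1 otherwise. All function names and the sample clusters are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of the feature vectors in a cluster."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def signed_variance(f, mu):
    # ASSUMPTION: variance between f and mu = mean squared component difference.
    variance = sum((a - b) ** 2 for a, b in zip(f, mu)) / len(f)
    # ASSUMPTION: the sign says whether f lies at/above (+1) or below (-1) the mean in total.
    sign = 1 if sum(f) >= sum(mu) else -1
    return sign * variance

def cluster_signed_variances(cluster):
    """Signed variance of every entity with respect to its own cluster mean."""
    mu = mean_vector(cluster)
    return [signed_variance(f, mu) for f in cluster]

homogeneous = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]   # made-up data
heterogeneous = [[0.0, 0.0], [1.0, 1.0]]             # made-up data

print(cluster_signed_variances(homogeneous))
print(cluster_signed_variances(heterogeneous))
```

Under these assumptions, a homogeneous cluster yields values near zero for every entity, while a heterogeneous cluster yields values spread on both sides of zero.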


Feature Co-occurrence and Dominant Cue

[Figure: a feature vector (f1, f2, f3, f4, f5, f6) with f2 and f5 labeled]

If fi > 0, we say that feature fi has occurred

If fi > 0 and fj > 0, and fi and fj are in the same vector, we say that they co-occur

Co-occurrence Score: $C_i = \sum_{j} p^2(f_j > 0 \mid f_i > 0)$

Co-occurrence Cue: Highlight the features that co-occur most with others

Dominant Cue: Highlight the features that do not co-occur with any other feature
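
The two cues can be sketched from the definitions of "occurred" and "co-occurred" above. The score here is built from the conditional probability that another feature occurs given that feature i occurs. Squaring each probability before summing is an assumption, as are the function names and the sample vectors.

```python
def cooccurrence_scores(vectors):
    """For each feature i, sum over j != i of p(f_j > 0 | f_i > 0) squared.
    ASSUMPTION: the squared form of the score."""
    n = len(vectors[0])
    scores = []
    for i in range(n):
        with_i = [v for v in vectors if v[i] > 0]   # vectors where feature i occurred
        score = 0.0
        for j in range(n):
            if j == i or not with_i:
                continue
            p = sum(1 for v in with_i if v[j] > 0) / len(with_i)
            score += p ** 2
        scores.append(score)
    return scores

def dominant_features(vectors):
    """Indices of features that occur, but never together with another feature."""
    n = len(vectors[0])
    dominant = []
    for i in range(n):
        with_i = [v for v in vectors if v[i] > 0]
        if with_i and all(v[j] <= 0 for v in with_i for j in range(n) if j != i):
            dominant.append(i)
    return dominant

# Made-up feature vectors: features 0 and 1 always appear together, feature 2 alone.
vectors = [
    [0.4, 0.6, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
scores = cooccurrence_scores(vectors)
dominant = dominant_features(vectors)
print("co-occurrence scores:", scores)
print("dominant features:", dominant)
```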


Interactions and Animated Transition

• Interactions

– Attribute group

– Split: binary split / outlier split

– Merge: drag merge and select merge

• Animation Path Bundling

– Aggregate the animation paths with similar trends

– Inspired by hierarchical edge bundling

• Demo


Evaluation


Comparing with other techniques

Advantages:

• Clusters are easy to identify

• Immediately conveys the size of each cluster

• Fast comparison

• Highly compressed; can be embedded into other visualizations

• Based on intuitive designs

Disadvantages:

• Multidimensional only

• No precise value is directly observed

• Splitting entities into multiple parts


Case Study (1): Study on Patient Similarity

1. Find a group of patients that are similar to a target patient. The similarity is automatically computed based on five features
2. An initial clustering result is given
3. Users are required to refine the clusters and interpret why the patients in a cluster are similar


Case Study (2)

Highlighting all the co-occurring features, we find different disease distribution patterns


User Study

• T1: Compare the feature details of 9 clusters

• T2: Compare a large set of clusters (50 clusters)

• 3 (groups) × 10 (users) × 2 (tasks)

Groups:

– Icons laid out randomly

– Icons laid out by our algorithm

– With statistical embedding


User Study Results

• Findings:

– The cluster icon design is extremely efficient for cluster comparison (12 s on average to compare 50 clusters)

– The proposed design principles greatly help comparison


Related Work

• Pixel Based Technique

• Iconic Techniques

• Parallel Coordinates

• Scatter Plots


Prior Art: Icon-based techniques

• Chernoff face visualization

• Stick figure technique

– two dimensions are mapped to the display dimensions and the remaining dimensions are mapped to the angles and/or limb lengths of the stick figure icon

– the number of dimensions that can be visualized is limited

• Shape encoding

• Color Icons


Prior Art: Pixel-Oriented Techniques

• Query Independent

– Space-Filling Curve Arrangements

– Recursive Pattern Technique

• Query Dependent

– Spiral Technique

– Axes Technique

– Circle Segments


Prior Art: Table-based techniques

• Table Lens

• Tableau

• Heat Map


Prior Art: Others (Hybrid Techniques)

• NodeTrix: a Hybrid Visualization of Social Networks. Nathalie Henry, Jean-Daniel Fekete, Michael J. McGuffin, InfoVis 2007

• Scattering Points in Parallel Coordinates. Xiaoru Yuan, Peihong Guo, He Xiao, Hong Zhou, Huamin Qu, InfoVis 2009

• Bubble Sets: Revealing Set Relations with Isocontours over Existing Visualizations. Christopher Collins, Gerald Penn, Sheelagh Carpendale, InfoVis 2009

• Rolling the Dice: Multidimensional Visual Exploration using Scatterplot Matrix Navigation. Niklas Elmqvist, Pierre Dragicevic, Jean-Daniel Fekete, InfoVis 2008

• Interactive Dimensionality Reduction Through User-defined Combinations of Quality Metrics. Sara Johansson, Jimmy Johansson, InfoVis 2009

• FacetAtlas: Multifaceted Visualization for Rich Text Corpora. InfoVis 2010