Visual Analytics Research at Tufts


Post on 07-Jan-2016


Page 1: Visual  Analytics Research at  Tufts

Dist FuncIntro VA Apps ATG Wrap-up1/25

Visual Analytics Research at Tufts

Remco Chang

Assistant Professor, Tufts University

Page 2: Visual  Analytics Research at  Tufts

Problem Statement

• The growth of data is exceeding our ability to analyze it.

• The amount of digital information generated in the years 2002, 2006, 2010:
– 2002: 22 EB (exabytes, 10^18 bytes)
– 2006: 161 EB
– 2010: 988 EB (almost 1 ZB)

1: Data courtesy of Dr. Joseph Kielman, DHS
2: Image courtesy of Dr. Maria Zemankova, NSF

Page 3: Visual  Analytics Research at  Tufts

Problem Statement

• The data is often complex, ambiguous, and noisy, so analyzing it requires human understanding.

– About 2 GB of digital information is being produced per person per year

– 95% of the Digital Universe’s information is unstructured

1: Data courtesy of Dr. Joseph Kielman, DHS
2: Image courtesy of Dr. Maria Zemankova, NSF

Page 4: Visual  Analytics Research at  Tufts

Example: What Does Fraud Look Like?

• Financial Institutions like Bank of America have legal responsibilities to report all suspicious activities

• Data size: approximately 200,000 transactions per day (73 million transactions per year)

• Problems:
– Automated approaches can only detect known patterns
– Bad guys are smart: patterns are constantly changing
– No single transaction appears fraudulent
– Few experts: fraud detection is considered an “art”
– Data is messy: lack of international standards results in ambiguous data

• Current methods:
– 10 analysts monitoring and analyzing all transactions
– Using SQL queries and spreadsheet-like interfaces
– Limited in time scale (2 weeks)

Page 5: Visual  Analytics Research at  Tufts

WireVis: Financial Fraud Analysis

• In collaboration with Bank of America
– Looks for suspicious wire transactions
– Currently beta-deployed at WireWatch
– Visualizes 7 million transactions over 1 year

• Uses interaction to coordinate four perspectives:
– Keywords to Accounts
– Keywords to Keywords
– Keywords/Accounts over Time
– Account similarities (search by example)

Page 6: Visual  Analytics Research at  Tufts

WireVis: Financial Fraud Analysis

Heatmap View (Accounts to Keywords Relationship)

Strings and Beads (Relationships over Time)

Search by Example (Find Similar Accounts)

Keyword Network (Keyword Relationships)

R. Chang et al., Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information Visualization, 2008.
R. Chang et al., WireVis: Visualization of categorical, time-varying data from financial transactions. IEEE VAST, 2007.

Page 7: Visual  Analytics Research at  Tufts

What is Visual Analytics?

• Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces [Thomas & Cook 2005]

• Since 2004, the field has grown significantly. Aside from tens to hundreds of domestic and international partners, it now has an IEEE conference (IEEE VAST), an NSF program (FODAVA), and a forthcoming IEEE Transactions journal.

Page 8: Visual  Analytics Research at  Tufts

Individually Not Unique

• Analytical Reasoning and Interaction: Interaction Design, Cognitive Psychology, Intelligence Analysis, etc.
• Visual Representation: InfoVis, SciVis, Graphics, etc.
• Production, Presentation, Dissemination: Tech Transfer, Report Generation, etc.
• Data Representation and Transformation: Data Mining, Machine Learning, Databases, Information Retrieval, etc.
• Validation and Evaluation: Quality Assurance, User studies (HCI), etc.

Page 9: Visual  Analytics Research at  Tufts

In Combinations of 2 or 3…

Components: Analytical Reasoning and Interaction; Visual Representation; Production, Presentation, Dissemination; Data Representation and Transformation; Validation and Evaluation

• Data Representation and Transformation: Data Mining, Machine Learning, Databases, Information Retrieval, etc.
• Visual Representation: InfoVis, SciVis, Graphics, etc.

Page 10: Visual  Analytics Research at  Tufts

In Combinations of 2 or 3…

Components: Analytical Reasoning and Interaction; Visual Representation; Production, Presentation, Dissemination; Data Representation and Transformation; Validation and Evaluation

• Analytical Reasoning and Interaction: Interaction Design, Cognitive Psychology, Intelligence Analysis, etc.
• Production, Presentation, Dissemination: Tech Transfer, Report Generation, etc.

Page 11: Visual  Analytics Research at  Tufts

Extending Visual Analytics Principles

• Global Terrorism Database
– Application of the investigative 5 W’s
• Bridge Maintenance
– Exploring subjective inspection reports
• Biomechanical Motion
– Interactive motion comparison methods

[Figure: Global Terrorism Database views labeled Where, When, Who, What, Original Data, and Evidence Box]

R. Chang et al., Investigative Visual Analysis of Global Terrorism, Journal of Computer Graphics Forum, 2008.

Page 12: Visual  Analytics Research at  Tufts

Extending Visual Analytics Principles

• Global Terrorism Database
– Application of the investigative 5 W’s
• Bridge Maintenance
– Exploring subjective inspection reports
• Biomechanical Motion
– Interactive motion comparison methods

R. Chang et al., An Interactive Visual Analytics System for Bridge Management, Journal of Computer Graphics Forum, 2010. To Appear.

Page 13: Visual  Analytics Research at  Tufts

Extending Visual Analytics Principles

• Global Terrorism Database
– Application of the investigative 5 W’s
• Bridge Maintenance
– Exploring subjective inspection reports
• Biomechanical Motion
– Interactive motion comparison methods

R. Chang et al., Interactive Coordinated Multiple-View Visualization of Biomechanical Motion Data, IEEE Vis (TVCG) 2009.

Page 14: Visual  Analytics Research at  Tufts

Human + Computer: A Mixed-Initiative Perspective

• So far, our approach is mostly user-driven
• Human vs. Artificial Intelligence: Garry Kasparov vs. Deep Blue (1997)
– Computer takes a “brute force” approach without analysis
– “As for how many moves ahead a grandmaster sees,” Kasparov concludes: “Just one, the best one”
• Artificial Intelligence vs. Augmented Intelligence: Hydra vs. Cyborgs (1998)
– Grandmaster + 1 computer > Hydra (equiv. of Deep Blue)
– Amateur + 3 computers > Grandmaster + 1 computer [1]
• How to systematically repeat the success?
– Unsupervised machine learning + User
– User’s interactions with the computer

1. http://www.collisiondetection.net/mt/archives/2010/02/why_cyborgs_are.php

[Figure: Computer, Translation, Human]

Page 15: Visual  Analytics Research at  Tufts

Examples of Human + Computer Computing

• CAPTCHA
– reCAPTCHA
– General Crowd-Sourcing

• Adaptive / Intelligent User Interfaces (IUI)

• User assisted clustering / searching

Page 16: Visual  Analytics Research at  Tufts

Simple Example

• Distance Function

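[Equation garbled in extraction and not fully recoverable. Legible pieces: an arg min over A of terms of the form D(x_i, x_j | A), summed over two sets of point pairs labeled “change” and “other”; the “change” set contains the pairs (x_i, x_j) with x_i in Y_1 and x_j in Y_2, or x_i in Y_2 and x_j in Y_1.]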

Page 17: Visual  Analytics Research at  Tufts

Application 1: Find Important Features

[Figure: 2D scatterplot of the data set]

• Data set: X, 178 x 13
• 3 classes
• Add 10 random number columns as extra features
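The setup on this slide (append 10 columns of random numbers to a 178 x 13 data set) can be sketched as follows. This is only an illustration: the helper name `add_random_columns`, the uniform distribution, and the placeholder zeros standing in for the real feature values are all assumptions, not details taken from the slides.

```python
import random

def add_random_columns(rows, n_extra, seed=0):
    """Return a copy of rows with n_extra uniform random values appended to each row.

    Hypothetical helper: the slides do not say how the random columns were drawn.
    """
    rng = random.Random(seed)
    return [list(row) + [rng.random() for _ in range(n_extra)] for row in rows]

# Placeholder for the real 178 x 13 data set (zeros, not the actual values).
X = [[0.0] * 13 for _ in range(178)]

# Each instance now has 13 original features plus 10 noise features.
X_aug = add_random_columns(X, n_extra=10)
```

A useful distance function learned on `X_aug` should end up giving the 10 appended columns little or no weight.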

Page 18: Visual  Analytics Research at  Tufts

1st Step: Success

[Figure: two 2D scatterplots]

Trying to separate the circled green dots from all blue dots

Page 19: Visual  Analytics Research at  Tufts

Result

• Recall the structure of data set

• Weight vector:
– Randomly generated features get low weights

Original Wine data set (each instance has 13 feature values):
0.096 0.150 0.062 0 0.018 0.011 0.025 0.039 0.037 0.047 0.091 0.186 0.127

10 randomly generated feature values for every instance:
0.038 0.011 0 0.017 0 0.046 0 0 0 0
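To make the weight vector concrete, the sketch below copies the 23 weights from this slide and compares how much total weight falls on the original features versus the random ones. The distance function itself is an assumption: the slides do not give its exact form, so a per-feature weighted squared Euclidean distance is used here purely for illustration, and the name `weighted_sq_distance` is hypothetical.

```python
# Weights copied from the slide: 13 original Wine features, then 10 random features.
wine_weights = [0.096, 0.150, 0.062, 0, 0.018, 0.011, 0.025,
                0.039, 0.037, 0.047, 0.091, 0.186, 0.127]
random_weights = [0.038, 0.011, 0, 0.017, 0, 0.046, 0, 0, 0, 0]
weights = wine_weights + random_weights

def weighted_sq_distance(x, y, w):
    """Weighted squared Euclidean distance (assumed form, not confirmed by the slides)."""
    return sum(wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y))

# Share of the total weight carried by each group of features.
wine_total = sum(wine_weights)
random_total = sum(random_weights)

# Number of random features the weight vector ignores entirely.
ignored_random = sum(1 for wk in random_weights if wk == 0)

# Two points that differ only in a zero-weight random feature are at distance 0.
a = [0.0] * 23
b = list(a)
b[22] = 5.0
d_ab = weighted_sq_distance(a, b, weights)
```

With these numbers the original features carry roughly 0.89 of the total weight and the random features roughly 0.11, and six of the ten random features receive a weight of exactly zero.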

Page 20: Visual  Analytics Research at  Tufts

Visual Analytics for Political Science

Page 21: Visual  Analytics Research at  Tufts

Aggregate Temporal Graph

1000 simulations

60 time steps in each simulation

(time step == a node) (edge == transition)

Merged time steps if two states are the same
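The construction described on this slide can be sketched as follows: every time step of every simulation becomes a node, consecutive time steps are linked by an edge, and nodes with identical states are merged. The function name and the toy traces are hypothetical, and the sketch assumes that states are merged whenever they are equal, regardless of the time step at which they occur, which the slide does not spell out.

```python
from collections import Counter

def build_aggregate_temporal_graph(simulations):
    """Merge identical states across simulations into shared nodes.

    simulations: list of state sequences (lists or tuples of hashable states).
    Returns (node_counts, edge_counts), where each count records how many
    times a state or a transition was observed across all simulations.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for trace in simulations:
        node_counts.update(trace)
        # Consecutive time steps form a transition (an edge).
        edge_counts.update(zip(trace, trace[1:]))
    return node_counts, edge_counts

# Toy input: 3 simulations of 3 time steps each (the slide uses 1000 x 60).
runs = [
    ["s0", "s1", "s2"],
    ["s0", "s1", "s3"],
    ["s0", "s4", "s2"],
]
nodes, edges = build_aggregate_temporal_graph(runs)
```

Keeping counts on nodes and edges is a design choice of this sketch; it preserves how often each state and transition occurred after merging.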

Page 22: Visual  Analytics Research at  Tufts

Aggregate Temporal Graph

Page 23: Visual  Analytics Research at  Tufts

Gateways and Terminals

Each of the yellow vertices is a Gateway to the vertex set of {A}. That is, every maximal path leaving a yellow vertex eventually passes through A.

Vertex G is a Gateway to each of the yellow vertices, or Terminals. That is, every maximal path leaving G eventually passes through each of the yellow vertices.
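The Gateway definition for a target set such as {A} can be checked with a simple graph search. The sketch below is a minimal illustration under two assumptions that the slides do not state: the graph is acyclic, so every maximal path ends at a vertex with no outgoing edges, and the graph is stored as a dictionary of successor lists. The function name `is_gateway` and the example graph are hypothetical.

```python
def is_gateway(graph, start, targets):
    """Return True if every maximal path leaving start passes through targets.

    graph: dict mapping each vertex to a list of successor vertices.
    Assumes the graph is acyclic. The search walks forward from start
    without crossing any target vertex; if it reaches a dead end, some
    maximal path avoids the targets and start is not a gateway.
    """
    targets = set(targets)
    stack = [start]
    seen = set()
    while stack:
        vertex = stack.pop()
        if vertex in targets or vertex in seen:
            continue
        seen.add(vertex)
        successors = graph.get(vertex, [])
        if not successors:
            return False  # path ended without meeting a target
        stack.extend(successors)
    return True

# Hypothetical example graph.
example = {
    "G": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "A": ["T"],
    "T": [],
    "X": ["A", "Z"],
    "Z": [],
}
g_is_gateway = is_gateway(example, "G", {"A"})
x_is_gateway = is_gateway(example, "X", {"A"})
```

In this example every path from G runs through A, while X has a path that ends at Z without reaching A.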

Page 24: Visual  Analytics Research at  Tufts

Applications of Aggregate Temporal Graphs

• A generalizable representation of problems involving parameter spaces that are too large to explore as a whole, but which are composed of related individual parts that can be examined independently

• Collaborative Analysis
– Each analyst’s trail is a simulation
– Each configuration state is a node

• Web Analytics
– Each visit is a simulation
– Each configuration of a page is a node

Page 25: Visual  Analytics Research at  Tufts

Conclusion

Components: Analytical Reasoning and Interaction; Visual Representation; Production, Presentation, Dissemination; Data Representation and Transformation; Validation and Evaluation

• Visual Analytics is a growing new area that is looking to address some pressing needs
– Too much (messy) data, too little time
• By combining strengths and findings in existing disciplines, we have demonstrated that
– There are some great benefits
– But there are also some difficult challenges

Page 26: Visual  Analytics Research at  Tufts

Questions?

Thank you!

Page 27: Visual  Analytics Research at  Tufts

Backup Slides

Page 28: Visual  Analytics Research at  Tufts

(2) Investigative GTD

[Figure: Global Terrorism Database views labeled Where, When, Who, What, Original Data, and Evidence Box]

R. Chang et al., Investigative Visual Analysis of Global Terrorism, Journal of Computer Graphics Forum (Eurovis), 2008.

Page 29: Visual  Analytics Research at  Tufts

WHY?

This group’s attacks are bounded not by geo-location but by religious beliefs.

Its attack patterns changed with its developments.

(2) Investigative GTD: Revealing Global Strategy

Page 30: Visual  Analytics Research at  Tufts

Domestic Group

A geographically-bounded entity in the Philippines.

The ThemeRiver shows its rise and fall as an entity and its modus operandi.

(2) Investigative GTD: Discovering Unexpected Temporal Pattern

Page 31: Visual  Analytics Research at  Tufts

What is in a User’s Interactions?

• Types of Human-Visualization Interactions
– Word editing (input heavy, little output)
– Browsing, watching a movie (output heavy, little input)
– Visual Analysis (closer to 50-50)

[Figure: loop between Visualization and Human. Output: images (monitor). Input: keyboard, mouse, etc.]

Page 32: Visual  Analytics Research at  Tufts

Discussion

• What interactivity is not good for:
– Presentation
– YMMV = “your mileage may vary”
  • Reproducibility: Users behave differently each time.
  • Evaluation is difficult due to opportunistic discoveries.
– Often sacrifices accuracy
  • iPCA – SVD takes time on large datasets; use iterative approximation algorithms such as onlineSVD.
  • WireVis – Clustering of large datasets is slow. Either pre-compute or use more trivial “binning” methods.
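As an illustration of the trade-off mentioned for WireVis, the sketch below groups values with equal-width binning, which takes a single pass over the data instead of running a clustering algorithm. It is a generic example, not the binning WireVis actually uses: the function name, the choice of equal-width bins, and the sample values are all assumptions.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins and return the bin indices.

    Cheap stand-in for clustering: fast, but it ignores the actual
    distribution of the data, which is where accuracy is sacrificed.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical per-account values (for example, total amount wired).
amounts = [10, 20, 35, 50, 90, 100]
groups = equal_width_bins(amounts, n_bins=3)
```

Accounts that share a bin index would then be treated as one group in the visualization.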

Page 33: Visual  Analytics Research at  Tufts

Discussion

• Interestingly,
– It doesn’t save you time…
– And it doesn’t make a user more accurate in performing a task.
• However, there is empirical evidence that with interactivity:
– Users are more engaged (don’t give up)
– Users prefer these systems over static (query-based) systems
– Users have a faster learning curve
• We need better measurements to determine the “benefits of interactivity”