visual analytics

09/05/14 pag. 1 Information visualization lecture 8 visual analytics Katrien Verbert Department of Computer Science Faculty of Science Vrije Universiteit Brussel [email protected]

Upload: katrien-verbert

Post on 11-May-2015


TRANSCRIPT

Page 1: Visual analytics

09/05/14 pag. 1

Information visualization lecture 8

visual analytics

Katrien Verbert Department of Computer Science

Faculty of Science Vrije Universiteit Brussel

[email protected]

Page 2: Visual analytics

Motivation

http://bigdatablog.emc.com/2013/08/08/mlb-a-big-fan-of-big-data/

Page 3: Visual analytics

Motivation

•  Volume: gigabyte (10^9 bytes), terabyte (10^12), petabyte (10^15), exabyte (10^18), zettabyte (10^21)

•  Variety:
–  structured, semi-structured, unstructured
–  text, image, audio, video, record

•  Velocity: dynamic, sometimes time-varying

Big Data refers to datasets that grow so large that they are difficult to capture, store, manage, share, analyze and visualize with typical database software tools.

Page 4: Visual analytics

The social layer in an interconnected world

•  2+ billion people on the Web by end 2011
•  30 billion RFID tags today (1.3B in 2005)
•  4.6 billion camera phones worldwide
•  100s of millions of GPS-enabled devices sold annually
•  76 million smart meters in 2009… 200M by 2014
•  12+ TBs of tweet data every day
•  25+ TBs of log data every day
•  ? TBs of data every day

Page 5: Visual analytics

Motivation

Raw data has no value in itself; only the extracted information has value.

Src: http://www.sas.com/knowledge-exchange/business-analytics/featured/big-data-drives-performance-%E2%80%93-if-you-can-exploit-the-information-overload/index.html

Page 6: Visual analytics

Information overload

•  Refers to the danger of getting lost in data, which may be:
–  irrelevant to the current task at hand,
–  processed in an inappropriate way, or
–  presented in an inappropriate way.

Src: http://supermarketpeople.co.uk/archives/713

Page 7: Visual analytics

Visual Analytics - Mastering the Information Age

https://www.youtube.com/watch?v=5i3xbitEVfs

Page 8: Visual analytics

Definition

Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces.

Page 9: Visual analytics

To whom does it matter?

•  Research community
•  Business community: new tools, new capabilities, new infrastructure, new business models, etc.

Financial Services...

Page 10: Visual analytics


Visual analytics is a multidisciplinary field

Page 11: Visual analytics


History

Page 12: Visual analytics

History: discovery tools

•  Shneiderman (IV ‘02) suggests combining computational analysis approaches such as data mining with information visualization
–  Too often viewed as competitors in the past
–  Instead, they can complement each other
–  Each has something valuable to contribute

•  Alternatives
–  Issues influencing the design of discovery tools:
•  Statistical algorithms vs. visual data presentation
•  Hypothesis testing vs. exploratory data analysis
–  Each has pros and cons

Page 13: Visual analytics

Hypothesis testing & exploratory data analysis

•  Hypothesis testing
–  Advocates:
•  Stating hypotheses up front limits the variables, sharpens thinking, and allows more precise measurement
–  Critics:
•  Too far from reality; initial hypotheses bias the analyst toward finding evidence that supports them

•  Exploratory Data Analysis
–  Advocates:
•  Find the interesting things this way; we now have the computational capabilities to do so
–  Skeptics:
•  Not generalizable, everything is a special case, detecting statistical relationships does not imply cause and effect

Page 14: Visual analytics

Recommendations

•  Integrate data mining and information visualization
•  Allow users to specify what they are seeking
•  Recognize that users are situated in a social context
•  Respect human responsibility

Page 15: Visual analytics

History

•  Information visualization systems inadequately supported decision making (Amar & Stasko IV ‘04):
–  Limited affordances
–  Predetermined representations
–  Decline of determinism in decision-making

•  “Representational primacy” versus “Analytic primacy”
–  Telling truth about data vs. providing analytically useful visualizations

Page 16: Visual analytics

Representational primacy

•  Pursuit of faithful data replication and comprehension
•  Above all else, we must show the data
•  Can be a limiting notion
•  May focus on low level tasks not relevant to users
•  Some data is hard to represent faithfully (think high dimensional data)

Page 17: Visual analytics

Gaps between representation and analysis

•  When explanatory or correlative models are the desired outcome, model visualization may be more important than data visualization
•  One extreme is to build black boxes where data goes in and the answer comes out
•  Not a good idea to trust black boxes
–  What can go wrong with this?
–  A white box approach is better.

Page 18: Visual analytics

Application of Precepts

Worldview-based tasks (“Did we show the right thing to the user?”)
–  Determine Domain Parameters
–  Expose Multivariate Explanation
–  Facilitate Hypothesis Testing

Rationale-based tasks (“Will the user believe what she sees?”)
–  Expose Uncertainty
–  Concretize Relationships
–  Expose Cause and Effect

Amar & Stasko 2005

Page 19: Visual analytics

Task level emphases

•  Don’t just help “low-level” tasks
–  Find, filter, correlate, etc.

•  Facilitate analytical thinking
–  Complex decision-making, especially under uncertainty
–  Learning a domain
–  Identifying the nature of trends
–  Predicting the future

Page 20: Visual analytics

History: coining the term

•  2003-04: Jim Thomas of PNNL, together with colleagues, develops the notion of visual analytics
–  Holds workshops at PNNL and at InfoVis ‘04 to help define a research agenda

•  Agenda is formalized in the book Illuminating the Path
–  Available online, http://nvac.pnl.gov/

•  People use visual analytics tools and techniques to
–  Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data
–  Detect the expected and discover the unexpected
–  Provide timely, defensible, and understandable assessments
–  Communicate assessments effectively for action

Src: Stasko

Page 21: Visual analytics

Visual Analytics

•  Not really an “area” per se
–  More of an “umbrella” notion
–  Combines multiple areas or disciplines
–  Ultimately about using data to improve our knowledge and help make decisions

•  Main Components:

Src: Stasko

Page 22: Visual analytics

Visual Analytics: alternate definition

Visual analytics combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets.

Keim et al., chapter in Information Visualization: Human-Centered Issues and Perspectives, 2008

Page 23: Visual analytics

In-Spire Video

https://www.youtube.com/watch?v=YONTBZaxz8g

Page 24: Visual analytics

Synergy

•  Combine strengths of both human and electronic data processing
–  Gives a semi-automated analytical process
–  Use strengths from each
–  Below from Keim, 2008

Page 25: Visual analytics

InfoVis Comparison

•  Clearly much overlap
•  Perhaps fair to say that infovis hasn’t always focused on analysis tasks so much and that it doesn’t always include advanced data analysis algorithms
–  Not a criticism, just not its focus
–  InfoVis has a narrower scope

Src: Stasko

Page 26: Visual analytics

Visual Analytics

•  Encompassing, integrated approach to data analysis
–  Use computational algorithms where helpful
–  Use human-directed visual exploration where helpful
–  Not just “Apply A, then apply B” though
–  Integrate the two tightly

Src: Stasko, 2013

Page 27: Visual analytics

Visual analytics aims at making data and information processing transparent

Just as information visualization has changed our view on databases, the goal of visual analytics is to make our way of processing data and information transparent for an analytic discourse.

Page 28: Visual analytics

The visual analytics process

The visual analytics process is characterized through interaction between data, visualizations, models about the data, and the users in order to discover knowledge.

Page 29: Visual analytics

Information seeking mantra for visual analytics applications

Information seeking mantra: overview first, zoom/filter, details on demand.

Visual analytics mantra: analyze first, show the important, zoom/filter, analyze further, details on demand.

Keim et al. 2006
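The "analyze first" mantra above can be read as a small pipeline. The following is a minimal Python sketch, not taken from the lecture: the sample records, the use of a z-score as the automated analysis step, and all function names are illustrative assumptions.

```python
from statistics import mean, pstdev


def analyze(records):
    """Analyze first: score each record by its distance from the mean (z-score)."""
    values = [r["value"] for r in records]
    mu, sigma = mean(values), pstdev(values)
    return [
        dict(r, score=abs(r["value"] - mu) / sigma if sigma else 0.0)
        for r in records
    ]


def show_important(scored, top_k=3):
    """Show the important: keep only the top_k highest-scoring records."""
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]


def zoom_filter(scored, region):
    """Zoom/filter: restrict the data to a region chosen by the user."""
    return [r for r in scored if r["region"] == region]


def details_on_demand(records, record_id):
    """Details on demand: return the full record the user selected."""
    return next(r for r in records if r["id"] == record_id)


# Hypothetical data, for illustration only.
records = [
    {"id": 1, "region": "north", "value": 10},
    {"id": 2, "region": "north", "value": 12},
    {"id": 3, "region": "south", "value": 11},
    {"id": 4, "region": "south", "value": 95},
    {"id": 5, "region": "north", "value": 9},
]

scored = analyze(records)
overview = show_important(scored, top_k=2)
south = zoom_filter(scored, "south")
south_rescored = analyze(south)  # analyze further, on the filtered subset only
detail = details_on_demand(records, overview[0]["id"])
```

In a real system each step would drive a visualization and the user would decide which filter to apply; here the choices are hard-coded to keep the sketch short.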

Page 30: Visual analytics


Applications

Page 31: Visual analytics


Page 32: Visual analytics

UTOPIAN: user-driven topic modeling

https://www.youtube.com/watch?v=du6_s6hcaRA&feature=youtu.be

Page 33: Visual analytics

SketchPadN-D: WYDIWYG Sculpting and Editing in High-Dimensional Space

https://www.youtube.com/watch?v=ar8kAdAfx6w

Page 34: Visual analytics


Some features

Page 35: Visual analytics


Relationship discovery

Scale independent representations, whole and parts at same time at multiple levels of abstraction, often linked

Page 36: Visual analytics


Relationship discovery

Explore high dimensional relationships, theme groupings, outlier detection, searching by proximity at multiple scales

Page 37: Visual analytics

Combined exploratory and confirmatory analytics

•  Develop and refine hypothesis
•  Evidence collection, management, and matching to hypothesis
•  Tailor views/displays for thematic/hypothesis focus of interest
•  Often suggestive of predictions enabling proactive thinking

Page 38: Visual analytics

Multiple data types

•  Supports multiple data types: structured/unstructured text
•  Imagery/video, cyber
•  Systems are either data-type specific or application specific

Page 39: Visual analytics

Temporal views and interactions

•  Most analytics situations involve time, pace, velocity
•  Group segments of thoughts by time
•  Compare time segments
•  Often combined with geospatial

Page 40: Visual analytics


Reasoning workspace

Stu Card, PARC

Page 41: Visual analytics

Grouping and outlier detection

•  Form groups of thought/data
•  Labels and annotation
•  Compare groupings
•  Find small groups or outliers
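As a rough illustration of forming groups and finding small groups, here is a minimal Python sketch. It is not from the lecture: the documents, their theme labels, the helper names, and the size threshold for what counts as a "small group" are all assumptions.

```python
from collections import defaultdict


def form_groups(items, key):
    """Group items under the label returned by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


def small_groups(groups, max_size=1):
    """Return the groups with at most max_size members (candidate outliers)."""
    return {
        label: members
        for label, members in groups.items()
        if len(members) <= max_size
    }


# Hypothetical (document, theme) pairs, for illustration only.
documents = [
    ("doc1", "finance"), ("doc2", "finance"), ("doc3", "health"),
    ("doc4", "finance"), ("doc5", "health"), ("doc6", "sports"),
]

groups = form_groups(documents, key=lambda d: d[1])
sizes = {label: len(members) for label, members in groups.items()}
outliers = small_groups(groups, max_size=1)
```

Comparing `sizes` across two different groupings of the same data is one simple way to compare groupings; the labels double as the annotations shown to the analyst.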


Page 42: Visual analytics

Labeling

•  Critically important
•  Dynamic in scope, number of labels, size, color
•  Positioning
•  Almost everything has labels
•  Labels convey semantic meaning

Page 43: Visual analytics


Multiple linked views

Temporal, geospatial, theme, cluster, list views with association linkages between views
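One common way to link views is to share a single selection between them, so that a selection made in one view is highlighted in all others. The Python sketch below is only an illustration under that assumption; the `LinkedViews` class, its methods, and the sample records are hypothetical and not part of any tool shown in the lecture.

```python
class LinkedViews:
    """Several views over the same records, linked through one shared selection."""

    def __init__(self, records):
        self.records = records
        self.selected_ids = set()

    def brush(self, predicate):
        """Select records in one view; the selection is shared by every view."""
        self.selected_ids = {r["id"] for r in self.records if predicate(r)}

    def view(self, field):
        """Return (field value, highlighted?) pairs for the view showing field."""
        return [(r[field], r["id"] in self.selected_ids) for r in self.records]


# Hypothetical records, for illustration only.
records = [
    {"id": 1, "year": 2012, "city": "Brussels", "theme": "energy"},
    {"id": 2, "year": 2013, "city": "Leuven", "theme": "health"},
    {"id": 3, "year": 2013, "city": "Brussels", "theme": "energy"},
]

views = LinkedViews(records)
views.brush(lambda r: r["city"] == "Brussels")  # selection made in the geospatial view
timeline = views.view("year")   # temporal view shows the same selection
themes = views.view("theme")    # theme view shows the same selection
```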

Page 44: Visual analytics


Challenges

Page 45: Visual analytics

Challenge 1: scalability

•  Challenge with regard to both:
–  visual representations
–  automatic analysis

•  Requirements/needs:
–  Solution needs to scale in size, dimensionality, data types, levels of quality.
–  Methods are needed to deal with input data that is noisy and continuous.
–  Patterns and relationships need to be visualized on different levels of detail, and with appropriate levels of data and visual abstraction.

Page 46: Visual analytics

Challenge 2: uncertainty

•  Challenge:
–  large amount of noise and missing values from heterogeneous data sources
–  bias introduced by automatic analysis methods as well as human perception

•  Requirements/needs:
–  representation of the notion of data quality and confidence
–  analysts need to be aware of the uncertainty and be able to analyze quality
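To make the idea of representing data quality and confidence concrete, here is a minimal Python sketch that reports an estimate together with two simple quality indicators. The choice of indicators (share of missing values, standard error of the mean), the function name, and the sample readings are assumptions for illustration; the lecture does not prescribe a particular measure.

```python
from math import sqrt
from statistics import mean, stdev


def quality_summary(values):
    """Summarize a series that may contain missing values (None).

    Assumes at least two values are present, since the sample standard
    deviation is undefined otherwise.
    """
    present = [v for v in values if v is not None]
    return {
        "estimate": mean(present),
        # Standard error of the mean: smaller means a more confident estimate.
        "std_error": stdev(present) / sqrt(len(present)),
        # Share of missing values: a basic data quality indicator.
        "missing_ratio": 1 - len(present) / len(values),
    }


# Hypothetical sensor readings, for illustration only.
readings = [20.0, None, 22.0, 21.0, None, 21.0]
summary = quality_summary(readings)
```

A visualization could then encode `std_error` as an error bar and `missing_ratio` as opacity, so the analyst sees the uncertainty alongside the value rather than a bare number.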

Page 47: Visual analytics

Challenge 3: text data stream

•  Analysis and visualization of text streams is still a relatively new field.
•  Text stream data often
–  have little structure, grammar and context,
–  are provided as a high-frequency multilingual stream,
–  contain a high percentage of non-meaningful and irrelevant messages.

•  The challenge is to handle the dynamic nature of the stream data and the low tolerance for monitoring delay in emergency cases.

Page 48: Visual analytics

Challenge 4: interaction

•  Novel interaction techniques and tangible user interfaces are needed for seamless, intuitive visual communication with the system.

•  The analyst should be able to
–  fully focus on the task at hand and
–  not be distracted by overly technical or complex user interfaces and interactions.

•  User feedback should be taken as intelligently as possible, requiring as little user input as possible.

Page 49: Visual analytics

Challenge 5: evaluation

•  Challenge:
–  it is hard to assess the quality of visual analytics solutions due to the interdisciplinary nature and complex process

•  Needs:
–  an evaluation framework to assess the effectiveness, efficiency and acceptance of new visual analytics techniques, methods, and models

Page 50: Visual analytics

Challenge 6: infrastructure

•  Challenge:
–  most solutions develop their own infrastructures
–  mismatch between the level of service provided and real needs:
•  fast and precise answers with progressive refinement,
•  incremental re-computation,
•  steering the computation towards data regions that are of interest.

•  Requirements/needs:
–  infrastructure to bind together all the processes, functions, and services
–  repositories of available visual analytics solutions
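Progressive refinement and incremental re-computation can be illustrated with a running aggregate that yields a usable answer after every chunk of data instead of only at the end. The Python sketch below is a simplified illustration, not an infrastructure described in the lecture; the function name, the chunk size, and the data are assumptions.

```python
def progressive_mean(values, chunk_size):
    """Yield (rows processed, current mean) after each chunk of data.

    The running total and count are updated incrementally, so earlier
    chunks are never re-read when new data is processed.
    """
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield count, total / count


# Hypothetical data, for illustration only.
data = [4, 8, 6, 2, 10, 6]
estimates = list(progressive_mean(data, chunk_size=2))
```

Because the function is a generator, a front end could redraw the display after every yielded estimate and stop early once the answer is precise enough. Steering would correspond to choosing which chunks to process next, which this sketch does not implement.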

Page 51: Visual analytics

http://www.visual-analytics.eu/

Page 52: Visual analytics


Questions?

Page 53: Visual analytics

Readings

•  Ch. 1-2

http://www.vismaster.eu/wp-content/uploads/2010/11/VisMaster-book-lowres.pdf

Page 54: Visual analytics

References

•  Amar, R. A., & Stasko, J. T. (2005). Knowledge precepts for design and evaluation of information visualizations. IEEE Transactions on Visualization and Computer Graphics, 11(4), 432-442.

•  Keim, D. A., Mansmann, F., Schneidewind, J., & Ziegler, H. (2006). Challenges in visual data analysis. In Information Visualization (IV 2006), Invited Paper, July 5-7, London, United Kingdom. IEEE Press.

•  Keim, D., Andrienko, G., Fekete, J.-D., Görg, C., Kohlhammer, J., & Melancon, G. (2008). Visual analytics: Definition, process, and challenges. In A. Kerren, J. T. Stasko, J.-D. Fekete, & C. North (Eds.), Information Visualization. Lecture Notes in Computer Science, Vol. 4950, 154-175. Springer-Verlag, Berlin, Heidelberg.

•  Shneiderman, B. (2002). Inventing discovery tools: Combining information visualization with data mining. Information Visualization, 1(1), 5-12.