Marti Hearst School of Information, UC Berkeley Visualization in Text Analysis Problems VAC Consortium Meeting Stanford, May 24, 2006


Marti Hearst
School of Information, UC Berkeley

Visualization in Text Analysis Problems

VAC Consortium Meeting
Stanford, May 24, 2006

Outline

Some Visualization Design Principles
  Illustrated with a new example
Why Text is Tricky to Visualize
How to do good visualization design with text while meeting analysts’ needs?
  Focus on Flexibility with Reproducibility
  Examples from 4 different domains

What Makes for a Good Visualization?

Visually illuminates important aspects of the underlying data and domain.

Supports the users’ tasks (better than without the visualization).

Adheres to good design principles.

Example from Software Engineering
Marat Boshernitsan, UC Berkeley PhD Dissertation 2006

Problem: need to make complex changes throughout code.
Example: convert from one API to another.

A Typical Solution

Either requires programmers to understand and manipulate abstract syntax trees …
Or requires learning another programming language (or both)!

First Attempt

Second Attempt

A Better Solution

Build on how programmers think about programming. Operate on the textual representation of code.

Users Operate on Familiar Visual Representation of Code

Context-and-Domain Sensitive Visual Cues

Lessons from this Example

User-centered Design
  This was the third attempt.
  First 2 attempts did not accurately reflect how users think about the problem.
  Careful design of labels and interaction cues
  Very intelligent backend, but user-activated.

Visually and interactively reflects how programmers think about programming.

What Makes for a Good Visualization for Analysts?

Visually illuminates important aspects of the underlying data and domain.

Supports the users’ tasks (better than without the visualization).

Adheres to good design principles.

Goals vs. Tasks

Analysts’ Goals:
  Understand current and past situations
  Predict and anticipate future situations

Observations by Pirolli & Card ’05:
  Different analysts starting with people, organizations, tasks, and time:
    predict coup likelihood
    understand bio-warfare threats
    understand relations within cartel

Goals vs. Tasks

Analysts’ tasks:
  Explore
  Extract
  Filter
  Link
  Arrange
  Compare
  Hypothesize

(A combination of Foraging and Sensemaking)
Should do the tasks only to support the goals.

Design Principles for Analysts

Experienced analysts notice what is missing or unexpected (Wright et al. ’06)

Thus consistency and reproducibility are important.

Design Principles for Analysts

Analysts must guard against confirmation bias. (Pirolli & Card ’05)

Thus it is important for analysts to
  Be able to easily arrange and re-arrange,
  View information flexibly from many angles,
  While at the same time retaining consistency and reproducibility.

However … it’s hard to do this with text.

Working with Text

Text is especially difficult to visualize
  Very high dimensionality
    Tens to hundreds of thousands of features
  Compositional
    Can be combined together in innumerable ways
  Abstract
    And so difficult to visualize
  Not pre-attentive
    Must foveate to read
  Subtle
    Small differences matter
  Unordered

Text Meaning is NOT pre-attentive

SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC

Why Text is Tough

Abstract concepts are difficult to visualize
Combinations of abstract concepts are even more difficult to visualize
  time
  shades of meaning
  social and psychological concepts
  causal relationships

Why Text is Tough

The dog.

Why Text is Tough

The dog.

The dog cavorts.

The dog cavorted.

Why Text is Tough

The man.

The man walks.

Why Text is Tough

The man walks the cavorting dog.

So far, we can sort of show this in pictures.

Why Text is Tough

As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls.

How do we visualize this?

Why Text is Tough

Language only hints at meaning
  Most meaning of text lies within our minds and common understanding
  “How much is that doggy in the window?”
    how much: social system of barter and trade (not the size of the dog)
    “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own
    “in the window” implies behind a store window, not really inside a window, requires notion of window shopping

Why Text is Tough

General categories have no standard ordering (nominal data)

Categorization of documents by single topics misses important distinctions

Consider an article about
  NAFTA
  The effects of NAFTA on truck manufacture
  The effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez

Why Text is Tough

Other issues about language
  Ambiguous (many different meanings for the same words and phrases)
  Same meaning implied by different combinations
  Different combinations imply different meanings

Why Text is (Deceptively) Easy

Text is easier when you have a lot of it
  Web search is now usually conjunction
Text has a lot of redundancy

A very simple algorithm can:
  Pull out “important” phrases
  Find “meaningfully” related words
  Create a “summary” from document
  Group “related” documents
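
To make the "very simple algorithm" concrete, here is a minimal sketch. It assumes plain word counting is the kind of technique meant; the slides do not name one. The stopword list and the names `tokenize`, `top_bigrams`, and `jaccard` are illustrative assumptions, not part of the talk.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_bigrams(text, k=3):
    """Return the k most frequent word pairs (after stopword removal)
    as candidate 'important' phrases."""
    tokens = tokenize(text)
    pairs = Counter(zip(tokens, tokens[1:]))
    return [" ".join(pair) for pair, _ in pairs.most_common(k)]


def jaccard(text_a, text_b):
    """Vocabulary overlap between two documents: a crude 'relatedness' score."""
    a, b = set(tokenize(text_a)), set(tokenize(text_b))
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Hypothetical input built from the NAFTA example wording.
doc = ("truck manufacture in El Paso. truck manufacture in Juarez. "
       "effects on truck manufacture.")
phrases = top_bigrams(doc, k=1)
```

Frequency alone decides what counts as "important" or "related" here, which is why the scare quotes on the slide are warranted.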

Simple Text Analysis can Mislead

Most frequent words
  Biases towards concepts with unique identifiers.

From Spink, Wolfram, Jansen, Saracevic, JASIS ’01

Major Trends vs. Minor Discoveries

With text, it’s easy to extract and show the largest, main trends

But often we want the rare but unexpected and important event:
  Russian oil company example
  Schwarzenegger and Enron
  Cigarettes and kids
  Person on the periphery who is working stealthily to influence things

This is really difficult to solve!

Design Principles for Analysts

Experienced analysts notice what is missing or unexpected.

Analysts must guard against confirmation bias.
  Need to be able to easily arrange and re-arrange,
  View information flexibly from many angles,
  While at the same time retaining consistency and reproducibility.

Interfaces should reflect the domain and data.

How to achieve this with text collections?
  Must transform text in understandable ways
  Must provide multiple, consistent views that nevertheless allow for new discovery and insight

Why Emphasize Flexibility?

Can’t view representations of all the text content at once.

Instead, need ways to flexibly navigate, group, organize, explore

See important pieces over time.

The Importance of Flexibility

Russell, Slaney, Qu, Houston ’05
  The ease of viewing and manipulation in the system strongly influenced the kind of analysis operations done.

Examples of Flexibility on Text Data

PaperLens (Conference proceedings)
TAKMI (Customer service requests)
Faceted Browsing (e-commerce)
  Flamenco
  eBay Express
  FaThumb
TRIST and Sandbox (Analysts)

Flexible views

InfoVis 2004 contest
  Visualize 8 years of conference proceedings
  Tasks:
    1. Static Overview of 10 years of InfoVis
    2. Characterize the research areas and their evolution
    3. The people in InfoVis
    4. Which papers/authors are most often referenced?
    5. How many papers conducted a user study?

PaperLens integrated solution by Lee, Czerwinski, Robertson, Bederson

Uses graphical elements and brushing and linking to flexibly elucidate a collection’s contents.
http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml
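
Brushing and linking can be sketched as a shared selection that every view listens to. The following is a minimal illustration of that general idea, not PaperLens code; the class name `LinkedViews`, its methods, and the paper records are all hypothetical.

```python
class LinkedViews:
    """Shared selection model: brushing in one view updates every registered view."""

    def __init__(self, records):
        self.records = records   # list of dicts, one per paper
        self.selected = set()    # indices of the currently brushed records
        self.listeners = []      # one callback per registered view

    def register(self, callback):
        """Add a view; it is called with the selected indices after each brush."""
        self.listeners.append(callback)

    def brush(self, predicate):
        """Select every record matching the predicate, then notify all views."""
        self.selected = {i for i, r in enumerate(self.records) if predicate(r)}
        for callback in self.listeners:
            callback(self.selected)


# Invented placeholder records, for illustration only.
papers = [
    {"title": "Paper A", "year": 1999, "topic": "graphs"},
    {"title": "Paper B", "year": 2000, "topic": "trees"},
    {"title": "Paper C", "year": 2001, "topic": "graphs"},
]

views = LinkedViews(papers)
year_view, author_view = [], []      # stand-ins for two on-screen views
views.register(year_view.append)
views.register(author_view.append)
views.brush(lambda r: r["topic"] == "graphs")
```

Because the selection lives in one place, each view can render it differently while staying consistent with the others.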

Flexibility in Foraging and Analysis

TAKMI, by Nasukawa and Nagano, ’01

The system integrates:
  Analysis tasks (customer service help)
  Content analysis
  Information Visualization

Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001

Documents containing “windows 98”

Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001

Patent documents containing “inkjet”, organized by entity and year
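
A view like "documents containing a keyword, organized by entity and year" boils down to a filtered cross-tabulation. The sketch below assumes simple substring matching and uses invented records; it is not TAKMI's implementation, and the field names (`text`, `assignee`, `year`) are hypothetical.

```python
from collections import Counter


def crosstab(documents, keyword, row_field, col_field):
    """Count documents whose text contains the keyword, grouped by two metadata fields."""
    counts = Counter()
    for doc in documents:
        if keyword.lower() in doc["text"].lower():
            counts[(doc[row_field], doc[col_field])] += 1
    return counts


# Made-up records for illustration only; not data from TAKMI.
patents = [
    {"text": "Inkjet nozzle design", "assignee": "Company A", "year": 1998},
    {"text": "Inkjet ink formulation", "assignee": "Company A", "year": 1999},
    {"text": "Laser printer drum", "assignee": "Company B", "year": 1998},
    {"text": "Improved inkjet head", "assignee": "Company B", "year": 1999},
    {"text": "inkjet cartridge seal", "assignee": "Company A", "year": 1999},
]

by_entity_and_year = crosstab(patents, "inkjet", "assignee", "year")
```

Swapping `row_field` and `col_field`, or changing the keyword, re-arranges the same collection without changing the underlying counts, which is the flexible yet reproducible behavior the talk argues for.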

Flexibility in Category Navigation

Browsing Information Collections using (Hierarchical) Faceted Metadata

What are facets?

Sets of categories, each of which describes a different aspect of the objects in the collection.

Each of these can be hierarchical.
(Not necessarily mutually exclusive nor exhaustive, but often that is a goal.)

Time/Date
Topic
GeoRegion

Facet example: Recipes

Course
  Main Course
Cooking Method
  Stir-fry
Cuisine
  Thai
Ingredient
  Red Bell Pepper
  Curry
  Chicken
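
Faceted metadata can be sketched as one list of values per facet on each item, with filtering and per-facet counts computed on demand. This is a minimal illustration, not Flamenco's implementation. The first record reuses the facet values from the slide (treating Curry as an ingredient is an assumption); the other records and the function names are invented.

```python
from collections import Counter

# First record uses the slide's facet values; the rest are invented for illustration.
recipes = [
    {"name": "Recipe 1", "Course": ["Main Course"], "Cooking Method": ["Stir-fry"],
     "Cuisine": ["Thai"], "Ingredient": ["Red Bell Pepper", "Curry", "Chicken"]},
    {"name": "Recipe 2", "Course": ["Main Course"], "Cooking Method": ["Grill"],
     "Cuisine": ["Thai"], "Ingredient": ["Chicken"]},
    {"name": "Recipe 3", "Course": ["Dessert"], "Cooking Method": ["Bake"],
     "Cuisine": ["French"], "Ingredient": ["Apple"]},
]


def filter_by_facets(items, selections):
    """Keep items that carry every selected value in the corresponding facet."""
    return [
        item for item in items
        if all(value in item.get(facet, []) for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet (useful for query previews)."""
    return Counter(value for item in items for value in item.get(facet, []))


thai_chicken = filter_by_facets(recipes, {"Cuisine": "Thai", "Ingredient": "Chicken"})
remaining_methods = facet_counts(thai_chicken, "Cooking Method")
```

Selections can be made in any facet order and always yield the same result set, so navigation stays flexible while the information structure stays consistent.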

Nobel Prize Winners Collection

New Site: eBay Express

Is This Visualization?

Prior experience and other people’s attempts seem to suggest that fewer graphics and more text is better.

Details of layout, font and color contrast, label selection, and interaction make all the difference.

Earlier Variation on the Idea

Cat-a-Cone, 1997

Mobile Variation

FaThumb: Karlson, Robertson, Robbins, Czerwinski, Smith ’06
Well-received, but visualization part not looked at.

Flexibility in SenseMaking

DLITE by Cousins et al. ’97
Sandbox by Wright et al. ’06

Flexibility in Sensemaking
TRIST, Jonkers et al. ’05

TRIST (The Rapid Information Scanning Tool) is the work space for Information Retrieval and Information Triage.

Figure callouts:
  Query History
  Entities
  Dimensions
  Launch Queries
  Annotated Document Browser
  Comparative Analysis of Answers and Content
  User Defined and Automatic Categorization
  Rapid Scanning with Context
  Linked Multi-Dimensional Views Speed Scanning

Flexibility for Sensemaking Support
Sandbox, Wright et al. ’06

Figure callouts:
  Quick Emphasis of Items of Importance.
  Direct interaction with Gestures (no dialog, no controls).
  Dynamic Analytical Models.
  Assertions with Proving/Disproving Gates.

Communication-Centric Text

Email, conversations, blogs
The first thought is usually nodes and links
  Doesn’t have the desired flexibility
Some alternatives:
  The Network
  Multivariate Networks

Re-envisioning Networks

Viewing people’s shared workplaces, hometowns, schools over time.
www.theyrule.net:

Re-envisioning Networks

First cut: Hastings, Snow, and King ’05

Re-envisioning Networks

Better version: Hastings, Snow, and King ’05

Re-envisioning Networks

Wattenberg ’06
OLAP on directed labeled graphs

Network Flexibility

Martin Wattenberg, “Visual Exploration of Multivariate Graphs”

Figure labels: M, F; Location A, Location B, Location C, Location D, Location E
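
One way to read "OLAP on directed labeled graphs" is as a roll-up: nodes sharing an attribute value collapse into one group, and edges are counted between groups. The sketch below illustrates that reading only; it is not Wattenberg's implementation, and the node attributes, people, and edges are invented.

```python
from collections import Counter


def roll_up(nodes, edges, attribute):
    """Aggregate a directed graph by one node attribute.

    Returns (group sizes, edge counts between groups).
    """
    group_sizes = Counter(attrs[attribute] for attrs in nodes.values())
    edge_weights = Counter(
        (nodes[src][attribute], nodes[dst][attribute]) for src, dst in edges
    )
    return group_sizes, edge_weights


# Hypothetical people and communication links, for illustration only.
nodes = {
    "p1": {"gender": "M", "location": "A"},
    "p2": {"gender": "F", "location": "A"},
    "p3": {"gender": "F", "location": "B"},
    "p4": {"gender": "M", "location": "B"},
}
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p3")]

gender_sizes, gender_edges = roll_up(nodes, edges, "gender")
location_sizes, location_edges = roll_up(nodes, edges, "location")
```

Rolling the same graph up by a different attribute gives a different but repeatable summary, which is the kind of flexibility a plain node-link layout lacks.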

Re-envisioning Networks

Idea: vary these ideas to apply to email and other communication text.

Summary: Text Viz Design Guidelines

An emphasis on flexible views on text data
  Emphasize brushing and linking using appropriate visual cues.
  Interaction flow should guide the user but also be flexible.
  Information structure should be consistent and reproducible.
Other guidelines:
  Make text visible.
  Visual components should reflect the data and tasks.

Thank you!

www.sims.berkeley.edu/~hearst