TRANSCRIPT
Visualization and Interaction for Business and Entertainment
MSR UW Workshop
2007
HCI and SWE: Tool Support for Crazed Developers
Mary Czerwinski, Research Area Manager, Human-Centered Computing
Manager, VIBE, Microsoft Research
Overview: HCI and Our Research Efforts in the SWE Domain
• Why UCD? Background on psychology
  – Learning, memory, and perception
  – Traditional view of HCI
• Methodologies and when they are useful
• Information worker and developer productivity and group awareness
• Future directions
Why user-centered design?
• Cost savings (well documented; see Nielsen, 1993)
  – Not always directly visible (support calls, resales, product returns, distributed productivity benefits to users, SW development costs)
• Competitive market--user expectations
• Political demands
• Help might not help
What is Usable SW?
Useful - Does it do what is needed? (teach, find, manage $, communicate, share, escape)
Usable - Is it easy to learn? Is it efficient to use? Do few or no errors occur? Is it easy to remember?
Desirable - Is it fun to use? Do you want to keep using it again and again?
What Can Research Tell us about Making Usable Software?
• Psychological research on human cognitive abilities:
  – Attention; visual perception
  – Memory; learning
• Research on human-computer interaction
  – Applied research; task-oriented studies
  – Heuristics for software design
Basic Cognitive Principles: Memory
• Associations are built by repetition
• Scaffold model - more likely to remember items that have many associations
• Recognition is easier than recall
• Working memory has small capacity (time & size)
• Long-term memory has large capacity (time & size)
Basic Cognitive Principles: Attention
• Attention is a resource - gets divided between the different senses, different tasks
• Automatic, well-learned processes don’t require much attention so we can concentrate on new items
• Good design can
  – Provide information where it is needed
  – Make the observer focus on one part of the display
  – Prime an observer so they're biased toward what you want them to see
Basic Cognitive Principles: Visual Perception
• We excel at pattern recognition
• We automatically try to organize visual displays and look for cues about what the organization should be - gestalt principles
• Motion, grouping, contrast, color can make different parts of a display more or less salient
Basic Cognitive Principles: Memory, Attention, and Visual Perception Interact
[Diagram: MEMORY, ATTENTION, and PERCEPTION interacting in a loop - "What is this feature? Does it match the task?", recognition pulling information from memory, and feedback.]
HCI: Some Important Facts about Human Learning
• Learning is improved by organization– Also, grouping and levels of processing
• Consistency and mnemonics improve learning
• Targeted feedback facilitates learning
• Learning occurs across people and organizations
HCI: Human Learning Facts continued…
• Learning proceeds faster and more effectively when info is presented incrementally
• Some users like to explore systems to learn; others will not
• Workers focus on accomplishing tasks, not learning software
What Can Research Tell us about Making Usable Software?
• Research on human-computer interaction
  – Applied research; task-oriented lab studies
  – Heuristics for software design
  – In situ studies
  – Logging
  – Surveys
Usability in your product cycle: the earlier the better!
Planning
• Establish usability goals
• Field research--tasks
• Cognitive modeling
• Competitive testing
• Participatory design
• UI design guidelines
• Applied research
• PSS communication
• Roundtables
• Low fidelity prototyping
• Focus groups
• Surveys
Development
• Iterative test and design
• Heuristic evaluation
• Spec reviews
• Low/hi fidelity prototyping
Quality Assurance
• Competitive testing
• Field testing
• PSS communication
Toward user-centered design…early stages of cycle
• Modeling customers' activities (even mental ones)
  – Understand activities, then create a solution
  – GOMS-style models (a worked sketch follows below)
  – A way to share information as a team
• Generate multiple solutions
• Develop usability goals
  – Measuring against clear, quantifiable goals
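To make the GOMS idea concrete, here is a minimal Keystroke-Level Model sketch in Python. The operator times are the standard Card, Moran & Newell values; the example task sequence is hypothetical, not from the talk.

```python
# Minimal Keystroke-Level Model (KLM) sketch: predicted expert task time is
# the sum of primitive operator times (standard values from Card, Moran &
# Newell). The example task sequence below is hypothetical.
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke (average-skill typist)
    "P": 1.10,  # point with the mouse at a target
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_time(operators: str) -> float:
    """Sum operator times for a sequence such as 'HMPK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)

# Hypothetical task: home to mouse, think, point at a menu item, click,
# think again, then type a 5-character name.
print(predict_time("HMPK" + "M" + "KKKKK"))  # ~5.88 s
```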
Usability metrics--data
♦Collecting data: video, protocols, subjective ratings and objective observations; debrief
♦Averages: times, % error time, # of trials before success, # of experimenter interventions, subjective ratings, # of task interrupts, % completed
♦Usability issues with # of Ss
♦Look for patterns and lines of converging evidence
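As a concrete illustration of turning session records into these averages, here is a small Python sketch; the trial fields and numbers are made up, not data from any study mentioned here.

```python
# Hypothetical per-trial records from a lab session; field names and values
# are illustrative only.
trials = [
    {"time_s": 95,  "errors": 1, "completed": True,  "rating": 4},
    {"time_s": 140, "errors": 3, "completed": False, "rating": 2},
    {"time_s": 80,  "errors": 0, "completed": True,  "rating": 5},
]

n = len(trials)
mean_time = sum(t["time_s"] for t in trials) / n
pct_done = 100 * sum(t["completed"] for t in trials) / n
mean_err = sum(t["errors"] for t in trials) / n
mean_rating = sum(t["rating"] for t in trials) / n
print(f"mean time {mean_time:.0f} s, {pct_done:.0f}% completed, "
      f"{mean_err:.1f} errors/trial, rating {mean_rating:.1f}/5")
```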
Development stage: design, test & redesign
• Not traditional “waterfall” model
• Developing low-fi/hi-fi prototypes– Formative and heuristic evaluations first
• Test with a small number of users
  – Nielsen's famous number 6
• Redesign based on feedback
• Evaluate again
Toward beta….
♦ Identify usability “showstoppers” before ship; fit and finish (e.g., audio tweaks, aesthetics)
♦Competitive benchmarking
♦Prioritize usability enhancements for next version
♦Field research to understand real usage of products in context and usability opportunities
Ship
♦Usability issues and recommendations for v. 2.0
♦Important to mark specifics down and publish so that positives and negatives of the design solution are archived
♦Usability issues should be tracked with PSS if unresolved
Important considerations...
♦Ethical treatment of Ss, consent forms and NDAs
♦Statistical power and significance
♦Guided exploration v. free discovery, learning v. initial use
♦Validity, reliability, and generalizability
♦Objectivity
Cautions about lab testing
• Doesn’t tell you what to design--structured user visits and interviews do
• We set the tasks, the design, and the analysis
• Best case performance
• Look for patterns of behaviors--the usability issues with the UI design; not necessarily hypothesis testing (but may in competitive or complex studies)
Some Examples
• Now, some examples of how we do user-centered design for information workers (iworkers) and developers
• Work with Rob DeLine, Gina Venolia, George Robertson, Andy Begel, Kori Inkpen, and many others
iWorker Diary Study: Motivation
• Hypothesis: Current software does not support multitasking well
  – How bad/universal is the problem?
• Seek SW design ideas…
  – Research shows users developing workaround strategies
  – Interruptions research shows harmful effects of incoming notifications on the current task
  – Memory for To Dos poor, undersupported
  – Need to better understand task switching and multitasking
Method
• 10 multitasking users recruited
• An Excel spreadsheet was used as a diary "template" to be filled out each day
• Diaries emailed back to me each evening
• Participants instructed to write down every "task switch"
  – How hard to switch, # of docs required, # of interrupts experienced, task time, anything forgotten, notes, etc.
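A minimal sketch of one diary row as a record, with fields mirroring what participants were asked to log; the exact column layout of the original Excel template is not known, so these names are assumptions.

```python
from dataclasses import dataclass

# One row of the daily diary "template". Field names are guesses based on
# the fields listed above, not the original spreadsheet's columns.
@dataclass
class TaskSwitch:
    task: str
    difficulty: int       # 1 = low, 2 = medium, 3 = high
    num_docs: int
    num_interrupts: int
    minutes: float
    forgot_something: bool
    notes: str = ""

day = [
    TaskSwitch("email triage", 1, 2, 1, 25, False),
    TaskSwitch("project report", 3, 6, 3, 90, True, "waiting on data"),
]
print(sum(1 for s in day if s.forgot_something),
      "switches with something forgotten")
```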
Partial diary for MS (6 hours)
About the same time…Large Display Findings
• Started exploring how user behavior changes as displays increase in size and resolution
• Found that users were significantly more productive when performing knowledge work (multitasking, task switching) with large displays
• Less window management = less cognitive load
• But still needed help with task management
• Created robust logger to determine how windowing behavior changed with larger displays
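VibeLog itself is not reproduced here; as a stand-in, this Windows-only Python sketch polls the Win32 foreground window via ctypes and logs every focus switch - the simplest form of the windowing data such a logger collects.

```python
# Windows-only sketch in the spirit of VibeLog: poll the foreground window
# and record every focus switch with a timestamp. The real logger was far
# richer (window sizes, moves, z-order); this is a minimal stand-in.
import ctypes
import time
from datetime import datetime

user32 = ctypes.windll.user32

def foreground_title() -> str:
    hwnd = user32.GetForegroundWindow()
    buf = ctypes.create_unicode_buffer(256)
    user32.GetWindowTextW(hwnd, buf, 256)
    return buf.value

last = None
while True:
    title = foreground_title()
    if title != last:
        print(datetime.now().isoformat(timespec="seconds"), "->", title)
        last = title
    time.sleep(1.0)  # 1 Hz polling is enough for task-switch analysis
```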
Tools for Task Management
• GroupBar joins related items in the taskbar, remembers spatial layouts of tasks (Smith et al., 2003)
  – Desktop "snapshots"
  – Can "rehydrate" tasks with the press of a button
• Scalable Fabric and VibeLog (AVI 2004)
  – Over 5000 downloads of SF
  – Logging of task activity
Color Plate 1. Scalable Fabric showing the representation of three tasks as clusters of windows, and a single window being dragged from the focus area into the periphery.
Visualization and Interaction for Business and Entertainment
MSR UW Workshop
2007
Clipping Lists and Change Borders
Peripheral Information Display
Tara Matthews, Mary Czerwinski, George Robertson, and Desney Tan
Study of Proposed Solutions: Clipping Lists and Change Borders
• Compare interfaces w/ varying types of abstraction
  – All interfaces based on Scalable Fabric (SF)
• Abstraction types:
  – Change detection
  – Semantic content extraction
• 4 interfaces (2 x 2):
  – SF (baseline)
  – SF + Change Detection
  – Semantic Content Extraction (Clippings)
  – Semantic Content Extraction + Change Detection
Baseline: Scalable Fabric
• Tasks as piles
• Windows shrunken
SF Clippings
SF + Change Detection
Clippings + Change Detection
Change Borders
• Adds red borders around windows with changing content
• Border turns green when the change is complete
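The slides do not say how change detection was implemented; one plausible minimal sketch hashes periodic screenshots of a window region - a changed hash maps to the red border, and a hash that stays stable for a quiet period maps to green. The window rectangle below is hypothetical.

```python
# Hash-based change detection sketch: a changed hash means the window is
# "changing" (red border); a hash stable for a quiet period means the
# change is complete (green border).
import hashlib
import time

from PIL import ImageGrab  # pip install Pillow; Windows/macOS only

def region_hash(bbox) -> str:
    return hashlib.md5(ImageGrab.grab(bbox=bbox).tobytes()).hexdigest()

bbox = (0, 0, 400, 300)  # hypothetical window rectangle
prev, quiet_since = region_hash(bbox), time.time()
for _ in range(30):      # poll every 2 s for one minute
    time.sleep(2)
    cur = region_hash(bbox)
    if cur != prev:
        prev, quiet_since = cur, time.time()
        print("content changing -> red border")
    elif time.time() - quiet_since > 6:
        print("stable for 6 s -> green border")
```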
Clipping Lists
• Extracts window content
• Two ways to select content
  – Default: title bar
  – User-defined: WinCuts
  – Future: AI
• Goal of selection:
  – Help w/ recognition, resumption timing, and flow
Clipping Lists + Change Borders
• Extracts window content
• Adds green highlight to task boundary & windows that have changed
Study Results
• Semantic content extraction (Clipping Lists)
  – Is more effective than both change detection and scaling
  – Significantly benefits:
    • Task flow
    • Resumption timing
    • Reacquisition
[Chart: average task times in seconds (y-axis 540-700 s) for the four conditions: SF, SF + Change, Clippings, Clippings + Change.]
[Chart: average time to resume the quiz in seconds (y-axis 0-90 s) for the four conditions: SF, SF + Change, Clippings, Clippings + Change.]
Programmer Productivity: Team Tracks w/Rob Deline et al.
• We have observed devs struggling with unfamiliar code– Inefficient navigation to find task-relevant code– Misleading results of text searches– Disorientation from too much navigation, too many
open files, interruptions– [DeLine, Khella, Czerwinski, Robertson SoftVis ’05],
[Ko, Aung, Myers ICSE ’05]
• Team Tracks guides code exploration
  – Records the team's code navigation during development
  – Mines that data to prune the working set and guide navigation (a minimal sketch follows below)
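A minimal sketch of the frequency-based mining step, with an invented event format: count the team's recorded visits per code item and keep only the most-visited items as the pruned working set. This illustrates the idea, not Team Tracks' actual algorithm.

```python
from collections import Counter

# Hypothetical navigation log: (developer, code item) pairs recorded
# during development.
nav_log = [
    ("alice", "Parser.Parse"), ("alice", "Lexer.Next"),
    ("bob", "Parser.Parse"), ("bob", "Parser.Error"),
    ("bob", "Parser.Parse"), ("alice", "Parser.Error"),
]

visits = Counter(item for _, item in nav_log)

def pruned_working_set(min_visits: int = 2):
    """Keep only items the team visited at least min_visits times."""
    return [item for item, n in visits.most_common() if n >= min_visits]

print(pruned_working_set())  # ['Parser.Parse', 'Parser.Error']
```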
Evaluating Team Tracks
• Study 1: Does nav frequency indicate importance?
  – Setup: Four programming tasks, then ratings questionnaire and quiz
  – Dependent measures: code paths, task completion, ratings, quiz scores
  – Hypothesis: Navigation frequency correlates with importance rating [reported at SoftVis '05]
• Study 2: Does Team Tracks improve productivity?
  – Use Team Tracks with Group 1's navigation data
  – Same setup and dependent measures
  – Hypothesis: Team Tracks improves task completions and quiz scores
Navigation frequency does correlate with importance ratings
• Pearson product-moment correlation, r = 0.79, p < 0.01
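For reference, here is the same statistic computed with scipy on made-up data; the arrays below are illustrative, not the study's data.

```python
# Pearson product-moment correlation between per-item navigation frequency
# and rated importance (illustrative numbers only).
from scipy.stats import pearsonr

nav_freq = [12, 3, 7, 1, 9, 15]   # visits per code item
importance = [5, 2, 4, 1, 4, 5]   # participant ratings, 1-5
r, p = pearsonr(nav_freq, importance)
print(f"r = {r:.2f}, p = {p:.3f}")
```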
Team Tracks does improve task completion rates and quiz scores
• Improved task completion rates
  – All completed tasks 1 and 2
  – Task 3 (localized code): 1/7 without, 3/9 with Team Tracks
  – Task 4 (dispersed code): 1/7 without, 7/9 with Team Tracks
• Group 2 quiz scores significantly higher, t(16) = -2.04, p < .03
• Next steps
  – IE 8.0 team deployment ethnography next
  – Added annotations and other features
Visualization and Interaction for Business and Entertainment
MSR UW Workshop
2007
Code Thumbnails: Using Spatial Memory to Navigate Source Code (with larger displays)
DeLine, Czerwinski, Meyers, Venolia, Drucker, Robertson ▪ VL/HCC 06
Code navigation is a problem
• Recent studies have documented the problem
  – Ko, Aung, and Myers 2005: 35% of developer task time is navigation
  – DeLine, Khella, Czerwinski, Robertson 2005 report disorientation
• Current navigation UI relies on remembering symbols
  – Most common are text search, symbol search, file boxes, project tree view, class tree view
• Could developers use their spatial memory instead?
Code Thumbnails is designed to leverage spatial memory
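One way such a thumbnail might be rendered (the actual Code Thumbnails rendering is not reproduced here): one pixel row per source line, preserving indentation and line length so each file keeps a stable visual shape that spatial memory can latch onto. The file path in the usage line is hypothetical.

```python
# Sketch of rendering a code thumbnail: one pixel row per source line, with
# indentation and line length preserved so the file keeps a recognizable
# visual shape. Colors and scale are guesses, not the shipped rendering.
from PIL import Image  # pip install Pillow

def thumbnail(source: str, width: int = 120) -> Image.Image:
    lines = source.splitlines() or [""]
    img = Image.new("RGB", (width, len(lines)), "white")
    px = img.load()
    for y, line in enumerate(lines):
        indent = len(line) - len(line.lstrip())
        for x in range(indent, min(len(line), width)):
            px[x, y] = (60, 60, 60)  # draw the text body as dark pixels
    return img

# Hypothetical usage on some source file:
with open("Program.cs", encoding="utf-8", errors="ignore") as f:
    thumbnail(f.read()).save("thumb.png")
```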
Formative Evaluation
• Areas for feedback
  – Do developers like Code Thumbnails?
  – Do developers find CT useful for navigation?
  – Do developers form a spatial memory of the CT visualizations?
• Participants
  – 11 developers (10 external, 1 MS), average 15 years of experience
Task structure
• Two-hour sessions
  – Introduction to Code Thumbnails (10 min)
  – Three programming tasks on 3000 KLOC C# code (75 min)
  – Targeted search (10 min)
  – Spatial memory quiz (10 min)
  – Survey and feedback (15 min)
• IDE operations logged for 5 participants
  – Includes both CT and standard features
  – Collected during programming tasks and targeted search
High survey marks
[Chart: average responses on a 1-5 unfavorable-to-favorable scale for learnability, ease of use, preference, satisfaction, global navigation, utility, divided attention, local navigation, and lack of frustration.]
Frequent use during programming tasks
[Chart: percent of actions per participant (1-5), broken down by operation: click symbol search result, Solution Explorer, Go To Definition, click text search result, CTD double-click, CTD thumbnail click, CTD title click, CTS scrollbar scroll, CTS thumbnail click.]
Frequent use during targeted search
• Find fifteen targets, using any feature
  – Find five files by name
  – Find five methods by name
  – Find five methods by functional description
• Often, multiple operations were used per search trial
  – e.g., CT Desktop to select a file, then scrolling within the file
• CT Desktop used in more trials than other operations
  – CT Desktop used in 64% of trials
  – Text search used in 16%
  – CT Scrollbar used in 11%
  – Solution Explorer used in 8%
Spatial memory quiz
• File searches significantly slower than method searches (just Fitts's law; see the worked example below)
• File searches significantly slower without thumbnails
• Method searches not significantly slower without thumbnails (always fast)
• Frequently accessed files had smaller first-click distance (368 pixels vs. 511)
[Chart: quiz times across four blocks - 5 files by name, 5 methods by name.]
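The Fitts's law point can be made concrete: pointing time grows with the index of difficulty log2(D/W + 1), so the smaller 368 px first-click distances predict faster target acquisition than 511 px. The constants a and b below are illustrative, not fitted to this study's data.

```python
import math

# Fitts's law (Shannon formulation): MT = a + b * log2(D/W + 1).
# a, b, and the 30 px target width are illustrative values.
def movement_time(distance_px: float, width_px: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    return a + b * math.log2(distance_px / width_px + 1)

# Reported first-click distances: 368 px (frequent files) vs. 511 px.
for d in (368, 511):
    print(d, "px ->", round(movement_time(d, width_px=30), 2), "s")
```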
Related work
• Seesoft– Used code thumbnails to show statistics per line
• Eclipse scrollbar– Shows errors and file result as tick marks
• Aspect Browser– Shows search results in Seesoft style to help find aspects
• Data Mountain– Replacement for web Favorites leveraging spatial memory
Visualization and Interaction for Business and Entertainment
MSR UW Workshop
2007
FASTDash: A Visual Dashboard for Fostering Awareness in Software Teams
Jacob T. Biehl*, Mary Czerwinski, Greg Smith & George G. Robertson
*Department of Computer Science, University of Illinois
VIBE Group, Microsoft Research
Problem
• Dev coordination breakdowns are frequent and costly
  – Defects cost ~$60 billion to the US economy [NIST '02]
• Actions of team members are difficult to acquire
  – Common, unsatisfied information need [Ko et al. '07]
• Lack of techniques for gaining awareness information
Contextual Inquiry
• 90 surveys/13 structured interviews with MS developers
• Key set of detailed information needed– What source files are team members working in?– How are those files being used?– Are the files changing? If so, what parts?– Am I affected by the changes?
• Scattered resources used– Source code– Emails/IMs– Diagrams/notes on whiteboard– Bug DBs, check-in logs, status reports
• Frequently changing information
FASTDash
• Map information need onto a visualization
• Combines multiple sources of activity information
  – Source repository actions (e.g., check-ins, check-outs, conflicts)
  – Active file actions (e.g., open files, changing files, edit/debug state, etc.)
  – Project-related comments/notes (e.g., status, assistance messages)
• Designed to be a persistent visualization
• Targeted for project groups of 2-8 programmers
System Design
• Works automatically alongside existing tools
• Source repository independent
  – Works with SourceDepot and Team Foundation Server
  – Extendable to others (e.g., CVS or SVN)
• Utilizes IDE plug-in capabilities
  – Currently implemented for Visual Studio
  – Other IDE plug-ins could be easily integrated (e.g., Eclipse)
• SQL database to centrally manage information
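A sketch of that central store, with sqlite3 standing in for the real SQL back end and a guessed-at schema for the kind of activity rows the slides describe; each IDE plug-in would push events here and the dashboard would poll them.

```python
import sqlite3
from datetime import datetime, timezone

# Central activity store sketch. FASTDash used a SQL database; sqlite3 is a
# stand-in here, and the schema is an assumption based on the slide above.
db = sqlite3.connect("fastdash.db")
db.execute("""CREATE TABLE IF NOT EXISTS activity (
    ts TEXT, developer TEXT, source_file TEXT,
    action TEXT,   -- e.g. 'checkout', 'checkin', 'open', 'edit', 'debug'
    note TEXT)""")

def report(developer, source_file, action, note=""):
    """Called by an IDE plug-in whenever a tracked action occurs."""
    db.execute("INSERT INTO activity VALUES (?,?,?,?,?)",
               (datetime.now(timezone.utc).isoformat(),
                developer, source_file, action, note))
    db.commit()

report("alice", "Parser.cs", "checkout", "fixing bug 1234")
# Dashboard side: who currently has files checked out?
for row in db.execute(
        "SELECT developer, source_file FROM activity WHERE action='checkout'"):
    print(row)
```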
Study
• Evaluate impact on programmer awareness and overall behavior
• Observation-based field study
  – Provide semi-longitudinal exposure
  – Actual projects/workspace
  – Impact on use of existing practices/tools
• 6 experienced programmers
  – µ = 12.3 years as professional developers
Methodology
• Coding scheme
  – Influenced by existing coding schemes
  – 5 coding categories (further details in paper)
    • Communication, shared display use, shared physical artifact use, collaboration type, collaboration configuration
  – Can be leveraged/applied in future studies
• Pre/post design– 2 days pre-visualization– 2 days with visualization
• Other measures include situation awareness ratings and pre/post questionnaires
Workspace
Results
• General increase in project-related communication
  – "Why are you editing that file? It's not part of what you are working on."
  – "You can't leave yet, you have files still checked out and we need to run the build tonight."
Results
• Reduction in use of physical artifacts
• Trend toward improved situational awareness– Division of attention ratings reduced by 30%– Instability of situation ratings reduced by 30%
Use & Feedback
• Enabled global view of project activity
  – "makes it easier to verify if no item has been checked out… before making the build for the final release"
• Provided instant reflection of project state
  – "I liked the real time info…and to know what [fellow] developers are working on"
  – "The visualization of possible conflicts was useful"
• Increased utility of information through contextual notes/comments
  – "We usually make comments… but those 'verbal' comments are often lost. Placing [flags] with the comment on the context where it applies is cool"
• Voluntary continued use
FASTDash Future Work
• Better understand which features of FASTDash were most/least useful
• Evaluate the long-term efficacy and impact of FASTDash
• Extend the visualization to support other information and iworker workgroups
• Explore augmenting the visualization to address issues of scale and artifact importance
Conclusion
• Basic psychology can be used to– Derive HCI principles– Offer methodologies for HCI
• More opportunities for UCD early in the product life cycle; look for converging lines of evidence
  – Different methods appropriate at different times
• Can’t know what to design without understanding current practice or what’s wrong with current designs
Visualization and Interaction for Business and Entertainment
MSR UW Workshop
2007
Thank you for your attention!
http://research.microsoft.com/research/vibe
Task Frequencies Breakdown
[Pie chart: frequency of task type - Routine Task 27%, Email 23%, Project 18%, Task Tracking 13%, Telephone Call 8%, Meeting 6%, Personal 5%, Downtime 0%.]
[Annotations: "Indicative of difficulty tracking tasks"; "'Returned to' tasks from this group."]
Frequency of Task Shift Initiators
[Pie chart: frequency of switch causes - Self-Initiated 40%, Next Task 19%, Telephone Call 14%, Appointment 9%, Return to Task 7%, Email 3%, New Information Request 3%, Deadline 2%, Other Person 1%, Emergency 1%, App Prompt 1%.]
Difficulty Switching by Type
[Chart: rated difficulty switching to task (1 = low, 2 = med, 3 = high) by task type, comparing other tasks vs. returned-to tasks.]
Task Length by Type
[Chart: average task duration in minutes (0-160) by task type, comparing other tasks vs. returned-to tasks.]
Document Requirements by Task Type
[Chart: average # of documents (0-3) by task type, comparing other tasks vs. returned-to tasks.]
Interruptions by Task Type
[Chart: average number of interruptions (0-2) by task type, comparing other tasks vs. returned-to tasks.]
Focus on Returned to Tasks
• Elapsed time spanned hours to days
• Maintaining desktop state isn't always the answer
  – Often, users said they were waiting on info from other people or places (web, server); prospective reminders needed here
  – Info came in via phone, email, web, or personal contacts (better app integration needed here)
  – But reminding about task context and info assembly/layout was a key problem identified
General Design Ideas from Participants
• Smarter, adjustable To Do list tracking & alarming
  – In the projects versus just in Calendar
  – Consider sticky notes for partial/future tasks
• Auto-categorization of email and files
• Better reminders for things forgotten
  – Track events we know about and visualize them, or rely on manual user tagging
• Better user adaptivity
  – e.g., knowing what kinds of paste operations a user typically performs and automating them
Findings
• During a given week, knowledge workers (KWs) task shift an awful lot (avg. 10 task shifts a day)
• Long-term projects are more complex shifts
  – Lengthier (11.25% of the week), more documents, interrupts, "returns"
  – Rated significantly harder to return to
• Passage of time also takes its toll
• What designs will help?