TRANSCRIPT
Evaluation
Shneiderman and Plaisant
Chapter 4
Introduction
• Iterative design
– Current “best practice”
– Specialization of Boehm’s spiral model
– Cost increases in the radial dimension
• Last time, design
– “Radically transformational”
• So, count on multiple passes
– Requirements
• Know task and user
– Guidelines
• This time, evaluation
– How, after all, to know if usable
[Diagram: iterative cycle of Design, Implement, Evaluate]
Overview
• Introduction
– Evaluation Plans, Acceptance Testing, and Life Cycle
• Expert reviews
• Usability testing and techniques
– Goal is to “engineer” a good interface, constrained by time and cost
• Survey instruments
• Acceptance tests
• Evaluation during active use
• “Controlled psychologically oriented experiments”
– Elements of science, as applied to interface evaluation
Introduction
• Usability of interface design – a key component … from the 2nd week!
• Evaluation required to know how/if “usable”
– By whatever means … reviews, surveys, etc.
• Again, what makes sense for (is appropriate for) the programmer/expert is not right for the general user population
– … an early point – “Know thy user”
• and know how “thy user” performs with the system
• and how the system performs with “thy user”
– and the way to know is by evaluation
• In Shneiderman-ese:
– “Designers can become so entranced with their creations that they may fail to evaluate them adequately.”
– “Experienced designers have attained the wisdom and humility to know that extensive testing is a necessity.”
– “If feedback is the ‘breakfast of champions’, then testing is the ‘dinner of gods’.”
Evaluation Plan – Metrics of Usability “It’s Fundamental”
• “Evaluation plan” should be part of system development … and life cycle … in larger projects
– Also, part of “acceptance tests”
• Objective, measurable goals for hardware and software performance
– So, for system performance:
• Response time, functionality, reliability, …
– And for usability, user experience:
• Values on specific metrics
– Also, part of “maintenance” after deployment
• As noted, metrics of usability include:
– Time to learn specific tasks
– Speed of task performance
– Rate of errors
– Retention of commands or task sequences over time
– Frequency of help/assistance requests
– Subjective user satisfaction
Evaluation Plan – Depends on Project
• Evaluation plan varies, depending on project
– Range of costs might be from 10%-20% of a project down to 5%
– Range of evaluation plans might be from years to a few days of testing
• Will have different kinds of elements depending on:
– Stage of design (early, middle, late)
• Key screens, prototype, final system
– Novelty of project
• Well defined vs. exploratory
– Number of expected users
– Criticality of the interface
• E.g., life-critical medical system vs. museum exhibit support
– Costs of product and finances allocated for testing
– Time available
– Experience of design and evaluation team
“Step-by-Step Usability Guide” from Shneiderman, web site example
• From Shneiderman
But, Testing is not a Panacea, so …
• Nonetheless, testing can’t eliminate all problems
• In essence, plan for remaining problems/challenges
– As part of the evaluation plan!
• Cost of eliminating error (or enhancing performance) does not increase linearly
– I.e., the obvious things are easy; the hard ones require more resources to refine
• Design/cost decision made about what amount of costs to allocate
• Still, some things extremely hard to test
– E.g., user performance under stress
Expert Reviews, 1
• Expert reviews can range from “just asking for feedback” to structured techniques, e.g., heuristic review, guidelines review
– … of course, you have to have an expert, and large organizations do
– Expert needs to be familiar with domain and design goals
• One-half day to one week of effort
– But a lengthy training period may sometimes be required to explain the task domain or operational procedures
• Even informal demos to colleagues or customers can provide some useful feedback
– More formal expert reviews have proven to be effective
Expert Reviews, 2
• Can be scheduled at several points in development process
– When experts are available
– When design team is ready for feedback
• Different experts tend to find different problems in an interface
– 3-5 expert reviewers can be highly productive, as can complementary usability testing
• Caveats:
– Experts may not have an adequate understanding of task domain or user communities
– Conflicting advice
– Even experienced expert reviewers have great difficulty knowing how typical users, especially first-time users, will really behave
Expert Review Techniques
• Heuristic evaluation
– General review for adherence of interface to principles of successful design, e.g., Nielsen
• E.g., “error messages should be informative”, “feedback provided”
– Adherence to some theory or model, e.g., object-action model
• Guidelines review
– Check for conformance with guidelines
– Given complexity of guidelines, can be significant effort
• Consistency inspection
– E.g., of interface terminology, fonts, color schemes, input/output format
• Formal usability inspection/review – as part of SE process
– Structured forum for critiquing (if not courtroom style …)
– Might be occasion to request exception for guideline deviation
• Bird’s-eye view of interface
– By, e.g., full set of printed screens on wall
– Inconsistencies, organization, etc. more evident
• Cognitive walkthrough …
Heuristic Evaluation – Recall Nielsen’s Heuristics
• Meet expectations
– 1. Match the real world
– 2. Consistency & standards
– 3. Help & documentation
• User is boss
– 4. User control & freedom
– 5. Visibility of system status
– 6. Flexibility & efficiency
• Errors
– 7. Error prevention
– 8. Recognition, not recall
– 9. Error reporting, diagnosis, and recovery
• Keep it simple
– 10. Aesthetic & minimalist design
Heuristic Evaluation – cf. Nielsen, useit.com article
• A small number of experts/evaluators either use or observe use of system and provide list of problems, based on heuristics
– A type of “discount usability testing”
– Recall, principles of Nielsen, Shneiderman, Tognazzini, and others
• Some evaluators find some problems, others find others
– Nielsen recommends 3-5 evaluators
• Steps:
– Inspect UI thoroughly
– Compare UI against heuristics
– List usability problems
• Explain and justify each problem with heuristics
How To Do Heuristic Evaluation – Details
• Justify every problem with a heuristic
– “Too many choices on the home page – Aesthetic & minimalist design”
– Can’t just say “I don’t like the colors”, but can if you justify it
• List every problem
– Even if an interface element has multiple problems
• Go through the interface at least twice
– Once to get the feel of the system
– Again to focus on particular interface elements
• Don’t limit to a single heuristic set (“8 Golden Rules”, Nielsen, etc.)
– Others: affordances, visibility, perceptual elements, color principles
• But, a particular heuristic set, e.g., Nielsen’s, is easier to compare against
Example
• Shopping cart icon not balanced with its background whitespace (Aesthetic & minimalist design)
• Good: user is greeted by name (Visibility of system status)
• Red is used both for help messages and for error messages (Consistency, Match real world)
• “There is a problem with your order”, but no explanation or suggestions for resolution (Error reporting)
• ExtPrice and UnitPrice are strange labels (Match real world)
• Remove Hardware button inconsistent with Remove checkbox (Consistency)
Example
• "Click here“ is unnecessary (Aesthetic & minimalist design)
• No “Continue shopping" button (User control & freedom)
• Recalculate is very close to Clear Cart (Error prevention)
• “Check Out” button doesn’t look like other buttons
` (Consistency, both internal & external)
• Uses “Cart Title” and “Cart Name” for the same concept (Consistency)
• Must recall and type in cart title to load (Recognition not recall, Error prevention, Flexibility & efficiency)
Heuristic Evaluation is Not User Testing
• Evaluators are not the user either
– Maybe closer to being a typical user than the coder/developer is, though
• Analogy: code inspection vs. testing
• Heuristic evaluation finds problems that user testing often misses
– E.g., inconsistent fonts
• But user testing is the “gold standard” for usability
Hints for Better Heuristic Evaluation
• Use multiple evaluators
– Different evaluators find different problems
– The more the better, but diminishing returns
– Nielsen recommends 3-5 evaluators
• Alternate heuristic evaluation with user testing
– Each method finds different problems
– Heuristic evaluation is cheaper
• Use “observer” with evaluator
– Adds cost, but cheap enough anyway
– Take notes
– Provide domain guidance, where needed
• It’s OK for observer to help evaluator
– As long as the problem has already been noted
– This wouldn’t be OK in a user test
Writing Good Heuristic Evaluations (fyi)
• Heuristic evaluations must communicate well to developers and managers
• Include positive comments as well as criticisms
– “Good: Toolbar icons are simple, with good contrast and few colors (minimalist design)”
• Be tactful
– Not: “the menu organization is a complete mess”
– Better: “menus are not organized by function”
• Be specific
– Not: “text is unreadable”
– Better: “text is too small, and has poor contrast (black text on dark green background)”
Suggested Report Format (fyi)
• What to include:
– Problem
– Heuristic
– Description
– Severity
– Recommendation (if any)
– Screenshot (if helpful)
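As an illustrative sketch only (not from the text), a finding in this report format could be captured as a simple record; the field names below are assumptions chosen to mirror the list above.

```python
# Illustrative sketch: a record type for one heuristic-evaluation
# finding, mirroring the suggested report format above. Field names
# are assumptions, not from Shneiderman's text.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    problem: str          # short statement of the usability problem
    heuristic: str        # which heuristic it violates
    description: str      # fuller explanation
    severity: int         # 1=cosmetic, 2=minor, 3=major, 4=catastrophic
    recommendation: Optional[str] = None
    screenshot: Optional[str] = None   # path to an image, if helpful

finding = Finding(
    problem="Too many choices on the home page",
    heuristic="Aesthetic & minimalist design",
    description="Over 40 links compete for attention above the fold.",
    severity=2,
    recommendation="Group links under a small number of headings.",
)
```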
Formal Evaluation
• Formal evaluation typically much larger effort
• Again, consider a large scale SE project
• Will look at some elements of formal evaluation
– Training, evaluation, severity ratings, debriefing
Formal Evaluation Process
• 1. Training
– Meeting for design team & evaluators
– Introduce application
– Explain user population, domain, scenarios
• 2. Evaluation
– Evaluators work separately
– Generate written report, or oral comments recorded by an observer
– Focus on generating problems, not on ranking their severity yet
– 1-2 hours per evaluator
• 3. Severity Rating
– Evaluators prioritize all problems found (not just their own)
– Take the mean of the evaluators’ ratings
• 4. Debriefing
– Evaluators & design team discuss results, brainstorm solutions
Severity Ratings
• Contributing factors
– Frequency: how common?
– Impact: how hard to overcome?
– Persistence: how often to overcome?
• Severity scale (example calculation below)
– 1. Cosmetic: need not be fixed
– 2. Minor: needs fixing but low priority
– 3. Major: needs fixing and high priority
– 4. Catastrophic: imperative to fix
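As a small illustration of step 3 of the process above (taking the mean of the evaluators’ ratings), the sketch below aggregates hypothetical per-evaluator ratings; the problems and numbers are invented.

```python
# Hypothetical severity ratings (1-4 scale above) from three evaluators
# for each problem found; the problems and values are invented examples.
ratings = {
    "red used for both help and errors": [3, 4, 3],
    "cart title must be recalled, not recognized": [2, 3, 3],
    "unbalanced shopping-cart icon": [1, 1, 2],
}

# Mean rating per problem, sorted so the worst problems are fixed first.
means = {p: sum(r) / len(r) for p, r in ratings.items()}
for problem, mean in sorted(means.items(), key=lambda kv: -kv[1]):
    print(f"{mean:.1f}  {problem}")
```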
Evaluating Prototypes
• Heuristic evaluation works on:
– Sketches
– Paper prototypes
– Unstable prototypes
• “Missing-element” problems are harder to find on sketches
– Because you’re not actually using the interface, you aren’t blocked by a feature’s absence
– Look harder for them
Cognitive Walkthrough
• Expert “walks through” the design, as a user would, to carry out specific tasks
– Identifies potential problems using psychological principles
• E.g., “user has to remember action too long to successfully recall”
• Principle that short-term memory is limited – more later
– Usually performed by expert in cognitive psychology
– Evaluates design on how well it supports user in learning task, etc.
• Analysis focuses on goals and knowledge:
– Does the interface design lead user to generate correct goals?
• E.g., at low level, having arrow in list box helps user form goal to click and select among alternatives
• For each task, walkthrough considers:
– What impact will interaction have on user?
– What cognitive processes are required?
– What learning problems may occur?
Kinds of User Tests
• Formative evaluation
– Find problems for next iteration of design
– Evaluates prototype or implementation, in lab, on chosen tasks
– Qualitative observations (usability problems)
• Field study
– Find problems in context
– Evaluates working implementation, in real context, on real tasks
– Mostly qualitative observations
• Controlled experiment
– Tests a hypothesis, e.g., interface X is faster than interface Y
– Evaluates working implementation, in controlled lab environment, on chosen tasks
– Mostly quantitative observations (time, error rate, satisfaction)
Usability Testing and Laboratories, 1
• Usability testing and laboratories since early 1980s
– Sped up projects and cut costs – which led to acceptance and implementation
• Usability testing is a unique practice
– Roots of techniques in experimental psychology
– Again, interface design is like engineering
• Which draws on science, but is a practice
– Not testing hypotheses about theories
• Rather, goal is to refine interfaces rapidly
– A variable-or-few-at-a-time approach not appropriate
• Too slow, costly
• “User interface architect”
– Works with usability laboratory
– Carries out elements of evaluation plan
• “Pilot test”
Usability Testing and Laboratories, 2
• Small usability lab
– Two areas:
• one for the participants to perform the tasks
• another, separated by a half-silvered mirror, for the testers and observers
• Participants should be chosen to represent intended user communities
– Consider … background in computing, experience with task, motivation, education, ability with natural language used in interface
Usability Testing – Techniques, 1
• Thinking aloud protocols
– Surprisingly straightforward and effective technique
– Users simply say what they are doing
• User observed performing task
• User asked to describe what he is doing and why, what he thinks is happening, etc.
– Well-studied methodology
– Advantages
• Simplicity – requires little expertise
• Can provide useful insight
• Can show how system is actually used
– Disadvantages
• Subjective
• Selective
• Act of describing may alter task performance
Usability Testing – Techniques, 2
• Videotaping
– Useful for later review and showing designers or managers problems users encounter
– Sessions can be “coded” by observers for data reduction
• Paper mockups
– Sketches, story-boards
• Actually, used often!
– Different skills (and costs) for programming and sketching
• “Discount usability testing”
– Shneiderman’s, and others’, term for “quick and dirty”
– “Rapid usability testing”
• Rapid, perhaps low-fidelity, prototype
• “Global” task performance
• Competitive usability testing
– Compares new design with existing or others
– Essentially, “incremental” testing of changes with existing as baseline
Usability Testing – Techniques, 3
• Universal usability testing
– Considers diversity of hardware platforms and users
– E.g., ambient light levels, network speed, age groups, color-blindness
• Field test and portable labs
– Puts logging software, usability task, video equipment, etc. where they will be used
– Cost benefits and validity
• Remote usability testing
– Web-based applications natural; e-feedback in general
• Can-you-break-this tests
– … like it says …
Usability Testing – Limitations
• Emphasizes first-time users
– After all, bringing in people to laboratory
• In fact, large labs often solicit participants via newspaper ads
– Short sessions show only first part of learning curve …
• Only possible to see part of complete system functionality
• Testing should be performed in environment in which system is to be used
– Office, home, outside … not in laboratory
– Lab testing misses context of use
• Testing should be performed for long duration
– And such long-duration testing should be part of plan, but is often not
Ethics of User Testing
• Users are human beings
– Human subjects have been seriously abused in the past
• Research involving user testing is now subject to close scrutiny
• Institutional Review Board (IRB) must approve user studies
• Pressures on a user
– Performance anxiety
– Feels like an intelligence test
– Comparing self with other subjects
– Feeling stupid in front of observers
– Competing with other subjects
Informed consent statement:
I have freely volunteered to participate in this experiment.
I have been informed in advance what my task(s) will be and what procedures will be followed.
I have been given the opportunity to ask questions, and have had my questions answered to my satisfaction.
I am aware that I have the right to withdraw consent and to discontinue participation at any time, without prejudice to my future treatment.
My signature below may be taken as affirmation of all the above statements; it was given prior to my participation in this study.
Treat the User With Respect
• Time – Don’t waste
• Comfort – Make the user comfortable
• Informed consent – Inform the user as fully as possible
• Privacy – Preserve the user’s privacy
• Control – The user can stop at any time
Before a Test
• Time – Pilot-test all materials and tasks
• Comfort (psychological and physical)
– “We’re testing the system; we’re not testing you”
– “Any difficulties you encounter are the system’s fault. We need your help to find these problems.”
• Privacy – “Your test results will be completely confidential”
• Information
– Brief about purpose of study
– Inform about audio taping, videotaping, other observers
– Answer any questions beforehand (unless biasing)
• Control – “You can stop at any time.”
During the Test
• Time
– Eliminate unnecessary tasks
• Comfort
– Calm, relaxed atmosphere
– Take breaks in long sessions
– Never act disappointed
– Give tasks one at a time
– First task should be easy, for an early success experience
• Privacy
– E.g., user’s boss shouldn’t be watching
• Information
– Answer questions (where won’t bias)
• Control
– User can give up a task and go on to the next
– User can quit entirely
After the Test
• Comfort
– Say what they’ve helped you do
• Information
– Answer questions that you had to defer to avoid biasing the experiment
• Privacy
– Don’t publish user-identifying information
– Don’t show video or audio without user permission
Formative Evaluation
• Find some users
– Should be representative of the target user class(es), based on user analysis
• Give each user some tasks
– Should be representative of important tasks, based on task analysis
• Watch user do the tasks
• Roles in formative evaluation
– User
– Facilitator
– Observers
User’s Role
• E.g., user should think aloud
– What they think is happening
– What they’re trying to do
– Why they took an action
• Problems
– Feels odd
– Thinking aloud may alter behavior
– Disrupts concentration
• Another approach: pairs of users
– Two users working together are more likely to converse naturally
– Also called co-discovery, constructive interaction
Facilitator’s Role
• Does the briefing
• Provides the tasks
• Coaches the user to think aloud by asking questions
– “What are you thinking?”
– “Why did you try that?”
• Controls the session and prevents interruptions by observers
Observer’s Role
• Be quiet
– Don’t help, don’t explain, don’t point out mistakes
• Take notes
– Watch for critical incidents: events that strongly affect task performance or satisfaction
– Usually negative
• Errors
• Repeated attempts
• Curses
– May be positive
• “Cool”
• “Oh, now I see”
Recording Observations
• Pen & paper notes – Prepared forms can help
• Audio recording – For think-aloud
• Video recording – Usability abs often set up with two cameras, one for user’s face, one for screen – User may be self-conscous – Good for closed-circuit view by observers in another room – Generates too much data – Retrospective testing: go back through the video with the user, disicussng critical
incidents
• Screen capture & event logging – Cheap and unobtrusive
How Many Users? (supplementary)
• Landauer-Nielsen model
– Every tested user finds a fraction L of usability problems (typical L = 31%)
– If user tests are independent, then n users will find a fraction 1-(1-L)^n of the problems
– So 5 users will find 85% of the problems (computed below)
• Which is better:
– Using 15 users to find 99% of problems with one design iteration
– Using 5 users to find 85% of problems with each of three design iterations
• For multiple user classes, get 3-5 users from each class
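To check the model’s arithmetic, a short computation of 1-(1-L)^n for the values quoted above (Nielsen’s typical L = 31% and, from the next slide, Spool & Schroeder’s L = 8%):

```python
# Fraction of usability problems found by n independent users, each of
# whom finds a fraction L of the problems: 1 - (1 - L)**n.
def fraction_found(L: float, n: int) -> float:
    return 1 - (1 - L) ** n

print(f"{fraction_found(0.31, 5):.1%}")   # 84.4% (the ~85% quoted above)
print(f"{fraction_found(0.31, 15):.1%}")  # 99.6% (the ~99% quoted above)
print(f"{fraction_found(0.08, 5):.1%}")   # 34.1% (Spool & Schroeder's ~35%)
```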
Flaws in Nielsen-Landauer Model (supplementary)
• L may be much smaller than 31%
– Spool & Schroeder study of a CD-purchasing web site found L = 8%, so 5 users only find 35% of problems
• L may vary from problem to problem
– Different problems have different probabilities of being found, caused by:
• Individual differences
• Interface diversity
• Task complexity
• Lesson: you can’t predict with confidence how many users may be needed
Usability Testing - Other Techniques
• Physiological methods
• Eye tracking
– Head- or desk-mounted equipment tracks position of eye
– Eye movement reflects amount of cognitive processing a display requires
• Measurements include
– Fixations: eye maintains stable position
• Number and duration indicate level of difficulty with display
– Saccades: rapid eye movement from one point of interest to another
– Scan paths: moving straight to a target with a short fixation at the target is optimal
Usability Testing – “Heat Maps”, 1
• Eye movement (gaze) data mapped to false color
• “Eyetracking Web Usability”, Nielsen, 2009
Usability Testing – “Heat Maps”, 2
• Eye movement (gaze) data mapped to false color
• “Search Engine Optimization for Dummies”, Koneka
Usability Testing - Other Techniques
• Physiological measurements
• Emotional response linked to physical changes
• These may help determine a user’s reaction to an interface
• Measurements include:
– Heart activity, including blood pressure, volume, and pulse
– Activity of sweat glands: Galvanic Skin Response (GSR)
– Electrical activity in muscle: electromyogram (EMG)
– Electrical activity in brain: electroencephalogram (EEG)
• Difficulty in interpreting these physiological responses – more research needed
Survey Instruments and Questionnaires (briefly)
• Familiar, inexpensive, and generally acceptable companion for usability tests and expert reviews
– Advantages
• Quick and reaches large user group
• Can be analyzed more rigorously
– Disadvantages
• Less flexible
• Less probing
– Long, detailed example in text
• Keys to successful surveys:
– Clear goals in advance – what information is required
– Development of focused items that help attain the goals
• Styles of question
– General, open-ended, scalar, multiple-choice, ranked
• Users could be asked for their subjective impressions about specific aspects of interface, such as representations of:
– task domain objects and actions
– syntax of inputs and design of displays
Surveys and Questionnaires, 2 (briefly)
• Other goals would be to ascertain:
– User’s background
• (age, gender, origins, education, income)
– Experience with computers
• (specific applications or software packages, length of time, depth of knowledge)
– Job responsibilities
• (decision-making influence, managerial roles, motivation)
– Personality style
• (introvert vs. extrovert, risk-taking vs. risk-averse, early vs. late adopter, systematic vs. opportunistic)
– Reasons for not using an interface
• (inadequate services, too complex, too slow)
– Familiarity with features
• (printing, macros, shortcuts, tutorials)
– Their feeling state after using an interface
• (confused vs. clear, frustrated vs. in-control, bored vs. excited)
• Online surveys avoid cost of printing and extra effort needed for distribution and collection of paper forms
• Many people prefer to answer a brief survey displayed on a screen, instead of filling in and returning a printed form
– although there is a potential bias in the sample
Acceptance Test (briefly)
• As noted at outset:
– For large implementation projects, customer or manager usually sets objective and measurable goals for hardware and software performance
• If completed product fails to meet these acceptance criteria, system must be reworked until success is demonstrated
– These criteria are among the project “deliverables”
• Again, measurable criteria for user interface can be established (sketched in code below) and might include:
– Time to learn specific functions
– Speed of task performance
– Rate of errors by users
– Human retention of commands over time
– Subjective user satisfaction
• In a large system, there may be 8 or 10 such tests to carry out on different components of interface and with different user communities
• Once acceptance testing has been successful, there may be a period of field testing before national or international distribution
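As a sketch only, here is one way such acceptance criteria could be expressed as machine-checkable thresholds; every metric name and value below is an invented example, not from the text.

```python
# Hypothetical acceptance criteria: measured values must meet the
# thresholds for the acceptance test to pass. All numbers are invented.
criteria = {
    "time_to_learn_task_min": (15.0, "<="),  # minutes to learn key task
    "task_time_sec":          (30.0, "<="),  # seconds per routine task
    "error_rate":             (0.02, "<="),  # errors per task
    "satisfaction_score":     (4.0,  ">="),  # 1-5 survey mean
}

measured = {"time_to_learn_task_min": 12.0, "task_time_sec": 28.5,
            "error_rate": 0.03, "satisfaction_score": 4.2}

for metric, (threshold, op) in criteria.items():
    value = measured[metric]
    ok = value <= threshold if op == "<=" else value >= threshold
    print(f"{'PASS' if ok else 'FAIL'}  {metric}: {value} (need {op} {threshold})")
```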
Evaluation During Active Use (briefly)
• Recall, evaluation plan should include evaluation throughout software’s life cycle
– Successful active use requires constant attention from dedicated managers, user-services personnel, and maintenance staff
– “Perfection is not attainable, but percentage improvements are possible”
• Idea of “gradual interface dissemination” useful for minimal disruption
– Continue to fix problems and refine design (including user interface)
– Taken further by alpha and beta testing
• Many techniques available:
– Interviews and focus group discussions
– Continuous user-performance data logging
– Online suggestion box or e-mail trouble reporting
– Discussion group and newsgroup
• Interviews and focus group discussions
– Interviews with individual users can be productive because the interviewer can pursue specific issues of concern
– Group discussions are valuable to ascertain the universality of comments
Evaluation During Active Use, 2 (briefly)
• Continuous user-performance data logging – see the sketch after this list
– The software architecture should make it easy for system managers to collect data about:
• Patterns of system usage
• Speed of user performance
• Rate of errors
• Frequency of requests for online assistance
– A major benefit is guidance to system maintainers in optimizing performance and reducing costs for all participants
• Online or telephone consultants
– Many users feel reassured if they know there is human assistance available
– On some network systems, the consultants can monitor the user’s computer and see the same displays that the user sees
• Online suggestion box or e-mail trouble reporting
– Electronic mail to the maintainers or designers
– For some users, writing a letter may be seen as requiring too much effort
• Discussion group and newsgroup
– Permit postings of open messages and questions
– Some are independent, e.g., America Online and Yahoo!
– Topic list
– Sometimes moderators
– Social systems
– Comments and suggestions should be encouraged
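As a follow-on to the earlier event-logging sketch, a minimal sketch of reducing such logs to the usage metrics listed above; the log format and event names are the same invented ones, so this is illustrative only.

```python
# Reduce a timestamped event log (format from the earlier sketch:
# "time<TAB>event<TAB>detail") to simple usage metrics. Illustrative only.
from collections import Counter

counts = Counter()
with open("session.log") as f:          # hypothetical log file
    for line in f:
        _, event, _ = line.rstrip("\n").split("\t")
        counts[event] += 1

tasks = counts["task_end"] or 1         # avoid division by zero
print("tasks completed:", counts["task_end"])
print("errors per task:", counts["error"] / tasks)
print("help requests per task:", counts["help_request"] / tasks)
```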
Controlled Psychologically-oriented Experiments – Context
• Recall, idea that goals for the engineering practice of interface design and implementation differ from goals for the science of psychology (or, for that matter, a science of HCI)
– Goal of interface design is to design and implement (“good”) interfaces rapidly, or, in a pragmatic, cost-effective manner
• Hence, different techniques appropriate
• Following from this, Shneiderman suggests that:
– As scientific and engineering progress is often stimulated by improved techniques for precise measurement,
– Rapid progress in the designs of interfaces will be stimulated as researchers and practitioners evolve suitable human-performance measures and techniques
– For example:
• Appliances have energy efficiency ratings, and
• Interfaces might have measures such as learning time for tasks, user satisfaction ratings
• A second principle Shneiderman suggests is to “adapt” elements of “the scientific method” to HCI, or interface design
– Bears looking at, but understand that he is speaking of both “science” and “empirical investigation”
– In fact, this is how you, as students educated in science and engineering, should think!
Controlled Psychologically-oriented Experiments – “Empirical Investigation for Interface Design” (supplemental)
• “The scientific method (Shneiderman)”, or empirical investigation, as applied to HCI and interface design
– Deal with a practical problem and consider the theoretical framework
• To help the user learn how to navigate through the information presented, he/she should be shown which items to select as objects and where committing to some action will take him or her
– State a lucid and testable hypothesis
• “By changing (color, font) of the item, it will be more easily selected, as shown by the time to perform the task decreasing by 2 seconds”
– Identify a small number of independent variables that are to be manipulated
• Those things to change (manipulate), e.g., color, font
– Carefully choose the dependent variables that will be measured
• Those things to measure, e.g., time to complete task
– Judiciously select subjects and carefully or randomly assign subjects to groups
• As noted below – one of several “biasing factors”
– Control for biasing factors (non-representative sample of subjects or selection of tasks, inconsistent testing procedures)
• So that any change in value of dependent variable is not attributable to anything except the difference in independent variable
– Apply statistical methods to data analysis (a sketch follows this list)
• So you know what to expect by chance, measurement error, etc.
– Resolve the practical problem, refine the theory, and give advice to future researchers
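To illustrate the “apply statistical methods” step, a minimal sketch comparing task-completion times for two interface variants with an independent-samples t-test; the data are invented, and scipy is assumed to be available.

```python
# Minimal sketch: compare mean task times (the dependent variable) for
# two interface variants (the independent variable) with a two-sample
# t-test. Times are invented example data, in seconds.
from scipy import stats

times_a = [31.2, 28.5, 35.1, 30.0, 29.4, 33.8]  # interface A
times_b = [27.0, 25.8, 29.9, 26.4, 28.1, 27.7]  # interface B

t, p = stats.ttest_ind(times_a, times_b)
print(f"t = {t:.2f}, p = {p:.3f}")
# A small p (e.g., < 0.05) suggests the difference in means is unlikely
# to be due to chance alone.
```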
End
• Materials from:
– Shneiderman publisher site:
• http://wps.aw.com/aw_shneider_dtui_4 &5/
– Scott Klemmer’s Intro. to HCI Design course
• http://hci.stanford.edu/courses/cs147/
– MIT OpenCourseWare, Robert Miller’s User Interface Design and Implementation
• http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-831Fall-2004/CourseHome/index.htm