Evaluating the Habitability of Q&A With User-Generated Tasks

Bill Ogden, Ron Zacharski, Jim McDonald, Roger Chadwick

New Mexico State University

CRL Computing Research Laboratory

Habitability

• Watt (1968)
  – A language is considered habitable if users can express everything that is needed for a task using language they would expect the system to understand.
  – If there are 26 ways that a user population would be likely to ask a question, a habitable system will process all 26.
• Goal is to improve NL query systems by achieving habitability

How to Achieve Habitability

• Discover all the ways a user population will likely ask a question
  – Will depend on:
    • User characteristics, knowledge, expectations
      – Subject domain knowledge
      – Past experience with Q&A
    • Perceived capability of the system
      – Interface presentation, visibility
      – Feedback, error handling

Evaluating Interactive IR and Q&A

• “You are not sure about the safety of genetically engineered foods, and would like to find more information and research on this topic. Name four potential types of safety problems that have been raised.”

• 8 tasks, 6-12 users, 2-3 sessions.

• Recorded screen and voice.

Evaluating Interactive IR and Q&A

• Using surrogate users/tasks may not capture ‘real’ Q&A user behavior
• We are looking for ways to observe users who are working on their own questions.
  – Lack of control will be offset by richness of behavior

How many users?

• Typical usability evaluation
  – Defined tasks are given to representative users
  – Widely held theory that five users will detect almost all software usability problems: Jakob Nielsen and Thomas Landauer (1993)
  – A recent usability evaluation of a web CD shopping site suggests otherwise: Perfetti and Landesman, http://www.uie.com/Articles/eight_is_not_enough.htm
  – 18 users, each with a list of CDs they wanted to purchase, found 247 total obstacles-to-purchase, with only 35% found by the first five users.
  – Self-generated tasks lead to more discovery.
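The five-user claim and the CD-site counterexample can be compared with a little arithmetic. The sketch below assumes the commonly cited Nielsen-Landauer form, in which n users find a share 1 - (1 - λ)^n of the problems, where λ is the chance that a single user hits a given problem. The formula, the value λ = 0.31 (a frequently quoted average), and all function names are assumptions brought in for illustration; the slides themselves give only the 35% and five-user figures.

```python
import math


def share_found(n_users: int, lam: float) -> float:
    """Share of problems found by n_users under the assumed model 1 - (1 - lam)^n."""
    return 1.0 - (1.0 - lam) ** n_users


def implied_lambda(share: float, n_users: int) -> float:
    """Per-user detection rate that would yield `share` after n_users (model inverted)."""
    return 1.0 - (1.0 - share) ** (1.0 / n_users)


def users_needed(target_share: float, lam: float) -> int:
    """Smallest number of users expected to reach target_share under the assumed model."""
    return math.ceil(math.log(1.0 - target_share) / math.log(1.0 - lam))


# Assumed "typical" rate behind the five-user rule of thumb (not from the slides).
typical = share_found(5, 0.31)

# Rate implied by the CD-site figure on the slide: 35% found by the first five users.
cd_lambda = implied_lambda(0.35, 5)

print(f"share found by 5 users at lambda=0.31: {typical:.2f}")
print(f"lambda implied by 35% after 5 users: {cd_lambda:.3f}")
print(f"users needed for 85% at that rate: {users_needed(0.85, cd_lambda)}")
```

Under these assumptions, five users cover roughly 84% of problems at λ = 0.31, while the CD-site figure implies a per-user rate near 0.08, so a comparable share would take more than 20 users. The model treats every problem as equally easy to find, which self-generated tasks may well violate.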

Self-generated Tasks

• Provide broader coverage for habitability studies.
  – Many more usability issues will emerge
• Reflect real information needs
  – Motivates test participants to find real conclusions (no artificial satisfaction)
  – Demonstrates query drift (information need changes through searching)

Problems With Self-generated Tasks

• System comparisons are difficult
  – But user and task variability in user studies make system comparisons difficult anyway
• Users will generate questions outside of the system’s capabilities
  – But isn’t this the point of habitability testing?

Our New Study

• Users are asked to generate information needs
  – 8 things they think they would like to know that may be on the web.
• User is assigned to LCC web demo
• User session is recorded.
  – Screen video, think-aloud audio, automatically captured queries and web results.

Participants (LCC)

• 7 graduate psychology students at NMSU.
• 6 female and 1 male, ages 18 to 37.
• Experience with computers / search engines:
  – Mean experience using computers: 6.6 (1-7 scale)
  – Mean self-rated computer expertise: 5.1 (1-7 scale)
  – Mean experience using W.W.W.: 6.6 (1-7 scale)
  – Mean frequency of computer use: 6.6 (1-7 scale)
  – Mean years of online search experience: 7.8 yrs
  – Mean rated success at searching: 6.0 (1-7 scale)
  – Mean years of schooling: 17.5 yrs (3: BA, 4: MA)

Questionnaire

How satisfied are you with the results of your search? In other words, to what extent did this search answer your question?

Dissatisfied 1 2 3 4 5 6 7 Satisfied

How useful did you find the retrieval system you used to accomplish your search? In other words, to what extent do you feel the retrieval system helped you accomplish your goal? Keep in mind that you could be Dissatisfied with your results because you feel the Internet simply doesn't contain an answer to your question, and still find the retrieval system Useful.

Not Useful 1 2 3 4 5 6 7 Useful

Did you change your question as the search progressed? In other words, did it become narrower (more focused), stay the same, or broader over time?

Narrower 1 2 3 4 5 6 7 Broader

User generated questions

What is the proposed smoking ordinance in Las Cruces?
What is the Senate reaction to the phone-in campaign?
How can I find out more about the AIDS vaccine?
How do people around the world feel about the impending war?
Can I find a map of the Middle East?
How does an individual qualify for the NCCA National Championships?
Which Senators are opposed to the War?
Who died in the Korean Subway Fire? Spec. K.W.R [initials]

What is tuition at all state universities?
Have any new planets been discovered?
How can I stop my cat from clawing furniture?
When will Apple release the new Powermac?
From where can I order out of print records?
Did the Bruins win today?
Is there an instrument store within 10 miles?
Where is the nearest comic book store?

All results published on Web

LCC Results With NL Query

LCC Results Without NL Query

Focus question for LCC interface

• Does LCC provide cues to input NL query?
  – Two subjects used NL throughout
  – Two subjects used keyword mostly
  – Two subjects started out with NL but switched to keyword
  – One used NL with an occasional switch to keyword
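One way to make the NL-versus-keyword split concrete is a simple surface heuristic. The sketch below is an illustrative assumption, not the coding scheme used in the study: the function name, the word list, and the rule (the query ends with a question mark or opens with a question word) are all hypothetical. The sample queries are taken from the participant query logs in these slides.

```python
# Hypothetical list of words that often open an English question.
QUESTION_OPENERS = {
    "what", "who", "where", "when", "why", "how", "which",
    "is", "are", "does", "do", "did", "can", "have", "has", "will",
}


def looks_like_nl_question(query: str) -> bool:
    """Rough guess: True if the query reads like a natural-language question."""
    text = query.strip()
    if not text:
        return False
    first_word = text.split()[0].lower()
    return text.endswith("?") or first_word in QUESTION_OPENERS


# Queries copied from the participant logs in this deck.
queries = [
    "What is tuition at all state universities?",
    "List of tuition costs at U.S. Universities",
    "College ranking by tuition",
    "Is there a comic book store in Las Cruces nm?",
    "Comic book store Las Cruces NM",
]

labels = {q: ("NL" if looks_like_nl_question(q) else "keyword") for q in queries}
for query, label in labels.items():
    print(f"{label:8s} {query}")
```

A phrase such as "List of tuition costs at U.S. Universities" is labeled keyword by this rule even though it reads like a request, which shows how coarse the heuristic is; a real analysis would likely need human coding.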

Participant 2 – 1st Question

• Need:
  – What is tuition at all state universities?
• Queries:
  – What is tuition at all state universities?
  – List of tuition costs at U.S. Universities
  – What are the tuition costs at state universities in the united states?
  – Does consumer reports list college tuition?
  – College ranking by tuition
  – Is a college ranking list published?

Observations.

• People seem to struggle to think of an NL question they can easily express with keywords.
  – One user said it was like playing Jeopardy
• Time-critical questions predominated
  – but the LCC demo had no time-processing capability.

Controlling query function with NL

• Controlling the date of the information being reported.
  – Example 1. Saddam alive
    • also example of benefits of user-generated tasks
  – Example 2. Did the Bruins win

Highest satisfaction ratings

Is aspartame bad for your health?
Can you rent a house boat in NM?
Weather in St. Petersburg Russia
What is the value of my year old car?
How often do I need to water roses?
What are some of the theatrical performances coming to NMSU this semester?
Where is Westminster Colorado?
How do I get a passport?

Lowest satisfaction ratings.

Is Sadaam alive?
Easiest ways to lose weight.
What is the name of the new Acura?
What new games are out on X-box?
What are the additional requirements that I need to fulfill for a PhD at NMSU?
What is some of the new information on the new "female viagra" (ie elevil)?
What is the treasury forcast for interest rates?
When does the NCAA tournament start?

Other Successful questions

• Have any new planets been discovered?
  – "no we still have the same 9"
• How can I stop my cat from clawing furniture?
  – "provide your cat with a scratching post."

Spell checking in LCC demo

• Works well in some cases
  – sadam
• But confusing in others
  – el paso, tx

The NL Interface Quandary

• What do users of NL systems need to know?

• If you need to train NL what is the point?

One user’s comments.

"You have to think too much about what you are looking for"
"I'm not used to it, with other search engines..[you don't have to think so much]"
"I can't figure out the system yet, with other engines..."
"After a few trials you can figure out what they want from you"
"I can't pinpoint the best search technique"
"Like when you use MSN, it has much more demands than yahoo"
"Yahoo you can type anything and you get results"
"I can feel what the system wants from me, here i don't get a feeling"

"Which features of the retrieval system made it more useful?"

• Question based queries
• Short descriptions [of results]
• Nothing
• The natural language method was nice but it didn't work well.
• You could type phrases
• None
• Suggestions to misspellings
• The system seemed capable of handling fairly narrow searches via the use of questions instead of just phrases.

"Were there any features of the retrieval system that could be improved?"

• Organize the results chronologically or alphabetically
• Avoid chat groups and personal emails [subjects did not like results that were not credible]
• Bold the keywords
• Include the website for each result [done]
• Have a general menu with topics like "education"
• Add [instructions ?] that you can use keywords not just questions.
• No system feedback when you click on a link.
• Didn't like that a new window opens
• Web address not given to you in results [done]
• Broader searches
• No repeat websites
• Have it look for keywords, not the question.

Too Early?

• Is the Q&A technology too primitive to worry about the user interface?

Conclusions

• Real information needs are often different from the needs expressed in the original question

• NL query could be more useful if users knew when or how it could be used.

• The user-generated tasks approach directly addresses the goal of improving the habitability of Q&A systems.

Project Goals for Collaboration

– Identify the characteristics of habitable Aquaint systems

– Use prototype Aquaint systems for iterative formative evaluation

Questions?

Please form your question in Natural Language

User generated questions

What is some of the new information on the new "female viagra" (ie elevil)?
What are the additional requirements that I need to fulfill for a PhD at NMSU?
How safe is the "sponge" and why was it taken off of the market in the first place?
What are some recent findings regarding gender stereotype enforcement in advertising?
What are some of the theatrical performances coming to NMSU this semester?
What are the upcoming tour dates for Ani Difranco?
Are there any lab retrievers for sale near Indiana for my dad?
How many hours are required for an M.A. in rehab.counseling at SFSU?

How far in the universe have we studied?
Does Venus have water?
When did humans arrive in the U.S.?
What re reasons my feet hurt?
Where can I take flying lessons?
What would someone in an African tribe do on a daily basis?
How do Americans compare to African tribes about love?
How do USA feel about war with Iraq?

User generated questions

What classes and labs does UNM have to offer for a PhD in BioPsyc?
Are there relevant articles on parental bonding and advise on data collection?
How do you paint tile and waterproof it?
How many grams of protein does a woman of my height and activity level need?
What new exercise can I do for my lower back?
Is there a recipe for Rosemary Creme' Brulee?
Is there a link between anthrax and cancer?
What special prices are offered on gifts for Mother's Day?

Where can I find info about visiting Alaska?
What schools have PhD programs in HF? [human factors]
When does the NCAA tournament start?
What is playing at the Met right now?
Where can I find info about bands touring schedules?
Where can I find info about uses of VR? [virtual reality]
Can you rent a house boat in NM?
When does the new Matrix movie come out?

LCC user 6

Where can I find info about visiting Alaska?
  1.1 where can I find out information about visiting alaska?
  1.2 where does the alaska ferry leave from?
  1.3 where does the Alaska Marine Highway System depart from?
  1.4 where in alaska is denali national park?

What schools have Phd programs in human factors?
  2.1 what schools have Phd programs in human factors?
  2.2 what graduate schools have engineering psychology programs?

When does the NCAA tournament start?
  3.1 when does the NCAA tournament start?
  3.2 when does the men's NCAA basketball tournament start?
  3.3 when does the men's NCAA basketball tournament start in 2003?

What is on display at the met right now?
  4.1 what is on display at the metropitan museum of art?
  4.2 what is on display in March 2003 at the metropitan museum of art in new york?
  4.3 what is on display in March 2003 at the metropolitan museum of art in new york?

LCC user 6 (cont)

Where can I find information about band's touring schedules?
  5.1 where can I find information about bands touring schedules?
  5.2 what bands are playing in new york city in march 2003?
  5.3 who is playing at the bowery blallroom in march 2003?
  5.4 who is playing at the bowery ballroom in march 2003?

Where can I find information about the uses of virtual reality?
  6.1 where can I find information about the uses of virtual reality?

Can you rent a houseboat in new mexico?
  7.1 can you rent a house boat in New Mexico?

When does the new matrix movie come out?
  8.1 when does the new Matrix movie come out?
  8.2 what does the matrix reloaded come out?
  8.3 when is the matrix reloaded opening?

Same Person. 8th Question.

• Need:
  – Where is the nearest comic book store
• Queries:
  – Is there a comic book store in Las Cruces nm?
  – List comic book stores in New Mexico
  – Comic book store Las Cruces NM

Participant 3 – 3rd Question

• Need:
  – How safe is the "sponge" and why was it taken off of the market in the first place?
• Queries:
  – contraceptives sponge safety
  – Why was Today Sponge taken off the market?

Participant 3 – 7th Question

• Need:
  – Are there any lab retrievers for sale near Indiana for my dad?
• Queries:
  – laboratory retriever breeders midwest
  – breeders laboratory retrievers
  – laboratory retrievers
  – purchasing laboratory retrievers

LCC does well with ‘lab retrievers’