
Page 1: CS 430: Information Discovery, Lecture 15: Usability 2

Page 2: Course Administration

• A preliminary version of Assignment 3 is on the web site. Detailed submission instructions will be added later.

Page 3: Shared Work!!!

Some programs for Assignment 2 had sections of identical code!

This is not acceptable.

1. If you incorporate code from other sources, it must be acknowledged.

2. If you work with a colleague:

(a) You must write your own assignment.
(b) You should acknowledge the joint preparation.

IF YOU HAVE NOT FOLLOWED THESE PRINCIPLES, CONTACT ME DIRECTLY.

Page 4: Levels of Usability

• conceptual model

• interface design

• functional design

• data and metadata

• computer systems and networks

Page 5: Conceptual Model

The conceptual model is the user's internal model of what the system provides:

• The desktop metaphor -- files and folders

• The web model -- click on hyperlinks

• Library models:

-- search and retrieve
-- search, browse and retrieve

Page 6: Interface Design

The interface design is the appearance on the screen and the actual manipulation by the user:

• Fonts, colors, logos, keyboard controls, menus, buttons

• Mouse control or keyboard control?

• Conventions (e.g., "back", "help")

Example: Screen space utilization in American Memory page turner.

Page 7: Functional Design

The functional design determines the functions that are offered to the user:

• Selection of parts of a digital object

• Searching a list or sorting the results

• Help information

• Manipulation of objects on a screen

• Pan or zoom

Page 8: Same functions, different interface

Example: the desk top metaphor

• Mouse -- 1 button (Macintosh), 2 buttons (Windows), or 3 buttons (Unix)

• Close button -- left of window (Macintosh), right of window (Windows)

Page 9: Data and metadata

Structural data and metadata stored by the computer system enable the functions and the interface

• The desktop metaphor has the concept of associating a file with an application. This requires a file type to be stored with each file:

-- extension to filename (Windows and Unix)
-- resource fork (Macintosh)
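
A minimal sketch of the extension-based approach in Python; the application table here is hypothetical (real systems such as the Windows registry or Unix MIME tables store richer metadata):

```python
import os

# Hypothetical mapping from filename extension to an application name.
APP_TABLE = {
    ".txt": "text editor",
    ".html": "web browser",
    ".pdf": "PDF viewer",
}

def application_for(filename):
    """Return the application associated with a file, based on its extension."""
    _, ext = os.path.splitext(filename)
    return APP_TABLE.get(ext.lower(), "unknown application")

print(application_for("report.PDF"))  # -> PDF viewer
```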

Page 10: Computer systems and networks

The performance, reliability, and predictability of computer systems and networks are crucial to usability.

• Response time:

-- instantaneous for mouse tracking and echo of keystrokes
-- 5 seconds for simple transactions

• Example: Pipelined algorithm for the Mercury page turner
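
The Mercury page turner's actual pipelined algorithm is not reproduced here; the following is a minimal Python sketch of the general idea -- prefetch the next page in a background thread while the current page is displayed, so most page turns feel instantaneous. `fetch_page` and `display` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_page(n):
    """Hypothetical stand-in for retrieving page n over the network."""
    time.sleep(0.1)  # simulate network latency
    return f"<image data for page {n}>"

def display(page):
    """Hypothetical stand-in for rendering a page on screen."""
    print("displaying", page)

def read_pages(first, last):
    # Pipelining: start fetching page n+1 before the user finishes page n.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_page, first)
        for n in range(first, last + 1):
            page = future.result()                    # wait only if prefetch lagged
            if n < last:
                future = pool.submit(fetch_page, n + 1)  # prefetch next page
            display(page)

read_pages(1, 5)
```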

Page 11: Croft's Top Ten Criteria

1. Integrated Solutions

"A text retrieval system is a tool that can be used to solve part of an organization's information management problems. It is not often, however, the complete solution.

"Typically, a complete solution requires other text-based tools such as routing and extraction, tools for handling multimedia and scanned documents such as OCR, a database management system for structured data, and workflow or other groupware systems for managing documents and their use in the organization."

Croft 1995

Page 12: Croft's Top Ten Criteria

2. Distributed Information Retrieval

There is a huge "demand for text retrieval systems that can work in distributed, wide-area network environments."

"The more general problems are locating the best databases to search in a distributed environment that may contain hundreds or even thousands of databases, and merging the results that come back from the distributed search."

Page 13: Croft's Top Ten Criteria

3. Efficient, Flexible Indexing and Retrieval

"One of the most frequently mentioned, and most highly rated, issues is efficiency. Many different aspects of a system can have an impact on efficiency, and metrics such as query response time and indexing speed are major concerns of virtually every company involved with text-based systems."

"The other aspect of indexing that is considered very important is the capability of handling a wide variety of document formats. This includes both standards such as SGML, HTML, Acrobat, and WordPerfect [and] the myriad formats used in text-based applications..."

Page 14: Croft's Top Ten Criteria

4. 'Magic'

"One of the major causes of failures in IR systems is vocabulary mismatch. This means that the information need is often described using different words than are found in relevant documents. Techniques that address this problem by automatic expansion of the query are often regarded as a form of 'magic' by users and are viewed as highly desirable."

Page 15: Croft's Top Ten Criteria

5. Interfaces and Browsing

"Effective interfaces for text-based information systems are a high priority for users of these systems. The interface is a major part of how a system is evaluated, ... Interfaces must support a range of functions including query formulation, presentation of retrieved information, feedback, and browsing."

Page 16: Croft's Top Ten Criteria

6. Routing and Filtering

"Information routing, filtering and clipping are all synonyms used to describe the process of identifying relevant documents in streams of information such as news feeds ... large number of archived profiles are compared to individual documents. Documents that match are sent to the users associated with the profile."

Page 17: Croft's Top Ten Criteria

7. Effective Retrieval

"Contrary to some researchers' opinions, companies that sell and use IR systems are interested in effectiveness. It is not, however, the primary focus of their concerns."

"... companies are particularly interested in techniques that produce significant improvements (rather than a few percent average precision) and that avoid occasional major mistakes."

Page 18: Croft's Top Ten Criteria

8. Multimedia Retrieval

"The perceived value of multimedia information systems is very high and, consequently, industry has a considerable interest in the development of these techniques."

Page 19: Croft's Top Ten Criteria

9. Information Extraction

"Information extraction techniques are designed to identify database entities, attributes and relationships in full text."

Sometimes loosely referred to as data mining.
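
A minimal sketch of the pattern-matching flavor of extraction, using a single hypothetical regular-expression pattern for one relation; production extraction systems use much richer patterns or trained models.

```python
import re

# Hypothetical pattern for one relation: "<PERSON>, <TITLE> of <COMPANY>".
PATTERN = re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+), (president|CEO) of ([A-Z]\w+)")

def extract(text):
    """Return (person, title, company) tuples found in the text."""
    return [m.groups() for m in PATTERN.finditer(text)]

text = "Jane Smith, CEO of Acme, announced record earnings."
print(extract(text))  # -> [('Jane Smith', 'CEO', 'Acme')]
```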

Page 20: Croft's Top Ten Criteria

10. Relevance Feedback

"Companies and government agencies that use IR systems also view relevance feedback as a desirable feature, but there are some practical difficulties that have delayed the general adoption of this technique."

Page 21: See the paper by Croft, Cook, and Wilder in the CS 430 readings.

Page 22: THOMAS

The documents:

• Full text of all legislation introduced in Congress since 1989.

• Text of the Congressional Record.

Indexes

• Bills are indexed by title, bill number, and the text of the bill.

• The Congressional Record is indexed by title, document identifier, date, speaker, and page number.

Search system

InQuery -- developed by the University of Massachusetts, Amherst.
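
InQuery's internals are not shown in these notes; as an illustration of fielded indexing like THOMAS's (title, bill number, full text), here is a minimal Python sketch with hypothetical data:

```python
from collections import defaultdict

# Posting lists per field: field -> term -> set of bill ids.
index = defaultdict(lambda: defaultdict(set))

def add_bill(bill_id, title, text):
    """Index a bill under its number, title words, and full-text words."""
    index["number"][bill_id.lower()].add(bill_id)
    for word in title.lower().split():
        index["title"][word].add(bill_id)
    for word in text.lower().split():
        index["text"][word].add(bill_id)

def lookup(field, term):
    return index[field].get(term.lower(), set())

add_bill("hr1234", "Clean Water Act", "A bill to protect rivers and lakes")
print(lookup("title", "water"))    # -> {'hr1234'}
print(lookup("number", "HR1234"))  # -> {'hr1234'}
```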

Page 23: Weighting

Single-word Query

The more instances of that word in the document, the more relevant the document will be considered.

Occurrences of the term in the title are considered most relevant (weight x 20).
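
A minimal sketch of this single-word scoring rule; the 20x title weight comes from the slide, while the whitespace tokenization is an illustrative simplification:

```python
TITLE_WEIGHT = 20  # per the slide: one title occurrence counts 20x a body occurrence

def score(word, title, body):
    """Score a document for a single-word query by weighted occurrence counts."""
    word = word.lower()
    title_hits = title.lower().split().count(word)
    body_hits = body.lower().split().count(word)
    return TITLE_WEIGHT * title_hits + body_hits

print(score("water", "Clean Water Act", "a bill about water quality and water use"))
# -> 20*1 + 2 = 22
```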

Page 24: Weighting

Multiple-word Queries

Documents are ranked in decreasing order of estimated relevance:

1. Documents containing instances of the search terms as a phrase -- i.e., adjacent to each other.

2. Search terms occur near, but not next to, each other, and not necessarily in the same order as entered.

3. All search terms appear singly, not in proximity to each other.

4. Documents contain fewer than all of the words.
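
The four tiers can be sketched as follows; the proximity window and the tokenization are illustrative assumptions, not InQuery's actual operators:

```python
def tier(query_terms, doc_words, window=5):
    """Classify a document into ranking tiers 1 (best) .. 4, or None."""
    n = len(query_terms)
    # Tier 1: query terms appear adjacent, in order (a phrase match).
    for i in range(len(doc_words) - n + 1):
        if doc_words[i:i + n] == query_terms:
            return 1
    positions = [[i for i, w in enumerate(doc_words) if w == t]
                 for t in query_terms]
    if all(positions):
        flat = [p for pos in positions for p in pos]
        if max(flat) - min(flat) < window:   # Tier 2: all terms near each other
            return 2
        return 3                             # Tier 3: all terms, scattered
    if any(positions):
        return 4                             # Tier 4: only some of the terms
    return None

doc = "the death penalty statute was amended".split()
print(tier(["death", "penalty"], doc))  # -> 1
```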

Page 25: Language Problems

InQuery considers documents containing no instances of any form of the search words to be of no relevance.

• A search for "capital punishment" does not find legislation about "death penalty".

If there are no highly relevant documents, InQuery returns weakly relevant ones.

• A search for "elderly black Americans" returned a bill on "black bears" as most relevant, followed by bills relating to "black colleges and universities". (There were no bills in any way related to "elderly black Americans".)

Page 26: Queries

Words   Unique Queries
    1            5,767
    2            9,646
    3            6,905
    4            2,240
    5              656
    6               87
    7               19
    8                1
Total           25,321

(Table: number of unique queries by number of words in the query.)

Page 27: The Human in the Loop

[Diagram: the user iterates in a loop -- search the index and get back hits; browse the repository and get back objects.]

Page 28: D-Lib Working Group on Metrics

DARPA-funded attempt to develop a TREC-like approach to digital libraries (1997).

"This Working Group is aimed at developing a consensus on an appropriate set of metrics to evaluate and compare the effectiveness of digital libraries and component technologies in a distributed environment. Initial emphasis will be on (a) information discovery with a human in the loop, and (b) retrieval in a heterogeneous world. "

Very little progress made.

See: http://www.dlib.org/metrics/public/index.html
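
For contrast with the metrics the group hoped to define, here is a sketch of the classic set-based effectiveness measures (precision and recall) that TREC-style evaluation builds on; the document ids are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Classic set-based effectiveness metrics used in TREC-style evaluation."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(precision_recall(retrieved=["d1", "d2", "d3"], relevant=["d2", "d4"]))
# -> (0.333..., 0.5)
```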

Page 29: MIRA

Evaluation Frameworks for Interactive Multimedia Information Retrieval Applications

European study 1996-99

Chair: Keith van Rijsbergen, Glasgow University

Expertise:

• Multimedia Information Retrieval
• Information Retrieval
• Human-Computer Interaction
• Case-Based Reasoning
• Natural Language Processing

Page 30: MIRA Starting Point

• Information Retrieval techniques are beginning to be used in complex goal- and task-oriented systems whose main objectives are not just the retrieval of information.

• New original research in IR is being blocked or hampered by the lack of a broader framework for evaluation.

Page 31: MIRA Aims

• Bring the user back into the evaluation process.

• Understand the changing nature of IR tasks and their evaluation.

• 'Evaluate' traditional evaluation methodologies.

• Consider how evaluation can be prescriptive of IR design.

• Move towards a balanced approach (system versus user).

• Understand how interaction affects evaluation.

• Support the move from static to dynamic evaluation.

• Understand how new media affects evaluation.

• Make evaluation methods more practical for smaller groups.

• Spawn new projects to develop new evaluation frameworks.

Page 32: MIRA Approaches

• Developing methods and tools for evaluating interactive IR. Possibly the most important activity of all.

• User tasks: study real users and their overall goals. Improving user interfaces is one way to widen the set of users.

• Develop a design for a multimedia test collection.

• Organize collaborative projects. (TREC was organized as a competition.)

• Pool tools and data.