
Upload: blaise-blake

Post on 17-Jan-2016

IMT530- Organization of Information Resources 1

Recap

• Descriptive metadata elements can be used for access or selection

• For access, it is important to have good authority control to enable the users to:
  – Find known items from the information they have available
  – Gather all the items of a similar nature together
  – Choose the right one from among retrieved items

• Authority control takes time and effort, but pays off in better results for users
  – Need to balance cost against benefits and make a decision on your approach for each project
  – Don’t do it halfway, because it’s not worth it

Module 5b: Subject Analysis and Indexing

IMT530: Organization of Information Resources

Winter 2008

Michael Crandall

IMT530- Organization of Information Resources 3

Module 5b Outline

• Subject analysis
  – Definition
  – Why do this?
  – Mai’s domain-centered analysis
  – Consistency

• Subject indexing
  – Definition and purpose of subject indexing
  – Types of subject indexing
  – Indexing non-text objects
  – Types of terms used in subject indexing
  – The subject indexing process

IMT530- Organization of Information Resources 4

Some Questions

• Library catalogs often lump fiction into one subject heading. Why?

• Would you describe the subject of “The Organization of Information” to your mother the same way you would to a classmate?

• Would you use the same subjects to describe Chapter 9 in Taylor that you would to describe the whole book?

• If you wanted to assign a subject to your kitchen or garage, what would it be?

• What if you had to describe snow to a Papua New Guinea native? What words would you use? Would they be the same for an Inuit?

• How do you describe the subject of a picture or film?

IMT530- Organization of Information Resources 5

Subject Analysis - Definition

• The process of determining the subject and other content-related attributes of an object

• The purpose of subject analysis is to come to an understanding of or judgment regarding:
  – what an object is about, in the context of how it might be used;
  – what an object exemplifies;
  – what discipline (or other aspect, including community) an object reflects (for classification)

IMT530- Organization of Information Resources 6

Why Subject Analysis?

• One of the primary means of access to information is through “subjects”

• In order for a computer to access those subjects, there has to be some way to get to them
  – an index of some kind
  – Remember Soergel’s model, and the necessity for a means to match user requests to information objects

• Automatic indexing works for some situations, but not all
  – As we’ll see, subject concepts are not necessarily contained in words (especially not in images!!)
  – A specific audience may dictate specific analysis

IMT530- Organization of Information Resources 7

Wilson on Subjects

• One of the main purposes of Wilson’s chapter on subjects is to analyze the subject analysis process – to take it apart

• Starts with the words, then the sentences, then the work itself, and asks questions about how you can elicit descriptions of “aboutness”

• Wilson suggests four different ways to approach this:
  – Purposive: why did the author write the work
  – Figure-ground: what stands out among all the possible subjects
  – Objective: count what is most frequently mentioned
  – Appeal to unity and completeness: what questions are answered within the work

• Ultimately, he concludes that any extraction will miss some part of the work, and not satisfy some user
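
The "objective" approach, counting what is most frequently mentioned, is the easiest of the four to mechanize. The sketch below is only an illustration under stated assumptions: the stopword list, the function name `most_frequent_terms`, and the sample text are all made up for this example and do not come from Wilson.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "that"}


def most_frequent_terms(text, top_n=3):
    """Return the top_n most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up sample text, not taken from any of the readings.
sample = (
    "Commuter marriages challenge couples. Couples in commuter marriages "
    "report that trust matters. Trust and communication sustain marriages."
)

top_terms = most_frequent_terms(sample, top_n=4)
print(top_terms)
```

Note what the count cannot tell you: why the author wrote the piece, or what stands out to a particular reader. That gap is consistent with the conclusion that any single extraction method will miss part of the work.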

IMT530- Organization of Information Resources 8

Subject Analysis in Context

• Subject analysis should always be done in context

• Context considerations include:
  – user (children, medical practitioners, etc.)
  – uses (developing egg substitutes, learning how to cook)
  – the document itself (the “text” of a document, intended audience, uses, etc.)
  – institution (public library, corporate intranet)
  – administrative and information systems context

IMT530- Organization of Information Resources 9

Mai’s Domain-Centered Approach

IMT530- Organization of Information Resources 10

Relevance

• Taylor’s stages in development of an information need
  – The visceral need
  – The conscious need
  – The formalized need
  – The compromised need

• Relevance is usually measured against the last of these, while ignoring the more complex situational aspects that affect the other states
  – Mai concludes that evaluation should be less mechanistic (focused on terminology matches) and more humanistic (focused on the visceral needs)
  – Requires contextual analysis and qualitative research rather than just precision/recall measures

IMT530- Organization of Information Resources 11

Consistency

• Taylor points out the difficulty of getting people to assign similar subjects to objects

• But when controlled vocabularies and rules for selecting subject terms from those vocabularies are used, consistency is much better
  – Assumes trained subject indexers
  – Not likely to be the case in most settings other than libraries
  – Again points out need to determine what your objectives in building a taxonomy are before you make the investment

• So how do you go about subject indexing?

IMT530- Organization of Information Resources 12

Definition and Purpose of Subject Indexing

• Subject indexing is the process or technique of identifying and selecting terms (words, phrases, sentences, taxonomic categories, notation) used in a domain of information to indicate the subject content of a resource for users and to provide subject access

• Purposes of subject indexing may be seen in light of Cutter’s objects of the catalog:
  – To facilitate finding a particular object on the basis of its subject content (finding function)
  – To display to a user all of the objects that exhibit particular subject content (collocating function)
  – To aid a user in the selection of a particular object (choice function).

IMT530- Organization of Information Resources 13

Rowley Article

• Trade-off between precision and recall

• 4 eras in indexing
  – Era 1: Pre-computer access - Title indexing
  – Era 2: Online age - Cranfield and other retrieval studies showed free indexing worked as well as controlled in abstract databases
  – Era 3: Full-text vs. subject indexing - shown to complement each other (Taylor also points out the tradeoff between summarization for document retrieval vs. depth indexing for information retrieval)
  – Era 4: Tests with real users instead of controlled experiments - difficulty in using search interfaces because of complex and varied systems
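
The precision and recall trade-off can be made concrete with a small calculation. Precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved. The sketch below uses invented document identifiers and a hypothetical helper named `precision_recall`; it is not drawn from Rowley's article.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents in the collection are truly relevant.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A narrow search retrieves little, but everything it retrieves is relevant.
narrow = precision_recall({"d1", "d2"}, relevant_docs)

# A broad search finds every relevant document, along with four irrelevant ones.
broad = precision_recall(
    {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant_docs
)

print("narrow:", narrow)
print("broad:", broad)
```

In this toy case the narrow search scores higher on precision and lower on recall, and the broad search does the reverse, which is the trade-off in miniature.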

IMT530- Organization of Information Resources 14

Types of Subject Indexing: Derived Indexing

• Derived Indexing: in derived indexing, terms used for indexing are limited to those that actually appear in the document or resource.

• Derived indexing may be done manually or automatically
  – Search engine indexes are examples of automatic derived indexing
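
A minimal sketch of automatic derived indexing, assuming a tiny made-up collection: every index term is a word that literally appears in a document. The function name `build_derived_index` and the sample documents are hypothetical, and real search engine indexes are far more elaborate.

```python
import re
from collections import defaultdict


def build_derived_index(documents):
    """Map each word that appears in a document to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


# Made-up documents for illustration.
docs = {
    "doc1": "Cooking with egg substitutes",
    "doc2": "Learning how to cook",
}

index = build_derived_index(docs)
print(sorted(index["egg"]))      # documents containing the word "egg"
print(sorted(index["cook"]))     # "cooking" in doc1 is a different string, so doc1 is missed
print("recipes" in index)        # a concept that never appears as a word is not indexed
```

The last two lookups show the limits of purely derived terms: word variants do not match each other, and a subject that is never named in the text cannot be found.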

IMT530- Organization of Information Resources 15

Assigned Indexing

• Assigned Indexing: in assigned indexing, terms used for indexing are not limited to those in the object, but may come from the object, the mind of the indexer, or from a controlled vocabulary

• There are two types of Assigned indexing: Free Indexing and Indexing from controlled vocabularies

IMT530- Organization of Information Resources 16

Free Indexing

• In free indexing, the indexer or indexing program is free to assign terms from anywhere inside or outside the object
  – the indexer may take terms from the object, or use any terms that occur to them
  – In some “free” indexing settings, very detailed instructions guide indexers in their selection of terms
  – Other settings are much looser: users can pick any terms that mean something to them or others
    • Pictures (http://flickr.com)
    • Folksonomies (http://del.icio.us)

IMT530- Organization of Information Resources 17

Controlled Vocabulary Indexing

• In indexing from controlled vocabularies, indexers are constrained by the terms that are available in lists of terms called “controlled vocabularies” - they must assign one or more terms from the controlled vocabulary.

• Controlled vocabulary indexing is much like choosing terms from a very large drop-down menu.
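
The drop-down analogy can be sketched in code: candidate terms are accepted only if they are on the list, or if the list redirects them to an authorized term. The vocabulary below is hypothetical (it is not the ERIC thesaurus or any real scheme), as are the names `AUTHORIZED_TERMS`, `USE_REFERENCES`, and `assign_controlled_terms`.

```python
# Hypothetical controlled vocabulary, invented for this illustration.
AUTHORIZED_TERMS = {"Marriage", "Dual Career Family", "Job Satisfaction"}

# Hypothetical "use" references that redirect variant terms to authorized ones.
USE_REFERENCES = {
    "wedlock": "Marriage",
    "two-career couples": "Dual Career Family",
    "work satisfaction": "Job Satisfaction",
}


def assign_controlled_terms(candidate_terms):
    """Split candidate terms into authorized index terms and rejected terms."""
    lookup = {term.lower(): term for term in AUTHORIZED_TERMS}
    lookup.update({variant.lower(): term for variant, term in USE_REFERENCES.items()})
    assigned, rejected = [], []
    for candidate in candidate_terms:
        authorized = lookup.get(candidate.strip().lower())
        if authorized is None:
            rejected.append(candidate)      # not in the vocabulary: cannot be assigned
        elif authorized not in assigned:
            assigned.append(authorized)     # variants collapse to a single authorized term
    return assigned, rejected


assigned, rejected = assign_controlled_terms(
    ["wedlock", "marriage", "Work Satisfaction", "long-distance love"]
)
print(assigned)
print(rejected)
```

Two variants collapse into one heading, which is what supports collocation, while the term that is not in the vocabulary is simply unavailable to the indexer.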

IMT530- Organization of Information Resources 18

Automatic Indexing

• In automatic indexing, it is common for indexing software applications to use derived indexing techniques only, enhanced with word stemming and spelling algorithms to improve matching

• However, more advanced programs are being developed that mimic free indexing (e.g., text summarization programs)

• Some advanced automatic indexing programs (particularly those in medicine) are making use of controlled vocabularies in term selection and identification.
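
To show how word stemming can improve matching in derived indexing, here is a deliberately naive sketch. The suffix list and the names `naive_stem`, `build_stemmed_index`, and `search` are assumptions made for this example; production systems use far more careful stemmers (Porter's algorithm is one well-known example).

```python
import re
from collections import defaultdict

# Deliberately naive suffix list, an assumption for illustration only.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stemmed_index(documents):
    """Derived index keyed on stems rather than on the exact words in the text."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[naive_stem(word)].add(doc_id)
    return index


def search(index, query_word):
    """Stem the query the same way as the documents before looking it up."""
    return index.get(naive_stem(query_word.lower()), set())


# Made-up documents for illustration.
docs = {
    "doc1": "Cooking with egg substitutes",
    "doc2": "Learning how to cook",
    "doc3": "Indexing images",
}

index = build_stemmed_index(docs)
matches = search(index, "cooks")
print(sorted(matches))
```

Because "cooking", "cook", and "cooks" all reduce to the same stem, one query reaches both cooking documents. The terms are still derived from the text, so a subject never named in a document remains unfindable.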

IMT530- Organization of Information Resources 19

Mai’s Conceptions of Indexing

• Simplistic conception of indexing
  – automatic extraction (derived indexing)

• Document-oriented indexing
  – focus on document & document parts

• Content-oriented indexing
  – focus on content in document (still document oriented)

• User-oriented indexing
  – focus on user & possible uses of the document

• Requirement-oriented indexing
  – relies on in-depth knowledge of users & uses of documents; complete knowledge of context

IMT530- Organization of Information Resources 20

Types of Terms Used in Subject Indexing

• Words or short phrases
  – descriptors, identifiers, subject headings, or keywords

• Sentences
  – derived indexing may use whole sentences, but rarely done
  – used in some web documents and for derived abstracts
  – abstracts, summaries, or annotations

• Taxonomic categories (such as the type used in the Yahoo directory)

• Notation (such as the type used in the Dewey Decimal Classification)

IMT530- Organization of Information Resources 21

Sample ERIC Indexing Record

PERSONAL AUTHOR: Magnuson,-Sandy; Norem,-Ken

TITLE: Challenges for Higher Education Couples in Commuter Marriages: Insights for Couples and Counselors Who Work with Them.

PUBLICATION YEAR: 1999

SOURCE (JOURNAL CITATION): Family-Journal:-Counseling-and-Therapy-for-Couples-and-Families; v7 n2 p125-34 Apr 1999

DOCUMENT TYPE: Journal-Articles (080); Reports-Research (143)

LANGUAGE: English

MAJOR DESCRIPTORS: *Counseling-Techniques; *Dual-Career-Family; *Job-Satisfaction; *Marital-Satisfaction; *Marriage-

MINOR DESCRIPTORS: Trust-Psychology

MAJOR IDENTIFIERS: *Career-Commitment

MINOR IDENTIFIERS: Quality-Time

ABSTRACT: Focuses on the experiences of dual-career couples that maintain two homes to attain career satisfaction. Findings include support for the potential strength and satisfaction of commuting relationships. Trust, commitment, regular communication, and quality shared time were endorsed as factors contributing to successful distance marriages. (Author/GCP)

IMT530- Organization of Information Resources 22

Indexing Non-text Objects

• Layne discusses the indexing of images and points out some useful distinctions
  – Defines four general types of attributes
    • Biographical
    • Subject
    • Exemplified
    • Relationship
  – While she discusses these in the context of images, they can prove useful when indexing almost any object

IMT530- Organization of Information Resources 23

Identification of Concepts

• Taylor lists several concepts that can be helpful in teasing out subject terms
  – Topics
  – Names
    • Persons, corporations, geographic, other
  – Time periods
  – Form (genre)
    • http://isotropic.org/papers/chicken.pdf

• See the appendix in Taylor for an example and checklist

IMT530- Organization of Information Resources 24

Indexing Policies

• Many indexers are guided by indexing policies that determine the types of terms that are finally used in indexing

• Three characteristics of indexing upon which indexing policies may be built:
  – Exhaustivity
  – Specific entry (sometimes called “specificity”, but incorrectly)
  – Coextensivity

IMT530- Organization of Information Resources 25

ISO 5963

• Despite Wilson’s assertion that subject analysis is impossible, a variety of standards exist prescribing how it should be done – the British Standard ISO 5963 in your readings this week is one of them

• Viewed from Wilson’s or Mai’s perspective (and your own), what are the problems with this standard?

IMT530- Organization of Information Resources 26

IMT530- Organization of Information Resources 27

Steps in Free and Assigned Indexing

1. Identify subject content

2. Identify disciplinary context or domain (for classifications or taxonomies)

3. Express or describe content (steps 1-3 describe the subject analysis process)

4. Select or create terms and add them to the document representation

5. If working with a controlled vocabulary (CV), update and maintain the CV based on the indexing experience

IMT530- Organization of Information Resources 28

Questions?

• If not, take a break!!!

IMT530- Organization of Information Resources 29

Exercise 5

• Purpose is to try different methods of extracting concepts from an article, so you can see the impact on users

• Spend the rest of class working through the questions in Exercise 5

• We’ll discuss before the end of class

IMT530- Organization of Information Resources 30

Differences

• Hopefully, this exercise gave you a chance to see a couple of things:
  – How difficult it can be to actually determine what something is about
  – How different methods of assigning terms would result in very different access for users

• We didn’t throw in Mai’s perspective on domain indexing in this exercise, which makes it even more difficult
  – This is obviously not a simple thing to do well
  – But you now are aware of the issues, and can keep them in mind when working in this area

IMT530- Organization of Information Resources 31

Next Week

• We’ll start looking in more detail at controlled vocabularies and discuss how they might interact with emergent social tagging systems

• Remember to read assignments BEFORE class

• Important: your mid-term assignments are due at the start of class next week!!