SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Upload: leslie-douglas

Post on 13-Jan-2016


TRANSCRIPT

Page 1: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

SUBJECT ANALYSIS AND REPRESENTATION

Presented by GARRY L. BASTIDA

Page 2: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

INTRODUCTION/REVIEW

One of the major functions of an information retrieval system is to match the contents of documents with users' queries.

The system personnel have to prepare a surrogate for every document, and all such surrogates must be maintained in an organized manner (indexing).

Page 3: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

INTRODUCTION/REVIEW

TASK: analyze the content of the given document and represent this analysis by some content identifiers or keywords.

Lancaster: indexing involves two quite distinct steps, conceptual analysis and representation.

In subject classification, the basic objective of which is to arrange documents according to their subject contents, the result of the conceptual analysis is represented by some artificial language or notational symbol.

Page 4: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Subject Analysis

What’s it all about, Garry?

Page 5: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA


What is it?

Subject analysis

Examination of a bibliographic item by a trained subject specialist to determine the most specific subject heading(s) or descriptor(s) that fully describe its content, to serve in the bibliographic record as access points in a subject search of a library catalog, index, abstracting service, or bibliographic database. When no applicable subject heading can be found in the existing headings list or thesaurus of indexing terms, a new one must be created.

Page 6: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

What is it?

It means the presence, identification and expression of subject matter in document texts, databases, controlled and natural languages, information requests and search strategies.

Page 7: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA


Say what?

Page 8: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA


Why do all that?

If we don’t, we can’t find stuff!

“Subject analysis is [essentially] all methods and processes which can be described as representation for retrieval of information by its subjects, be they names, geographic locations, or topical subjects.”

Quoted from Williamson, N. J. (1997). The Importance of Subject Analysis in Library and Information Science Education. Technical Services Quarterly 15(1/2):67-87, by Pamela Hill in LS 500 Organization of Information, Tuesday, February 24, 2004.

Page 9: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA


Why use a standardized list?

Why Subject Headings?

Subject headings often indicate the contents of books in terms that their titles do not use; titles are often nondescriptive or very general. Subject headings in online databases are often referred to as descriptors, but they serve the same purpose in locating valuable resources.

Along with their subdivisions, subject headings provide a clear and systematic way of scanning the catalog for what is needed. Assigned headings are usually the dominant, and most important, subjects of a given item.

Subject headings bring like materials together, so the searcher has less need to try the wide variety of synonymous terms that may describe a single concept (teen, youth, adolescent, young adult, etc.).

• Using Subject Headings in PantherCat

Page 10: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

BS 6529: factors in choosing the subject of a document

Does the document deal with a specific product, condition or phenomenon?

Does the subject contain an action concept, an operation or a process?

Is the object or patient affected by the action identified?

Does the document deal with the agent of this action?

Page 11: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

BS 6529: factors in choosing the subject of a document

Does it refer to a particular means for accomplishing the action?

Were these factors considered in the context of a particular location or environment?

Are any independent or dependent variables identified?

Was the subject considered from a special viewpoint not normally associated with that field of study?

Page 12: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

SUBJECT INDEXING is the act of describing a document by index terms to indicate what the document is about or to summarize its content. Indexes are constructed, separately, on three distinct levels: terms in a document such as a book; objects in a collection such as a library; and documents (such as books and articles) within a field of knowledge.

Subject indexing systems have been classified broadly as pre-coordinate and post-coordinate systems. The major objective of any indexing system is to represent the contents of documents through keywords or descriptors.

Page 13: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Exhaustivity and Specificity

An exhaustive index is one which lists all possible index terms. Greater exhaustivity gives higher recall, that is, a greater likelihood that all the relevant articles are retrieved. However, this occurs at the expense of precision: the user may retrieve a larger number of irrelevant documents, or documents which deal with the subject only in little depth. In a manual system, greater exhaustivity also brings greater cost, as more man-hours are required.

Specificity describes how closely the index terms match the topics they represent. An index is said to be specific if the indexer uses descriptors that parallel the concepts of the document and reflect those concepts precisely.

Page 14: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Recall vs Precision

$$\text{Precision} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of documents retrieved}}$$

$$\text{Recall} = \frac{\text{Number of relevant documents retrieved}}{\text{Number of relevant documents in the collection}}$$
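The two ratios can be worked through for a single search with a few lines of Python. This is only a sketch: the function name, the use of sets of document numbers, and the sample numbers are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: set of document numbers returned by the search
    relevant:  set of document numbers in the collection that are relevant
    """
    hits = len(retrieved & relevant)  # relevant documents retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = {1, 2, 3, 4}
relevant = {2, 3, 4, 7, 8, 9}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.75 0.5
```

Retrieving more documents can only keep recall the same or raise it, but every irrelevant document added to the retrieved set lowers precision.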

Page 15: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Manual indexing

Analysis of subject

Identification of keywords

Standardization of keywords

Choice of an indexing system

If the chosen system is a post-coordinate one, then:

Preparation of entries under each term with reference to the document identification number.

Preparation of reference entries.

Page 16: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Manual indexing

If the chosen system is a pre-coordinate one, then:

Preparation of an entry (main entry) using all the keywords organized in a way prescribed by the system.

Preparation of index entries by using each significant term as an entry element and the full entry (main entry) as the context, or by rotation/permutation of the significant terms in the main entry according to the rules prescribed by the system chosen.

Preparation of reference entries.

Filing entries.
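As a rough illustration of the rotation step, the sketch below brings each significant term of a main entry to the front in turn and then files the results alphabetically. Real pre-coordinate systems prescribe their own rotation rules; plain cyclic rotation, the function name, and the sample terms are all assumptions made here.

```python
def rotated_entries(main_entry_terms):
    """Generate one index entry per term by cyclic rotation.

    Each term in turn becomes the entry element (first position); the
    remaining terms follow in their original cyclic order as context.
    """
    n = len(main_entry_terms)
    return [main_entry_terms[i:] + main_entry_terms[:i] for i in range(n)]


# Hypothetical main entry, for illustration only.
main_entry = ["Libraries", "Catalogues", "Subject headings"]
for entry in sorted(rotated_entries(main_entry)):  # filing = alphabetical order
    print(" : ".join(entry))
```

Every term becomes an access point exactly once, and the full main entry stays visible as the context in each index entry.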

Page 17: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

STEPS IN MANUAL INDEXING SYSTEM

Page 18: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Pre-coordinate indexing system

Chain indexing

Dr. S.R. Ranganathan developed this method of pre-coordinate indexing. It attempts to represent, in natural language, the chain of concepts that constitutes a subject.

Page 19: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Pre-coordinate indexing system

Basic steps in chain indexing may be represented as follows:

Take the class number prepared for the given document.

Consult the corresponding classification schedule and write the notation at each step and the corresponding term or phrase (from the schedule). This will produce a chain of concepts from the general to the specific.

Page 20: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Basic steps in chain indexing may be represented as follows:

Identify the sought, unsought, and false links. Sought links denote the concepts that the user is likely to use as access points; unsought links are those that are not likely to be used as access points, and false links are those that really do not represent any valid concepts.

Invert the chain, and this will generate the index entries.
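The steps above can be sketched in Python. This is a simplified reading, not Ranganathan's full rules: it assumes that only sought links produce entries and that each entry is qualified by every sought link above it. The `Link` class, the function name, the notations and the terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    notation: str  # class number reached at this step of the chain
    term: str      # term or phrase taken from the schedule
    kind: str      # "sought", "unsought", or "false"


def chain_index_entries(chain):
    """Invert a general-to-specific chain into (heading, class number) pairs.

    Simplification: only sought links produce entries, and each entry is
    qualified by all sought links above it, most specific first.
    """
    entries = []
    for i in range(len(chain) - 1, -1, -1):  # walk from specific to general
        link = chain[i]
        if link.kind != "sought":
            continue
        context = [up.term for up in reversed(chain[:i]) if up.kind == "sought"]
        entries.append((", ".join([link.term] + context), link.notation))
    return entries


# Hypothetical notations and terms; not taken from any real schedule.
chain = [
    Link("A", "Library science", "sought"),
    Link("A2", "Technical services", "unsought"),
    Link("A2.", "", "false"),
    Link("A25", "Indexing", "sought"),
]
for heading, notation in chain_index_entries(chain):
    print(heading, notation)
```

With this chain the most specific sought link yields the first entry, "Indexing, Library science", and the unsought and false links yield none.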

Page 21: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Pre-coordinate indexing system

Relational indexing

J.E.L. Farradane devised this scheme. The system was first developed in the early 1950s and has been modified several times since then. The latest changes may be noted from Farradane’s own papers that appeared in 1980. According to Farradane, any subject can be represented by identifying the relationship between each pair of its constituent concepts and expressing it in the form of what he called analets (pairs of terms interposed by an operator). He suggested that any possible relationship can be represented by one of nine relational operators.

Page 22: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Pre-coordinate indexing system

PRECIS (PREserved Context Index System) was developed by Derek Austin and first came out in 1974.

Major tasks:

Analysing the document concerned and identifying key concepts.

Organizing the concepts into a subject statement based on the principle of context dependency.

Assigning codes (operators) which signify the syntactical function of each term.

Deciding which terms should be the access points and which terms would be in other positions in the index entries, and assigning further codes to achieve these results.

Adding further prepositions, auxiliaries or phrases which would result in clarity and expressiveness of the resulting index entries.

Making supporting reference entries from semantically related terms taken from a thesaurus.
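The effect of choosing access points on a context-dependent string can be illustrated with a much-simplified sketch. It ignores the operators and codes entirely. It assumes the two-line layout usually described for PRECIS entries, in which the lead term is followed by its wider context (qualifier) and the narrower context goes on a second line (display). The function names and the sample subject statement are hypothetical.

```python
def precis_entries(subject_string, access_points):
    """Build simplified lead / qualifier / display entries.

    subject_string: terms in context-dependent order (wider context first)
    access_points:  the terms chosen to act as lead terms
    """
    entries = []
    for i, term in enumerate(subject_string):
        if term not in access_points:
            continue
        entries.append({
            "lead": term,
            # Wider context, read back towards the most general term.
            "qualifier": subject_string[:i][::-1],
            # Narrower context, in its original order.
            "display": subject_string[i + 1:],
        })
    return entries


def format_entry(entry):
    """Render one entry as text: heading line, then an indented display line."""
    heading = ". ".join([entry["lead"]] + entry["qualifier"])
    display = ". ".join(entry["display"])
    return heading if not display else heading + "\n  " + display


# Hypothetical subject statement, for illustration only.
terms = ["India", "Cotton industries", "Personnel", "Training"]
for e in precis_entries(terms, access_points=set(terms)):
    print(format_entry(e))
```

Whichever term leads, the other terms keep their relative order on either side of it, which is what preserving the context amounts to in this simplified form.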

Page 23: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Pre-coordinate indexing system

POPSI (POstulate-based Permuted Subject Indexing) was developed by Bhattacharyya. It uses the analytico-synthetic method for string formulation, and permutation of the constituent terms in order to satisfy different approach points to the document.

There are two parts: the lead heading, which contains the index term or the access term, and the context heading, which generally appears in the line following the lead heading and contains the subject words, with auxiliary words, denoting the context in which the lead term has been discussed in the given document.

Page 24: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Rules that govern POPSI

A manifestation of property follows immediately the manifestation in relation to which it is a property.

A manifestation of action follows immediately the manifestation in relation to which it is an action.

Property and action can have another property and/or action directly related.

A species or part follows immediately the manifestation in relation to which it is a species or part; part is used to denote the whole-part relationship.

A modifier follows immediately the manifestation in relation to which it is a modifier.

Page 25: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Post-coordinate indexing system

Uniterm

Developed by Mortimer Taube in 1953. A card is prepared for each term that is considered to be an appropriate index term for a given document, and the document's number is entered on that card. Searching relies on the ability of the searcher to notice matching numbers on the cards that are retrieved.

Optical coincidence/peek-a-boo cards

Developed to overcome the problem of manual searching. Each card is divided into small numbered squares, each square standing for a specific document number, and a document number is recorded by punching the appropriate position on the card.
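Both card systems amount to finding the document numbers common to several term cards. The sketch below models a Uniterm card as a set of document numbers, and a peek-a-boo card as an integer whose bit n is set when position n is punched, so that superimposing cards becomes a bitwise AND. The function names, the card contents and the bit representation are assumptions made for illustration.

```python
def uniterm_search(cards, terms):
    """Uniterm-style search: document numbers that appear on every term card."""
    matching = None
    for term in terms:
        numbers = cards.get(term, set())
        matching = set(numbers) if matching is None else matching & numbers
    return sorted(matching or [])


def punch(document_numbers):
    """Peek-a-boo card as an integer: bit n is set if position n is punched."""
    card = 0
    for n in document_numbers:
        card |= 1 << n
    return card


def peekaboo_search(punched_cards):
    """Superimpose cards: only positions punched on every card stay open."""
    stacked = punched_cards[0]
    for card in punched_cards[1:]:
        stacked &= card
    return [n for n in range(stacked.bit_length()) if stacked >> n & 1]


# Hypothetical term cards and document numbers, for illustration only.
cards = {
    "indexing": {3, 7, 12, 18},
    "libraries": {2, 7, 12, 25},
    "automation": {7, 9, 25},
}
print(uniterm_search(cards, ["indexing", "libraries"]))    # [7, 12]
print(peekaboo_search([punch(cards[t]) for t in cards]))   # [7]
```

The terms are coordinated only at search time, which is what distinguishes these post-coordinate systems from the pre-coordinate ones described earlier.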

Page 26: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

PROBLEMS OF MANUAL INDEXING

Salton and McGill identify two major shortcomings:

It is not quite clear that all the complexities and refinements, exemplified by the categorization of terms and assignment of relations between terms, are really beneficial.

Even if the indexing process is carried out accurately, and at the right level of detail, it is not possible to maintain consistency, since more than one indexer will be needed in practice.

Page 27: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Theory of indexing

1st level: concordance, which consists of references to all words in the original text arranged in alphabetical order.

2nd level: information theoretical level, which calculates the likelihood of a word being chosen for indexing based on its frequency of occurrence in a given text document.

3rd level: linguistic level, which attempts to explain how meaningful words are extracted from large units of text.

4th level: textual or skeletal framework; the text is prepared by the author in an organized manner and held together by a skeletal structure.

5th level: inferential level. An indexer should be able to make inferences about the relationships between words and phrases by observing the sentence and paragraph structure, and by stripping the sentence of extraneous details.
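The first two levels are mechanical enough to sketch in code. The example below builds a concordance (every word, alphabetically, with its positions) and uses relative frequency as a crude stand-in for the second level. The slides do not give the actual likelihood calculation, so that part, the function names and the sample sentence are assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concordance(text):
    """1st level: every word, alphabetically, with its word positions."""
    positions = defaultdict(list)
    for i, word in enumerate(tokenize(text)):
        positions[word].append(i)
    return dict(sorted(positions.items()))


def relative_frequencies(text):
    """2nd level (crude stand-in): share of the text taken by each word."""
    words = tokenize(text)
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}


# Hypothetical one-sentence "document", for illustration only.
text = "Indexing describes a document; an index points to the document."
print(concordance(text))
print(relative_frequencies(text)["document"])  # 0.2
```

The remaining three levels depend on linguistic and structural judgement and are not captured by counting words.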

Page 28: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA

Fugmann proposes a theory of indexing based on five axioms:

Axiom of definability: compiling information relevant to a topic can only be accomplished to the degree to which the topic can be defined.

Axiom of order: any compilation of information relevant to a topic is an order-creating process.

Axiom of the sufficient degree of order: the demands made on the degree of order increase as the size of a collection and the frequency of searches increase.

Axiom of predictability: the success of any directed search for relevant information hinges on how readily predictable or reconstructible the modes of expression for concepts and statements in the search file are.

Axiom of fidelity: the success of any directed search for relevant information depends on the fidelity with which concepts and statements are expressed in the search file.

Page 29: SUBJECT ANALYSIS AND REPRESENTATION Presented by GARRY L. BASTIDA
