lis512 lecture 4

lis512 lecture 4 the MARC format structure, leader, directory

Upload: brooks

Post on 24-Feb-2016


Page 1: lis512 lecture 4

lis512 lecture 4

the MARC format structure, leader, directory

Page 2: lis512 lecture 4

MARC 21

• MARC 21 is an important example of a record format used by the library community.

• Integrated Library Systems (ILSs) all
– import MARC 21 records into relational database systems
– export MARC 21 records from relational database systems

• MARC 21 records describe records from library catalogues.

Page 3: lis512 lecture 4

documentation

• I use the documentation provided by the Library of Congress at http://www.loc.gov/marc/.

Page 4: lis512 lecture 4

warning: the blank

• The MARC record format uses blanks as values.

• Since blanks are not usually seen on a slide, I may denote them here by a special sign ␢. Most of the time ‘ ’ does the job.

• In the documentation at the LoC, they use the # character.

Page 5: lis512 lecture 4

MARC 21 formats

• There are different formats of MARC 21 records
– MARC 21 format for bibliographic data
– MARC 21 format for authority data
– MARC 21 format for holdings data
– MARC 21 format for classification data

• We are only studying the first one here and call this the “MARC format” in what follows.

Page 6: lis512 lecture 4

MARC format

• The MARC format is very complicated.

• The basic structure is
– leader
– directory
– control fields
– data fields

• The leader and directory are fixed fields. That means they have fixed length.

• The control and data fields are called variable fields. That means they have variable length.

Page 7: lis512 lecture 4

fixed fields

• The leader is 24 characters/bytes long. It can not be repeated.

• Each directory entry is 12 characters/bytes long.

• Both parts can only contain ASCII characters.

• There may be many directory entries. Therefore the total length of the directory is not fixed.

Page 8: lis512 lecture 4

MARC leader

• Described in http://www.loc.gov/marc/bibliographic/bdleader.html

• The leader is 24 bytes long.

• Each byte houses one ASCII character, so we can also say that it is 24 characters long.

• We count them from 0 to 23.

Page 9: lis512 lecture 4

grouping of leader bytes

• I have grouped the bytes into “boring” and “interesting” bytes.

• Boring bytes contain no information. Their values can be calculated by a computer.

• Interesting bytes contain some information. They have to be filled in by a human.

Page 10: lis512 lecture 4

sub-classification of boring bytes

• Boring bytes can be classified as fixed boring bytes and variable boring bytes.

• Fixed boring bytes always contain the same value for any MARC record.

• Variable boring bytes contain potentially different values for every MARC record.

• I write “leader/??” when I want to refer to the byte ?? in the leader.

Page 11: lis512 lecture 4

fixed boring byte positions

• Here are all fixed boring bytes
– leader/10 always contains ‘2’
– leader/11 always contains ‘2’
– leader/20 always contains ‘4’
– leader/21 always contains ‘5’
– leader/22 always contains ‘0’
– leader/23 always contains ‘0’

• Here is an example: “01178cam a2200313 a 4500”. The fixed boring bytes, underlined on the original slide, are the “22” at positions 10–11 and the “4500” at positions 20–23.
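The list of fixed boring bytes can be turned into a small check. This is only a sketch: the names `FIXED_BORING` and `check_fixed_boring` are made up for illustration and do not come from any MARC library.

```python
# Expected values of the fixed boring bytes, keyed by leader position (0 to 23).
FIXED_BORING = {10: "2", 11: "2", 20: "4", 21: "5", 22: "0", 23: "0"}


def check_fixed_boring(leader: str) -> list[int]:
    """Return the leader positions whose value differs from the expected one."""
    if len(leader) != 24:
        raise ValueError("a MARC leader must be exactly 24 characters long")
    return [pos for pos, value in FIXED_BORING.items() if leader[pos] != value]


example = "01178cam a2200313 a 4500"
print(check_fixed_boring(example))  # an empty list means all fixed bytes are as expected
```

An empty result says nothing about the interesting bytes. It only confirms that the six fixed positions hold the values listed above.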

Page 12: lis512 lecture 4

fixed boring bytes

• For fixed boring bytes, we need to be able to see where they are.

• We do not really need to know what they mean since we can’t change them anyway.

• For the curious, I have explanations in the appendix.

Page 13: lis512 lecture 4

Variable boring bytes

• These bytes contain length data about the record.

• Of course, for most records you will find different values.

• But the values hardly contain any useful information.

• The variable boring positions are 00–04 and 12–16.

• Both are 5 positions long and we study them in turn.

Page 14: lis512 lecture 4

leader positions 00–04

• These positions hold the length of the entire MARC record.

• This is the number of bytes, not the number of characters!

• There are 5 numerical characters. The number is right justified and unused characters contain zeros, as in say “00234”
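The difference between bytes and characters, and the zero padding, can be sketched as follows. The function name `record_length_field` is an illustrative assumption, and the example assumes the text is encoded as UTF-8.

```python
def record_length_field(record_bytes: bytes) -> str:
    """Format a byte length the way leader/00-04 stores it: 5 digits, zero padded."""
    length = len(record_bytes)
    if length > 99999:
        raise ValueError("the length does not fit into 5 digits")
    return f"{length:05d}"


text = "M\u00fcller"             # 6 characters, the second one is a u-umlaut
data = text.encode("utf-8")      # 7 bytes, because the umlaut takes 2 bytes in UTF-8
print(len(text), len(data))      # 6 7
print(record_length_field(b"x" * 234))  # 00234
```

A record that contains non-ASCII characters is therefore longer in bytes than in characters, and the leader records the byte count.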

Page 15: lis512 lecture 4

leader positions 12–16

• This contains the position of the start of the data, after the leader and the directory.

• It is the length of the leader plus the length of the directory, plus 1.

• It is encoded like positions 00–04.

• The number should be much smaller than the number encountered at 00–04. If you subtract one, you should find a multiple of 12, because the leader has 24 bytes and each directory entry has 12 bytes.
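The arithmetic behind positions 12–16 can be written out as a short sketch. It assumes the 24-byte leader and 12-byte directory entries described earlier. The function names are invented for this example.

```python
LEADER_LENGTH = 24   # bytes in the leader
ENTRY_LENGTH = 12    # bytes in one directory entry


def base_address(entry_count: int) -> str:
    """Leader length plus directory length plus 1, zero padded to 5 digits."""
    return f"{LEADER_LENGTH + ENTRY_LENGTH * entry_count + 1:05d}"


def entry_count_from_leader(leader: str) -> int:
    """Recover the number of directory entries from leader/12-16."""
    base = int(leader[12:17])
    remainder = base - 1 - LEADER_LENGTH
    if remainder < 0 or remainder % ENTRY_LENGTH != 0:
        raise ValueError("leader/12-16 does not match 12-byte directory entries")
    return remainder // ENTRY_LENGTH


leader = "01178cam a2200313 a 4500"
print(entry_count_from_leader(leader))  # 24
print(base_address(24))                 # 00313
```

For the example leader, 313 minus 1 is 312, which is 26 times 12: two lots of 12 for the leader and 24 directory entries.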

Page 16: lis512 lecture 4

finished with boredom

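• Here is an example: “01178cam a2200313 a 4500”. All boring bytes, underlined on the original slide, are “01178” at positions 00–04, “2200313” at positions 10–16, and “4500” at positions 20–23.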

• In the interesting bytes, we find codings that give us information about the MARC records.

• If we create a MARC record, we need to fill them out.

• Sometimes a system may prepopulate them but we still need to know what they are.

Page 17: lis512 lecture 4

interesting code positions

• From the example, you will see that the interesting code positions come in a front part, 05–09, and a rear part, 17–19.

Page 18: lis512 lecture 4

leader position 05

• The record status indicates the relationship of the record to a set of records in a file.
– ‘a’ Increase in encoding level. The record has been changed to a higher encoding level as recorded in position 17.
– ‘c’ Corrected or revised. An addition/change other than in the encoding level code has been made to the record.
– ‘d’ Deleted. The record has been deleted.
– ‘n’ New. The record is newly input.
– ‘p’ Increase in encoding level from prepublication.

Page 19: lis512 lecture 4

leader position 6, slide 1

• This gives the type of material. Values are
– ‘a’ language material
– ‘t’ manuscript language material
– ‘c’ notated music
– ‘d’ manuscript notated music
– ‘e’ cartographic material
– ‘f’ manuscript cartographic material
– ‘g’ projected medium

Page 20: lis512 lecture 4

leader position 6, slide 2

• Further values are
– ‘i’ nonmusical sound recording
– ‘j’ musical sound recording
– ‘k’ a two-dimensional non-projectable graphic
– ‘m’ a computer file
– ‘o’ a kit
– ‘p’ mixed materials
– ‘r’ a three-dimensional artifact or naturally occurring object

Page 21: lis512 lecture 4

leader position 7

• Bibliographic level of the description
– ‘a’ monographic component part
– ‘b’ serial component part
– ‘c’ collection
– ‘d’ a subunit
– ‘i’ an integrating resource
– ‘m’ a monograph
– ‘s’ a serial

Page 22: lis512 lecture 4

leader position 8

• This gives the type of control. There are two valid values
– ‘ ’ not specified
– ‘a’ the item is described according to archival descriptive rules, which focus on the contextual relationships between items and on their provenance rather than on bibliographic detail. All forms of material can be controlled archivally.

Page 23: lis512 lecture 4

leader position 9

• This field indicates the character set and encoding scheme used in the record. There are two valid values
– ‘ ’ MARC-8
– ‘a’ the UTF-8 encoding of UCS/Unicode

Page 24: lis512 lecture 4

conclusion about the front part

• Normally, if we are cataloging books, and keep changing the records, we expect something like ‘cam a’.

• You are allowed to use that for your record provided you don’t go into multi-part resources.
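As a sketch of how the front part can be read by a program, the code tables from the slides can be placed in a lookup structure. The names `FRONT_PART` and `describe_front_part` are made up for this example, and the short labels are my paraphrases of the slide text. All codes are lowercase letters or a blank.

```python
# Code tables for leader positions 05 to 09, taken from the slides.
FRONT_PART = {
    5: ("record status", {
        "a": "increase in encoding level",
        "c": "corrected or revised",
        "d": "deleted",
        "n": "new",
        "p": "increase in encoding level from prepublication",
    }),
    6: ("type of material", {
        "a": "language material",
        "t": "manuscript language material",
        "c": "notated music",
        "d": "manuscript notated music",
        "e": "cartographic material",
        "f": "manuscript cartographic material",
        "g": "projected medium",
        "i": "nonmusical sound recording",
        "j": "musical sound recording",
        "k": "two-dimensional non-projectable graphic",
        "m": "computer file",
        "o": "kit",
        "p": "mixed materials",
        "r": "three-dimensional artifact or naturally occurring object",
    }),
    7: ("bibliographic level", {
        "a": "monographic component part",
        "b": "serial component part",
        "c": "collection",
        "d": "subunit",
        "i": "integrating resource",
        "m": "monograph",
        "s": "serial",
    }),
    8: ("type of control", {" ": "not specified", "a": "archival"}),
    9: ("character coding", {" ": "MARC-8", "a": "UTF-8 encoding of UCS/Unicode"}),
}


def describe_front_part(leader: str) -> dict[str, str]:
    """Map each front-part position to the meaning of the code found there."""
    if len(leader) != 24:
        raise ValueError("a MARC leader must be exactly 24 characters long")
    return {
        name: table.get(leader[pos], "code not in the slides")
        for pos, (name, table) in FRONT_PART.items()
    }


for name, meaning in describe_front_part("01178cam a2200313 a 4500").items():
    print(f"{name}: {meaning}")
```

For the example leader this reads ‘cam a’ as a corrected record for language material, a monograph, with no type of control specified, in UTF-8.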

Page 25: lis512 lecture 4

leader position 17, slide 1

• This is the encoding level. It indicates the fullness of the bibliographic information in the record.
– ‘ ’ “full level”. It is a complete MARC record created from information derived from an inspection of the physical item. For serials, at least one issue of the serial is inspected.
– ‘1’ “full level, material not examined” created from information derived from an extant description of the item, without reinspection of the item.
– ‘2’ “less-than-full level, material not examined” created from an extant description of the material without reinspection of the item.

Page 26: lis512 lecture 4

leader position 17, slide 2

• More allowed values for position 17 are
– ‘3’ “abbreviated level” meaning a brief record that does not meet minimal level cataloging specifications.
– ‘4’ “core level” a less-than-full but greater-than-minimal level cataloging record
– ‘5’ “partial (preliminary) level” a preliminary cataloging level record that is not considered final

Page 27: lis512 lecture 4

leader position 17, slide 3

• More allowed values for position 17 are
– ‘7’ “minimal level” a record that meets the U.S. National Level Bibliographic Record minimal level cataloging specifications and is considered final
– ‘8’ “prepublication level” a prepublication level record. Includes records created in cataloging in publication programs.
– ‘u’ “unknown”
– ‘z’ “not applicable” the concept of encoding level does not apply to the record.

Page 28: lis512 lecture 4

leader position 18

• This field says what cataloging rules were applied in the descriptive part. Allowed values are
– ‘ ’ the description does not follow the International Standard Bibliographic Description (ISBD) cataloging and punctuation provisions.
– ‘a’ the description follows AACR2
– ‘c’ the description follows the ISBD, but the end of subfield punctuation is omitted.
– ‘i’ the description follows the ISBD including end of subfield punctuation
– ‘u’ unknown

Page 29: lis512 lecture 4

leader position 19

• This has the multipart resource record level. This pertains to the situation where the record describes part of a resource or a resource that has many parts.
– ‘ ’ not specified or not applicable
– ‘a’ the record describes a set of resources
– ‘b’ the record describes a resource which is part of a set. The resource has a title that allows it to be independent of the set.
– ‘c’ the record describes a resource that is part of a set. The resource does not have a title that makes it understandable separately.

Page 30: lis512 lecture 4

conclusion about the rear part

• Normally, when we download a MARC record, we expect the rear part to be ‘ a ’, with two blanks surrounding an ‘a’.

• In the record that we create for class we make it ‘7a ’. Let’s avoid complications with multi-part resources. ;-)
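Putting the boring and interesting parts together, a whole leader can be assembled from four inputs. This is a sketch under the assumptions of these slides (fixed boring bytes ‘22’ and ‘4500’). The function `build_leader` is a made-up helper, not part of any MARC software.

```python
def build_leader(record_length: int, base_address: int, front: str, rear: str) -> str:
    """Assemble a 24-character leader from its boring and interesting parts."""
    if len(front) != 5 or len(rear) != 3:
        raise ValueError("the front part needs 5 characters and the rear part 3")
    if not (0 <= record_length <= 99999 and 0 <= base_address <= 99999):
        raise ValueError("both lengths must fit into 5 digits")
    return (
        f"{record_length:05d}"   # leader/00-04, variable boring
        + front                  # leader/05-09, interesting
        + "22"                   # leader/10-11, fixed boring
        + f"{base_address:05d}"  # leader/12-16, variable boring
        + rear                   # leader/17-19, interesting
        + "4500"                 # leader/20-23, fixed boring
    )


print(build_leader(1178, 313, "cam a", " a "))  # 01178cam a2200313 a 4500
print(build_leader(1178, 313, "cam a", "7a "))  # 01178cam a22003137a 4500
```

The first call reproduces the example leader used throughout. The second shows the same record with the rear part ‘7a ’ used in class.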

Page 31: lis512 lecture 4

Thank you for your attention.

http://openlib.org/home/krichel

Page 32: lis512 lecture 4

MARC directory

• The MARC directory follows the leader.

• The directory contains fixed-length information about the variable-length fields.

• Each directory entry has 12 bytes numbered 0 to 11.

• For each variable field appearing in the record, there is a directory entry.

• The MARC directory contains no information. Like the boring bytes, its values can be calculated by a computer.

Page 33: lis512 lecture 4

appendix

• This is an appendix to the slides.

• It contains a description of the MARC directory.

• It contains a partial rationale for the fixed boring bytes.

Page 34: lis512 lecture 4

MARC directory bytes 00–02

• Bytes 00–02 give the field name.

• The field names used in MARC are numbers.

• So you will find three ASCII numeric characters at that place.

Page 35: lis512 lecture 4

the MARC directory bytes 03–06

• They give the field length.

• This is 4 (four) ASCII numeric characters.

• The fact that there are four of them is determined by character 20 in the MARC leader.

• The number is right justified and filled by 0s if required.

• This is the length of the variable field, including the field indicators, subfield codes, data, and the field terminator character. The field name is not counted, since it only appears in the directory.

Page 36: lis512 lecture 4

the MARC directory bytes 07–11

• Bytes 07–11 give the starting character position. These are five ASCII numeric characters that specify the starting character position of the variable field relative to the base address, which is the value found at leader/12–16.

• The fact that there are five of them is determined by character 21 in the MARC leader.

• The number is right justified and filled by 0s if required.
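The three parts of a directory entry can be split apart with plain string slicing. This is a minimal sketch: `DirectoryEntry`, `parse_entry` and `parse_directory` are names invented here, and the two-entry directory in the example is made up rather than taken from a real record.

```python
from typing import NamedTuple


class DirectoryEntry(NamedTuple):
    tag: str      # bytes 00-02, the field name
    length: int   # bytes 03-06, the field length
    start: int    # bytes 07-11, the start relative to the base address


def parse_entry(entry: str) -> DirectoryEntry:
    """Split one 12-character directory entry into its three parts."""
    if len(entry) != 12 or not (entry.isascii() and entry.isdigit()):
        raise ValueError("a directory entry must be 12 ASCII digits")
    return DirectoryEntry(entry[0:3], int(entry[3:7]), int(entry[7:12]))


def parse_directory(directory: str) -> list[DirectoryEntry]:
    """Split a directory, without its closing terminator, into its entries."""
    if len(directory) % 12 != 0:
        raise ValueError("the directory length must be a multiple of 12")
    return [parse_entry(directory[i:i + 12]) for i in range(0, len(directory), 12)]


# Made-up directory: field 001 is 13 bytes long and starts at 0,
# field 245 is 45 bytes long and starts right after it at 13.
for entry in parse_directory("001001300000" + "245004500013"):
    print(entry.tag, entry.length, entry.start)
```

In a consistent record, each field starts where the previous one ends, so the start of an entry equals the start plus the length of the entry before it.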

Page 37: lis512 lecture 4

leader position 10

• This is the indicator count.

• It says how many bytes are used by indicators to a field.

• The indicator count is always 2.

Page 38: lis512 lecture 4

leader position 11

• This indicates the number of character positions used for each subfield code in a variable data field.

• This is always ‘2’.

• There are two characters in each subfield code. First there is the delimiter, shown as ‘$’, and then there is a lowercase letter or a number as the code for the subfield.

Page 39: lis512 lecture 4

leader position 20

• This is the length of the length-of-field portion.

• That is, it says how many characters are used to record the length of a field.

• It is always “4”.

Page 40: lis512 lecture 4

leader position 21

• Length of the starting-character-position portion.

• At this position, we always find the value “5”.

• This says that in each entry in the MARC directory (to be covered later) the starting character position part is 5 digits long.

Page 41: lis512 lecture 4

leader position 22

• This gives the length of the implementation dependent part of the record.

• This is always 0.

• This means that the record is done according to the specification and there is no additional part that is custom made for use on the information system you are working with.

• If it were there, it would have to be short!

Page 42: lis512 lecture 4

leader position 23

• This is the last position.

• The meaning of the data at this position is undefined.

• I suppose it could be defined later when we find a need for it.

• This position is always occupied by a 0.