Batch-Load Points Counter (MARCEdit project)

Amelia C. VanGundy
The University of Virginia’s College at Wise

Virginia SirsiDynix Library Users Group Meeting
Nov. 14, 2012

John Cook Wyllie Library http://library.uvawise.edu/

• Ebook titles in OPAC & Ebook packages on web in finding aids

• Rate of e-book acquisition increased

– netLibrary: 3k titles per year

– EBSCOhost Ebook Academic Collection: 65k titles in the initial load, then 5-10k additional titles every quarter

Batch Loading Problems

• Existing procedures were difficult to follow

• Procedures were inconsistent

– especially for different vendors

• Didn't take advantage of MARCEdit Tools

• 949 holdings field now includes $a class#

– previously, files loaded with AUTO “call#”


Solution? Wish list?

• Determine quality of MARC records

– OCLC files vs. other vendor files

• Determine editing priorities

– required (001/949), recommended, optional

• Learn to construct Regular Expression Strings

– Batch Editing Tools & Find/Replace

• Streamlined format

– needed both an outline & more detailed info

• Make available online as a web page

MARCEdit proficiency

• Beginner

• Advanced Beginner

– Uses MARCEditor Tools window (Add/Delete field, Edit Subfield Data, Sort by...)

– Can apply Regular Expression Strings

• Intermediate

– Uses MARC Tools wizard (Extract Selected Records, MARCSplit)

– Can construct Regular Expressions

• Expert

Batch-Load Points Counter (BLPC) people.uvawise.edu/acv6d/


Batch-Load Points Counter (BLPC) Webpage & Project link

people.uvawise.edu/acv6d/

1. Introduction – project concept & desired outcomes

2. Checklist

– outlines the batch-load procedures & steps

– points counter: “what to do” & “when to stop”

3. Processing Guidelines

– procedures & how-tos & copy/paste info

4. 949 processing

BLPC Introduction & Outcomes

• Validation

– determine integrity of the file

• Processing

– determine quality of the records

• Statistics

– track vendor pkgs, record counts, 001 prefixes

• Points

– max. points = 150 (2.5 hours)

– past the maximum: STOP & contact vendor (request corrected file)
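
The points ceiling can be sketched as a small tally. This is only an illustration: the step names and point values are hypothetical, the one-minute-per-point rate is inferred from "150 points = 2.5 hours", and stopping only when the total exceeds 150 is an assumption about how the ceiling is applied.

```python
MAX_POINTS = 150        # from the slide: max. points = 150 (2.5 hours)
MINUTES_PER_POINT = 1   # assumption: 150 points over 2.5 hours is about 1 minute per point


def tally(points_by_step):
    """Return (total points, estimated hours, whether to stop)."""
    total = sum(points_by_step.values())
    hours = total * MINUTES_PER_POINT / 60
    return total, hours, total > MAX_POINTS


# Hypothetical tallies for two vendor files (step labels are illustrative only).
ok_file = {"3.F 245 $h": 10, "3.S 506": 5, "3.V 650": 15}
bad_file = {"3.F 245 $h": 60, "3.S 506": 45, "3.V 650": 50}

for name, steps in (("ok_file", ok_file), ("bad_file", bad_file)):
    total, hours, stop = tally(steps)
    action = "STOP & contact vendor" if stop else "continue processing"
    print(f"{name}: {total} points (~{hours:.1f} h) -> {action}")
```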

BLPC CheckList w/Time estimates

• Steps 1-2: Preparation & validation

– number of records in file

– integrity of file

– valid URL links

• Steps 3-4: Review & processing

– quality of records

– lists all processing/edits possible

• Step 5: 949 holdings

Print on one page (2 p. per sheet / front & back)

BLPC Processing Guidelines (Procedures)

• Gives details for CheckList: Steps 1-2, Steps 3-4, Step 5

• Gives the regular expression strings (copy/paste)

– Finding/Replacing/Deleting

– MARCEditor Tools & MARCEdit Tools

• Always use along with Checklist

– includes information to process every field, BUT

– not every field needs processing

Do not print out

BLPC Step 1: Preparation & Reports

• MARC Validator

– Identify Invalid Records

– Validate Record (copy/paste into text file)

• Material Type Report

• Field Count

– verify vendor count against MARCEditor count (LDR/000)

– count early / count often

• Deduplicate (see Additional Instructions)

Reports/MARC Validator: Identify Invalid Records

Reports/MARC Validator: Validate Records

Reports/Material Type

BLPC Step 2: Verify Field Counts

• Reports/FieldCount for error checking

– first field listed is 000 (corresponds to =LDR)

– last field listed is “numeric”

– 245 count

• Reports/MARCValidator errors

– open text file created in Step 1

– look for specific errors in error file

• Check URL links to make sure they work
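
The Field Count checks can be approximated outside MARCEdit on a file saved in mnemonic (.mrk) text form. The sketch below is not the MARCEdit report itself. It assumes each field sits on one line starting with "=" plus a three-character tag, and that the leader appears as =LDR; the sample records and the vendor count are hypothetical.

```python
from collections import Counter


def field_counts(mrk_text):
    """Count tags in mnemonic text; field lines look like '=245  10$aTitle'."""
    counts = Counter()
    for line in mrk_text.splitlines():
        if line.startswith("=") and len(line) >= 4:
            tag = line[1:4]
            # The Field Count report lists the leader as 000.
            counts["000" if tag == "LDR" else tag] += 1
    return counts


def bad_tags(counts):
    """Tags that are not three digits (a 'bad field tag')."""
    return sorted(tag for tag in counts if not tag.isdigit())


# Hypothetical two-record sample; leader and control numbers are placeholders.
sample = "\n".join([
    "=LDR  00000nam  2200000Ia 4500",
    "=001  test0001",
    "=245  10$aFirst title$h[electronic resource]",
    "=856  40$uhttp://example.org/1",
    "",
    "=LDR  00000nam  2200000Ia 4500",
    "=001  test0002",
    "=245  10$aSecond title",
    "",
])
vendor_count = 2  # hypothetical count supplied by the vendor

counts = field_counts(sample)
print("records match vendor count:", counts["000"] == vendor_count)
print("one 245 per record:", counts["245"] == counts["000"])
print("bad tags:", bad_tags(counts))
```

A leader count that differs from the vendor count, or any entry in the bad-tag list, is the kind of result that sends the file back through validation before any editing starts.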

Reports/Field Count (vendor count = 8556)

Field Count Error & "bad field tag" (vendor count = 694)

Reports/Field Count: Detail (highlight field & right-click)

Review Validate Records report (saved as text file in Step 1.B)

BLPC: Review for processing

Checklist Step 3 workflow

• Check field counts

• Mark up notes on the Checklist

– Track/count fields that need processing

• Track points for fields that need processing

• Track points for fields that need manual editing

– Each record to fix means extra points

• Rule of thumb: for more than 12 manual edits, treat as a separate post-load maintenance project

BLPC Checklist Step 3: Review Fields

Examples of required processing

• Examine first record & check field count

– Title control# – 001 (prefer OCLC#)

– If lacking: use info. from 035 or create local 001

• Check field counts / subfield counts

– Title/GMD – 245 $h

– URL – 856 $3 $y $u

• Check Validate Record text file for errors

– “Invalid field format” / “Subfield cannot repeat”

• Check field counts / indicator counts

– Subject – 650 Ind2 = 4/7 or 5/6/8

BLPC Checklist Step 4: Review fields

Examples of optional processing

• Check field count & delete if present

– 029 / 583 / 584 / 938

• Check field data and delete

– Other vendor pkg names (netLibrary/ebrary/myiLibrary/24x7/Ebsco)

• Check field data & ignore/defer

– 300 lacks phrase: (1 electronic resource)

BLPC Checklist with mark-ups


BLPC Processing workflow

Step 3 - Step 4

• Review Field Count

• Review Field data

– Use Find/Sort window and review first/last field

• Add/Delete/Edit field

• Review Field data

– look at field in first record or Find/Sort window

– Mistake? Typo? Use the Edit/SpecialUndo

• Review Field Count

• Save edited file / SaveAs new filename

MARCEditor / MARCEdit Tools

BLPC Checklist identifies fields to process

• MARCEditor Tools window

– adding/editing/deleting fields

– adding/editing/deleting subfields

• MARCEditor Edit/Find window

– editing/replacing field data

– displays sortable list

• MARCEdit Tools wizard

– select & extract records

– extract tab-delimited records for Excel

BLPC Processing: Add std. Phrase 506 => Step 3.S

• Check Field Count for presence of 506

• Delete existing 506 field (if present)

• Consult Step 3.S in BLPC Procedures

– Determine that AddField Tool is needed for processing

– Copy Std.phrase from Step 3.S notes

– Paste into AddField Tool window and submit

• Review 506 data in first record

• Check field count

• Save file

MARCEditor Tools: Add std. Phrase 506 => Step 3.S


BLPC Processing: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V

• Check Field Count for Presence of 650 Ind2=5/6/8

• Consult Step 3.V in BLPC Procedures

– Optional Review – FindAll(RegEx) instructions

– Determine that Tools/DeleteField tool is needed

– Copy RegEx pattern from Step 3.V

– Paste into Tools/DeleteField window

– Use Regular Expressions radio button option

– Submit using Delete button

• Check Field Count & Indicator count

• Save file
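
For readers who want to test the Step 3.V pattern outside MARCEdit, the delete can be imitated on mnemonic-format lines with Python's re module. This is a sketch, not the Tools/DeleteField tool: it assumes one field per line, two spaces after the tag, and a backslash for a blank indicator. The sample fields are hypothetical.

```python
import re

# Pattern for 650 fields with Ind2 = 5, 6 or 8 (two spaces follow the tag).
NON_LC_650 = re.compile(r"(=650  )(.[568])(\$a)(.+)")


def delete_fields(lines, pattern):
    """Return (kept lines, number deleted), dropping lines that match pattern."""
    kept = [line for line in lines if not pattern.match(line)]
    return kept, len(lines) - len(kept)


# Hypothetical record; "\\" in Python source is one backslash (blank Ind1).
record = [
    "=245  10$aSample title",
    "=650  \\0$aLibraries$xAutomation.",
    "=650  \\6$aInformatique.",
    "=650  \\7$aCOMPUTERS / General.$2bisacsh",
    "=650  \\8$aSample heading.",
]

kept, deleted = delete_fields(record, NON_LC_650)
print(f"deleted {deleted} field(s); {len(kept)} remain")
for line in kept:
    print(line)
```

Comparing the number deleted with the indicator count taken beforehand mirrors the "Check Field Count & Indicator count" step.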

MARCEditor: Delete specific fields 650 Ind2= 5/6/8 (non-LC) => Step 3.V


Regular expressions (RegEx)

• Finding/Editing patterns in strings (letters/numbers)

– Like learning another language

• Parentheses are used to group data

– Forces the computer to "store" data in "chunks"

– Data “chunks” are numbered for recall/retrieval/use

– Helps the programmer "read" the pattern

– Optional: a pattern also works without grouping

• Some punctuation is "reserved" (has a special meaning)

• BLPC uses consistent format for RegEx patterns


Reading RegEx Patterns

650 Ind2 = 5/6/8 (non-LC)

Pattern: (=650  )(.[568])(\$a)(.+)

(=650  ) look for 650 fields, followed by two blank spaces

(.[568]) look for any Ind1 & one of the listed Ind2 values

(\$a) look for subfield $a (used as "anchor chunk")

(.+) any characters to the end of the field

Use Edit/FindAll(RegEx) to verify pattern

Interpreting RegEx punctuation

Pattern: (=650  )(.[568])(\$a)(.+)

( ) Parentheses for data “chunks”

. Period for any single character (letter, number, space, punctuation)

[ ] Square brackets for a list using “OR”

\ Backslash before “reserved” punctuation

esp.: $ \ ( ) [ ]

+ Plus sign for one or more of the preceding item

“Chunks” are stored as: $1$2$3$4
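
The numbered "chunks" can be inspected directly with Python's re module, which uses the same grouping idea. The sketch assumes a mnemonic-format line with two spaces after the tag; the sample field is hypothetical. Python writes group references as \1, \2 and so on, where MARCEdit's Replace box uses $1, $2.

```python
import re

pattern = re.compile(r"(=650  )(.[568])(\$a)(.+)")
line = "=650  \\6$aInformatique."  # hypothetical field; "\\" is one backslash

match = pattern.match(line)
chunks = match.groups()

# Print each chunk with the number it would be recalled by ($1, $2, ...).
for number, chunk in enumerate(chunks, start=1):
    print(f"${number} = {chunk!r}")

# Recalling all four chunks in order reproduces the original field.
rebuilt = match.expand(r"\1\2\3\4")
print(rebuilt == line)
```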

Creating RegEx patterns

• Start with known pattern. For non-LC Subjects: (=650  )(.[568])(\$a)(.+)

• FindAll(RegEx) for “local” Subjects (Ind2 = 4/7)

(=650  )(.[47])(\$a)(.+)

• FindAll(RegEx) for “local” Genres (Ind2 = 4/7)

(=655  )(.[47])(\$a)(.+)
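
Because only the tag and the Ind2 list change from one pattern to the next, the variations can be generated from a template. The helper name field_pattern is hypothetical, and the sketch assumes mnemonic-format lines with two spaces after the tag. The tag and indicator list are inserted into the pattern unescaped, so they should be plain digits.

```python
import re


def field_pattern(tag, ind2_values):
    """Build the four-chunk pattern for a tag and a list of Ind2 values."""
    return re.compile(rf"(={tag}  )(.[{ind2_values}])(\$a)(.+)")


local_subjects = field_pattern("650", "47")
local_genres = field_pattern("655", "47")

# Hypothetical fields; "\\" in Python source is one backslash (blank Ind1).
lines = [
    "=650  \\0$aLibraries$xAutomation.",
    "=650  \\4$aLocal subject heading.",
    "=655  \\7$aElectronic books.$2local",
    "=655  \\0$aSample genre.",
]

subject_hits = [line for line in lines if local_subjects.match(line)]
genre_hits = [line for line in lines if local_genres.match(line)]
print(local_subjects.pattern)
print(subject_hits)
print(genre_hits)
```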


Editing with RegEx string pattern

650 BISAC subjects => 690

• Start with known pattern: (=650  )(.[568])(\$a)(.+)

• Use Edit/Replace(RegEx): Change 650 to 690

• Identify “BISAC” subjects: Ind2 = 7 & $2 = bisacsh

• Determine which “chunks” change/stay the same

Find(RegEx): (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

Replace(RegEx): (=690  )$2$3$4$5

Reading RegEx Patterns

650 BISAC subjects => 690

Pattern: (=650  )(.[7])(\$a)(.+)(\$2bisacsh)

(=650  ) look for 650 fields, followed by two blank spaces

(.[7]) look for any Ind1 & Ind2 = 7

(\$a) look for subfield $a (optional “anchor” text)

(.+) any characters up to the next “chunk”

(\$2bisacsh) look for subfield & data at end of field

Can be shortened (though the shorter pattern is harder to read):

Find(RegEx): (=650)(.+\$2bisacsh)

Replace(RegEx): (=690)$2
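
The long and short forms can be checked against each other outside MARCEdit. The sketch below uses Python's re.sub on hypothetical mnemonic-format lines (two spaces after the tag). Two differences from the Replace box are specific to Python: group references are written \2 rather than $2, and the replacement is written without parentheses, because Python would insert them literally.

```python
import re

LONG_FIND = r"(=650  )(.[7])(\$a)(.+)(\$2bisacsh)"
LONG_REPLACE = r"=690  \2\3\4\5"   # new tag, then chunks 2-5 unchanged
SHORT_FIND = r"(=650)(.+\$2bisacsh)"
SHORT_REPLACE = r"=690\2"          # new tag, then everything after the old tag

# Hypothetical fields; "\\" in Python source is one backslash (blank Ind1).
fields = [
    "=650  \\0$aLibraries$xAutomation.",
    "=650  \\7$aCOMPUTERS / General.$2bisacsh",
    "=650  \\7$aLibraries.$2fast",
]

long_result = [re.sub(LONG_FIND, LONG_REPLACE, field) for field in fields]
short_result = [re.sub(SHORT_FIND, SHORT_REPLACE, field) for field in fields]

for before, after in zip(fields, long_result):
    print(before, "=>", after)
print("short form gives the same result:", long_result == short_result)
```

Only the field that ends in $2bisacsh is retagged; the LC heading and the other Ind2 = 7 heading are left as 650.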

MARCEditor: FindAll(RegEx) Testing the pattern: 650 BISAC subjects

MARCEditor: Replace(RegEx) 650 BISAC subjects => 690

BLPC Step 5: 949 processing

Required processing

Policy: Include Class# in Unicorn Item record

949

$a -- Pull the call# from the 050 $a

-- Insert the standard phrase: ' INTERNET'

$v -- Pull the 001/OCLC# as a unique no.

$w $h $t $x $z -- Add standard holdings data

• See Additional Instructions
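
One way to picture the 949 assembly is the sketch below, which works on the mnemonic-format lines of a single record. Several things are assumptions rather than the documented procedure: the function name build_949, appending ' INTERNET' directly after the call number, blank indicators, and the holdings string, which is a placeholder because the slides do not give the $w $h $t $x $z values.

```python
import re


def build_949(record_lines, holdings):
    """Return a 949 line built from the 050 $a and the 001, or None if either is missing."""
    control_number = None
    call_number = None
    for line in record_lines:
        if line.startswith("=001  "):
            control_number = line[6:].strip()
        elif line.startswith("=050  ") and call_number is None:
            found = re.search(r"\$a([^$]+)", line)  # text of $a up to the next subfield
            if found:
                call_number = found.group(1).strip()
    if control_number is None or call_number is None:
        return None  # candidate for post-load cleanup
    # "\\\\" in Python source is two backslashes: two blank indicators.
    return f"=949  \\\\$a{call_number} INTERNET$v{control_number}{holdings}"


# Hypothetical record and placeholder holdings subfields.
record = [
    "=001  test0001",
    "=050  \\4$aZ699.35.M28$bS26 2012",
    "=245  10$aSample title$h[electronic resource]",
]
placeholder_holdings = "$w<w>$h<h>$t<t>$x<x>$z<z>"

print(build_949(record, placeholder_holdings))
print(build_949(record[:1], placeholder_holdings))  # no 050 in this slice
```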


Batch-loading

• MARCEdit with files no larger than 10k records

– MARCEdit/Tool MARCSplit

• MARCEditor/File: Compile File into MARC

• Unicorn batch load rpt uses 001 match point

– 'o' for OCLC# & 'g' for local vendor key

• Unicorn batch load rpt settings

– create new bibliographic records only

• Date cataloged -- back dated to prev. month

– prevents interference w/scheduled Authority reports

– load at most two files a day
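
The 10k ceiling amounts to cutting the record list into fixed-size batches. The sketch below shows that idea on mnemonic text and is not a substitute for MARCSplit; it assumes records are separated by a blank line, and the sample records are hypothetical.

```python
def split_records(mrk_text, max_records=10_000):
    """Split mnemonic text into batches of at most max_records records."""
    records = [record for record in mrk_text.split("\n\n") if record.strip()]
    return [records[i:i + max_records] for i in range(0, len(records), max_records)]


# Hypothetical file of 25 short records, split with a small limit for display.
sample = "\n\n".join(
    f"=LDR  00000nam  2200000Ia 4500\n=001  test{n:04d}" for n in range(25)
)

batches = split_records(sample, max_records=10)
print([len(batch) for batch in batches])
```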


Identifying records for Cleanup

Checklist finds problems to correct post-load

• Item maintenance projects

– 949 lacks call#

• Bibliographic record maintenance projects

– 245 lacks $h (if more than 5-12 records)

– URLs lacking

• Record reload/overlay project

– Record already in OPAC (P-N duplicates)


MARCEdit Tools: Select/Extract selected records

Step 3.F: 245 lacks $h
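
The same selection can be sketched on mnemonic text: list the control numbers of records whose 245 has no $h. This is an illustration, not the Extract Selected Records tool. It assumes blank-line-separated records with one field per line; the sample data is hypothetical, and the threshold of 12 is the upper end of the "5-12 records" rule of thumb.

```python
PROJECT_THRESHOLD = 12  # assumption: upper end of the "5-12 records" rule of thumb


def lacking_245h(mrk_text):
    """Return the 001 values of records whose 245 field has no $h."""
    flagged = []
    for record in mrk_text.split("\n\n"):
        lines = [line for line in record.splitlines() if line.startswith("=")]
        if not lines:
            continue
        title = next((line for line in lines if line.startswith("=245  ")), "")
        control = next((line[6:].strip() for line in lines if line.startswith("=001  ")), None)
        if "$h" not in title:
            flagged.append(control)
    return flagged


# Hypothetical two-record sample.
sample = "\n\n".join([
    "=LDR  00000nam  2200000Ia 4500\n=001  test0001\n=245  10$aFirst title$h[electronic resource]",
    "=LDR  00000nam  2200000Ia 4500\n=001  test0002\n=245  10$aSecond title",
])

flagged = lacking_245h(sample)
print(flagged)
print("separate maintenance project:", len(flagged) > PROJECT_THRESHOLD)
```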


MARCEdit Tools: Export Tab Delimited records


Help!

• MarcEdit Help

http://people.oregonstate.edu/~reeset/marcedit/html/help.html

– Click through the Contents menu:

Contents / Using MARCEdit / Using the MARCEditor / Editing Functions / Using Regular Expressions.

• RegularExpressions.info

http://www.regular-expressions.info/

• MARCEDIT-L list

http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L

• BATCH list

http://listserv.vt.edu/cgi-bin/wa?A0=batch

Amelia C. VanGundy The University of Virginia's College at Wise

John Cook Wyllie Library

276-328-0154

http://people.uvawise.edu/acv6d/

Virginia SirsiDynix Library Users Group Meeting Nov. 14, 2012


BLPC Project Presentation revisions

Originally presented Nov. 14, 2012

• Additional Slides:

– BLPC Project web-page

– MARCEditor: FindAll(RegEx)

– MARCEdit Tools: Export Tab Delimited records

– BLPC Project: Presentation revisions