Voyager® 5.0 and Cataloging

Connie L. Braun, Training Consultant

[email protected]

Confidential and proprietary information of Endeavor Information Systems, copyright © 2006. Reproduction or republication of this information in any form is strictly prohibited without express written consent of Endeavor Information Systems.

Agenda

Introduction

Your Work Environment

Conversion

New Features

Enhanced Searching

Learning More/Questions??

Voyager 5.0 and Unicode™

Finds and displays records in the native language

Create and edit any MARC record using UTF-8

Import and export of records with any supported character set

Select a Unicode-compliant font in Cataloging

Display Unicode characters in OPAC without proprietary software

Voyager 5.0 and Unicode™

For our customers, it’s business as usual, but with some interesting changes and improvements, especially in Cataloging.

The Unicode standard is an important step towards taking advantage of these changes and improvements.

Implementing the Unicode standard is an extension of Endeavor’s original mission: access to information regardless of location or format.

Following Standards

Voyager 5 and Voyager with Unicode rely on standards much more than previously

• see http://www.unicode.org for much more detail on these standards

• see http://lcweb.loc.gov/marc/specifications/speccharucs.html for details on LC’s format of MARC records that use Unicode; Voyager follows this specification

• specifics on the Code Tables may be viewed at http://www.loc.gov/marc/specifications/specchartables.html

Multilingual Input and Display

Improved multilingual input and display capabilities in Voyager mean that characters now display correctly according to the Unicode and MARC standards.

Greater script coverage for cataloging items in your collections, published in languages around the world.

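How many? UTF-8 as originally designed could encode 2,147,483,648 (2^31) values; the Unicode standard itself limits the code space to 1,114,112 code points (U+0000 through U+10FFFF).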

Agenda

Introduction

Your Work Environment

• Workstation Requirements

• Setting Up For Languages Other Than English

• Tag Tables

• Session Defaults and Preferences

Conversion

New Features

Enhanced Searching


Workstation Requirements

This means that staff PCs will need:

• Windows 2000 or XP operating system

• Unicode standard compliant Internet browser: IE 6+ or Netscape 6+

• Unicode-compliant font: Lucida Sans Unicode or Arial Unicode MS

MS Windows™

Voyager is more integrated with Windows in terms of using

• standard Windows 2000/XP Unicode support

• standard Unicode fonts

• standard input using Input Method Editors (IMEs)

• standard browser support

Languages Other Than English

• workstations need to be specifically configured to work with languages other than English

• technical IT assistance likely required to install needed languages on staff PCs

• best to install all languages so that cataloger may easily include new ones as necessary

Adding Languages to PCs

• regional and language options are specific to each PC

• among options available via Start–Settings–Control Panel

• details button on Languages tab lets operator view or change languages and methods to enter text

• may include supplemental language support, too

Choosing Languages

• languages added to PCs will match languages for items found in your collections

• add and remove according to your needs; as few or many as necessary

• may also set preferences for language bar and key settings

Tag Tables

MARC Tag Tables have been revised and updated

See KnowledgeBase for Incident #14167 to obtain latest update

Tag Tables

• ability to modify tag table configuration remains the same as in earlier releases

• but, may not specify anything for Leader position 9 since that byte is now hard-coded to identify records that have been converted to UTF-8

• see Appendix A of Cataloging User’s Guide for full details on revising, maintaining and updating the Tag Tables

Record Validation

MARC validation

MARC21 character set validation

Authority control validation

Decomposition of accented characters for MARC21

Record Validation

Bypass MARC21 Character set validation; unchecked…

• uses MARC21 Repertoire.cfg to control validation of the MARC21 character set

• helps to enforce MARC21 standard

Bypass Decomposition of accented characters for MARC21; unchecked…

• allows records to be saved to the database without decomposing the characters

• helps to enforce MARC21 standard
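As a rough illustration of what decomposing accented characters means, the sketch below uses Python's standard `unicodedata` module. It assumes that Unicode canonical decomposition (NFD) is a reasonable stand-in for the decomposition Voyager performs; the function names are hypothetical and are not part of Voyager.

```python
import unicodedata


def decompose(text: str) -> str:
    """Split precomposed letters into a base letter plus combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def is_decomposed(text: str) -> bool:
    """True when NFD would leave the text unchanged."""
    return decompose(text) == text


composed = "Espa\u00f1a"           # n-tilde stored as one precomposed code point
decomposed = decompose(composed)   # n followed by U+0303 COMBINING TILDE

print(len(composed), len(decomposed))
print(is_decomposed(composed), is_decomposed(decomposed))
```

Both strings render identically in a Unicode-compliant font; only the stored code points differ.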

Mapping Tab

Expected Character Set of Imported Records now has six options

Colors/Fonts Tab

Agenda

Introduction


Conversion

• Data Conversion

• Conversion Error Logging

• Conversion Details

• Identifying Non-Unicode Data

• The Rest of Voyager

New Features

Enhanced Searching


Data Conversion

MARC records are converted from VRLIN (Voyager legacy encoding) to MARC21 compliant UTF-8 encoding

• leader position 9 becomes an ‘a’

• conversion log created

• UTF-8 allows for variable length characters (most characters occupy same amount of space as before conversion)
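The variable-length point can be seen directly by encoding a few characters. This is a minimal sketch in plain Python, independent of Voyager; the helper name is hypothetical.

```python
def utf8_length(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


samples = [
    ("ASCII letter", "a"),
    ("precomposed n-tilde", "\u00f1"),
    ("combining tilde", "\u0303"),
    ("CJK ideograph", "安"),
]

for label, ch in samples:
    print(f"{label}: {utf8_length(ch)} byte(s)")
```

Plain ASCII text keeps its one-byte-per-character size, which is why most fields do not grow during conversion; accented and non-Roman characters take two or more bytes each.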

Note: all indexes and database columns with MARC data are regenerated after conversion

Conversion Details

IMPORTANT! NO RECORDS ARE LOST

Each field in the record is handled individually. As it is processed, it may change length, requiring adjustments to the leader and directory of the record.
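To show why a change in field length ripples into the leader and directory, here is a small sketch that assembles a MARC transmission record from already-encoded fields. It follows the general MARC 21 record layout (24-byte leader, 12-byte directory entries, field and record terminators); it is not Voyager's conversion code, and `build_record` and the sample leader are illustrative assumptions.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def build_record(leader: bytes, fields: list[tuple[str, bytes]]) -> bytes:
    """Rebuild leader lengths and the directory from (tag, field bytes) pairs."""
    directory = b""
    body = b""
    for tag, data in fields:
        field = data + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    base_address = 24 + len(directory) + 1   # leader + directory + its terminator
    record_length = base_address + len(body) + 1
    new_leader = (
        f"{record_length:05d}".encode("ascii")   # leader positions 0-4
        + leader[5:12]
        + f"{base_address:05d}".encode("ascii")  # leader positions 12-16
        + leader[17:24]
    )
    return new_leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


leader = b"00000nam a2200000 a 4500"
title = "10\x1faEspan\u0303a".encode("utf-8")  # indicators, subfield a, decomposed text
record = build_record(leader, [("245", title)])
print(record[:24])
```

Because every length and offset is counted in bytes, any field whose byte count changes during conversion forces the directory entries that follow it, and the record length in the leader, to be recomputed.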

Both record-level and field-level checking are performed. In rare cases an entire record might fail conversion; it is more likely that an individual field fails to be converted.

Records may not convert if they contain text that cannot be mapped into Unicode according to the standard MARC-8 to Unicode mappings. Records that do not convert are stored in the database as is, without being converted to Unicode.

Conversion Error Logging

Libraries need to know the details about the results of the conversion process.

• full error checking and logging is included as part of the upgrade

• see Voyager with Unicode Technical User’s Guide, Chapter 4, and Voyager with Unicode Cataloging User’s Guide, Appendix C, for more information

• library designates should review the conversion log file to plan for correcting any records that have errors

Sample from Conversion Log File

Conversion Log Details 1

1 2 3 4 5 6 7
# 11 secs read=982 changed=791 880=0 okay=982 errors=0 written=982
# 21 secs read=1931 changed=1558 880=0 okay=1931 errors=0 written=1931
# 29 secs read=2848 changed=2087 880=0 okay=2848 errors=0 written=2848
# 36 secs read=3699 changed=2533 880=0 okay=3699 errors=0 written=3699
# 43 secs read=4607 changed=3076 880=0 okay=4607 errors=0 written=4607
# 51 secs read=5519 changed=3610 880=0 okay=5519 errors=0 written=5519

=============================================================

Legend

1 number of seconds used by job so far

2 read=number of records processed

3 changed=number of records changed

4 880=how many records contain 880s

5 okay=# records processed successfully

6 errors=# records not processed due to errors

7 written=# records written to the database
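For sites that want to scan a long log rather than read it by eye, the progress lines can be parsed with a few lines of Python. This is only a sketch based on the sample lines shown on the slide; the function name is hypothetical and the real log may contain other line types.

```python
import re

SECONDS = re.compile(r"#\s*(\d+)\s+secs")
COUNTER = re.compile(r"(\w+)=(\d+)")


def parse_progress(line: str):
    """Turn one '# N secs key=value ...' line into a dict of integers."""
    match = SECONDS.search(line)
    if match is None:
        return None
    stats = {"secs": int(match.group(1))}
    for key, value in COUNTER.findall(line):
        stats[key] = int(value)
    return stats


raw = (
    "# 11 secs read=982 changed=791 880=0 okay=982 errors=0 written=982"
    "# 21 secs read=1931 changed=1558 880=0 okay=1931 errors=0 written=1931"
)
# Split on '#' in case several entries share one physical line.
entries = [parse_progress(chunk) for chunk in re.findall(r"#[^#]+", raw)]
with_errors = [entry for entry in entries if entry["errors"] > 0]
print(entries[-1]["read"], len(with_errors))
```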


1 2 3 4 5 6 7 8=bib 6213: [17](700): c->8 loose char page=0 at 20 '091e ..‘

9=bib 35322: [14](856): c->8 undefined char page=0 at 61 'fc7220486973746f .r Histo‘

10=bib 35516: [23](856): c->8 no char to combine to page=0 at 82 '1e .‘

================================================================

1 record type and id

2 index within record of field that generated error

3 tag that generated error

4 c->8 indicates conversion to UTF-8 encoding

5 description of error

6 page=subset to which source character belongs

7 at # position of source character that caused error

8 hex dump of source character
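The error lines follow a regular enough pattern to be picked apart with a regular expression. The sketch below is written against the three sample lines only, so the pattern is an assumption about the log format rather than a documented grammar; `parse_error` is a hypothetical helper.

```python
import re

ERROR_LINE = re.compile(
    r"([a-z]+) (\d+): \[(\d+)\]\((\d{3})\): c->8 (.+?) "
    r"page=(\d+) at (\d+) '([0-9a-f]+)"
)


def parse_error(line: str):
    """Extract the legend items from one conversion error line."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    rec_type, rec_id, index, tag, message, page, position, hexdump = match.groups()
    return {
        "record_type": rec_type,
        "record_id": int(rec_id),
        "field_index": int(index),
        "tag": tag,
        "message": message,
        "page": int(page),
        "position": int(position),
        "source_bytes": bytes.fromhex(hexdump),
    }


sample = "bib 35322: [14](856): c->8 undefined char page=0 at 61 'fc7220486973746f .r Histo'"
error = parse_error(sample)
print(error["record_id"], error["tag"], error["message"])
```

Grouping the parsed results by `message` or by `tag` gives a quick picture of which kinds of records need manual cleanup.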




loose char: a warning message indicating that a character not strictly part of Voyager encoding has been converted (e.g. unexpected carriage return)

no char to combine to: a warning message indicating that a combining character appeared but it lacks a base character with which to combine (e.g. umlaut but no a, o, u base letter)

undefined char: an error message indicating that there is a single character that cannot be mapped to UTF-8

Identifying Non-Unicode Data

Select a color for Conversion records in Session Defaults and Preferences—Colors-Fonts tab to identify records that did not complete the conversion process.


Records that did not complete the conversion process then display in the color selected in Options/Preferences.


Records that cannot be converted to Unicode are viewable in the Cataloging module with nc (not converted) displayed in the Title Bar.

Any characters that cannot be matched or recognized are replaced with a Unicode substitution character.

Fonts and Unicode

A MARC record may contain non-Roman characters even though you cannot see them.

• records are sure to display correctly if a Unicode-compliant font has been selected

Lucida Sans Unicode

• installed by default with Windows

Arial Unicode MS

• good choice for libraries with mixed cataloging

• included with Microsoft Office and other Microsoft products

The Rest of Voyager

Non-MARC data is not converted

• Acquisitions data

• Circulation data (patron info, etc.)

• Item data

Reporter

• not Unicode standard compliant

• translates data to LATIN1

Agenda

Introduction


Conversion

New Features

• Cataloging: Diacritics & Special Characters, Importing Records, New Record Views, Search URIs

• WebVoyáge: Browsers, Searching, Displaying

• Interacting with Other Systems

Enhanced Searching


Diacritic/Special Character Entry

Cataloging practices: then and now

• pre-Unicode input in Cataloging = diacritic precedes the base character

Example: Espa~na

• post-Unicode input in Cataloging = diacritic follows the base character

Example: Espan~a

• ability to display combined characters in Cataloging client is an improvement over past versions and a way to ensure accurate entry

Example: España
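The change in entry order can be mimicked in a few lines. In this sketch the `~` in the slide's examples is treated as a stand-in for the tilde diacritic and mapped to U+0303 COMBINING TILDE; the function is hypothetical and only illustrates the reordering, not how Voyager converts legacy data.

```python
import unicodedata

COMBINING_TILDE = "\u0303"


def reorder_leading_diacritic(text: str, marker: str = "~") -> str:
    """Move a diacritic typed before its base letter to follow it instead."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == marker and i + 1 < len(text):
            out.append(text[i + 1])       # base letter first
            out.append(COMBINING_TILDE)   # combining mark second
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


stored = reorder_leading_diacritic("Espa~na")     # pre-Unicode entry order
shown = unicodedata.normalize("NFC", stored)      # composed form, as displayed
print(shown)
```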

SpecialCharacters.cfg

SpecialCharacters.cfg, located in the C:\Voyager\Catalog folder, defines the content of the special character entry dialog box. Operators may edit this file.

Special Character Entry

Finding Little Used Characters

• for situations where a character not part of the Special Characters list is needed, operator can use Character Map from MS Windows

• typically located at Start – Programs – Accessories – System Tools – Character Map

• locate character or perform search

• select and copy character, then paste into position in bib record

Input of Non-Roman Text

Voyager with Unicode added the option for Cataloging operators to use all of the standard Microsoft Windows keyboard and input method editors (IMEs).

With this functionality in place, operators may search for, display, and edit the contents of all MARC records using the full range of UTF-8 characters.

The entire JACKPHY group (Japanese, Arabic, Chinese, Korean, Persian, Hebrew, Yiddish) is covered by Unicode and can be encoded in UTF-8, including the right-to-left input needed for Arabic, Persian, Hebrew and Yiddish.

Linking in a MARC21 Record

Tag I1 I2 Subfield Data

100 1 ‡6 880-01 ‡a An, Zhen.

245 1 0 ‡6 880-02 ‡a Ri yue yun yan / ‡c An Zhen zhu.

250 ‡6 880-03 ‡a Di 1 ban.

260 ‡6 880-04 ‡a Changchun Shi : ‡b Changchun chu ban she, ‡c 1997.

300 ‡a 4, 2, 291 p. ; ‡c 21 cm.

440 0 ‡6 880-05 ‡a Zhongguo li dai wang chao xing shuai qu shi lu

500 ‡a Non-Roman script – Chinese

651 0 ‡a China ‡x History ‡y Ming dynasty, 1368-1644.

880 1 ‡6 100-01/$1 ‡a 安震 .

880 1 0 ‡6 245-02/$1 ‡a 日月　云烟 / ‡c 安　震　著 .

880 ‡6 250-03/$1 ‡a 第 1 版 .

880 ‡6 260-04/$1 ‡a 长春市 : ‡b 长春出版社 ,‡c 1997.

880 0 ‡6 440-05/$1 ‡a 中国　历代　王朝　兴衰　启示录
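The ‡6 subfields in the record above are what tie each romanized field to its 880 counterpart: the regular field carries `880-NN`, and the matching 880 carries the original tag, the same occurrence number, and a script code (`$1` for Chinese, Japanese, Korean). A minimal sketch of that pairing follows; the field tuples are a simplified stand-in for real MARC parsing and the function name is hypothetical.

```python
import re

LINK = re.compile(r"^(\d{3})-(\d{2})(?:/(.+))?$")


def pair_linked_fields(fields):
    """Match each romanized field with its 880 using tag and occurrence number."""
    regular = {}
    alternates = {}
    for tag, link, text in fields:
        match = LINK.match(link) if link else None
        if match is None:
            continue
        linked_tag, occurrence, _script = match.groups()
        if tag == "880":
            alternates[(linked_tag, occurrence)] = text
        elif linked_tag == "880":
            regular[(tag, occurrence)] = text
    return {key: (regular[key], alternates[key]) for key in regular if key in alternates}


fields = [
    ("100", "880-01", "An, Zhen."),
    ("245", "880-02", "Ri yue yun yan / An Zhen zhu."),
    ("300", None, "4, 2, 291 p. ; 21 cm."),
    ("880", "100-01/$1", "安震 ."),
    ("880", "245-02/$1", "日月 云烟 / 安 震 著 ."),
]
pairs = pair_linked_fields(fields)
print(pairs[("100", "01")])
```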

Using the On-Screen Keyboard

Typically, the path is Start—Programs—Accessories—Accessibility—On-Screen Keyboard

Importing Records

• expected character set needs to be accurately identified if records are to be imported correctly

• some experimentation may be necessary to determine the correct character set

• let’s look at some details to help everyone understand what is happening

Record Exchange Scenarios

Bulk Import

• fundamentally the same as before, although leader byte 9 is checked against the incoming character set identified in the import rule.

• blank = non-Unicode™; converted & imported

• ‘a’ = Unicode™; imported

• neither blank nor ‘a’; errors out – not imported
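The three outcomes listed above amount to a simple decision on one byte of the leader. The sketch below restates that decision in Python; it does not model the comparison against the character set named in the import rule, and the function name and return labels are illustrative assumptions.

```python
def classify_leader_byte_9(leader: str) -> str:
    """Decide how an incoming record is handled from leader position 9."""
    if len(leader) < 10:
        raise ValueError("leader is too short to contain position 9")
    flag = leader[9]
    if flag == " ":
        return "convert and import"   # non-Unicode record
    if flag == "a":
        return "import"               # already Unicode (UTF-8)
    return "reject"                   # neither blank nor 'a'


unicode_leader = "00000nam a2200000 a 4500"
legacy_leader = unicode_leader[:9] + " " + unicode_leader[10:]
bad_leader = unicode_leader[:9] + "x" + unicode_leader[10:]

for leader in (unicode_leader, legacy_leader, bad_leader):
    print(repr(leader[9]), classify_leader_byte_9(leader))
```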

Expected Character Set

• Character set mapping for Bulk Import is designated in the Bulk Import rule in SysAdmin—Cataloging—Bulk Import Rules.

MARC Export

• default export character set is MARC21 UTF-8

• use the -a option to choose a different character set (in the command line)

• see page 10-8, in Voyager with Unicode Technical User’s Guide for more detail

• if mapping for a composed character is not found, it decomposes and Voyager attempts to find a match for each part

New ISBN Indexes

For improved duplicate detection:

ISBN Indexes

• 020N 020a Number only

• 020R 020z Number only

020 |a 1234567890 (Knopf)

020 |a 1234567890
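A number-only index means the two 020 examples above should yield the same key, so that duplicate detection is not thrown off by a qualifier such as "(Knopf)". The sketch below shows one way such a key could be derived; it is an assumption about the effect of the index, not Voyager's actual indexing routine, and the function name is hypothetical.

```python
import re

ISBN_NUMBER = re.compile(r"[0-9][0-9-]*[0-9Xx]")


def isbn_number_only(subfield: str):
    """Return just the ISBN digits (and a trailing X), dropping any qualifier."""
    match = ISBN_NUMBER.search(subfield)
    if match is None:
        return None
    return match.group(0).replace("-", "").upper()


print(isbn_number_only("1234567890 (Knopf)"))
print(isbn_number_only("1234567890"))
```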

Check Bibliographic and Authority duplicate detection profiles in System Administration!

HTTP Posting

• much easier access to WebVoyáge display from clients

• toggle record view from any staff client to WebVoyáge

• if configured, Send Record To option is available via Record text menu

• configured in voyager.ini file [MARC POSTing] stanza

HTTP Posting

Send Record To… in Cataloging

Send Record To… in Acquisitions

Search URI

• drives searches to resources on the web

• is PC specific and adds new button to search interface in staff clients when configured in voyager.ini file

• click button…a browser is opened & search is executed

• some possible applications: link to Google or other Internet resource, link to another OPAC, link to LC authorities file

Search URI

Staff client search URI

Available for use in all staff clients

Adding Search URIs

[SearchURI]

Name=Google
URI=http://www.google.com
Copy=Y
SearchSyntax=/search?&q=<searchtext>

Name=Barnes&Noble
URI=http://search.barnesandnoble.com
Copy=Y
SearchSyntax=/booksearch/results.asp?WRD=<searchtext>

Name=Gale Group
URI=http://www.galegroup.com
Copy=Y
SearchSyntax=/servlet/SearchPageServlet?region=9&imprint=<searchtext>
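To make the role of the `<searchtext>` placeholder concrete, the sketch below assembles the address a browser would open from a `URI` and `SearchSyntax` pair. Joining the two values and URL-encoding the search text are assumptions about how the staff client behaves; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import quote_plus


def build_search_url(uri: str, search_syntax: str, search_text: str) -> str:
    """Join URI and SearchSyntax, substituting the encoded search text."""
    return uri + search_syntax.replace("<searchtext>", quote_plus(search_text))


url = build_search_url(
    "http://www.google.com",
    "/search?&q=<searchtext>",
    "ri yue yun yan",
)
print(url)
```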

Interacting with Other Systems

Incoming Z39.50 Connections

• records in Unicode databases are UTF8-encoded

• z3950svr may send either or both MARC8-encoded or UTF8-encoded records

• default is set to send MARC8 encoded records

• but, two different z3950svr ports can be configured to provide records in both formats, thereby accommodating all sites connecting to database

Interacting with Other Systems

Outgoing Z39.50 Connections

• retrieves and displays records of any type in UTF-8

• converts incoming records based on new Database Definitions setting in System Administration called ‘Source Character Set’

• Latin1 (non Unicode)

• MARC 21 MARC8 (non Unicode)

• MARC21 UTF8

• OCLC (non Unicode)

• RLIN legacy (non Unicode)

• Voyager legacy (non Unicode)

Agenda

Introduction


Conversion

New Features

Enhanced Searching

• WebVoyáge

• Staff clients


WebVoyáge and Unicode

MARC data supplied to the browser in UTF-8

• IE 6+ generally displays Unicode characters correctly; some characters do not display correctly unless a Unicode-compliant font is selected

• Netscape 6+ detects and displays Unicode characters without any special settings

• consider new help text in your OPAC to help patrons understand about language options, especially if there are records using different languages in your database

WebVoyáge and Unicode

Search and display in native languages for staff and users:

• OPAC and Cataloging client both allow Unicode character input; i.e., you can search for and retrieve records in native languages

• record display includes non-Latin scripts, including right-to-left scripts like Arabic and Hebrew

• Voyager takes advantage of the web browser’s native rendering support to present characters correctly

Other Languages in the OPAC

Enhanced Searching

Support more strategies for finding results by using:

• Holdings keyword searching

• Keyword-in-Headings searches

• Wildcards for left and internal truncation

Holdings Keyword Searching

• available in staff clients only

• requires Boolean operators if searching for more than one term

• HKEY index searched by default

• new MFHD keyword indexes may be created

Keyword-in-Headings Searching

Available for:

• staff name headings

• staff name/title headings

• OPAC name headings

• OPAC name/title headings

• staff title headings

• staff subject headings

• OPAC title headings

• OPAC subject headings


• available in both staff clients and WebVoyáge

• provide alphabetical results list of every heading containing search term

• use Boolean operators (and, or, not) to combine search strings, with “and” implied

• show all headings keyword searches on the history tab

Left/Internal Truncation

• place wildcard characters (? or %) at the beginning or in the middle of a search term

• use the wildcard, ?, to match on zero, one or more characters

• use the wildcard, %, as the new single-character matching character
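The two wildcards behave much like familiar pattern-matching operators, which the sketch below makes explicit by translating a search term into a regular expression. This only illustrates the matching rules as stated on the slide; it is not how Voyager evaluates searches, and `wildcard_to_regex` is a hypothetical name.

```python
import re


def wildcard_to_regex(term: str):
    """Translate ? (zero or more characters) and % (exactly one) into a regex."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".*")
        elif ch == "%":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


left = wildcard_to_regex("?ology")      # left truncation
internal = wildcard_to_regex("col?r")   # internal truncation
single = wildcard_to_regex("wom%n")     # single-character match

print(bool(left.match("biology")), bool(internal.match("colour")), bool(single.match("women")))
```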

Agenda

Introduction


Conversion

New Features

Enhanced Searching


Learning More…

Coded Character Sets: A Technical Primer for Librarians (EndUser 2004: Session 29)

Transitioning To Unicode: Strategies for Tidying Your Data (EndUser 2004: Session 45)

Why Unicode? (EndUser 2004: Session 65)

Voyager with Unicode Release Handbook

Voyager 5.0 Release Handbook

Voyager with Unicode Cataloging User’s Guide

Voyager 5.0 Cataloging User’s Guide


880 – Alternate Graphic Representation (R): http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb880

OCLC Character Sets: http://www.oclc.org/support/documentation/worldcat/records/subscription/5/5.pdf

Original Scripts in RLG Databases: http://www.rlg.org/origscripts.html

MARC 21 Concise Bibliographic: Control Subfields: http://www.loc.gov/marc/bibliographic/ecbdcntf.html

MARC 21 Concise Bibliographic: Multiscript Records: http://www.loc.gov/marc/bibliographic/ecbdmulti.html


SupportWeb: KnowledgeBase, EndUser archives, Voyager-L listserv archives

Questions????

Thank you for joining me today!
