Voyager® 5.0 and Cataloging
Connie L. Braun, Training Consultant
Confidential and proprietary information of Endeavor Information Systems, copyright © 2006. Reproduction or republication of this information in any form is strictly prohibited without express written consent of Endeavor Information Systems.
Agenda
Introduction
Your Work Environment
Conversion
New Features
Enhanced Searching
Learning More/Questions?
Voyager 5.0 and Unicode™
Finds and displays records in the native language
Create and edit any MARC record using UTF-8
Import and export of records with any supported character set
Select a Unicode-compliant font in Cataloging
Display Unicode characters in OPAC without proprietary software
Voyager 5.0 and Unicode™
For our customers, it’s business as usual, but with some interesting changes and improvements, especially in Cataloging.
The Unicode standard is an important step towards taking advantage of these changes and improvements.
Implementing the Unicode standard is an extension of Endeavor’s original mission: access to information regardless of location or format.
Following Standards
Voyager 5 and Voyager with Unicode rely on standards much more than previous releases did
• see http://www.unicode.org for much more detail on these standards
• see http://lcweb.loc.gov/marc/specifications/speccharucs.html for details on LC’s format of MARC records that use Unicode; Voyager follows this specification
• specifics on the Code Tables may be viewed at http://www.loc.gov/marc/specifications/specchartables.html
Multilingual Input and Display
Improved multilingual input and display capabilities in Voyager mean that characters now display correctly according to the Unicode and MARC standards.
Greater script coverage for cataloging items in your collections that are published in languages around the world.
How many? UTF-8 as originally designed could encode 2,147,483,648 (2^31) code points; the Unicode standard, and UTF-8 as restricted by RFC 3629, cover 1,114,112 code points (U+0000 to U+10FFFF).
Agenda
Introduction
Your Work Environment
  • Workstation Requirements
  • Setting Up For Languages Other Than English
  • Tag Tables
  • Session Defaults and Preferences
Conversion
New Features
Enhanced Searching
Learning More/Questions?
Workstation Requirements
Staff PCs will need:
• Windows 2000 or XP operating system
• Unicode standard compliant Internet browser
  • IE 6+
  • Netscape 6+
• Unicode-compliant font: Lucida Sans Unicode or Arial Unicode MS
MS Windows™
Voyager is more tightly integrated with Windows, using:
• standard Windows 2000/XP Unicode support
• standard Unicode fonts
• standard input using Input Method Editors (IMEs)
• standard browser support
Languages Other Than English
• workstations need to be specifically configured to work with languages other than English
• technical IT assistance likely required to install needed languages on staff PCs
• it is best to install all languages so that catalogers can easily work with new ones as necessary
Adding Languages to PCs
• Regional and Language Options are specific to each PC
• they are available via Start–Settings–Control Panel
• the Details button on the Languages tab lets the operator view or change installed languages and text input methods
• may include supplemental language support, too
Choosing Languages
• languages added to PCs should match the languages of items in your collections
• add and remove according to your needs; as few or many as necessary
• may also set preferences for language bar and key settings
Tag Tables
MARC Tag Tables have been revised and updated
See KnowledgeBase for Incident #14167 to obtain latest update
Tag Tables
• ability to modify tag table configuration remains the same as in earlier releases
• however, you may not specify anything for Leader position 9, since that byte is now hard-coded to identify records that have been converted to UTF-8
• see Appendix A of Cataloging User’s Guide for full details on revising, maintaining and updating the Tag Tables
Record Validation
MARC validation
MARC21 character set validation
Authority control validation
Decomposition of accented characters for MARC21
Record Validation
When “Bypass MARC21 character set validation” is unchecked:
• MARC21 Repertoire.cfg is used to control validation of the MARC21 character set
• this helps to enforce the MARC21 standard
When “Bypass decomposition of accented characters for MARC21” is unchecked:
• accented (precomposed) characters are decomposed before the record is saved; checking the box allows records to be saved to the database without decomposing the characters
• this helps to enforce the MARC21 standard
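What “decomposition” means can be shown with Python's standard `unicodedata` module. This is a minimal sketch, not Voyager's code. It assumes, as the validation option implies, that MARC21 UTF-8 records store a base letter followed by a combining diacritic rather than a precomposed character. The function names are hypothetical.

```python
import unicodedata

def decompose_for_marc21(text: str) -> str:
    """Return text in Unicode Normalization Form D: every precomposed
    letter is split into its base letter plus combining diacritic(s)."""
    return unicodedata.normalize("NFD", text)

def is_decomposed(text: str) -> bool:
    """True if text is already in NFD (no precomposed accented letters)."""
    return unicodedata.is_normalized("NFD", text)   # Python 3.8+

title = "Espa\u00f1a"                      # ñ entered as one precomposed character
stored = decompose_for_marc21(title)
print([f"U+{ord(c):04X}" for c in stored]) # 'n' is now followed by U+0303 combining tilde
print(is_decomposed(title), is_decomposed(stored))
```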
Agenda
Introduction
Your Work Environment
Conversion
  • Data Conversion
  • Conversion Error Logging
  • Conversion Details
  • Identifying Non-Unicode Data
  • The Rest of Voyager
New Features
Enhanced Searching
Learning More/Questions?
Data Conversion
MARC records are converted from VRLIN (Voyager legacy encoding) to MARC21 compliant UTF-8 encoding
• leader position 9 becomes an ‘a’
• a conversion log is created
• UTF-8 allows for variable-length characters (most characters occupy the same amount of space as before conversion)
Note: all indexes and database columns with MARC data are regenerated after conversion
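The practical meaning of “variable-length characters” is that a character's UTF-8 byte count depends on its code point. Plain ASCII letters stay at one byte, while accented, combining, and non-Roman characters take two to four bytes. The short Python sketch below illustrates this; the sample characters are arbitrary.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

samples = [
    ("a", "Latin letter (ASCII)"),
    ("\u0303", "combining tilde"),
    ("\u00f1", "precomposed n with tilde"),
    ("\u5b89", "CJK ideograph (as in the 880 example later)"),
    ("\U0001D11E", "musical symbol G clef"),
]

for ch, label in samples:
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```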
Conversion Details
IMPORTANT! NO RECORDS ARE LOST
Each field in the record is handled individually. As a field is processed, it may change length, which requires adjustments to the record's leader and directory.
Both record-level and field-level checking are performed. In rare cases an entire record might fail conversion; it is more likely that an individual field fails to be converted.
Records may not convert if they contain text that cannot be mapped into Unicode according to the standard MARC-8 to Unicode mappings. Records that do not convert are stored in the database as is, without being converted to Unicode.
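Why a change in field length forces leader and directory updates can be seen by rebuilding a record in the standard MARC 21 (ISO 2709) layout. In that layout, leader positions 0–4 hold the record length and 12–16 hold the base address of data. Each 12-character directory entry holds a tag, a field length, and a starting position, all counted in bytes. The sketch below is hypothetical, not Voyager's conversion code: it recomputes those values from the UTF-8 byte lengths of already-converted fields and sets Leader/09 to ‘a’.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"

def build_record(leader: str, fields: list[tuple[str, bytes]]) -> bytes:
    """Assemble a MARC 21 record from a 24-character leader and
    (tag, field data) pairs, where field data is already UTF-8 and
    excludes the field terminator. Lengths and offsets are recomputed."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    directory = bytearray()
    data = bytearray()
    for tag, body in fields:
        chunk = body + FIELD_TERMINATOR
        # Directory entry: tag (3) + field length (4) + starting position (5)
        directory += f"{tag}{len(chunk):04d}{len(data):05d}".encode("ascii")
        data += chunk
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + len(RECORD_TERMINATOR)
    new_leader = (
        f"{record_length:05d}"      # 00-04 record length
        + leader[5:9]               # 05-08 unchanged
        + "a"                       # 09 character coding scheme: UCS/Unicode
        + leader[10:12]             # 10-11 unchanged
        + f"{base_address:05d}"     # 12-16 base address of data
        + leader[17:24]             # 17-23 unchanged
    )
    return new_leader.encode("ascii") + bytes(directory) + bytes(data) + RECORD_TERMINATOR

# 245 field whose title gained a byte when the diacritic became a 2-byte UTF-8 character
record = build_record(
    "00000nam  2200000   4500",
    [("245", "10\x1faEspan\u0303a.".encode("utf-8"))],
)
print(record)
```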
Conversion Error Logging
Libraries need to know the details about the results of the conversion process.
• full error checking and logging is included as part of the upgrade
• see Voyager with Unicode Technical User’s Guide, Chapter 4, and Voyager with Unicode Cataloging User’s Guide, Appendix C, for more information
• library designees should review the log file to plan for correcting any records that have errors
Conversion Log Details 1
# 11 secs read=982 changed=791 880=0 okay=982 errors=0 written=982
# 21 secs read=1931 changed=1558 880=0 okay=1931 errors=0 written=1931
# 29 secs read=2848 changed=2087 880=0 okay=2848 errors=0 written=2848
# 36 secs read=3699 changed=2533 880=0 okay=3699 errors=0 written=3699
# 43 secs read=4607 changed=3076 880=0 okay=4607 errors=0 written=4607
# 51 secs read=5519 changed=3610 880=0 okay=5519 errors=0 written=5519
=============================================================
Legend
1 number of seconds used by job so far
2 read=number of records processed
3 changed=number of records changed
4 880=how many records contain 880s
5 okay=# records processed successfully
6 errors=# records not processed due to errors
7 written=# records written to the database
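Designees reviewing a long conversion log may want to pull these progress counters out programmatically. The Python sketch below assumes only the line layout shown in the sample above; it is not an Endeavor tool, and `parse_progress` and `has_errors` are hypothetical names.

```python
import re

PROGRESS = re.compile(r"#\s*(\d+)\s+secs\s+(.*)")

def parse_progress(line: str) -> dict[str, int]:
    """Turn '# 11 secs read=982 changed=791 ...' into a dict of counters."""
    m = PROGRESS.match(line.strip())
    if m is None:
        raise ValueError(f"not a progress line: {line!r}")
    stats = {"secs": int(m.group(1))}
    for key, value in re.findall(r"(\w+)=(\d+)", m.group(2)):
        stats[key] = int(value)       # keys: read, changed, 880, okay, errors, written
    return stats

def has_errors(stats: dict[str, int]) -> bool:
    """Flag progress lines that report records not processed due to errors."""
    return stats.get("errors", 0) > 0

line = "# 51 secs read=5519 changed=3610 880=0 okay=5519 errors=0 written=5519"
print(parse_progress(line), has_errors(parse_progress(line)))
```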
Conversion Log Details 2
bib 6213: [17](700): c->8 loose char page=0 at 20 '091e ..'
bib 35322: [14](856): c->8 undefined char page=0 at 61 'fc7220486973746f .r Histo'
bib 35516: [23](856): c->8 no char to combine to page=0 at 82 '1e .'
================================================================
1 record type and id
2 index within record of field that generated error
3 tag that generated error
4 c->8 indicates conversion to UTF-8 encoding
5 description of error (loose char, undefined char, no char to combine to)
6 page=subset to which source character belongs
7 at # = position of source character that caused error
8 hex dump of the source bytes, followed by a printable rendering (non-printing bytes shown as '.')
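Error lines can be parsed in the same way, so that problem fields can be listed by tag or by message. The regular expression below is an assumption inferred from the three sample lines. The actual log may contain variants not shown here, and `parse_error_line` is a hypothetical helper.

```python
import re

# Layout assumed from the samples:
# <type> <id>: [<field index>](<tag>): c->8 <message> page=<n> at <pos> '<hex> <printable>'
ERROR_LINE = re.compile(
    r"(?P<rtype>[a-z]+) (?P<rid>\d+): \[(?P<index>\d+)\]\((?P<tag>\w{3})\): "
    r"c->8 (?P<message>.+?) page=(?P<page>\d+) at (?P<pos>\d+) '(?P<hex>[0-9a-fA-F]+)"
)

def parse_error_line(line: str) -> dict:
    m = ERROR_LINE.search(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "record": (m["rtype"], int(m["rid"])),
        "field_index": int(m["index"]),
        "tag": m["tag"],
        "message": m["message"],
        "page": int(m["page"]),
        "position": int(m["pos"]),
        "source_bytes": bytes.fromhex(m["hex"]),   # e.g. '091e' -> tab, field terminator
    }

print(parse_error_line("bib 35322: [14](856): c->8 undefined char page=0 at 61 'fc7220486973746f .r Histo'"))
```

Feeding every parsed line's `message` into a `collections.Counter` gives a quick summary of how many warnings and errors of each kind need attention.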
Conversion Log Details 3
loose char: a warning message indicating that a character not strictly part of the Voyager encoding has been converted (e.g., an unexpected carriage return)
no char to combine to: a warning message indicating that a combining character appeared without a base character to combine with (e.g., an umlaut with no a, o, or u base letter)
undefined char: an error message indicating a single character that cannot be mapped to UTF-8
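A check similar to “no char to combine to” can be run on text that is already in Unicode order, where a combining mark follows its base letter. The Python sketch below is hypothetical and is not the logic of the conversion program, which works on the legacy encoding. It treats a mark as orphaned when it starts the text or follows a space or the MARC subfield delimiter (0x1F).

```python
import unicodedata

def orphan_combining_marks(text: str) -> list[int]:
    """Return positions of combining marks that have no base character."""
    orphans = []
    has_base = False
    for i, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):   # Mn, Mc, Me: combining marks
            if not has_base:
                orphans.append(i)
        else:
            # Spaces and the subfield delimiter cannot carry a diacritic.
            has_base = not (ch.isspace() or ch == "\x1f")
    return orphans

print(orphan_combining_marks("\u0308ber"))   # umlaut with no base letter
print(orphan_combining_marks("u\u0308ber"))  # umlaut correctly follows 'u'
```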
Identifying Non-Unicode Data
On the Colors-Fonts tab of Session Defaults and Preferences, select a color for Conversion records to identify records that did not complete the conversion process.
Identifying Non-Unicode Data
Records that did not complete the conversion process then display in the color selected in Options/Preferences.
Identifying Non-Unicode Data
Records that cannot be converted to Unicode are viewable in the Cataloging module with nc (not converted) displayed in the Title Bar.
Any characters that cannot be matched or recognized are replaced with a Unicode substitution character.
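The slide does not say which substitution character Voyager uses. The standard one is U+FFFD REPLACEMENT CHARACTER, and we assume it here. The Python sketch below shows this kind of substitution when bytes cannot be decoded as UTF-8, for example a Latin-1 “ñ” read as UTF-8. It also counts the substitutions so that affected text can be flagged for review.

```python
REPLACEMENT_CHARACTER = "\ufffd"

def decode_with_substitution(raw: bytes) -> tuple[str, int]:
    """Decode bytes as UTF-8, replacing undecodable input with U+FFFD,
    and report how many substitutions were made."""
    text = raw.decode("utf-8", errors="replace")
    return text, text.count(REPLACEMENT_CHARACTER)

print(decode_with_substitution(b"Espa\xf1a"))              # Latin-1 byte for ñ is not valid UTF-8
print(decode_with_substitution("Espa\u00f1a".encode("utf-8")))
```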
Fonts and Unicode
A MARC record may contain non-Roman characters even though you cannot see them.
• records display correctly only if a Unicode-compliant font that covers the record's scripts has been selected
Lucida Sans Unicode
• installed by default with Windows
• covers many alphabetic scripts but does not include CJK characters
Arial Unicode MS
• good choice for libraries cataloging materials in many scripts
• included with Microsoft Office and other Microsoft products
The Rest of Voyager
Non-MARC data is not converted:
• Acquisitions data
• Circulation data (patron info, etc.)
• Item data
Reporter
• not Unicode standard compliant
• translates data to LATIN1
Agenda
Introduction
Your Work Environment
Conversion
New Features
  • Cataloging: Diacritics & Special Characters, Importing Records, New Record Views, Search URIs
  • WebVoyáge: Browsers, Searching, Displaying
  • Interacting with Other Systems
Enhanced Searching
Learning More/Questions?
Diacritic/Special Character Entry
Cataloging practices: then and now
• pre-Unicode input in Cataloging: the diacritic precedes the base character
  Example: Espa~na
• post-Unicode input in Cataloging: the diacritic follows the base character
  Example: Espan~a
• the Cataloging client's ability to display combined characters is an improvement over past versions and a way to ensure accurate entry
  Example: España
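The change in entry order amounts to moving each diacritic from in front of its base letter to behind it. In the Python sketch below, Unicode combining characters stand in for legacy diacritics, which used different codes in the old encoding. The function is hypothetical. Once the marks are in Unicode order, NFC normalization combines them into the single displayed character.

```python
import unicodedata

def legacy_to_unicode_order(text: str) -> str:
    """Move each run of combining diacritics from before its base
    character (pre-Unicode entry order) to after it (Unicode order)."""
    out: list[str] = []
    pending: list[str] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            pending.append(ch)        # hold diacritics until their base arrives
        else:
            out.append(ch)
            out.extend(pending)       # base first, then its diacritics
            pending.clear()
    out.extend(pending)               # trailing diacritics with no base stay as-is
    return "".join(out)

reordered = legacy_to_unicode_order("Espa\u0303na")   # Espa~na
print(reordered == "Espan\u0303a")                    # Espan~a
print(unicodedata.normalize("NFC", reordered))        # displayed as one character
```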
SpecialCharacters.cfg
SpecialCharacters.cfg, located in the C:\Voyager\Catalog folder, defines the content of the special character entry dialog box. Operators may edit this file.
Finding Little-Used Characters
• when a character that is not on the Special Characters list is needed, the operator can use the MS Windows Character Map
• typically located at Start – Programs – Accessories – System Tools – Character Map
• locate character or perform search
• select and copy character, then paste into position in bib record
Input of Non-Roman Text
Voyager with Unicode added the option for Cataloging operators to use all of the standard Microsoft Windows keyboard and input method editors (IMEs).
With this functionality in place, operators may search for, display, and edit the contents of all MARC records using the full range of UTF-8 characters.
The entire JACKPHY group (Japanese, Arabic, Chinese, Korean, Persian, Hebrew, Yiddish) is part of the UTF-8 character set, including the right-to-left input needed for Arabic, Persian, Hebrew, and Yiddish.
Linking in a MARC21 Record
Tag I1 I2 Subfield Data
100 1 ‡6 880-01 ‡a An, Zhen.
245 1 0 ‡6 880-02 ‡a Ri yue yun yan / ‡c An Zhen zhu.
250 ‡6 880-03 ‡a Di 1 ban.
260 ‡6 880-04 ‡a Changchun Shi : ‡b Changchun chu ban she, ‡c 1997.
300 ‡a 4, 2, 291 p. ; ‡c 21 cm.
440 0 ‡6 880-05 ‡a Zhongguo li dai wang chao xing shuai qu shi lu
500 ‡a Non-Roman script – Chinese
651 0 ‡a China ‡x History ‡y Ming dynasty, 1368-1644.
880 1 ‡6 100-01/$1 ‡a 安 震 .
880 1 0 ‡6 245-02/$1 ‡a 日月 云烟 / ‡c 安 震 著 .
880 ‡6 250-03/$1 ‡a 第 1 版 .
880 ‡6 260-04/$1 ‡a 长春市 : ‡b 长春 出版社 ,‡c 1997.
880 0 ‡6 440-05/$1 ‡a 中国 历代 王朝 兴衰 启示录
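In the record above, the link works through subfield ‡6. A regular field carries “880-nn”, and its 880 partner carries “tag-nn”, so both share the occurrence number nn. The 880 may also add a script code such as “$1” (CJK). The Python sketch below pairs fields on that basis. It is a hypothetical helper that takes (tag, ‡6 value) pairs, not a Voyager routine.

```python
import re
from typing import Optional

LINK = re.compile(r"^(?P<tag>\d{3})-(?P<occ>\d{2,})(?:/(?P<script>.*))?$")

def pair_880s(fields: list[tuple[str, Optional[str]]]) -> dict[str, tuple[str, Optional[str]]]:
    """Return {occurrence: (linked tag, script code)} for each regular
    field whose 880 partner points back to the same tag."""
    regular: dict[str, str] = {}                                 # occurrence -> romanized field tag
    vernacular: dict[str, tuple[str, Optional[str]]] = {}        # occurrence -> (tag named in 880, script)
    for tag, sub6 in fields:
        if not sub6:
            continue                                             # field has no linkage
        m = LINK.match(sub6)
        if m is None:
            continue
        if tag == "880":
            vernacular[m["occ"]] = (m["tag"], m["script"])
        elif m["tag"] == "880":
            regular[m["occ"]] = tag
    return {
        occ: (rtag, vernacular[occ][1])
        for occ, rtag in regular.items()
        if occ in vernacular and vernacular[occ][0] == rtag
    }

fields = [("100", "880-01"), ("245", "880-02"), ("300", None),
          ("880", "100-01/$1"), ("880", "245-02/$1")]
print(pair_880s(fields))
```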
Using the On-Screen Keyboard
Typically found at Start–Programs–Accessories–Accessibility–On-Screen Keyboard
Importing Records
• expected character set needs to be accurately identified if records are to be imported correctly
• some experimentation may be necessary to determine the correct character set
• let’s look at some details to help everyone understand what is happening
Bulk Import
• fundamentally the same as before, although leader byte 9 is checked against the incoming character set identified in the import rule.
• blank = non-Unicode™; converted & imported
• ‘a’ = Unicode™; imported
• neither blank nor ‘a’; errors out – not imported
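The three Leader/09 cases above reduce to a simple decision. The Python sketch below restates the slide's rule. The function name and return strings are illustrative only, and the real Bulk Import also checks the value against the character set named in the import rule.

```python
def bulk_import_action(leader: str) -> str:
    """Decide what happens to an incoming record based on Leader/09."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    coding_scheme = leader[9]            # Leader/09: character coding scheme
    if coding_scheme == " ":
        return "convert and import"      # blank: non-Unicode, converted first
    if coding_scheme == "a":
        return "import"                  # 'a': already Unicode
    return "reject"                      # anything else errors out

print(bulk_import_action("00000nam a2200000   4500"))
```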
Expected Character Set
• Character set mapping for Bulk Import is designated in the Bulk Import rule (SysAdmin–Cataloging–Bulk Import Rules).
MARC Export
• default export character set is MARC21 UTF-8
• use the -a option on the command line to choose a different character set
  • see page 10-8 of the Voyager with Unicode Technical User’s Guide for more detail
• if no mapping is found for a composed character, Voyager decomposes it and attempts to find a match for each part
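The decompose-and-retry fallback described in the last bullet can be sketched in Python as follows. The target table is invented for illustration; a real export character set has its own codes. If the target were MARC-8, the diacritic would also have to be moved back in front of its base letter.

```python
import unicodedata

# Hypothetical target table: Unicode character -> target representation.
TARGET = {"E": "E", "s": "s", "p": "p", "a": "a", "n": "n", "\u0303": "~"}

def export_text(text: str, table: dict[str, str] = TARGET, missing: str = "?") -> str:
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
            continue
        # No mapping for the (possibly composed) character: decompose it
        # and try to map each part separately.
        for part in unicodedata.normalize("NFD", ch):
            out.append(table.get(part, missing))
    return "".join(out)

print(export_text("Espa\u00f1a"))   # ñ has no entry, but n + combining tilde do
```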
New ISBN Indexes
For improved duplicate detection:
ISBN Indexes
• 020N: 020 |a, number only
• 020R: 020 |z, number only
020 |a 1234567890 (Knopf) is indexed as 020 |a 1234567890
Check Bibliographic and Authority duplicate detection profiles in System Administration!
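A “number only” index lets “1234567890 (Knopf)” and “1234567890” match as duplicates. The Python sketch below shows one plausible normalization. It keeps the leading run of digits, X, and hyphens, then drops the hyphens and any qualifier. This is an assumption, because Voyager's exact normalization rules are not given here.

```python
import re
from typing import Optional

def isbn_number_only(value: str) -> Optional[str]:
    """Return just the ISBN number from a 020 subfield value, or None."""
    m = re.match(r"\s*([0-9Xx-]+)", value)
    if m is None:
        return None
    number = m.group(1).replace("-", "").upper()
    return number or None

print(isbn_number_only("1234567890 (Knopf)") == isbn_number_only("1234567890"))
```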
HTTP Posting
• much easier access to WebVoyáge display from clients
• toggle record view from any staff client to WebVoyáge
• if configured, Send Record To option is available via Record text menu
• configured in voyager.ini file [MARC POSTing] stanza
Search URI
• drives searches to resources on the web
• is PC specific; when configured in the voyager.ini file, it adds a new button to the search interface in staff clients
• click the button, and a browser opens and executes the search
• some possible applications
  • link to Google or other Internet resource
  • link to another OPAC
  • link to LC authorities file
Adding Search URIs
[SearchURI]
Name=Google
URI=http://www.google.com
Copy=Y
SearchSyntax=/search?&q=<searchtext>
Name=Barnes&Noble
URI=http://search.barnesandnoble.com
Copy=Y
SearchSyntax=/booksearch/results.asp?WRD=<searchtext>
Name=Gale Group
URI=http://www.galegroup.com
Copy=Y
SearchSyntax=/servlet/SearchPageServlet?region=9&imprint=<searchtext>
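Each entry combines URI and SearchSyntax, with the operator's search text in place of <searchtext>. The Python sketch below shows that substitution. URL-encoding the text as UTF-8 is our assumption about what a browser-bound query needs, not documented Voyager behavior, and `search_url` is a hypothetical name.

```python
from urllib.parse import quote_plus

def search_url(uri: str, syntax: str, search_text: str) -> str:
    """Join URI and SearchSyntax, substituting the URL-encoded search text."""
    return uri.rstrip("/") + syntax.replace("<searchtext>", quote_plus(search_text))

print(search_url("http://www.google.com", "/search?&q=<searchtext>", "Espa\u00f1a history"))
```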
Interacting with Other Systems
Incoming Z39.50 Connections
• records in Unicode databases are UTF-8-encoded
• z3950svr can send MARC-8-encoded records, UTF-8-encoded records, or both
• by default, it sends MARC-8-encoded records
• however, two z3950svr ports can be configured to provide records in both formats, accommodating all sites that connect to the database
Interacting with Other Systems
Outgoing Z39.50 Connections
• retrieves and displays records of any type in UTF-8
• converts incoming records based on a new Database Definitions setting in System Administration called ‘Source Character Set’:
  • Latin1 (non Unicode)
  • MARC 21 MARC8 (non Unicode)
  • MARC21 UTF8
  • OCLC (non Unicode)
  • RLIN legacy (non Unicode)
  • Voyager legacy (non Unicode)
Agenda
Introduction
Your Work Environment
Conversion
New Features
Enhanced Searching
  • WebVoyáge
  • Staff clients
Learning More/Questions?
WebVoyáge and Unicode
MARC data is supplied to the browser in UTF-8
• IE 6+ generally displays Unicode characters correctly; some characters do not display correctly unless a Unicode-compliant font is selected
• Netscape 6+ figures out that it needs to display Unicode characters without any special settings
• consider adding help text to your OPAC so patrons understand language options, especially if your database contains records in different languages
WebVoyáge and Unicode
Search and display in native languages for staff and users:
• OPAC and Cataloging client both allow Unicode character input; i.e., you can search for and retrieve records in native languages
• record display includes non-Latin scripts, including right-to-left scripts like Arabic and Hebrew
• Voyager takes advantage of the web browser’s native rendering support to present characters correctly
Enhanced Searching
Support more strategies for finding results by using:
• Holdings keyword searching
• Keyword-in-Headings searches
• Wildcards for left and internal truncation
Holdings Keyword Searching
• available in staff clients only
• requires Boolean operators if searching for more than one term
• HKEY index searched by default
• new MFHD keyword indexes may be created
Keyword-in-Headings Searching
Available for:
• staff name headings
• staff name/title headings
• OPAC name headings
• OPAC name/title headings
• staff title headings
• staff subject headings
• OPAC title headings
• OPAC subject headings
Keyword-in-Headings Searching
• available in both staff clients and WebVoyáge
• provides an alphabetical results list of every heading containing the search term
• use Boolean operators (and, or, not) to combine search strings; “and” is implied
• headings keyword searches are shown on the History tab
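The core behavior of a Keyword-in-Headings search is an alphabetical list of every heading that contains all of the search words. The Python sketch below illustrates it. It handles only the implied “and”; explicit and/or/not operators and Voyager's actual indexing and sorting rules are outside this hypothetical sketch.

```python
import re

def words(text: str) -> set[str]:
    """Lowercased word tokens of a heading or query."""
    return set(re.findall(r"\w+", text.lower()))

def headings_containing(headings: list[str], query: str) -> list[str]:
    """Headings containing every query word (implied AND), sorted alphabetically."""
    terms = words(query)
    hits = [h for h in headings if terms <= words(h)]
    return sorted(hits, key=str.lower)

headings = [
    "Ming dynasty porcelain",
    "China -- History -- Ming dynasty, 1368-1644",
    "China -- Politics and government",
]
print(headings_containing(headings, "ming dynasty"))
```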
Left/Internal Truncation
• place wildcard characters (? or %) at the beginning or in the middle of a search term
• use ? to match zero, one, or more characters
• use % as the new single-character wildcard
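The two wildcards translate directly into regular-expression terms: ? becomes “any run of characters” and % becomes “exactly one character”. The Python sketch below is a hypothetical helper that shows this matching behavior. It says nothing about how Voyager's indexes implement truncation internally.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Compile a search term using ? (zero or more chars) and % (one char)."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".*")            # zero, one, or more characters
        elif ch == "%":
            parts.append(".")             # exactly one character
        else:
            parts.append(re.escape(ch))   # everything else matches literally
    return re.compile("".join(parts), re.IGNORECASE)

print(bool(wildcard_to_regex("?graphy").fullmatch("Bibliography")))   # left truncation
print(bool(wildcard_to_regex("wom%n").fullmatch("women")))            # internal, one character
```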
Agenda
Introduction
Your Work Environment
Conversion
New Features
Enhanced Searching
Learning More/Questions?
Learning More…
Coded Character Sets: A Technical Primer for Librarians (EndUser 2004: Session 29)
Transitioning To Unicode: Strategies for Tidying Your Data (EndUser 2004: Session 45)
Why Unicode? (EndUser 2004: Session 65)
Voyager with Unicode Release Handbook
Voyager 5.0 Release Handbook
Voyager with Unicode Cataloging User’s Guide
Voyager 5.0 Cataloging User’s Guide
Learning More…
880 – Alternate Graphic Representation (R): http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb880
OCLC Character Sets: http://www.oclc.org/support/documentation/worldcat/records/subscription/5/5.pdf
Original Scripts in RLG Databases: http://www.rlg.org/origscripts.html
MARC 21 Concise Bibliographic: Control Subfields: http://www.loc.gov/marc/bibliographic/ecbdcntf.html
MARC 21 Concise Bibliographic: Multiscript Records: http://www.loc.gov/marc/bibliographic/ecbdmulti.html
Learning More…
SupportWeb: KnowledgeBase, EndUser archives, Voyager-L listserv archives
Questions?