UNIMARC and XML at the BN
Nuno Freire, José Borbinha
Hugo Manguinhas INESC-ID
UNIMARC and XML at the BN
• Reference publication of the formats
  – MDR >>> HTTP://UNIMARC.INFO + PDF, ...
• Validation of records and reporting
  – MANGAS + IRIS + QUALICAT
• Preservation of records
  – REPOX – Metadata Repository
• Access and transport of records
  – URN.PORBASE.ORG
Reference publication of the formats
Linking to the UNIMARC Reference Publication in the WEB
http://www.unimarc.info/bibliographic/2.3/en/100
urn:unimarc:<type>:<version>:<locale>:/100
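The slide shows a reference URL and a URN pattern side by side. A minimal sketch of how the two could be translated into each other follows, assuming that the URN components map one-to-one onto the URL path segments. That correspondence, and the function names `urn_to_url` and `url_to_urn`, are assumptions made for illustration and are not confirmed by the slides.

```python
from urllib.parse import urlsplit

# Base address taken from the slide.
BASE_URL = "http://www.unimarc.info"


def urn_to_url(urn: str) -> str:
    """Translate urn:unimarc:<type>:<version>:<locale>:/<element> into a URL.

    Assumption: each URN component becomes one URL path segment.
    """
    parts = urn.split(":", 5)
    if len(parts) != 6 or parts[0] != "urn" or parts[1] != "unimarc":
        raise ValueError(f"not a UNIMARC URN: {urn!r}")
    _, _, fmt_type, version, locale, element = parts
    return f"{BASE_URL}/{fmt_type}/{version}/{locale}/{element.lstrip('/')}"


def url_to_urn(url: str) -> str:
    """Inverse of urn_to_url, under the same assumed mapping."""
    segments = urlsplit(url).path.strip("/").split("/")
    if len(segments) != 4:
        raise ValueError(f"unexpected reference URL: {url!r}")
    fmt_type, version, locale, element = segments
    return f"urn:unimarc:{fmt_type}:{version}:{locale}:/{element}"


print(urn_to_url("urn:unimarc:bibliographic:2.3:en:/100"))
print(url_to_urn("http://www.unimarc.info/bibliographic/2.3/en/100"))
```

Under this assumed mapping, the example URN for field 100 of the bibliographic format, version 2.3, English locale, resolves to the URL shown on the slide.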
Linking to the UNIMARC Reference Publication in the WEB
Links to the Dublin Core Metadata Registry(http://www.dublincore.org/dcregistry/)
Links to the UNIMARC Registry
(http://www.unimarc.info)
The Z39.50 Portuguese Gateway
http://zzz.porbase.org
Reference publication of the formats (publication in PDF from the schema and descriptions)
Validation of records and reporting
MANGAS, DIAG, QUALICAT
Applications of the UNIMARC schema coded in XML
REPOX – Metadata Repository
[Class diagram: REPOX Data Model. Elements shown: REPOX Manager, Data Collection, Record Type, Data Source Interface, Access Point, Data Service, Archival Information Package, Preservation Description Information, Content Information, Data Object, Digital Signature, Record. Associations carry 0..* and 1..* multiplicities.]
REPOX – Metadata Repository
[Deployment diagram: REPOX Deployment Model. Elements shown: Java EE Application Server, Data Source, REPOX Manager, File System, Web Interface, Web Services Interface, Digital Signature Manager, Web Client, External Service, Command Line Interface, Administrator, XML Records («artifact»), Data Source Interface, Access Points Manager, MySql DBMS, Record Access Point Indexes («artifact»).]
Access and transport of records
http://urn.porbase.org
Access to PORBASE records by identifiers
Successful usage of MARCXML for exchanging UNIMARC records
Access and transport of records...
Demonstration:
http://urn.porbase.org
UNIMARC and XML
Hugo Manguinhas, Nuno Freire
José Borbinha INESC-ID
UNIMARC and XML
1. The transport of UNIMARC records in XML
2. Representations of UNIMARC
  • Schema coded in XML (vocabulary and constraints)
  • Definitions recorded in XML
3. A Metadata Registry for UNIMARC
4. Open or new issues...
The transport of UNIMARC records in XML
• MARCXML
• MarcXchange
MARCXML
• About
  – Simple and flexible XML Schema
    • XML schema to code MARC21 records
    • The schema also supports UNIMARC
  – Lossless conversion between UNIMARC or MARC21 and XML
• Limitations
  – MARC validations not enforced by the schema
• Usage
  – Wide usage...
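To make the points above concrete, here is a small sketch of a UNIMARC-style record carried in MARCXML and read back with Python's standard library. The record content (identifier, title, author) is invented for illustration; the namespace is the one published for the MARCXML schema. The helper name `parse_marcxml_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARCXML ("MARC21 slim") schema.
MARCXML_NS = "http://www.loc.gov/MARC21/slim"

# Illustrative record: the field values are made up for this example.
MARCXML_SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam  2200000   450 </leader>
  <controlfield tag="001">EXAMPLE-0001</controlfield>
  <datafield tag="200" ind1="1" ind2=" ">
    <subfield code="a">An illustrative title</subfield>
    <subfield code="f">An illustrative author</subfield>
  </datafield>
</record>"""


def parse_marcxml_record(xml_text: str) -> dict:
    """Read one MARCXML record into plain Python structures."""
    ns = {"m": MARCXML_NS}
    rec = ET.fromstring(xml_text)
    return {
        "leader": rec.findtext("m:leader", namespaces=ns),
        "control": {cf.get("tag"): cf.text
                    for cf in rec.findall("m:controlfield", ns)},
        "data": [
            (df.get("tag"), df.get("ind1"), df.get("ind2"),
             [(sf.get("code"), sf.text) for sf in df.findall("m:subfield", ns)])
            for df in rec.findall("m:datafield", ns)
        ],
    }


record = parse_marcxml_record(MARCXML_SAMPLE)
print(record["control"]["001"])
print(record["data"][0])
```

Nothing in this structure says which tags, indicators or subfield codes are legal for UNIMARC: the same container would carry a MARC21 record equally well. That is the limitation noted above, since format-level rules have to be checked elsewhere.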
MarcXchange
http://www.bs.dk/marcxchange/
http://www.niso.org/international/SC4/n577.pdf
• About
  – Superset of MARCXML (every valid MARCXML file is also a valid MarcXchange file)
  – ISO 2709 centric:
    • Allows more than 2 indicators
    • Allows more than 1 subfield code length
  – Defines also extensions to the ISO 2709 specifications (format and type)
• Status
  – "ISO/DIS 25577 Information and documentation – MarcXchange"
  – Sent to DIS ballot. The voting period started 2006-02-22 and terminates on 2006-07-24.
  – Library of Congress has agreed to host the Maintenance Agency for MarcXchange.
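The "format and type" extension can be pictured as two attributes on each record. The sketch below reads them from a small MarcXchange-style collection. The namespace string, the attribute values and the sample content are assumptions for illustration (the namespace is the one used by the later published schema, which may differ from the draft discussed here); the function name `list_record_kinds` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace as used by the published MarcXchange schema.
MARCXCHANGE_NS = "info:lc/xmlns/marcxchange-v1"

# Illustrative collection mixing two formats; all values are made up.
MARCXCHANGE_SAMPLE = """<collection xmlns="info:lc/xmlns/marcxchange-v1">
  <record format="UNIMARC" type="Bibliographic">
    <leader>00000nam  2200000   450 </leader>
    <datafield tag="200" ind1="1" ind2=" ">
      <subfield code="a">An illustrative title</subfield>
    </datafield>
  </record>
  <record format="MARC21" type="Authority">
    <leader>00000nz  a2200000n  4500</leader>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">An illustrative name</subfield>
    </datafield>
  </record>
</collection>"""


def list_record_kinds(xml_text: str) -> list:
    """Return the (format, type) pair declared on each record."""
    ns = {"mx": MARCXCHANGE_NS}
    root = ET.fromstring(xml_text)
    return [(rec.get("format"), rec.get("type"))
            for rec in root.findall("mx:record", ns)]


print(list_record_kinds(MARCXCHANGE_SAMPLE))
```

A receiver can use these two attributes to route each record to the right processing rules, something plain MARCXML does not declare.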
UNIMARC and XML
The UNIMARC Schema
The formal description of UNIMARC as a family of formats, in a specific schema, coded in XML, able to accommodate the formal lexis, syntaxes and multilingual textual descriptions of any format and version, to be interpreted by both machines and humans.
A Schema for UNIMARC
• Purposes
  – Support to the maintenance of the formats
  – Automatic publication of the formats
    • For human reference (HTML, PDF, etc.)
    • For machine processing
  – Automatic validation of UNIMARC records
  – Promote generic interoperability and usage of UNIMARC metadata in information systems
• Requirements
  – Formal ways to express the UNIMARC vocabulary and constraints
  – A UNIMARC Metadata Registry (MDR)
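As an illustration of what "automatic validation" driven by a formal description of vocabulary and constraints might look like, here is a minimal sketch. The rule table is a tiny, hand-written stand-in and is not the BN schema; the names `ILLUSTRATIVE_RULES` and `validate_tags` are hypothetical, and only two kinds of constraint (mandatory, repeatable) are modelled.

```python
from collections import Counter

# Hypothetical, hand-written rule table standing in for a real schema.
# Only a handful of bibliographic fields are listed, for illustration.
ILLUSTRATIVE_RULES = {
    "001": {"mandatory": True, "repeatable": False},   # record identifier
    "100": {"mandatory": True, "repeatable": False},   # general processing data
    "200": {"mandatory": True, "repeatable": False},   # title and responsibility
    "700": {"mandatory": False, "repeatable": False},  # primary personal name
    "701": {"mandatory": False, "repeatable": True},   # alternative personal name
}


def validate_tags(tags):
    """Check a record's field tags against the rule table; return problems."""
    counts = Counter(tags)
    problems = []
    for tag, rule in ILLUSTRATIVE_RULES.items():
        n = counts.get(tag, 0)
        if rule["mandatory"] and n == 0:
            problems.append(f"{tag}: mandatory field is missing")
        if not rule["repeatable"] and n > 1:
            problems.append(f"{tag}: field is not repeatable but occurs {n} times")
    for tag in counts:
        if tag not in ILLUSTRATIVE_RULES:
            problems.append(f"{tag}: field is not defined in this rule table")
    return problems


print(validate_tags(["001", "100", "200", "701", "701"]))
print(validate_tags(["001", "200", "200"]))
```

Keeping the rules as data rather than code is the point of the exercise: the same table could, in principle, feed the publication of the format for human reference as well as the validator.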
About Schema Languages...
Generic Schema Languages
XML Schema, RELAX NG, Schematron
Generic Schema Languages
• Advantages of generic schemas
  – Widespread use...
• Disadvantages of generic schemas
  – These languages are usually very generic (which is good), but for our purpose that means a level of abstraction that is too low: expressing all the richness of UNIMARC in them would be impracticable!!!
  – They may evolve in unexpected directions, out of our control, targeting scenarios incompatible with our business
• Only Schematron allows the complete definition of the UNIMARC vocabulary and constraints (but it is not widely used... yet...)
Requirements for a UNIMARC Schema Language
• Express all UNIMARC vocabulary and constraints
  – Allowing declaration of UNIMARC formats
  – Allowing version control
• Close to the MARC concepts
  – Controlled language, technology independent
• Specific identification of vocabulary and constraints
  – Allow the definition of a URN space for the elements of the format (required for metadata registries)
Toward MARC centric Schema Languages...
• UNIMARC Doc Schema (BookMARC)
• UNIMARC Schema (BN)
[Diagram: Generic Schema Languages (XML Schema, RELAX NG, Schematron) alongside MARC centric Schema Languages (UNIMARC Schema, UNIMARC Doc Schema)]
UNIMARC Doc Schema
• Developed by BookMARC
• Schema language close to the concepts of the UNIMARC format
• Combines schema language concepts with descriptive information
  – Makes it harder to understand
  – Adds useless information for validation purposes
  – Increases schema length
• Current version describes only a subset of the UNIMARC vocabulary and constraints
UNIMARC Schema
• Under development at BN since early 2003...
• Schema language centred on the UNIMARC concepts
• Allows the definition of all UNIMARC vocabulary and constraints
• Enables version control
• Defines a URN mechanism for all available rules
  – Required for MDR (Metadata Registries)
• Currently used by BN
The UNIMARC Metadata Registry
• Purpose:
  – A way to publish the vocabulary...
• Requirement:
  – Compliant with ISO 11179
Considerations
• For exchange purposes, and from the perspective of UNIMARC, MarcXchange is a step in the right direction, but...
• ...in order to be recognized in the Web Architecture we need to be able to distinguish MARC formats (MARC21, UNIMARC, etc.) and sub-formats (Bibliographic, Authority, Holdings and Classification). This requires a declarative way to identify the specific schemas!!!
• ...for interoperability between UNIMARC and MARC21 (mappings, etc.), we need to express inheritance from a general MARC schema (a requirement to be considered also by a related initiative from the MARC21 community)
BTW, shouldn't we have after all a common MARC language schema, to be shared by UNIMARC and MARC21?
Web Architecture: http://www.w3.org/TR/webarch/#xml-namespaces