archives hub ead 2010_extended

Introduction to EAD (extended version) Lisa Jeskins and Bethan Ruddock Archives Hub Mimas

Upload: lisa-jeskins

Post on 27-May-2015


DESCRIPTION

Extended version of Archives Hub presentation

TRANSCRIPT

Page 1: Archives hub ead 2010_extended

Introduction to EAD (extended version)

Lisa Jeskins and Bethan Ruddock, Archives Hub, Mimas

Page 2: Archives hub ead 2010_extended

By the end of today’s session we will have given you an introduction to:

• what interoperability means
• what XML is, what it does and why it is important
• EAD structure and syntax
• EAD and hierarchies
• UK Archives Discovery Network (UKAD)

Objectives

Page 3: Archives hub ead 2010_extended

Interoperability

Page 4: Archives hub ead 2010_extended

the ability of two or more systems or components to exchange information and to use the information that has been exchanged

(IEEE Standard Computer Dictionary)

What is Interoperability?

Page 5: Archives hub ead 2010_extended

the ability to exchange/share data

integration of information resources presented in different formats

within a domain or across domains

advantages of cross-searching

XML facilitates interoperability

About Interoperability

Page 6: Archives hub ead 2010_extended

Data exchange standards such as:

◦Z39.50

◦SRU

Types of interoperability

Page 7: Archives hub ead 2010_extended

user can easily search across and retrieve resources from a wealth of systems

moving beyond individual websites for individual resources (silo approach)

End result…

Page 8: Archives hub ead 2010_extended

http://www.ukoln.ac.uk/interop-focus/

◦to explore, publicise and mobilise the benefits and practice of effective interoperability across diverse information sectors

Interoperability Focus

Page 9: Archives hub ead 2010_extended

An Introduction to XML

Page 10: Archives hub ead 2010_extended

Extensible Markup Language

XML is a grammatical system for creating languages:
◦ a meta-language

Use XML to design your own markup language, consisting of meaningful tags that describe the data they contain

Create a language for describing…anything

What is XML?

Page 11: Archives hub ead 2010_extended

XML does not do anything itself. It is pure information wrapped in XML tags

You must use other means to send, receive or display the data

Something to remember about XML

[Diagram: XML is used by XML technologies to create a detailed description to view in a browser, a summary entry to view in a browser, and a PDF for print]

Page 12: Archives hub ead 2010_extended

XML is not about content, though there might be certain restrictions on content

XML is essentially about structure

Creating a consistent structure via XML tagging enables content to be easily identified (by machines) and used flexibly

XML provides structure

Page 13: Archives hub ead 2010_extended

XML: elements

<title> Alice in Wonderland </title>

*XML allows you to define your tags*

<book>Alice in Wonderland</book>

<filmtitle>Alice in Wonderland</filmtitle>

<tag> content </tag>

Page 14: Archives hub ead 2010_extended

Attributes are simple name/value pairs associated with an element

<tag attribute_name="attribute_value">content</tag>

<language>English</language>

<language langcode="eng">English</language>

<date normal="2004">20 Sept 2004</date>

XML attributes
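
As a minimal sketch of how a program reads the element content and the attribute value shown on this slide, the following uses Python's standard xml.etree.ElementTree module. The choice of Python is an assumption made for illustration; the slides do not prescribe a tool.

```python
import xml.etree.ElementTree as ET

# The <date> element from the slide, with straight quotes around the attribute value.
date = ET.fromstring('<date normal="2004">20 Sept 2004</date>')

display_text = date.text         # the content a person reads
normalised = date.get("normal")  # the attribute value a machine can sort or search on
missing = date.get("langcode")   # an attribute that is not present comes back as None

print(display_text, "|", normalised, "|", missing)
```

The element content stays free text for display, while the attribute carries a normalised form of the same information.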

Page 15: Archives hub ead 2010_extended

XML Syntax

<tag attribute_name="attribute_value">content</tag>

<tree>hornbeam</tree>

<tree type="deciduous">hornbeam</tree>

<date normal="2004">20 May 2004</date>

<date>20 May 2004</date>

This is an XML element

Page 16: Archives hub ead 2010_extended

<trees>
  <tree type="deciduous">
    <species>oak</species>
    <fruit>acorn</fruit>
  </tree>
  <tree type="coniferous">
    <species>pine</species>
    <fruit>pine cone</fruit>
  </tree>
</trees>

Nested elements
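
A short sketch of how nesting is seen by software: each <tree> is a child of <trees>, and <species> and <fruit> are children of their <tree>. It assumes Python's standard xml.etree.ElementTree module, which is not part of the original presentation.

```python
import xml.etree.ElementTree as ET

TREES_XML = """
<trees>
  <tree type="deciduous">
    <species>oak</species>
    <fruit>acorn</fruit>
  </tree>
  <tree type="coniferous">
    <species>pine</species>
    <fruit>pine cone</fruit>
  </tree>
</trees>
"""

root = ET.fromstring(TREES_XML)

# Walk the direct <tree> children and read each one's attribute and child elements.
rows = [
    (tree.get("type"), tree.findtext("species"), tree.findtext("fruit"))
    for tree in root.findall("tree")
]

for tree_type, species, fruit in rows:
    print(f"{species}: {tree_type}, fruit = {fruit}")
```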

Page 17: Archives hub ead 2010_extended

<catalog>
  <cd>
    <title>OK Computer</title>
    <artist type="band">Radiohead</artist>
    <genre>pop</genre>
    <year>1997</year>
  </cd>
  <cd>
    <title>Stanley Road</title>
    <artist type="solo">Paul Weller</artist>
    <genre>pop</genre>
    <year>1995</year>
  </cd>
</catalog>

XML example

<title>Stanley Road</title><artist>Paul Weller</artist><type>solo</type><genre>pop</genre><year>1995</year>

Page 18: Archives hub ead 2010_extended

Alice in Wonderland
Lewis Carroll
1 volume
hardback

Content

Page 19: Archives hub ead 2010_extended

Title Alice in Wonderland

Author Lewis Carroll

Extent 1 volume

Format hardback

Content in a database

Page 20: Archives hub ead 2010_extended

<books>
  <title>Alice in Wonderland</title>
  <author>Lewis Carroll</author>
  <extent>1 volume</extent>
  <format>hardback</format>
</books>

XML: Structure

Page 21: Archives hub ead 2010_extended

a root element is required

<catalog>
…all your tags and content…
</catalog>

closing tags are required

case matters

XML must be well-formed

Page 22: Archives hub ead 2010_extended

elements must be properly nested

Correct: <physdesc><extent>10 boxes</extent></physdesc>

Incorrect: <physdesc><extent>10 boxes</physdesc></extent>

XML must be well-formed (2)

Page 23: Archives hub ead 2010_extended

attribute values must be enclosed in quotation marks, e.g. langcode="fre"

element names must obey some basic rules
◦ e.g. cannot start with numbers or punctuation characters, cannot contain spaces
◦ e.g. <cd name> or <?name> would be incorrect

XML must be well-formed (3)
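
The well-formedness rules on the last three slides can be checked mechanically, because any XML parser rejects a document that breaks them. The sketch below assumes Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the test strings other than the <physdesc> pair are hypothetical, written only to exercise each rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One sample per rule from the slides; only the first is well-formed.
checks = {
    "properly nested": "<physdesc><extent>10 boxes</extent></physdesc>",
    "badly nested": "<physdesc><extent>10 boxes</physdesc></extent>",
    "missing closing tag": "<catalog><cd>OK Computer</catalog>",
    "case mismatch": "<Title>Alice in Wonderland</title>",
    "unquoted attribute": "<language langcode=fre>French</language>",
    "space in element name": "<cd name>OK Computer</cd name>",
}

results = {label: is_well_formed(text) for label, text in checks.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Note that this tests well-formedness only. It says nothing about whether the document is valid against a DTD or Schema.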

Page 24: Archives hub ead 2010_extended

Marking up a recipe

Look at the following recipe for Chocolate Brownies – How would you use XML to mark this up?

(I’m reliably informed the recipe works!)

Page 25: Archives hub ead 2010_extended

375g butter
375g dark chocolate
1 tablespoon vanilla extract
6 eggs
500g sugar
225g plain flour

Preheat the oven to 180°C, 350°F or gas mark 4. Grease a swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm.

Whisk the eggs and sugar into the mixture. Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for 20 to 30 minutes until the brownie is cooked around the edges, but still soft in the middle.

Cool and cut into squares. Makes 48 brownies

Chocolate Brownies

Page 26: Archives hub ead 2010_extended

<recipe><title>Chocolate Brownies</title>

<ingredients><item>375g butter</item><item>375g dark chocolate</item><item>1 tablespoon vanilla extract</item><item>6 eggs</item><item>500g sugar</item><item>225g plain flour</item></ingredients>

<method><p>Preheat the oven to <temp>180°C, 350°F or gas mark 4</temp>. Grease a swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm. Whisk the eggs and sugar into the mixture.</p>

<p>Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for <bakingtime>20 to 30 minutes</bakingtime> until the brownie is cooked around the edges, but still soft in the middle.</p>

<p>Cool and cut into squares.</p></method><serving>Makes 48 brownies</serving></recipe>

Possible XML markup for recipe
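
Once the recipe is marked up, a program can pull out just the parts it needs. The following is a sketch only, using an abridged copy of the markup above and Python's standard xml.etree.ElementTree module (an assumption; the slides do not name a tool).

```python
import xml.etree.ElementTree as ET

# Abridged copy of the recipe markup from the slide.
RECIPE_XML = """
<recipe>
  <title>Chocolate Brownies</title>
  <ingredients>
    <item>375g butter</item>
    <item>375g dark chocolate</item>
    <item>6 eggs</item>
  </ingredients>
  <method>
    <p>Preheat the oven to <temp>180°C, 350°F or gas mark 4</temp>.</p>
    <p>Bake for <bakingtime>20 to 30 minutes</bakingtime>.</p>
  </method>
  <serving>Makes 48 brownies</serving>
</recipe>
"""

recipe = ET.fromstring(RECIPE_XML)

title = recipe.findtext("title")
ingredients = [item.text for item in recipe.iter("item")]

# <temp> and <bakingtime> sit inside running text, but the tags still make them findable.
oven_temp = recipe.findtext(".//temp")
baking_time = recipe.findtext(".//bakingtime")

print(title, "needs", len(ingredients), "ingredients; bake for", baking_time)
```

The same file could feed a shopping list, a timer or a web page, which is the point of tagging by meaning rather than by appearance.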

Page 27: Archives hub ead 2010_extended

<ingredient>375 g butter</ingredient>

Or

<ingredient><item>375 g butter</item>

</ingredient>

Or

<ingredient><type>butter</type><quantity>375 g</quantity>

</ingredient>

Exchanging recipes..?

Page 28: Archives hub ead 2010_extended

http://www.archiveshub.ac.uk/temp/recipe.xml

Displaying the recipe online

Page 29: Archives hub ead 2010_extended

Valid XML: rules specify elements and attributes used and how used

Valid XML provides consistency and facilitates the exchange of data

Valid XML is important for displaying, processing and exchanging XML in a wider environment

Valid XML

Page 30: Archives hub ead 2010_extended

A Document Type Definition or Schema defines the building blocks of an XML document

It specifies elements and attributes and defines how they can be used

People can agree to use a common DTD/Schema for interchanging data

Document Type Definitions

Page 31: Archives hub ead 2010_extended

<?xml version="1.0" encoding="UTF-16"?>
<!ELEMENT recipe (title, intro?, ingredients+, method, serving*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT intro (#PCDATA)>
<!ELEMENT ingredients (item+)>
<!ELEMENT item (#PCDATA)>
<!ELEMENT method (p+)>
<!ELEMENT p (#PCDATA | temp | bakingtime)*>
<!ELEMENT temp (#PCDATA)>
<!ELEMENT bakingtime (#PCDATA)>
<!ELEMENT serving (#PCDATA)>

Recipe DTD

Page 32: Archives hub ead 2010_extended

Schemas perform the same task as DTDs

Schemas use XML syntax

Schemas support complex data types

Easier to describe allowable content

One XML document can point to more than one schema

Schemas

Page 33: Archives hub ead 2010_extended

<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Rachel</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget the concert!</body>
</note>

A simple XML document

Page 34: Archives hub ead 2010_extended

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Example of a simple Schema

Page 35: Archives hub ead 2010_extended

What about display?

[Diagram: XML file + DTD or Schema = valid XML, shown displayed both as a "Blue Elephant Papers" description and as a "Blue Elephant Papers" entry in a Browse List]

Page 36: Archives hub ead 2010_extended
Page 37: Archives hub ead 2010_extended
Page 38: Archives hub ead 2010_extended

Use XML technologies – for displaying, retrieving, transforming, manipulating

XSLT – Extensible Stylesheet Language for Transformations

Many technologies available to manipulate XML documents

Displaying XML

Page 39: Archives hub ead 2010_extended

transformation involves the reading in of an XML file and an XSLT file to a processor, which can then generate some output – typically HTML

Transformation of XML

[Diagram: XML and XSLT are read into a processor, which generates HTML output]
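
The idea of a transformation (XML in, HTML out) can be sketched without an XSLT processor. The Python below is a stand-in for illustration only, not XSLT itself: the function to_html plays the role of the stylesheet, and the <record> wrapper element is hypothetical. The title and date values are the ones used in the HTML and XML comparison later in the presentation.

```python
import xml.etree.ElementTree as ET
from html import escape


def to_html(xml_text: str) -> str:
    """Map meaningful XML tags onto HTML display tags (a stand-in for a stylesheet)."""
    record = ET.fromstring(xml_text)
    title = record.findtext("title", default="")
    date = record.findtext("date", default="")
    # <title> becomes a heading and <date> becomes bold text, purely for display.
    return f"<h1>{escape(title)}</h1>\n<p><b>{escape(date)}</b></p>"


SOURCE = "<record><title>Papers of Peter Rowe</title><date>21 May 2004</date></record>"
html_output = to_html(SOURCE)
print(html_output)
```

The XML source is left untouched, so a different set of rules could turn the same file into a summary entry or a print layout.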

Page 40: Archives hub ead 2010_extended

HTML is ONLY for display, typically in a Web browser

HTML tags do not describe the content

HTML cannot easily be extracted by machines for different purposes

XML tags can be specified by anyone; HTML tags are prescribed

HTML and XML (1)

Page 41: Archives hub ead 2010_extended

HTML and XML (2)

HTML: <h1> Papers of Peter Rowe </h1>
XML: <title> Papers of Peter Rowe </title>

HTML: <b> 21 May 2004 </b>
XML: <date> 21 May 2004 </date>

Page 42: Archives hub ead 2010_extended

International standard, supported by the W3C

It is open, licence free and platform neutral

It is human and machine readable

XML documents are text documents

Why use XML?

Page 43: Archives hub ead 2010_extended

XML does not determine the presentation of the data
◦ use stylesheets to present XML data
◦ with proprietary systems content is inextricably bound up with format

Hierarchical structure – good for archive descriptions!

More reasons to use XML...

Page 44: Archives hub ead 2010_extended

XML is the main basis for defining data exchange languages

Meaningful tags facilitate extraction – data can be manipulated as required

...and for data exchange

Page 45: Archives hub ead 2010_extended

All publicly funded bodies should use XML for data exchange (e-GIF)

XML has been widely adopted commercially as well as in the public sector

The Government mandates XML

Page 46: Archives hub ead 2010_extended

XML is:
◦ simple
◦ flexible
◦ great for data exchange

XML must be:
◦ well-formed
◦ valid

DTDs and Schemas:
◦ to create valid XML
◦ provide tags, attributes and rules

XML requires other XML technologies
◦ e.g. stylesheets can transform XML for display

Summary

Page 47: Archives hub ead 2010_extended

EAD: An introduction

Page 48: Archives hub ead 2010_extended

EAD = Encoded Archival Description

EAD is XML for finding aids

A data structure standard – not a content standard

A structure that allows finding aids to be indexed, searched, retrieved and navigated

Compatible with ISAD(G)

What is EAD?

Page 49: Archives hub ead 2010_extended

EAD is:

Flexible enough to deal with all types of finding aids: single or multi-level, long or short, lists or calendars etc.

Used to create new finding aids as well as converting old ones to standardised form

Used to share data between systems

What is EAD?

Page 50: Archives hub ead 2010_extended

EAD is maintained and developed by an international working group

Develops and publishes documentation and tools: tag library, guidelines, EAD Cookbook, websites

EAD Working Group - EADWG

Page 51: Archives hub ead 2010_extended

EAD structure

Page 52: Archives hub ead 2010_extended

<ead>

<eadheader></eadheader>

<archdesc><did></did>

</archdesc>

</ead>

Basic EAD file structure

Page 53: Archives hub ead 2010_extended

<ead>  EAD root element
  <eadheader>  EAD file information wrapper
  </eadheader>
  <archdesc>  Finding aid wrapper
    <did></did>  Core collection information wrapper
  </archdesc>
</ead>

Basic EAD file structure
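
The four wrapper elements can be generated programmatically. This is a sketch of the bare skeleton only, assuming Python's standard xml.etree.ElementTree module. A real EAD 2002 file needs further required elements and attributes before it will validate, so the output here is well-formed but not valid EAD.

```python
import xml.etree.ElementTree as ET

# Build the skeleton from the slide: <ead> holds <eadheader> and <archdesc>,
# and <archdesc> holds <did>.
ead = ET.Element("ead")
ET.SubElement(ead, "eadheader")
archdesc = ET.SubElement(ead, "archdesc")
ET.SubElement(archdesc, "did")

skeleton = ET.tostring(ead, encoding="unicode")
print(skeleton)

# Parsing the serialised text back confirms that it is well-formed.
reparsed = ET.fromstring(skeleton)
top_level = [child.tag for child in reparsed]
```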

Page 54: Archives hub ead 2010_extended

EAD beetle

[Diagram: the "EAD beetle", with parts labelled <eadheader>, <archdesc>, <did> and sub-fonds descriptions]

Page 55: Archives hub ead 2010_extended

<eadheader>: EAD file information
<eadid>: Identifier
<filedesc>, <titlestmt>, <titleproper>: Title
<profiledesc>: Creation
<revisiondesc>: Revision

<eadheader>

Page 56: Archives hub ead 2010_extended

Within <archdesc> there are elements for:

Description Presentation Hierarchy

Finding aid elements

Page 57: Archives hub ead 2010_extended

<archdesc>: Archival description
<did>: Descriptive information
<scopecontent>: Scope and Content
<bioghist>: Biographical/Admin. History
<arrangement>: Arrangement
<controlaccess>: Access points

Descriptive elements

Page 58: Archives hub ead 2010_extended

<did>: Descriptive information
<unitid>: Reference
<unittitle>: Title
<unitdate>: Covering dates
<origination>: Creator(s)
<repository>: Repository
<physdesc>: Physical description
  <extent>: Extent
  <genreform>: Form
  <physfacet>: Physical Facet
<physloc>: Location
<container>: Container type
<abstract>: Brief description
</did>

<did> elements

Page 59: Archives hub ead 2010_extended

<archdesc level="fonds">
  <did>
    <unitid>GB 0001 Foster</unitid>
    <unittitle>Papers of Dr Foster</unittitle>
    <unitdate normal="1820-1833">1820-1833</unitdate>
    <repository>University of Gloucestershire</repository>
    <physdesc>
      <extent>1 box</extent>
      <physfacet>Four folders of letters, 230 folios</physfacet>
    </physdesc>
    <langmaterial><language langcode="eng">English</language></langmaterial>
    <origination>Dr Foster</origination>
  </did>

Hub <did> EAD2002
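
Because every piece of the <did> is tagged, software can build a summary entry from it directly. The sketch below uses the Dr Foster example with a closing </archdesc> added and straight quotes throughout so that it parses on its own. Python's xml.etree.ElementTree and the summary dictionary keys are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ARCHDESC_XML = """
<archdesc level="fonds">
  <did>
    <unitid>GB 0001 Foster</unitid>
    <unittitle>Papers of Dr Foster</unittitle>
    <unitdate normal="1820-1833">1820-1833</unitdate>
    <repository>University of Gloucestershire</repository>
    <physdesc>
      <extent>1 box</extent>
      <physfacet>Four folders of letters, 230 folios</physfacet>
    </physdesc>
    <langmaterial><language langcode="eng">English</language></langmaterial>
    <origination>Dr Foster</origination>
  </did>
</archdesc>
"""

archdesc = ET.fromstring(ARCHDESC_XML)
did = archdesc.find("did")

# Pick out the core fields; attributes give the machine-readable forms.
summary = {
    "level": archdesc.get("level"),
    "reference": did.findtext("unitid"),
    "title": did.findtext("unittitle"),
    "dates": did.find("unitdate").get("normal"),
    "extent": did.findtext("physdesc/extent"),
    "language": did.find("langmaterial/language").get("langcode"),
}

for field, value in summary.items():
    print(f"{field}: {value}")
```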

Page 60: Archives hub ead 2010_extended

<acqinfo>: Acquisition information
<custodhist>: Custodial history
<appraisal>: Appraisal and selection
<processinfo>: Process Information
<accruals>: Accruals information
<altformavail>: Copies
<accessrestrict>: Access restrictions
<userestrict>: Use restrictions
<prefercite>: Citation information

Administrative information elements

Page 61: Archives hub ead 2010_extended

<bibliography>: Publication note
<fileplan>: Classification scheme
<otherfindaid>: Other finding aids
<relatedmaterial>: Related material
<separatedmaterial>: Separated material
<index>: Keywords

Additional information elements

Page 62: Archives hub ead 2010_extended

<controlaccess>: Controlled access headings
<name>: Names (general)
<corpname>: Corporate body name
<persname>: Personal name
<famname>: Family name
<geogname>: Place name
<occupation>: Occupations
<function>: Functions (administrative)
<genreform>: Genre and Form
<subject>: Subject

<controlaccess> elements

Page 63: Archives hub ead 2010_extended

<head>: Headings
<p>; <lb>: Layout
<emph>; <blockquote>: Italics and quotes
<list><item>; <chronlist><chronitem>: Lists
<ref>; <ptr>; <dao>: References, pointers and links to digital objects

Presentation elements

Page 64: Archives hub ead 2010_extended


Presentation elements

NB: EAD is NOT about the presentation of your finding aids, but about their syntax. Separate software will take care of the display of the information.

Page 65: Archives hub ead 2010_extended

ISAD(G) (v.2): EAD 2002

3.1.1 Reference code(s): <unitid>, countrycode and repositorycode attributes
3.1.2 Title: <unittitle>
3.1.3 Dates of creation: <unitdate>
3.1.4 Level of description: <archdesc> and <c> level attribute
3.1.5 Extent of the unit: <physdesc>, <extent>
3.2.1 Name of creator: <origination>
3.2.2 Administrative/Biographical history: <bioghist>
3.2.3 Custodial history: <custodhist>
3.2.4 Immediate source of acquisition: <acqinfo>
3.3.1 Scope and content: <scopecontent>
3.3.2 Appraisal, destruction and scheduling: <appraisal>

ISAD(G) to EAD

Page 66: Archives hub ead 2010_extended

3.3.3 Accruals: <accruals>
3.3.4 System of arrangement: <arrangement>
3.4.1 Access conditions: <accessrestrict>
3.4.2 Copyright/Reproduction: <userestrict>
3.4.3 Language of material: <langmaterial>
3.4.4 Physical characteristics: <phystech>
3.4.5 Finding aids: <otherfindaid>
3.5.1 Location of originals: <originalsloc>
3.5.2 Existence of copies: <altformavail>
3.5.3 Related units of description: <relatedmaterial> and <separatedmaterial>
3.5.4 Publication note: <bibliography>
3.6.1 Note: <odd>

ISAD(G) to EAD
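
The ISAD(G) to EAD mapping on these two slides is in effect a lookup table, and can be written down as one. The sketch below records it as a Python dictionary and derives the reverse lookup. The names ISAD_TO_EAD, EAD_TO_ISAD and isad_label, and the "archdesc/@level" notation for the level attribute, are hypothetical conventions chosen for this illustration.

```python
# ISAD(G) (v.2) element number -> (ISAD(G) name, EAD 2002 element(s)), as listed on the slides.
ISAD_TO_EAD = {
    "3.1.1": ("Reference code(s)", ["unitid"]),
    "3.1.2": ("Title", ["unittitle"]),
    "3.1.3": ("Dates of creation", ["unitdate"]),
    "3.1.4": ("Level of description", ["archdesc/@level", "c/@level"]),
    "3.1.5": ("Extent of the unit", ["physdesc", "extent"]),
    "3.2.1": ("Name of creator", ["origination"]),
    "3.2.2": ("Administrative/Biographical history", ["bioghist"]),
    "3.2.3": ("Custodial history", ["custodhist"]),
    "3.2.4": ("Immediate source of acquisition", ["acqinfo"]),
    "3.3.1": ("Scope and content", ["scopecontent"]),
    "3.3.2": ("Appraisal, destruction and scheduling", ["appraisal"]),
    "3.3.3": ("Accruals", ["accruals"]),
    "3.3.4": ("System of arrangement", ["arrangement"]),
    "3.4.1": ("Access conditions", ["accessrestrict"]),
    "3.4.2": ("Copyright/Reproduction", ["userestrict"]),
    "3.4.3": ("Language of material", ["langmaterial"]),
    "3.4.4": ("Physical characteristics", ["phystech"]),
    "3.4.5": ("Finding aids", ["otherfindaid"]),
    "3.5.1": ("Location of originals", ["originalsloc"]),
    "3.5.2": ("Existence of copies", ["altformavail"]),
    "3.5.3": ("Related units of description", ["relatedmaterial", "separatedmaterial"]),
    "3.5.4": ("Publication note", ["bibliography"]),
    "3.6.1": ("Note", ["odd"]),
}

# Reverse index: EAD element -> ISAD(G) number, useful for labelling parsed EAD.
EAD_TO_ISAD = {
    ead_name: number
    for number, (_, ead_names) in ISAD_TO_EAD.items()
    for ead_name in ead_names
}


def isad_label(ead_name: str) -> str:
    """Return the ISAD(G) number and name that an EAD element corresponds to."""
    number = EAD_TO_ISAD[ead_name]
    return f"{number} {ISAD_TO_EAD[number][0]}"


print(isad_label("bioghist"))
```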

Page 67: Archives hub ead 2010_extended

EAD version 1 DTD

EAD 2002 DTD

EAD 2002 Schema

Available from http://www.loc.gov/ead/

Human-readable version: EAD Tag Library (Society of American Archivists)

EAD DTD

Page 68: Archives hub ead 2010_extended

Library of Congress Official EAD site: http://www.loc.gov/ead/

Tag Library: http://www.loc.gov/ead/tglib/index.html

EAD Roundtable Help Pages: http://www.archivists.org/saagroups/ead/

EAD Documentation

Page 69: Archives hub ead 2010_extended

EAD and hierarchy

Page 70: Archives hub ead 2010_extended

ISAD(G) states that to be a conformant archival description a finding aid must:

Be hierarchical
◦ Description from the general to the specific
◦ Information relevant to the level of description
◦ Linking of descriptions (logical sequence)
◦ Non-repetition of information

Contain a minimum set of data elements

EAD and ISAD(G)

Page 71: Archives hub ead 2010_extended

Recommended elements for lower level descriptions:
◦ reference code
◦ title
◦ date(s)
◦ extent of the unit of description
◦ level of description

Lower level elements

Page 72: Archives hub ead 2010_extended

ISAD(G) levels: Fonds, Sub-fonds, Series, Sub-series, File, Item

EAD levels: <archdesc>, <dsc>, <c01>, <c02>, <c03>, <c04>, <c05>

EAD and Hierarchy

Page 73: Archives hub ead 2010_extended

<ead>
…
<archdesc>
  [collection level description here]
  <dsc>
    <c01>[series] description 1
      <c02>[file] description 1</c02>
      <c02>[file] description 2
        <c03>[item] 1</c03>
        <c03>[item] 2</c03>
      </c02>
    </c01>
    <c01>[series] description 2....
  </dsc>
</archdesc>
</ead>

Representing hierarchies

[Diagram: tree of c01, c02 and c03 components]
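
A program can recover the hierarchy simply by following the nesting of the component elements. The sketch below prints an indented outline; it assumes Python's standard library, and the unit titles and level attributes are hypothetical placeholders standing in for the bracketed descriptions on the slide.

```python
import re
import xml.etree.ElementTree as ET

EAD_XML = """
<ead>
  <archdesc level="fonds">
    <did><unittitle>Collection</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Series 1</unittitle></did>
        <c02 level="file"><did><unittitle>File 1</unittitle></did></c02>
        <c02 level="file">
          <did><unittitle>File 2</unittitle></did>
          <c03 level="item"><did><unittitle>Item 1</unittitle></did></c03>
          <c03 level="item"><did><unittitle>Item 2</unittitle></did></c03>
        </c02>
      </c01>
      <c01 level="series"><did><unittitle>Series 2</unittitle></did></c01>
    </dsc>
  </archdesc>
</ead>
"""

# Matches the unnumbered <c> as well as numbered components such as <c01>.
COMPONENT = re.compile(r"c(\d\d)?$")


def outline(element, depth=0):
    """Return one indented line per component, following the nesting."""
    lines = []
    for child in element:
        if COMPONENT.match(child.tag):
            title = child.findtext("did/unittitle", default="")
            lines.append("  " * depth + f"{child.tag} [{child.get('level')}] {title}")
            lines.extend(outline(child, depth + 1))
    return lines


dsc = ET.fromstring(EAD_XML).find("archdesc/dsc")
tree_lines = outline(dsc)
print("\n".join(tree_lines))
```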

Page 74: Archives hub ead 2010_extended

<c01 level="subfonds">
  <did>
    <unitid>GB 0324 MS 54</unitid>
    <unittitle>Correspondence files</unittitle>
    <unitdate>1920-1945</unitdate>
    <physdesc><extent>4 files</extent></physdesc>
  </did>
  <scopecontent>…</scopecontent>
  <c02 level="series">
    <did>…</did>
    <scopecontent>…</scopecontent>
  </c02>
</c01>

Nesting items

Page 75: Archives hub ead 2010_extended

EAD supports two ways of representing levels

<c> is used in A2A, <c0*> on the Hub

Slightly easier to use <c0*>, as the numbers give you more of an idea of the level you are working at

<c> or <c0*>?

Page 76: Archives hub ead 2010_extended

<dsc type="combined">
  <c level="series">
    <did>
      <unitid>Series 1</unitid>
      <unittitle>Correspondence</unittitle>
    </did>
    <scopecontent>[...]</scopecontent>
    <c level="subseries">
      <did>
        <unitid>Subseries 1.1</unitid>
        <unittitle>Outgoing Correspondence</unittitle>
      </did>
      <c level="file">
        <did>
          <unittitle>Abbinger-Aldrich</unittitle>
        </did>
      </c>
    </c>
  </c>
</dsc>

Hierarchy <c> tag
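
Since the two styles differ only in the element name, unnumbered <c> components can be converted to the numbered form by counting nesting depth. The sketch below is one possible approach, assuming Python's standard xml.etree.ElementTree module. The function name number_components is hypothetical, and the file title is written here with a hyphen, which is an assumption about the original slide.

```python
import xml.etree.ElementTree as ET

DSC_XML = """
<dsc type="combined">
  <c level="series">
    <did><unitid>Series 1</unitid><unittitle>Correspondence</unittitle></did>
    <c level="subseries">
      <did><unitid>Subseries 1.1</unitid><unittitle>Outgoing Correspondence</unittitle></did>
      <c level="file">
        <did><unittitle>Abbinger-Aldrich</unittitle></did>
      </c>
    </c>
  </c>
</dsc>
"""


def number_components(element, depth=1):
    """Rename nested <c> elements in place to <c01>, <c02>, ... according to depth."""
    for child in element:
        if child.tag == "c":
            child.tag = f"c{depth:02d}"
            number_components(child, depth + 1)


dsc = ET.fromstring(DSC_XML)
number_components(dsc)

# Collect the renamed components in document order, with their level attributes.
numbered = [(el.tag, el.get("level")) for el in dsc.iter() if el.tag.startswith("c0")]
print(numbered)
```

The level attribute is untouched, so the archival level (series, subseries, file) is preserved whichever naming style is used.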

Page 77: Archives hub ead 2010_extended

XML is a meta-language for creating mark-up languages

XML files require other technologies for display, processing, etc.

For archive finding aids EAD is the DTD/Schema to use

Summing-up

Page 78: Archives hub ead 2010_extended

It is XML, which is an international standard

It is a simple and effective way of structuring content and providing meaning

Machines can manipulate the content in all sorts of ways

It is a great format to store finding-aids

EAD is a good thing because…

Page 79: Archives hub ead 2010_extended

Cross-searching initiatives

Page 80: Archives hub ead 2010_extended

Effective cross-searching requires:

◦Interoperability

which requires

◦Common standards

Cross-searching

Page 81: Archives hub ead 2010_extended

UK Archives

Page 82: Archives hub ead 2010_extended

UKAD: http://www.ukad.org/

To promote the opening up of data and to offer capacity for such a cross-searching capability across the UK archive networks and online repository catalogues

To lead and support resource discovery through the promotion of relevant national and international standards

To support the development and use of name authorities

UK Archives Discovery Network

Page 83: Archives hub ead 2010_extended

To advocate for the reduction of cataloguing backlogs and the retro-conversion of hard-copy catalogues

To promote access to digitized and digital archives via cross-searching resource discovery systems.

To work with other domains and potential funders to promote archive discovery

UKAD

Page 84: Archives hub ead 2010_extended

Fairly loose structure

Meetings about twice a year

Forum for discussion, sharing, connecting and collaborating

Creating a framework for activities (matrix)
◦ International/national/regional
◦ Meeting UKAD objectives, e.g. open up data; standards-based resource discovery; retro-conversion

UKAD activities

Page 85: Archives hub ead 2010_extended

Not many UK archives currently using EAD as a storage format

EAD will increasingly be used as an export format from proprietary database systems like CALM, for use in XML-based gateways such as Aim25 and the Archives Hub

New software becoming available all the time, which makes it easier to create, search and display XML – much of this is open source and often free

EAD in the real world

Page 86: Archives hub ead 2010_extended

Differences in how EAD is used

Encourages interoperability but still requires work to ensure seamless cross-searching

EAD is flexible and includes a large number of tags which has advantages and disadvantages

EAD in the Hub and Aim25

Page 87: Archives hub ead 2010_extended

XML is an international standard for sharing information

EAD is the XML language for archival finding aids

EAD is not a content standard

Use ISAD(G) for content guidelines and thesauri or authority files for index terms

Summing-up

Page 88: Archives hub ead 2010_extended

You have used the Archives Hub’s EAD editor to create EAD records

XML editors, such as XMetaL or XMLSpy, can provide help with validating and with selecting tags and attributes

EAD will become increasingly important

Summing-up

Page 89: Archives hub ead 2010_extended

Any Questions?