ESIP 2009 Summer Meeting, UC Santa Barbara, CA, July 7 – 10, 2009. Stanford Digital Repository: PREMIS & Geospatial Resources. Nancy J. Hoebelheinrich, InfoAnalytics, San Mateo, CA.


ESIP 2009 Summer Meeting, UC Santa Barbara, CA, July 7 – 10, 2009


Stanford Digital Repository

PREMIS & Geospatial Resources

Nancy J. Hoebelheinrich, InfoAnalytics, San Mateo, CA


To Be Discussed

- A Brief History of PREMIS
- An Overview of PREMIS data elements
- Uses for Geospatial Resources: Examples


A Brief History of PREMIS

- PREMIS (Preservation Metadata) came initially from the cultural heritage / digital preservation communities
- Built upon a previous initiative (2001-02)
  - Sponsored by two key library descriptive MD utilities (OCLC and RLG)
  - Preservation Metadata Framework working group
  - Issued a report outlining the types of information that should be associated with an archived digital object


A Brief History of PREMIS

- In 2003 a PREMIS working group formed
  - Comprised of practitioners building or working on preservation repositories, including national data centers in the UK, US, Netherlands, etc.
  - Focused upon implementable data elements
- Resulted in a two-pronged effort:
  - Implementation survey
  - Data dictionary of CORE preservation semantic units (= data elements)


A Brief History of PREMIS

PREMIS working group publications:

- “Implementing Preservation Repositories for Digital Materials: Current Practice and Emerging Trends in the Cultural Heritage Community”, December 2004
- “PREMIS Data Dictionary for Preservation Metadata, version 1.0”, May 2005


A Brief History of PREMIS

PREMIS Implementation

- PREMIS Editorial committee formed
- Maintained by Library of Congress
- “PREMIS Data Dictionary for Preservation Metadata, version 2.0”, March 2008
- Who uses? See implementation registry
- PREMIS Implementors Group (PIG) listserv for practitioners


PREMIS Data Model for an “intellectual entity”

- OBJECT: Discrete unit of information in digital form
- RIGHTS: Rights or permissions info associated with Object or Agent
- EVENTS: Important lifecycle events
- AGENTS: Parties to Events and/or Rights
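The four entities can be sketched as plain data types. This is a minimal Python illustration, not the PREMIS schema: the class names, field names and sample identifiers are all assumptions made for the example, and entities are linked simply by identifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A party to Events and/or Rights."""
    agent_id: str
    name: str


@dataclass
class Event:
    """An important lifecycle event, with the Agents involved."""
    event_id: str
    event_type: str
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class Rights:
    """Rights or permissions info associated with an Object or Agent."""
    rights_id: str
    act: str
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class PremisObject:
    """A discrete unit of information in digital form."""
    object_id: str
    event_ids: List[str] = field(default_factory=list)
    rights_ids: List[str] = field(default_factory=list)


# Hypothetical instances, linked by identifier
sdr = Agent("agent-1", "SDR ingest code")
ingest = Event("event-1", "ingestion", agent_ids=[sdr.agent_id])
tiff = PremisObject("0372001.tif", event_ids=[ingest.event_id])
```

Linking by identifier rather than by nested objects keeps each entity independently storable, which is one plausible reading of the model.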


PREMIS Data Model


More about PREMIS Object

- Is an abstraction, meant to cluster semantic units and clarify relationships
- Has 3 subtypes:
  - File: the usual suspect
  - Bitstream: contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes
  - Representation: set of files, including structural metadata, needed for a complete and reasonable rendition of an Intellectual Entity


Assumptions underlying PREMIS

- Not about “descriptive” metadata (used for search & discovery)
- Not about “technical” metadata (usually about the format(s) of the component files or bitstreams)
- These areas are to be covered by domain-specific metadata, e.g., FGDC or ISO profiles
- Mind the Gap!


Simple Example of use of PREMIS Object Data Elements

- Applied at file level
- Automatic insertion by Ingest code to retain important provenance info for each file before moving into the preservation repository
  - Original file name from data provider
  - Original checksum
  - Original file size
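The three provenance values listed above could be captured at ingest roughly as follows. This is a minimal sketch, not the SDR Ingest code: the function name `capture_provenance` is hypothetical, and MD5 is assumed as the digest algorithm.

```python
import hashlib
import os


def capture_provenance(path, chunk_size=1 << 20):
    """Return original name, MD5 checksum and size for one file."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large image files are not loaded whole
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return {
        "originalName": os.path.basename(path),
        "messageDigestAlgorithm": "MD5",
        "messageDigest": md5.hexdigest(),
        "size": os.path.getsize(path),
    }
```

Calling `capture_provenance` on each delivered file before it is moved or renamed yields a dictionary whose keys mirror the PREMIS semantic unit names.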


PREMIS Object Excerpt (v1.1)

| Element | Subelement or Attribute | Value |
|---|---|---|
| objectIdentifier | objectIdentifierType | filename |
| | objectIdentifierValue | 0372001.tif |
| preservationLevel | | bit preservation |
| objectCategory | | file |
| objectCharacteristics | compositionLevel | 0 |
| | fixity / messageDigestAlgorithm | MD5 |
| | fixity / messageDigest | 0c77e67bebe3f3384ec8bf4736648e41 |
| | size | 315827432 |
| | format / formatDesignation / formatName | TIFF |
| originalName | | 0372001.tif |
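The excerpt above could be serialized along these lines. This is only a sketch: it uses no XML namespace, the nesting is inferred from the excerpt, the helper `add` is hypothetical, and the output has not been validated against any PREMIS schema.

```python
import xml.etree.ElementTree as ET


def add(parent, tag, text=None):
    """Append a child element, optionally with text, and return it."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = text
    return child


obj = ET.Element("object")
ident = add(obj, "objectIdentifier")
add(ident, "objectIdentifierType", "filename")
add(ident, "objectIdentifierValue", "0372001.tif")
add(obj, "preservationLevel", "bit preservation")
add(obj, "objectCategory", "file")
chars = add(obj, "objectCharacteristics")
add(chars, "compositionLevel", "0")
fixity = add(chars, "fixity")
add(fixity, "messageDigestAlgorithm", "MD5")
add(fixity, "messageDigest", "0c77e67bebe3f3384ec8bf4736648e41")
add(chars, "size", "315827432")
fmt = add(add(chars, "format"), "formatDesignation")
add(fmt, "formatName", "TIFF")
add(obj, "originalName", "0372001.tif")

# Serialize to a string for inspection or embedding in a wrapper record
xml_text = ET.tostring(obj, encoding="unicode")
```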


More about PREMIS Object relationships

- Defined as associations b/w two or more:
  - Object entities, or
  - Entities of different types, e.g., an Object & an Agent
- Recorded for long term preservation purposes
- Typical relationship types = structural (component of representation), derivative (format varieties), dependent (required schema or database structure)
- Could be expressed using other schemas for packaging the resource such as METS or XFDU or MPEG DIDL
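As an illustration of the three typical relationship types, a relationship can be recorded as a typed link between two identifiers. The names `RelationshipType`, `Relationship` and `related`, and the sample identifiers, are assumptions for this sketch only.

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(Enum):
    STRUCTURAL = "structural"   # component of a representation
    DERIVATIVE = "derivative"   # format varieties
    DEPENDENT = "dependent"     # required schema or database structure


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    kind: RelationshipType


def related(relationships, object_id, kind):
    """Return ids of targets linked from object_id by the given type."""
    return [r.target_id for r in relationships
            if r.source_id == object_id and r.kind is kind]


# Hypothetical identifiers
links = [
    Relationship("map-0372", "0372001.tif", RelationshipType.STRUCTURAL),
    Relationship("0372001.tif", "0372001.jp2", RelationshipType.DERIVATIVE),
]
```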


Use of PREMIS Rights data elements

- Applied at representation level
- Reference to donor’s Deposit Agreement (using METS)
- Key info from the ingested Deposit Agreement for immediate playback


PREMIS Rights Excerpt (v1.1)

| Element | Subelement or Attribute | Value |
|---|---|---|
| permissionStatement | xmlID | SDR Access Phase 1 |
| permissionStatementIdentifier | permissionStatementIdentifierType | Repository Permissions |
| | permissionStatementIdentifierValue | All digital objects falling under SDR Preservation Agreement_BitPreservation, v6.0, David Rumsey Map Collection |
| grantingAgreement | grantingAgreementIdentification | library_stanford_edu_fcab81ee605011db96c4339be |
| | grantingAgreementInformation / contractAbstract | Version 6.0 of Agreement for Bit Preservation of Rumsey Collection |
| permissionGranted | act | Public Access |
| | termOfGrant / startDate | 2006-11-01 |
| | termOfGrant / endDate | 2011-11-01 |
| | permissionNote / restrictionDefinition | restriction="Stanford only": Stanford community only as defined in agreement. |
| | | restriction="SDR_GROUP_xxx": Named group controlled by SUNET group as defined in agreement. |
| | | restriction="No access": No access to content allowed. |
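One use of recording the term of grant is to check whether a permission is in force on a given day. The sketch below copies the dates from the excerpt; the dictionary layout, the function name `grant_in_force` and the choice to treat both endpoints as inclusive are assumptions.

```python
from datetime import date

# Values copied from the excerpt; the dictionary layout is an assumption
permission = {
    "act": "Public Access",
    "startDate": date(2006, 11, 1),
    "endDate": date(2011, 11, 1),
}


def grant_in_force(perm, on_day):
    """True if on_day falls within the term of grant (endpoints inclusive)."""
    return perm["startDate"] <= on_day <= perm["endDate"]
```

For example, `grant_in_force(permission, date(2009, 7, 7))` is true, while a date in 2012 falls outside the term.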


Use of PREMIS Event for simple event

- Event 1: Transform of descriptive MD from MS Access db => XML => MODS
- Applied at representation level
- Why this event?
  - In case of questions from outside data provider
  - Retain singular scripts & transform mechanisms
  - Test practicability of recording such events in production environment


PREMIS Event Excerpt (v1.1)

| Element | Subelement or Attribute | Value |
|---|---|---|
| eventIdentifier | eventIdentifierType | MD_Transformation_Process |
| | eventIdentifierValue | Rumsey-MODS 3.2 for SDR |
| eventType | | normalization |
| eventDateTime | | 2006-12-01T02:48:22 |
| eventDetail | | Steps of process transforming data provider's descriptive metadata to MODS 3.2 records as required for ingestion into SDR. |
| eventOutcomeInformation | eventOutcomeDetail / SDR_Rumsey_Transformation / SDR_RumseyTransformationOutput | The five steps listed after this table |

SDR_RumseyTransformationOutput:

1. The Rumsey Access database, as delivered by Luna Insight, was converted to a single XML document using the MS Access Export function. Both the MS Access database and the XML file are included.
2. A Perl script was used to break the monolithic XML document representing the MS Access database into many XML documents, each representing a single image in the Rumsey collection. The single XML document was broken into separate documents at each occurrence of the "Object" tag. The Perl script is included in text format.
3. An XSLT was used to make MODS documents for all the Rumsey images. The XSLT file is included.
4. SDR conversion code was written to pull geographic coordinates and scale metadata out of SUL MARC records from the Unicorn catalog and insert them into the MODS records when available.
5. SDR conversion code was written to insert the composite MODS records into the METS record for each Rumsey digital object.
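The splitting step, breaking one monolithic export into a document per "Object" element, can be sketched in Python. The original used a Perl script that is not shown here, so this is a re-sketch of the idea only; the root element `dataroot`, the child element `Title` and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_objects(monolithic_xml):
    """Return one XML string per <Object> element in the export."""
    root = ET.fromstring(monolithic_xml)
    # strip() drops the whitespace that followed each element in the source
    return [ET.tostring(obj, encoding="unicode").strip()
            for obj in root.iter("Object")]


# Hypothetical export with two image records
sample = """<dataroot>
  <Object><Title>Map A</Title></Object>
  <Object><Title>Map B</Title></Object>
</dataroot>"""

documents = split_objects(sample)
```

Each string in `documents` is a standalone XML document that a later XSLT step could transform on its own.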


Another example: GIS Dataset: Street network of given metropolitan area

Dataset 1: official street centerline file used by emergency services to locate street addresses

Dataset 2: aspects of the road network including topography, angles & geometry of the road network used for a tourist map

Event to be documented: Merge c:\temp\states1;c:\temp\states2; c:\temp\USA
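A merge like this could be captured as an event record that names both its sources and its output. The sketch below is illustrative only: the function name, the `eventType` label, the `linkingObjectIdentifiers` key and the sample timestamp are assumptions, not values from a PREMIS vocabulary or from the repository.

```python
from datetime import datetime, timezone


def make_merge_event(sources, output, when=None):
    """Build a dictionary describing one merge event."""
    when = when or datetime.now(timezone.utc)
    return {
        "eventType": "merge",  # hypothetical label, not a controlled term
        "eventDateTime": when.isoformat(timespec="seconds"),
        "eventDetail": "Merge " + ";".join(sources) + " " + output,
        # Record every dataset touched so the lineage can be traced later
        "linkingObjectIdentifiers": list(sources) + [output],
    }


event = make_merge_event(
    [r"c:\temp\states1", r"c:\temp\states2"],
    r"c:\temp\USA",
    when=datetime(2009, 7, 7, 12, 0, 0),  # hypothetical timestamp
)
```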


Use of PREMIS Event Data Elements

- Want to describe full process of data creation
  - Includes “merge” and data sources
- Advantage of PREMIS: can describe events once in repository
- Why this event?
  - Important to describe processes during different phases of lifecycle, even prior to ingestion


Use of PREMIS Agent Data Elements

- For data management within the repository
  - Audit trail for descriptive MD
  - Version of Ingest code?
- Data provider who created / altered the resource or the metadata, e.g., USGS which added FGDC MD to HRO from Monterey Bay Water Resource


PREMIS & Geospatial data -- Comments based on experiences

Works well when:

- Domain specific MD exists, e.g., FGDC for descriptive and technical MD
- There are levels of the resource with MD to be associated, e.g., at representation & file(s) level
- Need to document various points in the lifecycle of the data


PREMIS & Geospatial data -- Comments based on experiences

In earlier versions of PREMIS unclear how to document:

- Context
- Environment, including at time of creation
- “Significant properties”
- Existence of geospatial format registries


PREMIS v 2.0 more flexible

- Still XML binding
- Allows for containers
- Allows hierarchical relationships
- Extensible by use of new <premis:extension> element to insert other elements, XML fragments, e.g., technical MD, provenance metadata, etc.
- Board considering the inclusion of mechanism used by packaging schemas to “wrap” or “reference” other metadata


PREMIS & Complex Geospatial Data

- For more detail, see “An Investigation into Archiving Geospatial data Formats”, prepared for NGDA Project, funded by NDIIPP (http://www.ngda.org/research.php)
  - Formats examined
  - Approaches of FGDC, PREMIS, and Center for International Earth Science Information Network (CIESIN)’s Geospatial Electronic Record (GER) model on basis of:
    - Environment / computer platform
    - Semantic underpinnings
    - Domain specific terminology
    - Provenance
    - Data quality
    - Appropriate use


Examples of Geospatial “Context”

- Placing dataset in Time & Space
- Semantic underpinnings, e.g.,
  - Abstract
  - Description of purpose / research methodology
  - Intended use of data to avoid misinterpretation or misuse
- Where to put?
  - FGDC has place
  - PREMIS would not necessarily consider this as “preservation” metadata, but rather “descriptive” or technical MD; however, see v 2.0


Examples of “Environment” and/or “Significant properties” for geospatial data

- HW info pertinent at time of data creation
- SW info pertinent at time of data creation (?)
- Lineage or “provenance” data, e.g., to communicate processing steps used to create scientific data product
- Events, parameters & source data which influenced or impacted the creation of the data set prior to its ingestion into the archive, in order to fully understand the data that you’re getting


“Environment” & “Significant properties”, continued…

- Data Quality: describing completeness, logical consistency, attribute accuracy
- Data Trustworthiness: data creator / provider reliable? = “authentic”
- Data Provenance: processes & sources for dataset = “understandable & reliable”
- Understanding of the specific needs of the “designated community”


Questions? / comments?

Nancy J. Hoebelheinrich
