PAWN: A Policy-Driven Software Environment for Implementing Producer-Archive Interactions in Support of Long Term Digital Preservation

Mike Smorul, Mike McGann, Joseph JaJa
Institute for Advanced Computer Science Studies
University of Maryland, College Park

Archiving 2007, May 23 2007

Sponsored by National Archives and Records Administration, Library of Congress and NSF

Problems Facing Ingestion

• Ensure the integrity of data during ingestion
• Each producer-archive interaction is unique
• The final destination for items in an archive is unique
• Differing roles between producer and archive
• Hostile producers

What is PAWN?

• Software that provides an ingestion framework

• Distributed and secure ingestion of digital objects into an archive.

• Handles the whole process
  – From package assembly
  – To archival storage

• Simple, customizable interface for end-users

• Flexible interface for archive publication

Package Workflow

1. Create the producer-archive agreement.
2. Create a client package template.
3. Create a package based on the template.
4. Once approved, packages can be archived.
5. Rejected packages can be held until rectified, or deleted for resubmission.

[Figure: package workflow diagram]

Producer Agreement (example):
· Administrative: Strategic and Performance Plans; Appointment and Promotion; Policies and Committees; Alumni Affairs
· Financial: Contracts and Grants; Payroll; Donations
· Publication Reports: Technical Reports; Presentations; Posters; Outreach

Template (example): Research Results, notes "Published results and conference presentations"; contents: Presentations, Technical Reports

Other diagram labels: Package Builder, Review, Create Template, Create Package, Audit Package, Activity Log, Package Lifecycle, Archive Gateway, Archive
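Steps 3 to 5 describe a small package lifecycle. The Java sketch below models it as a state machine under stated assumptions: the state names, the `PackageLifecycle` class and the rule that a rectified package returns to review are hypothetical illustrations, not PAWN's actual API.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical package states, named after workflow steps 3-5 (not PAWN's own names).
enum PackageState { PENDING_REVIEW, APPROVED, REJECTED, HELD, ARCHIVED, DELETED }

// Minimal lifecycle sketch: a package may only move along the transitions listed here.
final class PackageLifecycle {
    private static final Map<PackageState, Set<PackageState>> ALLOWED = Map.of(
        PackageState.PENDING_REVIEW, EnumSet.of(PackageState.APPROVED, PackageState.REJECTED),
        // Step 4: only approved packages can be archived.
        PackageState.APPROVED, EnumSet.of(PackageState.ARCHIVED),
        // Step 5: a rejected package is held until rectified, or deleted for resubmission.
        PackageState.REJECTED, EnumSet.of(PackageState.HELD, PackageState.DELETED),
        // Assumption: a rectified (held) package goes back through review.
        PackageState.HELD, EnumSet.of(PackageState.PENDING_REVIEW, PackageState.DELETED),
        PackageState.ARCHIVED, EnumSet.noneOf(PackageState.class),
        PackageState.DELETED, EnumSet.noneOf(PackageState.class));

    private PackageState state = PackageState.PENDING_REVIEW;

    PackageState state() { return state; }

    // Applies a transition, refusing any move the workflow does not permit.
    void moveTo(PackageState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }
}
```

With the transitions kept in one table, a rejected package cannot reach the archive without passing review again, and a domain-specific workflow could be expressed by editing the table rather than the code.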

Expanding a Simple Workflow

• Support for multiple workflows
  – Grouped into logical domains
• Definable roles per workflow
• Pluggable components for assembly and archival publishing
• Distributed components
  – Web-service-based components

Domain Organization

• Producers are organized into domains; each domain contains a transfer agreement negotiated with the archive.

• Each domain contains a hierarchical organization of data grouped into record sets/templates (convenient groupings from the transfer agreement).

• Each domain contains its own users.

• An end user operates within an assigned set of record sets.
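A minimal Java sketch of this domain organization, assuming record sets form a tree and each user is granted specific record sets by name. The classes `Domain`, `RecordSet` and `User`, the `accessibleTo` method and the rule that a grant covers a record set's whole subtree are hypothetical, not PAWN's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node in a domain's record-set hierarchy, e.g. "Financial" > "Payroll".
final class RecordSet {
    final String name;
    final List<RecordSet> children = new ArrayList<>();

    RecordSet(String name) { this.name = name; }

    // Adds and returns a child record set.
    RecordSet add(String childName) {
        RecordSet child = new RecordSet(childName);
        children.add(child);
        return child;
    }
}

// A domain user and the names of the record sets granted to them (hypothetical).
record User(String name, Set<String> grantedRecordSets) {}

// A domain: its transfer agreement, its record-set hierarchy and its own users.
final class Domain {
    final String transferAgreement;
    final RecordSet root;
    final List<User> users = new ArrayList<>();

    Domain(String transferAgreement, RecordSet root) {
        this.transferAgreement = transferAgreement;
        this.root = root;
    }

    // Record sets a user may operate in; users outside this domain get nothing.
    // Assumption: granting a record set also grants everything beneath it.
    Set<String> accessibleTo(User user) {
        Set<String> out = new LinkedHashSet<>();
        if (!users.contains(user)) return out;
        collect(root, user, false, out);
        return out;
    }

    private void collect(RecordSet node, User user, boolean inherited, Set<String> out) {
        boolean granted = inherited || user.grantedRecordSets().contains(node.name);
        if (granted) out.add(node.name);
        for (RecordSet child : node.children) collect(child, user, granted, out);
    }
}
```

Scoping users to a domain mirrors the slide's point that each domain holds its own users; whether PAWN's grants actually inherit down the hierarchy is an assumption of this sketch.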

Domain Example

Custom Roles

• Actions in PAWN can be grouped together to create roles.
  – There are no roles common to all archives, so PAWN allows custom ones.

• Default roles
  – Producer: individual data supplier
  – Records Manager: oversight of producers
  – Archive Manager: final review and archive publishing
  – Global Administrator: creates domains; sysadmin-like account

• Sample actions
  – Setting permissions on record sets
  – Record schedule creation and modification
  – Adding or deleting whole packages
  – Modifying items in a package…
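Since a role is just a group of actions, it can be sketched as a named set. In the Java sketch below the `Action` constants loosely follow the sample actions and the Archive Manager's publishing duty; the names, the `Role` record and the actions assigned to the default Producer role are hypothetical, not PAWN's definitions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical actions, loosely named after the sample actions on this slide.
enum Action {
    SET_RECORD_SET_PERMISSIONS,
    EDIT_RECORD_SCHEDULE,
    ADD_OR_DELETE_PACKAGE,
    MODIFY_PACKAGE_ITEMS,
    ARCHIVE_PUBLISHING
}

// A role is a named group of actions, so each archive can define its own roles.
record Role(String name, Set<Action> actions) {
    Role {
        actions = Set.copyOf(actions); // keep an immutable copy of the caller's set
    }

    boolean permits(Action action) { return actions.contains(action); }

    // One possible default Producer role; this action assignment is an assumption.
    static Role producer() {
        return new Role("Producer",
            EnumSet.of(Action.ADD_OR_DELETE_PACKAGE, Action.MODIFY_PACKAGE_ITEMS));
    }
}
```

Because roles are data rather than hard-coded types, an archive with different responsibilities can assemble a new role from the same action vocabulary.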

Custom Package Building

• PAWN provides an API for developing custom package builders

• Custom package builders can be written in Java by implementing a simple interface.

• Builders interact with a hierarchically structured package

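[Figure: hierarchical package structure]
· Manifest: Namespace, Type, Descriptive Name
· Data: Type, Descriptive Name, Bits
· Metadata: Type, Bits, Name
· Metadata… and Manifest… (repeated elements within the hierarchy)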
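The slide does not spell out PAWN's builder interface, so the Java sketch below is hypothetical throughout: `PackageBuilder`, `FlatPackageBuilder`, the `urn:example` namespace and the choice to attach metadata to data items are assumptions. Only the element types and their fields (Manifest, Data, Metadata with Namespace, Type, Descriptive Name, Name, Bits) come from the diagram.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical package elements; field names follow the diagram on this slide.
record Metadata(String type, String name, byte[] bits) {}

record DataItem(String type, String descriptiveName, byte[] bits, List<Metadata> metadata) {}

// A manifest holds data, metadata and nested manifests, giving the package its hierarchy.
final class Manifest {
    final String namespace;
    final String type;
    final String descriptiveName;
    final List<DataItem> data = new ArrayList<>();
    final List<Metadata> metadata = new ArrayList<>();
    final List<Manifest> children = new ArrayList<>();

    Manifest(String namespace, String type, String descriptiveName) {
        this.namespace = namespace;
        this.type = type;
        this.descriptiveName = descriptiveName;
    }

    // Counts data items in this manifest and in all nested manifests.
    int countData() {
        int n = data.size();
        for (Manifest child : children) n += child.countData();
        return n;
    }
}

// Hypothetical builder contract: turn producer-side files into a package tree.
interface PackageBuilder {
    Manifest build(String sourceName, List<byte[]> files);
}

// Toy builder that wraps each file as one data item under a single manifest.
final class FlatPackageBuilder implements PackageBuilder {
    @Override
    public Manifest build(String sourceName, List<byte[]> files) {
        Manifest root = new Manifest("urn:example", "collection", sourceName);
        for (int i = 0; i < files.size(); i++) {
            root.data.add(new DataItem("file", sourceName + "-" + i, files.get(i), List.of()));
        }
        return root;
    }
}
```

A builder for, say, scanned books could instead create one nested manifest per volume; the rest of the system would only see the resulting tree.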

PAWN Archive Gateway

• Pluggable component that provides an API for developing gateways into various services.

• Each gateway may have multiple instances, each configured differently

• PAWN manages gateway instances and associates them with the appropriate data.
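The gateway idea can be sketched as a plug-in interface with several differently configured instances, each bound to the record sets whose packages it should receive. `ArchiveGateway`, `FileStoreGateway`, `GatewayRegistry` and the `root` configuration key are hypothetical names for illustration, not PAWN's published API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract for publishing a package into an external service.
interface ArchiveGateway {
    String publish(String packageId);
}

// Example gateway type; each instance carries its own configuration.
final class FileStoreGateway implements ArchiveGateway {
    private final String rootPath;

    FileStoreGateway(Map<String, String> config) { this.rootPath = config.get("root"); }

    @Override
    public String publish(String packageId) {
        // A real gateway would transfer the package; this sketch only reports the target.
        return rootPath + "/" + packageId;
    }
}

// Associates gateway instances with the record sets whose packages they receive.
final class GatewayRegistry {
    private final Map<String, List<ArchiveGateway>> byRecordSet = new HashMap<>();

    void bind(String recordSet, ArchiveGateway gateway) {
        byRecordSet.computeIfAbsent(recordSet, k -> new ArrayList<>()).add(gateway);
    }

    // Publishes a package through every gateway bound to its record set, in binding order.
    List<String> publish(String recordSet, String packageId) {
        List<String> targets = new ArrayList<>();
        for (ArchiveGateway g : byRecordSet.getOrDefault(recordSet, List.of())) {
            targets.add(g.publish(packageId));
        }
        return targets;
    }
}
```

Keeping the binding in a registry lets two instances of the same gateway type, configured differently, serve different data without either one knowing about the other.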

PAWN Architecture

• Divided into producer-side and archive-side components
  – Producer: data supply and domain management
  – Archive: data storage, resource allocation and archival publishing

• Web-service-based communication

• Trust relationship between producer and archive components
  – SAML and PKI
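The trust relationship relies on SAML and PKI. The sketch below illustrates only the PKI half, using the JDK's standard `java.security` API: a producer-side component signs a manifest with its private key and the archive side verifies it with the producer's public key. The `ProducerSigner` class and the choice of RSA with SHA-256 are assumptions; the SAML assertion exchange is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper illustrating PKI signing between producer and archive components.
final class ProducerSigner {
    // Producer side: sign the manifest bytes with the producer's private key.
    static byte[] sign(PrivateKey key, String manifest) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(key);
        s.update(manifest.getBytes(StandardCharsets.UTF_8));
        return s.sign();
    }

    // Archive side: accept the manifest only if the signature matches the producer's public key.
    static boolean verify(PublicKey key, String manifest, byte[] signature)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(key);
        s.update(manifest.getBytes(StandardCharsets.UTF_8));
        return s.verify(signature);
    }

    // Generates a throwaway key pair; deployed components would typically use
    // keys bound to certificates issued by a CA both sides trust.
    static KeyPair newProducerKeys() throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }
}
```

A manifest altered in transit fails verification, which is one way an archive can refuse data from a hostile or impersonated producer.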

Components

Case Studies

• ICDL Book Builder
• SLAC Record Ingestion
• 10,000 CD-ROMs

• Remote ingestion

• Unskilled labor

• Custom hardware

• Sample NARA ingestion

• Model government roles

• DOE Record Schedule

• Custom package builder

• Multiple data sources

• Model logical books

PAWN Summary

• Platform for ingestion
• Customizable components
  – Roles, ingest and publishing

• Distributed architecture

More Information

• Web site:
  – http://www.umiacs.umd.edu/research/adapt

• Wiki link for technical details.

• Or “I’m feeling lucky” Google keywords:
  – ADAPT UMIACS