NCSU Libraries

Digital Repository Projects at the

North Carolina State University Libraries

James Jackson Sanborn

Jim Tuttle

Open Repositories/DSpace User Group ‘07

Early Repository Planning

• Digital Repository Planning Committee
• What it wouldn’t be (at least to start)
  – Distributed community structure
  – Open submission
  – ‘Institutional’ Repository
• What it would be (at least to start)
  – Library-managed collections
  – Building block for campus partnership
  – Learning opportunity

Repository Building Blocks

• NCSU Electronic Theses and Dissertations
  – Started 1997
  – Mandatory since 2002
  – Virginia Tech’s ETD-db
  – ~3,000 ETDs
• NCSU Authors Database
  – Started 1995
  – Access Database/ColdFusion front-end
  – ~22,000 citations

Repository Building Blocks (cont’d)

• Technical Reports Print Collection
  – Campus Institutes and Departments
  – Massive fall-off in print distribution
• Special Collections Research Center
  – Digitized texts and photographs
  – Campus Newsletters
• GIS Data
  – Library managed/acquired data collection
  – Homegrown data layer database/discovery tools

Repository Plan

• Target ‘Research’ collections first
  – Technical Reports
  – ETDs
  – Faculty Publications/Citations
• Treat each collection as its own project
• Actively pursue common technological solutions

Technical Reports

• DSpace Application
• Lightly Customized
• Library Harvested
  – Local Cataloging/Metadata database
  – Scripted Ingest Object Creation
  – Batch Ingest
• Mix of ongoing submission by institute/departmental personnel and Library capture
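The "Scripted Ingest Object Creation" step might look roughly like the sketch below. It writes one item directory in DSpace's Simple Archive Format (a `dublin_core.xml` file plus a `contents` manifest listing the bitstreams), which the DSpace batch importer can then load. The record fields, the file names, and the `build_item_dir` helper are hypothetical; the slides do not describe the actual script.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def build_item_dir(record, pdf_path, out_dir):
    """Write one Simple Archive Format item directory for a report.

    `record` is a hypothetical dict pulled from the local cataloging
    database: {"title": str, "authors": [str, ...], "issued": str}.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    root = ET.Element("dublin_core")

    def add(element, qualifier, value):
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value

    add("title", "none", record["title"])
    for author in record["authors"]:
        add("contributor", "author", author)
    add("date", "issued", record["issued"])

    ET.ElementTree(root).write(
        str(out_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the bitstream next to the metadata and list it in `contents`.
    pdf_path = Path(pdf_path)
    shutil.copy(str(pdf_path), str(out_dir / pdf_path.name))
    (out_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    return out_dir
```

One directory per item keeps the batch restartable: a failed import can be rerun on just the directories that did not load.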

[Screenshot: Technical Reports]

[Screenshot: Technical Reports item detail]

Electronic Theses & Dissertations

• Partnership with Graduate School
• Hybrid System: DSpace and ETD-db
  – ETD-db submission/approval/management
  – Direct database extract for DSpace Ingest Object creation
  – Scheduled Batch Ingest process
• DSpace Considerations/Alterations
  – Metadata Mapping
  – Author Browse (exclude contributor.advisor)
  – Various interface changes
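As an illustration of the metadata mapping and the author-browse exclusion, the sketch below maps one extracted ETD-db row onto DSpace Dublin Core fields and then builds a browse list from `contributor.author` only, so advisors never appear as authors. The ETD-db column names in `FIELD_MAP` are assumptions made for the example; only the DSpace field names come from the standard Dublin Core registry.

```python
# Hypothetical ETD-db column name -> (DC element, qualifier).
FIELD_MAP = {
    "title": ("title", None),
    "author": ("contributor", "author"),
    "advisor": ("contributor", "advisor"),
    "abstract": ("description", "abstract"),
    "defense_date": ("date", "issued"),
}


def map_row(row):
    """Return (field, value) pairs such as ("contributor.advisor", "...")."""
    mapped = []
    for column, (element, qualifier) in FIELD_MAP.items():
        value = row.get(column)
        if not value:
            continue
        field = element if qualifier is None else f"{element}.{qualifier}"
        values = value if isinstance(value, list) else [value]
        mapped.extend((field, v) for v in values)
    return mapped


def author_browse_values(mapped):
    """Names for an author browse that indexes contributor.author only."""
    return sorted(v for field, v in mapped if field == "contributor.author")
```

Keeping advisors in their own qualified field preserves the information for display and search while leaving the author index limited to the students who wrote the works.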

[Screenshot: ETD-db]

[Screenshot: ETD in DSpace]

Faculty Publications

• Built on Existing Author Database
  – Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
    • Re-modeled data
    • Added Functionality
      – OpenURL
      – ‘Vita-like’ citation display
      – Full-text or submission links
  – Full-text stored in DSpace
    • Citation metadata and file exported by script
    • DSpace Identifier currently manually entered
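The OpenURL functionality can be illustrated with a short sketch that turns a stored citation into an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article. The production system was written in PHP; Python is used here only for illustration. The resolver address and the citation dictionary keys are hypothetical, while the `rft.*` keys come from the OpenURL journal metadata format.

```python
from urllib.parse import urlencode

# Hypothetical link-resolver base URL; substitute the local resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Hypothetical citation column -> OpenURL journal-format key.
KEY_MAP = {
    "article_title": "rft.atitle",
    "journal": "rft.jtitle",
    "volume": "rft.volume",
    "start_page": "rft.spage",
    "year": "rft.date",
    "issn": "rft.issn",
}


def citation_to_openurl(citation, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 KEV link for a journal-article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    for column, key in KEY_MAP.items():
        if citation.get(column):  # skip empty or missing fields
            params.append((key, str(citation[column])))
    return base + "?" + urlencode(params)
```

Because the link is derived from citation fields at display time, every citation can offer a "find full text" link even when no file has been deposited in DSpace.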

Faculty Publications Schematic

[Diagram. Components: Scholar; Web interface (PHP); Oracle Faculty Publications DB (citations); DSpace Java/JSP (full-text only); File System (files); PostgreSQL (metadata); Cataloging and Coll. Mgt.; Access; DSpace Item Display; Web Submission Form; ISI, Ann. Reps, etc. Flow labels: View full-text; S+R Citations; Add/Edit data; Handle IDs; Submit Citations and/or Text.]

[Screenshot: FacPubs search screen]

[Screenshot: FacPubs results]

[Screenshot: FacPubs item]

Repository Governance

• Internal
  – Digital Repository Planning Committee
  – Data Repository Architect
• External
  – Faculty Repository Advisory Committee
  – Partnerships with departments and institutes

NCGDAP: Overview

• NDIIPP: National Digital Information Infrastructure and Preservation Program

• Collaboration with Library of Congress

• One of eight three-year projects studying long-term (50+ years) digital preservation

• Objective: engage existing state/federal geospatial data infrastructures in preservation

• Project approaches: Technical and Social

Repository Requirements

• Dim archive with possible future access
  – minimal IR/access component
• Minimal repository imprint on data
  – repository agnostic ingest and export
• Simple digital curation functions
  – Periodic MD5 checksum validation
  – Structured metadata index
• Expected archived-data exchange
• Leverage existing investments
• Free Software with active community
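Periodic MD5 checksum validation can be sketched with the Python standard library alone: record a manifest of checksums at ingest, then rerun the comparison on a schedule and report anything missing or altered. The function names and the manifest layout (relative path mapped to hex digest) are assumptions for illustration, not the project's actual tooling.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 at ingest time."""
    root = Path(root)
    return {
        str(p.relative_to(root)): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate(root, manifest):
    """Return {relative_path: problem} for files now missing or changed."""
    root = Path(root)
    problems = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif md5_of(target) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

An empty result from `validate` means every archived file still matches its ingest-time checksum.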

Automation: Threat and format analysis, validation

Python wrappers for the following:

• Anti-virus – ClamAV
• Compressed files (tar, zip, gzip, bzip)
• At-risk formats
• Executable files (magic numbers)
• JHOVE validation
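The magic-number and compressed-file checks can be sketched as follows: read the first bytes of each file and compare them against known signatures, falling back to the standard `tarfile` module for uncompressed tar archives. The signature table and the `sniff` helper are illustrative assumptions; the virus scan and JHOVE validation would be separate wrappers around those external tools and are not shown.

```python
import tarfile

# Leading bytes ("magic numbers") of a few executable and compressed formats.
MAGIC = {
    b"\x7fELF": "executable (ELF)",
    b"MZ": "executable (DOS/Windows)",
    b"\x1f\x8b": "compressed (gzip)",
    b"BZh": "compressed (bzip2)",
    b"PK\x03\x04": "compressed (zip)",
}


def sniff(path):
    """Classify a file by its leading bytes; return None if nothing matches."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, label in MAGIC.items():
        if head.startswith(magic):
            return label
    # Plain tar has no signature at offset 0, so let tarfile decide.
    if tarfile.is_tarfile(str(path)):
        return "archive (tar)"
    return None
```

Checking content rather than file extension catches renamed executables, which is the point of the magic-number test. A production rule set would need more signatures than this short table.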

Automation: Archive package organization

• ESRI ArcGIS toolbar for selected formats

Automation: Archive package organization

• Rule-based Python logic
  – filestem – extension relationships (multi-file format validation)
  – directory structure
• Manual intervention
• NOID assignment
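A filestem and extension rule can be illustrated with the ESRI shapefile, which is only usable when the `.shp` file is accompanied by `.shx` and `.dbf` files sharing the same filestem. The sketch below groups files by stem and reports datasets with missing companions; the `REQUIRED` table and the `check_multifile` name are hypothetical, and a real rule set would cover more formats.

```python
from collections import defaultdict
from pathlib import Path

# Required companion extensions, keyed by the primary extension.
# Only the shapefile rule is shown here.
REQUIRED = {".shp": {".shx", ".dbf"}}


def check_multifile(directory):
    """Return {filestem: [missing extensions]} for incomplete datasets."""
    by_stem = defaultdict(set)
    for path in Path(directory).iterdir():
        if path.is_file():
            by_stem[path.stem].add(path.suffix.lower())

    problems = {}
    for stem, suffixes in by_stem.items():
        for primary, companions in REQUIRED.items():
            if primary in suffixes:
                missing = companions - suffixes
                if missing:
                    problems[stem] = sorted(missing)
    return problems
```

Datasets flagged here are natural candidates for the manual intervention step rather than automatic packaging.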

Metadata: Seed file form

• 'Transfer set' metadata capture in 'Seed file'
  – communicates with DSpace backend, generates XML used to inform later scripts
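The seed file idea, one small XML document written at capture time and read by every later script, can be sketched as below. The element names (`seed`, `agency`, `collection_handle`, `received`) are invented for the example; the slides do not show the real seed file schema or how the form queries DSpace.

```python
import xml.etree.ElementTree as ET

# Hypothetical transfer-set fields captured by the seed file form.
SEED_FIELDS = ("agency", "collection_handle", "received")


def write_seed_file(path, transfer_set):
    """Write transfer-set metadata as a small XML seed file."""
    root = ET.Element("seed")
    for key in SEED_FIELDS:
        ET.SubElement(root, key).text = transfer_set[key]
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)


def read_seed_file(path):
    """Read the seed file back into a dict for a downstream script."""
    return {child.tag: child.text for child in ET.parse(str(path)).getroot()}
```

Later steps then take their collection target and provenance details from `read_seed_file` instead of asking the operator again, which is where the error reduction comes from.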

Metadata: Communities and Collections

• Search by type for 100+ communities
• Facilitates creation and reduces errors

Curation Processing

• At-risk format migration, original retained

• Agency-specific XML templates in ArcCatalog with synchronization flags

• Provenance and curation metadata scripted

Source Metadata Translation

• Repository agnostic approach

• Spokes for each transformation

• Facilitates export from DSpace into other repositories

• Generate DSpace QDC, METS; populate Workflow database
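A hub-and-spoke translation layer can be sketched as a registry: source metadata is normalized once into a neutral hub record, and each output format is a small spoke function registered by name. The hub record keys, the spoke names, and the two example spokes below are assumptions for illustration; a METS spoke would be registered the same way.

```python
import xml.etree.ElementTree as ET

SPOKES = {}


def spoke(name):
    """Decorator that registers a transformation under a target name."""
    def register(func):
        SPOKES[name] = func
        return func
    return register


@spoke("dspace_dc")
def to_dspace_dc(hub):
    """Render the hub record as DSpace dublin_core.xml text."""
    root = ET.Element("dublin_core")
    title = ET.SubElement(root, "dcvalue", element="title", qualifier="none")
    title.text = hub["title"]
    for keyword in hub["keywords"]:
        ET.SubElement(root, "dcvalue", element="subject", qualifier="none").text = keyword
    return ET.tostring(root, encoding="unicode")


@spoke("workflow_row")
def to_workflow_row(hub):
    """Flatten the hub record into a row for a tracking database."""
    return (hub["identifier"], hub["title"], ";".join(hub["keywords"]))


def translate(hub, targets):
    """Run the requested spokes against one hub record."""
    return {name: SPOKES[name](hub) for name in targets}
```

Adding a target repository then means writing one new spoke, with no change to the ingest side, which is what keeps the approach repository agnostic.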

Extra-repository AIP management

• Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub

• External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems

• Integrates with existing GIS Lookup tool
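External tracking of identifiers can be sketched as a single table keyed by NOID that also records the DSpace Handle and ISO keywords, so other systems can look up a package without querying the repository. SQLite stands in here for whatever database the project used, and the table name, column names, and sample identifiers are hypothetical.

```python
import sqlite3


def open_wmd(path=":memory:"):
    """Open (and if needed create) a minimal tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS aip (
               noid TEXT PRIMARY KEY,
               handle TEXT,
               iso_keywords TEXT
           )"""
    )
    return conn


def record_aip(conn, noid, handle, keywords):
    """Insert or update the tracking row for one archival package."""
    conn.execute(
        "INSERT OR REPLACE INTO aip (noid, handle, iso_keywords) VALUES (?, ?, ?)",
        (noid, handle, ";".join(keywords)),
    )
    conn.commit()


def handle_for(conn, noid):
    """Return the Handle recorded for a NOID, or None if unknown."""
    row = conn.execute("SELECT handle FROM aip WHERE noid = ?", (noid,)).fetchone()
    return row[0] if row else None
```

Because the row lives outside DSpace, the NOID remains a stable key even if the package is later exported to a different repository and receives a new Handle.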

Repository Architecture Overview

[Diagram. Components:
• PostgreSQL: one shared username, separate database for each app
• Repository Tomcat instance
  – Repository (DSpace): Technical Reports, ETDs
  – Collections (DSpace): SCRC, Course Catalogs, Green ‘N’ Growing
  – Faculty Publications PHP/DSpace hybrid
• Tomcat DSpace Internal
  – NDIIPP (DSpace)
  – SCRC (DSpace)
• Asset Store/ATABeast: sub-directory for each DSpace app]

Upcoming Repository Related Projects

• Enhancements to current system
  – XTF search interface
  – Inter-archive exchange
• Digital Collections Repository
  – Special Collections Research Center
  – Other non-faculty collections
• Data Repository
  – Scientific data
  – Statistical resources

For More Information:

• James Jackson Sanborn
  – [email protected]
• Jim Tuttle
  – [email protected]