Enabling Interaction and Quality in a Distributed Data DRIS
D. Scott Brandt, Associate Dean for Research
Michael Witt, Senior Research Systems Administrator
Purdue University Libraries
CRIS 2006, Bergen, Norway, May 11, 2006
Background: Purdue
Purdue University
Nine Colleges: Agriculture, Consumer & Family Sciences, Education, Engineering, Liberal Arts, Management, Pharmacy/Nursing/Health Sciences, Technology, Veterinary Medicine
73 Departments, several cross-disciplinary, e.g. Agricultural & Biological Engineering
Purdue University Libraries
2004 initiative for Librarians (faculty) to collaborate with other faculty across campus: apply library science knowledge and expertise to various research data problems (collect, organize, describe, curate, archive, disseminate data/information)
Strategic directions
University: “interdisciplinary and collaborative endeavors grounded in the strengths of academic disciplines”
Libraries: Libraries faculty are integrated into campus research agenda
Areas of research collaboration
Agronomy
Biology
Cancer Center
Center for the Environment
Chemical Engineering
Chemistry
Cyber Center
Discovery Learning Center
Earth & Atmospheric Science
English
IT at Purdue
Mechanical Engineering Technology
Regenstrief Center
Current areas of participation
E. coli K-12 Model Organism Resource, NIH proposal (B. Wanner, Biology, PI; D. Scott Brandt, Libraries, Co-PI): create archival process for curated database, assist in applying ontologies for data representation and annotation
An Expert System Multimedia Tutorial for Locating Technical Information, Purdue University TLT Digital Content grant (Megan Sapp, PI; Amy Van Epps and Michael Fosmire, Co-PIs; with Bruce Harding, Mechanical Engineering Technology): develop tutorial for MET102 course in using and applying standards
URL-based Search Interface to the Distributed Institutional Repository, Purdue University Graduate School (Michael Witt, Libraries, PI; Darcy Bullock, Civil Engineering, Co-PI): develop toolkit to deploy customized searching of dissertations by school, advisor, etc.
AquaEcon Web Library: An Electronic Resource on Economics-Related Literature on Aquaculture, NOAA (K. Quagrainie, Agricultural Economics, PI; Hal Kirkwood, Libraries, Co-PI): build and populate database
Progression towards CRIS
Institutional repository (IR)
Distributed institutional repository (DIR)
Interactions related to DIR leading to CRIS-like applications
Leverage DIR for DRIS/CRIS
Distributed Institutional Repository
[Diagram labels: Applications; Metadata Repository; OAI Service Provider; OAI Data Providers; e-prints; archival collections; grid resources; native databases; data archive]
A systems-based approach to Libraries supporting research: linear
inputs, experimentation, outputs
CRIS: a current research information system links people engaged in research with funding and other resources such as interdisciplinary collaborators
Data repositories: a repository of well-described data resulting from research processes is preserved and shared for repurposing
Document repositories: journal article pre-prints, post-prints, conference and working papers, dissertations and other e-prints represent research outputs in a document repository
A systems-based approach to Libraries supporting research: cyclical
[Diagram: cycle linking CRIS, data repository, and e-print repository]
An example application: SRU
Linking to electronic theses and dissertations (ETD)
URL-based search interface to DIR running as a web service
$16,000 Strategic Development Initiative award for fellowship and server
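To illustrate what a URL-based search looks like, here is a minimal sketch that builds an SRU `searchRetrieve` URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU 1.1 specification. The base URL, the `dc.contributor` index, the advisor name, and the `build_sru_url` helper are assumptions made for illustration, not the project's actual endpoint or indexes.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://repository.example.edu/sru"


def build_sru_url(cql_query, max_records=10, base_url=BASE_URL):
    """Return an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    # urlencode percent-encodes the CQL query so it is safe inside a URL.
    return base_url + "?" + urlencode(params)


# Hypothetical search for dissertations by advisor surname.
url = build_sru_url('dc.contributor = "Smith"')
print(url)
```

Because the whole search is carried in the URL, a department page could link straight to a filtered result list without hosting any search code of its own.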
Getting to the datasets: SRB
The Storage Resource Broker
Developed by the San Diego Supercomputer Center
Uniform access to heterogeneous, distributed storage
Metadata catalog (MCAT) and preservation functionality
TeraGrid, collaboration with Information Technology at Purdue and Rosen Center for Advanced Computing
An example systems interaction
OAISRB: provides an OAI-PMH interface to the SRB to expose metadata from resources on a data grid to OAI service providers
[Diagram labels: Harvester; HTTP; XML; OAISRB; Apache Tomcat Server; OAI-PMH Interface (OAICat); SRB Client (Jargon); MCAT (SRB); Data grid]
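The harvester side of this interaction can be sketched as follows: an OAI service provider sends an HTTP GET with a `verb` parameter and parses the XML that comes back. The `ListRecords` verb, the `oai_dc` metadata prefix, and the two namespaces are defined by OAI-PMH 2.0 and Dublin Core. The base URL, the helper names, and the abbreviated sample response are invented for illustration and do not come from the OAISRB deployment.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical base URL; a real deployment would use its own OAIHandler address.
BASE_URL = "http://oaisrb.example.edu/OAISRB/OAIHandler"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url=BASE_URL, metadata_prefix="oai_dc"):
    """Build the HTTP GET URL a harvester would request."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    )


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Invented, abbreviated response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url())
print(extract_titles(SAMPLE))
```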
Sample OAISRB config
#### OAI Handler Base URL Format
OAIHandler.baseURL=http://128.210.126.231:8080/OAISRB/OAIHandler
#### SRB Connection Parameters
SRB.HOST=orion.sdsc.edu
SRB.PORT=7620
SRB.USERNAME=mwitt
SRB.PASSWORD=nyah
SRB.HOMEDIRECTORY=/dspace/home/mwitt.purdue
SRB.MDASDOMAINNAME=purdue
SRB.DEFAULTSTORAGERESOURCE=dspace-fs1
SRB.MCATZONE=dspace
#### SRB Collection Count and SRB Collection Names
SRB.root=/TGzone/home/lars.itap
SRB.maxcollections=1
SRB.collection1=LARSDATA
#### Custom Parameters for SRB GRID
SRBRecordFactory.repositoryIdentifier=mwitt.purdue
Display.MaxListSize=50
#### Custom Identify response values
Identify.repositoryName=SRB Data Grid
Identify.adminEmail=mailto:[email protected]=2000-01-01T00:00:00Z
Identify.deletedRecord=no
#### Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
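The crosswalk entries in the sample config pair each unqualified Dublin Core element with a source field name. The sketch below shows one way such a mapping could be applied to a record. How OAISRB itself applies the crosswalk is not described in the slides, so the `parse_crosswalk` and `to_dublin_core` helpers and the sample record are assumptions made purely for illustration.

```python
# Crosswalk lines copied from the sample config.
CROSSWALK_LINES = """
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
"""


def parse_crosswalk(text):
    """Map each Dublin Core element name to its source field name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        key, _, source_field = line.partition("=")
        if key.startswith("DC."):
            mapping[key[len("DC."):].lower()] = source_field.strip()
    return mapping


def to_dublin_core(record, mapping):
    """Build a DC record, skipping source fields the record lacks."""
    return {dc: record[src] for dc, src in mapping.items() if src in record}


# Invented source record, for illustration only.
record = {
    "title": "Water quality samples",
    "purpose": "Monitoring",
    "File Format": "CSV",
}
mapping = parse_crosswalk(CROSSWALK_LINES)
dc_record = to_dublin_core(record, mapping)
print(dc_record)
```

In this mapping both `identifier` and `title` draw on the same source field, a small example of how a crosswalk can flatten distinctions that the original schema kept separate.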
Metadata research
Metadata librarian worked for four months analyzing metadata needs and processes for several data sets
Results included DC descriptions, enhanced with thesaurus headings, and a basic crosswalk
Also: metadata descriptions from scratch are too manually intensive…
Metadata: Water Quality
A flat file with only “system” metadata
Began with Dublin Core
Enhanced subjects with thesaurus from NAL (US National Agricultural Library)
Looked at DIF (Directory Interchange Format)
Looked at crosswalk with FGDC (Federal Geographic Data Committee) format
Next steps: Metadata
Articulate metadata workflow to embed metadata into the process
Review automating all data
Determine how/where to validate and automate descriptive metadata
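One possible form of automated validation is a completeness check run on each descriptive record before it is exposed for harvesting. The sketch below is only an illustration: the list of required elements and the `missing_elements` helper are assumptions, not a policy stated in the slides.

```python
# Assumed minimum for discovery-level metadata; not taken from the slides.
REQUIRED_ELEMENTS = ("title", "creator", "description", "subject")


def missing_elements(dc_record, required=REQUIRED_ELEMENTS):
    """Return required elements that are absent or blank in the record."""
    return [
        name for name in required
        if not str(dc_record.get(name, "")).strip()
    ]


# Invented record: creator is blank and description is absent.
record = {
    "title": "Water quality samples",
    "creator": " ",
    "subject": "water quality",
}
print(missing_elements(record))
```

A check like this could run at ingest time or on a schedule, which fits the idea of periodically assessing an automated process.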
Conclusions and Questions
Use existing, native metadata whenever possible
Automate and periodically assess processes to ensure quality
Diminishing returns: we settled on discovery and collection-level metadata
Crosswalks are useful but can truncate or distort the original meaning
The importance of interactions, among people and systems
How do we implement CRIS/CWIS/DRIS in our environment?
What is the role of the Libraries in such?