esi supplemental 1 e-research support slides

DuraSpace/ARL/DLF E-Science Institute: E-Research Support at Johns Hopkins University & Purdue University. Supplemental Webinar, Wednesday, October 17, 2012, 1:00-2:30 pm EDT

Upload: duraspace

Post on 26-Jan-2015

DESCRIPTION

E-Research Support at Johns Hopkins University & Purdue University. Supplemental Webinar, Wednesday, October 17, 2012. Presented by Sayeed Choudhury & James Mullins.

TRANSCRIPT

Page 1: ESI Supplemental 1   E-research Support Slides

DuraSpace/ARL/DLF E-Science Institute

E-Research Support at

Johns Hopkins University & Purdue University

Supplemental Webinar

Wednesday, October 17, 2012

1:00-2:30 pm EDT

Page 2: ESI Supplemental 1   E-research Support Slides

E-Research Support at Johns Hopkins University

Presented by Sayeed Choudhury,

Johns Hopkins University, Sheridan Libraries

Associate Dean for Library Digital Programs & Director, Hodson Digital Research & Curation Center

Page 3: ESI Supplemental 1   E-research Support Slides

Data Conservancy

• Data Conservancy (DC) is a community that develops solutions for data preservation and sharing to promote cross-disciplinary re-use.

• DC Service Instance: data-centric hardware, software, components, and APIs within an organizational context – installed at Johns Hopkins University and the National Snow and Ice Data Center

Page 4: ESI Supplemental 1   E-research Support Slides

Data Sharing Attributes

• Feature Extraction Framework that atomizes data into constituent parts for indexing, metadata extraction, etc.

• Discipline agnostic data model (inspired by PLANETS project)

• Provenance and Lineage service

• Spatial, temporal and (soon) taxonomic query capabilities

• Sustainability through diverse funding from Johns Hopkins University, direct charges to NSF grants, other grants and community development

Page 5: ESI Supplemental 1   E-research Support Slides

Data Management Layers

Layer: Curation
• Characteristics: Adding value throughout life-cycle
• Implication for PI: Feature extraction; new query capabilities; cross-disciplinary
• Implication relative to NSF: Competitive advantage; new opportunities

Layer: Preservation
• Characteristics: Ensuring that data can be fully used and interpreted
• Implication for PI: Ability to use own data in the future (e.g. 5 yrs); data sharing
• Implication relative to NSF: Satisfies NSF needs across directorates

Layer: Archiving
• Characteristics: Data protection including fixity, identifiers
• Implication for PI: Provides identifiers for sharing, references, etc.
• Implication relative to NSF: Could satisfy most NSF requirements

Layer: Storage
• Characteristics: Bits on disk, tape, cloud, etc.; backup and restore
• Implication for PI: Responsible for restore, sharing, staffing
• Implication relative to NSF: Could be enough for now but not near-term future
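The "fixity" mentioned under the Archiving layer can be illustrated with a checksum comparison. The following is a minimal sketch, not a description of the Data Conservancy software: it assumes SHA-256 as the digest (the slides do not name an algorithm), and the helper names `compute_checksum`, `verify_fixity`, and `demo_fixity` are hypothetical.

```python
import hashlib
import os
import tempfile


def compute_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path` (digest choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum):
    """True if the file's current checksum still matches the one recorded at ingest."""
    return compute_checksum(path) == recorded_checksum


def demo_fixity():
    """Write a sample file, record its checksum, then alter the file to show a fixity failure."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"sample observation data")
        path = tmp.name
    try:
        recorded = compute_checksum(path)
        intact = verify_fixity(path, recorded)
        with open(path, "ab") as handle:
            handle.write(b"!")  # simulate silent corruption of the stored bits
        corrupted = verify_fixity(path, recorded)
        return intact, corrupted
    finally:
        os.remove(path)
```

Plain storage keeps the bits; an archiving service would additionally record such a checksum at deposit and re-check it on a schedule, which is the difference between the two layers that this sketch is meant to show.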

Page 6: ESI Supplemental 1   E-research Support Slides

Establishing the JHU DMS

• May 2010 NSF announces DMP expectations
• Services incubated and scoped summer/fall 2010
  – Build on Data Conservancy expertise
• Proposed in January and launched in July 2011
  – Consultative data management planning services to support NSF proposals
  – Post award data management services
• Assessment of service in March 2012

Page 7: ESI Supplemental 1   E-research Support Slides

Background work to scope services

• Review of data management plan best practices and development of questionnaire

• Piloted data management consultations as cases
• Short data survey with over 70 JHU researchers
• Analysis of JHU NSF proposal and award activity
• Business school capstone project on storage options and costs
• Review of past data archiving projects and work

Page 8: ESI Supplemental 1   E-research Support Slides

Proposing data management services

• Services scoped to support anticipated NSF requirements and to reflect system capabilities
  – Defined time limits, volume of data deposited per project, unencumbered data only for now
• Prepared budget for services
  – Five year timeframe for costs
  – All costs included: staffing, hardware, overhead, etc.
  – Cost assumptions included: total data archived, complexity of data prep for ingest

Page 9: ESI Supplemental 1   E-research Support Slides

Developing financial model

Support secured and financial model established

• Data management planning for NSF proposals
  – Service directly funded by schools
  – Each school pays percentage according to 3 year average of total NSF proposals submitted
• Post award data management
  – Fee based service billed through a service center
  – First year fee a percent of total direct costs on grant
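The arithmetic behind this funding model can be sketched as follows. This is an illustrative reading of the slide, not JHU's actual billing logic: the school names and proposal counts are invented, the functions `school_shares` and `first_year_fee` are hypothetical, and `fee_rate` is a placeholder because the slides do not state the percentage.

```python
def school_shares(proposals_by_school):
    """Map each school to its fraction of the planning-service cost.

    `proposals_by_school` maps a school name to a list of its NSF proposal
    counts for the last three years (illustrative numbers, not JHU data).
    """
    averages = {
        school: sum(counts) / len(counts)
        for school, counts in proposals_by_school.items()
    }
    total = sum(averages.values())
    return {school: avg / total for school, avg in averages.items()}


def first_year_fee(total_direct_costs, fee_rate):
    """Post-award fee as a fraction of a grant's total direct costs.

    `fee_rate` is a placeholder; the slides do not state the percentage.
    """
    return total_direct_costs * fee_rate


example = {
    "School A": [30, 40, 50],  # three-year average: 40
    "School B": [10, 10, 10],  # three-year average: 10
}
shares = school_shares(example)
```

With these made-up counts, School A would carry four fifths of the planning-service cost and School B one fifth. Averaging over three years smooths out a single unusually busy or quiet proposal year.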

Page 10: ESI Supplemental 1   E-research Support Slides

JHU Data Management Services team

Dedicated group (that collaborates with DC and Digital Research and Curation Center)

• Two data management consultants
• Senior technical consultant (part-time)
• Software developer
• System administrator (to be hired)
• Interim manager (part-time)

Page 11: ESI Supplemental 1   E-research Support Slides

Service marketing

• Reach out through all stakeholders
  – Announcements through Deans
  – Work with research projects administration
  – Outreach to department administrators
  – Briefings with library colleagues/departments
  – Presentations to researchers, graduate students
• More to do… and then repeat!

Page 12: ESI Supplemental 1   E-research Support Slides

Observations

• Role of Choudhury as NSF PI within JHU
• Sheridan Libraries R&D and experience with scientific data
• Already embedded within research enterprise
• Specifics will vary by institution but JHU approach can be generalized…
• …But each institution should consider appropriate role(s) or approach

Page 14: ESI Supplemental 1   E-research Support Slides

Acknowledgements

• NSF Award OCI-0830976
• Sheridan Libraries financial support
• Johns Hopkins University financial support
• Data Conservancy colleagues for their exceptional work and patience

Page 15: ESI Supplemental 1   E-research Support Slides

Questions

Page 16: ESI Supplemental 1   E-research Support Slides

Libraries

An Overview of Sustaining e-Science Collaboration in an Academic Research Library – the Purdue Experience

James L. Mullins, PhD
Dean of Libraries & Esther Ellis Norton Professor
October 17th, 2012

Page 17: ESI Supplemental 1   E-research Support Slides

What is meant when we say the library has a role in sustaining e-science?

• Application of library and archival science principles and theory to data management.

• Collaboration of Libraries with faculty, information technology, research office, and sponsored programs to develop a process and repository to manage and preserve data.

Page 18: ESI Supplemental 1   E-research Support Slides

I. Background and Development of the Libraries collaboration with e-Science at Purdue – on local and national levels.

• Local – Conversations with researchers, research office, etc.
• Local – Principles of library and archival sciences.
• Local – Restructuring of Libraries.
• National – NSF Data Management dialogue.
• Local – Creation of Data Research Scientist.
• Local – Librarians not able “to service” funded research.
• Local – Librarians with professorial rank and tenure (start-up package of $40,000+).
• Local – Distributed Data Curation Center (D2C2)

Page 19: ESI Supplemental 1   E-research Support Slides

I. Background and Development of the Libraries involvement in e-Science at Purdue – on local and national levels (cont’d).

• National – IMLS grant to develop Data Curation Profiles.
• Local – Partnerships of subject liaison librarians and faculty.
• Local – Re-definition of librarian roles.
• Local – Collaboration/advising on data management librarian role.
• National – IMLS grant to develop Data Information Literacy.
• National – Develop/teach ICPSR data science curriculum.
• National – IMLS grant to develop Databib-DMPTool collaboration.
• International – DataCite-Databib collaboration.
• Local – Society of American Archivists (SAA) workshop.

Page 20: ESI Supplemental 1   E-research Support Slides

Sustainability – applied expertise of librarians.

• Must be integrated into role of librarians.
• New positions must be created (data curation specialists, etc.).
• Priority for new positions must be established with a total view of strategic growth areas (at Purdue, data management and information literacy).
• Salaries partially funded through sponsored research, making funds available for other positions and graduate research assistants.
• Cluster hires with colleges and schools.
• Critical role of librarians in research garners additional support from University Administration.

Page 21: ESI Supplemental 1   E-research Support Slides

Research Collaborations 2012/2013

• Big Data and Complex System Analytics to Enhance Society's Resilience w/ Agronomy
• Human Rights Texts for Digital Research: Archiving and Analyzing Amnesty International’s Historic Urgent Action Bulletins w/ Political Science
• A Cross-Disciplinary Design Thinking Research Symposium to Catalyze Groundbreaking Research and Practice w/ Engineering Education
• Establishing a Materials Center for Agriculture, Food and Health w/ Food Science

Page 22: ESI Supplemental 1   E-research Support Slides

II. Building a Data Curation Program and Repository

• Not done independently of librarians’ knowledge & support structure within Libraries.
• In 2006, collaboration built around Purdue’s HUBzero platform in answer to NSF DataNet RFP.
• 2007 – 2010: Provost informed of impending data management mandate.
• May 2010 – NSF announcement.
• Summer 2010 – Provost and VPR appoint taskforce of faculty researchers, co-chaired by CIO and dean of libraries, to develop “template.” Report written August 2010.

Purdue University Research Repository – PURR

Page 23: ESI Supplemental 1   E-research Support Slides

II. Building a Data Curation Program and Repository (cont’d)

• 2010 –
  • Commitment to develop repository jointly by ITaP, OVPR, and Libraries - $90K
  • Working Group created to plan and develop Purdue University Research Repository (PURR).
  • Workshops sponsored by OVPR, conducted by Libraries and ITaP.
  • Libraries create resources to support faculty in developing DMPs.

Page 24: ESI Supplemental 1   E-research Support Slides

II. Building a Data Curation Program and Repository (cont’d)

• 2011/2013
  • Libraries budget request indicated need for positions to support sustainable data curation.
  • 479 grant proposals to date include PURR in data management plans.
  • 36 grants (so far) awarded with PURR as DMP.
  • TRAC certification underway – ISO 16363.

Page 25: ESI Supplemental 1   E-research Support Slides

II. Building a Data Curation Program and Repository (cont’d)

• What is provided by PURR? Any Purdue faculty, graduate student, or staff can:
  • Create a trial project of 500 MB for three years.
  • Receive 100 GB for ten years for an externally funded project.
  • Invite collaborators to join from other institutions.
  • Publish datasets without a grant (50 MB) or with a grant (10 GB).
  • Use to-do lists, which each project receives, to manage projects.
  • Use a wiki area for notes.
  • Use a micro-blogging interface (similar to Facebook) for discussion among the team.
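The allocation figures on this slide can be captured in a small lookup. This is only a sketch of the stated numbers, not PURR's implementation: it reads the 50 MB and 10 GB figures as upper limits on a published dataset, treats 1 GB as 1000 MB, and the names `Allocation`, `allocation_for`, and `can_publish` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    project_storage_mb: int  # working space for the project
    project_years: int       # how long the project space is kept
    publish_limit_mb: int    # assumed maximum size of a published dataset


# Figures taken from the slide; 1 GB is treated as 1000 MB here (an assumption).
TRIAL = Allocation(project_storage_mb=500, project_years=3, publish_limit_mb=50)
FUNDED = Allocation(project_storage_mb=100_000, project_years=10, publish_limit_mb=10_000)


def allocation_for(has_external_grant):
    """Pick the allocation tier for a project (hypothetical helper)."""
    return FUNDED if has_external_grant else TRIAL


def can_publish(dataset_mb, has_external_grant):
    """True if a dataset fits under the publication limit for the project's tier."""
    return dataset_mb <= allocation_for(has_external_grant).publish_limit_mb
```

For example, a 200 MB dataset would exceed the trial limit but fit comfortably within a grant-funded project's limit.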

Page 26: ESI Supplemental 1   E-research Support Slides

II. Building a Data Curation Program and Repository (cont’d)

• PURR Digital Preservation Policy approved April 2012: http://www.lib.purdue.edu/spcol/content/PURRdigitalpreservationpolicy.pdf
• Working Group report on three year funding requirements
  • One time - $1.2 M – received January 2012.
  • Ongoing costs - $194,000 / year.
• Ongoing costs: F&A? Charge Back?

Page 27: ESI Supplemental 1   E-research Support Slides

[Figure: PURR – Data Management, Discovery, Preservation]

Page 28: ESI Supplemental 1   E-research Support Slides

OVERVIEW OF PURR

[Diagram: PURR and the roles of its partners]

• ITaP: Infrastructure (HUBzero™)
• Libraries: Data Services (Reference & Consulting) & Preservation
• Researchers: Research Collaboration, Data Management, Publishing & Archiving
• OVPR/SPS: Policy, Submission, and Grant Compliance

Page 29: ESI Supplemental 1   E-research Support Slides

OVERVIEW OF PURR

• Collaboration of ITaP, Libraries, and OVPR

• Based on HUBzero, provides a hub for Purdue researchers and their collaborators to use, manage, and share their data

• Comprehensive resource for supporting research data management (Knowledge Base, tutorials, example plans, boilerplate text, ask questions, etc.)

• Approximately 1/3 of NSF proposals submitted from Purdue last year included PURR as a component of their data management plans

• Purdue researchers are not required to use PURR. Other options may be appropriate such as center facilities or disciplinary repositories.

Page 30: ESI Supplemental 1   E-research Support Slides

WHY USE PURR?

PURR can be used for:
• Managing Data
• Publishing Data
• Preserving Data
• Research Collaboration

Page 31: ESI Supplemental 1   E-research Support Slides

QUICK START
http://research.hub.purdue.edu

What can be done right now:
– Create an account
– Create a project
  • A default allocation of storage is free; more can be purchased if needed
– Invite collaborators
– Upload data to project
– Publish and/or archive datasets with Digital Object Identifiers (DOI)
– Search, browse, and cite published datasets

Page 32: ESI Supplemental 1   E-research Support Slides

Overview model of PURR functions

PURR FUNCTIONS: STEP 1

[Diagram: PURR workflow. Labels: Create research, data generation/collection; Data mgmt planning resources; Creating projects, collaborating; IF grant awarded, more space; Data submitted for publishing/archiving; Curation; Uncurated data; Discovery & Dissemination; Discovery commitment ends, long term preservation decision; Long term preservation]

Researchers are guided to PURR for help with data mgmt plans by Pre-Awards, workshops and promotion, and by word-of-mouth.

Page 33: ESI Supplemental 1   E-research Support Slides

PURR FUNCTIONS: STEP 2

[Diagram: the PURR workflow repeated from the Step 1 slide, with phase labels PLAN, DEVELOP PROJECT, EXPAND, PUBLISH DATA, DISSEMINATE DATA]

Researchers can create projects at any time and invite others to join; the goal is to help facilitate research development.

Page 34: ESI Supplemental 1   E-research Support Slides

Overview model of PURR functions

PURR FUNCTIONS: STEP 3

[Diagram: the PURR workflow repeated from the Step 1 slide]

Once a grant is awarded, researchers get an increase in space allocation and length of time for project and data.

Page 35: ESI Supplemental 1   E-research Support Slides

PURR FUNCTIONS: STEP 4

[Diagram: the PURR workflow repeated from the Step 1 slide]

To make data sets publicly discoverable and available, there is a submission and “publishing” process.

Page 36: ESI Supplemental 1   E-research Support Slides

PURR FUNCTIONS: STEP 5

[Diagram: the PURR workflow repeated from the Step 1 slide]

PURR policy allows for a specified time for discovery, and then decisions are made regarding long-term preservation.

Page 37: ESI Supplemental 1   E-research Support Slides

WHERE CAN I GO FOR HELP?

Overall help: Librarians (link to subject librarians directory or name)

Data Services: http://www.lib.purdue.edu/research/dataservices

Librarians consult on best practices for data formats, metadata, sharing, reuse, and archiving; review plans; write letters of support; and collaborate as partners/co-PIs on proposals.

Grant preparation: Sponsored Programs Services (SPS)

PURR Website: http://research.hub.purdue.edu

Page 38: ESI Supplemental 1   E-research Support Slides

Retrieval and Citation

• Establish easier access to scientific research data on the Internet.
• Increase acceptance of research data as legitimate, citable contributions to the scientific record.
• Support data archiving that will permit results to be verified and re-purposed for future study.

http://www.datacite.org/

Page 39: ESI Supplemental 1   E-research Support Slides

Linking of Dataset to Article

The DOI system offers an easy way to connect the article with the underlying data:

The dataset:
Kuhlmann, H et al. (2009): Age models, iron intensity, magnetic susceptibility records and dry bulk density of sediment cores from around the Canary Islands. doi:10.1594/PANGAEA.727522

Is supplement to the article:
Kuhlmann, Holger; Freudenthal, Tim; Helmke, Peer; Meggers, Helge (2004): Reconstruction of paleoceanography off NW Africa during the last 40,000 years: influence of local and regional factors on sediment accumulation. Marine Geology, 207(1-4), 209-224, doi:10.1016/j.margeo.2004.03.017
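The two DOIs on this slide can be turned into resolvable links by prefixing them with the public DOI resolver address. A minimal sketch, assuming the `https://doi.org/` resolver; the helper name `doi_to_url` is hypothetical and no network request is made.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a DOI (with or without a leading 'doi:') into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Keep '/' literal; percent-encode any other reserved characters.
    return DOI_RESOLVER + quote(doi, safe="/")


dataset_url = doi_to_url("doi:10.1594/PANGAEA.727522")
article_url = doi_to_url("doi:10.1016/j.margeo.2004.03.017")
```

Because both the dataset and the article carry their own DOI, a citation can point to either one independently, which is what makes the dataset a citable contribution in its own right.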

Page 40: ESI Supplemental 1   E-research Support Slides

Retrieval and Citation

• Establish easier access to scientific research data on the Internet.
• Increase acceptance of research data as legitimate, citable contributions to the scientific record.
• Support data archiving that will permit results to be verified and re-purposed for future study.

http://www.datacite.org/

Page 41: ESI Supplemental 1   E-research Support Slides

In the United States, three DataCite members provide DOIs for datasets:

http://datacite.org/DataCiteUS

Page 42: ESI Supplemental 1   E-research Support Slides

There is no one right way to sustain e-science or data management; each institutional environment will be different and will require its own unique collaborations or roles.

Page 43: ESI Supplemental 1   E-research Support Slides

Thank you

Questions:

[email protected]
