esi supplemental 1 e-research support slides
DESCRIPTION
E-Research Support at Johns Hopkins University & Purdue University. Supplemental Webinar, Wednesday, October 17, 2012. Presented by Sayeed Choudhury & James Mullins
TRANSCRIPT
DuraSpace/ARL/DLF E-Science Institute
E-Research Support at
Johns Hopkins University & Purdue University
Supplemental Webinar
Wednesday, October 17, 2012
1:00-2:30 pm EDT
E-Research Support at Johns Hopkins University
Presented by Sayeed Choudhury,
Johns Hopkins University, Sheridan Libraries
Associate Dean for Library Digital Programs & Director, Hodson Digital Research & Curation Center
Data Conservancy
• Data Conservancy (DC) is a community that develops solutions for data preservation and sharing to promote cross-disciplinary re-use.
• DC Service Instance: data centric hardware, software, components, and APIs within an organizational context – installed at Johns Hopkins University and National Snow and Ice Data Center
Data Sharing Attributes
• Feature Extraction Framework that atomizes data into constituent parts for indexing, metadata extraction, etc.
• Discipline agnostic data model (inspired by PLANETS project)
• Provenance and Lineage service
• Spatial, temporal and (soon) taxonomic query capabilities
• Sustainability through diverse funding from Johns Hopkins University, direct charges to NSF grants, other grants and community development
Data Management Layers
Layer | Characteristics | Implication for PI | Implication relative to NSF
Curation | Adding value throughout life-cycle | Feature extraction; new query capabilities; cross-disciplinary | Competitive advantage; new opportunities
Preservation | Ensuring that data can be fully used and interpreted | Ability to use own data in the future (e.g. 5 yrs); data sharing | Satisfies NSF needs across directorates
Archiving | Data protection including fixity, identifiers | Provides identifiers for sharing, references, etc. | Could satisfy most NSF requirements
Storage | Bits on disk, tape, cloud, etc.; backup and restore | Responsible for: restore, sharing, staffing | Could be enough for now but not near-term future
Establishing the JHU DMS
• May 2010: NSF announces DMP expectations
• Services incubated and scoped summer/fall 2010
  – Build on Data Conservancy expertise
• Proposed in January and launched in July 2011
  – Consultative data management planning services to support NSF proposals
  – Post award data management services
• Assessment of service in March 2012
Background work to scope services
• Review of data management plan best practices and development of questionnaire
• Piloted data management consultations as cases
• Short data survey with over 70 JHU researchers
• Analysis of JHU NSF proposal and award activity
• Business school capstone project on storage options and costs
• Review of past data archiving projects and work
Proposing data management services
• Services scoped to support anticipated NSF requirements and to reflect system capabilities
  – Defined time limits, volume of data deposited per project, unencumbered data only for now
• Prepared budget for services
  – Five year timeframe for costs
  – All costs included: staffing, hardware, overhead, etc.
  – Cost assumptions included: total data archived, complexity of data prep for ingest
Developing financial model
Support secured and financial model established
• Data management planning for NSF proposals
  – Service directly funded by schools
  – Each school pays percentage according to 3 year average of total NSF proposals submitted
• Post award data management
  – Fee based service billed through a service center
  – First year fee a percent of total direct costs on grant
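The arithmetic behind this financial model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the school names, proposal counts, and the 2% fee rate are hypothetical, since the slides give neither the actual figures nor the actual percentage.

```python
from statistics import mean


def school_shares(proposals_by_school):
    """Split the planning-service cost among schools.

    proposals_by_school maps a school name to its NSF proposal counts
    for the last three years. Each school's share is its 3-year average
    divided by the sum of all schools' 3-year averages.
    """
    averages = {school: mean(counts) for school, counts in proposals_by_school.items()}
    total = sum(averages.values())
    if total == 0:
        raise ValueError("no proposals submitted; shares are undefined")
    return {school: avg / total for school, avg in averages.items()}


def first_year_fee(total_direct_costs, rate):
    """Post-award fee: a percentage of the grant's total direct costs."""
    return total_direct_costs * rate


# Hypothetical inputs, for illustration only.
counts = {
    "School A": [30, 40, 50],
    "School B": [10, 10, 10],
    "School C": [60, 50, 40],
}
shares = school_shares(counts)
fee = first_year_fee(200_000, 0.02)  # the 2% rate is an assumption, not from the slides
```

Averaging over three years smooths out a single unusually busy or quiet proposal year, so a school's contribution does not swing sharply from one budget cycle to the next.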
JHU Data Management Services team
Dedicated group (that collaborates with DC and Digital Research and Curation Center)
• Two data management consultants
• Senior technical consultant (part-time)
• Software developer
• System administrator (to be hired)
• Interim manager (part-time)
Service marketing
• Reach out through all stakeholders
  – Announcements through Deans
  – Work with research projects administration
  – Outreach to department administrators
  – Briefings with library colleagues/departments
  – Presentations to researchers, graduate students
• More to do... and then repeat!
Observations
• Role of Choudhury as NSF PI within JHU
• Sheridan Libraries R&D and experience with scientific data
• Already embedded within research enterprise
• Specifics will vary by institution but JHU approach can be generalized...
• ...But each institution should consider appropriate role(s) or approach
Resources
• http://dataconservancy.org
• Alpha release of software - https://dataconservancy.org/software/downloads/
• http://dmp.data.jhu.edu
• Reviewer guidelines for data management plans - http://dmp.data.jhu.edu/assistance/grant-reviewers-worksheet-for-data-management-plans/
Acknowledgements
• NSF Award OCI-0830976
• Sheridan Libraries financial support
• Johns Hopkins University financial support
• Data Conservancy colleagues for their exceptional work and patience
Questions
Libraries
An overview of Sustaining e-Science Collaboration in an Academic Research Library – the Purdue Experience
James L. Mullins, PhD
Dean of Libraries & Esther Ellis Norton Professor
October 17th, 2012
What is meant when we say the library has a role in sustaining e-science?
• Application of library and archival science principles and theory to data management.
• Collaboration of Libraries with faculty, information technology, research office, and sponsored programs to develop a process and repository to manage and preserve data.
I. Background and Development of the Libraries collaboration with e-Science at Purdue – on local and national levels.
• Local – Conversations with researchers, research office, etc.
• Local – Principles of library and archival sciences.
• Local – Restructuring of Libraries.
• National – NSF Data Management dialogue.
• Local – Creation of Data Research Scientist.
• Local – Librarians not able “to service” funded research.
• Local – Librarians with professorial rank and tenure (start-up package of $40,000+).
• Local – Distributed Data Curation Center (D2C2)
I. Background and Development of the Libraries involvement in e-Science at Purdue – on local and national levels (cont'd).
• National – IMLS grant to develop Data Curation Profiles.
• Local – Partnerships of subject liaison librarians and faculty.
• Local – Re-definition of librarian roles.
• Local – Collaboration/advising on data management librarian role.
• National – IMLS grant to develop Data Information Literacy.
• National – Develop/teach ICPSR data science curriculum.
• National – IMLS grant to develop Databib-DMPTool collaboration.
• International – DataCite-Databib collaboration.
• Local – Society of American Archivists (SAA) workshop.
Sustainability – applied expertise of librarians.
• Must be integrated into role of librarians.
• New positions must be created (data curation specialists, etc.).
• Priority for new positions must be established with a total view of strategic growth areas (at Purdue: data management and information literacy).
• Salaries partially funded through sponsored research, making funds available for other positions and graduate research assistants.
• Cluster hires with colleges and schools.
• Critical role of librarians in research garners additional support from University Administration.
Research Collaborations 2012/2013
• Big Data and Complex System Analytics to Enhance Society's Resilience w/ Agronomy
• Human Rights Texts for Digital Research: Archiving and Analyzing Amnesty International’s Historic Urgent Action Bulletins w/ Political Science
• A Cross-Disciplinary Design Thinking Research Symposium to Catalyze Groundbreaking Research and Practice w/ Engineering Education
• Establishing a Materials Center for Agriculture, Food and Health w/ Food Science
II. Building a Data Curation Program and Repository
• Not done independently of librarians' knowledge & support structure within Libraries.
• In 2006, collaboration built around Purdue’s HUBzero platform in answer to NSF DataNet RFP.
• 2007 – 2010: Provost informed of impending data management mandate.
• May 2010 – NSF announcement.
• Summer 2010 – Provost and VPR appoint taskforce of faculty researchers, co-chaired by CIO and dean of libraries, to develop “template.” Report written August 2010.
Purdue University Research Repository – PURR
II. Building a Data Curation Program and Repository (cont'd)
• 2010 –
  • Commitment to develop repository jointly by ITaP, OVPR, and Libraries - $90K
  • Working Group created to plan and develop Purdue University Research Repository (PURR).
  • Workshops sponsored by OVPR, conducted by Libraries and ITaP.
  • Libraries create resources to support faculty in developing DMPs.
II. Building a Data Curation Program and Repository (cont'd)
• 2011/2013
  • Libraries budget request indicated need for positions to support sustainable data curation.
  • 479 grant proposals to date include PURR in data management plans.
  • 36 grants (so far) awarded with PURR as DMP.
  • TRAC certification underway – ISO 16363.
II. Building a Data Curation Program and Repository (cont'd)
• What is provided by PURR? Any Purdue faculty, graduate student, or staff can:
  • Create a trial project of 500 MB for three years.
  • External funding project receives 100 GB for ten years.
  • Invite collaborators to join from other institutions.
  • Datasets can be published w/o grant: 50 MB; with, 10 GB.
  • Each project receives to-do lists to manage projects.
  • Wiki area for notes.
  • Micro-blogging interface (similar to Facebook) for discussion among team.
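The allocation rules on this slide can be written down as a small lookup, sketched here in Python. The class and function names are hypothetical and this is not PURR's actual implementation; it also assumes decimal units (1 GB = 1000 MB) and that "with grant" on the slide means the same thing as an externally funded project.

```python
from dataclasses import dataclass

MB = 1
GB = 1000 * MB  # assumption: decimal units; the slide does not say


@dataclass(frozen=True)
class Allocation:
    storage_mb: int        # project workspace quota
    duration_years: int    # how long the project space is kept
    publish_limit_mb: int  # maximum size of a published dataset


# Figures taken from the slide.
TRIAL = Allocation(storage_mb=500 * MB, duration_years=3, publish_limit_mb=50 * MB)
FUNDED = Allocation(storage_mb=100 * GB, duration_years=10, publish_limit_mb=10 * GB)


def allocation_for(has_external_funding: bool) -> Allocation:
    """Pick the allocation tier for a project."""
    return FUNDED if has_external_funding else TRIAL


def can_publish(dataset_mb: int, has_external_funding: bool) -> bool:
    """Check a dataset against the publishing limit for the project's tier."""
    return dataset_mb <= allocation_for(has_external_funding).publish_limit_mb
```

For example, a 60 MB dataset exceeds the 50 MB limit for a trial project but fits comfortably within the 10 GB limit once the project has a grant.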
II. Building a Data Curation Program and Repository (cont'd)
• PURR Digital Preservation Policy approved April 2012: http://www.lib.purdue.edu/spcol/content/PURRdigitalpreservationpolicy.pdf
• Working Group report on three year funding requirements
  • One time - $1.2 M – received January 2012.
  • Ongoing costs - $194,000 / year.
• Ongoing costs: F&A? Charge back?
OVERVIEW OF PURR
[Diagram: PURR (data management, discovery, preservation) at the center, connected to four partners:]
• ITaP: Infrastructure (HUBzero™)
• Libraries: Data Services (Reference & Consulting) & Preservation
• Researchers: Research Collaboration, Data Management, Publishing & Archiving
• OVPR/SPS: Policy, Submission, and Grant Compliance
OVERVIEW OF PURR
• Collaboration of ITaP, Libraries, and OVPR
• Based on HUBzero, provides a hub for Purdue researchers and their collaborators to use, manage, and share their data
• Comprehensive resource for supporting research data management (Knowledge Base, tutorials, example plans, boilerplate text, ask questions, etc.)
• Approximately 1/3 of NSF proposals submitted from Purdue last year included PURR as a component of their data management plans
• Purdue researchers are not required to use PURR. Other options may be appropriate such as center facilities or disciplinary repositories.
WHY USE PURR ?
PURR can be used for…
Managing Data
Publishing Data
Preserving Data and
Research Collaboration
QUICK START
http://research.hub.purdue.edu
What can be done right now:
– Create an account
– Create a project
  • You get a default allocation of storage for free and can purchase more if you need it
– Invite collaborators
– Upload data to project
– Publish and/or archive datasets with Digital Object Identifiers (DOI)
– Search, browse, and cite published datasets
Overview model of PURR functions
[Diagram, repeated on each of the five step slides: a research data lifecycle running through the stages PLAN, DEVELOP PROJECT, EXPAND, PUBLISH DATA, and DISSEMINATE DATA. Diagram labels: Data mgmt planning resources; Initiate research, data generation/collection; Creating projects, collaborating; IF grant awarded, more space; Uncurated data; Data submitted for publishing/archiving; Curation; Discovery & Dissemination; Discovery commitment ends, long term preservation decision; Long term preservation.]
PURR FUNCTIONS STEP 1: Researchers are guided to PURR for help with data mgmt plans by Pre-Awards, workshops and promotion, and by word-of-mouth.
PURR FUNCTIONS STEP 2: Researchers can create projects at any time and invite others to join; the goal is to help facilitate research development.
PURR FUNCTIONS STEP 3: Once a grant is awarded, researchers get an increase in space allocation and length of time for project and data.
PURR FUNCTIONS STEP 4: To make data sets publicly discoverable and available, there is a submission and “publishing” process.
PURR FUNCTIONS STEP 5: PURR policy allows for a specified time for discovery, and then decisions are made regarding long-term preservation.
WHERE CAN I GO FOR HELP?
Overall help: Librarians (link to subject librarians directory or name)
Data Services: http://www.lib.purdue.edu/research/dataservices
Librarians consult on best practices for data formats, metadata, sharing, reuse, archiving, review plans, write letters of support, and collaborate as partners/co-PI’s on proposals.
Grant preparation: Sponsored Programs Services (SPS)
PURR Website: http://research.hub.purdue.edu
• Establish easier access to scientific research data on the Internet.
• Increase acceptance of research data as legitimate, citable contributions to the scientific record.
• Support data archiving that will permit results to be verified and re-purposed for future study.
http://www.datacite.org/
Retrieval and Citation
The DOI system offers an easy way to connect the article with the underlying data:
The dataset:
Kuhlmann, H et al. (2009): Age models, iron intensity, magnetic susceptibility records and dry bulk density of sediment cores from around the Canary Islands. doi:10.1594/PANGAEA.727522
Is supplement to the article:
Kuhlmann, Holger; Freudenthal, Tim; Helmke, Peer; Meggers, Helge (2004): Reconstruction of paleoceanography off NW Africa during the last 40,000 years: influence of local and regional factors on sediment accumulation. Marine Geology, 207(1-4), 209-224, doi:10.1016/j.margeo.2004.03.017
Linking of Dataset to Article
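A minimal Python sketch of how such a link might be represented follows. It turns the two DOIs from the slide into resolvable URLs through the public https://doi.org/ resolver and records the relation between them. The `doi_to_url` helper and the dictionary layout are hypothetical, and the relation label `IsSupplementTo` simply mirrors the slide's "Is supplement to" wording.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as 'doi:10.1594/PANGAEA.727522' into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi, safe="/")


# The two DOIs come from the slide; the record structure is an assumption.
link = {
    "dataset": doi_to_url("doi:10.1594/PANGAEA.727522"),
    "relation": "IsSupplementTo",
    "article": doi_to_url("doi:10.1016/j.margeo.2004.03.017"),
}
```

Because both ends of the link are persistent identifiers, the relation stays valid even if the publisher or the data repository changes the landing page address.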
In the United States, three DataCite members provide DOIs for datasets:
http://datacite.org/DataCiteUS
There is no single right way to sustain e-science or data management; each institutional environment will be different and will require its own unique collaborations or roles.
Thank you
Questions: