Page 1: Data management plans archeology class 10 18 2012

Data Management Plans: Graduate Research and Beyond

Elizabeth Brown
Scholarly Communications and Library Grants Officer
Binghamton University Libraries
October 18, 2012


What we’ll cover today

What is an NSF Data Management Plan? How and why was it created?

Why are Libraries a part of data management?

(Short Break)

Creating and Implementing NSF Data Management Plans

Preserving Research Data after a project is completed


Learning Objectives

To:

Understand current NSF and government data policy requirements.

Be aware of research support services within the Libraries.

Locate and use various resources to develop data management plans (DMPs) for NSF proposal(s).

Write a comprehensive DMP for NSF proposal(s).

Identify and plan for long-term preservation of research data from funded projects.


What is a Data Management Plan?


What is a Data Management Plan?

Storing Research Data “Forever”
Serge Goldstein, Associate CIO & Director of Academic Services, Princeton University
Fall 2010 Coalition for Networked Information



Some Handy Definitions

Cyberinfrastructure: computing resources & networks, services, & people

Data management: technical processing and preparation of data for analysis

Data curation: selection of data for preservation and adding value for current and future use

Data citation: mechanisms to enable easy reuse and verification, track impact of data, and create structures to recognize and reward researchers (DataCite)

Data sharing: must take into account ethical and legal issues; a spectrum with many options

Source: Heather Coates and Kristi Palmer, “Data management plans & planning: Meeting the NSF Requirement,” March 7, 2012


NSF DMP Requirements by Unit



Why were Data Management Plans created?



Why the NSF created this requirement



Why the NSF created the DMP requirement



Why the NSF created the DMP requirement



Acknowledging: Open is a movement

Open Access

Open Educational Tools

Open Standards

Open Science

Open Source

Dorothea Salo, Battle of the Opens, Book of Trogool, March 15, 2010


Acknowledging: Publishing is changing

Houghton, J.W. (2011). "The costs and potential benefits of alternative scholarly publishing models." Information Research, 16(1), paper 469.


Acknowledging: Scholarly impact measures


Acknowledging: Accountability of funding agencies



How will DMPs help me?

Let’s think about it… (discussion)


How will DMPs help me?

Saves time: less reorganization for future projects

Increases efficiency: compile and prioritize data collection(s); anticipate how your data will be used

Consider data preservation requirements and plan for them

Become more aware of funding agency mandates and the data preservation culture in your field


How are Libraries a part of this?


Libraries Support Scholarship


• Access
• Services
• Cultural Memory
• Preservation


Support for Scholarship is evolving

URLs: A42/2/GKL1012

Print Archives, Collections

Electronic Content, Databases

Research Data


Libraries’ Research Data Support


NSF Data Management Plan Info


NSF Data Management Plan support


Find funder requirements

Locate sample plans

Write, edit, review plans


Copyright information, guidance


Copyright terms

Locating owners

Classroom exceptions


Information and Policy Updates



Creating and Implementing Data Management Plans


Consider the Research Life Cycle

Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008.


DMPTool: Funder Requirements Info, Templates



DMP Sections

1. Types of Data
2. Data and Metadata Standards
3. Policies for Access and Sharing; Data Privacy and Protection
4. Data Re-use and Re-distribution
5. Data Archiving and Preservation


1. Types of Data

Expected data. The DMP should describe the types of data, samples, physical collections, software, curriculum materials, or other materials to be produced in the course of the project. It should then describe the expected types of data to be retained.

The Federal government defines ‘data’ in OMB Circular A-110 as follows: “Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This ‘recorded’ material excludes physical objects (e.g., laboratory samples). Research data also do not include: (A) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and (B) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.”

PIs should use the opportunity of the DMP to give thought to matters such as:
• The types of data that their project might generate and eventually share with others, and under what conditions
• How data are to be managed and maintained until they are shared with others
• Factors that might impinge on their ability to manage data, e.g. legal and ethical restrictions on access to non-aggregated data
• The lowest level of aggregated data that PIs might share with others in the scientific community, given that community’s norms on data
• The mechanism for sharing data and/or making them accessible to others
• Other types of information that should be maintained and shared regarding data,



DMP Sample: I. Types of Data

“This research project will generate data resulting from sensor recordings (i.e. earth pressures, accelerations, wall deformation and displacement and soil settlement) during the centrifuge experiments. In addition to the raw, uncorrected sensor data, converted and corrected data (in engineering units), as well as several other forms of derived data will be produced. Metadata that describes the experiments with their materials, loads, experimental environment and parameters will be produced. The experiments will also be recorded with still cameras and video cameras. Photos and videos will be part of the data collection.”

“A total storage demand of 50 GB is anticipated at the University of Michigan, and 50 GB at Auburn University.”

“Based on the previous viscoelastic turbulent channel flow simulations, the amount of resulting binary data is estimated around 40 TB per year. Some text format data files are also required for post-processing in the laboratory and are anticipated to be around 1 TB per year.”

“In one year, we will perform approximately 2 to 3 simulations. This means ~100 3D plots, 30 restart files, 1000 EUV, X-ray and LASCO-like images, 10 satellite files, 1000 2D plot files (total of about 150 GB of data per year).”
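The samples above state storage as annual volumes, while a storage provider usually needs a single total. A minimal sketch of that arithmetic follows. The project length and the number of stored copies are hypothetical planning assumptions, not figures from the samples.

```python
def total_storage_tb(annual_tb: float, years: int, copies: int = 1) -> float:
    """Estimate the total storage to request for a project.

    annual_tb: data produced per year, e.g. ~40 TB binary + ~1 TB text.
    years:     project duration (an assumption made by the planner).
    copies:    number of stored copies, e.g. a primary plus one backup.
    """
    if annual_tb < 0 or years < 0 or copies < 1:
        raise ValueError("annual_tb and years must be non-negative, copies >= 1")
    return annual_tb * years * copies


# Volumes from the turbulent channel flow sample: ~40 TB binary + ~1 TB text per year.
# The 3-year duration and 2 copies are hypothetical assumptions for illustration.
annual = 40 + 1
print(total_storage_tb(annual, years=3, copies=2))  # 246
```

Counting copies explicitly keeps the request honest: a plan that promises backups needs storage for them too.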



DMP Sample: I. Types of Data

“The data, samples, and materials expected to be produced will consist of laboratory notebooks, raw data files from experiments, experimental analysis data files, simulation data, microscopy images, optical images, LabView acquisition programs, and quantum dot superlattice nanowire thermoelectric samples.... each of these data is described below:

A. Laboratory notebooks: The graduate student and PI will record by hand any observations, procedures, and ideas generated during the course of the research.

B. Experimental raw data files: These files will consist of ASCII text that represents data directly collected from the various electrical instruments used to measure the thermoelectric properties of the superlattice nanowire thermoelectric devices.

C. Experimental analysis data files: These files will consist of spreadsheets and plots of the raw data mentioned in Part A. The data in these files will have been manipulated to yield meaningful and quantitative values for the device efficiency and ZT. The analysis will be performed using best practice and acceptable methods for calculating device efficiency and ZT.

D. Simulation data: These data will represent the results from commercially available simulation and modeling software to model the quantum confinement.

E. Microscopy images: Images of the proposed silicon nanostructures will be generated by scanning electron microscopy (SEM), transmission electron microscopy (TEM) at high resolution to quantify wire diameter and roughness, and atomic force microscopy (AFM).



2. Data Formats and Metadata
3. Policies for Access and Sharing; Data Privacy and Protection
4. Data re-use and re-distribution

Data formats and dissemination. The DMP should describe data formats, media, and dissemination approaches that will be used to make data and metadata available to others. Policies for public access and sharing should be described, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements. Research centers and major partnerships with industry or other user communities must also address how data are to be shared and managed with partners, center members, and other major stakeholders.



3. Policies for Access and Sharing; Data Privacy and Protection
4. Data re-use and re-distribution

Period of data retention. SBE is committed to timely and rapid data distribution. However, it recognizes that types of data can vary widely and that acceptable norms also vary by scientific discipline. It is strongly committed, however, to the underlying principle of timely access, and applicants should address how this will be met in their DMP statement.



DMP Sample: II. Data Formats and Metadata

“The Dublin Core will be used as the standard for metadata. The metadata set mainly consists of fifteen elements, including title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights. These elements have been ratified as both national (i.e., ANSI/NISO Standard Z39.85) and international standards (i.e., ISO Standard 15836). Further, they describe resources such as text, video, audio, and data files. These standard formats will be used in our study.”

“For each code made available, a user's manual will be provided with instructions for compiling the source codes, installing and running the codes, formulating input data streams, and visualizing the output. Documentation will be in PDF format.”
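The Dublin Core sample above names fifteen elements. A minimal sketch of producing such a record with Python's standard `xml.etree.ElementTree` follows, using the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. The `<metadata>` wrapper element and the sample field values are hypothetical. A repository typically defines its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements named in the sample DMP.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields: dict) -> str:
    """Serialize a dict of Dublin Core fields to an XML string.

    Each value may be a string or a list of strings (elements are repeatable).
    The <metadata> wrapper element is a hypothetical choice for this sketch.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")
    for name in DC_ELEMENTS:  # emit elements in a stable order
        values = fields.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record for one experiment's data file.
print(dublin_core_record({
    "title": "Centrifuge test 1: wall displacement readings",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2012-10-18",
    "format": "text/tab-separated-values",
    "language": "en",
}))
```

Rejecting unknown field names catches typos such as `author` for `creator` before a record reaches a repository.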



DMP Sample: II. Data Formats and Metadata

“Verilog, SPICE, and MATLAB files generated will be processed and submitted to FTP servers as .mat files with TXT documentation. The data will be distributed in several widely used formats, including ASCII, tab-delimited (for use with Excel), and MAT format. Instructional material and relevant technical reports will be provided as PDF. Digital video data files generated will be processed and submitted to the FTP servers in MPEG-4 (.mp4) and .avi formats. Variables will use a standardized naming convention consisting of a prefix, root, suffix system.”

“Plasma image data will be RGB colored JPG or TIFF format with resolution determined by the camera. Video data will be RGB colored AVI format.”

“Images from the scanning electron microscopes (SEMs) and focused ion beam workstations (FIBs) are saved in tagged image file format (TIFF), which is readily readable by a wide variety of imaging and processing applications.”
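One sample above promises ASCII tab-delimited files "for use with Excel" and a prefix, root, suffix naming convention, but it does not spell the convention out. A minimal sketch with Python's `csv` module follows. The underscore-joined scheme (stage, quantity, unit, as in `raw_disp_mm`) is a hypothetical example of such a convention, not the one used in the sample.

```python
import csv
import io


def var_name(prefix: str, root: str, suffix: str) -> str:
    """Build a variable name under a hypothetical prefix_root_suffix scheme,
    e.g. prefix = processing stage, root = quantity, suffix = unit."""
    for part in (prefix, root, suffix):
        if not part.isalnum():
            raise ValueError(f"name parts must be alphanumeric: {part!r}")
    return f"{prefix}_{root}_{suffix}"


def to_tab_delimited(header: list[str], rows: list[list]) -> str:
    """Return plain-text, tab-delimited data that spreadsheet programs can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


header = [var_name("raw", "disp", "mm"), var_name("raw", "load", "kN")]
text = to_tab_delimited(header, [[0.1, 2.5], [0.2, 2.75]])
print(text, end="")  # one header line, then two data lines; fields separated by tabs
```

To write a file rather than a string, pass a file opened with `open(path, "w", newline="", encoding="ascii")` to `csv.writer`.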



DMP Sample: III. Access and Sharing Policies

III. Policies for access and sharing and provisions for appropriate protection/privacy

As detailed in the project description, the CARE platform is intended to be a research cloud service that provides analytical middleware for use in analyzing health data. During the project, access will be limited to project team members and invited expert stakeholders through a password-protected website. Commencing with Task 5 (month 26), means for access by the broader research community will be implemented. At that time, the project team will determine whether there is a need for initiating access charges, which may be appropriate for securing the longer-term sustainability of the CARE platform and analysis tools.

All of the data that will be utilized are publicly available data sets that have been de-identified by public agencies and have passed their standards for privacy protection and assurance so that no individually identifiable data is provided. The datasets to be utilized within this project and other intellectual property have been released without restriction.

Over the course of the study, the project team will meet with both the Community Health Institute and the SafeRoadMaps/CERS team to arrive at a data-sharing agreement for post-project utilization of their data. Such an agreement will provide a model not only for this partnership, but also for licensing the CARE Platform analytics for use by other health data sets.



DMP Sample: IV. Reuse and Distribution

“After uploading the data into the NEES Project Warehouse and allowing public access, all data will be available for re-use and re-distribution with proper acknowledgement of their originators.”

“Researchers and practitioners in diverse fields will be able to readily reuse and redistribute shared data. Terms of use will include the prohibition of commercial use of the work; modifications of the work will be allowed with the proper citations.”

“The simulation code will be developed in C and provided to the public in source code format for non-commercial use under GNU General Public License (GPL).”


DMP Sample: IV. Data Reuse and Distribution

“Before data is stored, it will be stripped of all institutional and individual identifiers to ensure confidentiality by staff of the Center following procedures developed by the researchers.”

“Audio files of interviews will be stored on a password protected secure server during the study and for two years after, and destroyed subsequently.”

“Exceptions to shared data include proprietary DTE GIS utility information (for security reasons) and software code of commercial interest to the project's GOALI partners or identified licensees. Both exceptions are permitted by the ENG DMP policy.... The research team will however develop a set of 3D GIS datasets for distribution to the public. These datasets will represent non-existent buried infrastructure and will only be useful for the evaluation of the other research products.”
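The first sample above commits to stripping institutional and individual identifiers before data are stored. A minimal sketch of one way to do this for tabular records follows. It drops identifier columns and replaces each participant ID with a sequential study code. The column names are hypothetical; a real project would take its identifier list from its IRB protocol. Removing named columns alone does not guarantee anonymity, because combinations of remaining fields can still identify people.

```python
from typing import Iterable

# Hypothetical direct-identifier columns; a real list would come from the
# project's IRB protocol and data dictionary.
IDENTIFIER_FIELDS = frozenset({"name", "email", "phone", "institution"})


def deidentify(records: Iterable[dict], identifier_fields=IDENTIFIER_FIELDS):
    """Drop identifier fields and replace 'participant_id' with a study code.

    Returns (clean_records, key). The key maps original IDs to study codes
    and must be stored separately from the data, or destroyed.
    """
    key: dict[str, str] = {}
    clean = []
    for record in records:
        pid = record.get("participant_id")
        if pid is not None and pid not in key:
            key[pid] = f"P{len(key) + 1:03d}"  # P001, P002, ...
        row = {
            field: value
            for field, value in record.items()
            if field not in identifier_fields and field != "participant_id"
        }
        if pid is not None:
            row["study_code"] = key[pid]
        clean.append(row)
    return clean, key


rows = [
    {"participant_id": "u123", "name": "A. Smith", "institution": "X Univ.", "answer": 4},
    {"participant_id": "u456", "name": "B. Jones", "answer": 2},
    {"participant_id": "u123", "name": "A. Smith", "answer": 5},
]
clean, key = deidentify(rows)
print(clean)
# [{'answer': 4, 'study_code': 'P001'}, {'answer': 2, 'study_code': 'P002'}, {'answer': 5, 'study_code': 'P001'}]
```

Returning the key separately, instead of writing it into the cleaned data, mirrors the samples' pattern of keeping sensitive material on a restricted server for a limited time and then destroying it.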


DMP Sample: IV. Reuse and redistribution

IV. Policies and provisions for re-use, re-distribution

As noted in the project description, policies for provision and re-use will be developed as part of the research project. It is anticipated that there will be considerable interest in the platform and tools within the research and practice community, including academic researchers, health research agencies, and cloud service providers, among others. The need for such a tool was identified during a recent NSF-sponsored symposium on Health Cyberinfrastructure, which was conducted by the PIs.


5. Data Archiving and Preservation

Data storage and preservation of access. The DMP should describe physical and cyber resources and facilities that will be used for the effective preservation and storage of research data. These can include third party facilities and repositories.



DMP Sample: V. Archiving and Preservation

V. Plans for archiving and preservation of access

The project website and service will contain all appropriate information and documentation for using the CARE platform and tool for health research discovery and analysis. The site will also contain all references, research papers, and related products developed throughout the course of the project.

The San Diego Supercomputer Center (SDSC) at UC San Diego will host the data throughout the research project and provide a minimum of three years of online access beyond the completion of the project. Data storage will be performed at the nominal rates charged by SDSC to any project using the facility. These are relatively modest (~$1000/TB) and can be borne ahead of time for the 3-year period. Should the CARE platform not extend beyond the three years (post grant), the data could then be archived at SDSC at even lower cost. A decision would have to be made at that point in time regarding how exactly to archive the data, and on paying for the archival storage.


DMP Sample: V. Archiving and Preservation

“For archiving, the data along with any related publications will be deposited in Libra, the UVA archival system, with an appropriate licensing statement. DOIs will be attached to all data stored from this project. Since the current preservation plan for Libra is indefinite data storage, preservation of access is assured.”

“Materials to be publicly shared will be stored with the Deep Blue repository, a service of the UM Libraries that provides deposit, access, and preservation services. Deposited items will be assigned a persistent URL that will be registered with the Handle System for assigning, managing, and resolving persistent identifiers (‘handles’) for digital objects and other Internet resources.”


After the project is complete


Preserving Research Data

What are your goals?

Who needs access, and when?

When (and whether) can data be shared or distributed?

Prepare for future funder mandates

Plan beyond the individual PI or grant



Who owns your research data?

• Campus copyright policy
• Collaborator institution copyright and ownership policies, informal agreements
• Patent and provenance issues
• International copyright considerations
• Post-project data retention requirements
• Post-employment data agreements



How large are your research data sets?

Survey sample: 308 campus researchers with externally sponsored projects or submitted proposals (2009-2011); 91 survey respondents

Source: Binghamton University Research Faculty Survey, June 2011, Jim Wolf, Director of Academic Computing (ret.)


Campus Research Data Locations

Source: Jim Wolf, Director of Academic Computing (ret.), June 2011


Who needs access to data? For how long?

[Figure: storage options (local research group server, ITS storage, library archive, disciplinary repository such as ICPSR) plotted by Data Accessibility (private, proprietary, access granted to individuals, openly available to all) and Data Preservation Timeframe (visible labels include <3 yrs and 3-7 yrs)]

Source: Research Faculty Survey, Jim Wolf, Director of Academic Computing (ret.), June 2011


Preservation: more than just backup

Create consistent, standardized metadata

Perform regular file fixity and format checks

Identify, update and migrate file formats

Mitigate and eliminate file degradation

Provide storage space, controlled access and an “exit strategy”
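"Regular file fixity checks" usually means recording a checksum for every file when it is stored and recomputing it later. A mismatch signals corruption or an unrecorded change. A minimal sketch using Python's `hashlib` with SHA-256 follows. The function names and the manifest layout (a dict of relative path to hex digest) are assumptions for illustration, not any repository's format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against an earlier manifest."""
    current = make_manifest(root)
    return {
        "changed": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        "missing": sorted(k for k in manifest if k not in current),
        "new": sorted(k for k in current if k not in manifest),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "readings.txt").write_text("0.1\t2.5\n")
        manifest = make_manifest(root)
        (root / "readings.txt").write_text("0.1\t2.6\n")  # simulate a silent change
        print(check_fixity(root, manifest))
        # {'changed': ['readings.txt'], 'missing': [], 'new': []}
```

In practice the manifest would be saved (for example as JSON) alongside, but separately from, the data, and `check_fixity` run on a schedule.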


Bit Rot: Files decay over time

Thankó verù mucè foò á lovelù luncheoî anä somå splendiä views® Wå imaginå �yoõ no÷ iî Indiá anä wondeò iæ yoõ arå listeninç tï somå oæ thå samå Indianó �witè whoí wå talkeä yearó ago® Thå artistó anä economistó werå quitå �remarkable¬ buô thå politicaì scientistó useä tï talë abouô atomiã �bombó foò Indiá witè eager¬ burninç eyeó whilå beinç verù carefuì noô tï kilì �anù insects® (Severaì haä theiò beardó covereä iî whitå silk so that no insect �would get caught and be stifled there.)

Sources: Hoover Institution Library and Archives Blog, Nov. 18, 2011;
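The damaged text above looks consistent with a single repeated bit error: bit 7 (0x80) set on the byte before each space. Under Latin-1, that turns "s" into "ó" and "." into "®". This reading is an inference from the visible characters; the source does not state it. The sketch below reproduces that assumed pattern on the opening words and shows that a stored checksum detects the damage. Real bit rot is usually random and silent, which is why the checksum matters.

```python
import hashlib


def set_high_bit_before_spaces(data: bytes) -> bytes:
    """Simulate the assumed corruption: set bit 7 (0x80) on every byte followed by a space."""
    out = bytearray(data)
    for i in range(len(out) - 1):
        if out[i + 1] == 0x20:  # 0x20 is the ASCII space character
            out[i] |= 0x80
    return bytes(out)


original = "Thanks very much for a lovely luncheon and some splendid views. We".encode("latin-1")
damaged = set_high_bit_before_spaces(original)

print(damaged.decode("latin-1"))
# Thankó verù mucè foò á lovelù luncheoî anä somå splendiä views® We

# A checksum recorded before the damage no longer matches afterwards.
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(damaged).hexdigest())  # False
```

Only one bit per word changed, yet every affected word is unreadable. Backups made after the damage would faithfully copy it.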


Data formats, devices, readers evolve

Media Deterioration and Format Obsolescence Demonstrate that “Backups” are Inadequate for Long-Term Preservation
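Watching for format obsolescence starts with knowing which formats a collection actually holds, and file extensions are unreliable for this. A minimal sketch follows that identifies a few formats named in the DMP samples (TIFF, JPEG, PDF, AVI, MP4) from their leading "magic" bytes. The signature table is deliberately small and illustrative. A production workflow would more likely use a registry-backed tool such as DROID with the PRONOM registry, or `file`/libmagic.

```python
# Byte signatures ("magic numbers"): each entry lists (offset, bytes) pairs that must all match.
SIGNATURES = [
    ("TIFF", [(0, b"II*\x00")]),               # little-endian TIFF
    ("TIFF", [(0, b"MM\x00*")]),               # big-endian TIFF
    ("JPEG", [(0, b"\xff\xd8\xff")]),
    ("PDF", [(0, b"%PDF-")]),
    ("AVI", [(0, b"RIFF"), (8, b"AVI ")]),     # RIFF container holding AVI
    ("ISO base media (e.g. MP4)", [(4, b"ftyp")]),
]


def identify_format(header: bytes) -> str:
    """Guess a file's format from its first bytes; 'unknown' if nothing matches."""
    for name, parts in SIGNATURES:
        if all(header[off:off + len(magic)] == magic for off, magic in parts):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of the file to cover every signature in the table."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))


print(identify_format(b"%PDF-1.4\n"))  # PDF
```

Running `identify_file` over a whole collection gives a format inventory. Formats that appear on an at-risk list become candidates for migration.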



Preservation is an iterative process

Build content from one project to the next

Create a set of policies based on current best practices and funder requirements

Refine data collection, access, use, distribution, and preservation policies over time


Thank You

Elizabeth Brown
LS-2504C
(607) [email protected]

