data management for undergraduate research

Post on 09-Aug-2015

Data Management for Undergraduate Researchers

Office of Undergraduate Research Seminar and Workshop Series

Rebekah Cummings, Research Data Management Librarian

J. Willard Marriott Library, University of Utah

June 18, 2015

•Introductions

•What are data?

•Why manage data?

•Data Management Plans

•File Naming

•Metadata

•Storage and Archiving

•Questions

•Name

•Major

•Research Project

What are data?

“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”

- U.S. OMB Circular A-110

Data are diverse

Data are messy

Why manage data?

Your best collaborator is yourself six months from now, and your past self doesn’t answer emails.

Why else manage data?

•Save time and work more efficiently

•Meet grant requirements

•Promote reproducible research

•Enable new discoveries from your data

•Make the results of publicly funded research publicly available

We are trying to avoid this scenario…

Two bears data management problems

1. Didn’t know where he stored the data

2. Saved one copy of the data on a USB drive

3. Data was in a format that could only be read by outdated, proprietary software

4. No codebook to explain the variable names

5. Variable names were not descriptive

6. No contact information for the co-author Sam Lee

Data Management Plan

PLANNING

Courtesy of the UK Data Archive http://www.data-archive.ac.uk/create-manage/life-cycle

Scenario

You develop a research project during your undergraduate experience. You write up the results, which are accepted by a reputable journal. People start citing your work! Three years later someone accuses you of falsifying your work.

Scenario adapted from MANTRA training module

•Would you be able to prove you did the work as you described in the article?

•What would you need to prove you hadn’t falsified the data?

•What should you have done throughout your research study to be able to prove you did the work as described?

Elements of a DMP

•Types of data, including file formats

•Data description

•Data storage

•Data sharing, including confidentiality or security restrictions

•Data archiving and responsibility

•Data management costs

File naming

File naming best practices

•Be descriptive

•Don’t be generic

•Appropriate length

•Be consistent

•PLPP_EvaluationData_Workshop2_2014.xlsx

•MyData.xlsx

•publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Who filed better?

File naming best practices

•File names should include only letters, numbers, and underscores.

•No special characters (%@#*?!)

•No spaces

•Lowercase or camel case (LikeThis)

•Not all systems are case sensitive. Assume this, THIS, and tHiS are the same.
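The rules above can be checked mechanically. The following is a minimal sketch, not a standard tool: the function names, the regular expression, and the 50-character limit are assumptions made for illustration (the slides only say "appropriate length"), and a single dot-separated extension such as `.xlsx` is assumed to be allowed.

```python
import re

# Assumed rule set drawn from the bullets above: letters, digits, and
# underscores only, optionally followed by one file extension.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9]+)?")

MAX_LENGTH = 50  # assumed limit; pick whatever suits your project


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if not VALID_NAME.fullmatch(name):
        problems.append("uses characters other than letters, numbers, and underscores")
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    return problems


def clashes_ignoring_case(names: list[str]) -> set[str]:
    """Return lowercased names that occur more than once when case is ignored."""
    seen, clashes = set(), set()
    for name in names:
        key = name.lower()
        if key in seen:
            clashes.add(key)
        seen.add(key)
    return clashes


for candidate in ["PLPP_EvaluationData_Workshop2_2014.xlsx", "My Data%.xlsx"]:
    print(candidate, check_file_name(candidate))
```

Running a check like this over a project folder before sharing it is one way to catch names that will break on another operating system.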

Dates and numbering…

1. Use leading zeros for scalability

001

002

009

019

999

2. If using dates, use YYYYMMDD

June2015 = BAD!

06-18-2015 = BAD!

20150618 = GREAT!

2015-06-18 = This is fine too
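Both conventions matter because file browsers sort names as text, so zero-padded numbers and year-first dates fall into the right order on their own. A short sketch of generating such names follows; the helper names and the `sample` and `SoilSamples` labels are hypothetical.

```python
from datetime import date


def numbered_name(prefix: str, number: int, width: int = 3) -> str:
    """Build a name such as 'sample_009' using leading zeros."""
    return f"{prefix}_{number:0{width}d}"


def dated_name(day: date, description: str) -> str:
    """Prefix a description with the date in YYYYMMDD form."""
    return f"{day.strftime('%Y%m%d')}_{description}"


# Created out of order on purpose; plain text sorting restores numeric order.
names = [numbered_name("sample", n) for n in (19, 2, 999, 1, 9)]
print(sorted(names))
print(dated_name(date(2015, 6, 18), "SoilSamples"))
```

If the hyphenated form is preferred, `date.isoformat()` produces `2015-06-18`, which sorts just as well.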

Who filed better?

•July 24 2014_SoilSamples%_v6

•20140724_NSF_SoilSamples_Cummings

•SoilSamples_FINAL

File organization best practices

•Top level folder should include project title and date.

•Sub-structure should have a clear and consistent naming convention.

•Document your structure in a README text file.
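One way to apply these three practices is to create the folder tree and its README with a script at the start of a project. The sketch below is only an illustration: the sub-folder names, the `create_project` helper, and the README layout are assumptions, not a prescribed structure.

```python
from pathlib import Path

# Hypothetical sub-folders; choose names that fit the project and keep them consistent.
SUBFOLDERS = ["01_raw_data", "02_processed_data", "03_scripts", "04_docs"]


def create_project(root: Path, title: str, start_date: str) -> Path:
    """Create '<title>_<start_date>' with sub-folders and a README.txt."""
    project = root / f"{title}_{start_date}"
    for name in SUBFOLDERS:
        (project / name).mkdir(parents=True, exist_ok=True)
    # The README records the structure so a newcomer can find their way around.
    lines = [f"Project: {title}", f"Started: {start_date}", "", "Folders:"]
    lines += [f"  {name}/" for name in SUBFOLDERS]
    (project / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return project
```

For example, `create_project(Path.home(), "SoilSamples", "20140724")` would create a `SoilSamples_20140724` folder in the home directory.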

File organization exercise

Metadata

Unstructured Data

There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.

Structured Data

Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.

Author: Gary Bradshaw

Date: 1982

Publisher: University of Nebraska Medical Center

Subject: Kidney -- Cytology
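The practical difference is that structured metadata can be searched and processed by software. As a rough sketch, the structured record above can be held as field and value pairs and filtered by field; the `find_by_field` helper is hypothetical and JSON is just one possible format.

```python
import json

# Field names and values copied from the structured example above.
record = {
    "Title": "Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.",
    "Author": "Gary Bradshaw",
    "Date": "1982",
    "Publisher": "University of Nebraska Medical Center",
    "Subject": "Kidney -- Cytology",
}


def find_by_field(records: list[dict], field: str, value: str) -> list[dict]:
    """Return the records whose given field exactly matches the value."""
    return [r for r in records if r.get(field) == value]


matches = find_by_field([record], "Date", "1982")
print(json.dumps(matches, indent=2))
```

Answering the same question from the unstructured paragraph would require a person, or a much more fragile program, to read the sentence and work out which part is the date.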

Why create metadata?

IJ?

XVAR?

FNAME?

Data documentation includes…

•Questionnaires

•Interview protocols

•Lab notebooks

•Code or scripts

•Consent forms

•Samples, weights, methods

•README files
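A codebook is one of the simplest pieces of documentation to keep: a table that explains every variable name in a data file. The sketch below checks a CSV header against a codebook and reports columns nobody has described. The variable names, descriptions, and sample data are invented for illustration.

```python
import csv
import io

# Hypothetical codebook: every name and description here is made up.
CODEBOOK = {
    "site_id": "Three-digit identifier of the sampling site",
    "sample_date": "Date the sample was collected (YYYYMMDD)",
    "soil_ph": "Soil pH measured in the lab",
}


def undocumented_columns(csv_text: str, codebook: dict[str, str]) -> list[str]:
    """Return column names in the CSV header that the codebook does not describe."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in codebook]


# Made-up data file with one cryptic, undocumented column.
data = "site_id,sample_date,soil_ph,XVAR\n001,20140724,6.8,3\n"
print(undocumented_columns(data, CODEBOOK))
```

Storing the codebook next to the data, for instance as its own CSV or inside the README, keeps the explanation from getting separated from the file it explains.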

LOCKSS (Lots of Copies Keep Stuff Safe)

Options for data storage

•Personal computers or laptops

•Networked drives

•External storage devices

Storing sensitive data

•If possible, collect the necessary data without using direct identifiers

•Otherwise, de-identify your data upon collection or immediately afterwards

•Do not store or share sensitive data on unencrypted devices

•Talk to your Institutional Review Board (IRB)
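As a rough illustration of de-identifying data right after collection, the sketch below swaps direct identifiers for study IDs and returns the linking key separately. The field names, the `P001` ID format, and the `deidentify` helper are assumptions. Removing direct identifiers is not always enough to protect participants, so treat this as a starting point to discuss with your IRB, not as an approved procedure.

```python
def deidentify(rows: list[dict], identifier_fields: set[str]) -> tuple[list[dict], dict]:
    """Return (clean_rows, key); key maps each study ID to the removed identifiers."""
    clean_rows, key = [], {}
    for number, row in enumerate(rows, start=1):
        study_id = f"P{number:03d}"  # hypothetical ID format
        # Keep the removed identifiers in a separate lookup table.
        key[study_id] = {f: row[f] for f in identifier_fields if f in row}
        # Copy everything else and attach the study ID instead.
        clean = {f: v for f, v in row.items() if f not in identifier_fields}
        clean["study_id"] = study_id
        clean_rows.append(clean)
    return clean_rows, key


# Made-up example data.
rows = [{"name": "Participant A", "email": "a@example.org", "major": "Biology"}]
clean, key = deidentify(rows, {"name", "email"})
print(clean)
```

The key is itself sensitive: it belongs on an encrypted device, stored apart from the de-identified rows.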

Thinking long-term

Archiving options

•Public repository – FigShare

•Domain-specific repository

•Institutional repository

Major takeaways

•Data management starts at the beginning of a project

•Document your data so that someone else could understand it

•Have more than one copy of your data

•Consider archiving options when you are done with your project

Questions?

rebekah.cummings@utah.edu

(801) 581-7701

Marriott Library, 1705Y

…or ask now!
