Data Management for Undergraduate Researchers Office of Undergraduate Research Seminar and Workshop Series Rebekah Cummings, Research Data Management Librarian J. Willard Marriott Library, University of Utah September 21, 2015

Upload: rebekah-cummings

Post on 23-Jan-2018


Data Management for Undergraduate Researchers

Office of Undergraduate Research Seminar and Workshop Series

Rebekah Cummings, Research Data Management Librarian

J. Willard Marriott Library, University of Utah

September 21, 2015

• Introductions

• What are data?

• Why manage data?

• Data Management Plans

• Data Organization

• Metadata

• Storage and Archiving

• Questions

• Name

• Major

• Research Project

What is data management?

The process of controlling the information (read: data) generated during a research project.

https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html

What are data?

“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”

- U.S. OMB Circular A-110

Data are diverse

Data are messy

Why manage data?

• Save time and increase efficiency

• Meet grant requirements

• Promote reproducible research

• Enable new discoveries from your data

• Make the results of publicly funded research publicly available

We are trying to avoid this scenario…

Two bears data management problems

1. Didn’t know where he stored the data

2. Saved one copy of the data on a USB drive

3. Data were in a format that could only be read by outdated, proprietary software

4. No codebook to explain the variable names

5. Variable names were not descriptive

6. No contact information for the co-author Sam Lee

Scenario

You develop a research project during your undergraduate experience. You write up the results, which are accepted by a reputable journal. People start citing your work! Three years later someone accuses you of falsifying your work.

Scenario adapted from MANTRA training module

• Would you be able to prove you did the work as you described in the article?

• What would you need to prove you hadn’t falsified the data?

• What should you have done throughout your research study to be able to prove you did the work as described?

Data Management Plans

• What data are generated by your research?

• What is your plan for managing the data?

• How will your data be shared?

Research Data Lifecycle

Courtesy of the UK Data Archive http://www.data-archive.ac.uk/create-manage/life-cycle

• Types of data

• Data description

• Data storage

• Data sharing

• Data archiving and responsibility

• Data management costs

Data organization

File naming

MyData.xls

MeetingNotes.doc

Presentation.ppt

Assignment1.pdf

File naming best practices

1. Be descriptive

2. Don’t be generic

3. Appropriate length

4. Be consistent

5. Think critically about your file names

File naming best practices

• Files should include only letters, numbers, and underscores/dashes.

• No special characters

• No spaces; use dashes, underscores, or camel case (like-this or likeThis)

• Not all systems are case sensitive. Assume this, THIS, and tHiS are the same.
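The naming rules above could be checked automatically before files are shared. The following is a minimal sketch, not an established tool: the helper names `is_safe_name` and `case_collisions` are hypothetical, and the pattern assumes every file has exactly one extension.

```python
import re

# Hypothetical pattern encoding the rules above: letters, numbers,
# underscores, and dashes only, followed by a single extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_safe_name(filename: str) -> bool:
    """Return True if the name has no spaces or special characters."""
    return SAFE_NAME.fullmatch(filename) is not None


def case_collisions(filenames: list[str]) -> list[tuple[str, str]]:
    """Find pairs of names that differ only by upper/lower case."""
    seen: dict[str, str] = {}
    clashes = []
    for name in filenames:
        key = name.lower()
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes


print(is_safe_name("PLPP_EvaluationData_Workshop2_2014.xlsx"))  # True
print(is_safe_name("My Data%.xls"))                             # False
print(case_collisions(["this.txt", "THIS.txt", "notes.txt"]))
```

The collision check treats this.txt and THIS.txt as the same file, which is the safe assumption when a project moves between operating systems.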

Version Control - Numbering

001, 002, 003, 009, 010, 099

Use leading zeros for scalability

Bonus Tip: Use whole numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)

1, 10, 2, 3, 9, 99
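The effect of leading zeros is easy to see by sorting file names as plain text, which is how most file browsers order them. This is a small illustrative sketch; the `sample_` prefix and `.csv` extension are made up for the example.

```python
numbers = [1, 2, 3, 9, 10, 99]

# Plain-text sorting compares names character by character,
# so "10" lands before "2" when the numbers are not padded.
unpadded = sorted(f"sample_{n}.csv" for n in numbers)

# Padding to three digits makes text order match numeric order.
padded = sorted(f"sample_{n:03d}.csv" for n in numbers)

print(unpadded)
print(padded)
```

Three digits leave room for up to 999 files; a project expecting more would need wider padding from the start.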

Version Control - Dates

If using dates, use YYYYMMDD

June2015 = BAD!

06-18-2015 = BAD!

20150618 = GREAT!

2015-06-18 = This is fine too
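A date stamp in YYYYMMDD form can be generated rather than typed, which avoids slips such as swapping day and month. The sketch below uses Python's standard `datetime` module; the helper name `dated_name` and the example labels are hypothetical.

```python
from datetime import date


def dated_name(day: date, label: str, ext: str) -> str:
    """Build a file name that starts with the date as YYYYMMDD."""
    return f"{day.strftime('%Y%m%d')}_{label}.{ext}"


names = [
    dated_name(date(2015, 6, 18), "SoilSamples", "csv"),
    dated_name(date(2014, 12, 1), "SoilSamples", "csv"),
    dated_name(date(2015, 1, 5), "SoilSamples", "csv"),
]

# Because the year comes first, alphabetical order is also
# chronological order.
for name in sorted(names):
    print(name)
```

A name such as 06-18-2015 would sort by month first, so files from different years end up interleaved.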

From a DMP…

“Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”

Who filed better?

• PLPP_EvaluationData_Workshop2_2014.xlsx

• MyData.xlsx

• publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx

Who filed better?

• July 24 2014_SoilSamples%_v6

• 20140724_NSF_SoilSamples_Cummings

• SoilSamples_FINAL

File organization best practices

• Top level folder should include project title and date.

• Sub-structure should have a clear and consistent naming convention.

• Document your structure in a README text file.
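The three practices above can be set up in one step at the start of a project. What follows is a rough sketch under stated assumptions: the function `create_project`, the folder names, and the README wording are all hypothetical, and the demo writes into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path


def create_project(root: Path, title: str, start_date: str,
                   subfolders: list[str]) -> Path:
    """Create a dated top-level folder, its subfolders, and a README."""
    project = root / f"{start_date}_{title}"
    project.mkdir(parents=True, exist_ok=True)
    for name in subfolders:
        (project / name).mkdir(exist_ok=True)

    # The README records the structure so someone else can follow it.
    lines = [f"Project: {title}", f"Started: {start_date}", "", "Folders:"]
    lines += [f"  {name}/" for name in subfolders]
    (project / "README.txt").write_text("\n".join(lines) + "\n",
                                        encoding="utf-8")
    return project


with tempfile.TemporaryDirectory() as tmp:
    project = create_project(
        Path(tmp), "SoilSamples", "20140724",
        ["01_raw-data", "02_analysis", "03_documentation"],
    )
    print(project.name)
    print(sorted(p.name for p in project.iterdir()))
```

Numbering the subfolders keeps them in workflow order when sorted by name.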

File organization exercise

Describing data

Research Documentation

• Grant proposals and related reports

• Applications and approvals (e.g. IRB)

• Codebooks, data dictionaries

• Consent forms

• Surveys, questionnaires, interview protocols

• Transcripts, hard copies of audio and video files

• Any software or code you used (no matter how insignificant or buggy)

Three levels of documentation

• Project level – what the study set out to do, research questions, methods, sampling frames, instruments, protocols, members of the research team

• File or database level – How all the files relate to one another. A README file is a classic way of capturing this information.

• Variable or item level – Full label explaining the meaning of each variable.

http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/

IJ?

XVAR?

FNAME?

http://www.icpsr.umich.edu/files/deposit/Guide-to-Codebooks_v1.pdf
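Variable-level documentation can be as simple as a table that maps each column name to a full label. The sketch below is illustrative only: the codebook entries and the sample data are invented, and the helper `undocumented_columns` is a hypothetical name. It flags any column that has no codebook entry.

```python
import csv
import io

# Hypothetical codebook: every variable gets a full, readable label.
CODEBOOK = {
    "site_id": "Identifier of the sampling site",
    "sample_date": "Date the sample was collected (YYYYMMDD)",
    "soil_ph": "Soil pH measured in the lab (scale 0-14)",
}


def undocumented_columns(csv_text: str, codebook: dict[str, str]) -> list[str]:
    """Return the column names in the header that the codebook omits."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in codebook]


data = "site_id,sample_date,soil_ph,XVAR\nA1,20140724,6.8,3\n"
print(undocumented_columns(data, CODEBOOK))  # ['XVAR']
```

A cryptic name like XVAR is exactly what a future reader, or the original researcher three years on, will not be able to interpret without a label.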

Metadata

Unstructured Data

There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.

Structured Data

Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.

Author: Gary Bradshaw

Date: 1982

Publisher: University of Nebraska Medical Center

Subject: Kidney -- Cytology

Dublin Core
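The structured record above could be written out as machine-readable Dublin Core. This is a minimal sketch, assuming the Dublin Core element set namespace and a wrapper element named `metadata` chosen only for illustration. Dublin Core calls the author field `creator`.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

record = {
    "title": "Growth of rodent kidney cells in serum media and the "
             "effect of viral transformations on growth.",
    "creator": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": "Kidney -- Cytology",
}


def to_dublin_core(fields: dict[str, str]) -> str:
    """Serialize a dict of Dublin Core fields as a small XML record."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core(record))
```

Each fact that was buried in the free-text paragraph now sits in a labeled field that a catalog or search tool can read directly.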

Disciplinary Metadata

Digital Curation Centre’s list of subject-specific metadata schemas - http://www.dcc.ac.uk/resources/metadata-standards

LOCKSS (Lots of Copies Keep Stuff Safe)

Options for data storage

• Personal computers or laptops

• Networked drives

• External storage devices

Ubox – box.utah.edu

Language from a DMP

“All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews.

Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server.

Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”
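A daily copy like the one described in this plan is only useful if the copy is intact. One common way to check is to compare checksums of the original and the copy. The sketch below is an illustration, not the procedure from the quoted plan: the helper names are hypothetical and a temporary folder stands in for the server.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy a file into backup_dir and confirm the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = Path(shutil.copy2(source, backup_dir / source.name))
    if sha256_of(copy) != sha256_of(source):
        raise OSError(f"Backup of {source.name} does not match the original")
    return copy


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "20150618_interview01.txt"
    original.write_text("example transcript\n", encoding="utf-8")
    copy = backup_and_verify(original, Path(tmp) / "server_backup")
    print(sha256_of(copy) == sha256_of(original))  # True
```

Storing the checksum alongside the file also makes it possible to detect silent corruption years later.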

Thinking long-term

Archiving options

• Domain-specific repository

• General Purpose Data Repository

• Institutional repository

Major takeaways

• Data management starts at the beginning of a project

• Document your data so that someone else could understand it

• Have more than one copy of your data

• Consider archiving options when you are done with your project

Questions?

[email protected]

(801) 581-7701

Marriott Library, 1705Y

…or ask now!