data life cycle

Data Life Cycle Data Organization, Documentation and Metadata Oklahoma State University Libraries - Data Management Workshop 11/29/2016

Upload: jason-henderson

Post on 12-Apr-2017


TRANSCRIPT

Page 1: Data Life Cycle

Data Life Cycle: Data Organization, Documentation and Metadata

Oklahoma State University Libraries - Data Management Workshop 11/29/2016

Page 2: Data Life Cycle

File Format Standards? [1]

▪ File formats and file naming according to standards are necessary to ensure that your data can be uniquely identified and made accessible for future uses. When selecting tools for storing your data, pay special attention to the output formats of your data.

▪ For preservation purposes, when possible use data formats that are:

Open standard, and in an easily re-usable format that is machine readable and non-proprietary in nature

(Example: .txt as opposed to .pdf)
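
As a minimal sketch of the preference for open, machine-readable formats, the Python snippet below writes a small table to a plain-text CSV file and reads it back, using only the standard library. The column names, the values, and the file name `sample_readings.csv` are hypothetical and chosen only for illustration; they do not come from the workshop.

```python
import csv
import os
import tempfile

# Hypothetical example data; the field names are assumptions for illustration.
rows = [
    {"site": "A01", "date": "2016-11-29", "temp_c": 12.5},
    {"site": "A02", "date": "2016-11-29", "temp_c": 13.1},
]


def write_csv(path, records):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    fieldnames = list(records[0].keys())
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def read_csv(path):
    """Read the CSV back as a list of dicts (all values return as strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    out_path = os.path.join(tempfile.gettempdir(), "sample_readings.csv")
    write_csv(out_path, rows)
    print(read_csv(out_path))
```

Because CSV is plain text, the file can be opened without any particular software, though type information (numbers versus text) is not stored and would need to be documented separately.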

Page 3: Data Life Cycle

File Format Standards? [2]

▪ When listing out the data format you will be using, make sure to include:

Software necessary to view the data

Information about version control

If data will be stored in one format during collection and analysis and then transferred to another format for preservation: list out features that may be lost in data conversion, such as system-specific labels.

Page 4: Data Life Cycle

File Naming [3]

▪ 1. Be consistent. Have conventions for naming:

(1) Directory structure

(2) Folder names

(3) File names

▪ Always include the same information (e.g., date and time)

Page 5: Data Life Cycle

File Naming [4]

▪ Retain the order of information (e.g., YYYYMMDD, not MMDDYYYY)
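
One practical reason to put the year first is that names then sort chronologically under plain alphabetical sorting. The sketch below illustrates this; the helper `dated_name`, the underscore separator, and the project and description values are all assumptions made for the example, not a convention prescribed by the workshop.

```python
from datetime import date


def dated_name(project, description, when, extension):
    """Build a name like 'project_description_YYYYMMDD.ext' in a fixed order."""
    stamp = when.strftime("%Y%m%d")  # year first, then month, then day
    return f"{project}_{description}_{stamp}.{extension.lower()}"


# Hypothetical files created on three different days.
names = [
    dated_name("soil", "survey", date(2016, 11, 29), "csv"),
    dated_name("soil", "survey", date(2015, 3, 7), "csv"),
    dated_name("soil", "survey", date(2016, 1, 15), "csv"),
]

if __name__ == "__main__":
    # Alphabetical order matches chronological order for YYYYMMDD stamps.
    for name in sorted(names):
        print(name)
```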

Page 6: Data Life Cycle

File Naming [5]

▪ 2. Be descriptive so others can understand your meaning.

Try to keep file and folder names under 32 characters

Within reason, include relevant information such as:

Unique identifier (project name or grant number in folder name)

Project or research data name

Date (in file properties as well)

Page 7: Data Life Cycle

File Naming [6]

▪ Use application-specific 3-letter file extensions, in lowercase: mov, tif, pdf

▪ When using sequential numbering, use leading zeros to allow for multi-digit versions. For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (001, 010, 100).

▪ No special characters: & , * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < > -

Page 8: Data Life Cycle

File Naming [7]

▪ Use only one period, placed before the file extension (e.g., NamePaper.doc, NOT Name.Paper.doc or Name_Paper..doc)
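
The naming guidelines on the preceding slides can be checked mechanically. The following is a minimal sketch, not an official tool: the function names `padded` and `name_problems` are hypothetical, and the rule that only letters, digits, and underscores are allowed is an assumption that is somewhat stricter than the list of special characters given above.

```python
import re

# Allowed characters before the extension: letters, digits, underscore.
# This whitelist is an assumption of the sketch, stricter than the slide's list.
STEM_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")
MAX_LENGTH = 32  # guideline: keep names under 32 characters


def padded(number, highest):
    """Zero-pad `number` to the width of the highest number in the sequence."""
    width = len(str(highest))
    return str(number).zfill(width)


def name_problems(filename):
    """Return a list of guideline violations found in `filename`."""
    problems = []
    if filename.count(".") != 1:
        problems.append("use exactly one period, before the extension")
        return problems  # cannot split stem and extension reliably
    stem, extension = filename.split(".")
    if not STEM_PATTERN.match(stem):
        problems.append("special characters or spaces in name")
    if extension != extension.lower():
        problems.append("extension should be lowercase")
    if len(filename) >= MAX_LENGTH:
        problems.append("name should be under 32 characters")
    return problems


if __name__ == "__main__":
    print(padded(7, 100))
    for candidate in ["NamePaper.doc", "Name.Paper.doc", "Name_Paper..doc"]:
        print(candidate, name_problems(candidate))
```

Returning a list of problems rather than a single true/false value makes it easier to report every issue with a name at once.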

Page 9: Data Life Cycle

Tracking Changes with Version Control [8]

▪ Keep track of versions of files (version control):

▪ Automatically: Utilize a system that will automatically create multiple versions of your work, such as Git.

▪ Don't use confusing labels, such as 'revision', 'final', 'final2', etc.

Page 10: Data Life Cycle

Tracking Changes with Version Control [9]

▪ Manually: Use a sequentially numbered system: v01, v02
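
For manual versioning, the next number in the sequence can be derived from the names that already exist. This is a rough sketch under stated assumptions: the `base_vNN.ext` pattern and the function `next_version_name` are hypothetical choices for the example, and the slides only specify the `v01`, `v02` numbering itself.

```python
import re

# Assumed naming pattern for this sketch: <base>_v<digits>.<extension>
VERSION_PATTERN = re.compile(r"^(?P<base>.+)_v(?P<num>\d+)\.(?P<ext>[a-z0-9]+)$")


def next_version_name(existing_names, base, extension, width=2):
    """Return the next name in the series base_v01.ext, base_v02.ext, ..."""
    highest = 0
    for name in existing_names:
        match = VERSION_PATTERN.match(name)
        if match and match.group("base") == base and match.group("ext") == extension:
            highest = max(highest, int(match.group("num")))
    # Zero-pad the new version number to the requested width.
    return f"{base}_v{highest + 1:0{width}d}.{extension}"


if __name__ == "__main__":
    existing = ["report_v01.txt", "report_v02.txt", "notes_v07.txt"]
    print(next_version_name(existing, "report", "txt"))
```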

Page 11: Data Life Cycle

What is Metadata? [10]

▪ Metadata is a term that has primarily been used by library and archives communities to describe standards used to aid the discovery of objects. A metadata record consists of all the metadata elements describing an object. Metadata records are often expressed in machine-readable formats for easy integration within systems.

▪ There are three basic categories of metadata elements: descriptive, technical/structural, and administrative. All objects also have a unique identifier metadata element.

Page 12: Data Life Cycle

What is Metadata? [11]

▪ 1. Descriptive metadata elements consist of information about the content and context of an object. For example, descriptive metadata for an image may include: title, creator, subject (tags), and description.

▪ 2. Technical/structural metadata elements describe the format, process, and inter-relatedness of objects. For example, technical metadata for an image may include: camera, aperture, exposure, file format, and set (if in a series).

▪ 3. Administrative metadata elements describe information needed to manage or use the object. Elements may include: creation date, copyright permissions, required software, provenance and file integrity checks.
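
To make the three categories concrete, here is a minimal sketch of one metadata record for an image, grouped by category and serialized as JSON (a machine-readable format). Every field name and value is hypothetical, and the grouping into nested dictionaries is an assumption of this example rather than a published standard. The SHA-256 checksum stands in for a file integrity check.

```python
import hashlib
import json


def sha256_of(data):
    """Return the SHA-256 hex digest of a bytes object (an integrity check)."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the bytes of an image file; a real workflow would read the file.
image_bytes = b"example image content"

# All values below are hypothetical and for illustration only.
record = {
    "identifier": "img-0001",
    "descriptive": {
        "title": "Creek bank, north transect",
        "creator": "Example Researcher",
        "subject": ["erosion", "field photo"],
        "description": "Photo of the sampling site taken before collection.",
    },
    "technical": {
        "camera": "example camera model",
        "aperture": "f/8",
        "exposure": "1/250 s",
        "file_format": "tif",
        "set": "transect-north",
    },
    "administrative": {
        "creation_date": "2016-11-29",
        "copyright": "example rights statement",
        "required_software": "any TIFF viewer",
        "provenance": "captured in the field by the creator",
        "sha256": sha256_of(image_bytes),
    },
}

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
```

Recomputing the checksum later and comparing it with the stored value is one way to detect whether the file has changed.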

Page 13: Data Life Cycle

What is Metadata? [12]

▪ Data centers and repositories may require that deposited data follow specific metadata standards. Check with the repositories you plan to use before you begin outlining the metadata plan for your data. If no standard has been defined for your discipline, a good starting place for a metadata plan is Dublin Core or the DataCite recommendations.
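
As an illustration of what a Dublin Core description can look like, the sketch below builds a small XML record with Python's standard library. The namespace URI is that of the Dublin Core element set; the wrapper element name `metadata`, the helper `dublin_core_record`, and all field values are assumptions for this example. A repository may require a different wrapper or additional elements, so check its guidelines first.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a wrapper element holding one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper name is an assumption of this sketch
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values for illustration only.
fields = [
    ("title", "Soil survey readings, north transect"),
    ("creator", "Example Researcher"),
    ("subject", "soil moisture"),
    ("description", "Daily readings collected during the example project."),
    ("date", "2016-11-29"),
    ("format", "text/csv"),
    ("identifier", "example-dataset-0001"),
    ("rights", "example rights statement"),
]

record = dublin_core_record(fields)

if __name__ == "__main__":
    print(ET.tostring(record, encoding="unicode"))
```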

Page 14: Data Life Cycle

Metadata Best Practices [13]

▪ Good data documentation includes information on:

The context of data collection: project history, aims, objectives and hypotheses

Dataset structure of data files, cases, relationships between files

Data validation, proofing, cleaning and quality assurance

Modifications made to data over time since their original creation and identification of different versions of datasets

Information on data confidentiality, access and use conditions, where applicable

Page 15: Data Life Cycle

Metadata Best Practices [14]

▪ At data-level, datasets should also be documented with:

Names, labels and descriptions for variables, records and values

Explanation of codes and classification schemes used

Codes of, and reasons for, missing values

Derived data created after collection, with code, algorithm or command file used to create them

Weighting and grossing variables created

Data listing with descriptions for cases, individuals or items
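
Data-level documentation of this kind is often kept as a codebook (data dictionary). The following is a minimal sketch of one, assuming a small survey dataset; the variable names, codes, missing-value codes, and the helper `describe` are all hypothetical.

```python
# Hypothetical codebook for a small survey dataset.
codebook = {
    "resp_id": {
        "label": "Respondent identifier",
        "type": "string",
    },
    "age_grp": {
        "label": "Age group",
        "type": "coded",
        "codes": {1: "18-34", 2: "35-54", 3: "55 and over"},
        # Codes of, and reasons for, missing values.
        "missing": {-9: "Refused", -8: "Not asked"},
    },
    "weight": {
        "label": "Sampling weight",
        "type": "float",
        # Derived variables should point to the code that created them.
        "derived_by": "example_weighting_script.py",
    },
}


def describe(variable, value):
    """Translate a raw value into its documented meaning."""
    entry = codebook[variable]
    if value in entry.get("missing", {}):
        return f"missing ({entry['missing'][value]})"
    return entry.get("codes", {}).get(value, str(value))


if __name__ == "__main__":
    print(describe("age_grp", 2))
    print(describe("age_grp", -9))
    print(describe("resp_id", "R001"))
```

Keeping the codebook in a structured form means the same file can document the dataset for human readers and drive automated checks.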

Page 16: Data Life Cycle

References

▪ 01. https://library.uoregon.edu/datamanagement/fileformats.html

▪ 02. https://library.uoregon.edu/datamanagement/fileformats.html

▪ 03. https://library.uoregon.edu/datamanagement/filenaming.html

▪ 04. http://www.exadox.com/files/pdf/en/external/fnc12.pdf (Pg 6)

▪ 05. https://library.uoregon.edu/datamanagement/filenaming.html

▪ 06. https://library.uoregon.edu/datamanagement/filenaming.html

▪ 07. http://www.exadox.com/files/pdf/en/external/fnc12.pdf (Pg 4)

▪ 08. https://library.uoregon.edu/datamanagement/metadata.html