data life cycle
Data Life Cycle: Data Organization, Documentation and Metadata
Oklahoma State University Libraries - Data Management Workshop 11/29/2016
File Format Standards [1]
▪ File formats and file naming according to standards are necessary to ensure that your data can be uniquely identified and made accessible for future uses. When selecting tools for storing your data, pay special attention to the output formats of your data.
▪ For preservation purposes, when possible use data formats that are open standard, easily re-usable, machine readable and non-proprietary (example: .txt as opposed to .pdf).
File Format Standards [2]
▪ When listing the data format you will be using, make sure to include:
Software necessary to view the data
Information about version control
If data will be stored in one format during collection and analysis and then transferred to another format for preservation: features that may be lost in data conversion, such as system-specific labels
File Naming [3]
▪ 1. Be consistent. Have conventions for naming: (1) directory structure, (2) folder names, (3) file names
▪ Always include the same information (e.g. date and time)
File Naming [4]
▪ Retain the order of information (e.g. YYYYMMDD, not MMDDYYYY)
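One reason to prefer the year-month-day order is that plain alphabetical sorting then matches chronological order. The sketch below illustrates this with made-up names; the helper `dated_name`, the `survey` stem and the underscore separator are assumptions for illustration, not part of the workshop material.

```python
from datetime import date


def dated_name(stem: str, day: date, ext: str) -> str:
    """Build a name such as 'survey_20161129.csv' (year, month, day order)."""
    return f"{stem}_{day:%Y%m%d}.{ext}"


days = [date(2016, 11, 29), date(2015, 12, 1), date(2016, 1, 15)]

ymd_names = [dated_name("survey", d, "csv") for d in days]
mdy_names = [f"survey_{d:%m%d%Y}.csv" for d in days]

# Plain alphabetical sorting, as a file browser would apply it.
print(sorted(ymd_names))  # chronological order
print(sorted(mdy_names))  # the 2015 file ends up last
```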
File Naming [5]
▪ 2. Be descriptive so others can understand your meaning.
Try to keep file and folder names under 32 characters
Within reason, include relevant information such as: a unique identifier (project name or grant number in folder name); the project or research data name; the date (in file properties as well)
File Naming [6]
▪ Use application-specific codes in a lowercase, 3-letter file extension: mov, tif, pdf
▪ When using sequential numbering, use leading zeros to allow for multi-digit versions. For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (e.g. 001, 010, 100).
▪ No special characters: & , * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < > -
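The leading-zero rule can be sketched in a few lines: pad every number to the width of the largest number in the sequence. The function name `numbered_names` and the `scan` stem are hypothetical.

```python
def numbered_names(stem: str, count: int, ext: str) -> list[str]:
    """Return names numbered 1..count, zero-padded to the width of the largest number."""
    width = len(str(count))
    return [f"{stem}_{i:0{width}d}.{ext}" for i in range(1, count + 1)]


names = numbered_names("scan", 10, "tif")
print(names[0], names[-1])

# With padding, alphabetical order and numeric order agree.
print(sorted(names) == names)
```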
File Naming [7]
▪ Use only one period, placed before the file extension (e.g. NamePaper.doc, NOT Name.Paper.doc or Name_Paper..doc)
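As a rough illustration, several of the naming rules from these slides can be checked automatically. This is a minimal sketch, not a complete validator: the function `check_name` is hypothetical, and it assumes the 32-character guideline applies to the whole name including the extension.

```python
import re

# Characters the slides list as not allowed (the hyphen is on that list too).
FORBIDDEN = set("&,*%#;()!@$^~'{}[]?<>-")
MAX_LENGTH = 32  # assumption: the 32-character guideline covers the whole name


def check_name(name: str) -> list[str]:
    """Return the naming rules a file name breaks; an empty list means it passes."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append("not under 32 characters")
    bad = sorted(FORBIDDEN.intersection(name))
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if name.count(".") != 1:
        problems.append("needs exactly one period")
    elif not re.fullmatch(r"[a-z]{3}", name.rsplit(".", 1)[1]):
        problems.append("extension is not 3 lowercase letters")
    return problems


for candidate in ["NamePaper.doc", "Name.Paper.doc", "Name_Paper..doc"]:
    print(candidate, check_name(candidate))
```

Returning a list of problems rather than a single True/False makes it easier to report every broken rule at once when renaming a batch of files.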
Tracking Changes with Version Control [8]
▪ Keep track of versions of files (version control):
▪ Automatically: Use a system that automatically creates multiple versions of your work, such as Git.
▪ Don't use confusing labels, such as 'revision', 'final', 'final2', etc.
Tracking Changes with Version Control [9]
▪ Manually: Use a sequentially numbered system: v01, v02
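For manual versioning, a small helper can work out the next label so that numbers are never skipped or reused. This is a sketch under assumptions: the function `next_version` is hypothetical, the version label is assumed to sit at the end of the name as `_vNN`, and every file name is assumed to have an extension.

```python
import re

# A version label such as "_v01" at the end of the name, before the extension.
VERSION = re.compile(r"_v(\d+)$")


def next_version(filename: str) -> str:
    """Return the file name with its trailing _vNN label increased by one."""
    stem, ext = filename.rsplit(".", 1)  # assumes the name has an extension
    match = VERSION.search(stem)
    if match is None:
        # No label yet: start the sequence at v01.
        return f"{stem}_v01.{ext}"
    digits = match.group(1)
    number = int(digits) + 1
    # Keep the same zero-padded width as the existing label.
    return f"{stem[:match.start()]}_v{number:0{len(digits)}d}.{ext}"


print(next_version("Report_v01.doc"))
print(next_version("Report_v09.doc"))
```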
What is Metadata? [10]
▪ Metadata is a term that has primarily been used by library and archives communities to describe standards used to aid the discovery of objects. A metadata record consists of all the metadata elements describing an object. Metadata records are often expressed in machine-readable formats for easy integration within systems.
▪ There are three basic categories of metadata elements: descriptive, technical/structural, and administrative. All objects also have a unique identifier metadata element.
What is Metadata? [11]
▪ 1. Descriptive metadata elements consist of information about the content and context of an object. For example, descriptive metadata for an image may include: title, creator, subject (tags), and description.
▪ 2. Technical/structural metadata elements describe the format, process, and inter-relatedness of objects. For example, technical metadata for an image may include: camera, aperture, exposure, file format, and set, if in a series.
▪ 3. Administrative metadata elements describe information needed to manage or use the object. Elements may include: creation date, copyright permissions, required software, provenance and file integrity checks.
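To make the three categories concrete, here is a minimal sketch of a machine-readable metadata record for a single image, written as JSON. Every field name and value is invented for illustration and does not follow any particular standard; the SHA-256 checksum stands in for the "file integrity check" and is computed over placeholder bytes rather than a real file.

```python
import hashlib
import json

# Stand-in bytes so the sketch runs without a real image file.
image_bytes = b"placeholder image content"

record = {
    "identifier": "img_0001",  # hypothetical unique identifier
    "descriptive": {
        "title": "Field site at dawn",
        "creator": "A. Researcher",
        "subject": ["field site", "sunrise"],
        "description": "Photograph of the sampling site.",
    },
    "technical": {
        "camera": "Example Camera X",
        "aperture": "f/8",
        "exposure": "1/250 s",
        "file_format": "tif",
        "set": "site_photos",
    },
    "administrative": {
        "creation_date": "20161129",
        "copyright": "Copyright held by the project",
        "required_software": "Any TIFF viewer",
        "provenance": "Taken by project staff",
        "checksum_sha256": hashlib.sha256(image_bytes).hexdigest(),
    },
}

record_json = json.dumps(record, indent=2)
print(record_json)
```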
What is Metadata? [12]
▪ Data centers and repositories may require specific metadata standards in order to deposit data. Check with any repositories before you begin outlining the metadata plan for your data. If a standard has not been defined for your discipline, a good starting place for a metadata plan is Dublin Core or the DataCite recommendations.
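As a starting point, a simple Dublin Core record can be written with Python's standard library. The sketch below uses element names from the Dublin Core element set (`title`, `creator`, `subject`, `description`, `date`, `format`, `identifier`, `rights`) and its namespace; the wrapping `metadata` element, the helper `dublin_core_record` and all values are assumptions for illustration. A repository may expect a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> ET.Element:
    """Wrap each field in a dc:<name> element under a generic root element."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Invented example values.
fields = {
    "title": "Stream temperature survey",
    "creator": "A. Researcher",
    "subject": "hydrology",
    "description": "Hourly stream temperature readings.",
    "date": "2016-11-29",
    "format": "text/csv",
    "identifier": "survey_20161129",
    "rights": "Access restricted to project members",
}

xml_text = ET.tostring(dublin_core_record(fields), encoding="unicode")
print(xml_text)
```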
Metadata Best Practices [13]
▪ Good data documentation includes information on:
The context of data collection: project history, aims, objectives and hypotheses
Dataset structure of data files, cases, relationships between files
Data validation, proofing, cleaning and quality assurance
Modifications made to data over time since their original creation and identification of different versions of datasets
Information on data confidentiality, access and use conditions, where applicable
Metadata Best Practices [14]
▪ At data-level, datasets should also be documented with:
Names, labels and descriptions for variables, records and values
Explanation of codes and classification schemes used
Codes of, and reasons for, missing values
Derived data created after collection, with code, algorithm or command file used to create them
Weighting and grossing variables created
Data listing with descriptions for cases, individuals or items
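Data-level documentation of this kind is often kept as a codebook stored next to the data in an open format. The sketch below writes a small codebook as CSV text; the variable names, value codes and missing-value codes are invented, and the four-column layout is one possible convention rather than a standard.

```python
import csv
import io

COLUMNS = ["variable", "label", "values", "missing_code"]

# Invented example entries: one row per variable in the dataset.
codebook = [
    {"variable": "resp_id", "label": "Respondent identifier",
     "values": "integer", "missing_code": ""},
    {"variable": "age_grp", "label": "Age group",
     "values": "1=18 to 34; 2=35 to 54; 3=55 and over",
     "missing_code": "9 (not answered)"},
    {"variable": "weight", "label": "Survey weight (derived after collection)",
     "values": "decimal", "missing_code": ""},
]


def codebook_csv(rows: list) -> str:
    """Serialize the codebook rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


codebook_text = codebook_csv(codebook)
print(codebook_text)
```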
References
▪ 01. https://library.uoregon.edu/datamanagement/fileformats.html
▪ 02. https://library.uoregon.edu/datamanagement/fileformats.html
▪ 03. https://library.uoregon.edu/datamanagement/filenaming.html
▪ 04. http://www.exadox.com/files/pdf/en/external/fnc12.pdf (Pg 6)
▪ 05. https://library.uoregon.edu/datamanagement/filenaming.html
▪ 06. https://library.uoregon.edu/datamanagement/filenaming.html
▪ 07. http://www.exadox.com/files/pdf/en/external/fnc12.pdf (Pg 4)
▪ 08. https://library.uoregon.edu/datamanagement/metadata.html
▪ 09. http://www.exadox.com/files/pdf/en/external/fnc12.pdf (Pg 5)
▪ 10. https://library.uoregon.edu/datamanagement/metadata.html
▪ 11. https://library.uoregon.edu/datamanagement/metadata.html
▪ 12. https://library.uoregon.edu/datamanagement/metadata.html
▪ 13. https://library.uoregon.edu/datamanagement/metadata.html
▪ 14. https://library.uoregon.edu/datamanagement/metadata.html