Michener Workshop Montpellier

Post on 22-Jan-2018


DataONE Data Life Cycle: Tools and Tips

The DataONE Data Life Cycle

Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze

Field Research

Monitoring Project

Synthesis Project

Develop Solutions for Research

1. Plan: Create and Follow a Data Management Plan

Michener WK (2015) Ten Simple Rules for Creating a Good Data Management Plan. PLoS Comput Biol 11(10): e1004525. doi:10.1371/journal.pcbi.1004525

2. Collect and Organize: Logically Structure the Data to Support Use

CC image by Justin See on Flickr

Jones et al. 2007

2. Collect and Organize

• Columns of data are consistent: only numbers, dates, or text

• Consistent Names, Codes, Formats (date) used in each column

• Data are all in one table, which is much easier for a statistical program to work with than multiple small tables which each require human intervention
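
The three points above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the site tables, the column names (`date`, `species`, `height_cm`), and the helper functions are all hypothetical. It stacks several small tables into one and then reports cells that break the "only numbers" or "only dates" rule for a column.

```python
from datetime import datetime

# Hypothetical small tables, one per site, as they might come from separate sheets.
site_a = [
    {"date": "2017-06-01", "species": "ACRU", "height_cm": "12.5"},
    {"date": "2017-06-02", "species": "QURU", "height_cm": "30.0"},
]
site_b = [
    {"date": "2017-06-03", "species": "ACRU", "height_cm": "n/a"},
]


def combine(tables):
    """Stack small tables into one table, recording the source in a 'site' column."""
    rows = []
    for site, table in tables.items():
        for row in table:
            rows.append({"site": site, **row})
    return rows


def is_date(text):
    """True if text is a date written as YYYY-MM-DD (the format assumed here)."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_number(text):
    """True if text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def inconsistent_cells(rows, column, check):
    """Return (row index, value) pairs whose value fails the column's check."""
    return [(i, r[column]) for i, r in enumerate(rows) if not check(r[column])]


table = combine({"A": site_a, "B": site_b})
bad_heights = inconsistent_cells(table, "height_cm", is_number)
bad_dates = inconsistent_cells(table, "date", is_date)
print(bad_heights)  # cells in height_cm that are not numbers
print(bad_dates)    # cells in date that are not YYYY-MM-DD dates
```

Keeping the site as a column rather than as a separate table is what lets a statistical program read everything in one pass.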

Google Docs Forms

Data Entry Tools: Excel

Excel: Data Validation

3. Assure: Incorporate Quality Assurance & Quality Control

Quality Engine

MetaDIG DIBBs

3. Assure

• JMP

• R

• MATLAB

• many others
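
The slides name JMP, R, and MATLAB as tools for this step. As a rough sketch of one basic check such tools support, the Python below flags missing and out-of-range values. The function name `qc_flags`, the air-temperature readings, and the plausible range of -50 to 60 °C are assumptions made for illustration, not values from the workshop.

```python
def qc_flags(values, low, high):
    """Label each value 'ok', 'missing', or 'out_of_range'."""
    flags = []
    for v in values:
        if v is None:
            flags.append("missing")
        elif not (low <= v <= high):
            flags.append("out_of_range")
        else:
            flags.append("ok")
    return flags


# Hypothetical air temperature readings in degrees Celsius.
# 999.0 stands in for a sensor error code that slipped into the data.
air_temp_c = [12.1, 13.4, None, 999.0, 11.8]
flags = qc_flags(air_temp_c, low=-50.0, high=60.0)

for value, flag in zip(air_temp_c, flags):
    print(value, flag)
```

Flagging rather than deleting suspect values keeps the original record intact, so the decision about each value can be reviewed later.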

4. Describe: Develop Comprehensive, Standardized Metadata

Darwin Core – species and biodiversity collections

EML – Ecological Metadata Language

ISO 19115 – geospatial data

http://rs.tdwg.org/dwc/
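
To give a feel for what a standardized description looks like, here is a small Python sketch that writes one occurrence record with Darwin Core term names as column headers. The term names (`occurrenceID`, `basisOfRecord`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) are Darwin Core terms; the record values are made up, and writing to CSV is only one possible way to carry such a record.

```python
import csv
import io

# Darwin Core term names are used as column headers; the values are invented.
record = {
    "occurrenceID": "example-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Acer rubrum",
    "eventDate": "2017-06-01",
    "decimalLatitude": "43.61",
    "decimalLongitude": "3.88",
}

# Write the record to an in-memory CSV so the example needs no files.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the headers come from a shared vocabulary, another group can interpret the columns without asking the data collector what each one means.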

4. Describe

Tools

Specify: http://specifyx.specifysoftware.org

Morpho: https://knb.ecoinformatics.org/#tools/morpho

5. Preserve: Protect and Preserve Data for Long-term Use

Catalog of 1,500+ Data Repositories

Exercise

• Search for repositories that host particular types of data (e.g., biodiversity, trait)

• Visit one of the repositories and identify the services that they offer

6. Discover: Search a Domain Portal

Dryad links to journals

Provides citation instructions

6. Discover: Search a Data Aggregator

Data Federations (DataONE, GBIF)

carbon cycling, plant biomass, ocean nitrogen, avian distribution

Exercise

• Search datadryad.org for plant trait

• Search DataONE.org for plant trait

6. Discover: Support Discovery of Relevant Data

Dryad DataONE google

plant trait 2,137 26,300,000

plant trait datadryad 803 1,908 17,400

• Differential content searched

• Automated annotation via ontologies and other approaches

• Differential filtering

• Different definitions of data sets (e.g., entire package vs individual data sets)

7. Integrate: Enable Data Integration from Different Sources

Jones et al. 2007

7. Integrate: DataONE Provenance Tracking System

8. Analyze: https://www.vistrails.org

8. Analyze: http://kepler-project.org

8. Analyze: https://taverna.incubator.apache.org

8. Analyze: https://www.myexperiment.org/

Best Practices, Webinar series, Lessons and Exercises

DataONE.org Education Resources

DataONE Vision and Mission

dataone.org