chapter 8 managing and curating data. the second step storing and curating data
TRANSCRIPT
![Page 1: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/1.jpg)
CHAPTER 8: Managing and Curating Data
![Page 2: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/2.jpg)
The Second Step: Storing and Curating Data
![Page 3: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/3.jpg)
Storage: Temporary and Archival
Permanent archives: The only medium acceptable as truly archival is acid-free paper
Electronic storage: Do not expect electronic media to last more than 5-10 years
Electronic media should be used primarily for working copies
If used, copy datasets onto newer electronic media on a regular basis
![Page 4: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/4.jpg)
Curating Data
Most ecological and environmental data are collected by researchers using funds obtained through grants and contracts
They are technically owned by the granting agency, and they need to be made widely available (e.g., via the Internet)
Unfortunately, when budgets are cut, data management and curation costs are often the first items to be dropped
![Page 5: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/5.jpg)
The Final Step: Transforming the Data
![Page 6: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/6.jpg)
Transformation
A mathematical function that is applied to all of the observations of a given variable: $Y^{*} = f(Y)$
Most are fairly simple algebraic functions, provided they are continuous monotonic functions
Transformations DO NOT change the rank order of the data
Transformations DO change the relative spacing of the data
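The two properties above can be checked with a small sketch. The observations are hypothetical (not from the chapter), `ranks` is a helper written only for this illustration, and the base-10 logarithm stands in for a generic monotonic f.

```python
import math

def ranks(values):
    """Return the rank (0 = smallest) of each value, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

# Hypothetical observations of a variable Y (not from the chapter).
y = [1.0, 10.0, 100.0, 1000.0]
y_star = [math.log10(v) for v in y]  # Y* = f(Y) with f = log10

# A monotonically increasing f leaves the rank order unchanged...
same_order = ranks(y) == ranks(y_star)

# ...but the gaps between neighbouring values are different.
gaps_raw = [b - a for a, b in zip(y, y[1:])]
gaps_transformed = [b - a for a, b in zip(y_star, y_star[1:])]

print(same_order, gaps_raw, gaps_transformed)
```

The raw gaps grow tenfold at each step, while the transformed gaps are equal: the order is kept and only the spacing changes.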
![Page 7: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/7.jpg)
Why Transform Data?
(1) Patterns in the transformed data may be easier to understand and communicate than patterns in the raw data (e.g., converting curves into straight lines)
(2) Necessary for analysis to be valid – “meeting the assumptions”
![Page 8: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/8.jpg)
The Species-Area Relationship: A classic example
If we plot the number of species against the area of the island, the data often follow a simple power function, $S = cA^{z}$, where
S = number of species
A = island area
c and z are constants fitted to the data
![Page 9: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/9.jpg)
The Species-Area Relationship: A classic example
| Island | Area (km²) | No. of species | log10(Area) | log10(Species) |
|---|---|---|---|---|
| Albermarle | 5824.9 | 325 | 3.765 | 2.512 |
| Charles | 165.8 | 319 | 2.220 | 2.504 |
| Chatham | 505.1 | 306 | 2.703 | 2.486 |
| James | 525.8 | 224 | 2.721 | 2.350 |
| Indefatigable | 1007.5 | 193 | 3.003 | 2.286 |
| Abingdon | 51.8 | 119 | 1.714 | 2.076 |
| Duncan | 18.4 | 103 | 1.265 | 2.013 |
| Narborough | 634.6 | 80 | 2.803 | 1.903 |
| Hood | 46.6 | 79 | 1.668 | 1.898 |
| Seymour | 2.6 | 52 | 0.415 | 1.716 |
| Barrington | 19.4 | 48 | 1.288 | 1.681 |
| Gardner | 0.5 | 48 | -0.301 | 1.681 |
| Bindloe | 116.6 | 47 | 2.067 | 1.672 |
| Jervis | 4.8 | 42 | 0.681 | 1.623 |
| Tower | 11.4 | 22 | 1.057 | 1.342 |
| Wenman | 47 | 14 | 1.672 | 1.146 |
| Culpepper | 2.3 | 7 | 0.362 | 0.845 |
![Page 10: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/10.jpg)
The Species-Area Relationship
[Figure: number of species (y-axis, 0 to 400) plotted against island area in km² (x-axis, 0 to 7000)]
![Page 11: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/11.jpg)
The Species-Area Relationship
If species richness and island area are related by a power function, we can transform this equation by taking logarithms of both sides
$\log(S) = \log(cA^{z})$
$\log(S) = \log(c) + z\log(A)$
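Once the relationship is linear in log(A), the constants can be estimated with an ordinary straight-line fit. The sketch below does this by least squares on the island data from the table in this chapter. The choice of ordinary least squares and the variable names are assumptions made for illustration; the slides do not say how c and z were fitted.

```python
import math

# (area in km^2, number of species) for the islands in the chapter's table.
islands = [
    (5824.9, 325), (165.8, 319), (505.1, 306), (525.8, 224),
    (1007.5, 193), (51.8, 119), (18.4, 103), (634.6, 80),
    (46.6, 79), (2.6, 52), (19.4, 48), (0.5, 48),
    (116.6, 47), (4.8, 42), (11.4, 22), (47.0, 14), (2.3, 7),
]

log_area = [math.log10(a) for a, _ in islands]
log_species = [math.log10(s) for _, s in islands]

n = len(islands)
mean_x = sum(log_area) / n
mean_y = sum(log_species) / n

# Ordinary least-squares slope and intercept of log(S) on log(A).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_area, log_species))
sxx = sum((x - mean_x) ** 2 for x in log_area)
z = sxy / sxx                # slope of the line = exponent z
log_c = mean_y - z * mean_x  # intercept of the line = log10(c)
c = 10 ** log_c              # back on the original scale

print(f"z = {z:.3f}, log10(c) = {log_c:.3f}, c = {c:.1f}")
```

The slope of the fitted line is the exponent z, and the antilog of the intercept is c.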
![Page 12: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/12.jpg)
The Species-Area Relationship
[Figure: log10(number of species) (y-axis, 0.6 to 2.6) plotted against log10(island area) (x-axis, -1 to 4)]
![Page 13: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/13.jpg)
Other Transformations
Cube-root transformation ($Y^{1/3}$): used for measures of mass or volume that are allometrically related to linear measures of body size or length
Logarithmic transformation: used to examine relationships between two measures of mass or volume ($Y^{3}$); both X and Y are transformed
![Page 14: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/14.jpg)
Why Transform Data? Statistics Demands It
All statistical tests require data to fit certain mathematical assumptions
Examples:
Analysis of Variance: (1) variances must be homoscedastic; (2) residuals must be normal random variables
Regression: (1) residuals must be normally distributed and uncorrelated with the independent variable
![Page 15: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/15.jpg)
Five Common Transformations
(1) Logarithmic Transformation
(2) Square-root Transformation
(3) Angular (or arcsine) Transformation
(4) Reciprocal Transformation
(5) Box-Cox Transformation
![Page 16: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/16.jpg)
Logarithmic Transformation
Replaces each observation with its logarithm: $Y^{*} = \log(Y)$
Often equalizes variances for data in which the mean and variance are positively correlated; such data also tend to have outliers and positively skewed residuals
The logarithm of 0 is not defined; if the data include zeros, add 1 to each observation before transforming: $Y^{*} = \log(Y + 1)$
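A minimal sketch of the add-1 adjustment, using hypothetical counts that include a zero (the data and variable names are illustrative only):

```python
import math

# Hypothetical count data that include a zero (not from the chapter).
counts = [0, 3, 12, 48, 190]

# math.log10(0) raises ValueError, so shift every observation by 1 first.
log_counts = [math.log10(y + 1) for y in counts]

print(log_counts)
```

A zero in the raw data maps to log(1) = 0 after the shift, so the transformed variable is defined for every observation.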
![Page 17: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/17.jpg)
Square-root Transformation
Replaces each observation with its square root: $Y^{*} = \sqrt{Y}$
Used most frequently for count data, which often follow a Poisson distribution
Yields a variance independent of the mean
Leaves data values equal to 0 unchanged, because $\sqrt{0} = 0$; add some small number to each observation before taking the square root
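The variance-stabilizing effect can be illustrated by simulation. The sketch below draws Poisson counts with two different means and compares variances before and after the transformation. The sampler (Knuth's multiplication method), the seed, the sample size, and the offset of 0.5 are all assumptions made for this illustration; the text itself only says to add "some small number".

```python
import math
import random
import statistics

def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(1)  # fixed seed so the sketch is repeatable
variances = {}
for mean in (10, 40):
    raw = [poisson_sample(mean, rng) for _ in range(5000)]
    # 0.5 is an assumed offset chosen for illustration.
    transformed = [math.sqrt(y + 0.5) for y in raw]
    variances[mean] = (statistics.variance(raw), statistics.variance(transformed))
    print(mean, variances[mean])
```

The raw variance tracks the mean, as expected for Poisson counts, while the variance of the transformed counts stays near 1/4 for both means.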
![Page 18: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/18.jpg)
Arcsine Transformation: Also called arcsine-square root or angular
Replaces each observation with the arcsine of the square root of the value
$Y^{*} = \arcsin(\sqrt{Y})$
Principally used for proportions
Removes the dependence of the variance on the mean
Gives transformed data in units of radians, not degrees
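As a brief sketch, the transformation can be applied to hypothetical proportions as follows. The values are illustrative, and the conversion to degrees is included only to show the difference in units.

```python
import math

# Hypothetical proportions between 0 and 1 (not from the chapter).
proportions = [0.0, 0.05, 0.25, 0.5, 0.75, 0.95, 1.0]

# Y* = arcsin(sqrt(Y)); math.asin returns radians, here between 0 and pi/2.
angles_rad = [math.asin(math.sqrt(p)) for p in proportions]

# Convert explicitly only if degrees are wanted.
angles_deg = [math.degrees(a) for a in angles_rad]

print(angles_rad)
print(angles_deg)
```

A proportion of 0.5 maps to pi/4 radians (45 degrees), and the endpoints 0 and 1 map to 0 and pi/2.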
![Page 19: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/19.jpg)
Reciprocal Transformation
Replaces each value with its reciprocal: $Y^{*} = 1/Y$
Commonly used for data that record rates, which often appear hyperbolic when plotted
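To see why a reciprocal straightens a hyperbola, consider a hypothetical relationship Y = k/X. The constant k and the X values below are chosen only for illustration.

```python
# Hypothetical hyperbolic relationship Y = k / X (k chosen for illustration).
k = 12.0
x = [1.0, 2.0, 3.0, 4.0, 6.0]
y = [k / xi for xi in x]         # 12, 6, 4, 3, 2: a curve when plotted
y_star = [1.0 / yi for yi in y]  # Y* = 1/Y

# After the transformation Y* = X / k, a straight line through the origin,
# so the ratio Y*/X is the same constant (1/k) for every point.
slopes = [ys / xi for ys, xi in zip(y_star, x)]

print(slopes)
```

For positive data the reciprocal also reverses the ordering of the values (the largest becomes the smallest), so the rank order is mirrored rather than scrambled.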
![Page 20: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/20.jpg)
Box-Cox Transformation: A family of transformations
$Y^{*} = (Y^{\lambda} - 1)/\lambda$ (for $\lambda \neq 0$)
$Y^{*} = \log_e(Y)$ (for $\lambda = 0$)
$L = -\frac{\nu}{2}\log_e(s_T^{2}) + (\lambda - 1)\frac{\nu}{n}\sum \log_e(Y)$
$\nu$ = degrees of freedom
$n$ = sample size
$s_T^{2}$ = variance of transformed values of Y
![Page 21: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/21.jpg)
Box-Cox Transformation
The value of lambda that results when the last equation is maximized is used in one of the first two equations to provide the closest fit of the transformed data to a normal distribution
The last equation must be solved iteratively (trying different lambda values until L is maximized) using computer software
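The iterative search can be sketched as a simple grid search over candidate lambda values. This is a minimal illustration, not the procedure of any particular statistics package. The data are simulated, the degrees of freedom are assumed to be n - 1, and the grid from -2 to 2 in steps of 0.01 is an arbitrary choice.

```python
import math
import random
import statistics

def box_cox(y, lam):
    """Box-Cox transform of positive values; natural log when lam == 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def log_likelihood(y, lam):
    """L = -(v/2) ln(s2_T) + (lam - 1)(v/n) sum(ln Y), assuming v = n - 1."""
    n = len(y)
    v = n - 1
    s2_t = statistics.variance(box_cox(y, lam))  # variance of transformed Y
    sum_log = sum(math.log(val) for val in y)
    return -(v / 2) * math.log(s2_t) + (lam - 1) * (v / n) * sum_log

# Simulated right-skewed (log-normal) data; seed fixed for repeatability.
rng = random.Random(42)
y = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

# Try each candidate lambda and keep the one that maximizes L.
candidates = [i / 100 for i in range(-200, 201)]
best_lambda = max(candidates, key=lambda lam: log_likelihood(y, lam))

print(best_lambda)
```

Because the simulated data are log-normal, the lambda that maximizes L should fall near 0, which corresponds to the natural-log transformation.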
![Page 22: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/22.jpg)
Box-Cox Transformation
When $\lambda = 1$, equation 1 results in a linear transformation
When $\lambda = 1/2$, a square-root transformation
When $\lambda = -1$, a reciprocal transformation
When $\lambda = 0$, equation 2 results in a natural logarithmic transformation
![Page 23: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/23.jpg)
Box-Cox Transformation
ALWAYS try using simple arithmetic transformations FIRST
If the data are right-skewed, try familiar transformations from the series $1/\sqrt{Y}$, $\sqrt{Y}$, $\ln(Y)$, $1/Y$
If the data are left-skewed, try $Y^{2}$, $Y^{3}$, etc.
![Page 24: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/24.jpg)
[Figure: five series (Original, Logarithmic, Square Root, Arcsine, Reciprocal); both axes run from 0 to 1]
![Page 25: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/25.jpg)
Reporting Results
Report results in the original units, which means back-transforming the transformed values
The back-transformed mean can be very different from the arithmetic mean of the untransformed data
Also, back-transformations will normally result in asymmetrical confidence intervals
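Both points can be seen in a small worked example. The three observations are hypothetical. An interval of the mean plus or minus one standard error stands in for a confidence interval, which is an assumption made to avoid needing a t-table.

```python
import math
import statistics

# Hypothetical observations (not from the chapter).
y = [1.0, 10.0, 100.0]
arithmetic_mean = statistics.mean(y)  # 37.0

# Work on the log10 scale, where the values are 0, 1 and 2.
log_y = [math.log10(v) for v in y]
mean_log = statistics.mean(log_y)
se_log = statistics.stdev(log_y) / math.sqrt(len(log_y))

# Back-transform the mean and a symmetric interval (mean +/- 1 SE).
back_mean = 10 ** mean_log
lower = 10 ** (mean_log - se_log)
upper = 10 ** (mean_log + se_log)

print(arithmetic_mean, back_mean, lower, upper)
```

The back-transformed mean is 10 (the geometric mean), far from the arithmetic mean of 37. The interval, symmetric on the log scale, runs from roughly 2.6 to 37.8 after back-transformation, so it extends much further above 10 than below it.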
![Page 26: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/26.jpg)
Back-Transformations
Logarithmic: antilog(Y*), i.e. $10^{Y^{*}}$ for $\log_{10}$ or $e^{Y^{*}}$ for $\log_e$
Square root: $(Y^{*})^{2}$
Arcsine: $[\sin(Y^{*})]^{2}$
Reciprocal: $1/Y^{*}$
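A quick round-trip check confirms that each back-transformation undoes its transformation. The observation is hypothetical, and the arcsine back-transformation is taken here to be the square of the sine, since the forward transformation is the arcsine of the square root.

```python
import math

y = 0.36  # hypothetical observation; non-zero and a valid proportion

# Apply each transformation, then its back-transformation.
round_trips = {
    "log10": 10 ** math.log10(y),
    "ln": math.exp(math.log(y)),
    "square root": math.sqrt(y) ** 2,
    "arcsine": math.sin(math.asin(math.sqrt(y))) ** 2,
    "reciprocal": 1 / (1 / y),
}

for name, value in round_trips.items():
    print(name, value)
```

Every entry returns the original value up to floating-point rounding.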
![Page 27: CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data](https://reader035.vdocuments.mx/reader035/viewer/2022081506/56649de85503460f94ae20dd/html5/thumbnails/27.jpg)
Reporting Results
Lastly, the transformation should be added to your audit trail (documented in the metadata)
Create a new spreadsheet for the transformed data and store it on permanent media