pca examples & applications · 2018. 2. 9. · •least-squares regression analysis (anova),...

PCA – Examples & Applications

➢ Objectives:

Showcase PCA analysis – in PC-ORD and the literature

Principal Components (PCA) – PC-ORD

➢ Results: Randomization tests


➢ Results: Species Loadings onto the PC Axes

➢ Use the scaled eigenvectors


➢ Results: Correlation with Axes
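To make "scaled eigenvectors" and "correlation with axes" concrete, here is a minimal sketch restricted to two variables, where the PCA of the correlation matrix has a closed form. It assumes a correlation-matrix PCA. The function names and the abundance values are hypothetical, not PC-ORD output. Each eigenvector is scaled by the square root of its eigenvalue, and that scaled value matches the Pearson correlation between the variable and the axis scores.

```python
import math

def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def pearson(a, b):
    """Pearson correlation between two equal-length columns."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

def two_variable_pca(x, y):
    """PCA of the 2 x 2 correlation matrix [[1, r], [r, 1]] in closed form."""
    r = pearson(x, y)
    sign = 1.0 if r >= 0 else -1.0
    eigenvalues = [1 + abs(r), 1 - abs(r)]
    h = math.sqrt(0.5)
    # One unit-length eigenvector per axis (entries: variable x, variable y).
    eigenvectors = [(h, sign * h), (h, -sign * h)]
    # Scaled eigenvectors (loadings): eigenvector * sqrt(eigenvalue).
    scaled = [tuple(v * math.sqrt(lam) for v in vec)
              for vec, lam in zip(eigenvectors, eigenvalues)]
    zx, zy = standardize(x), standardize(y)
    # Sample scores on each axis.
    scores = [[vec[0] * a + vec[1] * b for a, b in zip(zx, zy)]
              for vec in eigenvectors]
    return eigenvalues, scaled, scores

# Hypothetical abundances of two species in eight samples.
species_a = [2.0, 4.0, 5.0, 7.0, 8.0, 11.0, 12.0, 15.0]
species_b = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0, 8.0, 12.0]

eigenvalues, scaled, scores = two_variable_pca(species_a, species_b)
r_axis1 = pearson(species_a, scores[0])   # correlation of species A with Axis 1
r_axis2 = pearson(species_a, scores[1])   # correlation of species A with Axis 2
axis_dot = sum(u * v for u, v in zip(scores[0], scores[1]))  # ~0: orthogonal axes
```

The near-zero dot product of the two score vectors shows that the axes are orthogonal. With more than two variables, a general eigen-decomposition routine would replace the closed form.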

Principal Components (PCA) – Example

➢ Results: Graphs

Samples: points Species: vectors


➢ Results: Graphs

Samples: points

Species: vectors

PC-ORD recommends displaying species as vectors and samples as points

➢ Rotation by NEDO

Stretch plot along direction of most variation for species

NEDO axis correlations: Axis 1: +0.51; Axis 2: -0.74

➢ Rotation: Highlights certain patterns. Report in results


➢ Percent of pattern explained in original distance matrix

➢ Orthogonality of PCA axes


Principal Components (PCA) – Reporting

➢ What type of cross-correlation matrix did you use?

• Use covariance matrix. Use Euclidean distance

➢ If used with community data, how do you justify using this linear model for species data?

• Were assumptions of linearity / normality met?

➢ How many axes were interpreted, and what proportion of variance was explained by these axes?

• Describe the axes, and the individual / cumulative variance

➢ Principal eigenvectors: Test of significance?

• Not necessary, but an option using randomization tests

➢ Rotation of the solution? Use of interpretation aids?

• Explain overlays and correlations of variables with axes

PCA Example – Upwell

Where do we start ? Data Exploration + Summarization

What do we look for ?

Mean, S.D., Skewness, Kurtosis (Transformations ?)

Value Ranges, Outliers (Typos ?)
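As a rough illustration of this exploration step, the sketch below computes the summary statistics listed above and flags values far from the mean, using only the Python standard library. The function names, the 3-SD cutoff, and the sample values are assumptions for illustration. Skewness and kurtosis are the simple moment-based versions, so other software may report slightly different bias-corrected values.

```python
import math
import statistics

def describe(values):
    """Mean, sample SD, range, moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,        # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2 - 3.0,    # 0 for a normal distribution
    }

def flag_outliers(values, z_cut=3.0):
    """Values more than z_cut sample SDs from the mean (possible typos)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > z_cut * sd]

# Hypothetical column: 19 plausible values and one likely typo.
column = [1.0] * 19 + [100.0]
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
suspects = flag_outliers(column)
```

Strong skewness or kurtosis suggests trying a transformation. A flagged value should be checked against the original data sheet before it is changed or dropped.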


Principal Component - Results


1st Stopping Rule (Eigenvalue > Broken-Stick Eigenvalue)

0 Axes meet this criterion
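The first stopping rule can be sketched as follows, assuming the usual broken-stick model, in which the expected share of total variance for axis k of p is (1/p) times the sum of 1/i for i from k to p. The function names and the eigenvalues are hypothetical, and PC-ORD's exact output format may differ.

```python
def broken_stick(p, total_variance=None):
    """Broken-stick eigenvalues for p axes (total defaults to p, as for a correlation matrix)."""
    if total_variance is None:
        total_variance = float(p)
    return [total_variance / p * sum(1.0 / i for i in range(k, p + 1))
            for k in range(1, p + 1)]

def axes_passing(eigenvalues):
    """Number of leading axes whose eigenvalue exceeds its broken-stick value."""
    expected = broken_stick(len(eigenvalues), sum(eigenvalues))
    count = 0
    for observed, threshold in zip(eigenvalues, expected):
        if observed <= threshold:
            break                      # stop at the first axis that fails
        count += 1
    return count

# Hypothetical eigenvalues from a five-variable correlation-matrix PCA.
weak_structure = [2.0, 1.2, 0.9, 0.6, 0.3]
strong_first_axis = [3.0, 1.0, 0.5, 0.3, 0.2]

n_weak = axes_passing(weak_structure)
n_strong = axes_passing(strong_first_axis)
```

With five variables the first broken-stick eigenvalue is about 2.28, so a first eigenvalue of 2.0 fails the rule and no axes are retained, while a first eigenvalue of 3.0 passes.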

How Many Axes Give us the Right Answer?


2nd Stopping Rule (Eigenvalue > Mean Randomization)


3rd Stopping Rule (p value)


➢ Performing a Randomization Test:

The randomization: shuffle values within variables (columns) and recompute the correlation matrix and eigenvalues. Repeat many times.

The test: Compare the actual eigenvalues (from test of real data) against eigenvalues from the randomizations.

Calculate p value as:

$$p = \frac{1 + n}{1 + N}$$

where

n = number of randomizations where test statistic ≥ observed value

N = the total number of randomizations.
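The procedure above can be sketched in plain Python. This is a minimal illustration, not PC-ORD's implementation. The function names, the 199 randomizations, the seed, and the three data columns are all assumptions, and the eigenvalues come from a simple Jacobi rotation routine chosen only to keep the example free of external libraries.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        norm = math.sqrt(sum((v - mean) ** 2 for v in col))
        z.append([(v - mean) / norm for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]

def eigenvalues_symmetric(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):          # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki + s * akj
                    a[k][j] = -s * aki + c * akj
                for k in range(p):          # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik + s * ajk
                    a[j][k] = -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)

def randomization_test(columns, n_random=199, seed=0):
    """Per-axis p values, p = (1 + n) / (1 + N), from shuffling within columns."""
    rng = random.Random(seed)
    observed = eigenvalues_symmetric(correlation_matrix(columns))
    exceed = [0] * len(observed)
    for _ in range(n_random):
        # Shuffle each variable independently to break the correlations.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        randomized = eigenvalues_symmetric(correlation_matrix(shuffled))
        for axis, (rand_ev, obs_ev) in enumerate(zip(randomized, observed)):
            if rand_ev >= obs_ev:
                exceed[axis] += 1
    p_values = [(1 + n) / (1 + n_random) for n in exceed]
    return observed, p_values

# Hypothetical data: x and y are strongly correlated, z is unrelated filler.
x = [float(v) for v in range(20)]
y = [v + 0.5 * (-1) ** v for v in range(20)]
z = [float((7 * v) % 20) for v in range(20)]
observed, p_values = randomization_test([x, y, z])
```

Because x and y are almost perfectly correlated, the first observed eigenvalue is close to 2 and few, if any, shuffled data sets should match it, so the p value for axis 1 is small. In practice a linear algebra library would replace the hand-written eigenvalue routine.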

Principal Components – Randomizations

Rnd-Lambda – Compare the randomized eigenvalue for an axis to the observed eigenvalue for that axis

• fairly conservative and generally effective criterion

• more effective than Avg-Rnd when the data include uncorrelated variables

• performs better than other measures with strongly non-normal data

Rnd-F – Compare the randomized pseudo-F-ratio for an axis to the observed pseudo-F for that axis

Pseudo-F-ratio: the eigenvalue for an axis divided by the sum of the remaining (smaller) eigenvalues (sum of squares error – the remaining unexplained variance)

• particularly effective against uncorrelated variables

• performs poorly with grossly non-normal error structures

Avg-Rnd – Compare the observed eigenvalue for a given axis to the average eigenvalue obtained for that axis after randomization

• good when the data do not contain uncorrelated variables

• less stringent; too liberal when the data contain uncorrelated variables
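The three criteria differ only in the statistic being compared, which the sketch below tries to make explicit. It takes eigenvalues that have already been computed. The function names and all numbers are hypothetical, and the p value is assumed to be (1 + n) / (1 + N), with n the number of randomizations whose statistic is at least the observed one.

```python
def pseudo_f(eigenvalues):
    """Eigenvalue of each axis divided by the sum of the remaining smaller ones.

    The last axis has no remaining eigenvalues, so it gets no ratio.
    """
    return [eigenvalues[k] / sum(eigenvalues[k + 1:])
            for k in range(len(eigenvalues) - 1)]

def randomization_p(observed, randomized):
    """Per-axis p = (1 + n) / (1 + N) for any per-axis statistic."""
    total = len(randomized)
    return [(1 + sum(1 for run in randomized if run[k] >= observed[k])) / (1 + total)
            for k in range(len(observed))]

def compare_criteria(observed_eig, randomized_eig):
    """Return Rnd-Lambda p values, Rnd-F p values, and Avg-Rnd pass/fail flags."""
    rnd_lambda = randomization_p(observed_eig, randomized_eig)
    rnd_f = randomization_p(pseudo_f(observed_eig),
                            [pseudo_f(run) for run in randomized_eig])
    averages = [sum(run[k] for run in randomized_eig) / len(randomized_eig)
                for k in range(len(observed_eig))]
    avg_rnd = [obs > avg for obs, avg in zip(observed_eig, averages)]
    return rnd_lambda, rnd_f, avg_rnd

# Hypothetical eigenvalues: one observed set and three randomized sets.
observed_eig = [2.0, 0.7, 0.3]
randomized_eig = [
    [1.3, 1.0, 0.7],
    [1.2, 1.0, 0.8],
    [1.4, 0.9, 0.7],
]
rnd_lambda, rnd_f, avg_rnd = compare_criteria(observed_eig, randomized_eig)
```

Three randomizations are far too few for a real test; they are used here only so the counts can be checked by hand.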

Principal Components – Randomizations

PCA Example – Upwell39

PCA Example – Upwell36

PCA Example – Time

PCA Example – MEI

PCA Example – PDO


[Figure: PCA_Upwell ordination, Axis 1 vs. Axis 2, labeled TIME, MEI, PDO, upwell36, upwell39 (shown twice)]

[Figure: PCA_Upwell ordination, Axis 1 vs. Axis 3, labeled TIME, MEI, PDO, upwell36, upwell39]

NOTE: Use the Euclidean Distance


Principal Components (PCA) – Paper I

➢ Example: Weichler et al. (2004).

➢ Objective: Relate seabird densities to seven environmental parameters:

(1) water depth, (2) distance to nearest land, (3) number of trawlers within a radius of 5 km, (4) sea surface temperature, (5) water temperature difference (0 – 10 m), (6) water temperature difference (0 – 30 m), and (7) water temperature difference (10 – 50 m)

➢ NOTE: Did Not Report Cross-correlations

of environmental parameters


➢ Data Manipulations To Avoid Biases:

• Species densities (birds / km²) were selected as variables, and 10-min intervals (samples) were selected as cases

• Only species seen in at least five counting intervals were included, an arbitrary choice that covered a wide spectrum of species while ignoring those with few occurrences

• Only the more common species, with numbers exceeding 1% of all individuals counted, were included in the analysis

• Dataset of 46 sections of the cruise tracks. Each section comprised a hydrographic station approximately midway and 10-min intervals in two opposite directions (4 – 8 km away)

• Sample Size: 46 samples / 7 variables: Ratio of about 6.6


➢ Community-Wide Result: Six principal eigenvalues (> 1),

showing % of variation explained and ecological interpretation

➢ PC Axis Interpretations:


➢ Community-Wide Result: Loadings for the 11 seabird

species and 7 variables on the six principal eigenvalues

• 3 principal components: 50 % of variance

• 6 principal components: 78 % of variance
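For readers who want to reproduce this kind of summary, the sketch below computes individual and cumulative percent variance and counts axes with eigenvalues above 1. The eigenvalues are hypothetical and are not those reported by Weichler et al. (2004), and the function names are illustrative only.

```python
def variance_summary(eigenvalues):
    """Individual and cumulative percent of variance for each axis."""
    total = sum(eigenvalues)
    percent = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for pct in percent:
        running += pct
        cumulative.append(running)
    return percent, cumulative

def axes_with_eigenvalue_above_one(eigenvalues):
    """Number of axes retained under the eigenvalue > 1 rule."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

# Hypothetical eigenvalues from a six-variable correlation-matrix PCA.
hypothetical = [2.4, 1.6, 1.1, 0.5, 0.3, 0.1]
percent, cumulative = variance_summary(hypothetical)
n_axes = axes_with_eigenvalue_above_one(hypothetical)
```

In this made-up case three axes have eigenvalues above 1 and together account for about 85% of the variance. The eigenvalue > 1 rule only makes sense for a correlation matrix, where each variable contributes one unit of variance.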

[Table of loadings]


➢ Community-Wide Result: Axes explained using the strongest loadings of different species and environmental variables

➢ Note: Cannot determine which loadings are significant (what can we use to quantify correlation with axes?)

Principal Components (PCA) – Paper II

➢ Example: Ainley, D.G. et al. (2005).

➢ Objective: Relate densities of the 12 most abundant

species of seabirds to 12 habitat variables:

5 biological, 4 oceanographic, 3 geographic (spatial)

82.3%


➢ Oceanographic variables examined:

sea-surface temperature / salinity, thermocline depth / strength

[Figure labels: Date, Distance to Fronts, Chl Max, Acoustic Biomass]


➢ Data Manipulations To Avoid Biases:

• Densities log-transformed to meet normality assumptions

• Nevertheless, residuals generated in the regressions for some species did not meet those assumptions (Skewness / Kurtosis Test for Normality of Residuals, p < 0.05)

• Least-squares regression analysis (ANOVA), however, is a very robust procedure with respect to non-normality (Seber, 1977; Kleinbaum et al., 1988)

• Yet, while these analyses yield the best linear unbiased estimator in the absence of normally distributed residuals, p-values near 0.05 must be viewed with caution (Seber, 1977)


➢ To avoid double-absences:

• Only 15-min transects in which any given species was

recorded were analyzed

• The total sample size for the 12 species was 1209

➢ Is this an adequate sample size ?

Rule of thumb:

• 5 samples per variable (Tabachnick and Fidell 1989)

• 1209 / 12 ~ 100 samples per variable
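The rule-of-thumb check is simple arithmetic, sketched here for both papers using the sample and variable counts quoted in these slides. The function names and the default minimum ratio of 5 are illustrative assumptions.

```python
def samples_per_variable(n_samples, n_variables):
    """Ratio of samples to variables."""
    return n_samples / n_variables

def meets_rule_of_thumb(n_samples, n_variables, minimum_ratio=5.0):
    """True when there are at least minimum_ratio samples per variable."""
    return samples_per_variable(n_samples, n_variables) >= minimum_ratio

# Counts as quoted in the slides.
weichler_ratio = samples_per_variable(46, 7)     # 46 / 7, about 6.57
ainley_ratio = samples_per_variable(1209, 12)    # 1209 / 12 = 100.75

weichler_ok = meets_rule_of_thumb(46, 7)
ainley_ok = meets_rule_of_thumb(1209, 12)
```

Both data sets clear the threshold of 5 samples per variable, the first one only narrowly.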


➢ Analysis Methods:

• Principal components analysis (PCA), in combination

with Sidak multiple comparison tests, used to assess

differences in habitat selection among 12 seabird species

• To test for significant differences in habitat affinities

among seabird species, used two one-way ANOVAs:

In the first, tested for differences among PC1 scores of

each species; in the second, compared the PC2 scores

• Differences between two species significant if either

one or both PC scores differed significantly
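The two ingredients of this approach, a one-way ANOVA on PC scores and a Sidak adjustment for multiple comparisons, can be sketched as below. This is not the authors' code. The scores are hypothetical, the function names are illustrative, and applying the Sidak correction across all 66 species pairs is an assumption about how the comparisons were counted. Only the F statistic is computed; a p value would need an F distribution from a statistics library.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA; groups is a list of lists of scores."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def sidak_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)

# Number of pairwise comparisons among 12 species.
n_species = 12
n_pairs = n_species * (n_species - 1) // 2
per_test_alpha = sidak_alpha(0.05, n_pairs)

# Hypothetical PC1 scores for three species (three transects each).
pc1_scores = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat = one_way_f(pc1_scores)
```

The same function would be run a second time on the PC2 scores. The Sidak level is slightly less strict than the Bonferroni level of 0.05 / 66.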


➢ Community-Wide Result: First and second PC axes

explained 60% of variance in distribution of 12 species


➢ Species-specific Results:

• Species mapped onto two

(independent) dimensions

• Pair-wise associations

(tested) denoted by circles

[Figure annotations: Near Fronts; Zoop Prey; Salty, Green; Fish Prey]

Principal Components (PCA) – Comparisons

➢ Number of Axes:

- Selected 2 – easy to interpret (Ainley et al. 2005)

- Selected 6 – based on eigenvalues > 1 (Weichler et al. 2004)

➢ Display of Results:

- Plot & table of eigenvalues (Ainley et al. 2005)

- Eigenvalues & interpretation (description) (Weichler et al. 2004)

➢ Significance Tests:

- Pairwise species comparisons (ANOVA) (Ainley et al. 2005)

- Correlations with selected variables (Weichler et al. 2004)

Principal Components (PCA) – References

Ainley DG, Spear LB, Tynan CT, Barth JA, Pierce SD, Ford RG, Cowles TJ (2005). Physical and biological variables affecting seabird distributions during the upwelling season of the northern California Current. Deep-Sea Research II 52: 123–143

Weichler T, Garthe S, Luna-Jorquera G, Moraga J (2004). Seabird distribution on the Humboldt Current in northern Chile in relation to hydrography, productivity, and fisheries. ICES J. Marine Science 61 (1): 148–154

Disclaimer References

Seber, G.A.F. (Ed.), 1977. Linear Regression Analysis. Wiley, New York.

Kleinbaum, D.G., Kupper, L.L., Muller, K.E., 1988. Applied Regression Analysis and other Multivariable Methods. PWS-KENT Publishing Company, Boston.

Tabachnick, B.G. and L.S. Fidell. 1989. Using Multivariate Statistics. 2nd ed. New York: Harper and Row.
