Post on 24-Feb-2016


Multivariate Methods

Nels Johnson and Matt Williams
Laboratory for Interdisciplinary Statistical Analysis

Outline

• Principal Component Analysis
• Factor Analysis
• Multivariate T Tests
• MANOVA
• Multidimensional Scaling
• Correspondence Analysis

PCA – Motivating Examples

• You have measured a number of variables concerning the size of aphids. You’d like to reduce the number of variables used for classification.

• You have a bunch of football statistics for teams and would like to organize related teams based on these statistics.

What is it?
• Based on an eigenvalue decomposition of the covariance matrix S (or correlation matrix R) of the variables.
• Goal: Maximize the variance of linear combinations of the variables.
• Obtained by transforming the variables so that the covariance matrix of the new variables is diagonal.
• These new variables are called the principal components (PCs), and their covariance matrix contains the eigenvalues along the diagonal.
• This transformation can be thought of as a rotation of the axes.
• Note: No variables are designated as dependent.
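The bullets above can be checked numerically. A minimal sketch (the toy data and variable names are illustrative, not from the talk): the eigenvectors of S define the rotation, and the covariance of the rotated variables is diagonal with the eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated toy data: 200 observations of 3 variables
Y = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

S = np.cov(Y, rowvar=False)              # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalue decomposition of S
order = np.argsort(eigvals)[::-1]        # sort PCs by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = (Y - Y.mean(axis=0)) @ eigvecs  # principal component scores

# Covariance of the PCs is diagonal, with the eigenvalues on the diagonal,
# and the total variance (trace of S) is preserved by the rotation.
S_pc = np.cov(scores, rowvar=False)
assert np.allclose(S_pc, np.diag(eigvals), atol=1e-8)
assert np.isclose(eigvals.sum(), np.trace(S))
```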

What do we get out of it?

• We can form an index measure (i.e. a score) or a weighted average of variables based on a subset of the PCs.

• This reduces the number of variables we have to work with.

• With some subject matter area knowledge we might be able to interpret the meaning of some of the PCs based on correlations.

How to reduce the number of PCs?

• Pick a proportion of variation you want to explain ahead of time; then pick the number of PCs so that the sum of their eigenvalues, as a fraction of the total variance (the proportion of variation explained by those PCs), is at least that amount.

• Scree Plots
• All PCs with eigenvalue > 1 (Kaiser's Rule)
• Broken stick method
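The first two rules can be sketched numerically. The eigenvalues below are made up for illustration, and Kaiser's rule as stated assumes PCA on the correlation matrix, where the eigenvalues average 1:

```python
import numpy as np

# Hypothetical eigenvalues of a 4-variable correlation matrix (they sum to p = 4)
eigvals = np.array([2.4, 1.2, 0.3, 0.1])

# Rule 1: smallest number of PCs explaining at least a target proportion of variance
target = 0.85
prop = eigvals / eigvals.sum()                       # proportion of variance per PC
k_prop = int(np.searchsorted(np.cumsum(prop), target) + 1)

# Kaiser's rule: keep PCs with eigenvalue > 1 (correlation-matrix PCA only)
k_kaiser = int((eigvals > 1).sum())

assert k_prop == 2      # cumulative proportions are 0.6, 0.9, ...
assert k_kaiser == 2    # only 2.4 and 1.2 exceed 1
```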

What are some issues?

• The scale the variables are measured on matters.
– Standardize the variables so they are all on the same scale.
• Variables with a high amount of variability (i.e. large variance) will naturally steer the decomposition.
– Again, standardize the variables.
• When separation occurs perpendicular to an axis (i.e. PC), it might not be picked up without looking at other axes.
– Plot the pairwise scores for each pair of PCs. This may require looking at too many graphs to be feasible.

Scree Plot

Biplot of Scores

Factor Analysis – Some Motivating Examples

• You have the ratings people give to their family members in areas such as kindness, intelligence, and happiness. You want to associate family members with some overall construct behind these words.

• You have conducted a survey and want to group questions based on the topics they address.

What is it?
• We assume the variables Y can be summarized by some underlying, unobserved, and reduced set of variables called factors (you must pick how many factors).
• Goal is to estimate the factors.
• After the factors are estimated, the next goal is to orthogonally rotate the solution to get simpler factors.
• For the Principal Factor Solution (more later):
– Model: Y − μ = loadings·factors + error
– var(Y − μ) or corr(Y − μ) = V = loadings·loadingsᵀ + Ψ
– The diagonals of H = V − Ψ are called the communalities. They are R²-like numbers.
– Ψ is called the specific variance.
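The decomposition V = loadings·loadingsᵀ + Ψ can be verified numerically. The loadings below are made up for illustration; with a correlation matrix, each variable's communality and specific variance sum to 1:

```python
import numpy as np

# Hypothetical loadings for p = 4 variables on m = 2 factors
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.6]])

communalities = (L**2).sum(axis=1)   # diagonal of H = L @ L.T
psi = 1.0 - communalities            # specific variances, so diag(V) = 1

V = L @ L.T + np.diag(psi)           # model-implied correlation matrix

# Each variable's variance splits into communality + specific variance.
assert np.allclose(np.diag(V), 1.0)
assert np.allclose(communalities + psi, np.diag(V))
```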

How to Estimate the Factors?
• Three main ways:
– Principal Component Solution (Not PCA!)
• Focuses on the diagonal of V (the variance).
• Does poorly on the off-diagonal (the covariance).
– Principal Factor Solution
• Focuses on the off-diagonals of V and pretty much ignores the diagonal.
– Maximum Likelihood Method
• Assume normality of the error and estimate the factors and loadings using an iterative MLE method.
• May give nonsensical answers (i.e. Heywood cases).
• Can adjust the iterative method so this doesn't happen.
• Rotations are unique.

More On Rotations

• If the rotation is orthogonal then loadings·loadingsᵀ = loadings·rotation·rotationᵀ·loadingsᵀ = (loadings·rotation)·(loadings·rotation)ᵀ

• So we can redistribute the total variance, and the variation explained by each variable, differently among the factors without actually changing the fitted covariance structure.

• Lots of methods to pick rotations.
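The invariance under orthogonal rotation is easy to verify numerically. A sketch with a 2-factor rotation by an arbitrary angle (all numbers are illustrative):

```python
import numpy as np

L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.6]])                       # hypothetical loadings

theta = 0.7                                      # any rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal rotation matrix

L_rot = L @ R                                    # rotated loadings

# L @ L.T is unchanged, so the fitted covariance structure is identical
# even though the individual loadings look quite different.
assert np.allclose(L @ L.T, L_rot @ L_rot.T)
```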

Interpreting Analysis

• Loadings represent the covariance (or correlation) between factors and variables.

• So we look for high loadings to represent how underlying factors influence variables.

• With some subject matter knowledge we can name factors based on these loadings (when they make sense).

Some Issues

• Results can change depending on model choices (this is a big deal!):
– Number of factors
– Estimation method
– Rotation method

• Heywood cases when using MLE.
• Existence of actual factors is suspect.

Example

[Loadings table: variables y1–y6 against factors f1–f3, before and after rotation, with an "x" marking each sizable loading.]

Multivariate T Tests

• Univariate t-test
– Normal data, with unknown mean and variance
• Hotelling's T² Test
– Multivariate normal data with unknown mean and covariance

$t = \dfrac{\bar{X} - \mu_0}{\sqrt{S^2/n}}, \qquad t^2 \sim F_{1,\nu}$

$T^2 = (\bar{X} - \mu_0)^T (S/n)^{-1} (\bar{X} - \mu_0), \qquad \dfrac{\nu - p + 1}{\nu p}\, T^2_{p,\nu} \sim F_{p,\,\nu - p + 1}$

One Sample Test

• Assumptions
– Observations are independent and multivariate normal
• Testing
– Null Hypothesis: μ = μ0 (vectors)
– Alternative: μ ≠ μ0 (vectors)

$T^2 = (\bar{X} - \mu_0)^T (S/n)^{-1} (\bar{X} - \mu_0), \qquad \dfrac{n - p}{(n-1)p}\, T^2_{p,n-1} \sim F_{p,\,n-p}$

Example: One Sample Test

• We are interested in 3 different types of calcium in the soil

• We wish to test if our observed means are consistent with the hypothesized true means (15, 6, 2.85)

$\bar{Y} = (28.1,\ 7.18,\ 3.09), \qquad S = \begin{pmatrix} 140.54 & 49.68 & 1.94 \\ 49.68 & 72.25 & 3.68 \\ 1.94 & 3.68 & 0.25 \end{pmatrix}$

$T^2 = 24.559 > T^2_{.05,3,9} = 16.766$
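The T² on this slide can be reproduced from the printed summary statistics. The sample size is not shown, but n = 10 is consistent with the T²(.05, 3, 9) critical value; because Ȳ and S are rounded, the result lands near, not exactly at, 24.559:

```python
import numpy as np

n = 10                                     # inferred from the df in T^2_{.05,3,9}
mu0 = np.array([15.0, 6.0, 2.85])
ybar = np.array([28.1, 7.18, 3.09])
S = np.array([[140.54, 49.68, 1.94],
              [49.68, 72.25, 3.68],
              [1.94, 3.68, 0.25]])

d = ybar - mu0
# T^2 = n (Ybar - mu0)' S^{-1} (Ybar - mu0); solve() avoids forming S^{-1}
T2 = float(n * d @ np.linalg.solve(S, d))

assert abs(T2 - 24.559) < 1.0              # matches the slide up to rounding of Ybar, S
assert T2 > 16.766                         # exceeds the critical value: reject H0
```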

Two Sample Test

• Assumptions
– Two groups of multivariate normal data
– Observations are independent
– Means may be different but the covariance is the same for both groups
• Testing
– Null Hypothesis: μ1 = μ2 (vectors)
– Alternative: μ1 ≠ μ2 (vectors)

$T^2 = (\bar{X}_1 - \bar{X}_2)^T \left[ S_p \left( \tfrac{1}{n_1} + \tfrac{1}{n_2} \right) \right]^{-1} (\bar{X}_1 - \bar{X}_2), \qquad \dfrac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)p}\, T^2_{p,\,n_1 + n_2 - 2} \sim F_{p,\,n_1 + n_2 - p - 1}$

Example: Two Sample Test

• Four psychological tests were given to 32 men and 32 women

• We are interested in seeing if the mean vectors are the same

$\bar{Y}_1 = (15.97,\ 15.91,\ 27.19,\ 22.75), \qquad \bar{Y}_2 = (12.34,\ 13.91,\ 16.66,\ 21.94)$

$S_p = \begin{pmatrix} 7.164 & 6.047 & 5.693 & 4.701 \\ 6.047 & 15.89 & 8.492 & 5.856 \\ 5.693 & 8.492 & 29.36 & 13.98 \\ 4.701 & 5.856 & 13.98 & 22.32 \end{pmatrix}$

$T^2 = 97.602 > T^2_{.01,4,62} = 15.373$
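This T² can likewise be reproduced from the rounded summaries on the slide (n1 = n2 = 32), again up to rounding error:

```python
import numpy as np

n1 = n2 = 32
y1 = np.array([15.97, 15.91, 27.19, 22.75])
y2 = np.array([12.34, 13.91, 16.66, 21.94])
Sp = np.array([[7.164, 6.047, 5.693, 4.701],
               [6.047, 15.89, 8.492, 5.856],
               [5.693, 8.492, 29.36, 13.98],
               [4.701, 5.856, 13.98, 22.32]])

d = y1 - y2
# T^2 = d' [Sp (1/n1 + 1/n2)]^{-1} d
T2 = float(d @ np.linalg.solve(Sp * (1/n1 + 1/n2), d))

assert abs(T2 - 97.602) < 1.0    # matches the slide up to rounding of the inputs
assert T2 > 15.373               # exceeds T^2_{.01,4,62}: reject H0
```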

Other Tests

• Two-sample paired test
– Use difference vector D = X1 − X2
• Partial Tests
– Testing μi = μi0 in the presence of the other (p − 1) means
• What about more than 2 groups?
– We had ANOVA instead of a t-test
– Now we have MANOVA instead of a T²

Multivariate Analysis of Variance (MANOVA)

• Suppose we have data organized into several groups, with each observation giving a vector of responses

• We would like to test the hypothesis that all the means for each of the groups are equal

• We can do this in a manner very similar to the univariate Analysis of Variance (ANOVA)

MANOVA

• In ANOVA
– We compare Sums of Squares within groups to Sums of Squares between groups
– Sums of Squares are the sums of the squared differences between the observed values and the means
• In MANOVA
– We compare Sums of Squares matrices from within the groups to those between the groups
– E is the "within" Sums of Squares matrix
– H is the "between" Sums of Squares matrix

Four Tests

• There are four tests based on the eigenvalues of E⁻¹H: λ1 > λ2 > … > λs, with s ≤ min(p, d)

• Pillai: $V^{(s)} = \sum_{i=1}^{s} \dfrac{\lambda_i}{1 + \lambda_i}$

• Lawley-Hotelling: $U^{(s)} = \sum_{i=1}^{s} \lambda_i$

• Wilks' Lambda: $\Lambda = \prod_{i=1}^{s} \dfrac{1}{1 + \lambda_i}$ (reject for small values)

• Roy's Largest Root: $\theta = \dfrac{\lambda_1}{1 + \lambda_1}$
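Given the eigenvalues of E⁻¹H, the four statistics are one-liners. The eigenvalues below are made up for illustration:

```python
import numpy as np

lam = np.array([2.0, 0.5])                 # hypothetical eigenvalues of E^{-1}H, s = 2

pillai = float((lam / (1 + lam)).sum())    # V^(s)
lawley_hotelling = float(lam.sum())        # U^(s)
wilks = float((1 / (1 + lam)).prod())      # Lambda (reject for small values)
roy = float(lam[0] / (1 + lam[0]))         # theta, uses only the largest eigenvalue

assert np.isclose(pillai, 1.0)             # 2/3 + 1/3
assert np.isclose(lawley_hotelling, 2.5)
assert np.isclose(wilks, 2/9)              # (1/3) * (2/3)
assert np.isclose(roy, 2/3)
```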

Comparison of the Four Tests

• In the collinear case
– The groups have means that lie (approximately) on a line in space
– θ ≥ U(s) ≥ Λ ≥ V(s) in terms of power
• In the diffuse case
– The group means are spread out in a higher-dimensional space (not a line)
– θ ≤ U(s) ≤ Λ ≤ V(s) in terms of power

Post-Test Analysis

• Just like with ANOVA, after the test we can
– Do pair-wise comparisons or contrasts
• In MANOVA we can also
– Do tests for the p individual variables
– F tests to identify which variables are different

Example: Rootstock Data
• We wish to compare apple trees of different rootstocks
• We have 8 trees from each of 6 rootstocks
• Our four measurements are
– Trunk girth at 4 years (y1)
– Extension growth at 4 years (y2)
– Trunk girth at 15 years (y3)
– Extension growth at 15 years (y4)

Rootstock Data

• Test Results
– Λ = .154 < Λ.05,4,5,40 = .455
– V(s) = 1.305 > V(s).05 = .645
– U(s) = 2.921 > U(s).05
– θ = .652 > θ.05 = .377

• Follow-up tests for individual variables
– Y1: F = 1.93, p = .1094
– Y2: F = 2.91, p = .024
– Y3: F = 11.97, p < .0001
– Y4: F = 12.16, p < .0001

Extensions

• Two-way MANOVA
• Multivariate Contrasts
• Mixed Models
• Split plot designs
• Profile Analysis
• Different R²-like numbers

Multidimensional Scaling (MDS)

• Data is a distance or similarity matrix
– Many ways to generate one
• Goal is to reduce dimension and visualize
– Often look at only 2 or 3 dimensions
• Motivating Examples
– Number of teeth for different species of mammals
– Discriminating between colors (red vs. orange)
– Distances between cities

Two Kinds of MDS

• Metric scaling (principal coordinates analysis)
– Distances (Euclidean) in the reduced dimension are close to those measured in the full dimension
• Non-metric scaling
– The rank order of distances in the reduced dimension is close to that in the full dimension

Types of Measures

• There are MANY measures that can be used
– Depends on the type of data
– Depends on interest in observations vs. variables
• Properties
1. Minimum of 0: D(x,y) = 0 if x = y
2. Positive otherwise: D(x,y) > 0
3. Symmetric: D(x,y) = D(y,x)
4. Triangle inequality: D(x,y) + D(y,z) ≥ D(x,z)
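The four properties can be spot-checked for a given measure; here, the Manhattan (city-block) distance on a few count vectors (the data is illustrative):

```python
import numpy as np

def manhattan(a, b):
    """City-block distance: total absolute difference across coordinates."""
    return np.abs(np.asarray(a) - np.asarray(b)).sum()

x, y, z = [2, 3, 1], [0, 3, 4], [1, 1, 1]

assert manhattan(x, x) == 0                                   # property 1
assert manhattan(x, y) > 0                                    # property 2
assert manhattan(x, y) == manhattan(y, x)                     # property 3
assert manhattan(x, y) + manhattan(y, z) >= manhattan(x, z)   # property 4
```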

Types of Measures

• Measures that satisfy 1–4 are called metrics
• Measures satisfying 1–3 are semi-metrics
• Some measures have negative values and are called non-metrics
• Certain measures can be plotted or visualized in a Euclidean space
– Distances and relationships plotted are meaningful
– This is a stronger property than the triangle inequality

Measures for our Examples

• Mammal teeth – counts of teeth types
– Manhattan (city block) distance
– Total teeth different between two species
• Difference between colors (Ekman)
– Similarity measure, converted to distance
– How well people distinguish between colors
– We use the Kruskal measure (non-metric)
• Distances between cities
– Euclidean distance
– Miles between cities

Basic Procedure for MDS

• Metric Scaling
– Eigenvalue/eigenvector decomposition
– Choose a reduced number of components that still preserves distances
– Create new coordinates based on the reduced components
• Non-metric scaling
– Reduce dimensions but preserve rank order
– Done using isotonic regression and iterative algorithms
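The metric-scaling steps above can be sketched as classical (principal coordinates) scaling: double-center the squared distances, take an eigendecomposition, and build coordinates from the top components. The toy configuration is illustrative; because it is genuinely 2-dimensional, the reduced coordinates reproduce the distances exactly.

```python
import numpy as np

# Toy configuration: 4 points whose pairwise Euclidean distances we try to recover
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # full distance matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ (D**2) @ J                    # double-centered squared distances

eigvals, eigvecs = np.linalg.eigh(B)         # eigen-decomposition of B
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                        # reduced dimension
coords = eigvecs[:, :k] * np.sqrt(eigvals[:k])   # new coordinates

# Distances in the reduced space match the originals (data was truly 2-D)
D_hat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
assert np.allclose(D_hat, D, atol=1e-8)
```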

Examples: Teeth Data

• 32 mammals and 8 categories of teeth
• We are interested in how "close" these mammals are based on their teeth counts
• We use city-block distance and want to reduce things to 2 dimensions (from 8)

Teeth Data

Example: Ekman Color Study

• 14 different wavelengths
• 31 subjects asked to rate how well they could distinguish between different pairs
• Ratings were averaged and scaled to get a similarity index between 0 and 1
• We use non-metric scaling and look at a reduction to 2 dimensions (from 14)

Color Study

Example: Distances between cities

• We have 10 U.S. cities and the distances between all pairs

• Can we reduce this distance matrix to a lower dimension like 2 (from 10)?

City Distances

Comments on MDS

• There are MANY measures we can use
– Some make more sense than others
– It depends on the data and what you are interested in
– Different measures can lead to different results
• How many dimensions should you use?
– It's easiest to explain 2–3 dimensions
– There are different criteria or guidelines for metric and non-metric scaling

One More Example

• Suppose we have data that can be organized into a two-way table of binary or count values.

• For a small table we can do some contingency table analyses like tests for homogeneity or independence.

• For large tables we might like to reduce or summarize the table

• One method is called Correspondence Analysis

Correspondence Analysis

• Our distance measure is the Pearson chi-square measure between the observed cell value and its expected value.

• As before, we need to decide if we are interested in our subjects or our variables

• Similar or analogous to PCA and MDS in terms of dimension reduction and interpretation.

• Unfortunately, the terminology is a little different. So be careful.
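A sketch of the core computation, assuming the standard SVD formulation of simple correspondence analysis (the table is made up): the squared singular values of the standardized residuals sum to the total inertia, i.e. the Pearson chi-square statistic divided by the table total.

```python
import numpy as np

N = np.array([[10.0, 20.0],
              [30.0, 40.0]])                 # hypothetical two-way count table

n = N.sum()
P = N / n                                    # correspondence matrix
r = P.sum(axis=1)                            # row masses
c = P.sum(axis=0)                            # column masses

# Standardized residuals; their SVD gives the principal axes
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
sing = np.linalg.svd(S, compute_uv=False)

inertia = (sing**2).sum()                    # total inertia

# Pearson chi-square of the table, for comparison
E = np.outer(r, c) * n                       # expected counts
chi2 = ((N - E)**2 / E).sum()

assert np.isclose(inertia, chi2 / n)         # total inertia = chi^2 / n
```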

Example: Postal Employees

• Postal employees for 6 positions were drug tested
• Results include negative, marijuana, cocaine, and other
• We are interested in identifying any patterns or trends

Postal Employees

Sources

• We compiled the information from this talk from Methods of Multivariate Analysis 2nd ed. by Alvin C. Rencher and from our notes from STAT 5504 compiled by Dr. Eric Smith, Dept. of Statistics.

• Thanks! Any questions?
