Understanding Principal Component Approach of Detecting Population Structure
Jianzhong Ma
PI: Chris Amos

Upload: brooke-lambert

Post on 20-Jan-2018

Introduction

• Association analysis, which exploits strong linkage (i.e. linkage disequilibrium) between markers and disease-causing loci, is more efficient than linkage analysis

• When samples come from different ethnic groups, or from an admixed population, spurious associations can occur, resulting in false positives
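The second point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the allele frequencies, disease risks, sample sizes and the function names `simulate_stratified_sample` and `correlation` are all made up for illustration and do not come from the slides. Disease status is drawn independently of genotype inside each subpopulation, so any association seen in the pooled sample is spurious.

```python
import numpy as np

def simulate_stratified_sample(n_per_pop=2000, seed=0):
    """Two subpopulations that differ in allele frequency AND disease risk.

    All numbers are illustrative assumptions. Within a subpopulation the
    disease status does not depend on genotype.
    """
    rng = np.random.default_rng(seed)
    freqs = (0.2, 0.8)        # assumed allele frequency in subpopulation 0 / 1
    prevalence = (0.1, 0.4)   # assumed disease risk in subpopulation 0 / 1
    pop = np.repeat([0, 1], n_per_pop)
    geno = rng.binomial(2, np.take(freqs, pop))            # genotypes 0/1/2
    status = rng.random(pop.size) < np.take(prevalence, pop)
    return pop, geno, status.astype(float)

def correlation(x, y):
    """Pearson correlation between genotype and disease status."""
    return float(np.corrcoef(x, y)[0, 1])

pop, geno, status = simulate_stratified_sample()
pooled_r = correlation(geno, status)
within_r = [correlation(geno[pop == k], status[pop == k]) for k in (0, 1)]

print("pooled sample correlation:", round(pooled_r, 3))
print("within-subpopulation correlations:", [round(r, 3) for r in within_r])
```

The pooled correlation is clearly positive while the within-subpopulation correlations stay near zero, which is the false-positive pattern that a stratification correction is meant to remove.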

Introduction

• Genomic control approach (GC)

• Transmission/disequilibrium test (TDT)

• Structured association (MCMC)

• Principal component approaches

– Traditional: marker-oriented

– Eigenstrat: sample-oriented

• Eigenstrat theory: implemented in EIGENSTRAT and HelixTree

Eigenstrat References

• 1. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006) Principal Components Analysis Corrects for Stratification in Genome-Wide Association Studies. Nature Genetics 38: 904-909.

• 2. Patterson N, Price AL, Reich D (2006) Population Structure and Eigenanalysis. PLoS Genet 2(12): e190. doi:10.1371/journal.pgen.0020190.

Eigenstrat Theory: Model

• Data for association test (rows: M individuals; columns: N markers):

        MK1   MK2   ...   MKN
Ind1    g11   g12   ...   g1N
...     ...   ...   ...   ...
IndM    gM1   gM2   ...   gMN

Eigenstrat Theory: Model

• Define a random vector with M components, one for each of the M individuals

• The genotype values of the M individuals at any one marker are a realization of this random vector

Eigenstrat Theory: Model

• The randomness comes from two sources: choosing the allele frequency and drawing the genotypes

• Under this model, even genetically independent individuals are not statistically independent of each other

• The covariance between individuals from different subpopulations is smaller than the covariance between individuals from the same subpopulation
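The three points above can be checked numerically. The sketch below is one possible concrete version of the model, not the formulation on the slides: it assumes a Balding-Nichols-style draw of subpopulation allele frequencies around a uniform ancestral frequency, with a hypothetical divergence parameter `fst`, and the function name `simulate_genotypes` is invented for the example. Each marker gives one realization of the M-component random vector, so the covariance between individuals is estimated across markers.

```python
import numpy as np

def simulate_genotypes(n_per_pop=(10, 10), n_markers=5000, fst=0.1, seed=1):
    """Return an (individuals x markers) genotype matrix and subpopulation labels.

    Assumed model: ancestral frequency ~ Uniform(0.1, 0.9) per marker,
    subpopulation frequency ~ Beta around it, genotype ~ Binomial(2, frequency).
    """
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_markers)
    scale = (1.0 - fst) / fst
    rows, labels = [], []
    for k, n in enumerate(n_per_pop):
        # First source of randomness: the allele frequency of subpopulation k
        p_k = rng.beta(ancestral * scale, (1.0 - ancestral) * scale)
        # Second source of randomness: the genotypes given that frequency
        rows.append(rng.binomial(2, p_k, size=(n, n_markers)))
        labels += [k] * n
    return np.vstack(rows).astype(float), np.array(labels)

G, labels = simulate_genotypes()
C = np.cov(G)   # rows are individuals, markers are the observations

same_pop = labels[:, None] == labels[None, :]
off_diagonal = ~np.eye(len(labels), dtype=bool)
within_cov = C[same_pop & off_diagonal].mean()
between_cov = C[~same_pop].mean()

print("mean covariance, same subpopulation:     ", round(within_cov, 3))
print("mean covariance, different subpopulation:", round(between_cov, 3))
```

Both averages are positive because all individuals share the random allele frequency, and the within-subpopulation value is the larger of the two.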

Eigenstrat Theory: Model

• Only the population properties of the PCA are considered (sampling properties are not), in order to derive theoretical guidelines for interpreting PC-PC plots

Case 1: one-subpopulation

Covariance matrix

Case 1: one-subpopulation

Large eigenvalue

Eigenvector:

Small eigenvalue

Case 1: one-subpopulation

Large eigenvalue reflects co-variation of individuals

Small eigenvalues reflect variations between individuals

Neither reflects population stratification!
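A minimal numerical sketch of this eigen-structure, assuming the simplest covariance consistent with the model: every pair of individuals shares the same covariance b, and each individual has variance a + b. The values of M, a and b and the function name `one_population_covariance` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def one_population_covariance(m, a, b):
    """Assumed covariance for one homogeneous population: a*I + b*(all-ones)."""
    return a * np.eye(m) + b * np.ones((m, m))

M, a, b = 6, 0.4, 0.2          # illustrative values only
sigma = one_population_covariance(M, a, b)
eigvals, eigvecs = np.linalg.eigh(sigma)   # eigenvalues in ascending order

large, small = eigvals[-1], eigvals[:-1]
top_vector = eigvecs[:, -1]

print("large eigenvalue (a + M*b):", large)
print("small eigenvalues (all a): ", small)
print("eigenvector of the large eigenvalue:", top_vector)
```

The single large eigenvalue equals a + M*b and its eigenvector has the same entry for every individual, so it captures the shared co-variation; the remaining M-1 eigenvalues all equal a.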

Case 1: one-subpopulation

Zero-mean transform
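The slide's formula did not survive extraction. As a hedged illustration, the sketch below assumes the zero-mean transform subtracts, at each marker, the mean genotype over the M individuals, which at the population level turns a covariance matrix S into H S H with the centering matrix H = I - (1/M) * (all-ones). The covariance a*I + b*(all-ones) and the numbers are assumptions for illustration.

```python
import numpy as np

M, a, b = 6, 0.4, 0.2                        # illustrative values only
sigma = a * np.eye(M) + b * np.ones((M, M))  # assumed one-population covariance

H = np.eye(M) - np.ones((M, M)) / M          # centering matrix
centered = H @ sigma @ H                     # covariance after the transform

eigvals = np.linalg.eigvalsh(centered)[::-1] # descending order
print("eigenvalues after centering:", np.round(eigvals, 6))
```

Under these assumptions the large eigenvalue disappears (it becomes zero) and only the M-1 equal small eigenvalues remain, so a homogeneous population shows no dominant component after centering.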

Case 2: two-subpopulations

Random vector

Covariance matrix

Case 2: two-subpopulations

• There are two large eigenvalues, whose corresponding eigenvectors take a constant value for all individuals in the same subpopulation. They reflect a mixture of the variance caused by stratification and intra-population co-variation

• The small eigenvalues are the same as in a homogeneous population

Case 2: Two-subpopulation

Zero-mean transform

Case 2: two-subpopulation

The two large eigenvalues and corresponding eigenvectors

Case 2: two-subpopulation case

2* Reflecting variation caused by stratification

If there are only two subpopulations, do NOT plot a PC vs PC figure; only the eigenvector of the largest eigenvalue shows the population structure.
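The two-subpopulation statements can be checked on a small block-structured covariance matrix. This is a sketch under assumptions: the within- and between-subpopulation covariances, the subpopulation sizes and the helper names `block_covariance` and `center` are illustrative, and the zero-mean transform is taken to be centering over individuals.

```python
import numpy as np

def block_covariance(sizes, a, within, between):
    """Assumed covariance: 'within' for pairs in the same subpopulation,
    'between' for pairs in different ones, plus a on the diagonal."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    sigma = np.where(same, within, between) + a * np.eye(labels.size)
    return sigma, labels

def center(sigma):
    """Population-level zero-mean transform H @ sigma @ H."""
    m = sigma.shape[0]
    h = np.eye(m) - np.ones((m, m)) / m
    return h @ sigma @ h

a, within, between = 0.4, 0.3, 0.2            # illustrative values only
sigma, labels = block_covariance((4, 6), a, within, between)

raw_vals = np.linalg.eigvalsh(sigma)[::-1]
cen_vals, cen_vecs = np.linalg.eigh(center(sigma))
cen_vals, cen_vecs = cen_vals[::-1], cen_vecs[:, ::-1]
pc1 = cen_vecs[:, 0]

n_large_raw = int(np.sum(raw_vals > a + 1e-8))
n_large_centered = int(np.sum(cen_vals > a + 1e-8))
print("large eigenvalues before centering:", n_large_raw)
print("large eigenvalues after centering: ", n_large_centered)
print("first eigenvector after centering: ", np.round(pc1, 3))
```

Before centering two eigenvalues exceed a; after centering only one does, and its eigenvector is constant within each subpopulation with opposite signs between them. That single vector carries the whole structure, so a second axis in a PC-PC plot would only show within-population variation.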

Case 3: Three subpopulations

There are now three sub-populations

Case 3: Three subpopulations

Zero-mean transform

Case 3

General Case: K subpopulations

Summary

• Only large eigenvalues reflect variations caused by stratification

• There are K-1 large eigenvalues if there are K subpopulations

• If there are only two subpopulations, only the eigenvector of the largest eigenvalue reveals the population structure; no two-dimensional PC-PC plot should be inspected

• In the case of multiple subpopulations, all K-1 eigenvectors of the large eigenvalues should be carefully inspected in order to classify individuals into K subpopulations and infer the inter-population relationships
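The summary can be tried on simulated data with K = 3. The sketch below is illustrative only: the Balding-Nichols-style simulation, the divergence value `fst`, the sample sizes and the function name `simulate` are assumptions, and the sample PCA is a plain eigen-decomposition of the individual-by-individual matrix after centering each marker, not the exact EIGENSTRAT normalization.

```python
import numpy as np

def simulate(sizes=(20, 20, 20), n_markers=4000, fst=0.1, seed=2):
    """Genotypes for len(sizes) subpopulations under an assumed model."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_markers)
    scale = (1.0 - fst) / fst
    blocks = []
    for n in sizes:
        p_k = rng.beta(ancestral * scale, (1.0 - ancestral) * scale)
        blocks.append(rng.binomial(2, p_k, size=(n, n_markers)))
    labels = np.repeat(np.arange(len(sizes)), sizes)
    return np.vstack(blocks).astype(float), labels

K = 3
G, labels = simulate()
X = G - G.mean(axis=0)                       # zero-mean transform per marker
vals, vecs = np.linalg.eigh(X @ X.T / X.shape[1])
vals, vecs = vals[::-1], vecs[:, ::-1]       # descending order

# Use the K-1 leading eigenvectors as coordinates for each individual.
pcs = vecs[:, :K - 1]
centroids = np.array([pcs[labels == k].mean(axis=0) for k in range(K)])
dist = np.linalg.norm(pcs[:, None, :] - centroids[None, :, :], axis=2)
assigned = dist.argmin(axis=1)               # nearest-centroid assignment

print("leading eigenvalues:", np.round(vals[:5], 3))
print("individuals closest to their own centroid:",
      int((assigned == labels).sum()), "of", labels.size)
```

With three simulated subpopulations, two eigenvalues stand well above the rest, and in the plane of the two leading eigenvectors every individual lies closest to the centroid of its own group. The true labels are used here only to check the separation; in practice the groups would have to be read off the eigenvectors themselves.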

……

First off, if you choose as many components as there are markers, if that's possible, you will wind up subtracting out ALL effects, thus getting nothing from your tests!

The best answer consists of first simply obtaining the components themselves and their corresponding eigenvalues. (Do this either while running uncorrected tests or from the separate PCA window.)

Then look at the pattern of the eigenvalues. If the first few are very large compared with the remaining eigenvalues, then use that many components in a second analysis in which you DO apply the PCA technique.
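The advice above can be sketched in code. This is not HelixTree's procedure: `choose_n_components` is a hypothetical gap heuristic with an arbitrary `min_ratio` threshold, `remove_components` is a plain projection, the eigenvalue lists are made-up illustrations, and the data matrix is random noise used only to show what happens when every component is subtracted.

```python
import numpy as np

def choose_n_components(eigenvalues, min_ratio=2.0):
    """Hypothetical heuristic: count the eigenvalues before the largest
    relative drop; return 0 if no drop reaches min_ratio."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ev = ev[ev > 1e-10 * ev[0]]              # ignore numerically zero values
    if ev.size < 2:
        return 0
    ratios = ev[:-1] / ev[1:]
    best = int(np.argmax(ratios))
    return best + 1 if ratios[best] >= min_ratio else 0

def remove_components(x, components):
    """Subtract from x its projection onto the orthonormal columns given."""
    return x - components @ (components.T @ x)

# Made-up eigenvalue patterns, for illustration only.
n_structured = choose_n_components([9.1, 7.8, 1.1, 1.0, 0.95, 0.9])
n_flat = choose_n_components([1.2, 1.1, 1.0, 0.9])
print("components to use (two dominant eigenvalues):", n_structured)
print("components to use (flat spectrum):", n_flat)

# Subtracting every component leaves nothing to test.
rng = np.random.default_rng(3)
X = rng.normal(size=(8, 50))                 # 8 individuals, 50 markers (noise)
X -= X.mean(axis=0)                          # centre each marker
_, vecs = np.linalg.eigh(X @ X.T)
vecs = vecs[:, ::-1]
marker = X[:, 0]
resid_all = remove_components(marker, vecs)          # all 8 components
resid_two = remove_components(marker, vecs[:, :2])   # only the first two
print("residual norm, all components removed:", np.linalg.norm(resid_all))
print("residual norm, two components removed:", np.linalg.norm(resid_two))
```

Removing all components drives the residual to zero, which matches the warning that correcting for too many components removes every effect, while removing only the dominant components leaves most of the marker's variation available for testing.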

…….

Helix Manual: