Genetic epidemiology of complex traits: issues and methods
M.W. Zuurman, Werkbespreking (work meeting) Medische Biologie, 28 November 2005
Breedtestrategie
The presentation
1. Background
2. Issues
3. Methods
What do we want anyway?
We want to cure disease!
We want to explain disease!
We want to counteract disease!
Let’s explain disease
Breedtestrategie: Let’s explain Cardiovascular and Renal disease
What is disease?
Disease: a condition in the organism that impairs normal function of the organism
Normal:
• Conforming with or constituting a norm or standard or level or type or social norm
• In accordance with scientific laws
• Being approximately average or within certain limits
• Convention: something regarded as a normative example
• A statistical measure of usually observed structures, typical, or representative type
Subjectivity of ‘normal’ vs ‘diseased’:
A disease is any abnormal condition of the body or mind that causes discomfort, dysfunction, or distress to the person affected or those in contact with the person. (Wikipedia)
[Diagram: Disease (Platonic); Symptoms (Phenotypes!); Causes: Nurture/Nature; Intervention. Medical/Research practice marks: -, ++, +-, +]
Nature versus Nurture
Genetic versus environmental influence
In reality, you can’t have one without the other:
Mulcaster (1581): “that treasure bestowed on them by nature, to be bettered in them by nurture”
Organisms are born with a set of genes in a certain environment
[Diagram: Disease, Genotype, Environment]
Summary (1)
When seeking to explain disease:
- Define disease clearly
- What is normal and why? It will determine the extent of ‘abnormal’
- Define phenotypes clearly. Make them quantifiable with sufficient:
  Specificity: the probability that a true negative (e.g. ‘healthy’ or ‘control’) is classified as negative
  and
  Sensitivity: the probability that a true positive (e.g. ‘diseased’ or ‘case’) is classified as positive
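As a minimal sketch of the two measures above, the following Python snippet computes sensitivity and specificity from the four cells of a 2 x 2 classification table. The function name and the counts are hypothetical and do not come from the presentation.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: cases classified as diseased      fn: cases classified as healthy
    tn: controls classified as healthy    fp: controls classified as diseased
    """
    sensitivity = tp / (tp + fn)  # share of true cases detected as positive
    specificity = tn / (tn + fp)  # share of true controls detected as negative
    return sensitivity, specificity


# Hypothetical counts, chosen only for illustration
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A phenotype definition with low specificity mixes controls into the case group, which dilutes any genetic effect that is looked for later.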
Genetic epidemiology of complex traits
Main issue
Research Question: What is the genetic basis of complex traits (= disease/disease phenotypes)?
Genetic variation
Single Nucleotide Polymorphisms (SNP) (genotyping):
~AATGCCGA~   ~AATACCGA~
~TTACGGCT~   ~TTATGGCT~
- Divided in wild type and mutated alleles
- Has the genotype form of AA, AB, BB
- Can be functional
- Can be a neighbor of a functional variation (haplotype)
- Can be none of those
Locus (e.g. QTL)
Gene (e.g. expression arrays)
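A small Python sketch, assuming the two top-strand sequences on the SNP slide are the two alleles, shows how the variant position can be located and how the three genotype forms arise from two alleles. Which sequence is the wild type is an assumption made here for illustration, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement


def snp_positions(allele_a, allele_b):
    """Return (index, base_a, base_b) for every position where the sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("sequences must have equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


wild_type = "AATGCCGA"  # assumed wild-type allele (A)
mutated = "AATACCGA"    # assumed mutated allele (B)

differences = snp_positions(wild_type, mutated)
print(differences)  # [(3, 'G', 'A')], using a 0-based index

# Two alleles give three unordered genotypes per SNP
genotypes = ["".join(pair) for pair in combinations_with_replacement("AB", 2)]
print(genotypes)  # ['AA', 'AB', 'BB']
```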
Complex Traits
Mendelian traits: a single gene phenotype
- e.g. eye colour, curly hair etc.
- also called dichotomous traits
- irrespective of environment in most cases
Continuously variable trait: polygenic and/or pleiotropic
- polygenic: multiple genes affect a single trait
- pleiotropic: one gene affects multiple traits
Note: purely polygenic/pleiotropic traits (without environmental influences) hardly exist
Complex Trait: polygenic- and pleiotropic gene-environment interaction
Examples: stature, atherosclerosis, blood pressure regulation, and many many more.
[Figure: histograms of HDL (x-axis, 1.00 to 3.00) against Count (y-axis) in nine panels, A to I]
Context effect of genetic variance in complex traits
2 SNPs (4 alleles, 9 possible combinations)
Value per combination (3 x 3 grid):
1.24  1.30  1.34
1.32  1.33  1.39
1.41  1.39  1.40
Marginal values: 1.27  1.33  1.40
Power drainage

One SNP
            Obesity (N)
SNP1        No     Yes
AA        2236     393
AB        3126     587
BB        1214     246

Two SNPs
                  Obesity (N)
SNP2  SNP1        No     Yes
CC    AA        1326     219
CC    AB        1340     259
CC    BB         407      87
CD    AA         785     150
CD    AB        1473     263
CD    BB         588     112
DD    AA         119      23
DD    AB         307      64
DD    BB         216      46

Three SNPs (cells are N)
SNP2                   CC               CD               DD
SNP1              AA   AB   BB     AA   AB   BB     AA   AB   BB
SNP3  Obesity
EE    No         115  126   32     74  131   50      4   32   14
EE    Yes         19   20   13      9   28    6      1    4    4
EF    No         531  554  166    327  604  250     45  127   86
EF    Yes         77  116   36     63  105   51      7   25   20
FF    No         673  658  204    377  734  284     69  146  115
FF    Yes        120  123   38     77  127   55     15   35   22
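The loss of power can be made concrete with a short Python sketch that takes the obesity "Yes" counts from the one-, two- and three-SNP tables above and reports, per number of SNPs, how many genotype cells exist and how small the smallest cell becomes. The variable names are illustrative only.

```python
# Obesity "Yes" counts per genotype cell, copied from the tables above
yes_counts = {
    1: [393, 587, 246],
    2: [219, 259, 87, 150, 263, 112, 23, 64, 46],
    3: [19, 20, 13, 9, 28, 6, 1, 4, 4,
        77, 116, 36, 63, 105, 51, 7, 25, 20,
        120, 123, 38, 77, 127, 55, 15, 35, 22],
}

summary = {}
for n_snps, cells in yes_counts.items():
    # Each biallelic SNP has 3 genotypes, so k SNPs give 3**k combinations
    assert len(cells) == 3 ** n_snps
    summary[n_snps] = (len(cells), min(cells))
    print(f"{n_snps} SNP(s): {len(cells)} cells, smallest 'Yes' cell = {min(cells)}")
```

With three SNPs the smallest cell holds a single obese subject, so any test on that cell has almost no power.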
Methods (1)
We need methods to:
- Preserve power
- Reduce noise
- Lift shadows of stronger determinants
FGClustor
Hypothesis driven Exploration via FGClustor
Conceptual thinking:
Given any outcome parameter measured in a population one is able to detect differences in frequency of a combination of geno- or phenotypes along the range of the parameter when compared to the prevalence of that combination in the whole population.
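The idea in the statement above can be sketched in a few lines of Python. This is not the FGClustor implementation, whose internals the slides do not show; it is an assumed, simplified version that splits the outcome into equal-width ranges and compares the frequency of each combination within a range to its prevalence in the whole population. The function name and the records are hypothetical.

```python
from collections import Counter


def frequency_profile(records, n_bins):
    """records: list of (outcome_value, combination_label) tuples.

    Returns (prevalence, profile), where profile[i][label] is the frequency of
    the label in outcome range i minus its prevalence in the whole population.
    """
    values = [value for value, _ in records]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant outcome

    overall = Counter(label for _, label in records)
    prevalence = {label: n / len(records) for label, n in overall.items()}

    bins = [Counter() for _ in range(n_bins)]
    for value, label in records:
        index = min(int((value - lo) / width), n_bins - 1)  # top value goes in last bin
        bins[index][label] += 1

    profile = []
    for counter in bins:
        total = sum(counter.values())
        profile.append({
            label: (counter[label] / total if total else 0.0) - prevalence[label]
            for label in prevalence
        })
    return prevalence, profile


# Hypothetical HDL-c values with a two-SNP genotype combination per subject
records = [(1.0, "AACC"), (1.1, "AACC"), (1.2, "ABCC"), (1.9, "ABCC"),
           (2.0, "ABDD"), (2.1, "ABDD"), (2.8, "ABDD"), (2.9, "ABDD")]
prevalence, profile = frequency_profile(records, n_bins=2)
for i, deviations in enumerate(profile):
    print(i, {label: round(d, 2) for label, d in deviations.items()})
```

A positive deviation means the combination is over-represented in that range of the outcome, a negative one that it is under-represented.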
[Figure: FGClustor principle. The outcome parameter y (e.g. HDL-c) is divided into ranges a to h; for each range the frequencies of combinations 1 to n are tabulated (f1a to f4a for range a, through f1h to f4h for range h) and plotted as "Frequency of combination 1 ... n" against the combinations]
FGClustor and strong confounders
Phenotype: Systolic blood pressure
Complex Trait: Quartiles Cholesterol + Gender
[Figure: stacked bars (0% to 100%) per SBP range from 85 to 188.5 in steps of 11.5; legend codes _01 to _04 and _11 to _14; groups labelled F1 to F4 and M1 to M4]
Chi-square Test
Group   p
F1      6.67E-05
F2      0.000217
F3      0.019796
F4      0.001564
M1      0.000703
M2      0.000606
M3      0.000205
M4      0.000192
FGClustor and SNPs
Phenotype: HDL-cholesterol
Complex Trait: SNP1 + SNP2
[Figure: stacked bars (0% to 100%) per HDL-c range from 1 to 2.8 in steps of 0.2; legend codes _00, _01, _02, _10, _11, _12, _20, _21, _22; labelled combinations AACC, ABDD, ABCC]
Chi-square Test
Combination (cc)   p
AACC               0.0008
ABDD               0.0073
ABCC               0.0069
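The slides do not state how the chi-square tests were set up. One plausible reading, assumed here, is a 2 x 2 comparison of a combination's frequency inside one outcome range against the rest of the population. The sketch below uses only the Python standard library; the function name and counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for the table [[a, b], [c, d]].

    a: in range, has combination       b: in range, lacks combination
    c: out of range, has combination   d: out of range, lacks combination
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 100 subjects in the range carry the combination,
# against 100 of 900 subjects outside the range
chi2, p_value = chi_square_2x2(30, 70, 100, 800)
print(f"chi2={chi2:.2f}, p={p_value:.2e}")
```

Because every combination and every range gives its own test, the resulting p-values would normally need a correction for multiple testing before they are interpreted.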
FGClustor
Summary
Pro:
FGClustor used in hypothesis driven approach can shed light on relationships of covariates of interest
FGClustor can visualize context-based main effects of parameters of interest
“Standard” statistical methods are needed in conjunction with FGClustor output to confirm context-based main effects
Con:
FGClustor is not statistically powerful
MDR
Multifactor Dimensionality Reduction
What is MDR?
- Nonparametric and genetic model-free
- Alternative to logistic regression
- Detecting nonlinear interactions among discrete genetic and environmental attributes
The MDR method combines attribute selection, attribute construction, classification, cross-validation and visualization
http://www.epistasis.org/mdr.html
Moore (Expert Review of Molecular Diagnostics, 4:795-803, 2004)
MDR
Worked example: SBP (dichotomous by median)
Covariates: Sex and Quartiles of Total cholesterol

Combination   Class 1   Class 0   Ratio    New Class
0,1               479       480   0.9979   0
0,2               662       462   1.4329   1
0,3               769       341   2.2551   1
0,4               752       313   2.4026   1
1,1               258       913   0.2826   0
1,2               319       686   0.4650   0
1,3               476       546   0.8718   0
1,4               572       491   1.1650   1
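The attribute-construction step in the table above can be reproduced with a short Python sketch: each combination gets the ratio of Class 1 to Class 0 counts and is relabelled 1 (high risk) when the ratio exceeds a threshold, otherwise 0. A threshold of 1.0 is assumed here because it reproduces the New Class column; the slides do not state the threshold that was used. The function names are hypothetical and the sketch leaves out MDR's cross-validation.

```python
# (sex, cholesterol quartile) -> (Class 1 count, Class 0 count), from the table above
cells = {
    (0, 1): (479, 480), (0, 2): (662, 462), (0, 3): (769, 341), (0, 4): (752, 313),
    (1, 1): (258, 913), (1, 2): (319, 686), (1, 3): (476, 546), (1, 4): (572, 491),
}


def mdr_reduce(cells, threshold=1.0):
    """Collapse every multifactor cell into one binary attribute (ratio, new class)."""
    reduced = {}
    for combo, (class1, class0) in cells.items():
        ratio = class1 / class0 if class0 else float("inf")
        reduced[combo] = (ratio, 1 if ratio > threshold else 0)
    return reduced


def training_accuracy(cells, reduced):
    """Share of subjects whose class matches the new class of their cell."""
    correct = total = 0
    for combo, (class1, class0) in cells.items():
        correct += class1 if reduced[combo][1] == 1 else class0
        total += class1 + class0
    return correct / total


reduced = mdr_reduce(cells)
new_classes = [reduced[combo][1] for combo in sorted(cells)]
accuracy = training_accuracy(cells, reduced)
for combo in sorted(cells):
    ratio, new_class = reduced[combo]
    print(combo, round(ratio, 4), new_class)
print(f"training accuracy: {accuracy:.3f}")
```

The accuracy computed here is on the same data that defined the new classes, so it is optimistic; MDR's cross-validation exists to correct for that.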
Best Model output (SBP example): [figure]

MDR
Worked example: HDL-c (dichotomous by <= 1 mmol/L)
Covariates: SNP1, SNP2

Combination   Class 1   Class 0   Ratio
0,0               408      1131   0.3607
0,1               221       713   0.3100
0,2                26       114   0.2281
1,0               341      1255   0.2717
1,1               357      1372   0.2602
1,2                59       306   0.1928
2,0                77       413   0.1864
2,1               105       593   0.1771
2,2                44       217   0.2028

Best Model output (HDL-c example): [figure; labelled combinations AACC, ABDD, ABCC]
MDR
Summary
Pro:
Includes cross-validation in the same population
Can be used as dataminer, not necessarily hypothesis driven
Statistically powerful enough to uncover even weak (genotype) effects
Con:
Can be used as dataminer, not necessarily hypothesis driven
Limited to categorical data
Discussion
- Standard methods in genetic epidemiology show a strong association only when the relationship between the gene and the outcome parameter of interest is direct or extremely close.
- Complex traits are built from individual contributors (genetic variants, environmental parameters) that each have only a weak main effect on the trait.
- Noise and strong confounders limit the detection of the weaker contributors to complex traits by standard statistics.
- Main effects of the individual contributors can be visualized using novel tools (e.g. FGClustor, MDR) in a context-dependent approach against the background of a solid hypothesis.