![Page 1: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/1.jpg)
Health Research Using Genomic Information
Ho Kim (김호)
Graduate School of Public Health, Seoul National University
CONTENTS
• Linkage Analysis
• Segregation Analysis
• SNP and haplotype analysis
• Association Studies
• Discussion
Putative gene (locus)
Gene ? Phenotype
Linkage analysis (LD, sib pair, et al.)
Association study
New gene discovery
Segregation
Biological Basis of Linkage
• If two loci are on different chromosomes, they recombine with probability 0.5
• Similarly, if two loci are very far apart on the same chromosome, ...
• But when the two loci are very close together, the recombination fraction tends towards zero.
· PARAMETRIC LINKAGE ANALYSIS
Estimates the recombination fraction between markers and a hypothesized trait locus; the inheritance parameters of the trait locus (mode of inheritance, penetrance, phenocopy rate, allele frequencies, etc.) must be specified.
Ex. Lod score method
· LOD SCORE
The common logarithm of the likelihood ratio:
Z(θ) = log10 [L(θ ) / L(½)]
where θ is the recombination fraction between two loci
· Purpose Of The Lod Score Method
1. Estimation of the recombination fraction, θ
2. Hypothesis testing
H0: θ = ½ (absence of linkage)
H1: θ < ½ (linkage)
$Z_{\max} = Z(\hat{\theta}) = \log_{10} [L(\hat{\theta}) / L(1/2)]$
· Scale For Testing Linkage
Zmax ≥ 3 : Strong linkage
Zmax > 0 : Support linkage
Zmax < 0 : Against linkage
Zmax = 0 : No support either way
(the data do not distinguish linkage from no linkage)
· Asymptotic Distribution
$2 \ln [L(\hat{\theta}) / L(1/2)] = 4.6 \times Z_{\max} \sim \chi^2_1$
under the null hypothesis of no linkage
$P(Z_{\max} \ge 3) = P(\chi^2_1 \ge 13.8) = 0.0002$
α = 0.0001 (one-sided, because H1 is θ < ½)
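The conversion from a lod score to a tail probability can be checked numerically. The sketch below is illustrative only: the function names are made up here, and it relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal, so its upper tail is `erfc(sqrt(x/2))`.

```python
import math

def lod_to_chi2(z_max):
    """Convert a maximum lod score to the likelihood-ratio statistic 2*ln(10)*Zmax."""
    return 2.0 * math.log(10.0) * z_max

def chi2_1df_upper_tail(x):
    """P(chi2 with 1 df >= x), using the standard-normal identity."""
    return math.erfc(math.sqrt(x / 2.0))

chi2_stat = lod_to_chi2(3.0)                 # about 13.8
p_two_sided = chi2_1df_upper_tail(chi2_stat) # about 0.0002
p_one_sided = p_two_sided / 2.0              # about 0.0001, since H1 is theta < 1/2

print(round(chi2_stat, 2), p_two_sided, p_one_sided)
```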
Phase known pedigree
Figure 2 Phase known pedigree
• The maximum likelihood estimator of θ is 2/6 = 1/3
$Z(\theta) = \log_{10} \dfrac{\theta^2 (1-\theta)^4}{0.5^2 \cdot 0.5^4} = \log_{10} [2^6 \, \theta^2 (1-\theta)^4]$
$Z(1/3) = 0.1475$
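As a quick check of the phase-known calculation, here is a minimal sketch that evaluates the lod score for 2 recombinants out of 6 informative meioses. The function and argument names are illustrative, not taken from the slides.

```python
import math

def lod_phase_known(theta, recombinants, meioses):
    """Lod score Z(theta) when every meiosis can be scored as recombinant or not."""
    nonrecombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** nonrecombinants
    null_likelihood = 0.5 ** meioses  # likelihood at theta = 1/2 (no linkage)
    return math.log10(likelihood / null_likelihood)

theta_hat = 2 / 6  # MLE: observed fraction of recombinants
z = lod_phase_known(theta_hat, recombinants=2, meioses=6)
print(z)  # close to the 0.1475 quoted on the slide
```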
Phase-unknown pedigree
Figure 3 Phase-unknown pedigree
• The maximum likelihood estimator of θ is not so trivial
• The MLE is found to be 0.5 by a numerical method
$L(\theta) = \tfrac{1}{2} \theta^4 (1-\theta)^2 + \tfrac{1}{2} \theta^2 (1-\theta)^4 = \tfrac{1}{2} \theta^2 (1-\theta)^2 [\theta^2 + (1-\theta)^2]$
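The slides do not say which numerical method was used. One simple possibility is a grid search over θ in [0, 0.5]; the sketch below assumes a step of 0.001, which is an arbitrary choice.

```python
def likelihood_phase_unknown(theta):
    """Average of the two equally likely phases: 4 vs 2 and 2 vs 4 recombinants."""
    return 0.5 * theta**4 * (1 - theta)**2 + 0.5 * theta**2 * (1 - theta)**4

# Candidate recombination fractions 0.000, 0.001, ..., 0.500
grid = [i / 1000 for i in range(501)]
theta_hat = max(grid, key=likelihood_phase_unknown)
print(theta_hat)
```

On this grid the likelihood keeps increasing up to the boundary, so the maximum sits at θ = 0.5 and the corresponding lod score is 0.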
Genotype Unknown-Phenotype known
Figure 4 Genotype Unknown-Phenotype known
$L(\theta; \text{data}) = \Pr(Ph_{ma}) \, \Pr(Ph_{pa}) \times \prod_{\text{offspring}} \Pr(Ph_{offs} \mid Ph_{ma}, Ph_{pa}; \theta)$
and we know that
$\Pr(Ph_{ma}) = \sum_{G} \Pr(G) \, \Pr(Ph_{ma} \mid G)$
· NONPARAMETRIC LINKAGE ANALYSIS
Inheritance parameters of the trait locus are not specified. Rather, one focuses on pairs (or multiples) of affected individuals and investigates marker allele sharing among these individuals, contrasting observed allele sharing with that expected when the marker has nothing to do with the trait.
Ex. IBD (identical by descent) test
AN EXAMPLE FAMILY WITH DISEASE LOCUS AT THE MARKER
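Marker alleles, with trait alleles in parentheses, for each family member:
Parents: 3 4 (+ –), 3 2 (+ –)
Offspring: 3 3 (+ +), 3 4 (+ –), 2 3 (– +), 2 4 (– –)
• Only ‘+ +’ is “affected” (‘+’ is recessive to ‘–’)
** Qualitative Trait

| Sib-pair markers (sib1, sib2) | Disease status (d1, d2) | # of alleles shared i.b.d. | C |
|---|---|---|---|
| 3/3, 3/3 | +, + | 2 | 1 |
| 3/3, 3/4 | +, - | 1 | 0.25 |
| 3/3, 2/3 | +, - | 1 | 0.25 |
| 3/3, 2/4 | +, - | 0 | 0.25 |
| 3/4, 3/4 | -, - | 2 | 0.5 |
| 3/4, 2/3 | -, - | 0 | 0.5 |
| 3/4, 2/4 | -, - | 1 | 0.5 |
| 2/3, 2/3 | -, - | 2 | 0.5 |
| 2/3, 2/4 | -, - | 1 | 0.5 |
| 2/4, 2/4 | -, - | 2 | 0.5 |

• Cj = (dj1 – µ)(dj2 – µ) = α + β IBDj + εj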
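The regression of C on the number of alleles shared i.b.d. can be sketched with ordinary least squares. The values below are the ten (IBD, C) pairs as they appear in the extracted table; the C column may have lost signs in extraction, so treat the fitted numbers as an illustration of the mechanics rather than as a result from the slides.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Columns of the sib-pair table, in row order (values as extracted)
ibd = [2, 1, 1, 0, 2, 0, 1, 2, 1, 2]
c = [1, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

alpha, beta = ols_line(ibd, c)
print(alpha, beta)
```

A slope β that differs from zero indicates that trait similarity within sib pairs changes with marker allele sharing, which is the signal this kind of test looks for.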
· Linkage And LD
- The two loci may even reside on different chromosomes: the presence of LD does not necessarily imply linkage between the loci considered.
- Although LD originally referred to an association of alleles at different loci, it has become customary to take LD to mean association among alleles due to close linkage (“allelic association”).
· Another Approach To LD Analysis (“Family-Based Study”)
1. Haplotype relative risk (HRR) method: Falk and Rubinstein (1987)
2. Haplotype-based haplotype relative risk (HHRR) method: Terwilliger and Ott (1992)
3. Transmission/disequilibrium test (TDT): Spielman et al. (1993)
4. Sib-transmission/disequilibrium test (S-TDT): Spielman and Ewens (1998)
· Transmission/disequilibrium test
Example trio from the figure: both parents have genotype 1 2 and the affected child has genotype 1 1.

| Transmitted \ Not transmitted | Allele 1 (A1) | Allele 2 (A2) |
|---|---|---|
| Allele 1 (A1) | (a) 0 | (b) 2 |
| Allele 2 (A2) | (c) 0 | (d) 0 |

- Focus on heterozygous parents only, and allow the use of multiple affected siblings.
- McNemar’s test (standard χ² test), H0: b = c
- The TDT statistic: $\chi^2_1 = (b - c)^2 / (b + c)$
- Powerful only in the presence of LD.
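The TDT statistic is simple enough to compute by hand, but a short sketch makes the role of the off-diagonal counts explicit. Function names here are illustrative; the p-value uses the 1-df χ² tail `erfc(sqrt(x/2))`, and with only two transmissions the asymptotic approximation is of course not reliable.

```python
import math

def tdt_statistic(b, c):
    """McNemar-type TDT statistic from the two discordant cells.

    b: heterozygous parents transmitting A1 and not A2
    c: heterozygous parents transmitting A2 and not A1
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (b - c) ** 2 / (b + c)

def chi2_1df_upper_tail(x):
    """P(chi2 with 1 df >= x)."""
    return math.erfc(math.sqrt(x / 2.0))

chi2 = tdt_statistic(b=2, c=0)  # counts from the example table
p_value = chi2_1df_upper_tail(chi2)
print(chi2, p_value)
```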
• Genomewide Linkage Analysis of Bipolar Disorder by Use of a High-Density Single-Nucleotide Polymorphism (SNP) Genotyping Assay: A Comparison with Microsatellite Marker Assays and Finding of Significant Linkage to Chromosome 6q22
• F. A. Middleton, M. T. Pato, K. L. Gentile, C. P. Morley, X. Zhao, A. F. Eisener, A. Brown, T. L. Petryshen, A. N. Kirby, H. Medeiros, C. Carvalho, A. Macedo, A. Dourado, I. Coelho, J. Valente, M. J. Soares, C. P. Ferreira, M. Lei, M. H. Azevedo, J. L. Kennedy, M. J. Daly, P. Sklar, and C. N. Pato
• Am. J. Hum. Genet., 74:000, 2004
We performed a linkage analysis on 25 extended multiplex Portuguese families segregating for bipolar disorder, by use of a high-density single-nucleotide polymorphism (SNP) genotyping assay, the GeneChip Human Mapping 10K Array (HMA10K). Of these families, 12 were used for a direct comparison of the HMA10K with the traditional 10-cM microsatellite marker set and the more dense 4-cM marker set. This comparative analysis indicated the presence of significant linkage peaks in the SNP assay in chromosomal regions characterized by poor coverage and low information content on the microsatellite assays. The HMA10K provided consistently high information and enhanced coverage throughout these regions. Across the entire genome, the HMA10K had an average information content of 0.842 with 0.21-Mb intermarker spacing. In the 12-family set, the HMA10K-based analysis detected two chromosomal regions with genomewide significant linkage on chromosomes 6q22 and 11p11; both regions had failed to meet this strict threshold with the microsatellite assays. The full 25-family collection further strengthened the findings on chromosome 6q22, achieving genomewide significance with a maximum nonparametric linkage (NPL) score of 4.20 and a maximum LOD score of 3.56 at position 125.8 Mb. In addition to this highly significant finding, several other regions of suggestive linkage have also been identified in the 25-family data set, including two regions on chromosome 2 (57 Mb, NPL = 2.98; 145 Mb, NPL = 3.09), as well as regions on chromosomes 4 (91 Mb, NPL = 2.97), 16 (20 Mb, NPL = 2.89), and 20 (60 Mb, NPL = 2.99). We conclude that at least some of the linkage peaks we have identified may have been largely undetected in previous whole-genome scans for bipolar disorder because of insufficient coverage or information content, particularly on chromosomes 6q22 and 11p11.
• Figure 2 Linkage signals obtained with 10-cM spaced and 4-cM spaced microsatellite assays, as well as the HMA10K SNP genotyping assay. These assays were performed on the same individuals from each of the same 12 families. Note the high correlation of the different assays in general, and that for both chromosomes 6 and 11, the SNP assay detected major linkage peaks at locations where the information content and coverage of the microsatellite panels were relatively low. Mb, megabase position; MSM, microsatellite markers.
• Figure 3 NPL analysis of 25 families with bipolar disorder from the Portuguese Island Collection. The number of each chromosome is shown at the top of each plot. The X-axis indicates the physical position (Mb) of the SNP marker. The Y-axis indicates the NPL Z score (black) or Kong and Cox LOD score (gray). For this scan, the empirical limit for genomewide significance was an NPL score of 3.85 and a LOD score of 3.15. Note that only the peak on chromosome 6 at 125.8 Mb was significant when both NPL Z and LOD thresholds were used.
Figure 4 Comparison of the 12-family (gray) and 25-family (black) genomewide linkage scans for selected chromosomes showing suggestive or significant linkage (see table 1). The X-axis indicates physical position (Mb). Note that for both scans, the signal on chromosome 6 at position 125.8 Mb is the only genomic region that achieves genomewide significance (of NPL score and/or LOD score).
· QUANTITATIVE TRAIT
A phenotype with a continuous (normal/lognormal) distribution.
Ex. Height, blood pressure, head circumference, and the cholesterol level in the blood
· QUALITATIVE TRAIT
A phenotype with a discrete distribution.
Ex. Signs and symptoms indicate whether a disease state is present or absent.
· HERITABILITY Of The Trait (H²)
The fraction of the phenotypic variation caused by genetic variation.
H² = Vg / Vp = Vg / (Vg + Ve) (broad sense)
h² = Va / Vp (narrow sense, where Va is the additive genetic variance)
· QUANTITATIVE TRAIT LOCI (QTL)
The location of a gene that affects a trait that is measured on a quantitative (linear) scale. The loci that are determinants of quantitative trait expression.
Example: Descriptive statistics

| Variable | Dornod (region D) | Selenge (region S) |
|---|---|---|
| AGE (MEAN) | 29.4 | 38.2 |
| EDU-YR (MEAN) | 8.2 | 8.0 |
| MALE (%) | 46.3 | 39.9 |
| MADE (%) | 90.5 | 69.9 |
| SMOK (%) | 14.3 | 18.7 |
| ALCOHOL (%) | 9.4 | 38.7 |
| HP (%) | 14.8 | 21.2 |
| BMI≥30 (%) | 8.78 | 10.07 |
| CHOL≥240 (%) | 0 | 1.8 |
| Variable | Dornod (D), Male | Dornod (D), Female | Selenge (S), Male | Selenge (S), Female |
|---|---|---|---|---|
| HEIGHT | 147.5 | 149.6 | 159.8 | 151.9 |
| WEIGHT | 48.0 | 54.7 | 58.1 | 54.9 |
| BMI | 21.0 | 24.1 | 22.3 | 24.3 |
| SBP | 107.6 | 114.0 | 127.0 | 116.0 |
| DBP | 67.4 | 73.2 | 82.0 | 76.1 |
| WC | 69.2 | 74.1 | 75.0 | 74.1 |
| CHOL | 154.2 | 159.7 | 165.4 | 163.4 |
| SKIN FOLD | 7.3 | 16.6 | 10.5 | 16.8 |
Heritability of selected phenotypes, estimated with SOLAR

| Variable | Heritability | Significance |
|---|---|---|
| SBP | 0.51 | ** |
| WEIGHT | 0.39 | ** |
| HEIGHT | 0.17 | * |
| HC | 0.42 | ** |
| HDL-C | 0.50 | ** |
| BMD_LF1 ^ | 0.43 | ** |
| SKIN_FOLD ^ | 0.38 | ** |
| BMI ^ | 0.35 | ** |
| DBP ^ | 0.53 | ** |
| WC ^ | 0.50 | ** |

** P-value < 0.05; * P-value < 0.1
^ Variables that were transformed to address normality, skewness, and kurtosis.
For HEIGHT, a kurtosis problem led to a low heritability estimate. Covariates were chosen from among age, sex, age^2, age*sex, and bmi; the covariates used differ from variable to variable.
SNPs (pronounced “snips”)
SNPs as DNA Landmarks
• Help in DNA sequencing
• Help in the discovery of genes responsible for many major diseases:
– asthma, diabetes, heart disease, schizophrenia and cancer among others
From SNP to Haplotype
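| # | DNA sequence | Haplotype | Phenotype |
|---|---|---|---|
| 1 | GATATTCGTACGGA-T | AG- | Black eye |
| 2 | GATGTTCGTACTGAAT | GTA | Brown eye |
| 3 | GATATTCGTACGGA-T | AG- | Black eye |
| 4 | GATATTCGTACGGAAT | AGA | Blue eye |
| 5 | GATGTTCGTACTGAAT | GTA | Brown eye |
| 6 | GATGTTCGTACTGAAT | GTA | Brown eye |

Variable sites (marked “SNP” on the slide): positions 4, 12, and 15 of the sequence
Haplotypes: AG- 2/6, GTA 3/6, AGA 1/6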
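The step from aligned sequences to haplotype frequencies can be sketched in a few lines: find the columns where the sequences differ, read off the alleles at those columns, and count. The variable names are illustrative; the six sequences are the ones shown on the slide.

```python
from collections import Counter

sequences = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACGGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]

# 0-based indices of columns where not all sequences agree
variable_sites = [
    i for i in range(len(sequences[0]))
    if len({seq[i] for seq in sequences}) > 1
]

# Haplotype = alleles at the variable sites, in order
haplotypes = ["".join(seq[i] for i in variable_sites) for seq in sequences]
counts = Counter(haplotypes)

for hap, n in counts.items():
    print(hap, f"{n}/{len(sequences)}")
```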
In-silico Haplotyping: Approaches
1) Clark’s algorithm
2) E-M algorithm (expectation-maximization algorithm)
3) Bayesian algorithm
Clark’s Algorithm
1) Find homozygotes, or individuals heterozygous at only one locus
Genotype SNP1 T T, SNP2 A A, SNP3 C C → haplotypes T-A-C and T-A-C
Genotype SNP1 T T, SNP2 A A, SNP3 C G → haplotypes T-A-C and T-A-G
These haplotypes are unambiguously defined
Clark’s Algorithm
2) Try to solve an ambiguous haplotype as a combination of solved ones
Genotype SNP1 A T, SNP2 A A, SNP3 C G → T-A-C (solved one) and A-A-G
Continue until either all haplotypes have been solved or no more haplotypes can be found in this way
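The two steps can be sketched as follows. This is a simplified illustration, not a full implementation of Clark's method: genotypes are written as one two-letter string per SNP, known haplotypes are tried in the order they were found (the real algorithm's answer can depend on that order), and all names are made up for this example.

```python
def complement(genotype, hap):
    """Haplotype that pairs with `hap` to give `genotype`, or None if incompatible."""
    other = []
    for (a, b), h in zip(genotype, hap):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None
    return "".join(other)

def clark(genotypes):
    known = []     # resolved haplotypes, in discovery order
    resolved = {}  # genotype index -> (haplotype 1, haplotype 2)

    # Step 1: genotypes with at most one heterozygous SNP are unambiguous
    for i, g in enumerate(genotypes):
        if sum(a != b for a, b in g) <= 1:
            pair = ("".join(a for a, _ in g), "".join(b for _, b in g))
            resolved[i] = pair
            for h in pair:
                if h not in known:
                    known.append(h)

    # Step 2: explain ambiguous genotypes as known haplotype + complement
    changed = True
    while changed:
        changed = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                other = complement(g, h)
                if other is not None:
                    resolved[i] = (h, other)
                    if other not in known:
                        known.append(other)
                    changed = True
                    break
    return resolved, known

# The three genotypes from the slides (SNP1, SNP2, SNP3)
genotypes = [("TT", "AA", "CC"), ("TT", "AA", "CG"), ("AT", "AA", "CG")]
resolved, known = clark(genotypes)
print(resolved[2])
```

With these inputs the ambiguous third genotype is resolved as T-A-C plus A-A-G, as on the slide. Note that T-A-G is also compatible with that genotype, which is one source of the order dependence.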
Clark’s Algorithm: problems
• No homozygotes or single-SNP heterozygotes -> chain might never get started
• Many unsolved haplotypes left at the end
• Quite useful in practice!
EM Algorithm
• Use multinomial likelihood with HWE
Pr(AT//AA//CG)
= pr(AAC/TAG) + pr(AAG/TAC)
= pr(AAC)pr(TAG) + pr(AAG)pr(TAC)
Fallin and Schork (2000) showed that EM is better than Clark’s algorithm
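A minimal EM sketch for haplotype frequencies under HWE is given below. It is an illustration under stated assumptions, not the procedure of any particular package: genotypes are one two-letter string per SNP, starting frequencies are uniform, the number of iterations is fixed at 50, and the sample of three genotypes is the toy example from the Clark's algorithm slides.

```python
from itertools import product

def haplotype_pairs(genotype):
    """All unordered haplotype pairs compatible with an unphased genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[c] for site, c in zip(genotype, choice))
        h2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, n_iter=50):
    candidates = [haplotype_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform start (an assumption)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for pairs in candidates:
            # E-step: HWE probability of each compatible pair
            weights = [
                (2.0 if h1 != h2 else 1.0) * freq[h1] * freq[h2]
                for h1, h2 in pairs
            ]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by number of chromosomes
        n_chromosomes = 2 * len(genotypes)
        freq = {h: c / n_chromosomes for h, c in counts.items()}
    return freq

genotypes = [("TT", "AA", "CC"), ("TT", "AA", "CG"), ("AT", "AA", "CG")]
freq = em_haplotype_frequencies(genotypes)
for hap, p in freq.items():
    print(hap, round(p, 4))
```

For the ambiguous genotype AT//AA//CG the E-step weighs AAC/TAG against AAG/TAC by the current frequency products, exactly the two terms in the formula above (the factor 2 for heterozygous pairs is common to both and cancels). In this toy sample the estimates move towards TAC = 4/6, TAG = 1/6, AAG = 1/6.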
A Gibbs sampler, Stephens et al (2001)
• G = (G1, …, Gn): observed multilocus genotype freq
• H = (H1, …, Hn): unknown haplotype pairs
• F = (F1, …, FM): M unknown pop’n hap freq
1. Choose individual i from all ambiguous individuals
2. Sample $H_i^{(t+1)}$ from $\Pr(H_i \mid G, H_{-i}^{(t)})$
3. Set $H_j^{(t+1)} = H_j^{(t)}$ for j = 1, 2, …, i-1, i+1, …, n
Haplotype Inference
A: SNP data: 0 (MM), 1 (Mm), 2 (mm) for a single locus
B: Haplotype data: 0 (M), 1 (m) for a single locus
• Traditional linkage studies > use recombination information only in pedigrees
• Association methods > use recombination information at the population level
• Association methods have greater power to detect small and moderate genetic effects than does linkage analysis (Risch and Merikangas 1996)
Suggested Strategy for Association Studies of Complex Disease
1. A small number of people (10-20) are genotyped at a very dense SNP map; haplotypes are also determined
2. Haplotype block partitioning algorithm: identify haplotype blocks and tag SNPs
3. A large number of people are genotyped at the tag SNP marker loci
4. Association study analysis
Polymorphisms in the XRCC1 gene and alcohol consumption are associated with colorectal cancer risk
• A case-control study of 209 colorectal cancer cases and 209 age- and sex-matched controls in the Korean population
• Allelic variants of the XRCC1 gene at codons 194, 280 and 399 were analyzed in lymphocyte DNA by PCR-RFLP
Table 1. Frequencies of single nucleotide polymorphisms and the odds ratios of colorectal cancer

| Genotype | Patients (%) | Controls (%) | OR (95% CI)* | P-value |
|---|---|---|---|---|
| XRCC1 Codon 194 | | | | |
| Arg/Arg | 90 (43.0) | 101 (48.3) | 1 | |
| Arg/Trp | 94 (45.0) | 82 (39.2) | 1.29 (0.85, 1.94) | 0.229 |
| Trp/Trp | 25 (12.0) | 26 (12.5) | 1.08 (0.58, 2.00) | 0.809 |
| Arg/Trp or Trp/Trp | 119 (57.0) | 108 (51.7) | 1.24 (0.84, 1.82) | 0.280 |
| XRCC1 Codon 280 | | | | |
| Arg/Arg | 161 (77.0) | 173 (82.8) | 1 | |
| Arg/His | 47 (22.5) | 34 (16.2) | 1.49 (0.91, 2.43) | 0.114 |
| His/His | 1 (0.5) | 2 (1.0) | 0.54 (0.05, 5.98) | 0.613 |
| Arg/His or His/His | 48 (23.0) | 36 (17.2) | 1.43 (0.88, 2.32) | 0.144 |
| XRCC1 Codon 399 | | | | |
| Arg/Arg | 112 (53.6) | 136 (65.1) | 1 | |
| Arg/Gln | 88 (42.1) | 64 (30.6) | 1.67 (1.11, 2.51) | 0.014 |
| Gln/Gln | 9 (4.3) | 9 (4.3) | 1.21 (0.47, 3.16) | 0.691 |
| Arg/Gln or Gln/Gln | 97 (46.4) | 73 (34.9) | 1.61 (1.09, 2.39) | 0.017 |
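For readers who want to see how an odds ratio and its confidence interval arise from the counts, here is a sketch using the crude (unadjusted) odds ratio with a Woolf log-scale interval. The footnote explaining how the table's ORs were computed is not visible in the transcript, so this is an assumption chosen for illustration, and the function name is made up. It is applied to the codon 399 Arg/Gln row against the Arg/Arg reference.

```python
import math

def odds_ratio_woolf(case_exposed, control_exposed, case_ref, control_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (case_exposed * control_ref) / (control_exposed * case_ref)
    se_log_or = math.sqrt(
        1 / case_exposed + 1 / control_exposed + 1 / case_ref + 1 / control_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Codon 399: Arg/Gln (88 patients, 64 controls) vs Arg/Arg (112 patients, 136 controls)
odds_ratio, lower, upper = odds_ratio_woolf(88, 64, 112, 136)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

For this row the crude calculation gives values close to the tabulated 1.67 (1.11, 2.51); other rows may differ if the published estimates were adjusted or matched.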
Table 2. Estimated haplotype frequencies and odds ratios of colorectal cancer based on haplotypes

| XRCC1* | Patients (%) | Controls (%) | OR (95% CI) | P-value |
|---|---|---|---|---|
| 194Arg-280Arg-399Arg | 119 (28.4) | 164 (39.2) | 1 | |
| 194Trp-280Arg-399Arg | 143 (34.2) | 134 (32.1) | 1.47 (1.05, 2.05) | 0.023 |
| 194Arg-280His-399Arg | 50 (12.0) | 38 (9.1) | 1.81 (1.12, 2.94) | 0.015 |
| 194Arg-280Arg-399Gln | 106 (25.4) | 82 (19.6) | 1.78 (1.23, 2.59) | 0.002 |

* The frequencies of 194Trp-280His-399Arg, 194Trp-280Arg-399Gln, 194Arg-280His-399Gln, 194Trp-280His-399Gln were zero in both groups.
Table 3. Estimated genotype frequencies and the odds ratios of colorectal cancer after controlling for alcohol intake, smoking, dietary habits and exercise

| Genotype | Patients (%) | Controls (%) | OR (95% CI) | P-value |
|---|---|---|---|---|
| 194Arg-280Arg-399Arg / 194Arg-280Arg-399Arg | 18 (8.6) | 32 (15.3) | 1 | |
| 194Arg-280Arg-399Arg / 194Trp-280Arg-399Arg | 37 (17.7) | 48 (23.0) | 0.90 (0.34, 2.35) | 0.826 |
| 194Arg-280Arg-399Arg / 194Arg-280His-399Arg | 17 (8.1) | 16 (7.7) | 1.14 (0.76, 1.71) | 0.540 |
| 194Arg-280Arg-399Arg / 194Arg-280Arg-399Gln | 29 (13.9) | 36 (17.2) | 1.07 (0.68, 1.69) | 0.770 |
| 194Trp-280Arg-399Arg / 194Trp-280Arg-399Arg | 25 (12.0) | 26 (12.4) | 1.32 (0.46, 3.75) | 0.609 |
| 194Trp-280Arg-399Arg / 194Arg-280His-399Arg | 14 (6.7) | 12 (5.7) | 1.54 (0.36, 6.60) | 0.564 |
| 194Trp-280Arg-399Arg / 194Arg-280Arg-399Gln | 42 (20.1) | 22 (10.5) | 2.08 (1.27, 3.40) | 0.004 |
| 194Arg-280His-399Arg / 194Arg-280His-399Arg | 1 (0.5) | 2 (1.0) | 1.79 (0.62, 5.14) | 0.280 |
| 194Arg-280His-399Arg / 194Arg-280Arg-399Gln | 17 (8.1) | 6 (2.9) | 3.69 (1.53, 8.90) | 0.004 |
| 194Arg-280Arg-399Gln / 194Arg-280Arg-399Gln | 9 (4.3) | 9 (4.3) | 1.28 (0.64, 2.54) | 0.486 |
Table 4. Risk of colorectal cancer associated with alcohol intake after controlling for smoking, dietary habits and exercise, and the risk modification by genotype

| Amount of alcohol intake | Patients (%) | Controls (%) | OR (95% CI) | P-value |
|---|---|---|---|---|
| All subjects | | | | |
| Less than a bottle a week | 145 (69.4) | 157 (75.1) | | |
| A bottle or more a week | 64 (30.6) | 52 (24.9) | 2.45 (1.41, 4.25) | 0.001 |
| 194Arg-280Arg-399Arg / 194Arg-280Arg-399Arg | | | | |
| Less than a bottle a week | 12 (66.7) | 26 (81.3) | | |
| A bottle or more a week | 6 (33.3) | 6 (18.7) | 1.58 (0.26, 9.65) | 0.618 |
| 194Trp-280Arg-399Arg / 194Arg-280Arg-399Gln | | | | |
| Less than a bottle a week | 27 (64.3) | 19 (86.4) | | |
| A bottle or more a week | 15 (35.7) | 3 (13.6) | 7.15 (1.20, 42.46) | 0.031 |
| 194Arg-280His-399Arg / 194Arg-280Arg-399Gln | | | | |
| Less than a bottle a week | 10 (58.8) | 5 (83.3) | | |
| A bottle or more a week | 7 (41.2) | 1 (16.7) | 4.14 (0.26, 66.36) | 0.315 |
Points to Note in Haplotype Analysis
• Uncertainty in haplotype estimation
• Examine LD
• When sub-cell frequencies are too small, nonparametric methods and the like should be considered
• The problem of population mixture
Table 1. Risk of disease by genetic information (Example 1)

| Genetic information | Disease present | Disease absent | Risk |
|---|---|---|---|
| A | 81 | 29 | 81/(81+29) = 0.7364 |
| a | 28 | 182 | 28/(28+182) = 0.1333 |
| Relative risk | | | 0.7364/0.1333 = 5.52 |

Conclusion: Genetic information and disease status are associated.
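Table 2. Risk by level of the confounder (Example 1)

Race A

| Genetic information | Disease: yes | Disease: no | Risk |
|---|---|---|---|
| A | 80 | 20 | 0.800 |
| a | 8 | 2 | 0.800 |
| Relative risk | | | 1.00 |

Race B

| Genetic information | Disease: yes | Disease: no | Risk |
|---|---|---|---|
| A | 1 | 9 | 0.100 |
| a | 20 | 180 | 0.100 |
| Relative risk | | | 1.00 |

Conclusion: in both races, genetic information and disease status are not associated.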
![Page 55: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/55.jpg)
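In summary
• In the whole population, disease and genetic information are associated.
• In race A, disease and genetic information are not associated.
• In race B, disease and genetic information are not associated.
• ? ? ?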
![Page 56: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/56.jpg)
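Table 3. Risk of disease by genetic information (Example 2)

| Genetic information | Disease: yes | Disease: no | Risk |
|---|---|---|---|
| A | 240 | 420 | 0.3636 |
| a | 200 | 350 | 0.3636 |
| Relative risk | | | 1.0000 |

Conclusion: disease status and genetic information are not associated.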
![Page 57: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/57.jpg)
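Table 4. Risk by level of the confounder (Example 2)

Race A

| Genetic information | Disease: yes | Disease: no | Risk |
|---|---|---|---|
| A | 105 | 5 | 0.9545 |
| a | 195 | 305 | 0.3900 |
| Relative risk | | | 2.45 |

Race B

| Genetic information | Disease: yes | Disease: no | Risk |
|---|---|---|---|
| A | 135 | 415 | 0.2455 |
| a | 5 | 45 | 0.1000 |
| Relative risk | | | 2.45 |

Conclusion: in both races, genetic information and disease status are associated.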
![Page 58: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/58.jpg)
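In summary
• In the whole population, disease and genetic information are not associated. (RR=1.00)
• In race A, disease and genetic information are associated. (RR=2.45)
• In race B, disease and genetic information are associated. (RR=2.45)
• ? ? ?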
![Page 59: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/59.jpg)
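• The relationship between disease status and genetic information is confounded by race.
• In such cases, a correct analysis must consider race together with disease status and genetic information. (Race is called a confounding variable.)
• The population mixture problem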
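The effect of ignoring race can be reproduced from the stratified counts of Examples 1 and 2. The sketch below compares the pooled (crude) relative risk with the stratum-specific values and with a Mantel-Haenszel summary relative risk. The Mantel-Haenszel estimator is one standard way to adjust for a stratifying variable and is an assumption here, since the slides do not name an adjustment method; the function names are hypothetical.

```python
from typing import List, Tuple

# One 2x2 table per race stratum:
# (diseased with A, not diseased with A, diseased with a, not diseased with a)
Table = Tuple[int, int, int, int]


def relative_risk(table: Table) -> float:
    a, b, c, d = table
    return (a / (a + b)) / (c / (c + d))


def pool(strata: List[Table]) -> Table:
    """Collapse the strata into one table, i.e. ignore race."""
    return tuple(sum(cells) for cells in zip(*strata))


def mantel_haenszel_rr(strata: List[Table]) -> float:
    """Mantel-Haenszel summary relative risk across strata."""
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return num / den


examples = {
    "Example 1": [(80, 20, 8, 2), (1, 9, 20, 180)],
    "Example 2": [(105, 5, 195, 305), (135, 415, 5, 45)],
}

results = {}
for name, strata in examples.items():
    crude = relative_risk(pool(strata))
    by_stratum = [relative_risk(t) for t in strata]
    adjusted = mantel_haenszel_rr(strata)
    results[name] = (crude, by_stratum, adjusted)
    print(f"{name}: crude RR = {crude:.2f}, "
          f"stratum RRs = {[round(r, 2) for r in by_stratum]}, "
          f"MH RR = {adjusted:.2f}")
```

In Example 1 the pooled table suggests an association that disappears within each race; in Example 2 the pooled table hides an association that is present in both races. In both cases the stratified summary agrees with the within-race picture.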
![Page 60: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/60.jpg)
Timeline of developments in human statistical genetics
(Figure columns: Theory, Technology, Study design)
![Page 61: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/61.jpg)
References
• Clark (1990). Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7: 111-122.
• Excoffier and Slatkin (1995). Maximum likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12: 921-927.
• Stephens, Smith, and Donnelly (2001). A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68: 978-989.
• Niu, Qin, Xu, and Liu (2002). Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 70: 157-169.
• Patil et al. (2001). Science 294: 1719-1723.
![Page 62: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/62.jpg)
• Toivonen et al. (2000). Data mining applied to linkage disequilibrium mapping. Am J Hum Genet 67: 133-145.
• Sevon, Toivonen, and Ollikainen (2001). TreeDT: Gene Mapping by Tree Disequilibrium Test. The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001), pp. 365-370. San Francisco, California, August 2001.
![Page 63: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/63.jpg)
• Wallenstein, Hodge, and Weston (1998). Logistic regression model for analyzing extended haplotype data. Genet Epidemiol 15: 173-181.
• Http://www.genome.helsinki.fi/eng/research/projects/DM/index.html
• Qin, Niu, and Liu (2002). Partition-Ligation–Expectation-Maximization Algorithm for Haplotype Inference with Single-Nucleotide Polymorphisms. Am J Hum Genet 71: 1242-1247.
• Petteri Sevon, Vesa Ollikainen, Päivi Onkamo, Hannu Toivonen, Heikki Mannila, and Juha Kere.
• Johnson et al. (2001). Nat Genet 29: 233-237.
![Page 64: Health Research Using Genomic Informationhosting03.snu.ac.kr/~hokim/seminar/health_genome20040922.pdf · • Figure3 NPL analysis of 25 families with bipolar disorder from the Portuguese](https://reader036.vdocuments.mx/reader036/viewer/2022070901/5f444735754c621d753d45a0/html5/thumbnails/64.jpg)
Human health
• Genetic factors (Genetic Epidemiology)
• Environmental factors (Environmental Epidemiology)
• Social factors (Social Epidemiology)
• Interactions between each pair of factors