biostatistics 666 statistical models in human genetics · biostatistics 666 statistical models in...

37
Biostatistics 666 Statistical Models in Statistical Models in Human Genetics Instructor Gonçalo Abecasis Gonçalo Abecasis

Upload: others

Post on 16-Mar-2020

35 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Biostatistics 666Statistical Models in Statistical Models in

Human Genetics

InstructorGonçalo AbecasisGonçalo Abecasis

Page 2: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Course LogisticsCourse Logistics

GradingGrading Office HoursClass Notes

Page 3: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Course Objective

Provide an understanding of statistical gmodels used in gene mapping studies

Survey commonly used algorithms and procedures in genetic analysisprocedures in genetic analysis

Page 4: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Assessment

Weekly Assignmentsy g• About 60% of the final mark

2 Half Term Assessments• About 40% of the final markAbout 40% of the final mark

Page 5: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Academic IntegrityAll i d i di id l All assignments are made on an individual basis – the solutions you provide must represent your own work.p y

Cheating, plagiarism and aiding and b ti th t tit t d iabeting these acts constitutes academic

misconduct and is a serious offense.

See also the School policy on academic conduct.

Page 6: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Office Hours

Please cross out times for which you are yunavailable in the sheet going around

Room M4132School of Public Health II

Page 7: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Class Website

PDF versions of notes and problem sets

http://genome.sph.umich.edu/wiki/666p g p

It is a wiki, so you should be able to add , ycomments, notes and discussion to each lecture.

Page 8: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Course ContentsCourse Contents

Brief Overview

Page 9: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Genetic Mapping

“Compares the inheritance patternof a trait with the inheritance pattern

of chromosomal regions”

Positional Cloning“Allows one to find where a gene isAllows one to find where a gene is,

without knowing what it is.”

Page 10: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Some of the Topics Covered

Maximum Likelihood

Modeling Genes in Populations Modeling Genes in Populations

M d li G i P di Modeling Genes in Pedigrees

Page 11: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Modeling Modeling Genes in Populations Hardy Weinberg Equilibrium

Linkage Disequilibrium Linkage Disequilibrium

The Coalescent The Coalescent

Methods for Haplotypingp yp g

Methods for Handling Shotgun Sequence Data

Page 12: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Modeling Modeling Genes within Pedigrees

El S l i h Elston-Stewart algorithm

Lander-Green algorithm Lander-Green algorithm

Checking Genetic Data for Errorsg

Genetic Linkage Tests

Genetic Association Tests

Page 13: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Let’s Get Started!Let’s Get Started!

The Basics

Page 14: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Today – Primer In Genetics

How information is stored in DNA

How DNA is inherited

Types of DNA variation

Common designs for Genetic studies

Page 15: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

DNA – Information Store

Encodes the information required for cells and organisms to function and produce new cells and organisms.

DNA variation is responsible for many individual differences, some of which are ,medically important.

Page 16: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Human Genome

Multiple chromosomes• 22 autosomes

• P t i 2 i i di id l• Present in 2 copies per individual• One maternally and one paternally inherited copy

• 1 pair of sex chromosomes• Females have two X chromosomes• Males have one X chromosome and one Y chromosome

Total of ~3 x 109 bases (each A, C, T or G)

Page 17: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Inheritance of DNATh h bi i “DNA i ” i f d b Through recombination, a new “DNA string” is formed by combining two parental DNA strings

Th h h i i f th t Thus, each chromosome we carry is a mosaic of the two chromosomes carried by our parents

Only a small number of changeovers between the two Only a small number of changeovers between the two parental chromosomes • On average ~1 per Morgan (~108 bases)

Copying of DNA sequences is imperfect and, for typical sequences, the error rate is about 1 per 108 bases copied

Page 18: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Human Variation

Every chromosome is unique …

… but when two chromosomes are compared most of their sequence is identicalq

About 1 per 1,000 bases differs between pairs of human chromosomes

Page 19: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

DNA Sequences That Vary… Genes (protein coding sequences, which total <2% of all DNA)

• ~20,000-25,000 in humans

P d Pseudogenes• Ancient genes, inactivated through mutation

Promoters and Enhancers Promoters and Enhancers• Sequences which control gene expression

Repeat DNA Repeat DNA• Often more variable than other types of sequences• Useful for tracking DNA through families or populations

Packaging sequences, “spacer” DNA, etc.

Page 20: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Important Vocabulary … Locus Polymorphism

Allele

Phenotype• Mendelian Traits• Complex Traits Allele

Mutation Linkage

Complex Traits Chromosomal landmarks

• CentromeresLinkage Genetic Marker Genotype

• Telomeres Gene RNA RNA Protein

Page 21: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Data for a Genetic Study

Pedigree• Set of individuals of known relationship

Observed marker genotypes Observed marker genotypes• SNPs, VNTRs, microsatellites

Phenotype data for individuals

Page 22: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Genetic Markers Genetic variants that can be measured conveniently

Typically we characterize them by: Typically, we characterize them by:• Number of Alleles• Frequency of Each Allele

• These are summarized by the heterozygosity

The most commonly used genetic markers are microsatellites and SNPs

Page 23: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Phenotypes

Measured characters of individuals

Mendelian Phenotypes• Completely determined by genes• e.g. Cystic Fibrosis, Retinoblastoma

Complex Phenotypes• Controlled by multiple genes and environmental factors• eg. Diabetes, Inflammatory Bowel Disease

Page 24: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Ultimate Aim of Gene-Mapping Ultimate Aim of Gene Mapping Experiments

Localize and identify variants that control interesting traitsg• Susceptibility to human disease• Phenotypic variation in the populationy

The difficulty The difficulty…• Testing several million variants is impractical…

Page 25: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

3 Common Questions

Are there “genes” influencing this trait?• Epidemiological studies

Where are those “genes”?• Li k l i• Linkage analysis

What are those “genes”? What are those genes ?• Association analysis

Page 26: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Is a trait genetic?

Examine distribution of trait in the population and among relatives

E.g. Inflammatory Bowel Disease (Crohn’s)• G l l ti• General population

• 1-3 cases per 1,000 individuals• Twins of affected individuals

• 44% of monozygotic twins also have Crohn’s• 3.8% of dizygotic twins also have Crohn’s

Page 27: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Where are those genes?

Find genetic markers that co-segregate with disease

E.g. D16S3136 E.g. D16S3136co-segregateswith Crohn’s

Page 28: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

What are those genes? Identify genetic variants that are associated

with disease…

E.g. Mutations which disrupt NOD2 are much more common in Crohn’s patientsp

Crohn’s Controls• Arg702Trp: 11% 4%• Arg702Trp: 11% 4%• Gly908Arg: 4% 2%• Leu1007fs 8% 4%

Page 29: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Common Designs for Common Designs for Genetic Studies

Parametric Linkage analysis

Allele-sharing methods

Association analysis

Page 30: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Parametric Linkage Analysis Evaluate a specific model and location

• Allele frequencies at disease loci• Probability of disease for each genotype

Potentially very powerful Vulnerable to heterogeneity, model misspecification

Page 31: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Allele Sharing Analysis

Reject null hypothesis that sharing is random at a particular region

Less powerful, but more robust Quantitative trait extensions exist

Page 32: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Association Analysis Simplest case compares frequency of allele among

cases and controls Genome-wide search requires hundreds of thousands q

of markers Typically, focuses on candidate genes

Page 33: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Which Design to Choose?The Right Choice Depends on the Alleles We Seek…

cte

of e

ffec

Rare, high penetrancemutations – use linkage

C l

agni

tude Common, low penetrance

variants – use association

M

Frequency in population

Page 34: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Genetic Linkage Studies Identify variants with relatively large contributions to

disease risk

Require only a coarse measurement of genetic variation• 400 – 800 microsatellites can extract most of the linkage

information in typical pedigrees• Until recently, the only option for conducting whole genome

studies

High-throughput SNP genotyping has already sped up and facilitated these studies• Data analysis methods must select subset of independent

SNPs or model disequilibrium between markers

Page 35: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Genetic Association StudiesId if i i i h l i l ll Identify genetic variants with relatively small individual contributions to disease risk

Require detailed measurement of genetic variation• 10 000 000 t l d ti i t• > 10,000,000 catalogued genetic variants, so …• Until recently, limited to candidate genes or regions

• A hit-and-miss approach…

SNP resources and decreasing assay costs now make it possible to examine 100,000s of p ,markers

Page 36: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Recommended Reading

An introduction to important issues in genetics:

• Lander and Schork (1994)Science 265:2037-48

A good reference on molecular genetics:

• Human Molecular GeneticsTom Strachan and Andrew Read

Page 37: Biostatistics 666 Statistical Models in Human Genetics · Biostatistics 666 Statistical Models in Human Genetics Instructor Gonçalo AbecasisGonçalo Abecasis. Course Logistics Grading

Reading for Next Lecture Will be discussing Hardy-Weinberg equilibrium

• A basic feature of genotypes in human populations

Wigginton, Cutler, Abecasis (2005)A note on exact tests of Hardy-Weinberg equilibrium.Am J Hum Genet 76:887-93

This paper describes an efficient method for testing Hardy-Weinberg equilibrium and includes many importantHardy Weinberg equilibrium and includes many important historical references