
Genetics and Molecular Biology Tutorial II -- Computational Perspective

The goal is to introduce some topics to individuals with a minimal background in genetics/biology, and yet try to provide some examples of topics to maintain the interest of individuals with extensive biological/genetics backgrounds.

2

Outline

Gene structure
– genomic structure vs mRNA structure
– coding and noncoding exons
– introns
– primary transcript processing
  aside -- nonsense mediated mRNA degradation
– alternative splicing and differential polyadenylation
– evolutionary conservation of coding and noncoding sequences

3

Outline…

Genomic structure
– repetitive sequences
  LINEs and SINEs
– example -- Y chromosome palindromes
– C value paradox
– genomes of model organisms
  example
  – yeast genome and gene-chip
  – single/double knockouts
– cross-species sequence similarities for putative function identification
  example -- “chaperonin”

4

Fundamental Genetics and Probability Concepts

meiosis and sampling
patterns of inheritance
monogenic and complex inheritance
– phenocopy
– reduced penetrance
DNA variation
– polymorphisms, SNPs, and mutations
positional cloning

5

Gene Structure

6

Transcript Processing

DNA -> pre-mRNA -> mRNA -> protein
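The chain above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the toy gene, the intron coordinates, and the function names are hypothetical, the intron positions are supplied by hand rather than found from splice signals, and capping and polyadenylation are ignored.

```python
from itertools import product

# Standard genetic code, built in UCAG order (64 codons, "*" marks a stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical toy gene: exon 1 + intron + exon 2 (coding strand, 5' to 3').
GENE = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
INTRONS = [(6, 18)]  # assumed half-open [start, end) intron coordinates


def transcribe(dna):
    """DNA coding strand -> pre-mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def splice(pre_mrna, introns):
    """pre-mRNA -> mRNA by removing the listed intron intervals."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[pos:start])
        pos = end
    pieces.append(pre_mrna[pos:])
    return "".join(pieces)


def translate(mrna):
    """mRNA -> protein, from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    pre_mrna = transcribe(GENE)
    mrna = splice(pre_mrna, INTRONS)
    print(pre_mrna, mrna, translate(mrna))
```

Translating the unspliced pre-mRNA instead of the spliced mRNA gives a different peptide, which is one way to see why intron retention changes the protein product.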

7

Nonsense mediated mRNA degradation

– unknown mechanism
– more rapidly degrades mRNA containing premature stop (nonsense) codons
– Lykke-Andersen, “mRNA quality control: Marking the message for life or death.” Current Biology, 11, 2001.

8

Nonsense Mediated mRNA Degradation

9

Genome Structure -- repeat classes

Class (size of blocks)        Size of Repeat   Chr Locations
Megasatellite (100s of kb)    several kb       various locations
  RS447                       4.7 kb           ~50-70 copies on 4, several on 8
  untitled                    2.5 kb           ~400 copies on 4 and 19
  untitled                    3.0 kb           ~50 copies on X
Satellite (100 kb to Mbs)     5-171 bp         centromeric
  alphoid                     171 bp           centromeric heterochromatin, all chrs
  Sau3A family                68 bp            centromeric heterochromatin, 1 9 13 14 15 21 22 6
  satellite 1 (AT rich)       25-48 bp         centromeric heterochromatin, most chrs
  satellites 2 and 3          5 bp             most chrs
Minisatellite (0.1-20 kb)     6-64 bp          at or close to telomeres
  telomeric family            6 bp             all telomeres
  hypervariable family        9-64 bp          all chrs, often near telomeres
Microsatellite (<150 bp)      1-4 bp           dispersed through all chromosomes

10

C-Value Paradox

Hartl, “Molecular melodies in high and low C,” Nat. Rev. Genetics, Nov 2000

refers to the massive, counterintuitive and seemingly arbitrary differences in genome size observed in eukaryotic organisms
– Drosophila melanogaster 180 Mb
– Podisma pedestris 18,000 Mb
– difference is difficult to explain in view of apparently similar levels of evolutionary, developmental, and behavioral complexity

11

Alternative Splicing

Every conceivable pattern of alternative splicing is found in nature. Exons have multiple 5’ or 3’ splice sites alternatively used (a, b). Single cassette exons can reside between 2 constitutive exons such that the alternative exon is either included or skipped (c). Multiple cassette exons can reside between 2 constitutive exons such that the splicing machinery must choose between them (d). Finally, introns can be retained in the mRNA and become translated.

Graveley, “Alternative splicing: increasing diversity in the proteomic world.” Trends in Genetics, Feb., 2001.

12

Classic View of Gene No Longer Valid -- Strachan pg 185

Mechanism                        Frequency/Examples
multigenic transcription units   rare. 18S, 28S, and 5.8S rRNA, mitochondria
alternative promoters            common. dystrophin gene (8)
alternative splicing             very frequent. slo gene (8 cassettes), >500 mRNAs
alternative polyadenylation      common. calcitonin gene (2)
RNA editing                      extremely rare. apolipoprotein B gene (tissue specific editing – codon changed)
post-translational cleavage      rare. may generate functionally related polypeptides – hormones, insulin

13

Alternative Splicing Example -- Graveley 2001

14

Alternative PolyAdenylation

common in human RNA (Edwalds-Gilbert 1997)

in many genes, 2 or more poly-A signals in 3’ UTR
– alternative transcripts can show tissue specificity

alternative poly-A signals may be brought into play following alternative splicing

15

Edwalds-Gilbert. Nucleic Acids Res, 25(13), 1997

16

Evolution of the mitochondrial genome and origin of eukaryotic cells

17

Evolutionary Conservation of Coding and Noncoding Sequences

Sequencing of H. sapiens and model organisms is the basis for comparative genomics

Generally, functional solutions (encoded as genes) are conserved across organisms, which allows us to compare gene sequences and infer function

protein functional/structural region == “domains”

Intergenic regions are generally not conserved (always exceptions)

18

Example - MKKS (UniGene Clusters)

human   rat   87.4 %
human   mouse 84.9 %
human   cow   87.1 %
mouse   rat   97.8 %
rat     cow   91.0 %
mouse   cow   85.1 %
frog    rat   62.5 %
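Pairwise figures like these are typically percent identities computed over an alignment. The sketch below shows one common way to compute such a number; it assumes the two sequences are already aligned to equal length with "-" for gaps, and it skips gapped columns, which may differ from how the values above were produced. The function name and the example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over the ungapped aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical toy alignment: 7 of 8 columns match.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
```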

19

Example - MKKS

20

21

Computational Approach to Using Conserved Regions

Problem -- want to screen genes for mutations

Conventional approach -- screen all exons of a single gene

Alternative -- identify domains within multiple genes, and screen domains first, to optimize screening time and resources

22

Cross-Species Similarities

yeast
– gene chip for hybridization/expression
– complete genome (first eukaryote)
– single knockouts and double knockouts

23

Fundamental Genetics

meiosis
– humans (Hs) are diploid
– meiosis produces haploid gametes
– mechanism for transmission of genetic material to offspring
– recombination by cross-over (Holliday structure) or by independent segregation of homologous pairs

24

Fundamental Genetics (Background for Linkage Analysis)

Rule of Segregation
– offspring receive ONE allele (genetic material) from the pair of alleles possessed by EACH parent

Rule of Independent Assortment
– alleles of one gene can segregate independently of alleles of other genes
– (Linkage Analysis relies on the violation of the Independent Assortment Rule)
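A small simulation can make the violation concrete. The sketch below assumes a doubly heterozygous parent carrying haplotypes AB and ab, and a recombination fraction theta between the two loci; theta = 0.5 reproduces independent assortment, while smaller values model linked loci. The function names and the parent's phase are hypothetical choices for illustration.

```python
import random


def simulate_gametes(theta, n_gametes, seed=0):
    """Count gametes from an AB/ab parent given recombination fraction theta."""
    rng = random.Random(seed)
    counts = {"AB": 0, "ab": 0, "Ab": 0, "aB": 0}
    for _ in range(n_gametes):
        # Rule of Segregation: either homolog is transmitted with probability 1/2.
        from_first_homolog = rng.random() < 0.5
        # A crossover between the loci happens with probability theta.
        crossover = rng.random() < theta
        if from_first_homolog:
            gamete = "Ab" if crossover else "AB"
        else:
            gamete = "aB" if crossover else "ab"
        counts[gamete] += 1
    return counts


def recombinant_fraction(counts):
    """Proportion of recombinant gametes (Ab or aB) among all gametes."""
    total = sum(counts.values())
    return (counts["Ab"] + counts["aB"]) / total


if __name__ == "__main__":
    for theta in (0.5, 0.1, 0.0):
        counts = simulate_gametes(theta, 20000)
        print(theta, counts, round(recombinant_fraction(counts), 3))
```

With theta = 0.5 the four gamete types appear in roughly equal numbers; with tightly linked loci the parental types AB and ab dominate.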

25

Genetic Marker … Prelude to LA (Linkage Analysis)

– A genetic marker allows for the observation of the genetic state at a particular genomic location (locus). A genotype is the measured state of a genetic marker. May never be feasible to sequence cases directly.
– An “informative” marker is often “heterozygous,” or “polymorphic,” and enables the observation of the inheritance of genetic material.

26

Monogenic and Polygenic Diseases

– monogenic (Mendelian) -- one gene
  “simple” (dominant and recessive) Mendelian inheritance
  direct correspondence between one gene mutation and one disorder
  majority of disease genes found are monogenic
– polygenic -- (complex) multiple genes
  heterogeneity and epistasis
  combinatorics
  no longer have direct correspondence between one gene and disorder
  majority of disorders are probably polygenic
– complexity of organisms and observed pathways

27

...Monogenic and Polygenic Diseases

phenocopy
reduced penetrance
– Example -- sickle cell anemia
  “classic” recessive disorder
  defect in red blood cells (hemoglobin)
  but… infant hemoglobin gene can “leak”
  wide range of phenotypes

28

Examples

29

Examples

30

Example

31

BBS4 Pedigree

32

Hardy-Weinberg Equilibrium

Rule that relates allelic and genotypic frequencies in a population of diploid, sexually reproducing individuals if that population has random mating, large size, no mutation or migration, and no selection

Consequences (when the assumptions above hold)
– allelic frequencies will not change in a population from one generation to the next
– genotypic frequencies are determined in a predictable way by allelic frequencies
– the equilibrium is neutral -- if perturbed, it will reestablish within one generation of random mating at the new allelic frequency

33

34

H-W

f(AA) = p^2
f(Aa) = 2pq
f(aa) = q^2

(p + q)^2 = p^2 + 2pq + q^2

(p^2 + q^2 + r^2 + 2pq + 2pr + 2qr) = (p + q + r)^2
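The expansion of (p + q)^2, and of (p + q + r)^2 for three alleles, can be checked numerically. The sketch below computes Hardy-Weinberg genotype frequencies for any number of alleles and then recovers the allele frequencies from them, which illustrates why the allele frequencies stay unchanged after one round of random mating. The function names and the example frequencies are hypothetical.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(allele_freqs):
    """Genotype frequencies expected under Hardy-Weinberg equilibrium.

    allele_freqs maps allele name -> frequency; frequencies must sum to 1.
    Homozygotes get p^2, heterozygotes get 2pq.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        product = allele_freqs[a] * allele_freqs[b]
        genotypes[(a, b)] = product if a == b else 2 * product
    return genotypes


def allele_freqs_from_genotypes(genotypes):
    """Allele frequencies in the gametes produced by a genotype distribution."""
    freqs = {}
    for (a, b), freq in genotypes.items():
        # Each genotype contributes half of its frequency through each allele.
        freqs[a] = freqs.get(a, 0.0) + freq / 2
        freqs[b] = freqs.get(b, 0.0) + freq / 2
    return freqs


if __name__ == "__main__":
    two_alleles = hardy_weinberg({"A": 0.7, "a": 0.3})
    print(two_alleles)
    print(allele_freqs_from_genotypes(two_alleles))
    print(hardy_weinberg({"p": 0.5, "q": 0.3, "r": 0.2}))
```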

35

Dominant and Recessive Penetrance Modeled

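penetrance = P(pt | gt), where pt = phenotype and gt = genotype

dominant, fully penetrant
DD    Dd    dd
1     1     0

dominant, reduced penetrance
DD    Dd    dd
0.9   0.9   0.0

recessive, fully penetrant
DD    Dd    dd
0     0     1

recessive, reduced penetrance
DD    Dd    dd
0     0     0.8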
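One way to use these penetrance vectors is to combine them with genotype frequencies. The sketch below assumes Hardy-Weinberg genotype frequencies for the D/d locus and computes the expected fraction of affected individuals, plus the genotype distribution among affected individuals by Bayes' rule. The model labels, function names, and allele frequencies are hypothetical; the penetrance values are the ones in the four tables.

```python
# Penetrance P(affected | genotype) for the four models in the tables.
MODELS = {
    "dominant_full": {"DD": 1.0, "Dd": 1.0, "dd": 0.0},
    "dominant_reduced": {"DD": 0.9, "Dd": 0.9, "dd": 0.0},
    "recessive_full": {"DD": 0.0, "Dd": 0.0, "dd": 1.0},
    "recessive_reduced": {"DD": 0.0, "Dd": 0.0, "dd": 0.8},
}


def genotype_freqs(freq_D):
    """Hardy-Weinberg genotype frequencies given the frequency of allele D."""
    freq_d = 1.0 - freq_D
    return {"DD": freq_D ** 2, "Dd": 2 * freq_D * freq_d, "dd": freq_d ** 2}


def prevalence(freq_D, penetrance):
    """P(affected) = sum over genotypes of P(genotype) * P(affected | genotype)."""
    freqs = genotype_freqs(freq_D)
    return sum(freqs[g] * penetrance[g] for g in freqs)


def genotype_given_affected(freq_D, penetrance):
    """P(genotype | affected) by Bayes' rule."""
    total = prevalence(freq_D, penetrance)
    if total == 0:
        raise ValueError("no genotype is ever affected under this model")
    freqs = genotype_freqs(freq_D)
    return {g: freqs[g] * penetrance[g] / total for g in freqs}


if __name__ == "__main__":
    print(prevalence(0.1, MODELS["dominant_full"]))
    print(prevalence(0.1, MODELS["dominant_reduced"]))
    print(genotype_given_affected(0.1, MODELS["dominant_reduced"]))
```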

36

D-R Heterogeneous, DD Epistatic

D-R heterogeneous
      AA   Aa   aa
BB    1    1    0
Bb    1    1    0
bb    1    1    1

reduced penetrance
3, 9, 27, 81, 243, … 3^n

DD epistatic
      AA   Aa   aa
BB    1    1    0
Bb    1    1    0
bb    0    0    0
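The two tables can be written as rules over the genotypes at loci A and B. The sketch below is one reading of them, stated as an assumption: in the heterogeneous model either a dominant A allele or the recessive bb genotype is sufficient for disease, and in the epistatic model a dominant allele is required at both loci. It also counts the joint genotypes for n biallelic loci, which is how the series 3, 9, 27, … 3^n is interpreted here. Function names are hypothetical.

```python
from itertools import product

LOCUS_A = ["AA", "Aa", "aa"]
LOCUS_B = ["BB", "Bb", "bb"]


def heterogeneous(gt_a, gt_b):
    """Affected if A acts dominantly OR b acts recessively (either suffices)."""
    return 1 if ("A" in gt_a or gt_b == "bb") else 0


def epistatic(gt_a, gt_b):
    """Affected only if a dominant allele is present at BOTH loci."""
    return 1 if ("A" in gt_a and "B" in gt_b) else 0


def penetrance_table(model):
    """Rows follow LOCUS_B (BB, Bb, bb); columns follow LOCUS_A (AA, Aa, aa)."""
    return [[model(gt_a, gt_b) for gt_a in LOCUS_A] for gt_b in LOCUS_B]


def count_joint_genotypes(n_loci):
    """Number of joint genotypes for n biallelic loci (3 genotypes per locus)."""
    return sum(1 for _ in product(range(3), repeat=n_loci))


if __name__ == "__main__":
    print(penetrance_table(heterogeneous))
    print(penetrance_table(epistatic))
    print([count_joint_genotypes(n) for n in range(1, 6)])
```

The count grows as 3^n, so the number of penetrance parameters in a fully general n-locus model quickly becomes too large to estimate from pedigree data.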

37

Dom-Rec Heterozygous

Screen genes A, B?, b

38

Uninformative Marker

39

Informative Marker

40

Given the following observations: family structure, affection status, genotypes, and disease allele frequencies. Assuming a model for the disease, can we calculate the probability that these observations “fit” the assumed model?

41

Linkage

42

Linkage Analysis

Goal: find a marker “linked” to a disease gene.

LOD score = log of likelihood ratio

LR[θ; data] = k P[data; θ]

θ (theta) = estimate of genetic distance (recombination fraction) between marker and disease
  = proportion of recombinant gametes / total gametes
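For the simplest case the LOD score has a closed form. The sketch below assumes fully informative, phase-known meioses, so that recombinant and non-recombinant gametes can simply be counted; it compares the likelihood at a given θ with the likelihood under no linkage (θ = 0.5) on a log10 scale. Real pedigrees need the full likelihood computed by linkage programs, and the function names and counts here are hypothetical.

```python
import math


def lod_score(theta, recombinants, meioses):
    """Two-point LOD score for phase-known meioses.

    LOD(theta) = log10( theta^R * (1 - theta)^(N - R) / 0.5^N )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** non_recombinants
    null_likelihood = 0.5 ** meioses
    if likelihood == 0.0:
        # theta = 0 is impossible once a recombinant has been observed.
        return float("-inf")
    return math.log10(likelihood / null_likelihood)


def best_theta(recombinants, meioses, steps=50):
    """Grid search over theta in [0, 0.5] for the maximum LOD score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(t, recombinants, meioses))


if __name__ == "__main__":
    # Hypothetical data: 1 recombinant among 10 informative meioses.
    theta_hat = best_theta(1, 10)
    print(theta_hat, lod_score(theta_hat, 1, 10))
```

The maximizing θ equals the observed proportion of recombinant gametes, matching the definition on the slide. A LOD of 3 or more is the conventional threshold for declaring linkage.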

43

…Linkage Analysis

Linkage analysis calculates the likelihood that the inheritance pattern of the phenotype (disease) is supported by the observed inheritance patterns (genotypes) in a pedigree.
– few monogenic models, easy to test
– more difficult to find models explaining inheritance in polygenic models
– parameter maximization

44

Linkage Analysis Programs

FASTLINK - 2 point
– O(n^2), where n = number of markers

GeneHunter - multipoint, 2 point
– O(n^2), where n = number of people

45

Allele Sharing

tries to show that affected family members inherit the same chromosomal regions more often than expected by chance

46

Allele Sharing Example

Needs at least sibs.
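For affected sib pairs, "more often than expected by chance" has a simple baseline: at any locus, sibs share 0, 1, or 2 alleles identical by descent with probabilities 1/4, 1/2, and 1/4. The sketch below compares observed sharing counts with that expectation using a chi-square goodness-of-fit statistic. The counts are hypothetical, and the critical value quoted in the comment is the standard 5% point for 2 degrees of freedom.

```python
def ibd_sharing_chi_square(n0, n1, n2):
    """Chi-square statistic (2 df) against the 1/4 : 1/2 : 1/4 null expectation.

    n0, n1, n2 = number of affected sib pairs sharing 0, 1, 2 alleles IBD.
    """
    total = n0 + n1 + n2
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n0, n1, n2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def mean_sharing(n0, n1, n2):
    """Mean proportion of alleles shared IBD (0.5 expected by chance)."""
    return (n1 + 2 * n2) / (2 * (n0 + n1 + n2))


if __name__ == "__main__":
    # Hypothetical counts for 100 affected sib pairs at one marker.
    counts = (10, 40, 50)
    stat = ibd_sharing_chi_square(*counts)
    # 5.991 is the 5% critical value of chi-square with 2 degrees of freedom.
    print(stat, stat > 5.991, mean_sharing(*counts))
```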

47

Association Studies

“Allelic association studies provide the most powerful method for locating genes of small effect contributing to complex diseases and traits.” Daniels, Am J Hum Genet 62:1189-1197, 1998.

Linkage analysis
– genome wide screen, 400 markers ~ 10 cM (10 Mb), association needs 4000+ polymorphic markers
– generally need nuclear family or larger

Association finds “linkage disequilibrium”

48

Association Studies

“Association is simply a statistical statement about the co-occurrence of alleles or phenotypes. Allele A is associated with disease D if people who have D also have A more (or maybe less) often than would be predicted from the individual frequencies of D and A in the population.” Pg. 286 Human Molecular Genetics 2, Tom Strachan

49

Examples

HLA-DR4 (antigen marker)
– 36% in UK
– 78% with rheumatoid arthritis

CF (RFLP markers XV2.c (X1, X2), KM19 (K1, K2))

Marker Alleles   CF (case)   Normal (control)
X1, K1           3           49
X1, K2           147         19
X2, K1           8           70
X2, K2           8           25

– CF associated with X1, K2 in ‘89 (Strachan)
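The CF haplotype counts can be tested for association directly. The sketch below computes a Pearson chi-square statistic for the full table and an odds ratio for the X1, K2 haplotype against all other haplotypes combined. The counts are those in the table; the function names are hypothetical, and no p-value is computed, to keep the example free of external libraries.

```python
# Haplotype counts from the table: (CF chromosomes, normal chromosomes).
CF_TABLE = {
    "X1,K1": (3, 49),
    "X1,K2": (147, 19),
    "X2,K1": (8, 70),
    "X2,K2": (8, 25),
}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def collapse(table, haplotype):
    """2 x 2 counts (a, b, c, d): one haplotype versus all others pooled."""
    a, b = table[haplotype]
    c = sum(v[0] for k, v in table.items() if k != haplotype)
    d = sum(v[1] for k, v in table.items() if k != haplotype)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for a 2 x 2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


if __name__ == "__main__":
    stat, dof = chi_square([list(v) for v in CF_TABLE.values()])
    print(stat, dof)
    print(odds_ratio(*collapse(CF_TABLE, "X1,K2")))
```

The X1, K2 haplotype is carried by 147 of 166 CF chromosomes but only 19 of 163 normal chromosomes, which is the pattern the statistic picks up.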

50

Linkage Disequilibrium

linkage equilibrium (the two-locus counterpart of Hardy-Weinberg) is true if
– P(gt1,gt1’; gt2,gt2’) = P(gt1,gt1’) * P(gt2,gt2’), where [P(haplotype)]

case vs controls

TDT (heterozygous marker transmitted), HRR (untransmitted alleles as control)

allelic associations (outbred populations) maintained at only <= 1 cM
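The independence condition above is usually summarized by the disequilibrium coefficient D = P(AB) - P(A) P(B), which is zero at linkage equilibrium. The sketch below computes D and the normalized measure r^2 from haplotype frequencies at two biallelic loci. These two measures are standard, but they are an addition not named on the slide, and the function name and example frequencies are hypothetical.

```python
def linkage_disequilibrium(hap_freqs):
    """Return (D, r_squared) from haplotype frequencies at two biallelic loci.

    hap_freqs maps "AB", "Ab", "aB", "ab" to frequencies that sum to 1.
    """
    if abs(sum(hap_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    freq_A = hap_freqs["AB"] + hap_freqs["Ab"]
    freq_B = hap_freqs["AB"] + hap_freqs["aB"]
    # D measures the departure from independence between the two loci.
    d = hap_freqs["AB"] - freq_A * freq_B
    denominator = freq_A * (1 - freq_A) * freq_B * (1 - freq_B)
    r_squared = d ** 2 / denominator if denominator > 0 else 0.0
    return d, r_squared


if __name__ == "__main__":
    # Equilibrium: haplotype frequencies are products of allele frequencies.
    print(linkage_disequilibrium({"AB": 0.18, "Ab": 0.42, "aB": 0.12, "ab": 0.28}))
    # Complete disequilibrium: only two haplotypes exist.
    print(linkage_disequilibrium({"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}))
```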

51

Equilibrium

52

“SNPs” Single-Nucleotide Polymorphisms

1 every 1000 bp (estimated)

2,972,052 SNPs submitted to dbSNP
– dbSNP summary link
– 50% of all SNPs are in question
– 10% of UTRs have SNPs

100,000 - 500,000 SNPs needed

Why don’t we do this?
– $$$

53

Homozygosity Mapping

54

Positional Cloning

55

Disease Gene Identification

SSCP -- single strand conformational polymorphism

PCR -- polymerase chain reaction
– primers amplify template sequence

direct sequencing

BBS2 (Bardet-Biedl Syndrome)

56

BBS2 genetic mapping

C16 1 2 3 4 5 6 7 8 9 10 11 12

57

BBS2 genetic mapping

C16 1 2 3 4 5 6 7 8 9 10 11 12

unaffected / affected

58

BBS4 Gene (Direct Sequencing)(Hs.26471)

59

BBS4 Deletion (by PCR)

exons 3 4

60

BBS4 Mutations (direct sequencing)

(R295P)

61

Summary

Disease Gene Identification
– challenges
– interval localization
  genotyping and genetic markers, linkage analysis, allele sharing, association studies (“SNiPs”), homozygosity mapping
– disease gene identification techniques

Take home
– A complex disorder (with interacting genes) has yet to be characterized

62

Demo -- installing a database

A database organizes data

Most common
– relational database (Oracle, Sybase)
– perceived as a collection of tables,
– where table is an unordered collection of rows
– each row has a fixed number of fields, and each field can store a predefined type of data value (date, integer, string, etc.)

simplest
– flat file
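The table, row, and field vocabulary can be tried out without installing a server. The sketch below uses SQLite through Python's standard sqlite3 module as a stand-in for the Oracle or Sybase systems named above; it is not the demo from the talk. The table layout, individual IDs, marker name, and dates are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory relational database with one genotypes table."""
    conn = sqlite3.connect(":memory:")
    # Each row has a fixed set of typed fields (string, integer, date).
    conn.execute(
        """
        CREATE TABLE genotypes (
            individual TEXT NOT NULL,
            marker     TEXT NOT NULL,
            allele1    INTEGER NOT NULL,
            allele2    INTEGER NOT NULL,
            typed_on   DATE
        )
        """
    )
    rows = [  # hypothetical genotype records
        ("II-1", "M1", 1, 2, "2001-01-15"),
        ("II-2", "M1", 2, 2, "2001-01-15"),
        ("II-3", "M1", 1, 3, "2001-01-16"),
    ]
    conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def heterozygous_individuals(conn, marker):
    """Individuals carrying two different alleles at the given marker."""
    cursor = conn.execute(
        "SELECT individual FROM genotypes "
        "WHERE marker = ? AND allele1 <> allele2 ORDER BY individual",
        (marker,),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    connection = build_demo_db()
    print(heterozygous_individuals(connection, "M1"))
    connection.close()
```

Rows in a table have no inherent order, which is why the query states ORDER BY explicitly. A flat file would hold the same records as lines of text, with no types or query language.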

63

Databases

NCBI BLAST, Amazon, Yahoo

Several of our own
– genotypes
– rat ESTs
– eye clones from differential display
– micro-array data
