Genetics and Molecular Biology Tutorial II -- Computational Perspective

The goal is to introduce some topics to individuals with a minimal background in genetics/biology, while providing examples that maintain the interest of individuals with extensive biological/genetics backgrounds.

Outline

Gene structure
– genomic structure vs mRNA structure
– coding and noncoding exons
– introns
– primary transcript processing
   aside -- nonsense mediated mRNA degradation
– alternative splicing and differential polyadenylation
– evolutionary conservation of coding and noncoding sequences

Outline…

Genomic structure
– repetitive sequences (LINEs and SINEs)
– example -- Y chromosome palindromes
– C value paradox
– genomes of model organisms
   example -- yeast genome and gene-chip
   – single/double knockouts
– cross-species sequence similarities for putative function identification
   example -- “chaperonin”

Fundamental Genetics and Probability Concepts
– meiosis and sampling
– patterns of inheritance
– monogenic and complex inheritance
   – phenocopy
   – reduced penetrance
– DNA variation
   – polymorphisms, SNPs, and mutations
– positional cloning

Gene Structure

Transcript Processing

DNA -> pre-mRNA -> mRNA -> protein
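The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a model of real transcript processing: the toy gene, its exon coordinates, and the function names are all invented for the example, and capping and polyadenylation are ignored.

```python
# Illustrative sketch of DNA -> pre-mRNA -> mRNA -> protein.
# The toy gene and exon coordinates below are hypothetical.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code keyed by RNA codon ('*' marks a stop codon).
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(coding_strand):
    """Copy the coding strand into RNA (T becomes U)."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna, exons):
    """Join the exon intervals (0-based, end-exclusive); introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Translate from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_gene = "ATGGCC" + "GTAAGTCCAG" + "TTTTAA"   # exon 1 + intron + exon 2
toy_exons = [(0, 6), (16, 22)]

pre_mrna = transcribe(toy_gene)
mrna = splice(pre_mrna, toy_exons)
protein = translate(mrna)
print(pre_mrna, mrna, protein)
```

The intron is present in the pre-mRNA but absent from the mRNA, which is the point of the "primary transcript processing" step.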

Nonsense mediated mRNA degradation

– unknown mechanism
– more rapidly degrades mRNA containing premature stop (nonsense) codons
– Lykke-Andersen, “mRNA quality control: Marking the message for life or death.” Current Biology, 11, 2001.

Nonsense Mediated mRNA Degradation

Genome Structure -- repeat classes

Class (blocks) | Size of Repeat | Chr Locations
Megasatellite (100s of kb) | several kb | various locations
   RS447 | 4.7 kb | ~50-70 copies on 4, several on 8
   untitled | 2.5 kb | ~400 copies on 4 and 19
   untitled | 3.0 kb | ~50 copies on X
Satellite (100 kb to Mbs) | 5-171 bp | centromeric
   alphoid | 171 bp | centromeric hetero, all chrs
   Sau3 A family | 68 bp | centromeric hetero, 1 9 13 14 15 21 22 6
   satellite 1 (AT rich) | 25-48 bp | centromeric hetero, most chrs
   satellites 2 and 3 | 5 bp | most chrs
Minisatellite (0.1-20 kb) | 6-64 bp | at or close to telomeres
   telomeric family | 6 bp | all telomeres
   hypervariable family | 9-64 bp | all chrs, often near telomeres
Microsatellite (<150 bp) | 1-4 bp | dispersed through all chromosomes

C-Value Paradox

Hartl, “Molecular melodies in high and low C,” Nat. Rev. Genetics, Nov 2000

The C-value paradox refers to the massive, counterintuitive and seemingly arbitrary differences in genome size observed in eukaryotic organisms.
– Drosophila melanogaster 180 Mb
– Podisma pedestris 18,000 Mb
– the difference is difficult to explain in view of apparently similar levels of evolutionary, developmental, and behavioral complexity

Alternative Splicing

Every conceivable pattern of alternative splicing is found in nature. Exons can have multiple 5’ or 3’ splice sites that are alternatively used (a, b). A single cassette exon can reside between 2 constitutive exons such that the alternative exon is either included or skipped (c). Multiple cassette exons can reside between 2 constitutive exons such that the splicing machinery must choose between them (d). Finally, introns can be retained in the mRNA and become translated.

Graveley, “Alternative splicing: increasing diversity in the proteomic world.” Trends in Genetics, Feb., 2001.

Classic View of Gene No Longer Valid -- Strachan pg 185

Mechanism | Frequency/Examples
multigenic transcription units | rare. 18S, 28S, and 5.8S rRNA, mitochondria
alternative promoters | common. dystrophin gene (8)
alternative splicing | very frequent. slo gene (8 cassettes), >500 mRNAs
alternative polyadenylation | common. calcitonin gene (2)
RNA editing | extremely rare. apolipoprotein B gene (tissue specific editing – codon changed)
post-translational cleavage | rare. may generate functionally related polypeptides – hormones, insulin

Alternative Splicing Example -- Graveley 2001

Alternative PolyAdenylation

– common in human RNA (Edwards-Gilbert 1997)
– in many genes, 2 or more poly-A signals in 3’ UTR
   – alternative transcripts can show tissue specificity
– alternative poly-A signals may be brought into play following alternative splicing

Edwards-Gilbert. Nucleic Acids Res, 13, 1997

Evolution of the mitochondrial genome and origin of eukaryotic cells

Evolutionary Conservation of Coding and Noncoding Sequences

– Sequencing of H. sapiens and model organisms is the basis for comparative genomics
– Generally, functional solutions (encoded as genes) are conserved across organisms, which allows us to compare gene sequences and infer function
– protein functional/structural regions == “domains”
– Intergenic regions are generally not conserved (always exceptions)

Example - MKKS (UniGene Clusters)

Pairwise sequence similarity
– human vs rat 87.4 %
– human vs mouse 84.9 %
– human vs cow 87.1 %
– mouse vs rat 97.8 %
– rat vs cow 91.0 %
– mouse vs cow 85.1 %
– frog vs rat 62.5 %
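The slide does not say how these percentages were computed. One common convention is percent identity over the ungapped columns of a pairwise alignment; the sketch below assumes that definition and assumes the two sequences are already aligned. The function name and the example strings are hypothetical.

```python
# Minimal sketch: percent identity between two pre-aligned sequences.
# Assumption: identity = matching columns / columns with no gap in either sequence.
def percent_identity(aligned_a, aligned_b, gap="-"):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 4 of the 5 ungapped columns match.
print(percent_identity("ACGT-A", "ACCTTA"))
```

Other definitions divide by the alignment length or by the shorter sequence, so values from different tools are not always directly comparable.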

Example - MKKS

Computational Approach to Using Conserved Regions

Problem -- want to screen genes for mutations

Conventional approach -- screen all exons of a single gene

Alternative -- identify domains within multiple genes, and screen domains first, to optimize screening time and resources

Cross-Species Similarities

yeast
– gene chip for hybridization/expression
– complete genome (first eukaryote)
– single knockouts and double knockouts

Fundamental Genetics

meiosis
– humans (Hs) are diploid
– meiosis produces haploid gametes
– mechanism for transmission of genetic material to offspring
– recombination by cross-over (Holliday structure) or by independent segregation of homologous pairs

Fundamental Genetics (Background for Linkage Analysis)

Rule of Segregation
– offspring receive ONE allele (genetic material) from the pair of alleles possessed by EACH parent
Rule of Independent Assortment
– alleles of one gene can segregate independently of alleles of other genes
– (Linkage Analysis relies on the violation of the Independent Assortment Rule)

Genetic Marker … Prelude to LA
– A genetic marker allows for the observation of the genetic state at a particular genomic location (locus). A genotype is the measured state of a genetic marker. It may never be feasible to sequence cases directly.
– An “informative” marker is often “heterozygous,” or “polymorphic,” and enables the observation of the inheritance of genetic material.

Monogenic and Polygenic Diseases
– monogenic (Mendelian) -- one gene
   “simple” (dominant and recessive) Mendelian inheritance
   direct correspondence between one gene mutation and one disorder
   majority of disease genes found are monogenic
– polygenic (complex) -- multiple genes
   heterogeneity and epistasis
   combinatorics
   no longer have direct correspondence between one gene and disorder
   majority of disorders are probably polygenic
   – complexity of organisms and observed pathways

...Monogenic and Polygenic Diseases
– phenocopy
– reduced penetrance
   – Example -- sickle cell anemia
      “classic” recessive disorder
      defect in red blood cells (hemoglobin)
      but… infant hemoglobin gene can “leak”
      wide range of phenotypes

Examples

Example

BBS4 Pedigree

Hardy-Weinberg Equilibrium

Rule that relates allelic and genotypic frequencies in a population of diploid, sexually reproducing individuals if that population has random mating, large size, no mutation or migration, and no selection

Consequences (when these assumptions hold)
– allelic frequencies will not change in a population from one generation to the next
– genotypic frequencies are determined in a predictable way by allelic frequencies
– the equilibrium is neutral -- if perturbed, it will reestablish within one generation of random mating at the new allelic frequency

Two alleles (frequencies p and q, p + q = 1):
f(AA) = p^2
f(Aa) = 2pq
f(aa) = q^2
p^2 + 2pq + q^2 = (p + q)^2 = 1

Three alleles (frequencies p, q, and r, p + q + r = 1):
p^2 + q^2 + r^2 + 2pq + 2pr + 2qr = (p + q + r)^2 = 1
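The Hardy-Weinberg expansion works for any number of alleles: homozygotes get the square of the allele frequency, and heterozygotes get twice the product. A minimal sketch follows; the function name and the example frequencies are hypothetical.

```python
# Minimal sketch: expected genotype frequencies under Hardy-Weinberg equilibrium.
# Allele names and frequencies are illustrative assumptions.
from itertools import combinations_with_replacement


def hardy_weinberg(allele_freqs):
    """Map each unordered genotype to its expected frequency."""
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotype_freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        product = allele_freqs[a] * allele_freqs[b]
        # Homozygote: p^2. Heterozygote: 2pq (two orderings of the same pair).
        genotype_freqs[a + b] = product if a == b else 2 * product
    return genotype_freqs


two_allele = hardy_weinberg({"A": 0.7, "a": 0.3})
three_allele = hardy_weinberg({"A": 0.5, "B": 0.3, "C": 0.2})
print(two_allele)
print(sum(three_allele.values()))
```

The genotype frequencies always sum to 1 because they are the terms of the expansion of (p + q + ...)^2.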

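Dominant and Recessive Penetrance Modeled

penetrance = P(pt | gt), the probability of the phenotype (pt) given the genotype (gt)

Dominant
Genotype | DD | Dd | dd
Penetrance | 0.9 | 0.9 | 0.0

Recessive
Genotype | DD | Dd | dd
Penetrance | 0 | 0 | 0.8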
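One way to see what a penetrance table implies is to combine it with genotype frequencies: prevalence is the sum over genotypes of P(gt) times P(pt | gt). The sketch below assumes Hardy-Weinberg proportions and uses hypothetical allele frequencies; only the penetrance values come from the tables above.

```python
# Minimal sketch: population prevalence = sum over genotypes of P(gt) * P(pt | gt).
# Assumptions: Hardy-Weinberg genotype proportions; allele frequencies are hypothetical.
def prevalence(freq_D, penetrance):
    p, q = freq_D, 1.0 - freq_D
    genotype_freqs = {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}
    return sum(genotype_freqs[gt] * penetrance[gt] for gt in genotype_freqs)


dominant = {"DD": 0.9, "Dd": 0.9, "dd": 0.0}    # D is the disease allele
recessive = {"DD": 0.0, "Dd": 0.0, "dd": 0.8}   # d is the disease allele

# Hypothetical disease-allele frequency of 0.01 in both models.
print(prevalence(0.01, dominant))    # freq(D) = 0.01
print(prevalence(0.99, recessive))   # freq(D) = 0.99, so freq(d) = 0.01
```

With the same disease-allele frequency, the dominant model yields far more affected individuals because heterozygotes are also at risk.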

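D-R Heterogeneous, DD Epistatic

D-R heterogeneous (rows: genotype at locus B; columns: genotype at locus A)
   | AA | Aa | aa
BB | 1 | 1 | 0
Bb | 1 | 1 | 0
bb | 1 | 1 | 1

reduced penetrance 3, 9, 27, 81, 243, … 3^n

DD epistatic (rows: genotype at locus B; columns: genotype at locus A)
   | AA | Aa | aa
BB | 1 | 1 | 0
Bb | 1 | 1 | 0
bb | 0 | 0 | 0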
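The two matrices can be read as logical rules. The sketch below is one interpretation, not the author's stated definition: heterogeneity as "dominant at A OR recessive at B", and epistasis as "dominant at A AND dominant at B". The function names are hypothetical. The 3, 9, 27, … series is read here as the 3^n joint genotypes over n biallelic loci.

```python
# Minimal sketch of two-locus penetrance models (full penetrance, values 0 or 1).
# The logical reading of each matrix is an assumption made for illustration.
GENOTYPES_A = ["AA", "Aa", "aa"]
GENOTYPES_B = ["BB", "Bb", "bb"]


def heterogeneity(gt_a, gt_b):
    """Affected if dominant allele A is present OR locus B is homozygous bb."""
    return 1 if ("A" in gt_a or gt_b == "bb") else 0


def epistasis(gt_a, gt_b):
    """Affected only if dominant alleles are present at BOTH loci."""
    return 1 if ("A" in gt_a and "B" in gt_b) else 0


def penetrance_table(model):
    """Rows follow GENOTYPES_B, columns follow GENOTYPES_A."""
    return [[model(a, b) for a in GENOTYPES_A] for b in GENOTYPES_B]


def joint_genotype_count(n_loci):
    """Number of joint genotypes over n biallelic loci: 3^n."""
    return 3 ** n_loci


for row in penetrance_table(heterogeneity):
    print(row)
for row in penetrance_table(epistasis):
    print(row)
print([joint_genotype_count(n) for n in range(1, 6)])
```

The 3^n growth is why models with reduced penetrance become hard to search as loci are added: each joint genotype can need its own penetrance parameter.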

Dom-Rec Heterozygous

Screen genes A, B?, b

Uninformative Marker

Informative Marker

Given the following observations: family structure, affection status, genotypes, and disease allele frequencies. Assuming a model for the disease, can we calculate the probability that these observations “fit” the assumed model?

Linkage

Linkage Analysis

Goal: find a marker “linked” to a disease gene.
– LOD score = log of likelihood ratio
– LR[θ; data] = k P[data; θ]
– θ (theta) = estimate of genetic distance (recombination fraction) between marker and disease = proportion of recombinant gametes / total gametes
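For the simplest case the LOD score has a closed form. The sketch below assumes phase-known, fully informative meioses, where r recombinants out of n gametes give LOD(θ) = log10[θ^r (1-θ)^(n-r) / 0.5^n]. This is the textbook two-point formula, not what pedigree programs actually compute (they evaluate full pedigree likelihoods). The function name and the counts are hypothetical.

```python
# Minimal sketch: two-point LOD score for phase-known, fully informative meioses.
# Assumption: the likelihood is binomial in the number of recombinant gametes.
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(theta = 0.5), the no-linkage null."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 1 recombinant in 10 meioses; scan theta on a grid.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(1, 10, t))
print(best_theta, lod_score(1, 10, best_theta))
```

The maximizing θ equals the observed recombinant proportion, matching the slide's definition of θ. By convention a LOD of 3 or more is taken as evidence of linkage.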

…Linkage Analysis

Linkage analysis calculates the likelihood that the inheritance pattern of the phenotype (disease) is supported by the observed inheritance patterns (genotypes) in a pedigree.
– few monogenic models, easy to test
– more difficult to find models explaining inheritance in polygenic models
– parameter maximization

Linkage Analysis Programs

FASTLINK - 2 point
– O(n^2), where n = number of markers
GeneHunter - multipoint, 2 point
– O(n^2), where n = number of people

Allele Sharing

tries to show that affected family members inherit the same chromosomal regions more often than expected by chance

Allele Sharing Example

Needs at least sibs.
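"Expected by chance" can be made concrete for a sib pair: enumerating every way two sibs can inherit one allele from each parent gives the null distribution of alleles shared identical by descent (IBD). This is a minimal sketch of that null only; the function name is hypothetical and no test statistic is included.

```python
# Minimal sketch: null distribution of IBD allele sharing for a sib pair.
# Each sib independently inherits one of two paternal and one of two maternal alleles.
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_ibd_distribution():
    # A transmission is (paternal allele index, maternal allele index).
    transmissions = list(product((0, 1), repeat=2))
    counts = Counter()
    for sib1, sib2 in product(transmissions, repeat=2):
        shared = int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


print(sib_ibd_distribution())
```

Allele-sharing methods look for regions where affected sib pairs share more than this null distribution predicts.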

Association Studies

“Allelic association studies provide the most powerful method for locating genes of small effect contributing to complex diseases and traits.” Daniels, Am J Hum Genet 62:1189-1197, 1998.

Linkage analysis
– genome wide screen, 400 markers ~ 10 cM (10 Mb); association needs 4000+ polymorphic markers
– generally need nuclear family or larger
Association finds “linkage disequilibrium”

Association Studies

“Association is simply a statistical statement about the co-occurrence of alleles or phenotypes. Allele A is associated with disease D if people who have D also have A more (or maybe less) often than would be predicted from the individual frequencies of D and A in the population.” Pg. 286 Human Molecular Genetics 2, Tom Strachan

Examples

HLA-DR4 (antigen marker)
– 36% in UK
– 78% with rheumatoid arthritis

CF (RFLP markers XV2.c (X1, X2), KM19 (K1, K2))

Marker Alleles | CF (case) | Normal (control)
X1, K1 | 3 | 49
X1, K2 | 147 | 19
X2, K1 | 8 | 70
X2, K2 | 8 | 25

– CF associated with X1, K2 in ‘89 (Strachan)
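The CF counts can be collapsed into a 2x2 table (X1, K2 versus all other haplotypes) to quantify the association. The sketch below computes an odds ratio and a Pearson chi-square statistic with the standard library. Collapsing the table this way and the function names are choices made for illustration, not the analysis from the cited source.

```python
# Minimal sketch: odds ratio and chi-square for one haplotype versus all others.
# Counts are (cases, controls) from the CF table; the 2x2 collapse is an assumption.
cf_counts = {
    ("X1", "K1"): (3, 49),
    ("X1", "K2"): (147, 19),
    ("X2", "K1"): (8, 70),
    ("X2", "K2"): (8, 25),
}


def two_by_two(counts, haplotype):
    """Return (cases with, cases without, controls with, controls without)."""
    a, c = counts[haplotype]
    b = sum(cases for h, (cases, _) in counts.items() if h != haplotype)
    d = sum(controls for h, (_, controls) in counts.items() if h != haplotype)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table, 1 degree of freedom, no correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


table = two_by_two(cf_counts, ("X1", "K2"))
print(table, odds_ratio(*table), chi_square(*table))
```

A chi-square this large on 1 degree of freedom is far beyond conventional significance thresholds, consistent with the strong association noted above.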

Linkage Disequilibrium

linkage equilibrium (aka Hardy-Weinberg) is true if
– P(gt1,gt1’;gt2,gt2’) = P(gt1,gt1’)*P(gt2,gt2’)
– [P(haplotype)] case vs controls
– TDT (transmission disequilibrium test; heterozygous marker transmitted), HRR (haplotype relative risk; untransmitted alleles as control)
– allelic associations (outbred populations) maintained at only <= 1 cM
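Departure from linkage equilibrium is usually summarized at the haplotype level rather than with the genotype-pair notation above. The sketch below uses the standard coefficient D = P(AB) - P(A)P(B) and its normalized form r^2. All frequencies are hypothetical and the function names are illustrative.

```python
# Minimal sketch: linkage disequilibrium coefficient D and r^2 for two biallelic loci.
# Assumption: haplotype and allele frequencies are known; values below are made up.
def ld_coefficient(p_ab, p_a, p_b):
    """D = P(AB) - P(A) * P(B); zero at linkage equilibrium."""
    return p_ab - p_a * p_b


def r_squared(p_ab, p_a, p_b):
    """Squared correlation between alleles at the two loci."""
    d = ld_coefficient(p_ab, p_a, p_b)
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Equilibrium: haplotype frequency equals the product of allele frequencies.
print(ld_coefficient(0.25, 0.5, 0.5), r_squared(0.25, 0.5, 0.5))
# Complete disequilibrium: allele A always travels with allele B.
print(ld_coefficient(0.5, 0.5, 0.5), r_squared(0.5, 0.5, 0.5))
```

Association studies need dense markers because D decays with recombination and so persists only over short genetic distances.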

Equilibrium

“SNPs” Single-Nucleotide Polymorphisms
– 1 every 1000 bp (estimated)
– 2,972,052 SNPs submitted to dbSNP
   – dbSNP summary link
   – 50% of all SNPs are in question
   – 10% of UTRs have SNPs
– 100,000 - 500,000 SNPs needed
– Why don’t we do this?
   – $$$

Homozygosity Mapping

Positional Cloning

Disease Gene Identification

SSCP -- single strand conformational polymorphism

PCR -- polymerase chain reaction
– primers amplify template sequence

direct sequencing

BBS2 (Bardet-Biedl Syndrome)

BBS2 genetic mapping

[Figure: mapping panels labeled C16 and 1-12; key: unaffected, affected]

BBS4 Gene (Direct Sequencing) (Hs.26471)

BBS4 Deletion (by PCR)

exons 3 4

BBS4 Mutations (direct sequencing)

(R295P)

Summary

Disease Gene Identification
– challenges
– interval localization
   genotyping and genetic markers, linkage analysis, allele sharing, association studies (“SNiPs”), homozygosity mapping
– disease gene identification techniques
Take home
– A complex disorder (with interacting genes) has yet to be characterized

Demo -- installing a database

A database organizes data.
Most common
– relational database (Oracle, Sybase)
– perceived as a collection of tables,
– where a table is an unordered collection of rows
– each row has a fixed number of fields, and each field can store a predefined type of data value (date, integer, string, etc.)
Simplest
– flat file
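The relational idea can be tried without installing a server. The sketch below uses SQLite through Python's standard library as a stand-in for Oracle or Sybase. The table name, column names, and rows are all hypothetical, chosen to resemble a genotype table.

```python
# Minimal sketch: a relational table of genotypes in an in-memory SQLite database.
# SQLite is swapped in for Oracle/Sybase; the schema and rows are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row has a fixed number of fields, each with a declared type.
conn.execute(
    """
    CREATE TABLE genotype (
        individual_id TEXT,
        marker        TEXT,
        allele1       INTEGER,
        allele2       INTEGER,
        typed_on      TEXT
    )
    """
)
rows = [
    ("ind1", "M1", 1, 2, "2002-01-15"),
    ("ind2", "M1", 2, 2, "2002-01-15"),
    ("ind3", "M1", 1, 3, "2002-01-16"),
]
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?, ?, ?)", rows)

# Rows are unordered, so any ordering must be requested explicitly.
heterozygous = conn.execute(
    "SELECT individual_id, marker FROM genotype "
    "WHERE allele1 <> allele2 ORDER BY individual_id"
).fetchall()
print(heterozygous)
conn.close()
```

A flat file could hold the same three rows, but selecting the heterozygous individuals would then require custom parsing code instead of a query.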

Databases

NCBI, BLAST, Amazon, Yahoo
Several of our own
– genotypes
– rat ESTs
– eye clones from differential display
– micro-array data
