Comparative Genomics: Analysis of the Mouse Genome
Initial sequencing and comparative analysis of the mouse genome.
Mouse Genome Sequencing Consortium. 2002, Nature 420:520-562.
Mouse/human genome comparison
• Conservation of synteny: number of chromosome rearrangements
• Repeats
• Evolution of orthologues. Ratio Ka/Ks
• Evolution of gene families
• Selection
Purpose
Highlights
• Genome 14% smaller than human (2.5 Gb vs 2.9 Gb).
• 90% corresponds to regions of conserved synteny.
• 40% can be aligned at the nucleotide level.
• ~0.5 nucleotide substitutions per site since the divergence of the two species.
• 5% under purifying selection.
More highlights
• Various measures of divergence show substantial variation across the genome.
• 30,000 protein-coding genes.
• Dozens of local gene expansions.
• Estimation of the rate of protein evolution in mammals. Certain classes of secreted proteins under positive selection.
• Marked differences in activity but similar types of repeat sequences.
• 80,000 SNPs identified.
Divergence time
p.521
Sequencing strategy
p.522
88 mapped ultracontigs with N50 length = 50.6 Mb
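N50 is the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative and are not taken from the mouse assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest and walk down the list;
    the N50 is the first length at which the running total reaches
    half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not real ultracontig data.
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```

An N50 of 50.6 Mb therefore means that half of the assembled sequence sits in ultracontigs at least 50.6 Mb long.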
Syntenic segments and syntenic blocks
Size distribution of segments and blocks with conserved synteny between mouse and human
Composition of repeats in the mouse and human genomes, with the fraction of lineage-specific repeats (% of genome)
TEs      Mouse   Lineage-specific   Human   Lineage-specific
LINEs    19.2    16.5               21.0     7.9
SINEs     8.2     7.6               13.6    10.7
LTR       9.9     8.7                8.6     4.1
DNA       0.9     0.4                3.0     1.0
Total    38.6    33.5               46.4    24.0
Ancestral repeats: mouse 5%, human 22%
Twofold higher nucleotide substitution rate in the mouse lineage
(estimated from comparison of ancestral repeats)
Human: 0.17 substitutions per site
Mouse: 0.34 substitutions per site
Age distribution of interspersed repeats in the mouse genome
Pseudogenes in the mouse genome: ~14,000. More than half are processed pseudogenes.
Gapdh: a single functional gene and ~400 pseudogenes distributed across 19 of the mouse chromosomes.
Comparison of 12,845 1:1 orthologues
Evolution of Cytochrome P450 gene families in mouse
Changes in genome size
Ancestor 2.9 Gb; Human 2.9 Gb; Mouse 2.5 Gb
Mouse lineage:
Lineage-specific repeats +900 Mb
Deletion -1,300 Mb
-----------------------------------------------
Net change -400 Mb
Human lineage:
Lineage-specific repeats +700 Mb
Deletion -700 Mb
-----------------------------------------------
Net change 0 Mb
Expected proportion of the ancestral genome retained in both species
76% x 55% = 42%
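The 76% and 55% figures follow from the genome-size bookkeeping: each is the share of the 2.9 Gb ancestral genome that survived deletion in one lineage. The sketch below reruns that arithmetic. Assigning the -1,300 Mb deletion to mouse and the -700 Mb deletion to human is inferred from the net changes, and multiplying the two fractions assumes that deletions in the two lineages hit the ancestral genome independently. The names used here are illustrative.

```python
ANCESTOR_MB = 2900  # ancestral genome size from the slide (2.9 Gb)

# Lineage-specific gains and losses in Mb, as read from the slide.
LINEAGES = {
    "human": {"repeats_gained": 700, "deleted": 700},
    "mouse": {"repeats_gained": 900, "deleted": 1300},
}


def lineage_summary(ancestor_mb, repeats_gained, deleted):
    """Return (present-day size in Mb, fraction of ancestor retained)."""
    retained_mb = ancestor_mb - deleted
    present_mb = retained_mb + repeats_gained
    return present_mb, retained_mb / ancestor_mb


human_size, human_retained = lineage_summary(ANCESTOR_MB, **LINEAGES["human"])
mouse_size, mouse_retained = lineage_summary(ANCESTOR_MB, **LINEAGES["mouse"])

# Assumption: independent deletions, so the shared fraction is the product.
retained_in_both = human_retained * mouse_retained

print(f"human: {human_size} Mb, {human_retained:.0%} of ancestor retained")
print(f"mouse: {mouse_size} Mb, {mouse_retained:.0%} of ancestor retained")
print(f"retained in both: {retained_in_both:.0%}")
```

The result is close to the roughly 40% of the genome that can be aligned at the nucleotide level.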
Neutral substitution rate
• Ancestral repeat sequence.
66.7% nucleotide identity, 0.46-0.47 substitutions per site
• Fourfold degenerate sites in codons of genes
67% nucleotide identity, 0.46-0.47 substitutions per site
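An identity of about 67% corresponds to more than 0.33 substitutions per site, because some sites have changed more than once and multiple hits are hidden in a pairwise comparison. The sketch below applies the Jukes-Cantor correction, the simplest such model. It is used here only as an illustration and is not the model behind the numbers on the slide. It gives a value near 0.44, slightly below the quoted 0.46-0.47, which presumably comes from a richer substitution model.

```python
import math


def jukes_cantor_distance(identity):
    """Estimate substitutions per site from pairwise nucleotide identity.

    Uses the Jukes-Cantor formula d = -3/4 * ln(1 - 4p/3), where p is the
    observed fraction of differing sites. This is a swapped-in textbook
    model for illustration, not the method used in the source analysis.
    """
    p = 1.0 - identity
    if not 0.0 <= p < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


observed_difference = 1.0 - 0.667
corrected = jukes_cantor_distance(0.667)
print(f"observed differences per site: {observed_difference:.3f}")
print(f"Jukes-Cantor corrected:        {corrected:.3f}")
```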
Example: n = 100; μ = 0.667 (genome-wide average); p = 0.8; S = 2.8
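The example values are consistent with S being a normal-approximation z-score: the identity p observed in a window of n aligned bases, compared with the mean neutral identity of 0.667 and scaled by the binomial standard deviation. The formula is not written out in this transcript, so the sketch below is an assumption that happens to reproduce S = 2.8. The names `conservation_score` and `mu` are illustrative.

```python
import math


def conservation_score(n, p, mu):
    """Assumed score: S = sqrt(n) * (p - mu) / sqrt(mu * (1 - mu)).

    n  : number of aligned bases in the window
    p  : fraction of identical bases observed in the window
    mu : mean identity expected for neutrally evolving sequence
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    return math.sqrt(n) * (p - mu) / math.sqrt(mu * (1.0 - mu))


# Values from the slide's example.
score = conservation_score(n=100, p=0.8, mu=0.667)
print(f"S = {score:.1f}")
```

Under this reading, a window matching the neutral average scores 0 and a more conserved window scores higher.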
Proportion of mammalian genome under evolutionary selection for biological function
Figure: score distributions Sneutral, Sgenome and Sselected
20.8% of the windows are under selection
25.2% of the human genome is contained in windows
20.8% x 25.2% ≈ 5.2% of the genome under selection
Proportion of genome under selection
p. 552
• 1.5% protein-coding regions of genes
• 1% UTR of protein-coding genes
• Regulatory regions that control gene expression
• Non-protein coding RNAs (ncRNAs)
• Chromosomal structural elements
• Recent pseudogenes
• Other?