http://cs273a.stanford.edu [Bejerano Fall11/12] 1
Primer Friday 10am Beckman B-302
Introduction to the UCSC Browser.
Lecture 6
Genome Evolution
Chromosomal Mutations
Paralogy & Orthology
Chains & Nets
One Cell, One Genome, One Replication
Every cell holds a copy of all its DNA = its genome.
The human body is made of ~10^13 cells.
All originate from a single cell through repeated cell divisions.
[Figure: a cell's genome = all its DNA; DNA strings = chromosomes. Through repeated cell division, chicken ≈ 10^13 copies (DNA) of egg (DNA).]
Mutation Rate per bp
• 10^-9 per base pair per cell division
• This refers to mutations that are not repaired
• Thus, there are at least six new mutations in each kid that were not present in either parent
• Mutations range from the smallest possible (single base pair change) to the largest – whole genome duplication.
• Selection does not tolerate all of these mutations, but it sure does tolerate some.
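One plausible reading of the "at least six" figure is simple arithmetic: the per-base rate multiplied by the number of bases copied in one division. The sketch below assumes a haploid human genome of roughly 3 × 10^9 bp (a figure not stated on the slide) and treats every division as independent; the function name is illustrative only.

```python
# Rate quoted on the slide: unrepaired mutations per base pair per cell division.
MUTATION_RATE = 1e-9

# Assumption (not on the slide): a haploid human genome of roughly 3e9 bp,
# so a diploid cell copies about 6e9 bp at each division.
HAPLOID_GENOME_BP = 3e9
DIPLOID_GENOME_BP = 2 * HAPLOID_GENOME_BP


def expected_new_mutations(divisions: int = 1) -> float:
    """Expected number of unrepaired mutations after `divisions` cell divisions."""
    return MUTATION_RATE * DIPLOID_GENOME_BP * divisions


if __name__ == "__main__":
    # A single division already gives about six expected new mutations.
    print(expected_new_mutations(1))
```

Under these assumptions a single division yields about six expected mutations, and each further division between parent and child adds more, which is consistent with the slide's "at least".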
[Figure: chicken → egg → chicken]
Example: Human-Chimp Genomic Differences
[Figure: bar chart of the number of events by type: nucleotide substitutions; indels < 10 Kb; microinversions < 100 Kb; deletions/duplications; microinversions > 100 Kb; pericentric inversions; fusion. Annotations: 1%, 3%.]
Open question..
Chromosomal (i.e. Big) Mutations
• May involve:
– Changing the structure of a chromosome
– The loss or gain of part of a chromosome
Chromosome Mutations
• Five types exist:
– Deletion
– Inversion
– Translocation
– Nondisjunction
– Duplication
Deletion
• Due to breakage
• A piece of a chromosome is lost
Inversion
• Chromosome segment breaks off
• Segment flips around backwards
• Segment reattaches
Duplication
• Occurs when a genomic region is repeated
Whole Genome Duplication at the Base of the Vertebrate Tree
[Figure label: Xen. laevis WGD]
Translocation
• Involves two chromosomes that aren’t homologous
• Part of one chromosome is transferred to another chromosome
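The segment-level mutations described on the preceding slides (deletion, inversion, duplication, translocation) can be sketched as operations on an ordered list of segment labels. This is a toy model, not anything taken from the lecture: the function names are made up for illustration, a leading "-" is an assumed convention for a segment in reversed orientation, and duplication is modeled as a tandem copy only.

```python
from typing import List, Tuple

# A chromosome is modeled as an ordered list of segment labels.
# Assumption for this sketch: a leading "-" marks reversed orientation.
Chromosome = List[str]


def flip(segment: str) -> str:
    """Toggle the orientation marker of one segment."""
    return segment[1:] if segment.startswith("-") else "-" + segment


def deletion(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """A piece of the chromosome (segments start..end-1) is lost."""
    return chrom[:start] + chrom[end:]


def inversion(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """The piece breaks off, flips around backwards, and reattaches."""
    flipped = [flip(s) for s in reversed(chrom[start:end])]
    return chrom[:start] + flipped + chrom[end:]


def duplication(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """The piece is repeated right after itself (tandem copy, by assumption)."""
    return chrom[:end] + chrom[start:end] + chrom[end:]


def translocation(donor: Chromosome, recipient: Chromosome,
                  start: int, end: int,
                  insert_at: int) -> Tuple[Chromosome, Chromosome]:
    """A piece of one chromosome is transferred to a non-homologous one."""
    piece = donor[start:end]
    return (donor[:start] + donor[end:],
            recipient[:insert_at] + piece + recipient[insert_at:])
```

For example, `inversion(["A", "B", "C", "D", "E"], 1, 3)` reverses the order of B and C and flips the orientation of each.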
Nondisjunction
• Failure of chromosomes to separate during meiosis
• Causes a gamete to have too many or too few chromosomes
• Disorders:
– Down Syndrome: three copies of chromosome 21
– Turner Syndrome: single X chromosome
– Klinefelter’s Syndrome: XXY chromosomes
Chromosome Mutation Animation
The Species Tree
How to infer a species tree?
• Phenotype
– Phenotypic characters
– Inc. fossil evidence
• Genotype
– Molecular Evolution
– Inc. Mobile Elements
The Species Tree
[Figure: species tree over the sampled genomes; internal nodes marked S (speciation); axis: time.]
A Gene tree evolves with respect to a Species tree
[Figure: a gene tree drawn within the species tree; events marked: speciation, speciation, duplication, loss.]
Terminology
Orthologs: Genes related via speciation (e.g. C,M,H3)
Paralogs: Genes related through duplication (e.g. H1,H2,H3)
Homologs: Genes that share a common origin (e.g. C,M,H1,H2,H3)
[Figure: gene tree within the species tree, descending from a single ancestral gene; events marked: speciation, speciation, duplication, loss.]
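A common way to make these definitions operational is to label each internal node of the gene tree as a speciation or a duplication and classify a pair of genes by the event at their last common ancestor. The sketch below does this on a hypothetical topology chosen only so that it reproduces the slide's examples; the original figure's tree is not recoverable from the text, and the class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> Set[str]:
    """Names of all genes (leaves) below this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def last_common_ancestor(node: Node, a: str, b: str) -> Node:
    """Deepest node whose subtree contains both genes a and b."""
    for child in node.children:
        names = leaf_names(child)
        if a in names and b in names:
            return last_common_ancestor(child, a, b)
    return node


def relation(root: Node, a: str, b: str) -> str:
    """Orthologs if the genes diverged at a speciation, paralogs otherwise."""
    event = last_common_ancestor(root, a, b).event
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical gene tree (an assumption, not the slide's figure):
# C splits off first; a duplication follows; one copy gives M and H3 by
# speciation; the other copy is lost in mouse and duplicates again in
# human, giving H1 and H2.
GENE_TREE = Node("root", "speciation", [
    Node("C"),
    Node("dup1", "duplication", [
        Node("spec2", "speciation", [Node("M"), Node("H3")]),
        Node("dup2", "duplication", [Node("H1"), Node("H2")]),
    ]),
])
```

All five genes are homologs because they share the root. Under this assumed tree, `relation(GENE_TREE, "M", "H3")` classifies the pair as orthologs, while any pair drawn from H1, H2, H3 is classified as paralogs.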
Gene trees and even species trees are figments of our (scientific) imagination
Species trees and gene trees can be wrong.
All we really have are extant observations, and fossils.
[Figure: the same species tree and gene tree, descending from a single ancestral gene, with parts labeled Observed and Inferred.]
Gene Families
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
Gu et al. (2002). Age distribution of human gene families shows significant roles of both large-scale and small-scale duplication in vertebrate evolution. Nature Genetics 31: 205-208.
Chaining Alignments
Chaining highlights homologous regions between genomes (it bridges the gulf between syntenic blocks and base-by-base alignments).
Local alignments tend to break at transposon insertions, inversions, duplications, etc.
Global alignments tend to force non-homologous bases to align.
Chaining is a rigorous way of joining together local alignments into larger structures.
“Raw” Blastz track (no longer displayed)
Protease Regulatory Subunit 3
Alignment = homologous regions
Chains & Nets: How they’re built
• 1: Blastz one genome to another
– Local alignment algorithm
– Finds short blocks of similarity
Hg18: AAAAAACCCCCAAAAA
Mm8: AAAAAAGGGGG
Hg18.1-6 + AAAAAA      Mm8.1-6 + AAAAAA
Hg18.7-11 + CCCCC      Mm8.1-5 - CCCCC
Hg18.12-16 + AAAAA     Mm8.1-5 + AAAAA
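The "-" block in the toy example is easiest to check by reverse-complementing the mouse sequence. The sketch below assumes the slide's coordinates are 1-based and inclusive, and that minus-strand coordinates count along the reverse complement of the sequence; both are readings of the example rather than something the slide states, and the helper names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def block(seq: str, start: int, end: int, strand: str = "+") -> str:
    """Bases start..end (1-based, inclusive) on the given strand.

    Assumption: for strand "-", coordinates are counted along the
    reverse complement of `seq`.
    """
    source = seq if strand == "+" else reverse_complement(seq)
    return source[start - 1:end]


HG18 = "AAAAAACCCCCAAAAA"
MM8 = "AAAAAAGGGGG"

# The three local alignment blocks from the slide, as
# ((human start, end, strand), (mouse start, end, strand)).
BLOCKS = [
    ((1, 6, "+"), (1, 6, "+")),
    ((7, 11, "+"), (1, 5, "-")),
    ((12, 16, "+"), (1, 5, "+")),
]

if __name__ == "__main__":
    for human, mouse in BLOCKS:
        print(block(HG18, *human), block(MM8, *mouse))
```

Under these assumptions each pair prints the same string, so the human CCCCC run matches the mouse GGGGG run read on the opposite strand.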
Chains & Nets: How they’re built
• 2: “Chain” alignment blocks together
– Links blocks that preserve order and orientation
– Not single coverage in either species
Hg18: AAAAAACCCCCAAAAA
Mm8: AAAAAAGGGGGAAAAA
Hg18: AAAAAACCCCCAAAAA
Mm8 chains:
Mm8.1-6 +
Mm8.7-11 -
Mm8.12-16 +
Mm8.12-15 + Mm8.1-5 +
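Another Chain Example
[Figure: segments A B C D E shown for the Ancestral Sequence, the Human Sequence, and the Mouse Sequence, with an additional segment labeled B’ at the mouse sequence. Panels: "In Human Browser" (implicit human sequence with mouse chains, including B’ and D E) and "In Mouse Browser" (implicit mouse sequence with human chains, including D E).]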
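Gap Types: Single vs Double sided
[Figure: segments A B C D E shown for the Ancestral Sequence, the Human Sequence, and the Mouse Sequence, with an additional segment labeled B’ at the mouse sequence. Panels: "In Human Browser" (implicit human sequence with mouse chains, including B’ and D E) and "In Mouse Browser" (implicit mouse sequence with human chains, including D E).]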
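The Use of an Outgroup
[Figure: segments A B C D E shown for the Outgroup Sequence, the Human Sequence, and the Mouse Sequence, with an additional segment labeled B’ at the mouse sequence. Panels: "In Human Browser" (implicit human sequence with mouse chains, including B’ and D E) and "In Mouse Browser" (implicit mouse sequence with human chains, including D E).]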
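What if my topology is wrong?
[Figure: segments A B C D E shown for the “Outgroup” Sequence, the Human Sequence, and the Mouse Sequence, with an additional segment labeled B’ at the mouse sequence. Panels: "In Human Browser" (implicit human sequence with mouse chains, including B’ and D E) and "In Mouse Browser" (implicit mouse sequence with human chains, including D E).]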
Chains join together related local alignments
Protease Regulatory Subunit 3
likely ortholog
likely paralogs
shared domain?
Chains
• a chain is a sequence of gapless aligned blocks, where there must be no overlaps of blocks' target or query coords within the chain.
• Within a chain, target and query coords are monotonically non-decreasing. (i.e. always increasing or flat)
• double-sided gaps are a new capability (blastz can't do that) that allow extremely long chains to be constructed.
• not just orthologs, but paralogs too, can result in good chains. but that's useful!
• chains should be symmetrical -- e.g. swap human-mouse -> mouse-human chains, and you should get approx. the same chains as if you chain swapped mouse-human blastz alignments.
• chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
• chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).
[Angie Hinrichs, UCSC wiki]
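The first two properties (no overlapping blocks, coordinates that never go backwards) and the single-sided versus double-sided gap distinction can be expressed as a small check. This is a sketch under assumed conventions: blocks use 0-based half-open coordinates on a single strand, and the `Block` type and function names are illustrative, not the UCSC tools' data structures.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    t_start: int  # start in the target (0-based, half-open; an assumption)
    q_start: int  # start in the query
    size: int     # blocks are gapless, so one length serves both sequences


def is_valid_chain(blocks: List[Block]) -> bool:
    """True if consecutive blocks never overlap or step backwards in
    either target or query coordinates."""
    for prev, cur in zip(blocks, blocks[1:]):
        if cur.t_start < prev.t_start + prev.size:
            return False
        if cur.q_start < prev.q_start + prev.size:
            return False
    return True


def gap_kind(prev: Block, cur: Block) -> str:
    """Classify the gap between two consecutive blocks of a valid chain."""
    t_gap = cur.t_start - (prev.t_start + prev.size)
    q_gap = cur.q_start - (prev.q_start + prev.size)
    if t_gap > 0 and q_gap > 0:
        return "double-sided"  # unaligned bases in both sequences
    if t_gap > 0 or q_gap > 0:
        return "single-sided"  # unaligned bases in only one sequence
    return "none"
```

A single-sided gap looks like an insertion or deletion in one genome; a double-sided gap skips unaligned sequence in both, which is what lets a chain continue across regions where neither side aligns.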
Before and After Chaining
Chaining Algorithm
Input - blocks of gapless alignments from blastz
Dynamic program based on the recurrence relationship:
$$\mathrm{score}(B_i) = \max_{j<i}\bigl(\mathrm{score}(B_j) + \mathrm{match}(B_i) - \mathrm{gap}(B_i, B_j)\bigr)$$
Uses Miller’s KD-tree algorithm to limit which parts of the dynamic programming graph are traversed. Timing is O(N log N), where N is the number of blocks (which is in the hundreds of thousands)
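The recurrence can be illustrated with a naive dynamic program. The sketch below is deliberately the quadratic O(N^2) version, not the KD-tree method named on the slide, and it makes several assumptions: blocks are on one strand with 0-based half-open coordinates, a chain may also start fresh at any block, and the gap cost is a made-up linear penalty (`per_base`) standing in for the real gap function.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    t_start: int   # start in the target (0-based, half-open; an assumption)
    q_start: int   # start in the query
    size: int      # gapless block length
    match: float   # alignment score of the block itself


def gap_cost(prev: Block, cur: Block, per_base: float = 0.1) -> Optional[float]:
    """Hypothetical linear gap penalty; None if cur cannot follow prev."""
    t_gap = cur.t_start - (prev.t_start + prev.size)
    q_gap = cur.q_start - (prev.q_start + prev.size)
    if t_gap < 0 or q_gap < 0:
        return None  # would overlap or go backwards in target or query
    return per_base * (t_gap + q_gap)


def best_chain(blocks: List[Block]) -> Tuple[float, List[Block]]:
    """Naive O(N^2) evaluation of
    score(B_i) = max over j < i of score(B_j) + match(B_i) - gap(B_i, B_j)."""
    ordered = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    if not ordered:
        return 0.0, []
    score = [b.match for b in ordered]  # base case: chain starts at B_i
    back: List[Optional[int]] = [None] * len(ordered)
    for i, cur in enumerate(ordered):
        for j in range(i):
            cost = gap_cost(ordered[j], cur)
            if cost is None:
                continue
            candidate = score[j] + cur.match - cost
            if candidate > score[i]:
                score[i], back[i] = candidate, j
    # Trace back from the best-scoring end block.
    k: Optional[int] = max(range(len(ordered)), key=score.__getitem__)
    chain: List[Block] = []
    while k is not None:
        chain.append(ordered[k])
        k = back[k]
    chain.reverse()
    return score[chain and ordered.index(chain[-1])], chain
```

Sorting by target start guarantees that every legal predecessor of a block appears earlier in the list, which is what makes the single left-to-right pass sufficient.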
Netting Alignments
Commonly, multiple mouse alignments can be found for a particular human region, particularly for coding regions.
Net finds the best mouse match for each human region.
Highest-scoring chains are used first.
Lower-scoring chains fill in gaps within chains, inducing a natural hierarchy.
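The greedy idea described here (best chains first, weaker chains only where nothing better already sits) can be sketched per target base. This is a simplified illustration, not the actual netting program: the `Chain` type, its fields, and the per-base ownership array are assumptions chosen for clarity, and the hierarchy of levels is not tracked.

```python
from typing import List, NamedTuple, Optional, Tuple


class Chain(NamedTuple):
    name: str
    score: float
    blocks: List[Tuple[int, int]]  # aligned target intervals, 0-based half-open


def net_target(chains: List[Chain], target_len: int) -> List[Optional[str]]:
    """Assign each target base to at most one chain.

    Chains are visited from highest to lowest score; a lower-scoring chain
    only claims bases that no better chain has aligned, so it ends up
    filling the gaps of the chains above it.
    """
    owner: List[Optional[str]] = [None] * target_len
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        for start, end in chain.blocks:
            for pos in range(max(start, 0), min(end, target_len)):
                if owner[pos] is None:
                    owner[pos] = chain.name
    return owner
```

The result is single-coverage in the target by construction, since each base has exactly one owner slot, while nothing prevents the same query region from being used by more than one chain.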
Net Focuses on Ortholog
Nets
• a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels.
• a net is single-coverage for target but not for query.
• because it's single-coverage in the target, it's no longer symmetrical.
• the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again.
• nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
[Angie Hinrichs, UCSC wiki]
Before and After Netting