ggtgccagggaaagggcaggaggtgagtgctgggag ...bejerano.stanford.edu/talks/bejeranoapr06.pdfcoelacanth...

37
Gill Bejerano 1 GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAG GCTCAGGGCCCTGGAGTATAAAGCAGAATGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCGAAAGACCTGTTGGAGGCTATGAATGC AATCAAGGTGACAGACAACTGGTGCAATGATGGTAGTGGAAATGGAGGAGAGGGGATTGATTCAAGATGCATTTAGGACCAAGAATCGGGAGCTTGTGAA CGTGTGTATGAGTACTGTAGACGGAGTGGGTGTGTCATCAGAGAAGATCTGAGCATTTGGGCTTGCTCTCCTCAGAGGCCCTGCGAGTGGAGTTCAGCTT TTCCTCATGGGGCAAATCTCACTTTCGCTCCAGTTCCTGGGGCTCAGAGTCCCTGGCCCAGATGCCTCTTGCCATCTCATCTTCACCCTGCCTGGCTTCC CTTGCTTGTTCCAGGATTGTTTCATAAAGAGGGATGTGGTTGGTCTTTAACCCTATGAATGCTGGCTGAGGATGCCTGCGGAACCTGTAGTGAAGCTTTC AGGGGCTGCTCGGGTTCTGGCTGGTAGGTGAACACTGTCCATCTTGCCGGCTGGGACACAGTGACTCTGGGTAGTTGTGTAAGAGAGGGGCCCTTGGCAG ACAAACAGGTTCTTCTCTGTTGGTGGGCCAGCCAGCAGGTCAGTGGGAAGGTTAAAGGTCATGGGGTTTGGGAGAACTGGGTGAGGAGTTCAGCCCCATC CCCCGTAAAGCTCCTGGGAAGCACTTCTCTACTGGGGCAGCCCCTGATACCAGGGCACTCATTAACCCTCTGGGTGCCAGGGAAAGGGCAGGAGGTGAGT GCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAGGCTCAGGGCCCTGGAGTATAAAGCAGAA TGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCT CTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAG GAAAGACCTGTTGGAGGCTATGAATGCAATCAAGGTGACAGACAACTGGTGCAATGATGGTAGTGGAAATGGAGGAGAGGGGATTGATTCAAGATGCATT TAGGACCAAGAATCGGGAGCTTGTGAACGTGTGTATGAGTACTGTAGACGGAGTGGGTGTGTCATCAGAGAAGATCTGAGCATTTGGGCTTGCTCTCCTC AGAGGCCCTGCGAGTGGAGTTCAGCTTTTCCTCATGGGGCAAATCTCACTTTCGCTCCAGTTCCTGGGGCTCAGAGTCCCTGGCCCAGATGCCTCTTGCC ATCTCATCTTCACCCTGCCTGGCTTCCCTTGCTTGTTCCAGGATTGTTTCATAAAGAGGGATGTGGTTGGTCTTTAACCCTATGAATGCTGGCTGAGGAT GCCTGCGGAACCTGTAGTGAAGCTTTCAGGGGCTGCTCGGGTTCTGGCTGGTAGGTGAACACTGTCCATCTTGCCGGCTGGGACACAGTGACTCTGGGTA GTTGTGTAAGAGAGGGGCCCTTGGCAGACAAACAGGTTCTTCTCTGTTGGTGGGCCAGCCAGCAGGTCAGTGGGAAGGTTAAAGGTCATGGGGTTTGGGA GAAACTGGGTGAGGAGTTCAGCCCCATCCCCCGTAAAGCTCCTGGGAAGCACTTCTCTACTGGGGCAGCCCCTGATACCAGGGCACTCATTAACCCTCTG GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAG GCTCAGGGCCCTGGAGTATAAAGCAGAATGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCGAAAGACCTGTTGGAGGCTATGAATGC AATCAAGGTGACAGACAACTGGTGCAATGATGGTAGTGGAAATGGAGGAGAGGGGATTGATTCAAGATGCATTTAGGACCAAGAATCGGGAGCTTGTGAA CGTGTGTATGAGTACTGTAGACGGAGTGGGTGTGTCATCAGAGAAGATCTGAGCATTTGGGCTTGCTCTCCTCAGAGGCCCTGCGAGTGGAGTTCAGCTT TTCCTCATGGGGCAAATCTCACTTTCGCTCCAGTTCCTGGGGCTCAGAGTCCCTGGCCCAGATGCCTCTTGCCATCTCATCTTCACCCTGCCTGGCTTCC CTTGCTTGTTCCAGGATTGTTTCATAAAGAGGGATGTGGTTGGTCTTTAACCCTATGAATGCTGGCTGAGGATGCCTGCGGAACCTGTAGTGAAGCTTTC AGGGGCTGCTCGGGTTCTGGCTGGTAGGTGAACACTGTCCATCTTGCCGGCTGGGACACAGTGACTCTGGGTAGTTGTGTAAGAGAGGGGCCCTTGGCAG ACAAACAGGTTCTTCTCTGTTGGTGGGCCAGCCAGCAGGTCAGTGGGAAGGTTAAAGGTCATGGGGTTTGGGAGAACTGGGTGAGGAGTTCAGCCCCATC CCCCGTAAAGCTCCTGGGAAGCACTTCTCTACTGGGGCAGCCCCTGATACCAGGGCACTCATTAACCCTCTGGGTGCCAGGGAAAGGGCAGGAGGTGAGT GCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAGGCTCAGGGCCCTGGAGTATAAAGCAGAA TGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCT CTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAG Ultraconserved Elements in Eukaryote Genome Gill Bejerano UC Santa Cruz w: www.soe.ucsc.edu/~jill e: [email protected]

Upload: others

Post on 06-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Gill Bejerano 1

GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAGGCTCAGGGCCCTGGAGTATAAAGCAGAATGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCGAAAGACCTGTTGGAGGCTATGAATGCAATCAAGGTGACAGACAACTGGTGCAATGATGGTAGTGGAAATGGAGGAGAGGGGATTGATTCAAGATGCATTTAGGACCAAGAATCGGGAGCTTGTGAACGTGTGTATGAGTACTGTAGACGGAGTGGGTGTGTCATCAGAGAAGATCTGAGCATTTGGGCTTGCTCTCCTCAGAGGCCCTGCGAGTGGAGTTCAGCTTTTCCTCATGGGGCAAATCTCACTTTCGCTCCAGTTCCTGGGGCTCAGAGTCCCTGGCCCAGATGCCTCTTGCCATCTCATCTTCACCCTGCCTGGCTTCCCTTGCTTGTTCCAGGATTGTTTCATAAAGAGGGATGTGGTTGGTCTTTAACCCTATGAATGCTGGCTGAGGATGCCTGCGGAACCTGTAGTGAAGCTTTCAGGGGCTGCTCGGGTTCTGGCTGGTAGGTGAACACTGTCCATCTTGCCGGCTGGGACACAGTGACTCTGGGTAGTTGTGTAAGAGAGGGGCCCTTGGCAGACAAACAGGTTCTTCTCTGTTGGTGGGCCAGCCAGCAGGTCAGTGGGAAGGTTAAAGGTCATGGGGTTTGGGAGAACTGGGTGAGGAGTTCAGCCCCATCCCCCGTAAAGCTCCTGGGAAGCACTTCTCTACTGGGGCAGCCCCTGATACCAGGGCACTCATTAACCCTCTGGGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAGGCTCAGGGCCCTGGAGTATAAAGCAGAATGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGGAAAGACCTGTTGGAGGCTATGAATGCAATCAAGGTGACAGACAACTGGTGCAATGATGGTAGTGGAAATGGAGGAGAGGGGATTGATTCAAGATGCATTTAGGACCAAGAATCGGGAGCTTGTGAACGTGTGTATGAGTACTGTAGACGGAGTGGGTGTGTCATCAGAGAAGATCTGAGCATTTGGGCTTGCTCTCCTCAGAGGCCCTGCGAGTGGAGTTCAGCTTTTCCTCATGGGGCAAATCTCACTTTCGCTCCAGTTCCTGGGGCTCAGAGTCCCTGGCCCAGATGCCTCTTGCCATCTCATCTTCACCCTGCCTGGCTTCCCTTGCTTGTTCCAGGATTGTTTCATAAAGAGGGATGTGGTTGGTCTTTAACCCTATGAATGCTGGCTGAGGATGCCTGCGGAACCTGTAGTGAAGCTTTCAGGGGCTGCTCGGGTTCTGGCTGGTAGGTGAACACTGTCCATCTTGCCGGCTGGGACACAGTGACTCTGGGTAGTTGTGTAAGAGAGGGGCCCTTGGCAGACAAACAGGTTCTTCTCTGTTGGTGGGCCAGCCAGCAGGTCAGTGGGAAGGTTAAAGGTCATGGGGTTTGGGAGAAACTGGGTGAGGAGTTCAGCCCCATCCCCCGTAAAGCTCCTGGGAAGCACTTCTCTACTGGGGCAGCCCCTGATACCAGGGCACTCATTAACCCTCTGGGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAGGCTCAGGGCCCTGGAGTATAAAGCAGAATGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCGAAAGACCTGTTGGAGGCTATGAATGCAATCAAGGTGACAGACAACTGGTGCAATGATGGTAGTGGAAATGGAGGAGAGGGGATTGATTCAAGATGCATTTAGGACCAAGAATCGGGAGCTTGTGAACGTGTGTATGAGTACTGTAGACGGAGTGGGTGTGTCATCAGAGAAGATCTGAGCATTTGGGCTTGCTCTCCTCAGAGGCCCTGCGAGTGGAGTTCAGCTTTTCCTCATGGGGCAAATCTCACTTTCGCTCCAGTTCCTGGGGCTCAGAGTCCCTGGCCCAGATGCCTCTTGCCATCTCATCTTCACCCTGCCTGGCTTCCCTTGCTTGTTCCAGGATTGTTTCATAAAGAGGGATGTGGTTGGTCTTTAACCCTATGAATGCTGGCTGAGGATGCCTGCGGAACCTGTAGTGAAGCTTTCAGGGGCTGCTCGGGTTCTGGCTGGTAGGTGAACACTGTCCATCTTGCCGGCTGGGACACAGTGACTCTGGGTAGTTGTGTAAGAGAGGGGCCCTTGGCAGACAAACAGGTTCTTCTCTGTTGGTGGGCCAGCCAGCAGGTCAGTGGGAAGGTTAAAGGTCATGGGGTTTGGGAGAACTGGGTGAGGAGTTCAGCCCCATCCCCCGTAAAGCTCCTGGGAAGCACTTCTCTACTGGGGCAGCCCCTGATACCAGGGCACTCATTAACCCTCTGGGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAGGCAGCTGAGGTCAACTTCTTTTGAACTTCCACGTGGTATTTACTCAGAGCAATTGGTGCCAGAGGCTCAGGGCCCTGGAGTATAAAGCAGAATGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAGACGTGAGCAGGTGAGCAGCTGGGGCTGTCTGCTCTCTGTGCCCAG

Ultraconserved Elementsin Eukaryote Genome

Gill BejeranoUC Santa Cruz

w: www.soe.ucsc.edu/~jill e: [email protected]

Page 2: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Genomics

Gill Bejerano 2

Page 3: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Conserved Non Coding Regions

Gill Bejerano 3

other48%

repeats50%

coding exons1.5%

known function

0.5%

all human-mouse DNAneutral human-mouse DNA

Difference: 5% of Human

[Mouse Consortium 2002]conservation level

frequ

ency

HumanGenome

2.9Gb

[Science 2004 Breakthrough of the Year, 5th runner up]

1/3 coding2/3 non coding

of the 5%40% DNA alignable95% coding genes shared

Page 4: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

What To Do?

Gill Bejerano 4

Assay! Which Assay? Which elements?

one million elementsgenome wide...

http://genome.ucsc.edu

[Bejerano et al., Nature Methods 2005]

Page 5: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

DNA Replication is Imperfect

Gill Bejerano 5

Small Scale: bases are mutated, deleted, insertedMedium Scale: regions are duplicated, deleted, invertedLarge Scale: whole chromosomes are duplicated, merged

junk functional

...ACGTACGACTGACTAGCATCGACTACGA...

...ACGTACGACTGACTAGCATCGACTACGA........TCTGACTAGCATCGACTACGA...

...ACGTACGACTGACTAGCATCGACTACGA........TCTGACTAGCATCGACTACGA...

functionalfunctional

duplication

functional’’functional’divergenceCG AA

Page 6: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Computational Approach

Gill Bejerano 6

.....ACGTGCATGACTGACTAGCATCAGACGACTAC..GATAATACGCTACGACTAGCTAC.....human DNA

Group them into paralog families of human functional regions of common origins:• Annotated members induce function on all.• Examine core, substitutions in family.• Test for “guilt by association”. [Bejerano et al., ISMB 2004]

...TGACTAGCATCGACTAC..GATAATACGAC... ...CATCGACTAC..GATAATACGACGGTTGGT...AC T

Page 7: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Conserved Element Clustering

Gill Bejerano 7

w

edgeremoval

vertexsplitting [Bejerano et al., ISMB 2004]

vertex

vertex

edge

conservedelement

similaritygraphevolutionary edit distance

conservedelement

Page 8: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Functional Annotation by Families

Gill Bejerano 8

[Bejerano et al., ISMB 2004]

Puzzling News:96% of the 700,000are unique(!)

novel putative ncRNAs, cis-regulatory elements, etc.

Good News:We still find12,027 families

After removing from top 5% Human all annotated regions, and more:700,000 elements, covering 3.5% Human Genome

metrics

Page 9: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Families of Novel ncRNAs

Gill Bejerano 9

stochastic contextfree grammar model

[Pedersen, Bejerano et al., PLoS CompBio, 2006]

Page 10: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Genes, Proteins and Gene Control

Gill Bejerano 10

exon (how to)control region(when & where)distal: in 106 bases

DNA

classical: in 103 bases

Page 11: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Gill Bejerano 11

Cis-Regulatory Families

IRX-3 IRX-5 IRX-6Hs.chr16

IRX-1 IRX-2 IRX-4

Hs.chr5

70-80%id

IRX-1 IRX-2 IRX-3 IRX-4 IRX-5 IRX-6IRX-1 IRX-2 IRX-3 IRX-4 IRX-5 IRX-6earlybody

patterning

Page 12: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Ultraconserved Elements

Gill Bejerano 12

[Bejerano et al., Science 2004]

Page 13: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

No known function requires this much conservation

Gill Bejerano 13

*****?

CDS ncRNA TFBS

seq.

Page 14: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Ultraconserved Elements

Gill Bejerano 14

481 regions perfectly conserved over 200bp or more, between human, mouse and rat (P<10-22 neutrally evolving)

• 20-fold fewer SNPs than human average.• Most do not overlap coding DNA.• The non-exonic tend to cluster spatially,

in or near DNA binding proteins.Dozens validated as enhancers by now.

• The exonic tend to overlap alternatively spliced exons, in RNA binding proteins.

• The ultras cannot be found beyond vertebrates.• The tip of a continuum of very slowly evolving elements.

[Bejerano et al., Science 2004Chicken Consortium, Nature 2004]

conservation

freq

Page 15: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Similar Phenomena in Flies

Gill Bejerano 15

rich in conserved non-coding

fly-specific ultraconserved elements

[Glazov, ..., Bejerano, Mattick, Genome Research 2005][Siepel, Bejerano et al., Genome Research 2005]

Page 16: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Mammalian Ultras - Genomic Distribution

Gill Bejerano 16

•exonic•non•possibly

Page 17: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Ultraconserved Element uc.338

Gill Bejerano 17

uc.338 paralog family

Page 18: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Coelacanth Homologs Closer than Human Ones

Gill Bejerano 18

[Bejerano, Lowe et al., Nature, 2006]

Page 19: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Coelacanth “the Living Fossil” Fish

Gill Bejerano 19

Appeared in Fossil Record >360Mya; Peaked 240Mya; Disappeared 80Mya; Rediscovered (by science) in 1938.

Page 20: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Coelacanth Repeat

Gill Bejerano 20

1 Mb sequenced.4 genomic regions.Instances in all.

uc.338 like

481bp repeat59 copies/1MbHighly similar copiesUnlike any known repeat

human consensus

coelacanthself comparison

Page 21: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Coelacanth Repeat Distribution

Gill Bejerano 21

Reconstruct ancestral coelacanth repeat.Search all available genomes.

?

x

Similar distribution in GenBank, Trace Archives.

repeat repeat

Page 22: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Interim Summary

Gill Bejerano 22

480bp recently active repeat80%id to 360bp incl. uc.338105 copies/genomeUnlike any known repeat

Page 23: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Repeats are a Major Force in Vertebrate Evolution

Gill Bejerano 23

>½ the human genome consists of relics of

self-replicating DNA

LINEs & SINEs

out

back

Page 24: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

The LF SINE (for Lobefin Fish / “Living Fossil”)

Gill Bejerano 24

Reconstructionout back

target siteduplications

Page 25: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

>360My Old and Going Strong

Gill Bejerano 25

?

xB

D

Upto 80%id between Coelacanth SINEand some human instances, inc uc.338.

Page 26: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Co-Option (Exaptation) of obile Elements

Gill Bejerano 26

Page 27: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Exapted Into Which Cellular Roles?

Gill Bejerano 27

gene

?

xHuman instances cluster together, found <1Mb from 35 TFs (P<3*10-6).

No evidence for Transcription (Tx) as small RNAs,no orientation preference in introns, not in antisense Tx.

Page 28: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Transient Transgenic Enhancer Assay

Gill Bejerano 28

Eddy Rubin’s Lab, LBNL

Reporter GeneMinimal PromoterConservedElement

in situ

Construct is injected into 1 cell embryosTaken out at embryonic day 10.5-14.5Assayed for reporter gene activity

transgenic

Page 29: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Instance 500kb Downstream of ISL1

Gill Bejerano 29

ISL1 is a neuro-developmental gene, also expressed in testis.Three previously known enhancers are conserved in all vertebrates.

1Mbtyp

ical

Page 30: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

LF SINE Transgenic (B) vs. Mouse Isl1 in situ (C)

Gill Bejerano 30

Matched staining in genital eminence

Matched staining in dorsal apical

ectodermal ridge (part of limb bud)

Nadav Ahituv, Eddy Rubin

Page 31: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Matched Level Sections

Gill Bejerano 31

Bryan King, Sofie Salama, Nadav Ahituv, Eddy Rubin

in situtransgenic

Corresponding expression patterns in: (a, b) the developing thalamus (Th)

and basal plate (BP) in the brain. (c, d) the trigeminal (V) ganglion and

facio-acoustic (VII/VIII) ganglia in the head region.

(e, f) the dorsal root ganglion (DRG), and the lateral region of the ventral horn (VH) of the spinal cordin thoracic sections.

Page 32: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Gill Bejerano 32

Mobile Elements Give Birth to Distal Cis-Regulatory Elements: ImplicationsGene numbers do not correlate with organism complexity.Many gene families are surprisingly old.“Regulatory sequence evolution must be the major contribution

to the evolution of form.” [Sean Carroll, PLoS Bio 2005]

In/vertebrate DivideIn/vertebrate Dividefly

wormhuman

weedfishrice

Page 33: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

uc.338 Has Undergone Exonization

Gill Bejerano 33

PCBP2 is an RNA binding protein.Involved in mRNA processing.Two Isoforms: with/out exonized SINE.Exon is mammal specific,involved in post-transcriptional regulation.

Page 34: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Exonized LF SINE Instances

Gill Bejerano 34

The 19 proteins are unrelated19/19 exon antisense to SINE17/19 novel coding exon16/17 alternatively-spliced11/17 introduce early stop codon*post-transcriptional regulation

19 different tetrapod exonizations

* The 6 read-thru exons all match a 93bp internal LF-SINE region, (still!) free of stop codons in all 3 frames (p~0.002)

Page 35: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Unexpected Kin: Enhancer, Exon, Repeat

Gill Bejerano 35

one of 29 human instancesspanning coelacanth SINE

?

Page 36: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Some Challenges

Gill Bejerano 36

• What molecular mechanisms lead to ultra conservation?• Which ultraconserved elements (ultras) were seeded

by transposons? When? Origins of the others? • How do ultras evolve: all by purifying selection, or is

there some reduced mutation or hyper-repair?• Why the strong association with DNA binding and

RNA binding genes? Transcription/splicing enhancers?• What information do ultras encode and how?• Are ultras part of delicate self-regulating gene networks

that are critical in development?• Are they related to human disease?• Are vertebrate and fly ultras related?• ...

Page 37: GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAG ...bejerano.stanford.edu/talks/BejeranoApr06.pdfCoelacanth “the Living Fossil” Fish Gill Bejerano 19 Appeared in Fossil Record >360Mya; Peaked

Gill Bejerano 37

Kudos

UC Santa CruzDavid HausslerSofie Salama, Craig Lowe, Bryan KingJim Kent, Adam Siepel, Jakob Pedersen Katie Pollard, Courtney OnoderaRachel Harte, Genomics/Browser Group

Lawrence Berkeley LabsEddy RubinNadav Ahituv, Marcelo Nobrega

Northwestern U.Jack Kessler, Alex Bassuk

McGill U.Mathieu Blanchette

Penn State U.Webb Miller’s group

U. QueenslandJohn Mattick’s group

Genome Sequencing ConsortiaAll GenBank contributors

w: www.soe.ucsc.edu/~jill e: [email protected]