genome and transcriptome of the regeneration...

1
The Worm Genome and Transcriptome Assembly Genome and Transcriptome Annotation Transcripts Involved in Regeneration in M. lignano Additional Findings RNA seq 0 3 6 12 24 48 72 hours RNA extraction DEseq amputation 0h 3h 6h 12h 24h 48h 72h 5 0 5 Value Color Key MAPK signalling pathway LIF LIFR JAK STAT3 Grb2 Mek PI3K Akt Erk1/2 Klf4 c-Myc Tbx3 Sox2 Nanog Oct4 Jak-Stat signalling pathway MAPK signalling pathway Activin Nodal ACVRI/II SMAD2/3 Proliferation DNA DNA Bmp4 BMPRI/II SMADs SMAD4 Erk1/2 p38 ID DUSP9 TGFβR TGFβ TGFβ signalling pathway Wnt Frizzled Dvl GSK-3β β-catenin Axin APC Wnt signalling pathway FGF2 FGFR Ras Raf Mek Erk1/2 IGF IGF-IR PI3K Akt DNA DNA DNA DNA DNA DNA Core transcriptional network c-Myc MAPK signalling pathway PI3K-Akt signalling pathway Pluripotency pathways from human/murine stem cells: Not found in M.lignano transcriptome Found in M.lignano transcriptome Factors with TGFβ -like domain found M.lignano transcriptome STAT3 not found, other STATs identified in M.lignano transcriptome KLF4 not found, other KLFs identified in M.lignano transcriptome BMP4 not found, other BMPs identified in M.lignano transcriptome DNA DNA DNA DNA DNA Genes from human/murine stem cells: 0.3 max_hvu diminutive_dme 5392611_gvu 4367212__psi 2235912_cle 8844611_ece 8320811_csp 1677112_pca max_aqu 329761_mili 8264415_csp 6889311_pst 358631_mosp 232631_meli 1291201_ltr 3759811_pca 331811_mosp 3433911_mli 4337911_mfu 4338711_sst max_hsa 1643511_psi 7137911_rsp 1674911_mfu 3098031_mfu USF2_hsa 7573711_mli 4658511_mili 170531_meli 4991111_ltr 9649121_mli 5204311_gvu 4338712_sst 401711_ece 1384111_pca 8002813_rsp 1535711_msc 1865211_psi 1677111_pca myc2_hvu S000209_sma 4247911_psi 133235_lgi 2472711_psi 9705812_mli 3981211_mfu 96815g22_mli 8721511_ltr 5733111_gvu 4658514_mili S35429_sma 368781_mosp mycl_hvu 6927111_pst 6637312_rsp 2694411_nco 2289111_nco 7019111_pst 3029911_gvu 193181i_pca Max_dme 500662_mili myc_aqu MXL1_cel lmyc_hsa 1961211_msc 4186911_sst 1086232_mli 437061_meli 442451_meli 1131511_msc 2256811_cle 5046551_psi 173401_mosp 6012811_rsp 8769711_nco 785741_scma - 785751_scma 4186914_sst 370851_mili 323911_mosp 5856311_mfu 1028284_mli nmyc_hsa 699701_csp 1030111_mli 166474_cte USF1_hsa MXL3_cel 88480_lgi 740831_csp myc1_hvu 4161311_mfu 301971_mosp 118760_cte cmyc_hsa 8624611_mli myc_pdu Max_pdu mycAl_hvu 0.837 0.891 0.976 0.784 0.998 0.892 0.728 0.938 0.918 0.769 0.862 0.834 0.963 0.743 0.695 0.923 0.861 0.81 0.9 0.974 0.727 0.789 0.98 0.918 0.995 0.93 0.767 0.836 0.953 0.9 0.815 0.771 0.996 1 1 0.974 0.822 0.931 0.77 0.907 0.783 0.903 0.998 0.787 0.919 0.996 0.962 0.991 0.828 0.969 1 0.926 0.951 0.928 1122 820 500 1439 402 480 355 317 1747 535 262 281 218 581 1368 M. lignano S. mediterranea C. elegans D. melanogaster Stephanostomum sp. -ACC----TATACGGTT---CTCT-GCCGTGTA------TATTAGT-C-ATGGT-AAGAA Haematolechus sp. -ACC----TATACGGTT---CTCT-GCCGTGTA------TCAGTG--C-ATGGT-AAGAA Fasciola sp. AACC----TTAACGGTT---CTCTTGCCCTGTA------TATTAGTGC-ATGGTAAAGAA S.mansoni AACC----GTCACGGTT---TTAC--TCTTGTG------ATTTGTTGC-ATGGT-AAGAA Echinococcus sp. CACCG --TTAATCGGTC---CTTA--CCTTGCA------ATTTTGT---ATGGT-GAGTA M.lignano -GCCG--TAAAGACGGT---CTCTTACTGCGAAGACTCAATTTATTGC-ATGCT-CAGTA S.med SL1 -GCCG--TTAGACGGTC---TTATCGAAATCTATAT---AAATCTTAT-ATGGT-ACGGA S.med SL2 -GCCG--TTAGACGGTC---TTATCGAAATCTATAT---AAAAATTAT-ATGGT-GAGG A Stylochus sp. TGCCGTATTTGACGGTCTCAAAAATTTCGTGTTTATTGCAATAATTGCAATGGT-AAGCA Notoplana sp. TGCCGTATTTGACGGTCTCAAAAATTTCGTGTTTATTGCAATAATTGCAATGGT-AAGCA .** : . * : : : *** * .* * Stephanostomum sp. TCGAA-----TTCGAC------CTATGGTCGAATAA-ATTCTTTGGCTAG-CCTCT---- Haematolechus sp. TCGAG-----TTCGACTCACATCGTTGGTCGAATAAGATTATTTGGCTAG-CCTCCACTC Fasciola sp. TCG-------TTGGAC------CATCGGTCCAAACCCATTATTTGGCTAG-CCTCCATTC S.mansoni CCG--------TCGAC------CAAGAATCGAAGTT--TTCTTTGGCAGC-CCTAACACA Echinococcus sp. TCGATGCAGCTCAGGCTG-TGCCTACGGAGCTGACCCAGTATTTGGCTGGTCCTT----- M.lignano TCGACCCAGCTTCATCAAAT-AAAAGAATGCGAATCGAATATACAGCCGAGCCCGACAAC S.med SL1 CCG--------TTATC------CAACATTAGTTGGTTAATTTTTGACAGTCACTTGAATC S.med SL2 CCG--------TTTGC------CAGCATTAGTTGGCTAATTTTTGACAGTAGCTTGCAT - Stylochus sp. TCAAAT-------GAT------CCAGTGTGATCGTCGAGTCTTTG--ACAGGCCG----- Notoplana sp. TCAAA--------GAT------CCA-TGTGATCGTCGAGTCTTTGACACAGGCCG----- *. . : * *: . * Stephanostomum sp. ---TCGGGGGCTAA------ 96 Haematolechus sp. TGGTCGGGGGCTA------- 108 Fasciola sp. TG--CAGAGGCTAAGAATCC 110 S.mansoni ----CGGGG----------- 91 Echinococcus sp. ----CGAGGGCC-------- 105 M.lignano TCGGCACTGTCTGCTCCGC- 130 S.med SL1 --ACAAGTGACTAT------ 107 S.med SL2 --GCAAGTGACTAT------ 106 Stylochus sp. ----CGAGGCCTATAT---- 111 Notoplana sp. ----CAAGGCCTATTT---- 111 .. * C.elegans D.melanogaster H.sapiens S.mansoni M.lignano 100 worms M.lignano sorted stem/germ cells M.lignano RNA Sequence complexity Normalized frequency X 10 4 0 20 100 40 60 80 0 2 4 6 8 10 12 Genome % of bases masked by TRF M.lignano 24.8 C.elegans 6.8 D.melanogaster 4.7 H.sapiens 2.2 S.mansoni 0.3 A ** eyes pharynx testes ovaries egg seminal vesicle stylet brain vesicula granulorum female antrum gut B E Nematostella vectensis Aurelia aurita Hydra magnipapillata Isodiametra pulchra Macrostomum lignano Schistosoma japonicum Schistosoma mansoni Dugesia japonica Schmidtea mediterranea Agropecten irradians Lumbricus rubellus Platynereis dumerilii Daphnia pulex Anopheles gambiae Drosophila melanogaster Caenorhabditis elegans Caenorhabditis briggsae Strongylocentrotus purpuratus Ciona intestinalis Danio rerio Xenopus laevis Gallus gallus Homo sapiens Cnidaria Acoela Ecdysozoa Lophotrochozoa Deuterostomia 0.1 Stenostomum sthenum Catenula lemnae Microstomum lineare Macrostomum lignano Prorhynchus stagnalis Geocentrophora sphyrocephala Prosthiostomum siphunculus Maritigrella crozieri Leptoplana tremerralis Echinoplana celerrima Microdalyellia schmidtii Microdalyellia fusca Mesostoma lingua Nematoplana coelogynoporoides Monocelis sp. Itaspiella helgolandica Schmidtea mediterranea Dugesia japonica Procotyla fluviatilis Dendrocoelum lacteum Bothrioplana semperi Neobenedenia melleni Gyrodactylus salaris Taenia solium Schistosoma japonicum Schistosoma mansoni Clonorchis siniensis C D Catenulida Macrostomorpha Lecithoepitheliata Polycladida Rhabdocoela Proseriata Tricladida Bothrioplanida Monogenea Cestoda Trematoda Platyhelminthes male gonopore adhesive organs ** head tail DAPI EdU T O DE 50μm 0 100 200 300 400 500 600 0 5 10 15 20 M. lignano K-mer model 23-mer Coverage 23-mer frequency X10 5 observed errors p1= 110 p2= 220 p3= 330 p4= 440 composite F 0 20 40 60 80 100 Contig size NG(x) % ML2 Assembly ML1 Assembly 100 1000 1e+06 1e+05 1e+04 G A B A B C D A B The free-living flatworm, Macrostomum lignano, much like its better known planarian relative, Schmidtea mediterranea, has a nearly unlimited regenerative capacity. Following injury, this species has the ability to regenerate almost an entirely new organism. This is attributable to the presence of an abundant somatic stem cell population, the neo- blasts. These cells are also essential for the ongoing maintenance of most tissues, as their loss leads to the rapid and irreversible degeneration of the animal. This set of unique properties makes flatworms an attractive species for studying the evolution of pathways involved in self-renewal, fate specification, and regeneration. The use of Macrosto- mum lignano, or other flatworms, as models, however, is hampered by the lack of a well-assembled and annotated genome sequence, fundamental to modern genetic and mo- lecular studies. Here we report the genomic sequence of Macrostomum lignano and an accompanying characterization of its transcriptome. The genome structure of Macrostomum lignano is remarkably complex, with ~75% of its sequence being comprised of simple repeats and transposon sequences. This has made high quality assembly from Illumina reads alone impossible (N50=414bp). We therefore obtained 130X coverage by long sequencing reads from the PacBio platform and combined this with more than 250X Illumina coverage to create a mixed assembly with a significantly improved N50 of 64 kb. We complemented the reference genome with an assembled and annotated transcriptome, and used both of these datasets in combination to probe gene expression patterns during regeneration, examining pathways important to stem cell function. Aditionally we found evidence of low levels of CpG methylation in Macrostomum lignano’s genome and evidence of trans-splicing in the worm’s transcriptome. Interestingly we found that flatworms lack Myc - a very conserved pluripotency factor in Bilaterians and beyond (cnidarians, poriferans). As a whole, our data will provide a crucial resource for the community for the study not only of invertebrate evolution but also of regeneration and so- matic pluripotency. Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano. Wasik K.A.* 1 , Gurtowski J.* 1 , Zhou X. 1,2 , Ramos O.M 1 , Delas M.J. 1,3 , Battistoni G. 1,3 , El Demerdash O. 1 , Falciatori I. 1,3 , Vizoso D.B. 4 , Smith A.D. 5 , , Ladurner P. 6 , Scharer L. 4 , McCombie W.R. 1 , Hannon G.J. 1,3 and Schatz M. 1 1 Watson School of Biological Sciences, Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, New York 11724, USA; 2 Molecular and Cellular Biology Graduate Program, Stony Brook University, NY 11794; 3 Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, United Kingdom; 4 Department of Evolutionary Biology, Zoological Institute, University of Basel, 4051 Basel, Switzerland; 5 Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA; 6 Department of Evolutionary Biology, Institute of Zoology and Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020 Innsbruck, Austria A. Phylogenetic analysis of 23 animal species using partial sequences of 43 genes. Fig. modified from Egger et al. (2015). B. Interference contrast image and a diagrammatic representation of an adult Macrostomum lignano. C. Phylogenomic analysis of 27 flatworm species (21 free-living and 6 neodermatan) using >100,000 aligned amino acids. Fig. modified from Egger et al. (2015). D. Electron micrograph of a M. lignano neoblast. Note the small rim of cytoplasm (yellow) and the lack of cytoplasmic differentiation. er - endoplasmic reticulum; mi - mitochondria; mu - muscle; ncl - nucleolus; nu - nucleus (red). E. Immunofluorescence labeling of dividing neoblasts with EdU (red) in an adult worm. All cell nuclei are stained with DAPI (blue). T- testes, O - ovaries, DE – developing eggs, asterisks denote eyes. F. Representation of 23-mer frequency and cover- age in the Illumina sequencing data generated from DNA extracted from a population of adult worms. M. lignano shows unusual 4-modal 23-mer distribution. G. Comparison of Illumina only (ML1) and Pacbio (ML2) assemblies. Contig length distribution (Log2 scale) over the M. lignano genome in the ML1 (green) and ML2 (red) assemblies. Note that the ML1 assembly covers only about 55% of the genome. 10 20 Contig size X 10 Kbp C1 Contig ID miRNA count Large repetitive elements Trancsript abundance Tandem repeat unit size in bp Tandem repeat count GC content Illumina coverage 0 10 20 30 40 50 60 0 10 20 30 40 50 0 10 20 30 40 0 10 20 30 40 0 10 20 0 10 20 30 40 0 10 20 30 40 0 10 20 30 40 0 10 20 30 40 0 10 20 30 40 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 30 0 10 20 0 10 20 C1 C50 10 A. Schematic representation of the experimental design: 200 worms (per replicate) underwent amputation at a level between the brain and the gonads. The heads were allowed to regenerate, and regenerating animals were collected at different timepoints post amputation (0, 3, 6, 12, 24, 48, 72 hours). RNA-Seq libraries from each timepoint were analyzed for differentially expressed genes. Below - a heat map of differentially expressed genes at different regeneration timepoints. Each replicate is plotted separately. Downregulated and upregulated transcripts are labeled in green and red, respectively. Scale covers Log2 values. The samples are grouped with complete-linkage clustering using Euclidean distance. B.Known pluripotency pathways from H. sapiens and M. musculus were adapted from the Kyoto Encyclopedia of Genes and Genomes. Factors that had potential homologues in M. lignano are labeled. A.Overview of the 50 largest contigs in the M. lignano genome, making up about 2.6 % of the total assembly. Different tracks denote (moving inwards): contig size X 10 Kbp; miRNA count (1-54 mapped miRNAs); large repetitive elements (RepeatScout) (1-4476 identified repeats); transcript count (1-43 mapped transcripts); Tandem repeat unit size in base pairs (1-500); Tandem repeat count (1-28); GC content (0-1); and Illumina coverage (4-160X). The color gradients correspond to the range of values for each track (lower values are lighter, higher values are darker). B. Sequence complexity comparison across five organisms. Drosophila melanogaster has an abundance of very low complexity sequence, not found in the other species. Macrostomum lignano has a sizable amount of moderately complex sequences that are not found in other species and that do not appear to be expressed. C. Tandem Repeat Finder was run on five species to assess their tandem and low complexity sequence composition. Macrostomum lignano had far more bases masked by Tandem Repeat Finder than the other organisms in the test set. D.The number of reciprocal blast hits against the Homo sapiens tran- scriptome for four different species: Macrostomum lignano, Schmidtea mediterranea, Caenorhabditis elegans, and Drosophila melanogaster. Only the number of hits passing the E-value cutoff of ≤1e-10 is shown. A. M. lignano is lacking the Myc gene. Evolution of the of Myc and Max gene families across different repre- sentatives of the animal phyla. Mycs and Maxs gene candidates are retrieved based on reciprocal best BLASTp from the available transcriptomes. The distance tree was inferred using neighbor-joining based on JTT sequence evolution model (1000 bootstrap replicates). Human USF proteins are used as an outgroup. The Myc branch is labeled in green, the Max branch is labeled in blue. dme – D. melanogaster, hsa – H. sapiens, lgi – L. gigantea, cte – C. teleta, hvu – H. vulgaris, aqu – A. queenslandica, cel – C. elegans, mli – M. lignano, mfu - M. fusca, mosp – Monocelis sp, psi – P. siphunculus, ltr – L. tremelaris, ece – E. celerrima, , meli – M. lingua , msc – M. schmidtii, mili – M. lineare, nco – N. coelogynoporoides, rsp – Rhabdopleura sp., gvu – G. vulgaris, csp – Cerebratulus sp., pca – P. caudatus, sst S. sthenum, cle – C. lemnae, pdu – P. dumerilli, sma – S. mediterranea, scma – S. mansoni. Transcript ID is next to each phylum name. B. Trans splicing in M. lignano. Align- ment between first 130nt of M. lignano’s putative SL RNA and SL RNAs from other flatworms. The conserved splice junction is indicated by an arrowhead. Spliced leader sequences are labeled in blue. The potential initiator AUG (last three nucleotides of the spliced leader) is labeled in green. Conclusions: We have assembled and annotated a highly repetitive genome using a mix of Pacbio and Illumina sequencing • We have found that: - M. lignano’s genome shows evidence of CpG methylation - It has retained a large number of homeoboxes as compared to other flatworms - The transcriptome shows evidence of trans-splicing - Flatworms lost the very conserved Myc gene • We have characterized the gene expression patterns during regeneration in M. lignano • Wasik et al. PNAS (2015); doi: 10.1073/pnas.1516718112

Upload: phunghuong

Post on 01-Apr-2018

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Genome and transcriptome of the regeneration …schatzlab.cshl.edu/publications/posters/2015/2015.GI.Macro.pdfGenome and transcriptome of the regeneration-competent flatworm, Macrostomum

The Worm Genome and Transcriptome Assembly

Genome and Transcriptome Annotation

Transcripts Involved in Regeneration in M. lignano

Additional Findings

RNAseq

0 3 6 12 24 48 72hours

RNAextraction

DEseqamputation

0h 3h 6h 12h 24h 48h 72h5 0 5

Value

Color Key

MAPK signalling pathwayLIF LIFR

JAK STAT3

Grb2 Mek

PI3K Akt

Erk1/2

Klf4

c-Myc

Tbx3Sox2

Nanog

Oct4

Jak-Stat signalling pathway

MAPK signalling pathway

ActivinNodal ACVRI/II SMAD2/3 Proliferation

DNA

DNA

Bmp4 BMPRI/II

SMADs

SMAD4

Erk1/2p38

ID

DUSP9

TGFβRTGFβ

TGFβ signalling pathway

Wnt Frizzled Dvl GSK-3β β-cateninAxin APC

Wnt signalling pathway

FGF2 FGFR Ras Raf Mek Erk1/2

IGF IGF-IR PI3K Akt

DNA

DNA

DNA

DNA

DNA

DNA

Core transcriptional network

c-MycMAPK signalling pathway

PI3K-Akt signalling pathway

Pluripotency pathways from human/murine stem cells:

Not found in M.lignano transcriptome

Found in M.lignano transcriptome

Factors with TGFβ -like domain found M.lignano transcriptome

STAT3 not found, other STATs identified in M.lignano transcriptome

KLF4 not found, other KLFs identified in M.lignano transcriptome

BMP4 not found, other BMPs identified in M.lignano transcriptome

DNA

DNA

DNA

DNA

DNA

Genes from human/murine stem cells:

0.3

max_hvu

diminutive_dme

5392611_gvu

4367212__psi

2235912_cle

8844611_ece

8320811_csp

1677112_pca

max_aqu

329761_mili

8264415_csp

6889311_pst

358631_mosp

232631_meli

1291201_ltr

3759811_pca

331811_mosp

3433911_mli

4337911_mfu

4338711_sst

max_hsa

1643511_psi

7137911_rsp

1674911_mfu

3098031_mfu

USF2_hsa

7573711_mli

4658511_mili

170531_meli

4991111_ltr9649121_mli

5204311_gvu

4338712_sst

401711_ece

1384111_pca

8002813_rsp

1535711_msc

1865211_psi

1677111_pca

myc2_hvu

S000209_sma

4247911_psi

133235_lgi

2472711_psi

9705812_mli

3981211_mfu

96815g22_mli

8721511_ltr

5733111_gvu

4658514_mili

S35429_sma

368781_mosp

mycl_hvu

6927111_pst

6637312_rsp

2694411_nco2289111_nco

7019111_pst

3029911_gvu

193181i_pca

Max_dme

500662_mili

myc_aqu

MXL1_cel

lmyc_hsa

1961211_msc

4186911_sst

1086232_mli

437061_meli

442451_meli

1131511_msc

2256811_cle

5046551_psi

173401_mosp

6012811_rsp

8769711_nco

785741_scma-785751_scma

4186914_sst

370851_mili

323911_mosp

5856311_mfu

1028284_mli

nmyc_hsa

699701_csp

1030111_mli

166474_cte

USF1_hsa

MXL3_cel

88480_lgi

740831_csp

myc1_hvu

4161311_mfu

301971_mosp

118760_cte

cmyc_hsa

8624611_mli

myc_pdu

Max_pdu

mycAl_hvu

0.837

0.891

0.9760.784

0.998

0.892

0.728

0.938

0.9180.769

0.8620.834

0.963

0.7430.695

0.923

0.861

0.81

0.9

0.974

0.727

0.789

0.98

0.918

0.995

0.930.767

0.836

0.953

0.9

0.815

0.771

0.996

1

1

0.974

0.822

0.931

0.770.907

0.783

0.903

0.998

0.787

0.919

0.996

0.962

0.991

0.828

0.969

1

0.926

0.951

0.928

1122 820

500

1439

402480

355

317

1747

535

262

281

218

581

1368

M. lig

nano

S. mediterranea C. elegans

D. m

elanogaste

r

Stephanostomum sp. -ACC----TATACGGTT---CTCT-GCCGTGTA------TATTAGT-C-ATGGT-AAGAA

Haematolechus sp. -ACC----TATACGGTT---CTCT-GCCGTGTA------TCAGTG--C-ATGGT-AAGAA

Fasciola sp. AACC----TTAACGGTT---CTCTTGCCCTGTA------TATTAGTGC-ATGGTAAAGAA

S.mansoni AACC----GTCACGGTT---TTAC--TCTTGTG------ATTTGTTGC-ATGGT-AAGAA

Echinococcus sp. CACCG--TTAATCGGTC---CTTA--CCTTGCA------ATTTTGT---ATGGT-GAGTA

M.lignano -GCCG--TAAAGACGGT---CTCTTACTGCGAAGACTCAATTTATTGC-ATGCT-CAGTA

S.med SL1 -GCCG--TTAGACGGTC---TTATCGAAATCTATAT---AAATCTTAT-ATGGT-ACGGA

S.med SL2 -GCCG--TTAGACGGTC---TTATCGAAATCTATAT---AAAAATTAT-ATGGT-GAGG

A

Stylochus sp. TGCCGTATTTGACGGTCTCAAAAATTTCGTGTTTATTGCAATAATTGCAATGGT-AAGCA

Notoplana sp. TGCCGTATTTGACGGTCTCAAAAATTTCGTGTTTATTGCAATAATTGCAATGGT-AAGCA

.** : . * : : : *** * .* *

Stephanostomum sp. TCGAA-----TTCGAC------CTATGGTCGAATAA-ATTCTTTGGCTAG-CCTCT----

Haematolechus sp. TCGAG-----TTCGACTCACATCGTTGGTCGAATAAGATTATTTGGCTAG-CCTCCACTC

Fasciola sp. TCG-------TTGGAC------CATCGGTCCAAACCCATTATTTGGCTAG-CCTCCATTC

S.mansoni CCG--------TCGAC------CAAGAATCGAAGTT--TTCTTTGGCAGC-CCTAACACA

Echinococcus sp. TCGATGCAGCTCAGGCTG-TGCCTACGGAGCTGACCCAGTATTTGGCTGGTCCTT-----

M.lignano TCGACCCAGCTTCATCAAAT-AAAAGAATGCGAATCGAATATACAGCCGAGCCCGACAAC

S.med SL1 CCG--------TTATC------CAACATTAGTTGGTTAATTTTTGACAGTCACTTGAATC

S.med SL2 CCG--------TTTGC------CAGCATTAGTTGGCTAATTTTTGACAGTAGCTTGCAT

-

Stylochus sp. TCAAAT-------GAT------CCAGTGTGATCGTCGAGTCTTTG--ACAGGCCG-----

Notoplana sp. TCAAA--------GAT------CCA-TGTGATCGTCGAGTCTTTGACACAGGCCG-----

*. . : * *: . *

Stephanostomum sp. ---TCGGGGGCTAA------ 96 Haematolechus sp. TGGTCGGGGGCTA------- 108 Fasciola sp. TG--CAGAGGCTAAGAATCC 110 S.mansoni ----CGGGG----------- 91 Echinococcus sp. ----CGAGGGCC-------- 105 M.lignano TCGGCACTGTCTGCTCCGC- 130 S.med SL1 --ACAAGTGACTAT------ 107 S.med SL2 --GCAAGTGACTAT------ 106 Stylochus sp. ----CGAGGCCTATAT---- 111 Notoplana sp. ----CAAGGCCTATTT---- 111 .. *

C.elegans

D.melanogaster

H.sapiens

S.mansoni

M.lignano 100 wormsM.lignano sorted stem/germ cellsM.lignano RNA

Sequence complexity

Norm

alize

d fre

quen

cy X

104

0 20 10040 60 800

2

4

6

8

10

12

Genome% of bases masked by TRF

M.lignano 24.8

C.elegans 6.8

D.melanogaster 4.7

H.sapiens 2.2

S.mansoni 0.3

A

**

eyes

pharynx

testes ovaries egg seminal vesicle

stylet

brain

vesiculagranulorum

femaleantrumgut

B E

Nematostella vectensis

Aurelia aurita

Hydra magnipapillata

Isodiametra pulchra

Macrostomum lignano

Schistosoma japonicum

Schistosoma mansoni

Dugesia japonica

Schmidtea mediterranea

Agropecten irradians

Lumbricus rubellus

Platynereis dumerilii

Daphnia pulex

Anopheles gambiae

Drosophila melanogaster

Caenorhabditis elegans

Caenorhabditis briggsae

Strongylocentrotus purpuratus

Ciona intestinalis

Danio rerio

Xenopus laevis

Gallus gallus

Homo sapiens

Cnidaria

Acoela

Ecdysozoa

Loph

otro

choz

oaDe

uter

osto

mia

0.1

Stenostomum sthenum

Catenula lemnaeMicrostomum lineare

Macrostomum lignano

Prorhynchus stagnalis

Geocentrophora sphyrocephalaProsthiostomum siphunculusMaritigrella crozieri

Leptoplana tremerralis

Echinoplana celerrima

Microdalyellia schmidtii

Microdalyellia fuscaMesostoma lingua

Nematoplana coelogynoporoidesMonocelis sp.

Itaspiella helgolandicaSchmidtea mediterranea

Dugesia japonicaProcotyla fluviatilis

Dendrocoelum lacteumBothrioplana semperi

Neobenedenia melleni

Gyrodactylus salaris

Taenia solium

Schistosoma japonicumSchistosoma mansoni

Clonorchis siniensis

C

D

CatenulidaMacrostomorphaLecithoepitheliataPolycladidaRhabdocoelaProseriata

TricladidaBothrioplanidaMonogeneaCestodaTrematoda

Platyhelminthes

malegonopore

adhesiveorgans

* *head

tail

DAPIEdU

TO

DE

50μm

0 100 200 300 400 500 600

05

1015

20

M. lignano K-mer model

23-mer Coverage

23-m

er fr

eque

ncy

X105

observederrorsp1= 110p2= 220p3= 330p4= 440composite

F

0 20 40 60 80 100

Cont

ig s

ize

NG(x) %

ML2 AssemblyML1 Assembly

100

100

0 1

e+06

1e+

05 1

e+04

G

A B

A B

C D

A B

The free-living flatworm, Macrostomum lignano, much like its better known planarian relative, Schmidtea mediterranea, has a nearly unlimited regenerative capacity. Following injury, this species has the ability to regenerate almost an entirely new organism. This is attributable to the presence of an abundant somatic stem cell population, the neo-blasts. These cells are also essential for the ongoing maintenance of most tissues, as their loss leads to the rapid and irreversible degeneration of the animal. This set of unique properties makes flatworms an attractive species for studying the evolution of pathways involved in self-renewal, fate specification, and regeneration. The use of Macrosto-

mum lignano, or other flatworms, as models, however, is hampered by the lack of a well-assembled and annotated genome sequence, fundamental to modern genetic and mo-lecular studies.

Here we report the genomic sequence of Macrostomum lignano and an accompanying characterization of its transcriptome. The genome structure of Macrostomum lignano is remarkably complex, with ~75% of its sequence being comprised of simple repeats and transposon sequences. This has made high quality assembly from Illumina reads alone impossible (N50=414bp). We therefore obtained 130X coverage by long sequencing reads from the PacBio platform and combined this with more than 250X Illumina coverage to create a mixed assembly with a significantly improved N50 of 64 kb.

We complemented the reference genome with an assembled and annotated transcriptome, and used both of these datasets in combination to probe gene expression patterns during regeneration, examining pathways important to stem cell function. Aditionally we found evidence of low levels of CpG methylation in Macrostomum lignano’s genome and evidence of trans-splicing in the worm’s transcriptome. Interestingly we found that flatworms lack Myc - a very conserved pluripotency factor in Bilaterians and beyond (cnidarians, poriferans). As a whole, our data will provide a crucial resource for the community for the study not only of invertebrate evolution but also of regeneration and so-matic pluripotency.

Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano.

Wasik K.A.*1, Gurtowski J.*1, Zhou X.1,2, Ramos O.M1, Delas M.J.1,3, Battistoni G.1,3, El Demerdash O.1, Falciatori I.1,3, Vizoso D.B.4, Smith A.D.5 ,, Ladurner P.6, Scharer L.4, McCombie W.R.1, Hannon G.J.1,3 and Schatz M.1

1 Watson School of Biological Sciences, Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, New York 11724, USA; 2 Molecular and Cellular Biology Graduate Program,

Stony Brook University, NY 11794; 3 Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge CB2 0RE, United Kingdom; 4 Department of

Evolutionary Biology, Zoological Institute, University of Basel, 4051 Basel, Switzerland; 5 Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089,

USA; 6 Department of Evolutionary Biology, Institute of Zoology and Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020 Innsbruck, Austria

A. Phylogenetic analysis of 23 animal species using partial sequences of 43 genes. Fig. modified from Egger et al. (2015). B. Interference contrast image and a diagrammatic

representation of an adult Macrostomum lignano. C. Phylogenomic analysis of 27 flatworm species (21 free-living and 6 neodermatan) using >100,000 aligned amino acids.

Fig. modified from Egger et al. (2015). D. Electron micrograph of a M. lignano neoblast. Note the small rim of cytoplasm (yellow) and the lack of cytoplasmic differentiation.

er - endoplasmic reticulum; mi - mitochondria; mu - muscle; ncl - nucleolus; nu - nucleus (red). E. Immunofluorescence labeling of dividing neoblasts with EdU (red) in an

adult worm. All cell nuclei are stained with DAPI (blue). T- testes, O - ovaries, DE – developing eggs, asterisks denote eyes. F. Representation of 23-mer frequency and cover-

age in the Illumina sequencing data generated from DNA extracted from a population of adult worms. M. lignano shows unusual 4-modal 23-mer distribution. G. Comparison

of Illumina only (ML1) and Pacbio (ML2) assemblies. Contig length distribution (Log2 scale) over the M. lignano genome in the ML1 (green) and ML2 (red) assemblies. Note

that the ML1 assembly covers only about 55% of the genome.

10 20 Contig size X 10 Kbp

C1 Contig ID

miRNA count Large repetitive elementsTrancsript abundance

Tandem repeat unit size in bpTandem repeat count

GC content

Illumina coverage

0 10 20 30 40 50 60 0 10 20 30 40 50 0 10 2030

40

010

2030

40

010

20

010203040

01020

3040

0102030

40010

2030400102030400

102030

0102030

0102030

010

2030

010

2030

010

2030

0

102030

0102030

010203001020300102030010203001020300102030

0102030

0102030

010

2030

0102030

0102030

010

2030

010

2030

010

20300

1020

300

102030

01020300

1020300

102030

01020300

10

2030

010

2030

0

102030

0102030

010

2030

010

20300

1020

300

1020

30 010

2030 0

102030 0

1020 0

10 20

C1C50

10

A. Schematic representation of the experimental design: 200 worms (per replicate) underwent amputation at a level between the brain and the gonads. The heads were

allowed to regenerate, and regenerating animals were collected at different timepoints post amputation (0, 3, 6, 12, 24, 48, 72 hours). RNA-Seq libraries from each timepoint

were analyzed for differentially expressed genes. Below - a heat map of differentially expressed genes at different regeneration timepoints. Each replicate is plotted separately.

Downregulated and upregulated transcripts are labeled in green and red, respectively. Scale covers Log2 values. The samples are grouped with complete-linkage clustering

using Euclidean distance. B.Known pluripotency pathways from H. sapiens and M. musculus were adapted from the Kyoto Encyclopedia of Genes and Genomes. Factors

that had potential homologues in M. lignano are labeled.

A.Overview of the 50 largest contigs in the M. lignano genome, making up about 2.6 % of the total assembly. Different tracks denote (moving inwards): contig size X 10 Kbp;

miRNA count (1-54 mapped miRNAs); large repetitive elements (RepeatScout) (1-4476 identified repeats); transcript count (1-43 mapped transcripts); Tandem repeat unit

size in base pairs (1-500); Tandem repeat count (1-28); GC content (0-1); and Illumina coverage (4-160X). The color gradients correspond to the range of values for each

track (lower values are lighter, higher values are darker). B. Sequence complexity comparison across five organisms. Drosophila melanogaster has an abundance of very low

complexity sequence, not found in the other species. Macrostomum lignano has a sizable amount of moderately complex sequences that are not found in other species and

that do not appear to be expressed. C. Tandem Repeat Finder was run on five species to assess their tandem and low complexity sequence composition. Macrostomum

lignano had far more bases masked by Tandem Repeat Finder than the other organisms in the test set. D.The number of reciprocal blast hits against the Homo sapiens tran-

scriptome for four different species: Macrostomum lignano, Schmidtea mediterranea, Caenorhabditis elegans, and Drosophila melanogaster. Only the number of hits passing

the E-value cutoff of ≤1e-10 is shown.

A. M. lignano is lacking the Myc gene. Evolution of the of Myc and Max gene families across different repre-

sentatives of the animal phyla. Mycs and Maxs gene candidates are retrieved based on reciprocal best

BLASTp from the available transcriptomes. The distance tree was inferred using neighbor-joining based on

JTT sequence evolution model (1000 bootstrap replicates). Human USF proteins are used as an outgroup.

The Myc branch is labeled in green, the Max branch is labeled in blue. dme – D. melanogaster, hsa – H.

sapiens, lgi – L. gigantea, cte – C. teleta, hvu – H. vulgaris, aqu – A. queenslandica, cel – C. elegans, mli –

M. lignano, mfu - M. fusca, mosp – Monocelis sp, psi – P. siphunculus, ltr – L. tremelaris, ece – E. celerrima,, meli – M. lingua , msc – M. schmidtii, mili – M. lineare, nco – N. coelogynoporoides, rsp – Rhabdopleura sp., gvu – G. vulgaris, csp – Cerebratulus sp., pca – P. caudatus, sst

– S. sthenum, cle – C. lemnae, pdu – P. dumerilli, sma – S. mediterranea, scma – S. mansoni. Transcript ID is next to each phylum name. B. Trans splicing in M. lignano. Align-

ment between first 130nt of M. lignano’s putative SL RNA and SL RNAs from other flatworms. The conserved splice junction is indicated by an arrowhead. Spliced leader

sequences are labeled in blue. The potential initiator AUG (last three nucleotides of the spliced leader) is labeled in green.

Conclusions:

• We have assembled and annotated a highly repetitive genome using a mix of Pacbio and Illumina sequencing

• We have found that:

- M. lignano’s genome shows evidence of CpG methylation

- It has retained a large number of homeoboxes as compared to other flatworms

- The transcriptome shows evidence of trans-splicing

- Flatworms lost the very conserved Myc gene

• We have characterized the gene expression patterns during regeneration in M. lignano

• Wasik et al. PNAS (2015); doi: 10.1073/pnas.1516718112