The Ensembl Browser. Dr. Giulietta M. Spudich, European Bioinformatics Institute (1 of 31)



Upload: annabelle-stephens

Post on 18-Jan-2018



TRANSCRIPT

1 of 31 The Ensembl Browser
Dr. Giulietta M. Spudich, European Bioinformatics Institute

2 of 31 Today
- Introduction to the Ensembl project and gene set
- Walk-through of the browser
- Hands-on: Browser
- BioMart
- Lunch
- BioMart hands-on
- Comparative Genomics + hands-on
- Variations & Functional Genomics + hands-on

3 of 31 Course Objectives
- gene: How to browse information about a gene
- transcript: How to choose a transcript
- variations: Where to find sequence variations
- alignments: How to view multiple alignments
- BioMart: How to use BioMart
- help: Where to go for help

4 of 31 Introduction to Ensembl
- Why do we have genome browsers?
- Why Ensembl?
- Ensembl genes and genomes
- Where to go for help?

5 of 31 Genome browsers provide a map
[Figure adapted from the ENCODE project. Labels: Histone modification, DNase I sensitive site, Gene, Allele, Conserved sequence]

6 of 31 Genome Browsers
- Ensembl Genome Browser
- NCBI Map Viewer
- UCSC Genome Browser

7 of 31 Ensembl Features
- The gene set: automatic annotation based on mRNA and protein information, plus manual annotation (GENCODE set)
- BioMart (data export tool)
- Comparative analysis (gene trees)
- Variation and functional genomics
- Integration with other databases (DAS)
- Programmatic access via the Perl API (open source)

8 of 31 Subjects
- Why do we have genome browsers?
- Why Ensembl?
- Ensembl genes and genomes
- Where to go for help?

9 of 31 To meet a challenge
Ensembl's aim: to provide annotation for the biological community that is freely available and of high quality.
- Started in 2000
- Joint project between EBI and Sanger
- Funded primarily by the Wellcome Trust, with additional funding from EMBL, NIH-NIAID, EU, BBSRC and MRC

10 of 31 Genome annotation
Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. Identifying genes on the genome.
2. Attaching biological information to genes and the genome (for example, effects of sequence variation).

11 of 31 Ensembl Annotates Vertebrate Genomes
Non-chordates: D. melanogaster, C. elegans, S. cerevisiae
50 species including:

12 of 31 Extending Ensembl across the taxonomic space
- 3 Plasmodia: falciparum, knowlesi, vivax
- 48 Chordates including: Human, Mouse, Zebrafish, Chicken, Chimpanzee, Pig, Platypus
- 134 species: 6 bacterial clades, 1 prokaryotic clade
- 8 Aspergillus
- 2 yeast: S. cerevisiae, S. pombe
- 8 species including: Arabidopsis thaliana, Arabidopsis lyrata, Oryza sativa
- 21 species including: Drosophila (12), Caenorhabditis (5), Anopheles gambiae
F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel & P. Bork. Towards automatic reconstruction of a highly resolved tree of life. Science, 3 March
Slide design by Jeff Almeida-King

13 of 31 Exploring genomes
Vertebrates focus:
Other species:

14 of 31 Subjects
- Why do we have genome browsers?
- Why Ensembl?
- Ensembl genes and genomes
- Where to go for help?

15 of 31 What is known?
Genomic assemblies from sequencing consortia

16 of 31 What is known?
Proteins and cDNA/mRNA sequences from the research community, found in:
- UniProtKB/Swiss-Prot (manually curated)
- UniProtKB/TrEMBL
- NCBI RefSeq (manually curated)
Note: See pages 55 and 56 of the course booklet

17 of 31 Combining genes and genomes
[Figure. Labels: Exon, Untranslated + Coding, Coding, Untranslated; sequence: tgcctgttag...]

18 of 31 Too many pieces
[Figure. Labels: Genome, Aligned cDNA and protein, Exon, Untranslated + Coding, Coding, Untranslated]

19 of 31 Ensembl shows one transcript with underlying evidence

20 of 31 Ensembl compared with Swiss-Prot and NCBI RefSeq sequences

21 of 31 Is there any consensus?
- NCBI RefSeq set
- UniProt set
- Ensembl combines these sets
- UCSC has its own gene set
How do we come up with a consensus gene set between all these?

22 of 31 CCDS
Reaching a consensus coding sequence set for human and mouse.
19,851 (ENS human), 17,679 (ENS mouse) (*as of Sept 2009)
If you see a CCDS ID, the coding sequence is agreed upon.
Genome Res Jul;19(7): Epub 2009 Jun 4

23 of 31 VEGA/Havana
- Automatic annotation pipeline (Ensembl): gene building all at once (whole genome)
- Manual curation (Havana): case-by-case basis
- VEGA: Vertebrate Genome Annotation

24 of 31 Genes and Transcripts in Ensembl
High quality:
- CCDS transcripts
- Ensembl/Havana merged transcripts

25 of 31 Ensembl/Havana
Transcripts are from:
- Ensembl
- Havana
- Ensembl/Havana merge

26 of 31 Gene Names in Ensembl
- ENSG###: Ensembl Gene ID
- ENST###: Ensembl Transcript ID
- ENSP###: Ensembl Peptide ID
- ENSE###: Ensembl Exon ID
For species other than human, a species code is inserted after "ENS":
- MUS (Mus musculus) for mouse: ENSMUSG###
- DAR (Danio rerio) for zebrafish: ENSDARG###, etc.

27 of 31 How is all this information organised?
- Ensembl Views (website)
- Ensembl Database (open source)
- BioMart data-mining tool

28 of 31 What other annotation?
- Non-coding (nc)RNAs
- IDs in other databases
- Microarray probes, clonesets, BAC maps
- Other features of the genome: repeats, CpG islands
- Homologs and whole genome alignments: orthologues and paralogues, protein families, syntenic regions
- Variation data: Single Nucleotide Polymorphisms, InDels, CNVs
- Regulatory data (a first guess at promoter and enhancer elements)
- Data from external sources (DAS)

29 of 31 Subjects
- Why do we have genome browsers?
- Why Ensembl?
- Ensembl genes and genomes
- Where to go for help?

30 of 31 Help and Information
Comments and questions?
Check out our tutorials page:
Videos
Mailing list
Come visit our blog!
FTP site: ftp://ftp.ensembl.org
Amazon Web Services:

31 of 31 Ensembl Team
Ensembl: Paul Flicek (EBI), Steve Searle (Sanger Institute)
Software: Glenn Proctor, Andreas Kähäri, Stephen Keenan, Rhoda Kinsella, Eugene Kulesha, Ian Longden, Iliana Toneva, Jorge Zamora
Comparative Genomics: Javier Herrero, Kathryn Beal, Stephen Fitzgerald, Leo Gordon
Functional Genomics: Ian Dunham, Nathan Johnson, Daniel Sobral, Steven Wilder
Variation: Fiona Cunningham, Pontus Larsson, Will McLaren, Graham Ritchie
Analysis and Annotation: Jan-Hinnerck Vogel, Bronwen Aken, Susan Fairley, Thibaut Hourlier, Magali Ruffier, Simon White, Amy Tang, Amonida Zadissa
Web Team: Anne Parker, Ridwan Amode, Simon Brent, Maurice Hendrix, Bethan Pritchard, Steve Trevanion (VEGA)
Outreach: Xosé M Fernández, Jeff Almeida-King, Bert Overduin, Michael Schuster (QC), Giulietta Spudich, Jana Vandrovcova
Systems & Support: Guy Coates, James Beal, Gen-Tao Chiang, Peter Clapham, Simon Kelley, Shelley Goddard, Tracy Mumford, Kerry Smith
Research: Benoît Ballester, Petra Catalina Schwalie, André Faure, Markus Fritz, Damian Keefe, Alison Meynert, Dace Ruklisa, Mikhail Spivakov, David Thybert, Sander Timmer, Albert Vilella
Vertebrate Genomics: Chao-Kung Chen, Laura Clarke, Jonathan Hinton, Zam Iqbal, Vasudev Kumanduri, Ilkka Lappalainen, Edoardo Marcora, Pablo Marín, Damian Smedley, Richard Smith, Phil Wilkinson, Holly Zheng-Bradley
Ensembl Genomes: Paul Kersey, Paul Derwent, Matthias Haimel, Alan Horne, Arnaud Kerhornou, Uma Maheswari, Michael Nuhn, Dan Staines, Andy Yates
VectorBase: Dan Lawson, Gautier Koscielny, Karyn Megy
Zebrafish: Kerstin Howe, Kim Brugger, Will Chow, Britt Reimholz, James Torrance
Ensembl Strategy: Ewan Birney, Richard Durbin, Tim Hubbard
Ensembl's 10th Year. Nucleic Acids Res
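
Worked example for the "Gene Names in Ensembl" slide (26 of 31): the ID scheme described there can be read mechanically, which may help when sorting a mixed list of identifiers. The sketch below is not part of the course material. It assumes an ID is "ENS", an optional species code, one feature letter (G, T, P or E) and a run of digits; only the species codes named on the slide (none for human, MUS, DAR) are recognised. The function name parse_ensembl_id and the ID numbers in the usage lines are illustrative choices, not taken from the slides.

```python
import re

# Feature letters and species codes as listed on the "Gene Names in Ensembl" slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# Assumed layout: "ENS" + optional species letters + one feature letter + digits.
_STABLE_ID = re.compile(r"^ENS([A-Z]*)([GTPE])(\d+)$")


def parse_ensembl_id(stable_id):
    """Split an Ensembl-style ID into species, feature type and number.

    Hypothetical helper for illustration; raises ValueError when the
    string does not follow the assumed layout.
    """
    match = _STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not an Ensembl-style stable ID: {stable_id!r}")
    species_code, feature_letter, digits = match.groups()
    # Codes not named on the slide are reported rather than guessed.
    species = SPECIES_CODES.get(species_code, f"unknown species code {species_code}")
    return {
        "species": species,
        "feature": FEATURE_TYPES[feature_letter],
        "number": digits,
    }


if __name__ == "__main__":
    # The ID numbers here are illustrative placeholders.
    for example in ("ENSG00000139618", "ENSMUSG00000000001", "ENSDART00000000001"):
        print(example, parse_ensembl_id(example))
```

The regular expression relies on backtracking: the species group first grabs every letter after "ENS", then gives back the last one so it can be read as the feature letter. This keeps the helper working for species codes it has never seen, at the cost of not validating them.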