EBI is an Outstation of the European Molecular Biology Laboratory.
EMBOSS as a DAS Client
Peter Rice
Mahmut Uludag
3rd March 2011.
EMBOSS: A quick introduction
• European Molecular Biology Open Software Suite
• Open source package for sequence analysis
• ANSI C source code
• GPL licensed applications, LGPL libraries
• 200+ applications
• 100+ third party applications in 15 associated packages
• Project started 1996 at Sanger Centre and HGMP
• Now based at EBI
• Release 6.3.0 15th July 2010
• Funded by UK-BBSRC and EMBL-EBI
EMBOSS history
• Project started at Sanger Centre and SEQNET August 1996
• Alan moved from SEQNET 1997 (Wellcome funding)
• Peter moved to Lion Bioscience 2000 (CCP11-BBSRC/MRC)
• Peter moved to EBI 2003
• HGMP closed 2005: Alan+Jon moved to EBI
• BBSRC funding (limited) 2006-2009
• BBSRC BBR funding 2009-2011
• Major new developments
  • New data types
  • New data sources
  • Built-in ontologies
EMBOSS command line interface
• EMBOSS applications run from the command line
  • This is not the only interface
• There are over 100 interfaces and packaged systems available
  • Web interfaces
  • Graphical user interfaces (GUIs)
  • Web services
• All applications have a command definition file (.acd)
  • Defines all inputs, outputs, and other options
  • Read at startup
  • Contains all command line options with descriptions
  • Template for any other interface
EMBOSS command line example
% antigenic
Input protein sequence(s): uniprot:actb1_fugru
Minimum length of antigenic region [6]:
Output report [actb1_fugru.antigenic]:
% antigenic uniprot:actb1_fugru -auto
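As an illustration of the non-interactive form above, the command line can be assembled from a script. This is a minimal Python sketch, not part of EMBOSS: the helper name `antigenic_command` is hypothetical, and the `-minlen` and `-outfile` qualifiers are taken from the names defined in the ACD file for antigenic.

```python
import shlex


def antigenic_command(sequence, minlen=None, outfile=None, auto=True):
    """Build an argv list for the EMBOSS 'antigenic' program (hypothetical helper)."""
    argv = ["antigenic", sequence]
    if minlen is not None:
        # Qualifier name comes from the 'integer: minlen' ACD definition.
        argv += ["-minlen", str(minlen)]
    if outfile is not None:
        # Qualifier name comes from the 'report: outfile' ACD definition.
        argv += ["-outfile", outfile]
    if auto:
        # -auto suppresses the interactive prompts and accepts the defaults.
        argv.append("-auto")
    return argv


cmd = antigenic_command("uniprot:actb1_fugru", minlen=8)
print(shlex.join(cmd))
```

On a machine with EMBOSS installed, the list could be passed to `subprocess.run(cmd, check=True)`; the sketch itself only builds and prints the command.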
EMBOSS ACD File
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]
section: input [
  information: "Input section"
  type: "page"
]
  seqall: sequence [
    parameter: "Y"
    type: "proteinstandard"
  ]
endsection: input
section: required [
  information: "Required section"
  type: "page"
]
  integer: minlen [
    standard: "Y"
    minimum: "1"
    maximum: "50"
    default: "6"
    information: "Minimum length of antigenic region"
  ]
endsection: required
section: output [
  information: "Output section"
  type: "page"
]
  report: outfile [
    parameter: "Y"
    rformat: "motif"
    multiple: "Y"
    taglist: "int:pos=Max_score_pos"
  ]
endsection: output
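Because the ACD file is the template for every other interface, a wrapper can read it to discover an application's options. The following Python sketch shows one way to do that for the antigenic definition above. It is an assumption-laden illustration, not the EMBOSS ACD parser: the regular expressions only cover the simple `kind: name [ key: "value" ... ]` shape shown on the slide, and the function name `parse_acd` is hypothetical.

```python
import re

ACD_TEXT = '''
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]
section: input [ information: "Input section" type: "page" ]
seqall: sequence [ parameter: "Y" type: "proteinstandard" ]
endsection: input
section: required [ information: "Required section" type: "page" ]
integer: minlen [
  standard: "Y" minimum: "1" maximum: "50" default: "6"
  information: "Minimum length of antigenic region"
]
endsection: required
section: output [ information: "Output section" type: "page" ]
report: outfile [
  parameter: "Y" rformat: "motif" multiple: "Y"
  taglist: "int:pos=Max_score_pos"
]
endsection: output
'''

# kind: name [ ...attributes... ]  (assumes no ']' inside attribute values)
DEFINITION = re.compile(r'(\w+):\s*(\w+)\s*\[(.*?)\]', re.DOTALL)
# key: "value"
ATTRIBUTE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Return (kind, name, attributes) for each bracketed ACD definition."""
    return [(kind, name, dict(ATTRIBUTE.findall(body)))
            for kind, name, body in DEFINITION.findall(text)]


for kind, name, attrs in parse_acd(ACD_TEXT):
    if kind not in ("application", "section"):
        # Each remaining definition corresponds to a command line qualifier.
        print(f"-{name}: {kind}, default={attrs.get('default')}")
```

A web form or GUI generator could use the same walk to lay out one input widget per definition, grouped by section.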
EMBOSS ACD File with EDAM Annotation
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
  relations: "EDAM:0000201 topic Immunological analysis"
  relations: "EDAM:0000416 operation Epitope mapping"
]
section: input [
  information: "Input section"
  type: "page"
]
  seqall: sequence [
    parameter: "Y"
    type: "proteinstandard"
    relations: "EDAM:0001219 data Pure protein sequence"
    relations: "EDAM:0000849 data Sequence record"
    relations: "EDAM:0002178 data 1 or more"
  ]
endsection: input
section: required [
  information: "Required section"
  type: "page"
]
  integer: minlen [
    standard: "Y"
    minimum: "1"
    maximum: "50"
    default: "6"
    information: "Minimum length of antigenic region"
    relations: "EDAM:0001249 data Sequence length"
  ]
endsection: required
section: output [
  information: "Output section"
  type: "page"
]
  report: outfile [
    parameter: "Y"
    rformat: "motif"
    multiple: "Y"
    taglist: "int:pos=Max_score_pos"
    relations: "EDAM:0001534 data Peptide immunogenicity report"
  ]
endsection: output
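The `relations:` attribute can appear several times in one definition, so a tool that collects the EDAM annotation has to keep every occurrence rather than store attributes in a plain dictionary. A small Python sketch of that extraction follows. The `EdamRelation` tuple and the `edam_relations` function are hypothetical names, and the "identifier, namespace, label" reading of each value is inferred from the examples on the slide.

```python
import re
from collections import namedtuple

# Hypothetical container for one relations: value.
EdamRelation = namedtuple("EdamRelation", "identifier namespace label")

# relations: "EDAM:<digits> <namespace> <free-text label>"
RELATION = re.compile(r'relations:\s*"EDAM:(\d+)\s+(\w+)\s+([^"]+)"')


def edam_relations(acd_text):
    """Collect every EDAM relation in the text, preserving repeats and order."""
    return [EdamRelation(identifier, namespace, label.strip())
            for identifier, namespace, label in RELATION.findall(acd_text)]


sample = '''
seqall: sequence [
  parameter: "Y"
  type: "proteinstandard"
  relations: "EDAM:0001219 data Pure protein sequence"
  relations: "EDAM:0000849 data Sequence record"
  relations: "EDAM:0002178 data 1 or more"
]
'''

for relation in edam_relations(sample):
    print(relation.identifier, relation.namespace, relation.label, sep=" | ")
```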
Documentation & books
Three books at typesetting stage.
• Administrators’ Manual
• Users’ Manual
• Developers’ Manual
Concomitant major revision of EMBOSS website.
Automation of website content addition.
Books to form basis of new website content.
EMBOSS: Sequences
Uniform Sequence Address (USA): URL-style naming
Derived from the familiar "VMS logical name" syntax used by SRS and GCG.
• database : entryname
  • embl : ecompa (ID or accession can be used in this way)
  • uniprot-id : opsd_bovin (SRS syntax for query by ID)
  • embl-acc : x13776 (SRS syntax for query by accession)
• format :: filename
  • fasta :: /users/pmr/paamir.fa (filename with specific format)
  • ecompa.genbank (with no format, can try all formats)
• format :: filename : entryname
  • fasta :: unfinished : AH6.1 (most formats allow multiple sequences)
• Also @listfile and asis::gctgactgactgatg
• Queries: database-field : query (SRS syntax for id, acc, sv, des, key, org)
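The USA forms listed on this slide can be told apart with a few string splits. The Python sketch below is a simplified illustration and not the EMBOSS implementation: `parse_usa` is a hypothetical name, the real parser also checks whether the name before the colon is a defined database, and splitting `database-field` on the first hyphen is an assumption that would break for database names containing a hyphen.

```python
def parse_usa(usa):
    """Split a USA string into labelled parts (simplified sketch)."""
    usa = usa.strip()
    if usa.startswith("@"):
        # @listfile: a file containing one USA per line.
        return {"kind": "listfile", "file": usa[1:]}
    fmt, sep, rest = usa.partition("::")
    if not sep:
        fmt, rest = None, usa
    if fmt == "asis":
        # asis::SEQUENCE gives the sequence literally on the command line.
        return {"kind": "asis", "sequence": rest}
    source, sep, entry = rest.partition(":")
    entry = entry if sep else None
    if fmt is None and entry is not None:
        # database:entry or database-field:query
        database, dash, field = source.partition("-")
        return {"kind": "database", "database": database,
                "field": field if dash else None, "entry": entry}
    # format::filename, format::filename:entry, or a bare filename
    return {"kind": "file", "format": fmt, "file": source, "entry": entry}


for example in ("embl:ecompa", "uniprot-id:opsd_bovin",
                "fasta::/users/pmr/paamir.fa", "fasta::unfinished:AH6.1",
                "@listfile", "asis::gctgactgactgatg"):
    print(example, "->", parse_usa(example))
```

A bare `name:entry` is treated here as a database reference; a file path that itself contains a colon would need extra handling.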
New data resources
• Aim to read “all” public data resources
• Follow cross-references (explicit and implied)
  • UniProt
  • EMBL/GenBank/DDBJ
  • Other
• Servers
  • Multiple data resources through a single server definition
  • DAS, Ensembl, BioMart, WsEbeye, DbFetch, SRS
  • Cache files of resource definitions for server
• Data resource catalogue (drcat)
  • 600+ data resources
  • Query terms and URLs
  • EDAM annotation of resources, formats, identifiers, terms
Data resource catalogue (drcat)
ID       ArachnoServer
Acc      DB-0145
Name     ArachnoServer
Desc     Spider toxin database
URL      http://www.arachnoserver.org
Cat      Organism-specific databases
Taxon    6845 | Arachnida
EDAMres  0000621 | Organism-specific
EDAMdat  0002400 | Toxin annotation
EDAMid   0002578 | ArachnoServer ID
Xref     SP_explicit | ArachnoServer ID;Toxin name
Query    Toxin annotation | HTML | ArachnoServer ID | www.arachnoserver.org/toxincard.html?id=%s
Example  ArachnoServer ID | AS000014
CCmisc   BMC Genomics 10:375-375(2009); [Pubmed: 19674480]
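The Query and Example lines of a drcat entry carry enough information to build a working link: the query URL has a `%s` placeholder for an identifier of the named type, and the Example line supplies one such identifier. The Python sketch below shows that substitution for the ArachnoServer entry. It assumes the layout shown on the slide (a tag, then "|"-separated fields); the function names `parse_drcat` and `example_urls` are hypothetical and the real catalogue file may have further conventions.

```python
RECORD = """\
ID       ArachnoServer
Acc      DB-0145
Name     ArachnoServer
Desc     Spider toxin database
URL      http://www.arachnoserver.org
Taxon    6845 | Arachnida
EDAMid   0002578 | ArachnoServer ID
Query    Toxin annotation | HTML | ArachnoServer ID | www.arachnoserver.org/toxincard.html?id=%s
Example  ArachnoServer ID | AS000014
"""


def parse_drcat(text):
    """Map each tag to a list of field lists (a tag may repeat)."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(" ")
        fields = [field.strip() for field in value.split("|")]
        record.setdefault(tag, []).append(fields)
    return record


def example_urls(record):
    """Fill each Query URL template with the Example identifier of the same type."""
    examples = {id_type: value for id_type, value in record.get("Example", [])}
    urls = []
    for data_type, data_format, id_type, template in record.get("Query", []):
        if id_type in examples:
            urls.append(template.replace("%s", examples[id_type]))
    return urls


record = parse_drcat(RECORD)
print(record["Name"][0][0], example_urls(record))
```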
EMBOSS Data Types
• Sequences
  • Nucleotide (DNA and RNA)
  • Protein
• Features
  • Attached to sequences
  • Independent data objects
• Bio-Ontologies (OBO)
• Taxonomy (NCBI)
• Data Resources
• Assembled reads
• Text
  • Text, HTML, XML
New data types
• Reuse “USA” syntax
  • [Server:] Dbname : identifier (database has an access method)
  • [Server:] Dbname-field : query (general field names)
• Data types: features, bio-ontologies, taxonomy, etc.
• Access methods: HTTP, DAS, BioMart, Ensembl, ...
• Multiple types and formats for a server/resource
  • type: "sequence features"
  • format: "embl fasta"
EMBOSS Query Language
• Query fields are now general
  • Any field queryable by the access method (DAS, SRS, …)
  • Any index created by indexing applications
  • Any query term in the data resource catalogue
• Multiple queries combined
  • For one data resource
  • AND, OR, … to combine queries
DAS Server Definitions
SERVER das [
method: "dassource"
type: "sequence, features"
url: "http://www.dasregistry.org/das/"
comment: "access sequence/feature sources listed on das registry
(http://www.dasregistry.org/das/)"
cachefile: "server.dassource"
]
DAS Server Definitions
SERVER ensembldas [
method: "dassource"
type: "sequence, features"
url: "http://www.ensembl.org/das/"
comment: "access sequence/feature sources on ensembl das server
(http://www.ensembl.org/das/)"
cachefile: "server.ensembldas"
]
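Each SERVER block names an access method, the data types it serves, a base URL, and a cache file of the resources found there. To show how such definitions might be read by a script, here is a minimal Python sketch. It is not the EMBOSS configuration parser: the function names are hypothetical, and the regular expressions assume each block closes with `]` at the start of a line and that every value is double-quoted, as on the two slides above.

```python
import re

DEFINITIONS = '''
SERVER das [
  method: "dassource"
  type: "sequence, features"
  url: "http://www.dasregistry.org/das/"
  comment: "access sequence/feature sources listed on das registry
            (http://www.dasregistry.org/das/)"
  cachefile: "server.dassource"
]
SERVER ensembldas [
  method: "dassource"
  type: "sequence, features"
  url: "http://www.ensembl.org/das/"
  comment: "access sequence/feature sources on ensembl das server
            (http://www.ensembl.org/das/)"
  cachefile: "server.ensembldas"
]
'''

# A block ends at the first ']' that starts a line.
BLOCK = re.compile(r'(SERVER|DB)\s+(\w+)\s*\[(.*?)\n\]', re.DOTALL)
ATTRIBUTE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_definitions(text):
    """Return {name: attributes}; multi-line values are folded onto one line."""
    definitions = {}
    for kind, name, body in BLOCK.findall(text):
        attrs = {key: " ".join(value.split())
                 for key, value in ATTRIBUTE.findall(body)}
        attrs["kind"] = kind
        definitions[name] = attrs
    return definitions


def sources_of_type(definitions, wanted):
    """Names of definitions whose comma-separated type list includes 'wanted'."""
    return [name for name, attrs in definitions.items()
            if wanted in [t.strip().lower() for t in attrs.get("type", "").split(",")]]


servers = parse_definitions(DEFINITIONS)
print(sources_of_type(servers, "features"))
print(servers["ensembldas"]["url"], servers["ensembldas"]["cachefile"])
```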
DAS Example
DB Ensembl_Human_Genes [
  method: das
  type: "Sequence, Features"
  taxon: "9606"
  format: "das, dasgff"
  url: http://www.ebi.ac.uk/das-srv/genedas/das/Homo_sapiens.Gene_ID.reference
  example: "ENSG00000139618"
  comment: "The Ensembl human Gene_ID reference source, serving sequences and non-location features."
  hasaccession: "N"
  identifier: "segment"
  fields: "segment, type, category, categorize, feature_id"
]
Ensembl DAS Example
DB Felis_catus_CAT_prediction_transcript [
  method: das
  type: "Nucfeatures"
  taxon: "9685"
  format: "dasgff"
  url: http://www.ensembl.org/das/Felis_catus.CAT.prediction_transcript
  example: "scaffold_209987[1:550]"
  comment: "Annotation source for Felis_catus prediction_transcript"
  hasaccession: "N"
  identifier: "segment"
  fields: "segment, type, category, categorize, feature_id"
]
EMBOSS Query Language
• das: ensembl_human_genes: ENSG00000139618
• ensembldas: Felis_catus_CAT_prediction_transcript: scaffold_209987 [1:550]
• das: Homo_sapiens_GRCh37_transcript: 10 [32889611:32973347]
• das: uniprot: P00280
• das: cath: 5pti
• das: uniparc: UPI000000000A
• das: Homo_sapiens_GRCh37_reference-{segment: 11 & type: supercontig}
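In these queries the identifier is a DAS segment, optionally followed by a `[start:stop]` range. The Python sketch below shows how such an identifier could be turned into a DAS request URL for a source. The URL shape (`<source>/features?segment=REF:START,STOP` and the matching `sequence` command) follows the DAS 1.x specification; how EMBOSS itself builds its requests is not described in the slides, so the mapping and the function name `das_request_url` are assumptions.

```python
import re

# segment, optionally followed by [start:stop]
QUERY = re.compile(r'^\s*([^\[\s]+)\s*(?:\[\s*(\d+)\s*:\s*(\d+)\s*\])?\s*$')


def das_request_url(source_url, identifier, command="features"):
    """Build a DAS 1.x request URL for one segment (illustrative sketch)."""
    match = QUERY.match(identifier)
    if match is None:
        raise ValueError(f"not a segment identifier: {identifier!r}")
    segment, start, stop = match.groups()
    if start is not None:
        # DAS writes a range as REF:START,STOP
        segment = f"{segment}:{start},{stop}"
    return f"{source_url.rstrip('/')}/{command}?segment={segment}"


# Source URLs are the url: values from the DB definitions in the DAS examples.
print(das_request_url(
    "http://www.ensembl.org/das/Felis_catus.CAT.prediction_transcript",
    "scaffold_209987 [1:550]"))
print(das_request_url(
    "http://www.ebi.ac.uk/das-srv/genedas/das/Homo_sapiens.Gene_ID.reference",
    "ENSG00000139618", command="sequence"))
```

The sketch only formats URLs and makes no network request, since the servers named on the slides may no longer be available.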
EMBOSS Query Language: Future
• Ontology-based searches of data resources
  • Taxonomy
  • EDAM terms
    • Resources
    • Data types
    • Identifiers
  • Descriptions
• Search for applications matching data types
  • Sequences and features
  • Nucleotide and protein
  • …
• Support for DAS advanced query ...
Acknowledgements
• EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam
• RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
• Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
• LION: Mahmut Uludag, Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
• National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
• Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, ...
• IBM, Hewlett-Packard, (Compaq), Apple, SGI, Sun, LION bioscience, SciTegic, Cambridge University Press
• Open-Bio Foundation, Sourceforge, Debian, Fedora, CEH
... And the British Antarctic Survey
http://emboss.sourceforge.net
http://emboss.open-bio.org/wiki/Latest_developments