Introduction to Web Services

Peter Fischer Hallin, Center for Biological Sequence Analysis

Comparative Microbial Genomics Workshop

Bangkok, Thailand

June 2nd 2008

Background - why worry...

• Increasing size and number of public sequence databases - faster sequencing methods

• Increasing number of bioinformatic prediction tools

• Requirements for data integration, spanning several types of data, different databases and physical locations.

• Solution: Web Services

Topics covered

• Web Services vs. Web Applications

• Why WS? Advantages / disadvantages

• Design of web services: interoperability

• Implementation case stories

• Invoking a web service

• WS and its future role

Web applications

• HTML-based software designed to interact with the user - in most cases this involves human interpretation and navigation.

• If designed well, they are user friendly

• Do not require special skills to operate

• In nearly every case, web pages do not make sense to computers (wget?)

Web Applications ... you use them every day

... plus a billion more ...

Web Services (on the other hand)

- are software designed to enable computer-to-computer interaction

- should aim to enhance interoperability between different systems

- exchange well-defined objects

- consist of well-defined methods / operations

- exchange data using SOAP over HTTP

Interoperability

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getProductDetails xmlns="http://warehouse.example.com/ws">
      <productID>827635</productID>
    </getProductDetails>
  </soap:Body>
</soap:Envelope>

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getProductDetailsResponse xmlns="http://warehouse.example.com/ws">
      <getProductDetailsResult>
        <productName>Toptimate 3-Piece Set</productName>
        <productID>827635</productID>
        <description>3-Piece luggage set. Black Polyester.</description>
        <price currency="NIS">96.50</price>
        <inStock>true</inStock>
      </getProductDetailsResult>
    </getProductDetailsResponse>
  </soap:Body>
</soap:Envelope>
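The two messages above can also be produced and read by a program. The following is a minimal Python sketch, not part of the original slides, that builds the getProductDetails request and extracts the fields of the response using only the standard library. The function names `build_request` and `parse_response` are illustrative assumptions; the namespaces and element names are copied from the messages above.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
WS_NS = "http://warehouse.example.com/ws"  # namespace taken from the example messages
NS = {"soap": SOAP_NS, "ws": WS_NS}

# The response message from the slide, reproduced so the sketch runs offline.
SAMPLE_RESPONSE = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getProductDetailsResponse xmlns="http://warehouse.example.com/ws">
      <getProductDetailsResult>
        <productName>Toptimate 3-Piece Set</productName>
        <productID>827635</productID>
        <description>3-Piece luggage set. Black Polyester.</description>
        <price currency="NIS">96.50</price>
        <inStock>true</inStock>
      </getProductDetailsResult>
    </getProductDetailsResponse>
  </soap:Body>
</soap:Envelope>"""


def build_request(product_id):
    """Build a getProductDetails request envelope (hypothetical helper)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{WS_NS}}}getProductDetails")
    ET.SubElement(operation, f"{{{WS_NS}}}productID").text = product_id
    return ET.tostring(envelope, encoding="utf-8")


def parse_response(xml_text):
    """Turn a getProductDetailsResponse envelope into a plain dict."""
    root = ET.fromstring(xml_text)
    result = root.find(
        "soap:Body/ws:getProductDetailsResponse/ws:getProductDetailsResult", NS
    )
    if result is None:
        raise ValueError("response contains no getProductDetailsResult element")
    price = result.find("ws:price", NS)
    return {
        "productName": result.findtext("ws:productName", namespaces=NS),
        "productID": result.findtext("ws:productID", namespaces=NS),
        "description": result.findtext("ws:description", namespaces=NS),
        "price": float(price.text),
        "currency": price.get("currency"),
        "inStock": result.findtext("ws:inStock", namespaces=NS) == "true",
    }
```

The serializer may choose a different namespace prefix than the slide does; the messages are still equivalent, because XML namespaces are compared by URI and not by prefix.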

Basic concepts of the web service:

‣ XML (Extensible Markup Language): Format for input and output.

‣ Request: An XML message generated by the user/client and sent to a service

‣ Response: An XML message generated by the server as an answer to the request.

‣ WSDL (Web Services Description Language) file: Most often published by the service provider. Describes all aspects of the service

‣ Message: A detailed declaration of how input and output looks.

‣ XSD (XML Schema Definitions) is the language for defining the types that form the messages.

‣ Types: Declarations of input and output objects can be within the WSDL file itself, or linked externally.

‣ Endpoint: A URL pointing the client to a location which will read the request and respond.

‣ Documentation: Every object element/attribute can be documented
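To make the request, response, and endpoint concepts concrete, here is a minimal Python sketch of how a client might send a request to an endpoint. It assumes a SOAP 1.1 service, where the envelope is sent as an HTTP POST with a `text/xml` content type and a `SOAPAction` header. The function names and any endpoint URL or action value passed in are placeholders for illustration, not details taken from a real service.

```python
import urllib.request


def make_soap_request(endpoint, envelope, soap_action=""):
    """Wrap an XML envelope (bytes) in an HTTP POST aimed at the endpoint URL."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        # SOAP 1.1 expects the action as a quoted string; it may be empty.
        "SOAPAction": f'"{soap_action}"',
    }
    return urllib.request.Request(
        endpoint, data=envelope, headers=headers, method="POST"
    )


def call_service(endpoint, envelope, soap_action="", timeout=30.0):
    """Send the request and return the raw XML response as bytes."""
    request = make_soap_request(endpoint, envelope, soap_action)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

In practice a SOAP module would read the endpoint and action from the WSDL, so the client only needs to be given the WSDL location.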

How does the WSDL look?

• Written in XML

• Looks complicated - most developers are afraid to write them by hand

• Personally, I believe they should be written by hand in a simple text editor...

• But let’s first look at an XSD file - defining objects used for the EasyGene service, which is an ORF finder

The XSD: Declaring types used for messages

Data types

Detailed documentation

The WSDL: Importing types, defining messages

The WSDL: Linking operations and messages

The WSDL: Linking operations to endpoints

The WSDL: Service documentation

The WSDL: Endpoint

Web Services design considerations

• Common data types

• Granularity

• Typing

Granularity, Granularity, and Granularity

• Our choice of technology sets standards for typing any element in input/output - and these standards should be used!

• To a certain extent, Web Services is all about plumbing - connecting objects (pipes) from different operations to build a workflow and finally to generate a result for you

• The poorer the granularity, the more difficult this plumbing becomes

Tropomyosin isoforms

Granularity

Typing

array

id: string

sequence: string - restrictions? /^[ACDEFGHIKLMNPQRSTVWY]+$/
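The typing idea above (an array of entries, each with a string id and a string sequence restricted to the twenty amino acid letters) can be sketched in Python. This is an illustrative sketch only: in a real service the restriction would be declared in the XSD and enforced by schema validation. The class name `SequenceEntry` and the function `validate_entries` are assumptions; the pattern is the one shown on the slide.

```python
import re
from dataclasses import dataclass

# Same character class as the restriction on the slide; fullmatch() plays the
# role of the ^ and $ anchors and also rejects a trailing newline.
AMINO_ACIDS = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


@dataclass(frozen=True)
class SequenceEntry:
    id: str
    sequence: str

    def __post_init__(self):
        if not isinstance(self.id, str) or not self.id:
            raise ValueError("id must be a non-empty string")
        if not isinstance(self.sequence, str) or not AMINO_ACIDS.fullmatch(self.sequence):
            raise ValueError(f"sequence for {self.id!r} violates the amino acid restriction")


def validate_entries(raw_entries):
    """Convert a list of {'id': ..., 'sequence': ...} dicts into typed entries."""
    return [SequenceEntry(item["id"], item["sequence"]) for item in raw_entries]
```

Rejecting bad input at this point means the server never has to guess what a malformed sequence was supposed to be.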

Requirements

• You will not need to examine the WSDL and XSD files during this workshop.

• You should just know that they exist and understand the idea behind the WSDL

3 methods to invoke web services:

• Workflow editors: Graphical. Less user un-friendly (supposedly)

• Text/XML editors: Easy-to-use. Fewer features. For exploring services

• Programmatic access: For writing clients and automating tasks.

Graphical workflows

E.g. Taverna: http://taverna.sourceforge.net/

Raw text/XML

SoapUI: www.soapui.org

Perl/Python/C

SOAP modules exist for most programming languages: Perl, Python, C, Java ...

Demonstration using SoapUI

SoapUI is ...

- Java-based stand-alone software for inspecting, invoking, and testing Web Services.

- Downloadable free of charge from http://www.soapui.org

- A ‘pro’ edition with extended features is available for purchase.

Why SoapUI

• Ideal for development/testing of Web Services: it provides the raw XML request/response to/from your server and lets you view HTTP headers, attachments etc.

• Constructs template/default request messages based on the WSDL/XSD

• SoapUI's strength is inspecting and manually invoking operations; however, the Pro edition supports workflows/test cases!

• Handles multiple WSDLs within a project, and multiple projects within a session

Example: Inspecting two services

• Genome Atlas (http://www.cbs.dtu.dk/ws/GenomeAtlas/): various database tools accessing prokaryotic genome sequences

• RNAmmer (http://www.cbs.dtu.dk/ws/RNAmmer): predicts ribosomal RNA genes in full genome sequences.

Creating a new project

Labeling the project and adding WSDL

Default requests are generated automatically

We provide a GenBank accession to ‘getSeq’

http://www.cbs.dtu.dk/ws/RNAmmer

RNAmmer online documentation

Adding a new WSDL to the project

Copy the genome sequence from ‘getSeq’

Paste genome sequence in RNAmmer ‘runService’

Submit and acquire jobid

Poll queue until job finished

Fetch the result, using ‘fetchResult’

Asynchronous operations
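The submit, poll, and fetch steps shown above follow a general pattern that is the same in any client language. Below is a minimal Python sketch of that pattern. The three callables stand in for the actual web service operations (for example ‘runService’ and ‘fetchResult’), and the status strings are hypothetical; a real service defines its own values in its WSDL/XSD.

```python
import time

# Hypothetical status values, used for illustration only.
FINISHED = "FINISHED"
FAILED_STATES = {"FAILED", "REJECTED", "UNKNOWN JOBID"}


def run_async_job(submit, poll, fetch, payload,
                  interval=5.0, max_polls=120, sleep=time.sleep):
    """Submit a job, poll until it finishes, then fetch and return the result.

    submit(payload) -> job id
    poll(job_id)    -> status string
    fetch(job_id)   -> result
    """
    job_id = submit(payload)
    for _ in range(max_polls):
        status = poll(job_id)
        if status == FINISHED:
            return fetch(job_id)
        if status in FAILED_STATES:
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        sleep(interval)  # wait before asking again, to avoid hammering the server
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing the operations in as functions keeps the loop independent of any particular SOAP module, and the `sleep` parameter makes the loop easy to test without waiting.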

EasyGene - the programmatic way

• Similar example, this time programmed in Perl

• Use Genome Atlas ‘getSeq’ to obtain a genome sequence, and submit it to the ORF finder EasyGene.

• Poll for the job status and obtain result.

First, obtaining a Genome Sequence

EasyGene client : load fasta from STDIN
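The client shown in the workshop was written in Perl and is not reproduced in this transcript. As a rough illustration of the "load FASTA from STDIN" step only, here is a Python sketch of a small FASTA reader; the function name `read_fasta` is an assumption and the code is not a translation of the original client.

```python
def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream.

    The identifier is the first word after '>'; sequence lines are joined.
    """
    identifier = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)
```

Reading from standard input is then a matter of calling `read_fasta(sys.stdin)` after `import sys`, and each pair can be placed in the request message sent to the service.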

EasyGene client : function to predict ORFs

Case story: metagenomic application of BLAST atlases

A service which allows visualization of homology between a reference genome and any number of genomes, metagenomic samples, or sequence databases.

Case-story: the BLASTatlas WS

Example: Seven ocean samples from various depths (surface to 4km): 63,837,557 nucleotides, in 65,674 sequences.

24,978 proteins from 12 fully sequenced Prochlorococcus marinus genomes

Figure: BLASTatlas of P. marinus str. MIT 9303, 2,682,675 bp (axis 0M to 2.5M). Green = P. marinus genomes; blue = ocean samples, from surface to 4km.

Conclusion: pros and cons

• Web services are almost guaranteed to be more cumbersome to access than their web application counterparts.

• Web services allow automation and integration into your programming language.

• You need only a single (WSDL) file to allow your client to connect to the service

Pros and cons

• Input (request) can be validated - remember the XSD!

• Should failures occur

Looking into the crystal ball ...

• Will WS replace the bioinformatician or require more of them?

• Will WS ease the access to tools and databases?

• Will WS save time?

• Will WS allow more complex analyses that were impossible before?

• Will WS take over - and what will they take over?

Acknowledgements

David W. Ussery

Craig Benham

Karin Lagesen

Jan Christian Bryne

Francisco Roque

Kristoffer Rapacki

Hans Henrik Stærfeldt