vaughn aip walkthru_pag2015

araport.org

Extending the Arabidopsis Information Portal: A Developer’s Perspective

Matt Vaughn
Director, Life Sciences Computing
Texas Advanced Computing Center
@mattdotvaughn | www.slideshare.net/mattdotvaughn

Upload: araport

Post on 14-Aug-2015



Web APIs: Problem Statement

• Lack of web services for legacy data
  – There are a lot of web SITES
• Existing web services don’t share information architecture
  – Negatively impacts interoperability, discoverability, & usability
• Browser security models are punitively complex
  – Hard to build apps integrating multiple sources

Gold standard Data APIs

• Implement REST-like interfaces
• Served over HTTPS (with valid SSL certificate)
• Allow Cross-Origin Resource Sharing (CORS)
• Require authentication
  – Understand and respond to client demographics
  – Meter access to services
• Simple controlled vocabulary + metadata for query parameters
• Responses conform to accepted JSON schemas*
• Support future AIP deep caching & mining efforts**

* Except where it makes sense not to
** Based on tech like ElasticSearch or neo4j

Araport Service Architecture

[Figure: architecture diagram. Labels: RESTful API @ https://api.araport.org/; CLI clients, scripts, 3rd party applications; Agave Core (apps, meta, files, profile, jobs, systems); ADAMA (manage, enroll); Data APIs a, b, c, d, e, f; AIP + 3rd party data providers (REST*, CGI, SOAP, New Web Services, InterMine, Chado & Tripal); physical resources (Computing, Storage, Database)]

API Types
• Query
• Map*
• Generic
• Pass-through

• Single sign-on
• Metering
• Unified logging
• API versioning
• Automatic HTTPS + CORS

Araport Service Architecture

[Figure: the same architecture diagram, repeated. Labels: RESTful API @ https://api.araport.org/; CLI clients, scripts, 3rd party applications; Agave Core (apps, meta, files, profile, jobs, systems); ADAMA (manage, enroll); Data APIs a, b, c, d, e, f; AIP + 3rd party data providers (REST*, CGI, SOAP, New Web Services, InterMine, Chado & Tripal); physical resources (Computing, Storage, Database)]

API Types
• Query
• Map*
• Generic
• Pass-through

• Single sign-on
• Throttling
• Unified logging
• API versioning
• Automatic HTTPS

Data API Types

Type | Inputs | Outputs | Notes
query | AIP parameters mandatory | AIP-aligned JSON | Gold standard data APIs
map | AIP parameters preferred | Transformed JSON | Ideal for implementing namespace transformations or filters
generic | AIP parameters preferred | Specified within code but can be any valid Content-type | Implement return of non-JSON data
passthrough | Specified by remote service | Specified by remote service | Allows existing services to be discoverable from AIP data store

Data API Reserved Parameters

Name | Description | Validator (case-insensitive)
locus | AGI Gene Locus Identifiers | AT[1-5GM][0-5]{5,5}$
transcript | AGI Transcript Identifiers | AT[1-5GM][0-9]{5,5}.[0-9]{1,3}$
identifier | Another string plausibly expected to identify a gene or transcript | Valid alphanumeric string. No whitespace.
chromosome | A. thaliana Col-0 chromosome identifiers | CHR[1-5MC]$
start/end | Coordinates within Col-0 assembly | Numeric. Should be range-checked.
strand | Defines genomic strand | [\+\-\.]{1,1}
accession | Ecotypes or natural accessions | Not validated at present
term | Generic search term | Valid text string. Useful for implementing full-text search
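As a rough illustration of how the reserved parameters might be checked before a query is forwarded, here is a minimal Python sketch. Note that the locus and transcript validators as printed on the slide would reject a real identifier such as AT2G26230 (there is no place for the "G"), so the patterns below are assumed rewrites, not the expressions Araport actually uses. The `identifier` pattern and the `validate_params` helper are likewise hypothetical.

```python
import re

# Assumed, corrected patterns. These are illustrative rewrites of the
# validators in the table, not the ones deployed by Araport.
VALIDATORS = {
    "locus": re.compile(r"^AT[1-5MC]G[0-9]{5}$", re.IGNORECASE),
    "transcript": re.compile(r"^AT[1-5MC]G[0-9]{5}\.[0-9]{1,3}$", re.IGNORECASE),
    "identifier": re.compile(r"^[A-Za-z0-9_.\-]+$"),  # no whitespace allowed
    "chromosome": re.compile(r"^CHR[1-5MC]$", re.IGNORECASE),
    "strand": re.compile(r"^[+\-.]$"),
}


def validate_params(params):
    """Return a dict of {name: error message} for parameters that fail."""
    errors = {}
    for name, value in params.items():
        if name in ("start", "end"):
            if not str(value).isdigit():
                errors[name] = "must be a non-negative integer"
            continue
        pattern = VALIDATORS.get(name)
        if pattern is not None and not pattern.match(str(value)):
            errors[name] = "does not match " + pattern.pattern
    # Range check: start must not exceed end when both parsed cleanly.
    if "start" in params and "end" in params and not errors.keys() & {"start", "end"}:
        if int(params["start"]) > int(params["end"]):
            errors["start"] = "must be <= end"
    return errors


print(validate_params({"locus": "AT2G26230", "strand": "+"}))
print(validate_params({"locus": "AT9G00000", "start": 500, "end": 100}))
```

Parameters without a validator (accession, term) pass through unchecked, matching the "not validated at present" note in the table.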

Rationalized Responses via lightweight JSON schemas

• Facilitate creation of mash-up client applications

• Enable extraction and mining of the Arabidopsis deep web

• Facilitate future interoperability with semantic web technology without forcing its adoption

Minimal, machine-validated rules for what AIP responses should look like

Example Araport JSON (1)

curl -skL -XGET -H "Authorization: Bearer 624513772fbc2caf662b9accbf10380" "https://api.araport.org/community/v0.3/aip/resolver_fetch_locus_by_synonym_v0.2/search?identifier=URIC_ARATH"

{
  "result": [
    {
      "relationships": [
        {
          "direction": "undirected",
          "type": "synonymous_with",
          "scores": [{"confidence": 1}]
        }
      ],
      "related_entity": "URIC_ARATH",
      "class": "locus_id_mapping",
      "locus": "AT2G26230",
      "related_entity_kind": "UniProtKB-ID"
    }
  ],
  "metadata": {"time_in_main": 0.020552873611450195},
  "status": "success"
}
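The same call can be sketched in Python with only the standard library. The block below builds the authenticated request without sending it, then parses a copy of the sample response to pull out the locus. The helper names `build_request` and `loci_from_response` and the `YOUR_TOKEN` placeholder are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://api.araport.org/community/v0.3"
SERVICE = "aip/resolver_fetch_locus_by_synonym_v0.2"


def build_request(identifier, token):
    """Build (but do not send) the authenticated GET request."""
    query = urlencode({"identifier": identifier})
    url = "{}/{}/search?{}".format(BASE, SERVICE, query)
    return Request(url, headers={"Authorization": "Bearer " + token})


def loci_from_response(body):
    """Return the locus of every record in a successful response body."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise ValueError("service reported status {!r}".format(doc.get("status")))
    return [record["locus"] for record in doc.get("result", [])]


# Copy of the sample response shown on the slide.
SAMPLE = (
    '{"result":[{"relationships":[{"direction":"undirected",'
    '"type":"synonymous_with","scores":[{"confidence":1}]}],'
    '"related_entity":"URIC_ARATH","class":"locus_id_mapping",'
    '"locus":"AT2G26230","related_entity_kind":"UniProtKB-ID"}],'
    '"metadata":{"time_in_main":0.020552873611450195},"status":"success"}'
)

req = build_request("URIC_ARATH", "YOUR_TOKEN")
print(req.full_url)
print(loci_from_response(SAMPLE))
```

To actually send the request, pass `req` to `urllib.request.urlopen` and feed the decoded body to `loci_from_response`.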


Interacting with Araport APIs (1)

Araport web services publish live, interactive documentation

Interacting with Araport APIs (2)

Araport web services are available in every JavaScript console

Data API namespace

Individual Data API

> Agave.api.adama.getNamespaces()


Interacting with Araport APIs (3)

Araport web services power Science Apps!

Creating an Araport Data API (1)

• Decide on a type of Data API to build
• Initialize a local Git repository
• Author a main function (Python only for now)
• Test that it works in your local Python interpreter
• Write a metadata.yml file describing the service
• Push the local repository up to GitHub*
• Perform an authenticated HTTP POST to the ADAMA service with a link to your repo
• Verify that the service was created successfully
• Test it out via HTTP request

* Or any public git server

Principles
A. All development is done on a local system
B. Almost no software dependencies beyond standard system contents
C. Source code is always public
D. Testing via same routes as usage
E. Easy to iterate if things go awry

1. Write code
2. Publish code
3. Register repository
4. Code deployed
5. Use web service
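To make the "author a main function" and "test it locally" steps concrete, here is a minimal sketch of what such a function could look like. The entry-point name `search`, its dict-in and list-out calling convention, and the `SYNONYMS` lookup table are all assumptions for illustration. The deck does not show the real ADAMA contract, so check the live docs before relying on any of this.

```python
import json

# Hypothetical lookup table standing in for a real data source.
SYNONYMS = {"URIC_ARATH": "AT2G26230"}


def search(args):
    """Assumed entry point: take a dict of query parameters, return records."""
    identifier = args.get("identifier", "")
    locus = SYNONYMS.get(identifier.upper())
    if locus is None:
        return []
    return [{
        "class": "locus_id_mapping",
        "locus": locus,
        "related_entity": identifier,
    }]


if __name__ == "__main__":
    # Local test in the Python interpreter, before anything is published.
    print(json.dumps(search({"identifier": "URIC_ARATH"}), indent=2))
```

Keeping the function free of web-framework code is what makes principle A workable: the same function can be exercised in a plain interpreter and later invoked by the hosting service.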

Science Apps: Problem Statement

• Technical hurdles for developing web applications
  – Technology selection
  – Development and testing environment setup
• The small number of applications that get built are often not reusable


Apps Infrastructure

Apps Development

• Industry-standard, open-source tooling
  – Node.js
  – Yeoman
  – Grunt
  – Bower
• Application generator for quickly bootstrapping application development

$ yo aip-science-app
$ grunt

App Security

• Apps deployed to AIP are sandboxed
  – Only the user creating the app can access/use it
  – Publication workflow for AIP staff to review code and functionality before making an app public
• App code is partitioned
  – Kept separate from the rest of AIP Portal code
  – Only executes in user’s browser, not on server
• App artifact hosting is limited
• Apps on AIP have an open-source requirement

Apps Workspace

• Drupal module
• Apps upload/ingest from public git repositories
• User-created “workspaces”
• Private, shared*, public apps


Apps Workspace (2)

Apps Examples

• Query app (ATTED-II)
• Visualization app (EBI Interaction Viewer)
• Computational app (BLAST)
• Other types (Notebook)

Developer Support

Online Tutorial Topic | Link
Getting started | http://bit.ly/aip-get-started
Technical overview | http://bit.ly/aip-overview
Your first AIP app | http://bit.ly/aip-first-app
Araport APIs and authentication | http://bit.ly/aip-agave-auth
Creating a data-driven application | http://bit.ly/aip-build-app
Deploying your app to Araport | http://bit.ly/aip-deploy
Creating web services for Araport | http://bit.ly/aip-websvcs
Linking to Araport content | http://bit.ly/aip-link

• Bookmark araport.org/devzone
• Follow @araport on Twitter
• Join araport-developers Google Group
• Follow Arabidopsis-Information-Portal GitHub

Chris Town, PI
Lisa McDonald, Education and Outreach Coordinator
Chris Nelson, Project Manager
Jason Miller, Co-PI, JCVI Technical Lead
Erik Ferlanti, Software Engineer
Vivek Krishnakumar, Bioinf. Engineer
Svetlana Karamycheva, Bioinf. Engineer
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Matt Vaughn, co-PI
Steve Mock, Portal Engineer
Rion Dooley, API Engineer
Matt Hanlon, Portal Engineer
Maria Kim, Bioinf. Engineer
Ben Rosen, Bioinf. Analyst
Joe Stubbs, API Engineer
Walter Moreira, API Engineer


ADAMA Road Map

• Automatic live documentation including params
• Parameter validation at query time
• Response validation via JSON schema
• Automated provenance and attribution
• Language support (Java, JavaScript, Perl)
• Full command line interface
• Status monitoring and notification
• Better “Data API Store”
• Per-namespace and per-service Access Control Lists

Community Engagement

• Existing APIs + source turned over to the community for additional development
• Community request for comment (RFC)
  – Parameter metadata
  – JSON Response schemas
  – Provenance and attribution features
• Developing documentation, examples and tutorial material
  – Complete the entire API publication and usage lifecycle without direct AIP intervention or personal support
• Assisting community in their development efforts

ADAMA: Araport DAta Mediator API

[Figure: diagram. Labels: AGAVE, API MANAGER, NoSQL intermediary]

Endpoint: https://api.araport.org/community/v0.3/
Live Docs: https://adama-dev.tacc.utexas.edu/api/adama.html
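Registering a Data API is described in this deck as an authenticated HTTP POST to ADAMA with a link to a git repository. A minimal sketch of building such a request against the endpoint above follows. The `/<namespace>/services` path, the `git_repository` form field, and the example namespace and repository URL are assumptions for illustration only, so consult the live docs for the actual contract.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://api.araport.org/community/v0.3"


def build_registration(namespace, repo_url, token):
    """Build (but do not send) a POST asking ADAMA to register a repository.

    The '/<namespace>/services' path and the 'git_repository' field name are
    assumptions for illustration, not a documented contract.
    """
    body = urlencode({"git_repository": repo_url}).encode("ascii")
    return Request(
        "{}/{}/services".format(ENDPOINT, namespace),
        data=body,
        headers={"Authorization": "Bearer " + token},
        method="POST",
    )


req = build_registration(
    "my-namespace",                            # hypothetical namespace
    "https://github.com/example/my-data-api",  # hypothetical repository
    "YOUR_TOKEN",
)
print(req.get_method(), req.full_url)
```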

Araport architecture (2)

[Figure: API Manager + Enterprise Service Bus diagram. Labels: Consumer Applications; Secure, rationalized REST services; Mediators (Simple Proxy, Cache, XML-to-JSON, SOAP-to-REST, CGI-to-REST, Throttle, Simple Proxy); Legacy API A; Legacy API B; REST API C; ThaleMine, Data integration, other services]

• Single sign-on
• Throttling
• Unified logging
• API versioning
• Mediation and translation
• Dev-friendly interfaces
• Rationalized REST for consumer apps

Science Objectives

• Make more, varied data available to the Arabidopsis (and other) communities within a unified user experience
• Enhance the innate value of data by offering enhanced search, retrieval, and display capabilities
• Facilitate analysis of user data
• Enable community participation in functional annotation

Technical Objectives

• Deploy a responsive, flexible community-extensible system
• Provide APIs everywhere!
• Promote and facilitate data integration
• Enable language- and region-specific presentation of scientific content
• Meet mobile computing on its own terms

Local vs. Data-driven Apps

Local (example: Photoshop Express): Resources are local and inherently offline. Operating on local data using local computing.

Data-driven (example: KAYAK Pro): Resources are cloud-based and inherently online. Multiple data streams integrated, queried, presented in context of broader objective.

Araport Bill of Materials

• Araport is currently built using
  – Drupal 7.25
    • Developer-oriented content management system
  – Bootstrap.js and some other JavaScript toolkits
  – InterMine (with modifications)
  – Bioinformatics infrastructure + misc. other bits
  – Agave 2.0 Software as a Service platform
    • Developed by iPlant Collaborative project
    • Bulk data, metadata, authentication, HPC app and job management, notifications & events, and more
    • OAuth2 out of the box
    • Enterprise service bus (ESB) architecture
    • http://agaveapi.co/

Araport APIM Architecture (1)

[Figure: Araport API Manager diagram. Labels: Agave wso2 interface; JSON Query, JSON Response; REST endpoints (SNP by Locus, Indel by Position, Enroll, Manage); two mediator pipelines, each with Listen, Input Key Map, Input Transform, Send, Listen, Output Transform, Output Key Map, Respond; Remote Services (POLYMORPH CGI, Form, CSV); Cache (Technology TBD); ElasticSearch]

Araport Architecture: Use Cases (1)

• 1001 Genomes POLYMORPH tools
  – Provides variation data via locus or positional search
  – Total of seven variant types available for search
  – Search parameterization depends a lot on variant type
  – Example of a plain-text CGI service
  – Returns results as CSV with named columns
• Objective: Transform into a RESTful API that expects and returns rationalized JSON

http://polymorph.weigelworld.org
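The output half of that transformation (CSV with named columns in, JSON out) can be sketched in a few lines of Python. The column names, the `OUTPUT_KEY_MAP` contents, and the sample rows are invented for illustration and are not taken from the real POLYMORPH service.

```python
import csv
import io
import json

# Hypothetical output key map: CSV column name -> rationalized JSON key.
OUTPUT_KEY_MAP = {
    "Chr": "chromosome",
    "Pos": "start",
    "Ref": "reference",
    "Alt": "variant",
}


def csv_to_records(text, key_map=OUTPUT_KEY_MAP):
    """Turn CSV with a header row into a list of dicts with renamed keys."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Unmapped columns keep their name, lower-cased.
        records.append({key_map.get(col, col.lower()): val for col, val in row.items()})
    return records


# Made-up sample of what a plain-text CGI service might return.
sample = "Chr,Pos,Ref,Alt\nChr1,1024,A,T\nChr1,2048,G,C\n"
print(json.dumps({"status": "success", "result": csv_to_records(sample)}, indent=2))
```

Values stay as strings here. A fuller mediator would also cast coordinates to integers and apply an input key map in the opposite direction.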

Araport Architecture: Use Cases (2)

• ThaleMine
  – Has native REST interface for general queries
  – Has templates which can form basis of specific services
• Objective: Offer both InterMine-native and AIP-conformant interfaces as Data APIs
• Current path
  – Enroll native services in our APIM
  – Develop template-based AIP-conformant services

http://polymorph.weigelworld.org

Data APIs: Getting Started

Service | Queries | Notes
BAR eFP | Locus |
BAR Expressologs | Locus |
BAR Interactions | Locus |
COGe | Position | Special case – output transform only
NASC $SERVICE | Locus | SOAP based but may be offline permanently
OrthologFinder | Locus | Based on a ThaleMine template
POLYMORPH | Locus, Position | Actually seven CGI services
SUBA3 | Locus |

Compiling example queries, parameter mapping and description, and ideal results for use in implementing the system

Developing a Data API

• In order, we prefer that you have ready
  – Well-documented REST
  – Moderately well-documented REST
  – SOAP services (plus WSDL or WADL)
  – Plain Old XML
  – Plaintext CGI
  – HTML CGI
  – No web services at all
• Work with us to enroll your services as a data source. This will involve a minor amount of coding.

Computational App Model (1)

[Figure: computational app model diagram. Labels: Host OS; Docker.io; Container (CentOS 6.4, custom-repo); /scratch/; database; araport-compute-00; araport-storage-00; Host file systems: Host FS (250 GB), TACC Corral (PB+); sftp; Agave apps, data, jobs; REST API x JSON objects]

Science Apps: Grid View

• Current Scheme
  – 2-3 column view with draggable apps
  – Apps are normal, full-size, or collapsed
  – Single app screen
• Later in 2014
  – N x X grid scheme implementing resizable app “tiles” like one sees in Android or Win8.x
  – App SDK libraries will have “help” for enabling resizable design
  – Multiple app screens

Data API Details (2)

• For service-specific parameters
  – Provide human-readable names mapped to original parameter names
  – Offer minimal descriptive text
  – Specify validation
    • Cardinality
    • Pattern validator (regex)
    • Type (number, string, etc.)
  – Indicate whether required
  – Indicate whether they should be visible in a UI
  – Specify reasonable default values
• Seems familiar?
  – This approach is used to abstract command line apps
  – Allows automatic generation of minimally functional UI
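One way to picture the parameter description above is as a small record per parameter that carries the name mapping, validation rules, and default. The sketch below is a hypothetical Python model and not ADAMA's actual metadata format. The `ParamSpec` class, its field names, and the `flank`/`fl` example parameter are all invented.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParamSpec:
    name: str                     # human-readable name shown to users
    maps_to: str                  # original parameter name on the remote service
    description: str = ""
    cast: Callable = str          # type conversion (number, string, etc.)
    pattern: Optional[str] = None
    min_count: int = 1            # cardinality: minimum number of values
    max_count: int = 1            # cardinality: maximum number of values
    required: bool = True
    visible: bool = True          # whether a generated UI should show it
    default: Optional[object] = None

    def resolve(self, values):
        """Validate a list of raw values; return (remote name, typed values)."""
        if not values:
            if self.required and self.default is None:
                raise ValueError(self.name + " is required")
            values = [] if self.default is None else [self.default]
        if values and not self.min_count <= len(values) <= self.max_count:
            raise ValueError(self.name + " has the wrong number of values")
        typed = []
        for raw in values:
            if self.pattern and not re.fullmatch(self.pattern, str(raw)):
                raise ValueError("{}={!r} fails pattern".format(self.name, raw))
            typed.append(self.cast(raw))
        return self.maps_to, typed


flank = ParamSpec(name="flank", maps_to="fl", cast=int,
                  pattern=r"[0-9]+", required=False, default=500)
print(flank.resolve([]))
print(flank.resolve(["250"]))
```

Because each spec carries a description, a type, and a visibility flag, a minimally functional form could be generated by iterating over a list of such records.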

Data APIs: Response types (1)

• locus_relationship – pairwise relationship between A and B
  – Directionality
  – Type
  – Array of scores (weights, etc.)
• sequence_feature – positional attribute
  – Extension of GFF model plus
    • Build
    • Attributes array
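As a sketch of what a locus_relationship record might carry, the Python model below holds the three listed properties (directionality, type, scores) and serializes them in the same layout as the sample JSON response shown earlier in this deck. The class name, the allowed direction values, and the `to_json_dict` helper are assumptions and not a published AIP schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DIRECTIONS = {"directed", "undirected"}  # assumed vocabulary


@dataclass
class LocusRelationship:
    locus: str                    # entity A
    related_entity: str           # entity B
    direction: str
    kind: str                     # emitted as "type" in the JSON
    scores: List[Dict[str, float]] = field(default_factory=list)

    def __post_init__(self):
        if self.direction not in DIRECTIONS:
            raise ValueError("unknown direction: " + self.direction)

    def to_json_dict(self):
        """Serialize using the layout of the sample response in this deck."""
        return {
            "class": "locus_relationship",
            "locus": self.locus,
            "related_entity": self.related_entity,
            "relationships": [{
                "direction": self.direction,
                "type": self.kind,
                "scores": self.scores,
            }],
        }


rel = LocusRelationship("AT2G26230", "URIC_ARATH", "undirected",
                        "synonymous_with", [{"confidence": 1}])
print(rel.to_json_dict())
```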

Data APIs: Response types (2)

• locus_feature – key-value attributes per locus
  – Optional controlled vocabulary* for keys
  – Support for both slots and arrays
• raw – for returning images or other binary formats
  – Source and other metadata carried in X-headers instead of JSON result
  – Outbound transformation still supported
  – Not a preferred response mode
• text – returning either native service response or a non-conformant JSON document
  – Source and other metadata carried in X-headers instead of JSON result
  – Not a preferred response mode

Data API Details (6)

• Transparent caching will compensate for transient remote service failures
• Automatic indexing of certain response types via ElasticSearch, allowing for sophisticated global search
  – ElasticSearch allows us to index everything we “know about” and return it quickly
  – iPlant uses it to live-index >700 TB user data
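The idea of caching that masks transient remote failures can be sketched as a small wrapper that prefers fresh results but falls back to a stale copy when the remote call raises. This is an illustrative in-memory Python version with invented names (`FallbackCache`, `flaky_fetch`). The deck lists the actual cache technology as TBD.

```python
import time


class FallbackCache:
    """Serve fresh results when possible, stale ones when the remote fails."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable: key -> value, may raise
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}             # key -> (timestamp, value)

    def get(self, key):
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # still fresh, skip the remote call
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]     # remote is down: fall back to stale copy
            raise                    # nothing cached, surface the failure
        self._store[key] = (now, value)
        return value


state = {"down": False}


def flaky_fetch(key):
    """Hypothetical remote call that can be switched into a failing state."""
    if state["down"]:
        raise IOError("remote service unavailable")
    return {"locus": key}


cache = FallbackCache(flaky_fetch, ttl_seconds=0)  # ttl 0: always try remote
print(cache.get("AT2G26230"))
state["down"] = True
print(cache.get("AT2G26230"))                      # served from the stale copy
```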

Developing an app

• Understand and document the user stories you’re addressing with your app
• Identify all requisite data sources AND
• Help us prepare them as Data APIs
  – This may involve coding
• Understand the data integration or aggregation needs of your app
  – This may involve coding
• Develop the user interface(s) for your app using our tool kits and suggested practices
  – This will involve coding.
  – But you will learn tools like jQuery, Bootstrap, & D3 and will thus be eminently employable!