
Post on 07-Aug-2015


Interoperation between InterMines

Legume Federation, June 22, 2015

Vivek Krishnakumar, Chris Town
J. Craig Venter Institute

InterMine in a nutshell

• Open-source data warehouse software
• Integration of complex biological data
• Parsers for common biological data formats
• Extensible framework for custom data
• Cookie-cutter interface, highly customizable
• Interact using sophisticated web query tools
• Programmatic access using web-service API

Open-source Project

• Source code available online
• Distributed with the GNU LGPL license
• GitHub Repo: https://github.com/intermine/intermine
• GitHub Organization: https://github.com/intermine

Repository contents (intermine/intermine): bio, biotestmine, config, flymine, humanmine, imbuild, intermine, testmodel, .gitignore, .travis.yml, LICENSE, LICENSE.LIBS, README.md, RELEASE_NOTES

Richard N. Smith et al. Bioinformatics 2012;28:3163-3165

InterMine system architecture


Web Application
• Java Server Pages (JSP), HTML, JS, CSS
• Interfaces with Java Servlets and InterMine web services

Web Server
• Tomcat 7.0.x, serves the Web application ARchive (WAR) file
• Ant-based build system using the Java SDK

Database Server
• PostgreSQL 9.2 or above
• Range query, btree, gist enabled (see the docs: http://intermine.readthedocs.org/en/latest/system-requirements/)

Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472

InterMine web services

http://iodocs.labs.intermine.org

JBrowse

Federated Authentication

• Apart from the standard login scheme (username/password), InterMine supports industry-standard OAuth2-based login flows, as implemented by Google, GitHub, Agave, etc.

• ThaleMine (Arabidopsis) relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure

• Documentation available here: http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
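As a rough illustration of what an OAuth2-based login flow starts with, the sketch below builds a generic authorization-code request URL in the sense of RFC 6749. It is not InterMine's implementation, and it says nothing about InterMine's configuration keys (those are in the documentation linked above). The endpoint, client id, redirect URI, and scope are all hypothetical placeholders.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Return (url, state) for a generic OAuth2 authorization-code request."""
    # Random, unguessable value the client checks again when the provider
    # redirects back, to protect against cross-site request forgery.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state


# Hypothetical values, for illustration only.
url, state = build_authorization_url(
    "https://provider.example.org/oauth2/authorize",
    client_id="my-mine-client-id",
    redirect_uri="https://mine.example.org/oauth2callback",
    scope="openid email",
)
print(url)
```

The user's browser is sent to this URL; after login the provider redirects back with a short-lived code, which the web application exchanges for a token on the server side.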

Interoperability?

• Ability of InterMine instances to communicate ‘automatically’ with each other

• By way of leveraging web services
• Questions to be answered:
  - What do they say to each other?
  - How do they say it?
  - What mechanisms are used?
  - Enabling these mechanisms…

Data Model

• Data Model === Schema of InterMine instance

• Defined in XML format
• Core data model (based on the Sequence Ontology, SO) can be extended to suit requirements
• Access a mine's data model in JSON format: http://MINE_URL/service/model/?format=json

• Compatibility of data models across mines ensures interoperability
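One way to check how compatible two mines are is to download both data models and compare them. The sketch below is a minimal example under stated assumptions: the URL follows the pattern on the slide, and the JSON is assumed to have the layout `{"model": {"classes": {name: {"attributes": {...}}}}}`. The two small models are hand-written stand-ins, not real data, and all function names are illustrative.

```python
import json
from urllib.request import urlopen


def model_url(mine_url):
    """Build the model endpoint URL, following the pattern on the slide."""
    return f"{mine_url.rstrip('/')}/service/model/?format=json"


def fetch_model(mine_url, timeout=30):
    """Download and parse a mine's data model (needs network access)."""
    with urlopen(model_url(mine_url), timeout=timeout) as response:
        return json.load(response)


def class_attributes(model_doc):
    """Map class name -> set of attribute names (assumed JSON layout)."""
    classes = model_doc["model"]["classes"]
    return {name: set(cls.get("attributes", {})) for name, cls in classes.items()}


def compare_models(model_a, model_b):
    """For each class present in both models, list the attributes they share."""
    attrs_a, attrs_b = class_attributes(model_a), class_attributes(model_b)
    shared = sorted(attrs_a.keys() & attrs_b.keys())
    return {name: sorted(attrs_a[name] & attrs_b[name]) for name in shared}


# Tiny hand-written stand-ins for two mines' models (not real data).
mine_a = {"model": {"classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}, "symbol": {}}},
    "Protein": {"attributes": {"primaryAccession": {}}},
}}}
mine_b = {"model": {"classes": {
    "Gene": {"attributes": {"primaryIdentifier": {}, "briefDescription": {}}},
    "Allele": {"attributes": {"primaryIdentifier": {}}},
}}}
print(compare_models(mine_a, mine_b))  # {'Gene': ['primaryIdentifier']}
```

Against live mines, `compare_models(fetch_model(url_a), fetch_model(url_b))` would give the same kind of report; classes and attributes that appear in the result are the ones a shared script can rely on.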

Advantages of common data model

• Data mining scripts developed for one mine are immediately compatible with others
• Promotes crowdsourcing: tools/widgets/parsers written by one or more groups can be easily reused by others
• Enables cross-species analysis

Available tools

• Multi-mine search tool: https://github.com/alexkalderimis/multimine-search-tool
  - Based on InterMine Lucene-based search index
  - Allows for interoperation when data models are different
• Integration based on Homologs:
  - Ontology integration using `dagify`: https://github.com/intermine/dagify
  - Pathway integration by way of collating shared pathways
• InterMine Staircase: http://staircase.herokuapp.com
  - Powerful client-side interface enabling data analysis workflows and cross-mine integration via web services
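A keyword search across several mines can be sketched as: send the same term to each mine's search service, merge the hits, and count them by type. The code below is a simplified sketch, not how the tools above are implemented. It assumes a `/service/search` endpoint taking a `q` parameter and hits shaped like `{"type": ..., "relevance": ...}`; the sample hits are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlencode


def search_url(mine_url, term):
    """Build a keyword-search URL for one mine (assumed endpoint)."""
    return f"{mine_url.rstrip('/')}/service/search?{urlencode({'q': term})}"


def merge_hits(hits_by_mine):
    """Tag each hit with its source mine and sort by relevance, best first."""
    merged = [
        dict(hit, mine=mine)
        for mine, hits in hits_by_mine.items()
        for hit in hits
    ]
    merged.sort(key=lambda hit: hit["relevance"], reverse=True)
    return merged


def facet_by_type(hits):
    """Count hits per object type, as a simple facet."""
    return Counter(hit["type"] for hit in hits)


# Invented sample hits standing in for two mines' search responses.
hits_by_mine = {
    "MineA": [{"type": "Gene", "relevance": 0.9}, {"type": "Protein", "relevance": 0.4}],
    "MineB": [{"type": "Gene", "relevance": 0.7}],
}
merged = merge_hits(hits_by_mine)
print([(hit["mine"], hit["type"]) for hit in merged])
print(facet_by_type(merged))
```

Because a keyword search works on indexed text rather than on model paths, this approach still works when the mines' data models differ. Relevance scores from separate indexes are not strictly comparable, so the merged ordering is only a rough ranking.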

InterMine Staircase

• Configure access to multiple mines
• Cross-mine search
• Filter results by facets
• Prepare and enrich lists
• Perform mine-to-mine list conversions
• App/tool compatibility
• Application model

MedicMine SoyMine....

Available Reference Mines

• ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
  - Integrates a variety of genomic datasets pertaining to Arabidopsis thaliana Col-0
  - Leverages both data warehousing and federation methods
  - Represents a wide variety of data: genes, proteins, function, expression, co-expression, interactions, pathways, homologs, alleles, polymorphism, stocks, germplasm, phenotypes
• MedicMine: https://github.com/jcvi-plant-genomics/intermine/
  - Warehouse for Medicago truncatula A17 genomic data
  - Houses a variety of data: genes, proteins, function, expression
• PhytoMine: https://github.com/JoeCarlson/intermine/
  - Warehouse for 47 different Angiosperm genomes
  - Developed on a Chado-to-InterMine migration path
  - Houses a variety of data: genes, proteins, expression, homologs, protein families, variation
• FlyMine: https://github.com/intermine/intermine/

Recommendations and Challenges

• Recommendations:
  - Develop a core plant InterMine model
  - Follow InterMine guidelines
  - Learn from prior initiatives (InterMOD)
• Challenges:
  - Users/developers are used to the current way of doing things
  - Time taken to adapt to a common data model and/or software stack
  - Difficult to arrive at consensus with a diverse group

Acknowledgments

• InterMine Team: Gos Micklem, Julie Sullivan, Alex Kalderimis, Richard Smith, Sergio Contrino, Josh Heimbach, et al.
• Araport Team: Chris Town, Jason Miller, Matt Vaughn, Maria Kim, Svetlana Karamycheva, Erik Ferlanti, Chia-Yi Cheng, Benjamin Rosen, Irina Belyaeva
