UniProtKB/Swiss-Prot: Why SPARQL?

SPARQL: UniProtKB/Swiss-Prot why do it? Jerven Bolleman Developer Swiss-Prot Group

Upload: jerven-bolleman

Post on 13-Apr-2017


TRANSCRIPT

Page 1: UniProtKB/Swiss-Prot:Why sparql?

SPARQL: UniProtKB/Swiss-Prot why do it?

Jerven Bolleman Developer Swiss-Prot Group

Page 2: UniProtKB/Swiss-Prot:Why sparql?

What is UniProtKB/Swiss-Prot

• Central database in the Life Sciences

– Proteins -> you are made out of them

– Summarises current scientific knowledge

– Links 150+ databases together

• Swiss-Prot & Vital-IT group activities are funded by the Swiss Confederation through the SERI (State Secretariat for Education, Research and Innovation)

Page 3: UniProtKB/Swiss-Prot:Why sparql?

Our Goals

• Provide core Bioinformatics resources

– UniProtKB/

–

– …

• Provide services and infrastructure

– Vital-IT: HPC for the life sciences

– …

Page 4: UniProtKB/Swiss-Prot:Why sparql?

Why provide a public SPARQL endpoint

• A 10-person wet laboratory cannot afford:

– to host its own in-house database holding all, or even a fraction, of all life science data.

– not to have access to, and use, existing life science information.
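
As a rough illustration of what using a public endpoint instead of hosting the data could look like, the sketch below prepares a SPARQL 1.1 Protocol GET request in Python. The endpoint URL, the `up:` vocabulary terms in the query, and the helper names `build_request` and `run_query` are assumptions made for illustration; none of them appear in the slides.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public UniProt SPARQL endpoint is reachable at this URL
# (the slides do not state the address).
ENDPOINT = "https://sparql.uniprot.org/sparql"

# Assumption: up:Protein and up:reviewed are used as in the UniProt core vocabulary.
QUERY = """\
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE { ?protein a up:Protein ; up:reviewed true . }
LIMIT 5
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Prepare a SPARQL query as an HTTP GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str) -> list:
    """Send the query over the network and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]


# Building the request needs no network access; only run_query() contacts the server.
request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Calling `run_query(ENDPOINT, QUERY)` would then send the request. The point of the sketch is that the laboratory side needs only an HTTP client, not a local copy of the data.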

Page 5: UniProtKB/Swiss-Prot:Why sparql?

Not CPU time... but brain time

The right kind of optimisation

Page 6: UniProtKB/Swiss-Prot:Why sparql?

Why provide a public SPARQL endpoint

• Classical SQL can be provided on the web, but:

– Is not practical

– No federation

– Poor standards conformance

• Local SQL is expensive

• Local JSON is no better

• Nor is local XML

Page 7: UniProtKB/Swiss-Prot:Why sparql?

Data Integration Traditional

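[Diagram: Pathway.txt and UniProt.txt, each with its own parser (Pathway Parser, UniProt Parser) and its own schema (Pathway Schema, UniProt Schema), plus Own Lab Data, feeding a data warehouse queried with SQL. Six "$" cost markers are placed on the diagram.]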

Page 8: UniProtKB/Swiss-Prot:Why sparql?

Data Integration RDF/SPARQL

[Diagram: Pathway.rdf, UniProt.rdf and Own Lab Data feeding a triple store queried with SPARQL. Two cost markers are placed on the diagram: "$" and "$?".]
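
The contrast with the traditional pipeline can be sketched in a few lines of Python. This is a toy in-memory model, not a real triple store or SPARQL engine: every identifier and triple below (`ex:protein1`, `ex:hasParticipant`, and so on) is hypothetical, and `match` is a stand-in for a single SPARQL triple pattern. It only illustrates that once every source is expressed as triples, combining them is a set union instead of one parser and one schema per source.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples standing in for UniProt.rdf, Pathway.rdf and own lab data.
UNIPROT: Set[Triple] = {
    ("ex:protein1", "ex:name", "Example kinase"),
    ("ex:protein2", "ex:name", "Example transporter"),
}
PATHWAY: Set[Triple] = {
    ("ex:pathwayA", "ex:hasParticipant", "ex:protein1"),
}
LAB: Set[Triple] = {
    ("ex:protein1", "ex:measuredIn", "ex:experiment7"),
}


def match(
    store: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in store:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def measured_pathway_proteins(store: Set[Triple]) -> List[str]:
    """Names of proteins that take part in a pathway and also appear in lab data."""
    names = set()
    for _, _, protein in match(store, p="ex:hasParticipant"):
        if any(match(store, s=protein, p="ex:measuredIn")):
            names.update(name for _, _, name in match(store, s=protein, p="ex:name"))
    return sorted(names)


# Integration step: a plain set union, with no per-source parser or schema mapping.
STORE = UNIPROT | PATHWAY | LAB
print(measured_pathway_proteins(STORE))
```

The question spans all three sources, yet the code never needs to know which source a given triple came from.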

Page 9: UniProtKB/Swiss-Prot:Why sparql?

Why provide a public SPARQL endpoint

• Document-centric REST is not enough

– Swiss-Prot available as REST

– (over e-mail!!) since 1986

– expasy.ch since 1993

– www.uniprot.org since 2002

• Most users use a GUI, not a CLI

• Developers build GUIs on a CLI

Page 10: UniProtKB/Swiss-Prot:Why sparql?

© 2015 SIB

Page 11: UniProtKB/Swiss-Prot:Why sparql?
Page 13: UniProtKB/Swiss-Prot:Why sparql?

[Chart: queries per month, 2015-01 to 2015-08, with axis ticks at 100, 10'000 and 1'000'000. Series: queries, ask, select, construct, describe.]

Queries per month in 2015. Peak: 4 million per month

Page 14: UniProtKB/Swiss-Prot:Why sparql?

Real users

Mix between hard analytics and super-specific queries

Estimate: somewhere between 300 and 1000 real humans per month

We know they are real because they take holidays ;)

Page 15: UniProtKB/Swiss-Prot:Why sparql?

Using the Semantic Web for faster (Bio-) Research http://edu.isb-sib.ch/course/view.php?id=212