Linking Resource Description Framework to Cheminformatics and Proteochemometrics

Egon Willighagen <http://chem-bla-ics.blogspot.com/>

Bioclipse & Proteochemometric Group (Prof. Wikberg)
Until 2010-09-30

Department of Pharmaceutical Biosciences

Uppsala University

2010-08-22

Problem

Building Blocks

Open Data

Application

Conclusion

Proteochemometrics

2010-08-22 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Data Analysis

Knowledge...

Solanum lycopersicum...

We model our world, but ...
Life is not uni- or bivariate
Knowledge is not either
Different representations: compatible?
Information loss!

Names...

benzene

3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid

InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)

p450 (which one?? all residues known?)

Solanum lycopersicum (well....)
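
The InChI on this slide is a layered string, which makes it a convenient first step from a name toward numbers. The following is a minimal sketch, not part of the original slides: it splits an InChI into its layers and counts atoms from the formula layer. The helper names `splitInChI` and `elementCounts` are hypothetical, and the parsing assumes a well-formed, single-component, uncharged InChI; it is not a complete InChI parser.

```javascript
// The InChI string shown on the slide, copied verbatim.
const slideInChI =
  "InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)" +
  "18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33" +
  "/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)";

// Hypothetical helper: split an InChI into prefix, formula, and tagged layers.
// Assumption: layers are separated by "/" and each layer after the formula
// starts with a one-letter tag such as "c" (connectivity) or "h" (hydrogens).
function splitInChI(inchi) {
  const [prefix, formula, ...rest] = inchi.split("/");
  const layers = rest.map((layer) => ({ tag: layer[0], value: layer.slice(1) }));
  return { prefix, formula, layers };
}

// Hypothetical helper: count atoms per element in a simple formula layer.
// Assumption: no multi-component formulas ("." separators) and no multipliers.
function elementCounts(formula) {
  const counts = {};
  for (const [, symbol, n] of formula.matchAll(/([A-Z][a-z]?)(\d*)/g)) {
    counts[symbol] = (counts[symbol] || 0) + (n === "" ? 1 : Number(n));
  }
  return counts;
}

const parsed = splitInChI(slideInChI);
console.log(parsed.formula);
console.log(elementCounts(parsed.formula));
```

Unlike "p450" or a species name, every layer of the identifier can be turned into numbers mechanically, which is the contrast this slide draws.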

... Molecular reality...

1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 (that is, 10^60)

... and Numbers

Main Theme

How do we navigate this dimensional space?
How to include prior knowledge?
Minimize information loss?
With optimal knowledge extraction?
Maximizing interpretability?
Without ending up in random correlation?

OpenMolecules RDF: dereferenceable URI

http://rdf.openmolecules.net/?InChI=1/CH4/h1H4
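
As a small illustration of the URI above, the sketch below builds such a URI from an InChI. It assumes, based only on the single example on this slide, that the InChI is appended unencoded after `?`; the function name `openMoleculesUri` is hypothetical, and nothing here is claimed about how the service itself is implemented.

```javascript
// Base address taken from the slide.
const OPENMOLECULES_BASE = "http://rdf.openmolecules.net/";

// Hypothetical helper: build a dereferenceable URI for a molecule.
// Assumption: the pattern is base + "?" + InChI, as in the slide's example.
function openMoleculesUri(inchi) {
  if (!inchi.startsWith("InChI=")) {
    throw new Error("expected a string starting with 'InChI='");
  }
  return OPENMOLECULES_BASE + "?" + inchi;
}

// Methane, the example used on the slide.
console.log(openMoleculesUri("InChI=1/CH4/h1H4"));
```

Because the identifier is derived from the structure itself, anyone who can compute the InChI can construct the same URI without a lookup table.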

OpenMolecules RDF: linked data

The Chemistry Development Kit

A Family of Projects
CDK-Taverna (chemoinformatics workflows)
JChemPaint (semantic 2D editor)
ChemoJava (GPL-ed extension)

Goals
library of cheminformatics algorithms
educational

Usage
CDK: 100+ times cited in scientific literature
Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006

Bioclipse

O. Spjuth et al., BMC Bioinformatics 2007
O. Spjuth et al., BMC Bioinformatics 2010

Bioclipse-RDF

local RDF storage (memory, on disk)
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa

Thanks to Open Source projects including Jena, SWI-Prolog, and Pellet.
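
To make "run SPARQL queries (remote)" concrete, here is a minimal sketch of how a remote query can be expressed as an HTTP GET request under the SPARQL protocol, where the query text travels in a URL-encoded `query` parameter. This is plain JavaScript rather than the Bioclipse manager API, the function name `sparqlGetUrl` is hypothetical, and the endpoint `http://example.org/sparql` is a placeholder, not one of the endpoints from the talk.

```javascript
// Hypothetical helper: build the GET URL for a SPARQL protocol request.
// Assumption: the endpoint accepts queries via the standard "query" parameter.
function sparqlGetUrl(endpoint, query) {
  const url = new URL(endpoint);
  url.searchParams.set("query", query); // handles the URL encoding
  return url.toString();
}

// A generic query that lists a few triples from any RDF store.
const exampleQuery = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5";

// Placeholder endpoint; substitute a real SPARQL endpoint to run the query.
const requestUrl = sparqlGetUrl("http://example.org/sparql", exampleQuery);
console.log(requestUrl);
```

The resulting URL can be fetched with any HTTP client; within Bioclipse the same step is performed by the RDF functionality listed on this slide.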

SPARQL endpoints (Open Data)

NMRShiftDB data (C. Steinbeck, EBI/UK)
ChEMBL (J. Overington, EBI/UK)

Proteochemometrics: simple QSAR

E.L.Willighagen et al., J. Biomed. Sem., 2010, in print

Proteochemometrics: RDF input

Proteochemometrics: Bayesian + extra Priors

[Figure: two scatter plots, panels (a) and (b), of Predicted versus Actual values. Actual axis ticks: 2 to 12; Predicted axis ticks: −5 to 20.]

MyExperiment: Bioclipse Scripting Language

myexperiment.search("RDF");
myexperiment.downloadWorkflow(937);

Reasoning: Prolog and Pellet

Samuel Lampa, M.Sc. project

Semantic Wikis

Samuel Lampa, Google Summer of Code 2010

XHTML+RDFa

OpenTox: downloading

OpenTox: uploading

// requires an unspecified Bioclipse
// development version
ds = opentox.createDataset(
  "http://apps.ideaconsult.net:8080/ambit2/");
opentox.addMolecule(ds, cdk.fromSMILES("CCCCC[N+](C)(C)C"));
opentox.addMolecule(ds, cdk.fromSMILES("ClC(I)Br"));
opentox.deleteDataset(ds);

Linked Data: Visualization

Substructure mining: ChEMBL

Annsofie Andersson, M.Sc. project

Substructure mining: .. and MoSS

What does this bring us?

Platform to integrate the RDF with the computation world
Bioclipse as single point of access
Scripting, sharing of scripts with MyExperiment.org
Bridging Names to Numbers

Acknowledgements

Maris Lapins, Martin Eklund: statistics
Annsofie Andersson: ChEMBL + MoSS integration
Samuel Lampa: reasoning (Pellet/Prolog) and RDFIO
Nina Jeliazkova: OpenTox integration

The Details

http://www.citeulike.org/user/egonw/tag/papers

http://chem-bla-ics.blogspot.com

http://egonw.github.com

waveto: [email protected]
