the badapple promiscuity plugin for bard

33
The BADAPPLE promiscuity plugin for BARD Evidence-based promiscuity scores Jeremy Yang, UNM Oleg Ursu, UNM Cristian Bologa, UNM Anna Waller, UNM Larry Sklar, UNM Tudor Oprea, UNM ACS National Meeting – Indianapolis, IN -- Sep. 8-12, 2013 CINF: Integrative Chemogenomics Knowledge Mining Using NIH Open Access Resources

Upload: jeremy-yang

Post on 04-Jun-2015

601 views

Category:

Technology


0 download

DESCRIPTION

The BADAPPLE promiscuity plugin for BARD; Evidence-based promiscuity scores (ACS National Meeting, Indianapolis, IN, Sept. 9, 2013)

TRANSCRIPT

Page 1: The BADAPPLE promiscuity plugin for BARD

The BADAPPLE promiscuity plugin for BARD

Evidence-based promiscuity scores

Jeremy Yang, UNM Oleg Ursu, UNM

Cristian Bologa, UNM Anna Waller, UNM

Larry Sklar, UNM Tudor Oprea, UNM

ACS National Meeting – Indianapolis, IN -- Sep. 8-12, 2013 CINF: Integrative Chemogenomics Knowledge Mining Using NIH Open Access Resources

Page 2: The BADAPPLE promiscuity plugin for BARD

What is BADAPPLE?

• BioActivity Data Associative Promiscuity Pattern Learning Engine

• Bioassay data analysis algorithm • Scaffold association patterns • Evidence-based • Robust to noise and errors

UNM

Page 3: The BADAPPLE promiscuity plugin for BARD

What is promiscuity?

• Un-selective bioactivity • Normally undesirable • Evolving conceptions:

– Polypharmacology – Systems biology – Systems chemical biology

3

Page 4: The BADAPPLE promiscuity plugin for BARD

Promiscuity & bioassay data analysis: Selected references

• Frequent hitters (2002, Schneider &al.)

• Aggregators (2003, Shoichet &al.)

• ALARM NMR (2004, Hajduk &al.)

• Promiscuous scaffolds [talk] (2007, Bologa)

• Pan-Assay Interference (2010, Baell &al.)

• PubChem Promiscuity (2011, Canny &al.) 4

Page 5: The BADAPPLE promiscuity plugin for BARD

Prerequisites for Promiscuity Data Analysis

• Definition of unique chemical entity?

– Yes

• Definition of unique biological entity?

– Challenging

• I.e., Rigorously calibrated bioactivity matrix

5

CN/C(=C\[N+](=O)[O-])/NCCSCC1=CC=C(O1)CN(C)C.Cl

VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

Page 6: The BADAPPLE promiscuity plugin for BARD

2211 Chemicals On 159 Targets

Matrix c/o Tudor Oprea, Scott Boyer

(AZ), 2011

Bioactivity matrices depend on rigorous informatics

6

Page 7: The BADAPPLE promiscuity plugin for BARD

2211 Chemicals On 159 Targets

Bioactivity matrices depend on rigorous informatics

chemistry

biology

“1 chemistry” = unique compound or… = unique scaffold

“1 biology” = unique target? or… = unique assay?

Chemical biology space: To define a space must define a point 7

Matrix c/o Tudor Oprea, Scott Boyer

(AZ), 2011

Page 8: The BADAPPLE promiscuity plugin for BARD

Promiscuity: related concepts

• Assay-interferers • Experimental

artifacts • Frequent hitters • False positives • True positives

• True non-selective actives

• Aggregators • Reactives • Cytotoxic

8

Page 9: The BADAPPLE promiscuity plugin for BARD

Badapple promiscuity score: a practical definition for bioassay

data analysis

• Purpose: Streamline molecular discovery project workflow.

• Hence: Score designed to detect "false trails" (true-promiscuous OR false positives) unlikely to be productive leads.

9

Page 10: The BADAPPLE promiscuity plugin for BARD

What is evidence-based?

• Empirical data analysis

• Mistakes can be un-learned.

• Data driven

• Not: Expert systems

10

Page 11: The BADAPPLE promiscuity plugin for BARD

Drawbacks of evidence-based

• Theories proven wrong. Ouch.

• Reality is messy.

• GIGO.

11

Page 12: The BADAPPLE promiscuity plugin for BARD

Benefits of evidence-based

More wins. More knowledge. Lower costs. Progress.

(*) N.B. key role of Dick Cramer, ACS CINF 2013 Skolnik award winner. 12

(*)

Page 13: The BADAPPLE promiscuity plugin for BARD

BADAPPLE scoring function

• Scaffold score • By scoring scaffolds, more relevant evidence • Substances, assays and samples considered • Penalize under-sampling • Score is a statistic, not a "model"! • Inherently "validated"

13

Page 14: The BADAPPLE promiscuity plugin for BARD

Avoiding overfitting; being skeptical of scanty evidence

14

Page 15: The BADAPPLE promiscuity plugin for BARD

Avoiding overfitting: Moneyball style

Sabermetrics leads the way.

The Signal and the Noise: Why So Many Predictions Fail—But Some Don't, Nate Silver (2012).

http://mark.stubbornlights.org/phils/archives/2004_05.html

15

Page 16: The BADAPPLE promiscuity plugin for BARD

*http://pasilla.health.unm.edu/tomcat/biocomp/badapple

BADAPPLE public web app

16

Page 17: The BADAPPLE promiscuity plugin for BARD

Why Scaffolds?

biology - birds, bees, nature chemistry - chemists

SAR - drugs IP - lawyers

Page 18: The BADAPPLE promiscuity plugin for BARD

Molecular scaffolds are special and useful guides for discovery

Jeremy Yang, UNM & IU Cristian Bologa, UNM

David Wild, IU Tudor Oprea, UNM

ACS National Meeting - Sept. 8-12, 2013 - Indianapolis, IN

CINF grad student symposium 9/8/13 & ACS SciMix 9/9/13 8pm

Page 19: The BADAPPLE promiscuity plugin for BARD

There's something about scaffolds... Especially the "privileged few"...

pScore

Count

Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714

19 Promiscuity score distribution

Page 20: The BADAPPLE promiscuity plugin for BARD

Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity...

% of active

samples

i_scaf, descending pScore order

Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958

20

Page 21: The BADAPPLE promiscuity plugin for BARD

Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity...

Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958

% total activity

# scaffolds % scaffolds

All 50% 1979 1.4%

All 75% 11,645 8%

Drugs 50% 54 2.8%

Drugs 90% 327 16.7%

“total activity” = active scaffold-instances

21

Page 22: The BADAPPLE promiscuity plugin for BARD

Scaffold visualization: Molecule Cloud (Badapple-scaled)

The Molecule Cloud - compact visualization of large collections of molecules, Peter Ertl and Bernd Rohde, J. Cheminformatics, 2012, 4:12. 22

Page 23: The BADAPPLE promiscuity plugin for BARD

What is BARD?

• BioAssay Research Database, http://bard.nih.gov • MLP: 1000+ assays, 400k+ cpds, 200M data • Manual assay annotations + QA • BARD Assay Ontology • Assay Data Standard • Community platform for bioassay data analysis

& computation

25

Page 24: The BADAPPLE promiscuity plugin for BARD

BARD + Badapple Synergy

• BARD ontology, based on BAO

• BARD raising "semantic IQ" of public bioassay

data.

• Evidence = information = data + metadata

http://bard.nih.gov/

Page 25: The BADAPPLE promiscuity plugin for BARD

BARD Plugin Platform: Community development Enterprise deployment

• BARD Plugin spec (IPlugin interface) • BARD PluginValidator class • Java, JAX-RS, Jersey, REST • BARD REST API • WAR deployment, discoverable

fig c/o Steve Brudz, Broad

Page 26: The BADAPPLE promiscuity plugin for BARD

Badapple Plugin via BARD web client

28

Discovery scenario:

Rapidly flags potentially

problematic (notorious) scaffolds.

Page 27: The BADAPPLE promiscuity plugin for BARD

BARD Synergy, next steps: Raising Badapple to the next level

Assay Definition

Assay Protocol

Biology

BARD Classes

biology

chemistry

Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology.

Page 28: The BADAPPLE promiscuity plugin for BARD

BARD Synergy, next steps: Raising Badapple to the next level

Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology.

Biology Types

Assay Formats

Protein Classes

Some re-calibrations of interest:

Page 29: The BADAPPLE promiscuity plugin for BARD

BARD Synergy, next steps: Raising Badapple to the next level

cell-based format, 540, 60% single protein format,

161, 18%

biochemical format, 76, 8%

protein complex format, 38, 4%

whole-cell lysate format, 32, 4%

small-molecule format, 11, 1%

protein format, 10, 1%

Assays: assay format

cell-based format

single protein format

biochemical format

protein complex format

whole-cell lysate format

small-molecule format

protein format

tissue-based format

nucleic acid format

organism-based format

cell membrane format

microsome format

cell-free format

mitochondrion format

(blank)

Page 30: The BADAPPLE promiscuity plugin for BARD

BARD Synergy, next steps: Raising Badapple to the next level

PROTEIN, 514, 47%

PROCESS, 341, 31%

GENE, 217, 20%

Biology Types

PROTEIN

PROCESS

GENE

biology

AASUBSTITUTION

NCBI

Page 31: The BADAPPLE promiscuity plugin for BARD

BARD Synergy, next steps: Raising Badapple to the next level

33

transporter, 112, 28%

receptor, 56, 14%

nucleic acid binding, 52, 13%

enzyme modulator, 45, 11%

cell adhesion molecule, 24, 6%

transferase, 23, 6%

hydrolase, 22, 6%

signaling molecule, 19, 5%

extracellular matrix protein, 16, 4%

oxidoreductase, 11, 3% membrane traffic protein,

5, 1%

Protein Classes

transporter

receptor

nucleic acid binding

enzyme modulator

cell adhesion molecule

transferase

hydrolase

signaling molecule

extracellular matrix protein

oxidoreductase

membrane traffic protein

transfer/carrier protein

transcription factor

cytoskeletal protein

storage protein

isomerase

defense/immunity protein

Page 32: The BADAPPLE promiscuity plugin for BARD

Conclusions

• Badapple exemplar as BARD plugin

• Badapple exemplar as evidence-based

algorithm

• New BARD semantic capabilities will

elevate Badapple to next level.

• Promiscuity a complex issue. 34

Page 33: The BADAPPLE promiscuity plugin for BARD

Acknowledgements • Steve Mathias, UNM • Chris Lipinski, Melior • Rajarshi Guha, NCATS • Stephan Schurer, UMiami

• Uma Vempati, UMiami • Mark Southern, Scripps • BARD Engineering

Working Group

35