Structural Browsing Indices, Spotfire and Drug Discovery

Mark Johnson¹ and Yong-jin Xu²

¹Pannanugget Consulting; ²Pharmacia, Inc.

Spotfire Users Conference

Philadelphia, May, 2001

mark@pannanugget.com

© 2001, Pannanugget and Pharmacia, Inc. - All Rights Reserved.

Pulling Nuggets out of the Avalanche of Data

High-throughput screening

Larger project teams

Combinatorial chemistry

Microarray technology

External databases

Internal databases

Predicted parameters

Data mergers & collaborations

A Low-Content View of the ACD Based on the Number of Atoms and Colored by the Number of Cyclic-System Hetero Atoms

A High-Content Cyclic-System View of the ACD

The Distinction between Low and High-Content Views is Gained or Lost in the First Step of Data Visualization

Raw data → Data tables → Visual structures → Views

Steps between stages: Data transformations, Visual mappings, View transformations

Set of complex objects → Imposed space of points → Scatter plot or histogram

Integrated Visualization

Card, Mackinlay, & Shneiderman “Readings in Information Visualization”, p17

Viewing High-Dimensional Binary-Vector Spaces in Spotfire using Keyword-List Variables

Column roles: Compound number = complex object identifier; Functional-group list = single high-content keyword-list variable; Acid through Amine = 5-dimensional vector of 5 low-content variables.

Compound number   Functional-group list   Acid   Amide   Ketone   Sulfide   Amine
1                 ketone                  0      0       1        0         0
2                 sulfide                 0      0       0        1         0
3                 amide amine             0      1       0        0         1
4                 acid                    1      0       0        0         0
5                 acid amine              1      0       0        0         1
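The correspondence between the keyword-list column and the five indicator columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `expand_keyword_list` and `collapse_to_keyword_list` are hypothetical, keywords are assumed to be space-separated, and nothing here reflects how Spotfire itself stores such a variable. The data are the five rows of the table above.

```python
# Keyword-list variable from the table above: compound number -> functional-group list.
compounds = {
    1: "ketone",
    2: "sulfide",
    3: "amide amine",
    4: "acid",
    5: "acid amine",
}

# Column order of the low-content indicator variables, as in the table.
KEYWORDS = ["acid", "amide", "ketone", "sulfide", "amine"]


def expand_keyword_list(keyword_list, keywords=KEYWORDS):
    """Turn one space-separated keyword list into a 0/1 vector (hypothetical helper)."""
    present = set(keyword_list.split())
    return [1 if k in present else 0 for k in keywords]


def collapse_to_keyword_list(vector, keywords=KEYWORDS):
    """Inverse direction: rebuild the keyword list from a 0/1 vector."""
    return " ".join(k for k, bit in zip(keywords, vector) if bit)


# One high-content variable becomes five low-content variables.
table = {cid: expand_keyword_list(kw) for cid, kw in compounds.items()}

for cid, vector in table.items():
    print(cid, compounds[cid], vector)
```

The single keyword-list column carries the same information as the five binary columns, which is why one high-content variable can stand in for a high-dimensional binary-vector space.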

The Three Ways of Organizing Molecular Structures

Substructure Partial Orderings

Molecular Similarity Spaces

Structural Browsing Indices

Questions Associated with Substructure Partial Orderings

What structures contain a particular substructure?

What structures contain a particular generic substructure?

The basic spatial representation is a single indicator variable containing 1 for those structures satisfying the request and 0 otherwise.

Questions Associated with Molecular Similarity Spaces

What structures are similar to a particular structure?

How diverse is a collection of structures?

Will a marketed collection of structures add significant diversity to our collection?

How much do two collections overlap?

The basic spatial representation is a high-dimensional table of low-content variables and/or, for collections of 100 or fewer structures, a matrix of pair-wise similarities.
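A matrix of pair-wise similarities over binary vectors can be sketched as follows. The slides do not name a similarity coefficient, so the Tanimoto (Jaccard) coefficient used here is an assumption chosen only because it is a common one for binary descriptors. The vectors are the functional-group indicators from the five-compound example earlier in the talk, and the function name `tanimoto` is illustrative.

```python
# Binary functional-group vectors (acid, amide, ketone, sulfide, amine)
# for the five example compounds shown earlier in the talk.
vectors = {
    1: [0, 0, 1, 0, 0],
    2: [0, 0, 0, 1, 0],
    3: [0, 1, 0, 0, 1],
    4: [1, 0, 0, 0, 0],
    5: [1, 0, 0, 0, 1],
}


def tanimoto(a, b):
    """Tanimoto coefficient of two 0/1 vectors (assumed metric, not from the slides)."""
    both = sum(x & y for x, y in zip(a, b))    # bits set in both vectors
    either = sum(x | y for x, y in zip(a, b))  # bits set in at least one
    return both / either if either else 1.0


ids = sorted(vectors)

# Full N x N matrix; its quadratic size is why this is practical only for small collections.
similarity = {(i, j): tanimoto(vectors[i], vectors[j]) for i in ids for j in ids}

for i in ids:
    print(i, [round(similarity[(i, j)], 2) for j in ids])
```

The matrix grows with the square of the collection size, which is consistent with the slide restricting it to small collections.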

Some Molecular Equivalence Classes to which the 1-hydroxyethyl-azepin Analog of Azelastine will be Mapped

[Figure: structure diagrams of the compound and its derived fragments, each labeled with a meqnum]

Compound meqnum from the hydrogen-reduced chemical graph

Compound skeletal meqnum

Cyclic-skeletal meqnum

Cyclic-system meqnum

Side-chains meqnums: 117Q, 70XG, 1QK0

Maximal functional-group meqnums: 2FHH, WF54, XWS8, EQQW

Ring-systems meqnums: 8YP, A2J3, 6FIM

Meqnum labels in the figure: 8YP, 6F1M, 117Q, 1QK0, BBU30, 1FU4H, 2FHH, XWS8, WF5R, A2J3, 70XG, KUG2M, DE2VR, EQQW

Questions Associated with Structural Browsing Indices

Which structural classes are represented in a collection?

Where is the overlap in two collections?

Which classes of structures turned up active in a high-throughput screening program?

What templates and positions have been investigated in a lead-optimization program and which are critical?

The basic spatial representation is a small number of high-content variables with locally-related values.
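The first two questions above can be sketched with a browsing index treated as a plain mapping from compound to class label. Everything in this example is hypothetical: the compound identifiers, the class labels (placeholders, not real meqnums), and the helper names `index_by_class` and `class_overlap`. It shows the idea of grouping by equivalence class, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical collections: compound id -> structural class label.
# In practice the label would be a browsing-index value such as a cyclic-system class.
collection_a = {
    "A-001": "ring-class-1",
    "A-002": "ring-class-1",
    "A-003": "ring-class-2",
}
collection_b = {
    "B-101": "ring-class-2",
    "B-102": "ring-class-3",
}


def index_by_class(collection):
    """Group compound ids by class label: which classes are represented?"""
    classes = defaultdict(list)
    for compound_id, label in collection.items():
        classes[label].append(compound_id)
    return dict(classes)


def class_overlap(a, b):
    """Classes present in both collections, with the members from each side."""
    index_a, index_b = index_by_class(a), index_by_class(b)
    shared = sorted(index_a.keys() & index_b.keys())
    return {label: (index_a[label], index_b[label]) for label in shared}


print(index_by_class(collection_a))
print(class_overlap(collection_a, collection_b))
```

Because every compound maps to exactly one class per index, overlap between collections reduces to a set intersection on class labels rather than a pair-wise comparison of structures.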

Demos

Exploring the ACD with a cyclic-system ordering

Browsing lead-optimization synthetic efforts using a cyclic-system ordering and a side-chain ordering.

Showing ACE inhibitors are aggregated in a maximal functional-group space

Some Types of Keyword-List Variables in Drug Discovery

Discipline        Keyword lists
Cheminformatics   Functional groups
                  Ring systems
                  Side chains
                  Similar compounds
                  Activity profiles
Toxicology        Toxicity profile
                  Metabolic reactions
Genomics          Molecular function
                  Biological process
                  Cellular component
                  Similar genes or proteins

Structural Browsing Indices: What, if anything besides the name, is new?

Ring systems, functional groups, side chains are almost as old as chemistry.

Adamson in the early 70s perceived and tabulated ring systems.

Carhart et al. (1975) explored the reduced skeleton of a ring system.

Lynch’s Sheffield group (1987) explored generic structure representations.

Lawson’s similarity number (1990) is a “set-valued” browsing variable.

Bemis and Murcko (1996, 1999) tabulated side chains, cyclic-systems, cyclic skeletons.

LeadScope (2000) hierarchically structures overlapping classes based on chemical functionality and offers a means of relating these to chemical properties.

The need for the systematic development of molecular-equivalence-based browsing indices.

The concept of a keyword-list variable and its importance in visual data analysis.

Where will I be heading with Spotfire applications?

• Help shape and implement a vision of distributed visual data mining through publications, consulting, and workshops.

• Develop and distribute programs for constructing structural browsing indices and other keyword-list variables.

• Write a book on visual data mining.

Acknowledgements

Yong-jin Xu

CADD group

Pharmacia & Upjohn chemists and biologists

Research Informatics Team

Bob Pearlman

Molecular Design, Inc.

Spotfire Inc.
