Structural Browsing Indices, Spotfire and Drug Discovery
Mark Johnson (1) and Yong-jin Xu (2)
(1) Pannanugget Consulting; (2) Pharmacia, Inc.
Spotfire Users Conference
Philadelphia, May, 2001
[email protected]
© 2001, Pannanugget and Pharmacia, Inc. - All Rights Reserved.
Pulling Nuggets out of the Avalanche of Data
High-throughput screening
Larger project teams
Combinatorial chemistry
Microarray technology
External databases
Internal databases
Predicted parameters
Data mergers & collaborations
A Low-Content View of the ACD Based on the Number of Atoms and Colored by the Number of Cyclic-System Hetero Atoms
A High-Content Cyclic-System View of the ACD
The Distinction between Low and High-Content Views is Gained or Lost in the First Step of Data Visualization
Raw data -> [Data transformations] -> Data tables -> [Visual mappings] -> Visual structures -> [View transformations] -> Views
Set of complex objects -> Imposed space of points -> Scatter plot or histogram -> Integrated visualization
Card, Mackinlay, & Shneiderman, "Readings in Information Visualization", p. 17
Viewing High-Dimensional Binary-Vector Spaces in Spotfire using Keyword-List Variables
Column roles:
Compound number: complex object identifier
Functional-group list: single high-content keyword-list variable
Acid, Amide, Ketone, Sulfide, Amine: 5-dimensional vector of 5 low-content variables

Compound number  Functional-group list  Acid  Amide  Ketone  Sulfide  Amine
1                ketone                 0     0      1       0        0
2                sulfide                0     0      0       1        0
3                amide amine            0     1      0       0        1
4                acid                   1     0      0       0        0
5                acid amine             1     0      0       0        1
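The relationship between the keyword-list variable and the five indicator columns can be sketched in a few lines of Python. This is only an illustration of the idea using the five compounds from the slide; the function name `to_indicator_vector` and the dictionary layout are assumptions made for the example, not part of Spotfire or of the authors' software.

```python
# Column order follows the slide: Acid, Amide, Ketone, Sulfide, Amine.
KEYWORDS = ["acid", "amide", "ketone", "sulfide", "amine"]

# Keyword-list values for the five compounds shown on the slide.
compounds = {
    1: "ketone",
    2: "sulfide",
    3: "amide amine",
    4: "acid",
    5: "acid amine",
}


def to_indicator_vector(keyword_list, vocabulary=KEYWORDS):
    """Expand one space-separated keyword list into a 0/1 vector (hypothetical helper)."""
    present = set(keyword_list.split())
    return [1 if keyword in present else 0 for keyword in vocabulary]


# One high-content variable per compound becomes five low-content variables.
vectors = {cid: to_indicator_vector(kw) for cid, kw in compounds.items()}

for cid, vec in vectors.items():
    print(cid, compounds[cid], vec)
```

The keyword list stays a single column however many distinct keywords occur, whereas the indicator form needs one column per keyword.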
The Three Ways of Organizing Molecular Structures
Substructure Partial Orderings
Molecular Similarity Spaces
Structural Browsing Indices
Questions Associated with Substructure Partial Orderings
What structures contain a particular substructure?
What structures contain a particular generic substructure?
The basic spatial representation is a single indicator variable containing 1 for those structures satisfying the request and 0 otherwise.
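As a rough sketch of such an indicator variable, the Python below marks each structure with 1 or 0 depending on whether it satisfies a query. Real substructure matching needs a chemistry toolkit working on molecular graphs; here each structure is stood in for by a set of functional-group keywords (borrowed from the keyword-list slide), and the helper names `indicator`, `contains`, and `contains_any` are hypothetical.

```python
# Stand-in data: each structure is represented only by its functional-group keywords.
# A real system would test molecular graphs with a substructure-search engine.
structures = {
    1: {"ketone"},
    2: {"sulfide"},
    3: {"amide", "amine"},
    4: {"acid"},
    5: {"acid", "amine"},
}


def indicator(structures, satisfies):
    """Return {structure id: 1 if the query is satisfied, else 0} (hypothetical helper)."""
    return {sid: int(satisfies(groups)) for sid, groups in structures.items()}


def contains(group):
    """Query for one particular group."""
    return lambda groups: group in groups


def contains_any(*alternatives):
    """Crude analogue of a generic query: any one of several alternatives matches."""
    return lambda groups: any(g in groups for g in alternatives)


has_amine = indicator(structures, contains("amine"))
has_acid_or_amide = indicator(structures, contains_any("acid", "amide"))

print(has_amine)
print(has_acid_or_amide)
```

Each query yields exactly one 0/1 column, which is why this organization gives a low-content view: the column answers the question asked and nothing else.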
Questions Associated with Molecular Similarity Spaces
What structures are similar to a particular structure?
How diverse is a collection of structures?
Will a marketed collection of structures add significant diversity to our collection?
How much do two collections overlap?
The basic spatial representation is a high-dimensional table of low-content variables and/or possibly a matrix of pair-wise similarities (for collections of 100 or fewer).
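A pair-wise similarity matrix of this kind can be sketched as follows. The slide does not name a similarity measure; the Tanimoto (Jaccard) coefficient is assumed here because it is a common choice for binary descriptors, and the five keyword sets from the keyword-list slide stand in for real fingerprints.

```python
# Stand-in "fingerprints": the functional-group sets from the keyword-list slide.
fingerprints = {
    1: {"ketone"},
    2: {"sulfide"},
    3: {"amide", "amine"},
    4: {"acid"},
    5: {"acid", "amine"},
}


def tanimoto(a, b):
    """Tanimoto coefficient of two sets: shared features / all features present."""
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(union)


ids = sorted(fingerprints)

# Full N x N matrix; its size grows with the square of the collection size.
matrix = [[tanimoto(fingerprints[i], fingerprints[j]) for j in ids] for i in ids]

for i, row in zip(ids, matrix):
    print(i, " ".join(f"{value:.2f}" for value in row))
```

Because the matrix has N x N entries, it is practical only for small collections, which is consistent with the size limit mentioned on the slide.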
Some Molecular Equivalence Classes to which the 1-hydroxyethyl-azepin Analog of Azelastine will be Mapped
[Figure: structure diagrams of the compound and its derived fragments, each labeled with a meqnum]
Classes shown:
Compound meqnum from the hydrogen-reduced chemical graph
Compound skeletal meqnum
Cyclic-skeletal meqnum
Cyclic-system meqnum
Side-chains meqnums: 117Q, 70XG, 1QK0
Maximal functional-group meqnums: 2FHH, WF54, XWS8, EQQW
Ring-systems meqnums: 8YP, A2J3, 6FIM
Meqnum labels on the diagrams: 8YP, 6F1M, 117Q, 1QK0, BBU30, 1FU4H, 2FHH, XWS8, WF5R, A2J3, 70XG, KUG2M, DE2VR, EQQW
Questions Associated with Structural Browsing Indices
Which structural classes are represented in a collection?
Where is the overlap in two collections?
Which classes of structures turned up active in a high-throughput screening program?
What templates and positions have been investigated in a lead-optimization program and which are critical?
The basic spatial representation is a small number of high-content variables with locally-related values.
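The first two questions reduce to grouping compounds by a class key and comparing the keys found in each collection. The sketch below assumes each compound already carries such a key (for example a ring-system meqnum); the compound identifiers, the class labels, and the helper `group_by_class` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_by_class(collection):
    """Group compound ids by their structural-class key (hypothetical helper)."""
    groups = defaultdict(list)
    for compound_id, class_key in collection:
        groups[class_key].append(compound_id)
    return dict(groups)


# Hypothetical collections: (compound id, structural-class key) pairs.
in_house = [("C1", "class_A"), ("C2", "class_A"), ("C3", "class_B")]
vendor = [("V1", "class_B"), ("V2", "class_C")]

in_house_classes = group_by_class(in_house)
vendor_classes = group_by_class(vendor)

# Which structural classes are represented in a collection?
print(sorted(in_house_classes))

# Where is the overlap in two collections?
shared = sorted(set(in_house_classes) & set(vendor_classes))
only_vendor = sorted(set(vendor_classes) - set(in_house_classes))
print(shared, only_vendor)
```

The same grouping, applied to screening hits instead of whole collections, gives a per-class view of which classes turned up active.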
Demos
Exploring the ACD with a cyclic-system ordering
Browsing lead-optimization synthetic efforts using a cyclic-system ordering and a side-chain ordering.
Showing that ACE inhibitors are aggregated in a maximal functional-group space
Some Types of Keyword-List Variables in Drug Discovery
Discipline        Keyword lists
Cheminformatics   Functional groups
                  Ring systems
                  Side chains
                  Similar compounds
                  Activity profiles
Toxicology        Toxicity profile
                  Metabolic reactions
Genomics          Molecular function
                  Biological process
                  Cellular component
                  Similar genes or proteins
Structural Browsing Indices: What, if anything besides the name, is new?
Ring systems, functional groups, side chains are almost as old as chemistry.
Adamson in the early 70s perceived and tabulated ring systems.
Carhart et al. (1975) explored the reduced skeleton of a ring system.
Lynch’s Sheffield group (1987) explored generic structure representations.
Lawson’s similarity number (1990) is a “set-valued” browsing variable.
Bemis and Murcko (1996, 1999) tabulated side chains, cyclic-systems, cyclic skeletons.
LeadScope (2000) hierarchically structures overlapping classes based on chemical functionality and provides means of relating these to chemical properties.
The need for the systematic development of molecular-equivalence-based browsing indices.
The concept of a keyword-list variable and its importance in visual data analysis.
Where will I be heading with Spotfire applications?
• Helping to shape and implement a vision of distributed visual data mining through publications, consulting, and workshops.
• Developing and distributing programs for constructing structural browsing indices and other keyword-list variables.
• Writing a book on visual data mining.
Acknowledgements
Yong-jin Xu
CADD group
Pharmacia & Upjohn chemists and biologists
Research Informatics Team
Bob Pearlman
Molecular Design, Inc.
Spotfire Inc.