Source: homes.nano.aau.dk/fp/md/chemogenomic.pdf

Chemogenomic: Approaches to Rational Drug Design

Jonas Skjødt Møller

Chemogenomic

• Chemistry

• Biology

• Chemical biology

• Medicinal chemistry

• Chemical genetics

• Chemoinformatics

• Bioinformatics

• Chemoproteomics

The study of the effects of small-molecular-weight drug candidates on gene/protein function.

Chemogenomics defines, in principle, the screening of the chemical universe, i.e., all possible chemical compounds, against the target universe, i.e., all proteins and other potential drug targets.

Chemogenomic

Mission impossible!!

The solution: in practice, the method screens congeneric chemical libraries against certain target families, e.g., the G protein-coupled receptors, nuclear receptors, different protease families, kinases, phosphodiesterases, ion channels, transporters, etc.

Chemogenomic

• Requirements

– A compound library

– A representative biological system

• Target library

• Single cell

• Organism

– A reliable readout

• Gene/protein expression

• High-throughput screening (binding or functionality assays)

Chemogenomic

Completion of a two-dimensional matrix representing the interactions of targets/genes and compounds, with each cell holding a binding affinity (Ki) or a functional effect (IC50).
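As a minimal sketch of such a matrix, the Python below stores the known measurements sparsely and reports how much of the matrix is filled. The compound names, target names and Ki values are hypothetical placeholders, not data from the slides.

```python
# Hypothetical compounds, targets and Ki values (nM); illustrative only.
compounds = ["cpd_1", "cpd_2", "cpd_3"]
targets = ["target_A", "target_B"]

# Sparse storage: only measured (compound, target) pairs are present.
measured_ki_nm = {
    ("cpd_1", "target_A"): 12.0,
    ("cpd_2", "target_A"): 850.0,
    ("cpd_2", "target_B"): 3.5,
}


def build_matrix(compounds, targets, measured):
    """Return one row per compound; unmeasured cells are None."""
    return [[measured.get((c, t)) for t in targets] for c in compounds]


def completeness(matrix):
    """Fraction of cells that hold a measured value."""
    cells = [value for row in matrix for value in row]
    return sum(value is not None for value in cells) / len(cells)


matrix = build_matrix(compounds, targets, measured_ki_nm)
print(matrix)
print(f"filled: {completeness(matrix):.0%}")
```

The empty cells are the ones a chemogenomic approach tries to fill, either by experiment or by prediction.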

Chemogenomic

• Assumptions for any chemogenomics-based approach

(a) Compounds sharing some chemical similarity should also share targets.

(b) Targets sharing similar ligands should share similar patterns (binding sites).

Question – How do we measure the distances between two ligands or two targets?

Ligand and Target Spaces

• The distance between two compounds is measured by computing a similarity matrix

• Compound properties are usually described using descriptors

• Descriptor classification

– One-dimensional

– Two-dimensional

– Three-dimensional

Ligand Space

• 1D descriptors

– Easy and fast to compute

– Describe global properties (MW, atom and bond counts)

– Based on the chemical formula

• Prediction of physicochemical properties

– Polar surface area

– Solubility

– Rings

• Discrimination between compound sets

– Drugs vs. nondrugs

– Ligands from target families

• 1D linear representations of compounds

– SMILES (Simplified Molecular Input Line Entry System)
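To illustrate how cheap formula-based 1D descriptors are, here is a minimal sketch that derives atom counts and molecular weight from a molecular formula. The function names and the small atomic-weight table are assumptions made for this example; the parser only handles simple formulas without brackets or charges.

```python
import re

# Rounded standard atomic weights for a few common elements (illustrative subset).
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C9H8O4'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[s] * n for s, n in atom_counts(formula).items())


# Aspirin has the molecular formula C9H8O4.
print(atom_counts("C9H8O4"))
print(round(molecular_weight("C9H8O4"), 3))
```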

Ligand Space (Descriptors)

• 2D descriptors

– Most common ligand descriptors

– Describe topological properties (maximum common substructure, structural keys)

– Encode both atomic and bond properties

• 2D sketch figure

– Scanning libraries for similar substructures or fragments

• Graph-based method

– Molecular graph (subfamily clustering)

– Computationally slow

• Fingerprint-based method

– Bit strings (0s and 1s encoding atoms, fragments, rings, ...)

– Fingerprints are easy to compare

– Also used in receptor-ligand recognition


• 3D descriptors

– Describe conformational properties (atomic coordinates, potentials, fields, shapes)

• Necessities for proper alignment

– Comparison in the same 3D Cartesian space

– Conformational space accessible to each ligand

• Bit strings vs. structure comparison

– Structure comparison can produce false positives

– 3D information is stored in bit strings

• Binary representation of 2D or 3D properties

– Tanimoto coefficient (a simple similarity index)
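The Tanimoto coefficient for two bit strings is c / (a + b - c), where a and b are the numbers of bits set in each string and c is the number of bits set in both. A minimal sketch follows; the fingerprints are made-up 6-bit strings, and returning 0.0 for two empty fingerprints is a convention chosen here, not something the slides specify.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")
    b = fp_b.count("1")
    c = sum(x == "1" and y == "1" for x, y in zip(fp_a, fp_b))
    if a + b - c == 0:
        return 0.0  # assumed convention when no bit is set in either string
    return c / (a + b - c)


# Hypothetical fingerprints: 3 bits set in each, 2 bits in common.
print(tanimoto("110100", "100110"))
```

A value of 1.0 means the two fingerprints are identical; the corresponding distance is simply 1 minus the coefficient.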


• Chemoproteomics

– Target = proteins

Dimension   Classification scheme           Databases
1D          By sequence                     UniProt, Pfam
            By patterns                     PRINTS, PROSITE
2D          By secondary structure, fold    SCOP, CATH
3D          By atomic coordinates           PDB, MODBASE
            By binding site                 BindingMOAD, sc-PDB

Target Space

• The amino acid sequence (1D)

– Clustering of targets into target families

– Large variation in sequence length even among family members

– e.g., human GPCRs range from 290 to 6200 residues

• Structural motifs (2D)

– Mapping of α-helices, β-sheets, coils and random structures

• 3D Structure

– Atomic coordinates derived by X-ray diffraction or NMR

– Structural fold

– Ligand-binding site, higher similarity among related targets

• Pharmacological profile

– Binding affinity for a panel of ligands

– Modifying the pharmacological profiles of drugs is widely used in drug design

Target Space

• Full matrices (affinity or structural information)

– Experimental data are stored in the matrices

– Affinity of a new compound to a known target

– Measuring structure-activity relationships

– Prediction of a global pharmacological profile

• Advantages

– Based on experimental data

– Superior to computed descriptors

• Disadvantages

– An enormous amount of data is necessary

– Very costly (not realistic in academic environments)

• Interaction fingerprints (IFPs)

– Replacement of affinity with molecular interaction descriptors

– Conversion of atomic coordinates of protein-ligand complexes into bit strings.
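The conversion from coordinates to a bit string can be sketched in a heavily simplified form: one bit per binding-site residue, set when any residue atom lies within a distance cutoff of any ligand atom. This is an assumption-laden stand-in, since published IFPs typically use several bits per residue to distinguish interaction types. The residue names, coordinates and the 4.0 Å cutoff are hypothetical.

```python
import math


def contact_fingerprint(residues, ligand_atoms, cutoff=4.0):
    """Return one bit per residue: '1' if the residue contacts the ligand."""
    bits = []
    for _name, atoms in residues:
        in_contact = any(
            math.dist(atom, lig) <= cutoff
            for atom in atoms
            for lig in ligand_atoms
        )
        bits.append("1" if in_contact else "0")
    return "".join(bits)


# Hypothetical binding-site residues (name, atom coordinates in Å) and ligand atoms.
residues = [
    ("ASP101", [(0.0, 0.0, 0.0)]),
    ("PHE150", [(10.0, 0.0, 0.0)]),
    ("SER200", [(3.0, 3.0, 0.0)]),
]
ligand = [(1.0, 1.0, 0.0), (2.0, 2.0, 0.0)]

print(contact_fingerprint(residues, ligand))
```

Because every complex of the same target yields a string of the same length, such strings can be compared with the same similarity indices used for ligand fingerprints.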

Protein-Ligand Space

• Annotating ligand libraries

– Molecules sharing enough similarity with existing ligands for which a target profile is known have an enhanced probability of sharing the same biological profile.

• Ligand libraries

– Targets

– In vitro affinity data

– ADME properties

• Biologically annotated compound libraries

– AurSCOPE (160,000 GPCR ligands and 77,000 kinase inhibitors)

– MedChem database (biological and pharmacological information on 650,000 compounds)

– ChemBank (50,000 compounds in 441 high-throughput screening assays)

• Natural product-oriented chemical libraries

– Evolutionary pressure

– Highly specific binding mechanisms

Ligand-based Chemogenomic

• The term "privileged structure" was coined by Evans et al. (1,4-benzodiazepine scaffold)

A privileged structure is defined as a substructure or scaffold exhibiting strong preference for a particular area of the target space.

– Suitable for orienting the design of targeted compound libraries

– Biphenyl: protein-binding motif

• No particular preference for target family

• 2-tetrazolo-biphenyl → GPCRs

• Only few are really selective

Ligand-based: Privileged Structures

• Target fishing

– Reference compounds set (known 2D or 3D descriptors)

– Screening procedure (QSAR, Bayesian analysis or pharmacophore)

– Screening collection for identification of new compounds

Ligand-based In silico Screening

• Mestres et al.

– Library of molecules targeting nuclear hormone receptors (NHRs)

• 2000 ligands

• 25 receptors

– Easy distinction between selective and promiscuous scaffolds

• SHannon Entropy Descriptors (SHED)

• Novartis

– Prediction of target profiles from extended connectivity fingerprints

• Machine learning algorithm based on Bayesian statistics

• Wombat database (1230 unique SMILES)

– Bayesian models were produced (trained) for each activity class

– Prediction is done by calculating the probability of each test compound being a ligand for each of the targets

– Improvement by concatenating all target-associated probabilities

• Bayes affinity fingerprint

– 2D descriptors were more predictive than 3D (not for singletons)
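The idea of one Bayesian model per target, with the per-target probabilities concatenated into a profile, can be sketched with a plain Bernoulli naive Bayes classifier. This is a generic textbook variant with Laplace smoothing, not the specific algorithm or fingerprints used in the work summarized above; the 4-bit fingerprints, target names and function names are hypothetical.

```python
import math


def train_bernoulli_nb(fingerprints, labels, n_bits):
    """Fit P(bit set | active) and P(bit set | inactive) with Laplace smoothing."""
    counts = {True: [0] * n_bits, False: [0] * n_bits}
    totals = {True: 0, False: 0}
    for fp, active in zip(fingerprints, labels):
        totals[active] += 1
        for bit in fp:
            counts[active][bit] += 1
    model = {"prior": (totals[True] + 1) / (len(labels) + 2)}
    for cls in (True, False):
        model[cls] = [(c + 1) / (totals[cls] + 2) for c in counts[cls]]
    return model


def predict_proba(model, fp, n_bits):
    """Posterior probability that the compound is active on this target."""
    log_odds = math.log(model["prior"] / (1 - model["prior"]))
    for bit in range(n_bits):
        p1, p0 = model[True][bit], model[False][bit]
        if bit in fp:
            log_odds += math.log(p1 / p0)
        else:
            log_odds += math.log((1 - p1) / (1 - p0))
    return 1 / (1 + math.exp(-log_odds))


def bayes_affinity_profile(models, fp, n_bits):
    """Concatenate the per-target probabilities into one profile."""
    return {target: predict_proba(m, fp, n_bits) for target, m in models.items()}


# Hypothetical training set: fingerprints as sets of on-bit indices (4 bits).
N_BITS = 4
train_fps = [{0, 1}, {0, 1, 2}, {2, 3}, {3}]
activity = {
    "target_A": [True, True, False, False],
    "target_B": [False, False, True, True],
}
models = {t: train_bernoulli_nb(train_fps, y, N_BITS) for t, y in activity.items()}

profile = bayes_affinity_profile(models, {0, 1}, N_BITS)
print(profile)
```

The resulting profile is itself a descriptor: two compounds can be compared by their predicted target probabilities rather than by their structures.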


• Drawback

– Categorization of training set compounds according to their molecular target, without checking:

• Does it really bind?

• Where it binds?

• How it binds?

– Training a machine learning algorithm with incorrect data

• Alternatives

– 3D pharmacophores from protein-ligand complexes

• Experimentally determined atomic coordinates

• Experimentally determined pharmacological activities

– Limited chemical diversity observed among PDB ligands


• Selectivity control

– Selectivity of ligands among family-related targets

– Proteome-wide comparative modeling

• Structural data (X-ray or NMR)

• Sequence-based comparison

• Structure-based comparison

– Comparing molecular fields

– Comparing 3D structures

Target-based Chemogenomic

• Multiple alignment of all targets

– Comparison of any kind of target families

– Lack of high-resolution structural data

• GPCRs are ideal candidates for sequence-based comparison

– Only bovine rhodopsin has been crystallised

– Important target family for drug design

• Key residues are extracted and concatenated

– Ungapped sequence (30 residues)

– Distance matrix based on:

• Sequence identity

• Sequence similarity

• Physicochemical properties

• Cavity-based clustering of 372 human GPCRs

– Perfectly reproduced the tree based on full sequences

– Target comparison across a family is possible using only a few residues

• Applications

– Simple analysis of binding site regions by residue conservation

– Target hopping used to discover ligands for a particular receptor
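A distance matrix over concatenated key residues can be sketched using sequence identity alone. The receptor names and the 6-residue strings below are hypothetical (the approach described above uses 30 residues), and identity is only one of the three measures listed.

```python
def identity_distance(seq_a, seq_b):
    """Distance = 1 - fraction of identical positions in two ungapped strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped key-residue strings must have equal length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 1.0 - matches / len(seq_a)


def distance_matrix(sequences):
    """All-against-all distances, keyed by (name_a, name_b)."""
    names = list(sequences)
    return {
        (a, b): identity_distance(sequences[a], sequences[b])
        for a in names
        for b in names
    }


# Hypothetical concatenated binding-cavity residues for three receptors.
cavities = {
    "receptor_1": "DWFYHN",
    "receptor_2": "DWLYHS",
    "receptor_3": "ESLTKS",
}

for pair, dist in distance_matrix(cavities).items():
    print(pair, round(dist, 2))
```

Such a matrix can be passed to any hierarchical clustering routine to build the cavity-based tree.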

Sequence-based Comparison


“High-resolution structural data are crucial for homology modeling; however, only the ligand-binding sites are compared”

• Comparing Molecular Fields

– Molecular interaction fields (MIFs)

– Structural alignment of targets

– Interaction energies

• Probe atoms at each point of a 3D grid (binding site)

– MIFs placed in a global matrix

• Rows: Targets

• Columns: Interaction energies

– Analysis either by:

• Principal component analysis

• Hierarchical clustering

– Highly dependent on:

• Structural alignment, grid resolution and probe atoms
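The global matrix described above (one row per target, one column per grid point) can be sketched with a toy energy function. The example uses a plain 12-6 Lennard-Jones term for a single probe as a stand-in for a real force field, and assumes the binding sites are already structurally aligned; all coordinates, parameters and names are hypothetical.

```python
import itertools
import math


def make_grid(lo, hi, step):
    """Cubic grid of points from lo to hi (inclusive) along each axis."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return list(itertools.product(axis, repeat=3))


def lj_energy(r, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones energy between the probe and one site atom."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 * sr6 - sr6)


def mif_row(site_atoms, grid_points, min_dist=0.5):
    """Probe energy at every grid point, summed over all site atoms."""
    row = []
    for point in grid_points:
        energy = 0.0
        for atom in site_atoms:
            r = max(math.dist(point, atom), min_dist)  # avoid division by zero
            energy += lj_energy(r)
        row.append(energy)
    return row


# Hypothetical pre-aligned binding sites, each reduced to two atoms (Å).
grid = make_grid(-2.0, 2.0, 2.0)
binding_sites = {
    "target_A": [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    "target_B": [(4.0, 0.0, 0.0), (0.0, 0.0, -4.0)],
}

# Rows: targets. Columns: interaction energies at the shared grid points.
mif_matrix = {name: mif_row(atoms, grid) for name, atoms in binding_sites.items()}
print(len(grid), [len(row) for row in mif_matrix.values()])
```

Changing the alignment, the grid step or the probe parameters changes every column, which is why the analysis depends so strongly on those three choices.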

Structure-based Comparison

• Comparing 3D Structures

– Global structural alignment methods

• GASH

• DaliLite

• CE

– Alignment of predefined structural motifs

• Matching templates to a reference protein

• Not all proteins sharing binding sites for a particular ligand share any structural template similarities

– Structural alignment by physicochemical property description

• Surface-based comparison

– Relatively slow and thus incompatible with proteome-wide comparison

• SuMO, Cavbase, SiteEngine, SitesBase and CPASS

– Emerged in recent years

– Represent the active site by pseudocenters encoding physicochemical properties (H-bonding capacity, aromaticity, hydrophobicity and charge)

– Pseudocenters are linked by edges, giving a molecular graph

– Detection of maximal common subgraphs (clique detection)

– Detection of local similarities at ligand-binding subpockets for proteins with totally different folds and catalytic activities
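Clique detection on pseudocenter graphs can be sketched as follows: build a correspondence graph whose nodes pair up pseudocenters of the same type from the two sites, connect two nodes when the inter-center distances agree within a tolerance, and search for the largest clique with the basic Bron-Kerbosch algorithm. This is a generic illustration, not the implementation of any of the tools named above; the pseudocenter types, coordinates and the 1.0 Å tolerance are hypothetical.

```python
import itertools
import math


def correspondence_graph(site_a, site_b, tol=1.0):
    """Nodes pair same-type pseudocenters; edges join distance-compatible pairs."""
    nodes = [
        (i, j)
        for i, (type_a, _) in enumerate(site_a)
        for j, (type_b, _) in enumerate(site_b)
        if type_a == type_b
    ]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i == k or j == l:
            continue  # a pseudocenter may be matched only once
        dist_a = math.dist(site_a[i][1], site_a[k][1])
        dist_b = math.dist(site_b[j][1], site_b[l][1])
        if abs(dist_a - dist_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Largest clique found by the basic Bron-Kerbosch recursion."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


# Hypothetical sites: site_b contains the same triangle as site_a, shifted,
# plus one unrelated donor far away.
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
site_b = [("aromatic", (10, 14, 10)), ("donor", (10, 10, 10)),
          ("acceptor", (13, 10, 10)), ("donor", (30, 30, 30))]

match = max_clique(correspondence_graph(site_a, site_b))
print(sorted(match))
```

Each pair in the result maps a pseudocenter index in the first site to one in the second. The search is exponential in the worst case, which is one reason such comparisons are costly at proteome scale.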


• Comparing 3D Structures

– Interpretation of computed similarity scores is often difficult

• Active sites of different dimensions

– Larger sites tend to present more matches even if the smallest is more similar

– Surgand et al. projected an active site on a dimensionless 80-triangled sphere of cavity descriptors

– Measuring normalized distance in descriptor space


• Chemical annotation of target binding sites

– Various chemical compound libraries exist

• Binding information is crucial

• Protein/binding site must be annotated by ligand chemotype

• SMID (Small Molecule Interaction Database) annotates protein sequences by domain-specific ligands

– Browse likely ligands for a protein of unknown 3D structure

– Ligand-annotated binding sites from PDB

• BindingMOAD and sc-PDB

– Pharmacological point of view

• Prioritize ligands for designing targeted compound libraries

Target-Ligand-based Chemogenomic

“To browse and predict protein-ligand complexes, one needs to set up simple descriptors for both ligands and proteins from knowledge databases and concatenate them into a single protein-ligand description.”

• Two-dimensional searches

– Use experimental binding affinity matrices and define appropriate QSAR models to predict the affinity of new compounds

• Three-dimensional searches

– Dock each ligand of the compound library into each active site of the target library

– Molecular inverse docking approach

• Scoring functions cannot quantify very heterogeneous protein-ligand complexes

– Computation of IFP strings

• Converts 3D information about protein-ligand interaction to 1D


• Three-dimensional searches

– 3D-based docking-independent methods

• Retrieving ligand from protein and vice versa

• Encode protein and ligand properties with similar descriptors

– CoLiBRI (complementary ligands based on receptor information)

• Ligand and protein described using same molecular descriptors (TAE-RECON)

• Shape and electronic properties of isolated atoms

• Mapping patterns of active sites onto patterns of their complementary ligands and vice versa

• Good test results when the training set is similar


• High-throughput data (structure, binding affinity, etc.)

– Ligand

– Target

• Linking data by focusing on either the ligand or the target

– Target-based Chemogenomics

– Ligand-based Chemogenomics

– Target-Ligand-based Chemogenomics

• Selectivity profiles for therapeutic usage

– Not more selective ligands

– In silico approach

Final remarks
