plan - dtu bioinformatics · • a drug is a key molecule involved in a particular metabolic or ......
TRANSCRIPT
Plan Day 1: • What is Chemoinformatics and Drug Design? • Methods and Algorithms used in Chemoinformatics
including SVM. • Cross validation and sequence encoding • Example and exercise with hERG potassium channel:
Use of SVM in WEKA program
Day 2: • Exercise on MHC molecules.
• Lectures and compendium are available on the course program site at CBS.
• Data set for the hERG exercise and MHC exercises will have to be downloaded on your directory or your machine.
• WEKA program can be installed in your own machine or can be run from ”life” server at CBS.
Material
Drug discovery
Drug and drug design • A drug is a key molecule involved in a particular metabolic or signaling pathway that is specific to a disease condition or pathology.
• Action of activation (agonist) or inhibition (antagonist) to a biological target (protein, receptor, enzymes, cells…).
• Drug design is the approach of finding drugs by design, based on their biological targets.
• An important part of drug design is the prediction of small molecules binding to a target protein (pharmacophore, docking, QSAR,…)
F.K. Brown (1998): Annual Reports in Medicinal Chemistry, 33, 375-384 ”The use of information technology and management has become a
critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.”
Chemoinformatics- definition
Virtual screening- Chemoinformatics
• Exploit any knowledge of target(s) and/or active ligand(s) and/or gene family.
• It involves computational technique for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.
Structure based
Docking…
Ligand based
QSAR, similarity search
Pharmacophore based
knowledge based
Chemical-Biology network
VS
Drug-likeness with a simple counting method,
‘rule of five’ Octanol-water partition coefficient (logP) ≤ 5 Molecular weight ≤ 500 No. hydrogen bond acceptors (HBA) ≤ 10 No. hydrogen bond donors (HBD) ≤ 5 If two or more of these rules are violated, the compound
might have problems with oral bioavailability. (Lipinski et al., Adv. Drug Delivery Rev., 23, 1997, 3.)
Rules have always exception.
(antibiotics, antibacterials and antimicrobials,…)
What the chemist sees.
Aspirin Sildenafil
CC(=O)OC1=CC=CC=C1C(=O)O CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C
1D structure, line notation
SMILES – Simplified Molecular Line Entry System http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
SMILES
What a protein sees.
Aspirin Sildenafil
Electrostatic fields (red are negative and blue positive)
What a protein sees.
Aspirin Sildenafil
Green are hydrophilic area and red are hydrophobic areas
Ligand based- QSAR Based on a set of experimental data (biological activity,
solubility, toxicity, permeability,…) one tries to correlate these data with some descriptors.
But what can be these descriptors?
MACCS key
1D descriptors:
MW, number of features, sequence based
2D descriptors:
Topological, physichochemical, BCUT,…
Descriptors
QSAR based on 3 D interaction energies (GRID, CoMFA...)
000 001
002 003
004
DRY O N1 H2O
PCA and PLS model Structural model Molecular interaction field (GRID)
GRID: Determines a total interaction energy. Etot = Evdw + Eelec + Ehb
Descriptors
Blue areas represent the favorable electronegative region and red the unfavorable electronegative regions (based on CoMFA)
Green areas represent the favorable steric region and yellow the unfavorable steric regions (based on CoMFA).
The most common method used to correlate data are usually, PLS, SVM, ANN, K-means.
Example with Antimicrobial Peptides (AMPs)
AMPs are small (10-40 amino acids), cationic and amphipatic molecules, which are encoded directly from DNA. They are ubiquitous in nature, appearing in such diverse organisms as fungi, bacteria, plants, amphibians, insects, and mammals, where they establish a first-line defence mechanism against invading pathogens or competing organisms.
CAP18
Defensin Leucocin A
Tritrpticin
Activity S001 S002 S003 … S131
variant1 63
variant2 82
Activity = y + aS001 + bS002 + … + zS131
Novispirin
Classification from a sequence based analysis
• Translate sequence information to a quantitative description 29 physicochemical descriptors reduced to 3 latent variables
by Principal Component Analysis (PCA) Hellberg et al., J.Med.Chem. 30 (1987) 1126
• Extended to non-natural amino acids Sandberg et al., J. Med. Chem. 41 (1998) 2481
• Classification of GPCRs Lapinsh et al., Prot. Sci. 11 (2002) 795
Amino Acid z-Scales
Amino Acid z-Scales
Translate sequence information to a quantitative description 29 physicochemical descriptors reduced to 3 latent variables
by Principal Component Analysis (PCA) z1 (lipophilicity) z2 (steric properties) z3 (electrostatic properties) Ala Phe Lys
0.07 -1.73 0.09 -4.92 1.30 0.45 2.84 1.41 -3.14
Hellberg et al., J.Med.Chem. 30 (1987) 1126
Principal Property
Translation
• Principal property translation (z-scales) & multivariate data analysis
52 x 20
(52+n) x 60 z-scales PCA scores
and QSAR models
Novispirin: From sequences to data
training set of 52 Novispirin variants with 60 variables.
q2loo=0.40 ; r2=0.72 sdep =0.14
Novispirin: From structures to data • 131 descriptors: - CPSA descriptors (charged partial surface area) computed with the Tripos force field, dipole moment, energies,surface, weight (33).
- VolSurf descriptors defined interaction of molecules with biological membranes which is mediated by surface properties such as shape, electrostatic, hydrogen-bonding and hydrophobicity (94).
- theoretical descriptors according of the hydrophobicity of each amino acid defined by Engleman-Steitz (4).
48 variables selected with a fractional factorial design (FFD).
q2loo=0.64 ; r2=0.79 sdep =0.11
Single mutation Novispirin prediction with a QSAR model. Each position of the Novispirin sequence is mutated with the 20 naturally amino acids. The residual activity of peptide variants is defined according of the wild type activity. Then, all point superior to 0 represent a peptide mutant with a better predict activity compared to the original novispirin.
position 4
Taboureau Methods Mol. Biol. 2010 Taboureau et al. Chem.Biol.Drug.Des. 2006. Raventos et al. CCHTS. 2005 Mygind et al. Nature. 2005
Pharmacophore
A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response. A pharmacophore does not represent a real molecule or a real association of functional groups, but a purely abstract concept that accounts for the common molecular interaction capacities of a group of compounds towards their target structure.
Definition
Pharmacophore: chemical features • The chemical features can be hydrogen bonds acceptors, hydrogen bond
donors, charge interactions, hydrophobic areas, aromatic rings, positive or negative ionizable group.) The shape or volume is also considered.
• Pharmacophores represent chemical functions, valid not only for the curretly bound, but also unknown molecules. The steric hindrance may explain lack of activity.
Hyd Acc
Acc
Acc & Don
Aro
Start to be complex !!!
Pharmacophore
Atom is acceptor if it’s a nitrogen, oxygen or sulfur and not an amide nitrogen, aniline nitrogen and sulfonyl sulfur and nitro group nitrogen
Donor
Acceptor
Acceptor
Aro ring center
Example 1
Pharmacophore
Pharmacophore Example 2 with 3 inhibitors
• Dopamine (2 rotations and 2 OH groups)
• Apomorphine (no rotations)
• 5-OH DPAT (one OH group and many rotation)
Agonist at D2 receptor
Example 2
Active agonists define important groups: – -Aromatic ring – -meta OH group – -N atom, righ distance from aromatic ring – -other molecular ”scaffolding does NOT get in the way at the receptor
Pharmacophore
Pharmacophore
Pharmacophore
Are features of the site unique to hERG?
Structural based design: Docking
Docking
Kellenberger E. et al. Proteins 2004, 57(2): 225-242
Structural based design: Docking
Comparison of some of the docking tools.
Structural based design: Docking
Induced fit docking
Lock and Key
+ Substrate (ligand)
Enzyme (receptor) + Substrate
(ligand) Enzyme (receptor)
Induced Fit
Zhou, Sciences 2007
Structural based design: an example with antidepressant
Chemogenomics and Pharmacogenomics Chemogenomics: Studied the biological effect of a wide array of small molecules on a wide array of macromolecular targets. Pharmacogenomics: Design drugs according of the genetic variation.
Chemical network: example with neurotransmitter transporters
A441G
S438T
A173S
V343S V343N
A173S CIT: 4.5x gain of function Des-F: 16.5x gain of function
S438T CIT: 175x loss of function Monomethyl: 3.5x loss of function
A441G CIT: 2.5x gain of function CIT with “ears”: 2x gain of function
V343S CIT: 4.5x gain of function Des-CN: 4.9x gain of function [Des-F: 7.4x gain of function]
V343N CIT: 3.5x gain of function Des-CN: 35.5x gain of function [Des-F: 3.8x gain of function]
Chemogenomics and Pharmacogenomics: an example with citalopram
A441G
S438T
A173S
V343S V343N
A173S CIT: 4.5x gain of function Des-F: 16.5x gain of function
S438T CIT: 175x loss of function Monomethyl: 3.5x loss of function
A441G CIT: 2.5x gain of function CIT with “ears”: 2x gain of function
V343S CIT: 4.5x gain of function Des-CN: 4.9x gain of function [Des-F: 7.4x gain of function]
V343N CIT: 3.5x gain of function Des-CN: 35.5x gain of function [Des-F: 3.8x gain of function]
Chemogenomics and Pharmacogenomics: an example with citalopram
Conclusion
According of the information you have, different strategies can be used.
If you can develop different strategies which come to the same conclusion, that will reinforce your hypothesis.
The most information (experimental) you have, the better your validation will be.
Time for a break !!!