Ligand-based Methods Chemical Libraries
Similarity
Pharmacophores
RNDr. Karel Berka, PhD
RNDr. Jindřich Fanfrlík, PhD
RNDr. Martin Lepšík, PhD
Dpt. Physical Chemistry, RCPTM, Faculty of Science,
Palacky University in Olomouc
Drug Design
Outline
• Structure-based drug design (SBDD)
– Docking
– Virtual screening
– de novo design
– Pharmacophore search
• Ligand-based drug design (LBDD)
– Similarity matching
– Pharmacophore search
– QSAR
– ADMET
Possibilities of Drug Design
• Known target structure, known ligand
– Structure-based drug design (SBDD): docking
• Known target structure, unknown ligand
– De novo design
• Unknown target structure, known ligand
– Ligand-based drug design (LBDD)
– 1 or more ligands: similarity search
– Several ligands: pharmacophore
– Large number of ligands (20+): Quantitative Structure-Activity Relationships (QSAR)
• Unknown target structure, unknown ligand
– CADD not possible; some experimental data needed
• ADMET filtering
CHEMICAL LIBRARIES
Chemical Libraries
• Large diversity
– Lead search
• Specific
– For combinatorial chemistry
• Typical motifs:
David C. Young - Computational Drug Design: A guide for computational and medicinal chemists. Wiley-Blackwell, New York, 2009, ISBN 978-0470126851
• Types:
• 1D, 2D, 3D
• What for:
• Similarity search – 2D and 3D similarity, motifs
• Predictions – pKa, logP/logD, charge distribution, logS, …
How to Store a Chemical Compound
Ethanol: CCO
Chemical Information System
Operation | Classical information system | Chemical information system
Storage | Name = ‘KSICHT’; stores text, numbers, pictures, … | Stores chemical structures and information about them
Search | Search $Name; returns: ‘KSICHT’ | Search CC(=O)C4CC3C2CC(C)C1=C(C)C(=O)CC(O)C1C2CCC3(C)C4; returns: [structure drawing]
Advanced searches | Find entries containing ‘chemist’; returns: ‘chemist’, ‘also chemist’, ‘bum’ | Search molecules containing: [substructure drawing]; returns: [matching structures]
Questions | How to become a chemist? Returns: ‘Solve KSICHT’ | Calculate logP(o/w) of: [structure drawing]; logP(o/w) = 2.62
1D Structure Representation
Stores molecule in string format
• CAS number – registered molecules only
• SMILES – simple format
• InChI – IUPAC format - more comprehensive
SMILES
Simplified Molecular-Input Line-Entry System
• Chemical graph written as a string
Atoms: organic-subset atoms with implicit H (B, C, N, O, P, S, X = halogens),
inorganic atoms or isotopes in brackets – [Au], [2H]; charge – [Ti+4] or [Ti++++]
aromatic atoms – lowercase (c1ccccc1 – benzene)
Bonds: single – no symbol (CC – ethane; O – water, a lone atom with implicit H),
double (O=O), triple (N#N – nitrogen), quadruple ([Ga-]$[As+]),
ring closure – matching digits, two-digit labels prefixed with % (%10)
(C1CCCCC1 – cyclohexane, c1cccc2c1cccc2 – naphthalene)
Branching: parentheses (C(Cl)(Cl)Cl – chloroform)
Stereochemistry: “/” and “\” around double bonds
(F/C=C/F – trans-difluoroethene)
chiral atoms: @ (counterclockwise) or @@ (clockwise)
(N[C@@H](C)C(=O)O – L-alanine)
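The SMILES rules above can be illustrated with a small tokenizer. This is a minimal sketch, not a full SMILES parser: it assumes only the organic subset, bracket atoms, bonds, branches and ring-closure labels, ignores implicit hydrogens, and the function names `tokenize` and `summarize` are made up for this example.

```python
import re

# Token kinds: bracket atom, two-letter halogens, one-letter organic atoms,
# aromatic (lowercase) atoms, ring-closure labels (%nn or one digit),
# bond symbols, the dot separator and branch parentheses.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[-=#$:/\\.()]")

def tokenize(smiles):
    """Split a SMILES string into tokens; raise ValueError on unknown characters."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens

def summarize(smiles):
    """Count heavy atoms and ring closures; check that branches and rings pair up."""
    atoms = rings = depth = 0
    open_labels = set()
    for tok in tokenize(smiles):
        if tok[0] == "[" or tok[0].isalpha():
            atoms += 1                      # bracket atom or organic-subset atom
        elif tok == "(":
            depth += 1                      # branch opens
        elif tok == ")":
            depth -= 1                      # branch closes
            if depth < 0:
                raise ValueError("unbalanced ')'")
        elif tok[0].isdigit() or tok[0] == "%":
            label = tok.lstrip("%")
            if label in open_labels:        # second occurrence closes the ring
                open_labels.remove(label)
                rings += 1
            else:                           # first occurrence opens it
                open_labels.add(label)
    if depth or open_labels:
        raise ValueError("unclosed branch or ring")
    return {"atoms": atoms, "rings": rings}

print(summarize("C1CCCCC1"))        # cyclohexane
print(summarize("c1cccc2c1cccc2"))  # naphthalene
print(summarize("C(Cl)(Cl)Cl"))     # chloroform
```

Trying the longest alternatives first in the regular expression (`Cl` before `C`) is what keeps chlorine from being read as carbon followed by an unknown character.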
SMARTS
• SMiles ARbitrary Target Specification
– Regular-expression-like patterns for selecting atoms
• Atoms – symbol or atomic number: [C], [#6], [C,c]
– aromatic atoms in lowercase: [c]
– Wildcards: * (any atom), A (any aliphatic atom), a (any aromatic atom)
• Bonds – '-' (single), '=' (double), '#' (triple), ':' (aromatic), '~' (any bond)
• Connectivity – X (total connections, implicit H included) and D (explicit connections) descriptors: [CX4] – carbon with 4 connections, hydrogens included; [CD4] – quaternary carbon
• Cyclicity – R descriptor: [CR] (aliphatic carbon atom in a ring)
• Logical operators – and: ';' (low precedence) or '&' (high precedence); or: ','; not: '!'
[N;H3;+][C;X4] (protonated primary amine)
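How the logical operators combine inside one bracket atom can be sketched as below. This is a minimal illustration under stated assumptions: it evaluates a single atom expression against a hand-written dictionary of atom properties (a hypothetical data layout; a real toolkit derives these values from the molecule), covers only a few primitives, and does no bond or graph matching.

```python
import re

ATOMIC_NUMBER = {"B": 5, "C": 6, "N": 7, "O": 8, "F": 9, "P": 15,
                 "S": 16, "Cl": 17, "Br": 35, "I": 53}
# One primitive, optionally negated with '!'
PRIMITIVE = re.compile(
    r"(!?)(\*|#\d+|H\d*|X\d+|D\d+|R|\+\d*|-\d*|Cl|Br|[BCNOPSFI]|[cnosp]|A|a)")

def primitive_holds(prim, atom):
    """Test one SMARTS primitive against an atom-property dictionary."""
    if prim == "*":
        return True
    if prim in ("A", "a"):                       # any aliphatic / any aromatic
        return atom["aromatic"] == (prim == "a")
    if prim == "R":                              # ring membership
        return atom["in_ring"]
    if prim[0] == "#":                           # atomic number
        return ATOMIC_NUMBER[atom["element"]] == int(prim[1:])
    key = {"H": "h_count", "X": "connectivity", "D": "degree"}.get(prim[0])
    if key:                                      # H<n>, X<n>, D<n>
        return atom[key] == int(prim[1:] or 1)
    if prim[0] in "+-":                          # formal charge
        sign = 1 if prim[0] == "+" else -1
        return atom["charge"] == sign * int(prim[1:] or 1)
    # Element symbol: uppercase means aliphatic, lowercase means aromatic
    return (atom["element"].lower() == prim.lower()
            and atom["aromatic"] == prim.islower())

def and_term_holds(term, atom):
    """High-precedence AND: adjacent primitives, with or without '&'."""
    term = term.replace("&", "")
    found = PRIMITIVE.findall(term)
    if "".join(neg + prim for neg, prim in found) != term:
        raise ValueError(f"unsupported primitive in {term!r}")
    return all(primitive_holds(prim, atom) != bool(neg) for neg, prim in found)

def matches(atom_expr, atom):
    """';' binds loosest (AND), then ',' (OR), then '&' / adjacency (AND)."""
    body = atom_expr.strip("[]")
    return all(any(and_term_holds(alt, atom) for alt in group.split(","))
               for group in body.split(";"))

# Nitrogen of a protonated primary amine, R-NH3+ (values entered by hand)
ammonium_n = {"element": "N", "aromatic": False, "h_count": 3, "charge": 1,
              "degree": 1, "connectivity": 4, "in_ring": False}
print(matches("[N;H3;+]", ammonium_n))
```

Splitting on ';' first and ',' second mirrors the precedence order, so `[C,N;H3]` reads as "(C or N) and three hydrogens".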
InChI
• IUPAC International Chemical Identifier
• Layers, separated by “/”:
InChI=1 (1S for standard InChI)/chemical formula/c (atom connections)/h (hydrogens)/q (charges)/p (protons)/b (double-bond stereochemistry)/t, optionally m (tetrahedral stereochemistry)/s (stereochemistry type)/i (isotopes)/f (fixed hydrogens)/r (reconnected metals)
• InChIKey – for quicker searches – 27-character string: 14-character hash of the connectivity, then a 10-character block (8-character hash of the other layers plus a standard/non-standard flag and a version character), then 1 protonation character
Ethanol (CH3CH2OH)
InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 (standard InChI)
InChIKey: LFQSCWFLJHTTHZ-UHFFFAOYSA-N
L-ascorbic acid
InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1 (standard InChI)
InChIKey: CIWBSHSKHKDKBQ-JLAZNSOCSA-N
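The layered structure of an InChI string and the block layout of an InChIKey can be taken apart with plain string handling. The sketch below is illustrative only: the function names are made up, a layer prefix that occurs twice (possible after /f or /r) would overwrite the earlier one, and nothing is validated chemically.

```python
import re

def split_inchi(inchi):
    """Split an InChI string into version, formula and a dict of prefixed layers."""
    prefix, _, rest = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError("not an InChI string")
    version, formula, *layers = rest.split("/")
    return {
        "version": version,
        "standard": version.endswith("S"),      # '1S' marks a standard InChI
        "formula": formula,
        # first character of each layer is its one-letter prefix (c, h, t, ...)
        "layers": {layer[0]: layer[1:] for layer in layers},
    }

# 14 letters, hyphen, 8 letters + flag + version letter, hyphen, 1 letter
INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")

def split_inchikey(key):
    """Split a 27-character InChIKey into its blocks."""
    m = INCHIKEY.match(key)
    if m is None:
        raise ValueError("not a well-formed InChIKey")
    return {
        "connectivity_hash": m.group(1),
        "other_layers_hash": m.group(2),
        "standard": m.group(3) == "S",
        "version": m.group(4),
        "protonation": m.group(5),
    }

ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"], ethanol["layers"])
print(split_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
```

Two keys that share the first 14 characters describe the same connectivity, which is why that block alone is useful for searches that should ignore stereochemistry and isotopes.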
2D Structure Representation
• CHM – ChemDraw
• CDX – ChemDraw exchange file
3D Structure Representation
• MOL, SDF
• XYZ
• PDB
MOL/SDF
• MDL molfile, structure-data file
Row | Section | Description
1-3 | Header |
1 | | Name of molecule (“benzene”)
2 | | Additional information
3 | | Comment
4-17 | Connection table |
4 | | Counts line: 6 atoms, 6 bonds, ..., V2000 standard
5-10 | | Atoms (1 row per atom): x, y, z, element, etc.
11-16 | | Bonds (1 row per bond): 1st atom, 2nd atom, type, etc.
17 | | Properties block, closed by M  END
18 | | $$$$ – end of molecule; an SDF file can hold a whole database
In an SDF file, data items such as > <Molecular Weight> go between M  END and $$$$.
benzene
  ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
$$$$
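The row layout above translates directly into a reader for the connection table. The following is a minimal sketch, assuming a V2000 molfile with its fixed-width columns intact (3 characters per count, 10 per coordinate); it ignores the properties block and SDF data items, and `parse_molblock` is a name chosen for this example.

```python
MOLBLOCK = "\n".join([
    "benzene",
    "  ACD/Labs0812062058",
    "",
    "  6  6  0  0  0  0  0  0  0  0  1 V2000",
    "    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  2  1  1  0  0  0  0",
    "  3  1  2  0  0  0  0",
    "  4  2  2  0  0  0  0",
    "  5  3  1  0  0  0  0",
    "  6  4  1  0  0  0  0",
    "  6  5  2  0  0  0  0",
    "M  END",
])

def parse_molblock(text):
    """Return (name, atoms, bonds) from a V2000 molfile block."""
    lines = text.splitlines()
    name = lines[0].strip()                       # row 1: molecule name
    counts = lines[3]                             # row 4: counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:             # atom block
        xyz = tuple(float(line[i:i + 10]) for i in (0, 10, 20))
        atoms.append((line[31:34].strip(), xyz))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:   # bond block
        # first atom index, second atom index, bond type (1 single, 2 double)
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return name, atoms, bonds

name, atoms, bonds = parse_molblock(MOLBLOCK)
print(name, len(atoms), "atoms,", len(bonds), "bonds")
```

Slicing by column rather than splitting on whitespace matters for real files, where wide numbers can run together without a separating space.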
XYZ
Row | Section | Description
1-2 | Header |
1 | | Number of atoms
2 | | Comment
3-X | Block of atoms | 1 row per atom: element, x, y, z
More structures are stored as multiple entries, one after another.
5
methane molecule (in ångströms)
C 0.000000 0.000000 0.000000
H 0.000000 0.000000 1.089000
H 1.026719 0.000000 -0.363000
H -0.513360 -0.889165 -0.363000
H -0.513360 0.889165 -0.363000
• Free format
• Easy storage
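Because XYZ is free format, a reader needs little more than whitespace splitting. A minimal sketch, assuming a single-structure file; `parse_xyz` is a name chosen for this example, and the C–H distance check is added only to show the coordinates being used.

```python
import math

XYZ = """5
methane molecule (in angstroms)
C 0.000000 0.000000 0.000000
H 0.000000 0.000000 1.089000
H 1.026719 0.000000 -0.363000
H -0.513360 -0.889165 -0.363000
H -0.513360 0.889165 -0.363000
"""

def parse_xyz(text):
    """Return (comment, atoms) where atoms is a list of (element, (x, y, z))."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])                  # row 1: number of atoms
    comment = lines[1]                       # row 2: free-text comment
    atoms = []
    for line in lines[2:2 + n_atoms]:        # rows 3..: element, x, y, z
        element, x, y, z = line.split()
        atoms.append((element, (float(x), float(y), float(z))))
    return comment, atoms

comment, atoms = parse_xyz(XYZ)
carbon = atoms[0][1]
for element, xyz in atoms[1:]:
    # distance from the carbon atom to each hydrogen, in ångströms
    print(element, round(math.dist(carbon, xyz), 3))
```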
PDB - Protein DataBank file
HEADER EXTRACELLULAR MATRIX 22-JAN-98 1A3I
TITLE X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE 2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA X-RAY DIFFRACTION
AUTHOR R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO
AUTHOR 2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000
REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000
...
SEQRES 1 A 9 PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES 1 B 6 PRO PRO GLY PRO PRO GLY
SEQRES 1 C 6 PRO PRO GLY PRO PRO GLY
...
ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N
ATOM      2  CA  PRO A   1       7.608  20.729  20.336  1.00 17.44           C
ATOM      3  C   PRO A   1       8.487  20.707  19.092  1.00 17.44           C
ATOM      4  O   PRO A   1       9.466  21.457  19.005  1.00 17.44           O
...
HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C
HETATM  131  O   ACY   401       2.807  23.097  10.553  1.00 21.19           O
HETATM  132  OXT ACY   401       4.306  23.101  12.291  1.00 21.19           O
...
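ATOM and HETATM records are fixed-column, so fields are read by position rather than by splitting on spaces. The sketch below assumes the standard PDB column layout (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, coordinates 31-54, occupancy 55-60, B-factor 61-66, element 77-78); `parse_atom_record` is a name chosen for this example.

```python
def parse_atom_record(line):
    """Read one ATOM/HETATM record using the fixed PDB columns (0-based slices)."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21].strip(),           # empty string when no chain is given
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

RECORDS = [
    "ATOM      2  CA  PRO A   1       7.608  20.729  20.336  1.00 17.44           C",
    "HETATM  132  OXT ACY   401       4.306  23.101  12.291  1.00 21.19           O",
]

for record in RECORDS:
    atom = parse_atom_record(record)
    print(atom["record"], atom["name"], atom["res_name"], atom["xyz"])
```

The HETATM line shows why position matters: its chain column is blank, so a whitespace split would shift every later field by one.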
SIMILARITY SEARCHES
Similarity Search
Search for structures similar to the lead:
• 2D Sub-Structures
• 3D Sub-Structures
• 3D Conformational flexibility
Similarity to Natural Ligand
[Figure: structures of serotonin and sumatriptan]
• 5-Hydroxytryptamine (5-HT), serotonin: a natural neurotransmitter synthesized in certain neurons in the CNS
• Sumatriptan (Imitrex): used to treat migraine headaches; known to be a 5-HT1 agonist
2D Sub-Structure
• Functional groups
• Connectivity
Example: a halogen ([F,Cl,Br,I]) on an aromatic ring together with a carboxylic group
[Figure: query substructure and matching molecules]
3D Sub-Structure
• Distances in 3D play more important role
• Bioisostericity – similar groups in 3D
• Usually only the lowest-energy structure is stored
[Figure: 3D substructure query with atom types C(u), O(s1), O(s1), A, A, [O,S] and O; distance constraints 3.6 - 4.6 Å, 3.3 - 4.3 Å and 6.8 - 7.8 Å]
[Plot: steric energy (kcal/mol, 0 to 6) versus dihedral angle (0 to 360)]
Bioisostericity
Young, D.C. Computational Drug Design. Wiley, 2009.
Bioisostericity II
Young, D.C. Computational Drug Design. Wiley, 2009.
How to calculate molecular similarity
Quantitative assessment of similarity/dissimilarity of structures
• needs a numerically tractable form: molecular descriptors, fingerprints, structural keys
• sequences/vectors of bits, or numeric values, that can be compared by distance functions or similarity metrics
E = Euclidean distance (distance in XYZ)
$$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
T = Tanimoto index (similarity of bit vectors)
$$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$
where B(x) is the number of bits set in x.
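Both measures are short enough to write out. A minimal sketch, assuming fingerprints are stored as Python integers used as bit vectors and descriptors as equal-length numeric sequences; returning 1.0 for two empty fingerprints is a convention chosen here, not something the slides specify.

```python
import math

def tanimoto(x, y):
    """Tanimoto index of two bit vectors given as integers."""
    common = bin(x & y).count("1")            # B(x & y): bits set in both
    bits_x = bin(x).count("1")                # B(x)
    bits_y = bin(y).count("1")                # B(y)
    union = bits_x + bits_y - common
    return common / union if union else 1.0   # assumed convention for empty vectors

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

fp_a = 0b101101   # 4 bits set
fp_b = 0b100111   # 4 bits set, 3 shared with fp_a
print(tanimoto(fp_a, fp_b))                 # 3 / (4 + 4 - 3) = 0.6
print(euclidean((0, 0, 0), (1, 2, 2)))      # sqrt(1 + 4 + 4) = 3.0
```

Note the opposite directions: Tanimoto is a similarity (1 means identical bit vectors), while the Euclidean value is a distance (0 means identical).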
Paradox of Similarity
Aminogenistein (against cystic fibrosis)
7-Hydroxy-2-(4-nitro-phenyl)-chromen-4-one
Pargyline (against hypertension)
N-benzyl-N,1-dimethyl-2-propynylamine
It is not that simple
PHARMACOPHORE
Pharmacophore
• Structural motif (geometrical restrictions on functional groups) important for biological, mostly pharmacological, activity
• Analogous to chromophore
Bojarski, Curr. Top. Med. Chem. 2006, 6, 2005.
Pharmacophore-based Drug Design
Workflow cycle:
1) Experimental activity
2) Preparation of pharmacophore
3) Search for active compounds in chemical library
4) Preparation of hits
5) Activity testing
See also John Van Drie’s
http://pharmacophore.org
Search for pharmacophore
• Prerequisite:
– All active compounds in the training set bind to a common active site
• Pharmacophore query preparation
– Identification of characteristic functional groups (hydrogen bond acceptors and donors, lipophilic groups, charge distribution)
• Pharmacophore search
– Search for the pharmacophore query in all molecules in the DB
– Scaffold hopping possible
– No need for the receptor structure (though it can be useful for query generation)
Pharmacophore for HIV protease
Geometric set of functional groups necessary for HIV protease activity based on active site
[Figure: active-site residues Asp25, Gly27 and Ile50 with the pharmacophore features overlaid]
Features: Donor, Donor, Acceptor, Hydrophobic
Distances between features: 6.0 Å, 6.9 Å, 5.2 Å, 6.3 Å, 10.4 Å
Pharmacophore Query Identification
1) active site analysis
[Figure: HIV protease active-site residues Asp25, Gly27 and Ile50]
Active-site features: Donor, Acceptor or Anion, Hydrophobic, Acceptor
Distances between active-site features: 9.6 Å, 6.9 Å, 8.8 Å, 6.3 Å, 12.2 Å
Pharmacophore Query Identification
2) functional group definition
3) distance setup
[Figure: active-site residues Asp25, Gly27 and Ile50 with the complementary query features overlaid]
Active-site features: Donor, Acceptor or Anion, Hydrophobic, Acceptor
Active-site distances: 9.6 Å, 6.9 Å, 8.8 Å, 6.3 Å, 12.2 Å
Query features: Donor, Donor, Acceptor, Hydrophobic
Query distances: 6.0 Å, 6.9 Å, 5.2 Å, 6.3 Å, 10.4 Å
Final Pharmacophore Query
Features: Donor, Donor, Acceptor, Hydrophobic
Distances between features: 6.0 Å, 6.9 Å, 5.2 Å, 6.3 Å, 10.4 Å
Last step:
Search the database of small molecules for conformations that match the query
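The last step can be sketched as a distance-constraint check over stored conformations. Everything specific here is an assumption made for illustration: the three-point query and its distances are hypothetical (the slides do not say which feature pair each HIV protease distance belongs to), the library entries are invented, and the 0.5 Å tolerance is arbitrary.

```python
import math
from itertools import permutations

# Hypothetical query, NOT the HIV protease query from the slides.
QUERY = {
    "types": ["Donor", "Acceptor", "Hydrophobic"],
    # (index_i, index_j) into "types" -> required distance in Å
    "distances": {(0, 1): 3.0, (1, 2): 4.0, (0, 2): 5.0},
}

def matches_query(query, conformer, tol=0.5):
    """True if some assignment of conformer features satisfies every distance.

    conformer: list of (feature_type, (x, y, z)) for one 3D conformation.
    """
    types = query["types"]
    for combo in permutations(range(len(conformer)), len(types)):
        # each chosen feature must have the type the query asks for
        if any(conformer[c][0] != t for c, t in zip(combo, types)):
            continue
        if all(abs(math.dist(conformer[combo[i]][1], conformer[combo[j]][1]) - d) <= tol
               for (i, j), d in query["distances"].items()):
            return True
    return False

def search_library(query, library, tol=0.5):
    """Names of molecules with at least one conformation matching the query."""
    return [name for name, conformers in library.items()
            if any(matches_query(query, conf, tol) for conf in conformers)]

# Invented library: molecule name -> list of conformations
LIBRARY = {
    "mol_A": [[("Donor", (0.0, 0.0, 0.0)), ("Acceptor", (3.2, 0.0, 0.0)),
               ("Hydrophobic", (3.0, 4.1, 0.0)), ("Acceptor", (9.0, 9.0, 9.0))]],
    "mol_B": [[("Donor", (0.0, 0.0, 0.0)), ("Acceptor", (6.0, 0.0, 0.0)),
               ("Hydrophobic", (6.0, 4.0, 0.0))]],
}

print(search_library(QUERY, LIBRARY))
```

Because only feature types and distances are compared, molecules with different scaffolds can match the same query, which is what makes scaffold hopping possible. Trying every permutation is fine for a handful of features but grows quickly; this sketch makes no attempt at pruning.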