
Discrimination of near-native structures by clustering docked conformations and the selection of the optimal radius

D. Kozakov¹, K. H. Clodfelter², C. J. Camacho¹,³, and S. Vajda¹,²

¹Department of Biomedical Engineering

²Program in Bioinformatics, Boston University

³Current address: University of Pittsburgh

Why do we need clustering?

● Rigid-body docking methods sample a large set of conformations that uniformly cover the energy landscape

● Energy scoring functions alone are not enough to discriminate near-native structures

● Conformations in unbound crystal structures differ from those in solution, which makes solvation effects difficult to estimate

● In such cases, the distribution of sampled conformations carries more information than any single conformation

What does clustering mean for docking?

● Low-energy conformations below a given threshold form clusters

● Clusters represent the energy minima

● The cluster in the native funnel should be the most populated

How can we analyze the clustering properties of a distribution?

How can we describe the clustering property?

● Δ characterizes the ratio of intra-cluster to inter-cluster elements

● Δ = 1: data set well separated

● Δ = 0: no clustering

● Δ > Δn: the distribution carries cluster-size information

● Optimal radius (OR): the first minimum with the largest Δ

Clustering Procedure

● The element with the maximum number of neighbors is chosen; it is called the cluster centre.

● All the elements within the optimal radius are included in the cluster.

● Exclude these elements and repeat until all points are exhausted.

● Redistribute the elements to their closest cluster centre.

● Rank the clusters based on size.

● Clusters with fewer than 10 members are ignored.
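The steps above can be sketched as a greedy, neighbor-count-driven clustering. This is a minimal sketch, not the authors' code: it assumes a precomputed symmetric pairwise distance matrix `dist` (for example, pairwise RMSD between docked ligand positions, in Å), and the function name `greedy_cluster` and its `min_size` parameter are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def greedy_cluster(
    dist: Sequence[Sequence[float]], radius: float, min_size: int = 10
) -> List[Tuple[int, List[int]]]:
    """Greedy neighbor-count clustering (hypothetical helper, not the authors' code).

    dist: precomputed symmetric pairwise distance matrix (assumed, e.g. RMSD in Å).
    radius: clustering radius, e.g. the optimal radius.
    Returns (centre, members) pairs, largest cluster first, dropping clusters
    with fewer than min_size members.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(dist)
    remaining: Set[int] = set(range(n))

    def neighbors(i: int) -> List[int]:
        # Remaining elements within the radius of i (includes i, since dist[i][i] == 0).
        return [j for j in remaining if dist[i][j] <= radius]

    centres: List[int] = []
    while remaining:
        # The element with the most remaining neighbors becomes the next centre;
        # ties go to the lowest index.
        centre = max(sorted(remaining), key=lambda i: len(neighbors(i)))
        centres.append(centre)
        # Exclude the centre and its neighbors, then repeat on what is left.
        remaining -= set(neighbors(centre))

    # Redistribute every element to its closest cluster centre.
    members = {c: [] for c in centres}
    for i in range(n):
        closest = min(centres, key=lambda c: dist[i][c])
        members[closest].append(i)

    # Rank clusters by size (stable sort) and ignore the small ones.
    ranked = sorted(members.items(), key=lambda item: len(item[1]), reverse=True)
    return [(c, m) for c, m in ranked if len(m) >= min_size]


if __name__ == "__main__":
    # Toy 1-D "conformations": two tight groups and one outlier.
    pts = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 20.0]
    dist = [[abs(a - b) for b in pts] for a in pts]
    print(greedy_cluster(dist, radius=1.0, min_size=2))
    # [(0, [0, 1, 2, 3]), (4, [4, 5, 6])]
```

The redistribution pass matters when a later centre sits closer to an element than the centre that originally absorbed it; with well-separated toy data, as here, it leaves the greedy assignment unchanged.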

Application to Docking

● Rigid-body methods uniformly sample the placement of the ligand around a fixed receptor

● The best conformations are chosen based on shape complementarity and a simple energy score

● The full set of conformations considered contains 2,000-20,000 structures

● We choose the N conformations with the lowest desolvation energy (ACP) and the 3N conformations with the lowest electrostatic energy (N = 50-500)

● The attractors of these potentials have a characteristic size of 6-9 Å
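The selection step can be written as a small filter over per-conformation energies. This sketch assumes the desolvation (ACP) and electrostatic energies are available as two parallel lists indexed by conformation, and it assumes the two selections are merged as a set union, since the text does not say how overlaps are handled; `select_for_clustering` is a hypothetical name.

```python
from typing import List, Sequence


def select_for_clustering(
    desolvation: Sequence[float], electrostatic: Sequence[float], n: int
) -> List[int]:
    """Indices of the n lowest-desolvation and 3n lowest-electrostatic conformations.

    Hypothetical helper. Assumption: the two selections are merged as a set union,
    so a conformation picked by both criteria appears only once.
    """
    if len(desolvation) != len(electrostatic):
        raise ValueError("both energy lists must score the same conformations")
    order = range(len(desolvation))
    lowest_desolv = sorted(order, key=lambda i: desolvation[i])[:n]
    lowest_elec = sorted(order, key=lambda i: electrostatic[i])[: 3 * n]
    return sorted(set(lowest_desolv) | set(lowest_elec))


if __name__ == "__main__":
    desolv = [5.0, 1.0, 3.0, 2.0, 4.0, 0.0]
    elec = [0.0, 5.0, 1.0, 4.0, 2.0, 3.0]
    print(select_for_clustering(desolv, elec, n=1))  # [0, 2, 4, 5]
```

Under the union assumption, at most 4N conformations reach the clustering step (fewer when the two lists overlap).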

What do docking histograms look like?

● The OR measure is a property of the sampled energy landscape

Results

● Tested on the benchmark set of protein complexes

● A hit is the rank of the first cluster whose center lies within 10 Å RMSD of the native bound conformation

● The hypothesis “biggest cluster = native funnel” is supported

● Clusters serve as starting points for further refinement
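The hit criterion can be made concrete as follows. This sketch assumes the receptor stays fixed, so ligand RMSD is computed directly in the receptor frame without superposition, and that each cluster center and the native ligand are given as matched atom coordinate lists in Å; `rmsd` and `hit_rank` are hypothetical helpers.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD of matched atoms without superposition (assumes a fixed receptor frame)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def hit_rank(
    ranked_centers: Sequence[Sequence[Coord]],
    native: Sequence[Coord],
    cutoff: float = 10.0,
) -> Optional[int]:
    """1-based rank of the first cluster center within `cutoff` Å RMSD of the native pose.

    Hypothetical helper; returns None when no cluster center qualifies.
    """
    for rank, center in enumerate(ranked_centers, start=1):
        if rmsd(center, native) <= cutoff:
            return rank
    return None


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    far = [(20.0, 0.0, 0.0), (21.0, 0.0, 0.0)]   # 20 Å from native
    near = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]    # 5 Å from native
    print(hit_rank([far, near], native))  # 2
```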

Successful predictions: ranking based on free energy vs. clustering

Top N     Free energy   Clustering
Top 1     5%            38%
Top 10    14%           74%
Top 30    19%           93%
Top 50    31%           100%
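Reading each percentage as the fraction of benchmark complexes whose first hit appears within the top k ranked predictions (our interpretation of the table), the success rates can be computed as below. The helper `success_rates` is hypothetical, and the ranks in the usage example are toy values, not the poster's data.

```python
from typing import Dict, Iterable, Optional, Sequence


def success_rates(
    hit_ranks: Sequence[Optional[int]], cutoffs: Iterable[int] = (1, 10, 30, 50)
) -> Dict[int, float]:
    """Fraction of complexes with a hit within the top k, for each cutoff k.

    Hypothetical helper. hit_ranks holds one entry per complex; None means no hit.
    """
    if not hit_ranks:
        raise ValueError("need at least one complex")
    n = len(hit_ranks)
    return {
        k: sum(1 for r in hit_ranks if r is not None and r <= k) / n
        for k in cutoffs
    }


if __name__ == "__main__":
    toy_ranks = [1, 3, 12, None]  # toy values, one per complex
    print(success_rates(toy_ranks))  # {1: 0.25, 10: 0.5, 30: 0.75, 50: 0.75}
```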

Fixed-radius (9 Å) prediction compared with the optimal radius (OR)

Complex   Rank (9 Å)   Rank (OR)   Δ
2PCC      42           48          0.745
1MEL      2            1           0.7
1ATN      2            1           0.617
1STF      1            1           0.615
1UDI      10           1           0.587
1AVW      1            1           0.587
2TEC      1            1           0.563
2BTF      7            3           0.561
2PTC      3            3           0.52
2KAI      25           8           0.514
1QFU      39           11          0.492
1UGH      5            1           0.489
1BRS      15           16          0.441
1MDA      13           12          0.431
2SIC      2            1           0.423
1BQL      6            3           0.406
1AHW      1            2           0.389
1CHO      1            1           0.384
1WQ1      1            3           0.383
1IAI      15           22          0.381
1TAB      11           8           0.364
4HTC      3            1           0.346
1NCA      1            2           0.343
1NMB      10           6           0.311
1BVK      4            11          0.304
2SNI      11           7           0.302
1CSE      9            2           0.286
1MLC      14           2           0.243
1SPB      1            1           0.208
1DQJ      26           37          0.206
1FBI      17           32          0.138
2JEL      6            13          0.108
1ACB      3            1           0.102
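As a quick cross-check, the rank pairs in the table can be tallied to see how often the optimal radius improves on the fixed 9 Å radius. The data below are transcribed from the table; `RANKS`, `head_to_head`, and `top_k_count` are hypothetical names used only for this illustration.

```python
from typing import Dict, Tuple

# Hypothetical name. (rank with fixed 9 Å radius, rank with optimal radius) per complex,
# transcribed from the table.
RANKS: Dict[str, Tuple[int, int]] = {
    "2PCC": (42, 48), "1MEL": (2, 1), "1ATN": (2, 1), "1STF": (1, 1),
    "1UDI": (10, 1), "1AVW": (1, 1), "2TEC": (1, 1), "2BTF": (7, 3),
    "2PTC": (3, 3), "2KAI": (25, 8), "1QFU": (39, 11), "1UGH": (5, 1),
    "1BRS": (15, 16), "1MDA": (13, 12), "2SIC": (2, 1), "1BQL": (6, 3),
    "1AHW": (1, 2), "1CHO": (1, 1), "1WQ1": (1, 3), "1IAI": (15, 22),
    "1TAB": (11, 8), "4HTC": (3, 1), "1NCA": (1, 2), "1NMB": (10, 6),
    "1BVK": (4, 11), "2SNI": (11, 7), "1CSE": (9, 2), "1MLC": (14, 2),
    "1SPB": (1, 1), "1DQJ": (26, 37), "1FBI": (17, 32), "2JEL": (6, 13),
    "1ACB": (3, 1),
}


def head_to_head(ranks: Dict[str, Tuple[int, int]]) -> Tuple[int, int, int]:
    """Count complexes where the OR rank is better (lower), equal, or worse than the 9 Å rank."""
    better = sum(1 for fixed, opt in ranks.values() if opt < fixed)
    equal = sum(1 for fixed, opt in ranks.values() if opt == fixed)
    worse = sum(1 for fixed, opt in ranks.values() if opt > fixed)
    return better, equal, worse


def top_k_count(ranks: Dict[str, Tuple[int, int]], k: int, use_optimal: bool) -> int:
    """Number of complexes whose hit rank is <= k for the chosen radius."""
    column = 1 if use_optimal else 0
    return sum(1 for pair in ranks.values() if pair[column] <= k)


if __name__ == "__main__":
    print(head_to_head(RANKS))  # (17, 6, 10)
    print(top_k_count(RANKS, 1, use_optimal=False), top_k_count(RANKS, 1, use_optimal=True))  # 8 12
```

For the 33 complexes listed, the OR gives a better first-hit rank than the fixed 9 Å radius for 17 complexes, the same rank for 6, and a worse rank for 10; it places a hit at rank 1 for 12 complexes, versus 8 with the fixed radius.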

Can we see the clusters?

Acknowledgments

● Sandor Vajda

● Carlos Camacho
