UGM 2007 Miklós Vargyas*, Judit Vaskó-Szedlár What’s new in LibraryMCS

Talk Overview

• Introduction to LibraryMCS
  – Concepts, motivation
  – Main features
  – GUI

• 2006 Roadmap accomplishment

• New features in detail
  – Performance
  – Iterative clustering
  – Additive clustering

• Current roadmap and wishlist

Introduction – Concept of MCS

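Maximum Common Substructure (MCS): the largest substructure shared by two or more molecules

Looks simple, yet hard to compute! (Finding a maximum common subgraph is NP-hard.)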
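To make "hard to compute" concrete, here is a minimal brute-force sketch that returns the atom count of a maximum common connected induced substructure of two small hydrogen-suppressed molecular graphs, matching element symbols and bond orders. It is not LibraryMCS's algorithm, which these slides do not describe; the graph encoding and the names `make_graph`, `is_connected` and `mcs_size` are hypothetical.

```python
def make_graph(atoms, bonds):
    """atoms: element symbols (hydrogens omitted); bonds: (i, j, order) triples."""
    return atoms, {frozenset((i, j)): order for i, j, order in bonds}


def is_connected(nodes, bonds):
    """True if the atoms in `nodes` form a single connected fragment."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes:
            if v not in seen and frozenset((u, v)) in bonds:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def mcs_size(g1, g2):
    """Atom count of a maximum common connected induced substructure (brute force)."""
    atoms1, bonds1 = g1
    atoms2, bonds2 = g2
    n = len(atoms1)
    best = 0

    def extend(i, mapping):
        nonlocal best
        # prune: even mapping every remaining atom cannot beat the best so far
        if len(mapping) + (n - i) <= best:
            return
        if i == n:
            if is_connected(mapping.keys(), bonds1):
                best = len(mapping)
            return
        used = set(mapping.values())
        for j, label in enumerate(atoms2):
            if j in used or label != atoms1[i]:
                continue
            # induced match: the bond order (or absence of a bond) to every
            # already-mapped atom must be identical in both molecules
            if all(bonds1.get(frozenset((i, k))) == bonds2.get(frozenset((j, m)))
                   for k, m in mapping.items()):
                mapping[i] = j
                extend(i + 1, mapping)
                del mapping[i]
        extend(i + 1, mapping)  # also try leaving atom i unmatched

    extend(0, {})
    return best


propanol = make_graph(["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
ethanol = make_graph(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(mcs_size(propanol, ethanol))  # 3: the C-C-O fragment
```

Every atom of the first molecule is either skipped or mapped to a compatible atom of the second, so the search tree grows exponentially with molecule size. The bound `len(mapping) + (n - i) <= best` prunes hopeless branches, but a production tool needs much stronger heuristics to reach the speeds reported later in this talk.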

Introduction – Motivations

• MCS-based clustering
  – More intuitive than similarity-based clustering
  – Closer to chemists' gold standard

• Initial requirements
  – Focused set analysis
    • screens: 2000–10000 structures
    • lead optimization: 3000–5000 structures
  – Should be hierarchical (outliers)
  – Ultimate goal: cluster 5000 compounds in 5 seconds

• Further application areas
  – Library profiling
  – Compound acquisition

Introduction – Main features

• MCS-based hierarchical clustering

• Flexible search options

• No theoretical size limitation

• Fast operation

• Filtering by chemical properties

• Cluster statistics

• Hierarchy browser
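The slides do not say how LibraryMCS builds its hierarchy, so the sketch below is a generic stand-in rather than the product's algorithm. It turns pairwise MCS sizes into an MCS-based Tanimoto score (atoms shared through the MCS divided by atoms in the union) and runs textbook single-linkage agglomerative clustering, recording the merges bottom-up as a dendrogram. The function names and the MCS sizes in the usage lines are hypothetical.

```python
def mcs_tanimoto(n_a, n_b, n_mcs):
    """MCS-based Tanimoto score: shared atoms / atoms in the union."""
    return n_mcs / (n_a + n_b - n_mcs)


def similarity_matrix(sizes, mcs):
    """sizes[i]: atom count of molecule i; mcs[(i, j)]: MCS atom count for i < j."""
    n = len(sizes)
    sim = [[1.0] * n for _ in range(n)]
    for (i, j), m in mcs.items():
        sim[i][j] = sim[j][i] = mcs_tanimoto(sizes[i], sizes[j], m)
    return sim


def single_linkage(sim, threshold):
    """Agglomerative single-linkage clustering; stops when no pair reaches threshold.

    Returns the merge history (a bottom-up dendrogram) and the final clusters.
    """
    clusters = [{i} for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: similarity of the most similar pair of members
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        if s < threshold:
            break
        merges.append((sorted(clusters[a]), sorted(clusters[b]), s))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges, clusters


sizes = [10, 10, 12, 8]  # hypothetical heavy-atom counts
mcs = {(0, 1): 9, (0, 2): 8, (1, 2): 8, (0, 3): 2, (1, 3): 2, (2, 3): 2}
merges, clusters = single_linkage(similarity_matrix(sizes, mcs), threshold=0.5)
print(merges)    # molecules 0 and 1 merge first, then 2 joins them
print(clusters)  # molecule 3 stays a singleton
```

Single linkage is only one possible merge rule; average or complete linkage would change which pairs merge first without changing the MCS-based score.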

GUI – Dendrogram view

• Interactive navigation, selection

• Zoom & move

GUI – Molecule view

GUI – SAR-table

• Cluster statistics, structure filtering by properties
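As a rough illustration of what a SAR table computes, the sketch below summarises one numeric property over the members of a cluster and filters structures by a property range. The record layout, the property names `mw` and `logp`, and the values are hypothetical; LibraryMCS's own property handling is not described in the slides.

```python
from statistics import mean


def cluster_stats(members, prop):
    """Summary statistics of one numeric property over the members of a cluster."""
    values = [m[prop] for m in members]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


def filter_by_property(members, prop, low, high):
    """Keep structures whose property lies in the closed range [low, high]."""
    return [m for m in members if low <= m[prop] <= high]


cluster = [  # hypothetical cluster members with precomputed properties
    {"id": "mol1", "mw": 300.0, "logp": 2.1},
    {"id": "mol2", "mw": 350.0, "logp": 3.4},
    {"id": "mol3", "mw": 420.0, "logp": 4.8},
]
print(cluster_stats(cluster, "mw"))
print([m["id"] for m in filter_by_property(cluster, "mw", 300.0, 360.0)])
```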

GUI – R-table

2006 Roadmap accomplishment

...

Preserving rings

Iterative clustering

• Outliers
  – Singletons
  – Large blobby clusters
• Aim
  – Minimise number of singletons
  – Maintain high quality
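The slide states only the aim, not the procedure. One plausible scheme, assumed here rather than taken from LibraryMCS: cluster at a strict similarity threshold, keep the multi-member clusters as they are (preserving quality), then re-cluster only the leftover singletons at progressively looser thresholds. Simple leader clustering stands in for the MCS-based step, and the one-dimensional toy similarity and all function names are hypothetical.

```python
def leader_cluster(items, sim, threshold):
    """One pass: each item joins the first cluster whose leader is similar enough."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if sim(cluster[0], item) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def iterative_cluster(items, sim, thresholds):
    """Keep multi-member clusters; re-cluster leftover singletons at looser thresholds."""
    final, pending = [], list(items)
    for t in thresholds:  # thresholds ordered from strict to loose
        clusters = leader_cluster(pending, sim, t)
        final += [c for c in clusters if len(c) > 1]
        pending = [c[0] for c in clusters if len(c) == 1]
        if not pending:
            break
    return final + [[item] for item in pending]


positions = [0.0, 0.1, 5.0, 5.2, 9.0, 11.0]  # hypothetical 1-D stand-ins for structures


def sim(a, b):
    return 1.0 / (1.0 + abs(positions[a] - positions[b]))


print(leader_cluster(range(6), sim, 0.8))            # two singletons remain
print(iterative_cluster(range(6), sim, [0.8, 0.3]))  # second pass pairs them up
```

Because the clusters found at the strict threshold are never loosened, only the former singletons are grouped with a relaxed criterion.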

Additive clustering

[Diagram: Corporate database; Pre-clustering, stored; New set; Registration; Cluster diversity enrichment]
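Read from the diagram labels, additive clustering appears to place a newly registered set into the stored pre-clustering of the corporate database instead of re-clustering everything, with compounds that fit no existing cluster indicating added diversity. The sketch below illustrates that reading under stated assumptions: the first member of each stored cluster acts as its representative, a plain similarity threshold decides membership, and the function name, cluster naming and toy data are hypothetical, not LibraryMCS's procedure.

```python
def add_to_clusters(clusters, new_items, sim, threshold):
    """Assign new items to stored clusters; open new clusters for the rest.

    clusters: dict of cluster name -> member list, modified in place. The first
    member of each list is treated as the cluster representative.
    Returns the names of clusters opened for the new set.
    """
    opened = []
    for item in new_items:
        best_name, best_s = None, threshold
        for name, members in clusters.items():
            s = sim(members[0], item)
            if s >= best_s:
                best_name, best_s = name, s
        if best_name is None:
            # nothing in the stored clustering is similar enough: a new cluster
            name = f"new_{len(opened)}"
            clusters[name] = [item]
            opened.append(name)
        else:
            clusters[best_name].append(item)
    return opened


def sim(a, b):  # hypothetical 1-D similarity standing in for an MCS-based score
    return 1.0 / (1.0 + abs(a - b))


stored = {"A": [0.0, 0.1], "B": [5.0]}  # hypothetical pre-clustered database
opened = add_to_clusters(stored, [0.3, 5.1, 20.0, 20.2], sim, 0.5)
print(opened, stored)
```

The number of opened clusters is one simple way to quantify how much a new set enriches the diversity of the existing collection.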

Performance

• Depends on various factors
  – average structure size
  – diversity
  – minimal required MCS size
  – atom/bond constraints

[Chart: performance on the CombiLib, MixedLib and Maybridge libraries in Normal, Fast and Fastest modes]

Performance

• Scales linearly

[Chart: running time (sec) vs. structure count (0–35 000) for the 2006 and 2007 versions, with a linear trend line fitted to the 2007 data]
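The "Scales linearly" claim and the "Linear (2007)" trend line amount to fitting t ≈ a·n + b to timing data. The sketch below performs an ordinary least-squares fit so the same claim can be checked against one's own benchmark runs; an R² close to 1 supports linear scaling, and 1/a is the marginal throughput in structures per second. The timings in the usage lines are synthetic placeholders, not values read from the chart, and `linear_fit` is a hypothetical helper.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


counts = [5000, 10000, 20000, 30000]  # synthetic structure counts
times = [10.0, 20.0, 40.0, 60.0]      # synthetic running times (s)
slope, intercept, r2 = linear_fit(counts, times)
print(f"{1 / slope:.0f} structures/s, intercept {intercept:.2f} s, R^2 = {r2:.3f}")
```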

Performance

• Maximum speed achieved: 1 000 structures/s

[Chart: run time (s) vs. library size (100–100 000 structures) for Ward 512, Jarp 512 and LibMCS 6]

• Memory requirements
  – scalable
  – 50 000 structures occupy < 100 MB
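Two figures on this slide can be checked with simple arithmetic: the 1 000 structures/s maximum speed equals the rate implied by the original goal of clustering 5 000 compounds in 5 seconds, and the memory figure bounds the per-structure footprint. The second line assumes decimal megabytes (1 MB = 10^6 bytes); with binary mebibytes the bound is about 2 097 bytes per structure.

```latex
\frac{5\,000\ \text{structures}}{5\ \text{s}} = 1\,000\ \text{structures/s}

\frac{100\ \text{MB}}{50\,000\ \text{structures}}
  = \frac{100 \times 10^{6}\ \text{B}}{5 \times 10^{4}}
  = 2\,000\ \text{B} \approx 2\ \text{kB per structure (upper bound)}
```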

In the pipeline

• Multi-stage clustering

• Additive clustering

• Disconnected MCS (Maximum Overlapping Set)

• Enhanced R-group decomposition

• Markush export

• Further clustering criteria
  – Ring count

• Performance tuning
  – Easier control of memory usage

Current roadmap and wishlist

• Simpler table view

• IJC integration

• Multi-cluster members

• Clustering million-compound libraries

• Integrate Chemical Terms

• Stereo care MCS

Acknowledgements

• Co-workers
  – Péter Vadász
  – Judit Vaskó-Szedlár

• Ideas
  – Ferenc Csizmadia, Szabolcs Csepregi, Ákos Papp, György Pirok

• Partners, early adopters