© Oellien, Ihlenfeldt, Engel, Ertl, C3 MMWS 2002
Interactive Datamining of Large-Scale Screening Datasets
Klaus Engel, Thomas Ertl, Visualization and Interactive Systems Group, University of Stuttgart
Frank Oellien, Wolf D. Ihlenfeldt, Computer-Chemie-Centrum, University of Erlangen-Nuremberg
Overview
Multi-variate and multi-dimensional datasets
• Motivation
• Information Visualization Techniques
• Examples (ChemCodes Inc., NCI)
• Demo
Chemical data
[Figure: bar chart "Current datasets", axis from 0 to 18,000,000, for the databases Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS]
Multi-Variate and Multi-Dimensional Numeric Datasets Today
Change in chemical synthesis technology
• experiments using new technologies (HTS, combinatorial synthesis) generate terabytes of data per year
• development of data mining and visualization tools could not keep pace
• most critical bottleneck in R&D today!
tools for interactive mining and information visualization are needed
Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data
Standard applications
• bar charts, 2D and pseudo 3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent
Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• offer generalized access to tabular numeric data
• are platform-independent
3D Tools for Interactive Information Visualization
Information visualization applications that use the 3D capabilities of modern clients
• Glyph-based InfVis approaches
• Volume-based InfVis approaches
Glyph-based InfVis Tools
• 3 orthogonal axes
• color
• shape
• size
• transparency
• surface effects
• animation
• up to ~100 Glyphs
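The attributes listed above can each carry one column of a multi-variate record. A minimal sketch of such a mapping follows, assuming linear scaling into the unit cube; the `Glyph` type, the `mapping` dictionary, and the blue-to-red color ramp are hypothetical illustrations, not the applet's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    x: float
    y: float
    z: float
    color: tuple  # (r, g, b), each component in [0, 1]
    size: float


def normalize(value, lo, hi):
    """Linearly rescale value from [lo, hi] to [0, 1], clamped to that range."""
    if hi == lo:
        return 0.0
    t = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, t))


def record_to_glyph(record, ranges, mapping):
    """Map numeric columns of one record onto glyph attributes.

    mapping assigns a column name to each of the attributes
    "x", "y", "z", "color" and "size" (hypothetical convention).
    """
    def scaled(attribute):
        column = mapping[attribute]
        lo, hi = ranges[column]
        return normalize(record[column], lo, hi)

    t = scaled("color")
    return Glyph(
        x=scaled("x"), y=scaled("y"), z=scaled("z"),
        color=(t, 0.0, 1.0 - t),          # blue for low values, red for high
        size=0.5 + 0.5 * scaled("size"),  # never shrink a glyph to zero
    )
```

Shape, transparency, surface effects, and animation could be driven the same way by adding further entries to the mapping.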
Java/Java3D InfVis Applet
• Tool Panel (filters, selection tools, details)
• Java3D Canvas
• Control Panel
Java/Java3D InfVis Applet: 3D Render Panel
• 3D Barchart
• 3D Glyphs
Java/Java3D InfVis Applet: 3D Tool Panel
Dynamic Filter Tools
Selection Tools
Detail Tools
Java/Java3D InfVis Applet: 3D Control Panel
Advantages of Volume-based InfVis Tools
Databases with millions of data points
– Glyph-based InfVis approaches
• produce millions of geometric primitives
• interactive visualization not possible
– Volume-based InfVis approaches
• can handle large numbers of data points
• interactive visualization using low-cost graphics hardware is possible
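One way to see why a volume scales better than glyphs: data points can be accumulated into a fixed-size voxel grid, so the amount of data to render depends on the grid resolution rather than on the number of records. The sketch below illustrates that idea only; the slides do not describe the authors' actual volume technique, and the function name, the sparse dictionary representation, and the uniform binning are assumptions.

```python
def bin_to_volume(points, bounds, resolution):
    """Count 3D points per voxel of a resolution x resolution x resolution grid.

    points     -- iterable of (x, y, z) tuples
    bounds     -- three (lo, hi) pairs, one per axis, with hi > lo (assumed)
    resolution -- number of voxels along each axis
    Returns a sparse dict {(i, j, k): count}; empty voxels are omitted.
    """
    counts = {}
    for point in points:
        index = []
        for value, (lo, hi) in zip(point, bounds):
            t = (value - lo) / (hi - lo)
            # Clamp so that value == hi (and outliers) land in a valid voxel.
            i = min(resolution - 1, max(0, int(t * resolution)))
            index.append(i)
        key = tuple(index)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

However many points are binned, the grid never holds more than `resolution ** 3` voxels, and the per-voxel counts could then be mapped to color and opacity for rendering.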
ChemCodes Reaction Database
• 100 most important functional groups (FGs), ~75% of chemistry
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules
Goal: Analysis of the reaction space
ChemCodes - Reaction Optimization I
• Goal: Reaction Optimization: > 95% Yield
• 7 Dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG-compatibility
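The optimization goal amounts to a dynamic filter over the seven dimensions: keep only experiments above the yield threshold, then look at which values of a given dimension survive. A minimal sketch follows, assuming each experiment is a dictionary with one key per dimension plus a `yield` value; the key names and helper functions are hypothetical and do not reflect the ChemCodes data format.

```python
from collections import Counter

# The seven dimensions named on the slide, written as dictionary keys.
DIMENSIONS = (
    "reagent", "solvent", "time", "temperature",
    "stoichiometry", "reagent_order", "fg_compatibility",
)


def high_yield(records, threshold=95.0):
    """Keep only experiments whose yield (in percent) exceeds the threshold."""
    return [r for r in records if r["yield"] > threshold]


def count_by(records, dimension):
    """Count how often each value of one dimension occurs in the records."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    return Counter(r[dimension] for r in records)
```

Calling `count_by(high_yield(records), "solvent")` would, for example, show which solvents appear among the experiments that meet the > 95% goal.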
ChemCodes - Reaction Optimization II
ChemCodes - Reaction Planning
Functional Group Compatibility Check
[Figure: chemical structure drawing with atom labels N, H, O]
Example 2: NCI Anti-tumor / Anti-viral Database
• Initiated in April 1990 (modified 1994)
• ~250,000 compounds
• ~30,000 with anti-tumor screening data
Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound
Lead Compound Discovery II
Lead Compound Discovery II
Acknowledgment
• Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg
• Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems, University of Stuttgart
• Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc.
• Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH
• Deutsche Forschungsgemeinschaft