Accelrys European User Group Meeting, Barcelona, 10/26/2010
Intervet Chemicals Directory (ICD)
A Framework Combining Accelrys Pipeline Pilot and Symyx Isentris
Frank Oellien
210/26/2010SP Intervet Chemicals Directory (ICD)
Outline
• Motivation ICD project (historical review)
• Technical Implementation (2003)
• ICD Today (Enhancements in the last years)
• Technical limitations of the Isentris approach
• Solution: Combining Symyx Isentris & Accelrys PP
– Structure Registration, Synchronization
– Database Cleaning
– Property-Calculations
Motivation
• Start of the ICD project: 2003
• Company was still young
• BioChemInformatics group (more precisely, the cheminformatics branch) started its work on a regular basis
– Ligand- and structure-based virtual screening (LO and Hit2Lead projects)
– Property and descriptor calculations
– QSAR
– Substructure and similarity searches
→ Access to many in-house data sources, especially structures, required
→ Many exchange formats used (including Excel and SD files)
→ Many diverse tools and applications used
Pre-ICD Time (before Q2 2003)
The Idea – A Central Data Source
[Diagram: SD files from in-house databases, supplier data, and other data sources are merged into the central ICD, which serves BCI applications, medicinal chemists, and CompLog.]
Requirements
• Standard data source for all BCI tasks
• Merged data source including in-house structures, supplier structures, and other data sources
• Dynamically updated
• Structure database with unique structure identifier
• Standardized and normalized data (including chemical normalization)
• Extendable system that can store other BCI-relevant information (e.g. virtual screening data)
Requests from other scientists in the Drug Discovery department
• Storing supplier catalogues and other supplier information
• Data source for compound ordering
• Accessible by other scientists (especially medicinal chemists)
• Storage of physico-chemical properties for research projects
Implementation: Reasons for Isentris (2003)
• Not many systems available in 2003 (Auspyx, Accord, Isentris)
• Isentris used many technologies that were already available in-house (MDL Direct, Oracle)
• Chemical normalization available: Cheshire
• Advanced J2EE architecture and API that allow good customization and extension
• CoRe: an existing project already based on Isentris
– Intervet was an early adopter of Isentris
– No additional software costs
– Synergy effects (e.g. chemical business rules)
Implementation Overview
[Diagram: ICD implementation workflow]
• Input: SD files from supplier catalogs and TORE updates (in-house)
• CACTVS (Linux): file syntax normalisation of SD files; generation of salt information and parent hash codes → prepared SD files
• Java applications (Windows): chemical normalisation and registration through MDL Isentris (client-server), using the chemical rules (CheckAndFix_Main.cct) → ICD
• ADME data (phys-chem properties): loaded into the ICD by a Java application via Oracle SQL*Loader
Implementation: SD File Syntax Standardisation
• Based on the CACTVS application (by Xemistry)
• SD files can come from different inputs
• 2 generic scripts (supplier-specific, in-house-specific) plus supplier-specific configuration files standardize the format of the input SD files
• SDF fields for supplier-related files: SupplierName, OrderNo, CatalogName, CatalogType, CatalogRelease, Confidential, CompoundName, IsSalt, Salt, Quantity, Purity
• SDF fields for in-house data: AHNO, CompoundName, IsSalt, Salt
• Calculation of structural hash codes (parent structure hash code)
– Hash codes are insensitive to: isotope, salt, tautomer, stereochemistry
• Automatic knowledge-based identification of salts
→ 174 different salts can be determined
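The field standardisation idea can be illustrated with a minimal Python sketch. It is not the CACTVS script used in the project; it only shows how a per-supplier configuration might map supplier-specific SD data-item names onto the standard field names listed above. The supplier field names (`CAT_NO`, `NAME`, `PURITY_PCT`), the constants, and the function names are hypothetical, and the molfile block of the record is omitted for brevity.

```python
import re

# Hypothetical supplier-specific configuration: source field -> standard ICD field.
FIELD_MAP = {"CAT_NO": "OrderNo", "NAME": "CompoundName", "PURITY_PCT": "Purity"}
# Hypothetical constant values added to every record of this supplier.
CONSTANTS = {"SupplierName": "ExampleSupplier", "CatalogName": "Screening"}

# SD data items start with a header line of the form "> <FIELD>".
TAG_LINE = re.compile(r"^>\s*<([^>]+)>")


def parse_data_items(record: str) -> dict:
    """Collect the '> <FIELD>' data items of one SD record into a dict."""
    items, current = {}, None
    for line in record.splitlines():
        match = TAG_LINE.match(line)
        if match:
            current = match.group(1)
            items[current] = []
        elif current is not None:
            if line.strip() == "" or line.startswith("$$$$"):
                current = None  # blank line or record terminator ends the item
            else:
                items[current].append(line)
    return {name: "\n".join(value) for name, value in items.items()}


def standardise(items: dict) -> dict:
    """Rename mapped fields, drop unmapped ones, and add supplier constants."""
    result = dict(CONSTANTS)
    for source, target in FIELD_MAP.items():
        if source in items:
            result[target] = items[source]
    return result


# Data-item part of one supplier record (molfile block omitted).
record = """\
> <CAT_NO>
AB-1234

> <NAME>
benzoic acid

> <PURITY_PCT>
98

$$$$
"""
standardised = standardise(parse_data_items(record))
print(standardised)
```

Keeping the mapping in a configuration dictionary rather than in code mirrors the split between generic scripts and supplier-specific configuration files: a new supplier would only need a new mapping.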
Implementation: Chemical Normalisation
• Based on Cheshire (part of the Isentris framework)
• JavaScript clone
• Valence checks, Ion2kov, nitro group, transition metals, queries, geometries, stereochemistry, …
• 99 rules
– 45 correction functions
– 29 warning functions
– 25 error functions
• Used by CoRe and ICD applications
• Input: molfile string
• Output: molfile string and message string
→ Category; number of changes; list of descriptions
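How correction, warning, and error functions could combine into the "molfile string in, molfile string plus message string out" contract is sketched below in Python. This is not Cheshire script and not the content of CheckAndFix_Main.cct; the three rules are toy string checks, and all function names and category labels are hypothetical. Only the message layout (category; number of changes; list of descriptions) follows the slide.

```python
from typing import Optional, Tuple


def fix_trailing_whitespace(molfile: str) -> Optional[Tuple[str, str]]:
    """Toy correction: strip trailing blanks; return None if nothing changed."""
    fixed = "\n".join(line.rstrip() for line in molfile.split("\n"))
    return (fixed, "trailing whitespace removed") if fixed != molfile else None


def warn_missing_name(molfile: str) -> Optional[str]:
    """Toy warning: the first molfile line (molecule name) is empty."""
    first_line = molfile.split("\n")[0]
    return "no molecule name in header line" if not first_line.strip() else None


def error_empty(molfile: str) -> Optional[str]:
    """Toy error: reject empty input."""
    return "empty structure" if not molfile.strip() else None


def check_and_fix(molfile, corrections, warnings, errors):
    """Apply corrections, then run warning and error checks.

    Returns (molfile, message) with message laid out as
    'category; number of changes; list of descriptions'.
    """
    changes = []
    for rule in corrections:
        outcome = rule(molfile)
        if outcome is not None:
            molfile, description = outcome
            changes.append(description)
    warn_notes = [note for note in (rule(molfile) for rule in warnings) if note]
    error_notes = [note for note in (rule(molfile) for rule in errors) if note]
    if error_notes:
        category = "error"
    elif warn_notes and changes:
        category = "fixed with warnings"
    elif warn_notes:
        category = "warning"
    elif changes:
        category = "fixed"
    else:
        category = "unchanged"
    descriptions = changes + warn_notes + error_notes
    return molfile, f"{category}; {len(changes)}; {', '.join(descriptions)}"


RULES = ([fix_trailing_whitespace], [warn_missing_name], [error_empty])
fixed_molfile, fixed_message = check_and_fix("benzene  \nline2", *RULES)
_, empty_message = check_and_fix("", *RULES)
print(fixed_message)
print(empty_message)
```

Running corrections first and checks afterwards means a warning is only reported if it survives the fixes, which is one plausible way to arrive at separate "fixed" and "fixed but still have warnings" outcomes.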
Implementation: Registration
• Based on Symyx Isentris Java Client (now Accelrys Isentris)
• Using Isentris Data Sources (Data Source Factory)
• 3 Java applications (in-house structures, supplier, virtual screening)
→ 31 Java classes, ~9,500 lines of code
• Run types: command line, GUI, batch mode
• Chemical normalisation, duplicate check, registration logic
********************************************************
* ICD Supplier Registration
* Version null
* Frank Oellien, Intervet Innovation GmbH
********************************************************
1:10:21 PM INFO: Chemical Normalization status:
304334 records without changes
4685 records fixed
11 records fixed but still have warnings
200 records with warnings
2 records with errors
1:10:21 PM INFO: Chemical Registration status:
1:10:21 PM INFO: New supplier has been registered.
309230 records to register
309222 records passed registration
8 records failed registration
180284 new structues registered
128938 structues already found in the DB
********************************************************
1:10:21 PM INFO: Closing Cheshire environment...
1:10:22 PM INFO: Releasing the ICD datasource resources ...
1:10:22 PM INFO: Closing the ICD DataSourceFactory...
1:10:22 PM INFO: Logout...
1:10:22 PM INFO: All resources released.
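The figures in the log are consistent with a flow in which records with normalisation errors are dropped before registration (309,232 normalised records minus 2 errors gives the 309,230 records to register) and every record that passes is either a new structure or one already in the database (180,284 + 128,938 = 309,222). A minimal Python sketch of such a duplicate check follows. It assumes structures are compared by a precomputed parent hash code; the `Registry` class, its fields, and the status labels are hypothetical, and an in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Registry:
    """In-memory stand-in for the structure table (hypothetical)."""
    ids_by_hash: dict = field(default_factory=dict)
    catalog_entries: list = field(default_factory=list)
    _next_id: count = field(default_factory=lambda: count(1))

    def register(self, parent_hash: str, order_no: str) -> bool:
        """Link a supplier record to a structure; return True if the structure is new."""
        is_new = parent_hash not in self.ids_by_hash
        if is_new:
            self.ids_by_hash[parent_hash] = next(self._next_id)
        self.catalog_entries.append((self.ids_by_hash[parent_hash], order_no))
        return is_new


def register_batch(registry, records):
    """records: iterable of (parent_hash, order_no, normalisation_status)."""
    stats = {"new": 0, "existing": 0, "skipped": 0}
    for parent_hash, order_no, status in records:
        if status == "error":
            stats["skipped"] += 1  # erroneous records never reach registration
            continue
        key = "new" if registry.register(parent_hash, order_no) else "existing"
        stats[key] += 1
    return stats


registry = Registry()
stats = register_batch(registry, [
    ("h1", "A-1", "unchanged"),
    ("h1", "A-2", "fixed"),    # same parent structure offered under a second order number
    ("h2", "A-3", "warning"),
    ("h3", "A-4", "error"),
])
print(stats)
```

Every passed record gets a catalogue entry, but only unseen parent hashes create a new structure identifier, so one structure can be linked to many supplier order numbers.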
ICD Today - Datasheet
• ~11,500,000 structures
• 237 different catalogues (including screening libraries, focused data sets)
• 60 suppliers
• A broad range of standard physico-chemical properties
• Intervet’s in-house database
• Specific Intervet data sets
• References to external sources (PubChem)
ICD Today – Change of Relevance
• Still the main data source for the BCI group, although almost all other BCI technologies have changed in the meantime
• Moreover, has become a key technology platform for the whole Drug Discovery process
– Almost all compound logistics activities are based on the ICD (applications for compound ordering)
– Stores specific essential information for CompLog
– Important database for Hit2Lead and LO projects (contains decision-critical properties)
– Has become the most important structure database for medicinal chemists
• Isentris upgrade to 3.1 → re-design of the ICD Isentris part necessary
• New demands by BCI and others had to be implemented
→ could not be realized with the former setup because of its limitations
• Solution: Combination with Pipeline Pilot
Limitations of the original Isentris Setup
From the Beginning
• Starting with Isentris 1.1, early adopters
• Hard to implement: large, over-designed J2EE API, no developer guides, only some small code snippets
• Limited and complicated functions
– e.g. no support for very large structure files
• Re-design of applications was necessary because of Isentris updates
• No automation, everything is done in user context!
Regarding recent Demands
• Missing automation was still the most critical issue:
– Synchronisation
– Adding non-structural data
• Elaborate database cleaning mechanisms
Registration of Supplier Catalogues
[Diagram: registration workflow for supplier catalogues]
• CACTVS (Linux): structural normalisation of SD files; generation of salt information and parent hash codes → prepared SD files
• Java applications (Windows): chemical normalisation and registration through MDL Isentris (client-server), using the chemical rules (CheckAndFix_Main.cct) → ICD
Registration of in-house Structures by PP I
[Diagram: registration workflow for in-house structures, synchronised by Pipeline Pilot (Linux)]
• In-house database → structural normalisation of SD files; generation of salt information and parent hash codes (CACTVS called by PP)
• Chemical normalisation and registration using the chemical rules (CheckAndFix_Main.cct) (Cheshire PP component) → ICD
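The core decision of an automated synchronisation run, namely which in-house records still have to be sent through normalisation and registration, can be sketched in a few lines of Python. This is an illustration only, not the Pipeline Pilot protocol: it assumes both sides can be represented as a mapping from the in-house identifier (AHNO) to some structure key such as a hash code, and the function name and sample data are hypothetical.

```python
def plan_sync(inhouse: dict, icd: dict) -> dict:
    """Compare AHNO -> structure-key mappings and list the work for one sync run."""
    # Identifiers present in-house but not yet in the ICD must be registered.
    to_register = sorted(set(inhouse) - set(icd))
    # Identifiers present on both sides with a different key must be updated.
    to_update = sorted(
        ahno for ahno in set(inhouse) & set(icd) if inhouse[ahno] != icd[ahno]
    )
    return {"register": to_register, "update": to_update}


inhouse_db = {"AH001": "hashA", "AH002": "hashB2", "AH003": "hashC"}
icd_db = {"AH001": "hashA", "AH002": "hashB"}
plan = plan_sync(inhouse_db, icd_db)
print(plan)
```

Because the plan is computed from the two data sources alone, the step can run unattended on a schedule instead of in a user context.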
Registration of in-house Structures by PP II
• Retrieve structures from database
• Call CACTVS application
• Chemical Normalisation & Registration
Cheshire PP Component (Java)
• Implemented as PP Java component
• Based on Cheshire Java API
• Calls Cheshire core library (shared object files called by JNI)
Importing physico-chemical Properties I
[Diagram: ADME data (phys-chem properties) are loaded into the ICD by a Java application via Oracle SQL*Loader.]
[Diagram: property import managed by Pipeline Pilot (Linux)]
• Retrieval of structures without properties from the ICD
• External application 1 (standardize)
• Internal PP components (descriptors)
• External applications 2, 3, and 4 (descriptors)
• Import properties: ADME data (phys-chem properties) are loaded into the ICD by the Java application via Oracle SQL*Loader
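The "retrieve structures without properties, calculate, import" loop can be sketched with Python's built-in `sqlite3` module. This is a stand-in under stated assumptions, not the Pipeline Pilot protocol or the Oracle schema of the ICD: the table and column names are hypothetical, structures are stored as SMILES strings only to keep the example short, and the descriptor is a deliberately crude placeholder for the internal components and external applications named above.

```python
import sqlite3


def structures_without_properties(conn):
    """Return (id, smiles) rows that have no entry in the properties table."""
    return conn.execute(
        "SELECT s.id, s.smiles FROM structures s "
        "LEFT JOIN properties p ON p.structure_id = s.id "
        "WHERE p.structure_id IS NULL ORDER BY s.id"
    ).fetchall()


def toy_descriptor(smiles: str) -> int:
    """Placeholder descriptor: number of uppercase letters in the SMILES string.
    A real workflow would call descriptor components or external tools here."""
    return sum(1 for ch in smiles if ch.isupper())


def import_missing_properties(conn) -> int:
    """Calculate and store the descriptor for every structure that lacks one."""
    rows = structures_without_properties(conn)
    conn.executemany(
        "INSERT INTO properties (structure_id, name, value) VALUES (?, ?, ?)",
        [(sid, "toy_descriptor", toy_descriptor(smiles)) for sid, smiles in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE structures (id INTEGER PRIMARY KEY, smiles TEXT);
    CREATE TABLE properties (structure_id INTEGER, name TEXT, value REAL);
    INSERT INTO structures VALUES (1, 'CCO'), (2, 'CC(=O)O');
    INSERT INTO properties VALUES (1, 'toy_descriptor', 3);
""")
first_run = import_missing_properties(conn)   # processes structure 2 only
second_run = import_missing_properties(conn)  # nothing left to do
print(first_run, second_run)
```

Selecting only structures that lack properties makes the job safe to repeat: a second run finds nothing to do, which is what an unattended, scheduled import needs.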
Importing physico-chemical Properties II
Database Maintenance
Isentris PP Components @ SP Intervet
• Isentris Cheshire PP
• Converter:
– Chime string to Molecule
– Chime string to CTAB
– Molecule to Chime string
– CTAB to Chime string
Acknowledgement
Information Management
• Werner Schlüter
• Thomas Fischer
BioChemInformatics
• Richard Marhöfer
• Andreas Krasky
• (Jörg Cramer)
• Jörg Schröder
• Paul M. Selzer
Thank you