small-molecules in big data - chemaxon · small-molecules in big data proceed with caution! tanya...
BD2K-LINCS DATA COORDINATION AND INTEGRATION CENTER
Small-Molecules in Big Data
Proceed with caution!
Tanya T. Kelley
BD2K LINCS Data Coordination and Integration Center
Library of Integrated Network-Based Cellular Signatures (LINCS)
• NIH Common Fund project established in 2013
• Currently in Phase II
Library of Integrated Network-Based Cellular Signatures
Data and Signature Generation Centers (DSGC)
Library of Integrated Network-Based Cellular Signatures
Data Coordination and Integration Center (DCIC)
LINCS generates diverse multidimensional signatures
Metadata standards - foundation for data integration
LINCS Data Release and Publication Process
LINCS Data Portal (LDP)
Small-Molecule Standardization
• Chemaxon Tautomer Calculator Plugin
• Chemaxon PLP Calculator or IJC (Instant JChem)
Small-Molecule Standardization
Small-Molecule Standardization (In a Perfect World)
Small-Molecule Standardization
Small-Molecule Standardization (In Reality)
Stereochemical information is not always correct in submitted structures.
Are the SMILES incorrect, or are these all different compounds with incorrect names?
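One way to triage this question is sketched below, assuming each submitted record has already been converted to a standard InChIKey by a cheminformatics toolkit (not shown). The first 14-character block of an InChIKey hashes the connectivity layer, while the second block changes with stereochemistry and other non-connectivity layers (for example charge or isotopes). Grouping records by name and comparing those blocks separates names whose structures share one skeleton (suspect the stereo in the SMILES) from names covering different skeletons (suspect the name). The function name and the placeholder keys are hypothetical, not real compounds or LINCS code.

```python
from collections import defaultdict


def classify_name_conflicts(records):
    """Classify compound names that map to more than one structure.

    records: iterable of (name, inchikey) pairs; the InChIKeys are assumed
    to be computed upstream by a cheminformatics toolkit (not shown here).
    Returns {normalized_name: "same-skeleton" | "different-skeleton"} for
    every name associated with two or more distinct InChIKeys.
    """
    keys_by_name = defaultdict(set)
    for name, inchikey in records:
        # Light normalization so "Foo" and "foo " count as one name.
        keys_by_name[name.strip().lower()].add(inchikey)

    conflicts = {}
    for name, keys in keys_by_name.items():
        if len(keys) < 2:
            continue  # one structure per name: nothing to triage
        # First InChIKey block = connectivity hash; the second block differs
        # when stereo (or another non-connectivity layer) differs.
        skeletons = {key.split("-")[0] for key in keys}
        if len(skeletons) == 1:
            conflicts[name] = "same-skeleton"       # suspect the SMILES stereo
        else:
            conflicts[name] = "different-skeleton"  # suspect the name
    return conflicts


if __name__ == "__main__":
    # Placeholder keys in InChIKey format, not real compounds.
    records = [
        ("compound-x", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
        ("compound-x", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
        ("compound-y", "DDDDDDDDDDDDDD-UHFFFAOYSA-N"),
        ("compound-y", "EEEEEEEEEEEEEE-UHFFFAOYSA-N"),
        ("compound-z", "FFFFFFFFFFFFFF-UHFFFAOYSA-N"),
    ]
    print(classify_name_conflicts(records))
    # {'compound-x': 'same-skeleton', 'compound-y': 'different-skeleton'}
```

A "same-skeleton" result does not prove the SMILES is wrong; it only narrows where a curator should look first.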
Small-Molecule Standardization (In Reality)
Centers often omit alkene geometry.
Do we link all data together under one LSM ID?
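Whether to link is a policy decision rather than a chemistry calculation. Below is a minimal sketch of one possible policy, assuming a toolkit has already produced, for each record, a structure key that ignores double-bond geometry plus the geometry the submitter specified ("E", "Z", or None when omitted). Under this hypothetical rule, a record without geometry is linked to the only isomer submitted for that skeleton; if both isomers exist, it gets its own ID and is flagged for curation. The function name, key strings and "LSM-<n>" numbering are illustrative, not the LINCS DCIC's actual implementation.

```python
from collections import defaultdict
from itertools import count


def assign_ids(records):
    """Hypothetical E/Z linking policy (not the LINCS DCIC implementation).

    records: iterable of (record_id, skeleton_key, geometry) tuples, where
    skeleton_key ignores double-bond geometry and geometry is "E", "Z", or
    None when omitted (both assumed to be computed upstream).
    Returns (id_by_record, flagged_record_ids).
    """
    records = list(records)  # two passes below, so materialize iterators

    # First pass: which geometries were explicitly submitted per skeleton.
    specified = defaultdict(set)
    for _, skeleton, geometry in records:
        if geometry is not None:
            specified[skeleton].add(geometry)

    numbers = count(1)
    id_by_form = {}  # (skeleton, geometry) -> placeholder "LSM-<n>" ID
    id_by_record = {}
    flagged = []
    for record_id, skeleton, geometry in records:
        if geometry is None:
            known = specified.get(skeleton, set())
            if len(known) == 1:
                geometry = next(iter(known))  # link to the only submitted isomer
            elif len(known) > 1:
                flagged.append(record_id)     # both E and Z exist: ambiguous
        form = (skeleton, geometry)
        if form not in id_by_form:
            id_by_form[form] = f"LSM-{next(numbers)}"
        id_by_record[record_id] = id_by_form[form]
    return id_by_record, flagged


if __name__ == "__main__":
    demo = [("r1", "skel-A", "E"), ("r2", "skel-A", None),
            ("r3", "skel-B", "E"), ("r4", "skel-B", "Z"), ("r5", "skel-B", None)]
    ids, flagged = assign_ids(demo)
    print(ids)      # {'r1': 'LSM-1', 'r2': 'LSM-1', 'r3': 'LSM-2', 'r4': 'LSM-3', 'r5': 'LSM-4'}
    print(flagged)  # ['r5']
```

Linking r2 to LSM-1 keeps more data connected at the risk of merging results that actually came from the other isomer; a stricter policy would instead flag every record whose geometry was omitted.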
Small-Molecule Standardization (In Reality)
Salt-stripping components can’t handle coordination complexes.
Maps to PubChem CID
Assigned LSM-ID
Error propagates…
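The failure mode is easy to reproduce with the common "keep the largest fragment" heuristic. The sketch below is a deliberately naive stand-in, not the Chemaxon or LINCS standardizer: it counts atoms with a crude regular expression and keeps the dot-separated SMILES component with the most atoms. It removes the sodium counter-ion from sodium acetate as intended, but for cisplatin written with its two ammonia ligands as separate components ("N.N.Cl[Pt]Cl") it keeps only the PtCl2 fragment. The `needs_review` check, built on a small illustrative set of d-block metals, is one hypothetical way to route such records to a curator before the stripped structure is used for the PubChem CID lookup and LSM-ID assignment.

```python
import re

# Crude SMILES atom tokenizer: bracket atoms first, then the two-letter
# organic-subset halogens, then one-letter aliphatic and aromatic atoms.
ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?)")
# Small illustrative subset of d-block (transition) metals, not exhaustive.
D_BLOCK_METALS = {"Fe", "Co", "Ni", "Cu", "Zn", "Ru", "Rh", "Pd", "Ag", "Ir", "Pt", "Au"}


def atom_count(fragment):
    """Approximate atom count (explicit [H] atoms are counted too)."""
    return len(ATOM.findall(fragment))


def largest_fragment(smiles):
    """Naive salt stripping: keep the dot-separated component with the most atoms."""
    return max(smiles.split("."), key=atom_count)


def needs_review(smiles):
    """True for multi-component SMILES containing a listed metal, where dropping
    the 'small' components may remove ligands rather than counter-ions."""
    if "." not in smiles:
        return False
    return bool(set(BRACKET_ELEMENT.findall(smiles)) & D_BLOCK_METALS)


if __name__ == "__main__":
    print(largest_fragment("CC(=O)[O-].[Na+]"))  # CC(=O)[O-]  (sodium removed, as intended)
    print(largest_fragment("N.N.Cl[Pt]Cl"))      # Cl[Pt]Cl    (both ammonia ligands lost)
    print(needs_review("N.N.Cl[Pt]Cl"))          # True
```

If `largest_fragment` output were used unchecked as the lookup key, the PtCl2 fragment, not cisplatin, would be what gets mapped to a CID and registered.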
Small-Molecule Standardization (In Reality)
Each tautomer maps to a different PubChem CID.
Which one is correct?
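Code cannot decide which tautomer is "correct", but the data model can avoid losing information while that question stays open. A minimal sketch, assuming each submitted structure already carries a canonical-tautomer key computed upstream by a toolkit (not shown) together with the PubChem CID it matched: group the CIDs under that key, keep all of them as cross-references, and flag keys that matched more than one CID. The function names, keys and CIDs are hypothetical placeholders.

```python
from collections import defaultdict


def collect_tautomer_cids(records):
    """Group (tautomer_key, pubchem_cid) pairs by tautomer key.

    tautomer_key is assumed to be a canonical-tautomer identifier produced
    upstream by a cheminformatics toolkit (not shown here).
    Returns {tautomer_key: sorted list of distinct CIDs}.
    """
    cids_by_key = defaultdict(set)
    for tautomer_key, cid in records:
        cids_by_key[tautomer_key].add(cid)
    return {key: sorted(cids) for key, cids in cids_by_key.items()}


def ambiguous_keys(cids_by_key):
    """Keys whose tautomers matched more than one PubChem CID (need curation)."""
    return [key for key, cids in cids_by_key.items() if len(cids) > 1]


if __name__ == "__main__":
    # Hypothetical keys and CIDs, not real compounds.
    submitted = [("taut-1", 222), ("taut-1", 111), ("taut-2", 333), ("taut-1", 111)]
    grouped = collect_tautomer_cids(submitted)
    print(grouped)                  # {'taut-1': [111, 222], 'taut-2': [333]}
    print(ambiguous_keys(grouped))  # ['taut-1']
```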
In Conclusion…
• The LINCS Data Portal (LDP) is really cool! (And a great resource for industry and academia.)
• The compound standardization pipeline is constantly evolving
• We are currently implementing substructure search using Marvin Sketch (see life.ccs.miami.edu)
Acknowledgements
• Dr. Stephan Schurer (PI)
• Amar Koleti (Software engineer)
• Asiyah Yu Lin (Associate scientist, FDA)
• Bryce Allen (PhD student)
• Caty Chung (Software engineer)
• Dušica Vidović (Associate scientist)
• Hande McGinty (PhD student)
• Jianbin Duan (MS student)
• Michele Forlin (Associate scientist)
• Qiong Cheng (Assistant scientist)
• Raymond Terryn (Postdoctoral fellow)
• Simar Puri (Undergraduate student)
• Vasileios Stathias (PhD student)
http://bd2k-lincs.org
http://lincsportal.ccs.miami.edu/
http://life.ccs.miami.edu/