ChemSpider Overview Slides August 2007

An Introduction to ChemSpider
Antony Williams

Upload: orcid-0000-0002-2668-4821

Post on 10-May-2015

DESCRIPTION

ChemSpider is being built with the intention of being a chemical structure centric community for chemists. With over 16 million chemical structures as of August 2007, and with data deposition and curation mechanisms in place for text, structure and spectra, ChemSpider intends to be a meeting place and collaborative environment for chemists to work together.

TRANSCRIPT

Page 1: ChemSpider Overview Slides August 2007

An Introduction to ChemSpider

Antony Williams

Page 2: ChemSpider Overview Slides August 2007

Building a Structure Centric Community for Chemists

The ChemSpider Mission

ChemSpider intends to build a structure centric community for chemists by:

Providing an environment for chemical structure drawing, manipulation, visualization, modeling & databasing
Providing methods by which to deposit, curate and enhance data associated with chemical structures
Providing structure-based access to federated Chemistry databases representing chemical vendors, literature, online data, patents and other forms of Chemistry data

Page 3: ChemSpider Overview Slides August 2007

Every Journey Starts with a Single Step
March 2007

ChemSpider Beta opened to the public for exposure at the Spring ACS (March 2007)

Initial database exposed only PubChem data as a proof of concept (10.5 million compounds)
Structure/substructure searching available
Online prediction services available
Accompanying Spinneret Blog released

Page 4: ChemSpider Overview Slides August 2007

Execution of the Mission
August 2007

An online database of over 16.5 million structures

Systems in place for:
Single structure and data collection depositions
Association of analytical data with structures
Ability to curate data for each individual record

Indexing of and integration to:
Over 70 individual databases
Patents from the US, European and Asian Patent offices

Text-based searching of over 50,000 Open Access articles
Over a thousand unique users access ChemSpider per day

Page 5: ChemSpider Overview Slides August 2007

Search Text “Prozac”

Page 6: ChemSpider Overview Slides August 2007

Predicted Properties Details “Prozac”

Page 7: ChemSpider Overview Slides August 2007

Flexible Boolean Searching

Page 8: ChemSpider Overview Slides August 2007

Integrated Searching from Online Applet or Desktop Drawing Packages

Page 9: ChemSpider Overview Slides August 2007

Integrated Structure/Substructure and Property Searches

Page 10: ChemSpider Overview Slides August 2007

Search result: 49 hits in 2.8 seconds

Page 11: ChemSpider Overview Slides August 2007

Integrated Visualization Tools

Page 12: ChemSpider Overview Slides August 2007

Integrated Analytical Data Management for Public Domain Data

Page 13: ChemSpider Overview Slides August 2007

Integrated Prediction Engines

Input a chemical structure and predict the properties in real time using server-based transactions. Predictions furnished by ACD/Labs at present.

Page 14: ChemSpider Overview Slides August 2007

Integrated Access to Open Access Literature

Text-based searching of over 50,000 Open Access Chemistry Articles

Page 15: ChemSpider Overview Slides August 2007

External Integrations - Wikipedia

The links between Wikipedia and ChemSpider are formed automatically

Page 16: ChemSpider Overview Slides August 2007

External Integrations - Google

Search across Google using an InChI string
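
As a rough illustration of the idea on this slide, an InChI string can be dropped into an ordinary web search as a quoted phrase. The sketch below is not ChemSpider's code: the function name `google_inchi_url` is hypothetical, and the `https://www.google.com/search?q=` URL form is an assumption about the search front end rather than anything shown in the slides.

```python
from urllib.parse import quote_plus


def google_inchi_url(inchi: str) -> str:
    """Build a web-search URL that looks for an InChI string as an exact phrase.

    Hypothetical helper for illustration only.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    # Wrapping the identifier in double quotes asks the search engine for an
    # exact-phrase match; quote_plus percent-encodes '=', '/', ',' and '"'.
    return "https://www.google.com/search?q=" + quote_plus(f'"{inchi}"')


if __name__ == "__main__":
    # Standard InChI for ethanol, used here only as a short example.
    print(google_inchi_url("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Because an InChI is a plain text string derived from the structure, any page that publishes the same string can in principle be found this way, without a structure-aware search engine.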

Page 17: ChemSpider Overview Slides August 2007

External Integrations - Patents
Reel Two Surechem Portal

Page 18: ChemSpider Overview Slides August 2007

External Integrations - Plug-ins

Page 19: ChemSpider Overview Slides August 2007

How do people generally use ChemSpider?

Searching for chemical structures, in rank order, via:
Registry numbers, trade names and synonyms
Structure identifiers such as SMILES or InChI
Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
Systematic names: IUPAC or CAS Index name

Generation of physicochemical properties
Text-based searching of Open Access articles
Structure-based searching of Patents
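
The search routes listed on this slide imply that a single text box has to tell identifier types apart. The following is a minimal sketch of that routing step, not ChemSpider's implementation: the names `is_valid_cas` and `classify_query` and the category labels are hypothetical. It relies only on two public conventions: CAS Registry Numbers end in a check digit (the weighted sum of the other digits, counted from the right, modulo 10), and InChI strings begin with `InChI=`.

```python
import re

# CAS Registry Numbers look like NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(text: str) -> bool:
    """Return True if text has CAS Registry Number format and a correct check digit."""
    match = CAS_PATTERN.match(text.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


def classify_query(text: str) -> str:
    """Guess which kind of identifier a search string is (hypothetical labels)."""
    query = text.strip()
    if is_valid_cas(query):
        return "registry_number"
    if query.startswith("InChI="):
        return "inchi"
    # Names, synonyms and SMILES are not separated here: telling SMILES
    # apart from a name reliably needs a real SMILES parser.
    return "name_or_other"


if __name__ == "__main__":
    for q in ["7732-18-5", "InChI=1S/H2O/h1H2", "Prozac"]:
        print(q, "->", classify_query(q))
```

The check-digit test keeps mistyped registry numbers from being treated as valid identifiers; a failed check simply falls through to a name search.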

Page 20: ChemSpider Overview Slides August 2007

Some Initial Feedback

“I must say that this site is excellent because it is the only place I can drop the name of a published molecule like a bile acid or steroid metabolite and actually get a hit.” Department of Animal Sciences, University of Wisconsin-Madison

“I saw the ChemSpider release today - looks like an awesome service!” Stanford University

“We encourage our students to use the chemspider services for looking up chemical structures and their properties. My collaborators use it to find company information. Your services are very valuable for us.” Faculty of Pharmaceutical Sciences, University of Leuven (Belgium)

Page 21: ChemSpider Overview Slides August 2007

Curators - An Active Community

Page 22: ChemSpider Overview Slides August 2007

The ChemSpider Platform

ChemSpider is developed using a series of components and programming languages.

The core is C/C++
The database engine is Microsoft SQL Server
Web interface is ASP.NET and C#
Server-side scripting is Perl

ChemSpider is hosted on x86 PCs on Windows 2003 Server

Page 23: ChemSpider Overview Slides August 2007

The ChemSpider Platform

Page 24: ChemSpider Overview Slides August 2007

ChemSpider - The Entity

The team consists of five people including 3 developers
The team has almost 40 years of experience in developing modern cheminformatics software applications for the desktop and the enterprise
ChemSpider has been built in off-hours only and with personal funding
ChemSpider has an active Advisory Group

Page 25: ChemSpider Overview Slides August 2007

The Advisory Group

The Advisory Group has been built from industry experts in the fields of:

Pharmaceutical Sciences
Analytical Sciences
Academia and Public Chemistry Efforts
Forensic Sciences
Open Standards
Open Source/Open Access Spectroscopy
Software Development
Chemistry Databases
Natural Products
Chemical Informatics and Statistics
Media, web skills and public relations

Page 26: ChemSpider Overview Slides August 2007

Targets for 2007

End of year intentions for ChemSpider include:
Adding more databases to the index to expand the database to 20 million unique structures
Enhancing integrations to other structure drawing packages
Including additional property prediction algorithms as provided
Extending integrations to synergistic software offerings
Expanding the Public Domain analytical data handling
Enhancing the Patent integration
Expanding the Open Access article index to >100,000 articles

Page 27: ChemSpider Overview Slides August 2007

ChemSpider in the Landscape

Common questions include:
How does ChemSpider compare with PubChem?
Is ChemSpider a competitor to eMolecules?
PubChem and CAS collided. Do ChemSpider and CAS collide?

Page 28: ChemSpider Overview Slides August 2007

ChemSpider and PubChem

PubChem was built to support the National Screening Libraries Initiative. The system is delivering very well on managing and providing access to associated data
ChemSpider HAS taken advantage of the PubChem data sets - PubChem data is only a subset of ChemSpider
Aspects of PubChem’s approach to managing and displaying structure data are worth emulating - so ChemSpider does
ChemSpider is in the process of curating data and submitting it back to PubChem
ChemSpider will make submissions to the PubChem dataset and provide input, guidance and feedback
The relationship is synergistic

Page 29: ChemSpider Overview Slides August 2007

ChemSpider and eMolecules

ChemSpider and eMolecules are similar efforts to deliver access to structure-related information
Both groups are developing new cheminformatic technologies to facilitate their systems
ChemSpider is actively delivering on a wiki-like environment for expansion and curation of the database

Page 30: ChemSpider Overview Slides August 2007

ChemSpider and CAS

CAS is the gold standard in many domains - the Registry is a highly curated and controlled assembly of chemical structures and related information connected to publications, patents and vendors.
ChemSpider is primarily an algorithmically curated database with manual curation enabled. ChemSpider is a system facilitating the deposition and association of chemical structures and related information including, but not limited to, public and commercial databases, chemical vendor catalogs, patents and individual chemists
CAS is not yet concerned with providing capabilities to manage chemistry depositions submitted by individual scientists and research groups, with the management of analytical data, or with the provision of APIs to provide integration to other Open Access systems

Page 31: ChemSpider Overview Slides August 2007

Conclusion

ChemSpider is successfully building a structure centric community for chemists
Over 1000 chemists per day utilize ChemSpider to help answer questions and solve their problems
A well-defined path forward to enhance the service has been defined

Page 32: ChemSpider Overview Slides August 2007

Acknowledgments

Thousands of users for their feedback and ongoing encouragement
The “naysayers” - criticism, when taken constructively, can drive creative actions
Our advisory group of scientists, specialists and friends
The bloggers coming to the ChemSpider Blog and ChemSpider News