[Figure: Web views of use cases; labels: SharePoint, ELN, CDS Genius, Document Servers / Databases]
Bridging the Knowledge Gap: Searching SharePoint, E-Notebook,
Chromatography Data Systems and Unstructured Documents for
Chemical and other Scientific Information to Enable Cooperation,
Collaboration and Improved Decision Making
The development of discrete information systems to capture scientific information, such as Document
Management Systems (DMS, e.g. SharePoint), Electronic Lab Notebooks (ELN) and Chromatography Data
Systems has led to information being largely distributed in different silos, reflecting the scientific disciplines of
the primary users. In-application search tools often perform poorly because systems tend to be optimized for
data capture rather than search and retrieval. Increasing demands for cross-disciplinary collaboration and
decision making have increased the need for highly adaptable, cross-application, scientifically aware search
tools that can aid project management and scientific discovery.
Collaboration with key industry leaders has identified three interfaces where suboptimal scientific and chemical
searchability has hindered information gathering and sharing, namely:
1) Cross-application searching between SharePoint and E-Notebook, including chemical structure
searchability;
2) Rapid chemical reaction searching across multiple Electronic Lab Notebook systems; and
3) Chemically-aware data mining in a chromatography data system to connect method data and results to
structural information.
E-Notebook: Rapid Reaction Searching & Federated Searches
Industry Problem Statement: Reaction information stored within an ELN can only be searched from within that
ELN, and searches cannot be consolidated across multiple ELNs. Traditional searches return no hits until the
entire search has completed, which slows performance on large data sets. Lab performance metrics are
difficult to extract.
Desired Outcomes: 1) An external interface to rapidly search reactions; 2) Federated searches across multiple
ELNs or other reaction data sources; and 3) Extraction of performance metrics, such as the number of reactions per
scientist, project or site.
Solution: Reaction Genius was developed to extract reaction information, along with relevant metadata such as
project, user, creation date, yield and temperature, from one or more ELNs. The extracted data is consolidated
into a single XML document, which is then transformed into new database tables optimized for search, allowing
thousands of reactions to be searched in seconds. Results are returned in buckets of graded relevancy, so the
most important hits are provided immediately, and are presented in an intuitive web form. Metadata associated
with each record (date, project, etc.) can be displayed graphically in a widget-based dashboard, giving
performance metrics tailored to the specific needs of the organization.
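The extract, consolidate and transform steps above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not Reaction Genius itself: the XML layout (`<reaction>` elements with `project`, `scientist`, `site`, `yield` and similar attributes), the table and index names, and the `load_reactions` and `reactions_per` helpers are all hypothetical. It flattens a consolidated XML export into an indexed SQLite table and computes one dashboard-style metric, the number of reactions per scientist, project or site.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical consolidated export: one <reaction> element per extracted ELN record.
# Element and attribute names are illustrative assumptions, not a real schema.
CONSOLIDATED_XML = """
<reactions>
  <reaction id="R1" source="ELN-A" project="P100" scientist="jdoe" site="Boston"
            created="2011-03-01" yield="82.5" temperature="25"/>
  <reaction id="R2" source="ELN-B" project="P100" scientist="asmith" site="Cambridge"
            created="2011-03-04" yield="64.0" temperature="80"/>
  <reaction id="R3" source="ELN-A" project="P200" scientist="jdoe" site="Boston"
            created="2011-03-09" yield="91.0" temperature="0"/>
</reactions>
"""

GROUPABLE = {"scientist", "project", "site"}

def load_reactions(xml_text):
    """Flatten the consolidated XML into one table, indexed on common filter fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reaction (id TEXT PRIMARY KEY, source TEXT, project TEXT, "
        "scientist TEXT, site TEXT, created TEXT, yield REAL, temperature REAL)"
    )
    # Secondary indexes let metadata filters avoid full table scans.
    conn.execute("CREATE INDEX idx_reaction_project ON reaction(project)")
    conn.execute("CREATE INDEX idx_reaction_scientist ON reaction(scientist)")
    rows = [
        (r.get("id"), r.get("source"), r.get("project"), r.get("scientist"),
         r.get("site"), r.get("created"),
         float(r.get("yield")), float(r.get("temperature")))
        for r in ET.fromstring(xml_text.strip()).iter("reaction")
    ]
    conn.executemany("INSERT INTO reaction VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def reactions_per(conn, column):
    """Dashboard-style metric: number of reactions per scientist, project or site."""
    if column not in GROUPABLE:
        raise ValueError(f"unsupported grouping column: {column}")
    # The column name is whitelisted above, so formatting it into the SQL is safe.
    sql = f"SELECT {column}, COUNT(*) FROM reaction GROUP BY {column} ORDER BY {column}"
    return dict(conn.execute(sql).fetchall())

if __name__ == "__main__":
    conn = load_reactions(CONSOLIDATED_XML)
    print(reactions_per(conn, "scientist"))  # {'asmith': 1, 'jdoe': 2}
    # Parameterized metadata search against the indexed table.
    rows = conn.execute(
        "SELECT id FROM reaction WHERE project = ? AND yield >= ? ORDER BY id",
        ("P100", 80.0),
    ).fetchall()
    print(rows)  # [('R1',)]
```

A real deployment would also store the reaction structures and index them with a chemistry-aware search engine; that part is deliberately left out of this sketch.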
Industry Problem Statement: Developing new analytical methods for analyzing and identifying pharmaceutical
intermediates and impurities requires significant trial and error. Historical data on the analysis of prior
analytes is stored within Chromatography Data Systems (CDS), where peaks are labeled with unique identifying
numbers; the correlation between those labels and chemical structures is held elsewhere. The CDS does not support
chemical similarity searching, which would be highly beneficial in predicting the best methods for analyzing
new compounds.
Desired Outcomes: 1) A system that allows structure searching of analytical run data contained within a CDS,
including substructure and structural similarity searching; 2) Presentation of run and method parameters within
an intuitive interface; and 3) Additional chemical property search criteria, such as cLogP, to improve predictive
power.
Solution: Method Genius extracts peak information, along with method and run parameters, from a CDS. This
data is merged with structural information held in a separate document or database, compiled into an XML
document and transformed into database tables optimized for search. Results are returned in buckets of
graded relevancy, so the most important hits are provided immediately, and are presented in an intuitive web
form. Chemical properties are calculated or predicted for the identified structures. Searches can combine
structure (substructure or structural similarity), chemical properties (such as cLogP), and method and run
parameters.
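The merge-and-search flow described above can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration: the `CDS_PEAKS` and `REGISTRY` dictionaries stand in for the CDS export and the separate structure source, the fingerprints are tiny hand-made bit sets rather than ones computed by a cheminformatics toolkit, and the bucket thresholds are arbitrary. Peaks are joined to structures by peak label, filtered by a cLogP range and an optional column (a method parameter), scored by Tanimoto similarity to the query, and yielded one relevance bucket at a time, best first. Substructure matching is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass(frozen=True)
class PeakHit:
    peak_label: str       # identifier stored with the peak in the CDS
    compound: str         # registry name matched to that label
    column: str           # method parameter
    retention_min: float  # run result
    clogp: float          # calculated property
    similarity: float     # Tanimoto similarity to the query

# Hypothetical CDS export: peak label -> method and run parameters.
CDS_PEAKS = {
    "CMP-001": {"column": "C18", "retention_min": 4.2},
    "CMP-002": {"column": "C18", "retention_min": 7.9},
    "CMP-003": {"column": "HILIC", "retention_min": 2.1},
}

# Hypothetical structure source held outside the CDS: peak label -> structure data.
# The "fp" bit sets are toy stand-ins for toolkit-generated fingerprints.
REGISTRY = {
    "CMP-001": {"name": "intermediate A", "fp": {1, 2, 3, 4, 5}, "clogp": 2.1},
    "CMP-002": {"name": "impurity B", "fp": {1, 2, 3, 9}, "clogp": 3.4},
    "CMP-003": {"name": "impurity C", "fp": {7, 8}, "clogp": -0.5},
}

# Graded relevancy buckets as (name, minimum similarity); thresholds are assumptions.
BUCKETS = [("high", 0.8), ("medium", 0.5), ("low", 0.2)]

def tanimoto(a: set[int], b: set[int]) -> float:
    """Shared bits divided by the bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def search(query_fp: set[int], clogp_range: tuple[float, float],
           column: Optional[str] = None) -> Iterator[tuple[str, list[PeakHit]]]:
    """Join peaks to structures by label, filter, then yield buckets best-first."""
    low, high = clogp_range
    hits = []
    for label, run in CDS_PEAKS.items():
        structure = REGISTRY.get(label)
        if structure is None:
            continue  # peak label with no registered structure
        if not low <= structure["clogp"] <= high:
            continue  # outside the requested property range
        if column is not None and run["column"] != column:
            continue  # method parameter does not match
        hits.append(PeakHit(label, structure["name"], run["column"],
                            run["retention_min"], structure["clogp"],
                            tanimoto(query_fp, structure["fp"])))
    upper = float("inf")
    for name, lower in BUCKETS:
        bucket = sorted((h for h in hits if lower <= h.similarity < upper),
                        key=lambda h: h.similarity, reverse=True)
        if bucket:
            yield name, bucket
        upper = lower

if __name__ == "__main__":
    for name, bucket in search({1, 2, 3, 4}, (-1.0, 4.0)):
        print(name, [(h.peak_label, round(h.similarity, 2)) for h in bucket])
    # high [('CMP-001', 0.8)]
    # medium [('CMP-002', 0.6)]
```

For simplicity this version scores every peak before yielding anything; a system aiming to show the top bucket before the rest has been computed would more likely run a separate, cheaper query per tier.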
A central data repository can be created to pull, merge and reorganize information from a combination of file
shares and databases. The reorganized data can be optimized for faster retrieval and for combining related data
sets. Data views can be tailored to specific use cases and delivered through either web pages or
application-specific forms.
Philip J Skinner PhD, Phil McHale DPhil, Rudy Potenzone PhD, Kate Blanchard, and Megean Schoenberg
PerkinElmer Informatics, 100 CambridgePark Drive, Cambridge, MA 02140
Industry Problem Statement: Scientific information is dispersed both within and outside the organization, in
E-Notebook, SharePoint and on the web. These sources cannot be searched simultaneously from a single interface,
and SharePoint cannot be searched by chemical structure. Search results cannot readily be stored within
E-Notebook as a record of the inventive thought process.
Desired Outcomes: 1) A search tool that enables simultaneous structure searching of SharePoint and
E-Notebook content, and text searching of both as well as the web; 2) An easily configurable interface for searching
and presenting results; and 3) The ability to record the thought process behind searches, to support possible
intellectual property claims around inventive steps.
Solution: Search Genius for SharePoint extends Microsoft SharePoint to support searching by chemical
structure. Custom web parts were introduced to expose widgets for searching and displaying results easily from
within the SharePoint framework. All included data sources containing relevant chemical data are crawled and
indexed by structure. Chemical hits are combined with relevant text hits to produce a comprehensive result
list, enabling federated searches across the web, SharePoint and E-Notebook simultaneously, from
either SharePoint or E-Notebook. Any search result can be stored, annotated and saved in the
experimental record within E-Notebook as evidence of the thought process.
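The poster does not say how chemical and text hits are combined. One common technique for merging ranked lists from separate engines, swapped in here purely for illustration, is reciprocal rank fusion (RRF): each list contributes 1 / (k + rank) to a document's score, with k = 60 as the conventional default. The source names and the `sp:`/`eln:`/`web:` document IDs below are hypothetical. A document found by both the structure search and the text search accumulates two contributions and so rises to the top of the combined list.

```python
from collections import defaultdict

# Hypothetical ranked hit lists (best first) from separate structure and text searches.
RESULTS = {
    "sharepoint_structure": ["sp:report-12", "sp:slides-3"],
    "enotebook_structure": ["eln:EXP-0042"],
    "sharepoint_text": ["sp:slides-3", "sp:report-12", "sp:memo-7"],
    "web_text": ["web:article-1"],
}

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists: each list adds 1 / (k + rank) to every document it contains."""
    scores = defaultdict(float)
    for ranked_ids in result_lists.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    for doc_id, score in reciprocal_rank_fusion(RESULTS):
        print(f"{score:.4f}  {doc_id}")
```

RRF uses only ranks, which sidesteps the problem that structure-similarity scores and text-relevance scores are not on comparable scales.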
All three solutions were assessed by industrial partners and found to meet or exceed the desired outcomes.
Production implementation is under way, along with extensions that incorporate data from custom in-house
systems, including robotics, and add further bidirectional data transfer.
Further information is available by email, at www.perkinelmer.com/informatics, from our booth (#219), or
by scanning the QR code above with your mobile device.
[Figure: system architecture; labels: CDS, Compound Registry, Reaction Genius, Method Genius, Web, FAST Science Parser, E-Notebook, SharePoint Front End, E-Notebook Client]
Scan to download a copy of this poster
Introduction
Concept
SharePoint and E-Notebook Cross-Platform and Chemical Searching
Chromatography Data Systems & Structural Searching
Conclusions
Federate chemical and text searching across SharePoint, the web and E-Notebook from either a SharePoint or an E-Notebook front end.
Results can be viewed from either SharePoint or E-Notebook.
Collect links and articles as a thought experiment within E-Notebook to document and protect the invention process.
Performance metrics highlight the most productive scientists, teams, projects or sites.
The dashboard is built on a widget model to allow easy customization and hence institution-specific views into the data.
Widgets provide real-time views of the most recent additions.
Results are returned in buckets, with the most relevant results returned first.
Combined structure, chemical, experimental and hierarchical property search parameters.
Expandable reaction graph to explore precursors and products throughout the synthetic scheme.
Structural results are returned in buckets, with the most relevant results returned first.
All the identified peaks in a run are associated with their relevant structures.
Combined structure, property and metadata search parameters.
Reports consolidate structures, chemical properties, and run and method parameters where relevant.