ICIC 2014: Increasing the Efficiency of Pharmaceutical Research through Data Integration
DESCRIPTION
The pressures of pharmaceutical research and development demand increasing efficiency from scientists. High-quality decisions must be made faster and encompass all available information. At the same time there is a growing desire to better utilize the multi-billion dollar research investment recorded in laboratory notebooks and bioassay databases. Key values for data integration in a data exploration environment include gathering data from disparate E-notebooks and bioassay databases into a single searchable “virtual” system and increased discoverability by accessing data through a system designed for exploration. Key benefits are better chemistry decisions through easier access to broader data and reduced time for preparing patent filings. The ability to interlink in-house and reported assay data with in-house and published chemistry provides a data-rich environment for developing insights and predictive models. We will discuss our experience with integrating information from journals, patents, bio-assay databases, and E-lab notebooks to address these needs.
TRANSCRIPT
INCREASING THE EFFICIENCY OF PHARMACEUTICAL RESEARCH THROUGH DATA INTEGRATION
Dr. Roland Bauer
Project Manager Content Integration & Development, Elsevier Information Systems GmbH, Frankfurt
Matthew Clark, Ph.D.
Consultant, Life Science Services, Elsevier Inc., Philadelphia, PA
12-15 Oct. 2014, ICIC 2014, Heidelberg
ABOUT ME:
- Babeș-Bolyai University, Cluj-Napoca, Romania
- Max Planck Institute for Polymer Research, Mainz, Germany
- Elsevier
AGENDA
- Introduction & Setting the Stage
- Why?
- Content Integration: The Reaxys Case
- Integration Process: Project Overview
INTRODUCTION & SETTING THE STAGE: THE DRUG DISCOVERY INFORMATION LANDSCAPE
INTRODUCTION & SETTING THE STAGE: TENDENCIES IN THE DRUG DISCOVERY INFORMATION LANDSCAPE
WHY?
ISSUE: CHEMICAL INFORMATION ACCESS IS FRAGMENTED
• End users must learn many interfaces
• Different data sources have different capabilities for searching
• Scientists may not search all appropriate data sources
[Diagram labels: Licensed Database; Licensed Database; Catalog; Catalog; E-Notebook; E-Notebook; References/Full Text]
INTEGRATION OF DATA PROVIDES BETTER ANSWERS
Searching multiple sources with one search via a single interface increases efficiency
Harmonized indexing allows asking similar questions across all sources
[Diagram labels: Easier access; Enhanced usage; Better value for investment; Better decisions; Faster progress]
CONTENT INTEGRATION: THE REAXYS CASE
THE REAXYS DATABASE: CONTAINS INTEGRATED PUBLISHED CHEMISTRY DATA
THE REAXYS DATABASE: …ALONG WITH EXPANDED BIBLIOGRAPHICAL INFORMATION
THE REAXYS TREE: BROWSE CONTENT BY ONTOLOGY
TWO APPROACHES TOWARDS INTEGRATED CONTENT
[Diagram labels: Federated model; Warehouse model; End-User; Analysis system; Central Storage]
TWO APPROACHES TOWARDS INTEGRATED CONTENT
FEDERATED MODEL
Pros:
- Easy scalability in case of new data sources
- Delivery of short-term "wins"
- Maintenance costs
Cons:
- Lack of normalization and harmonized indexing
- Performance and availability dependent on the source systems
WAREHOUSE MODEL
Pros:
- High data quality through normalization
- Unified queries and filters applicable
Cons:
- Long implementation times & higher starting costs
- Expensive and difficult to accommodate changes in data types
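The difference between the two models can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Source`, `federated_search`, and `Warehouse`, the field names, and the sample records are hypothetical and do not describe how Reaxys is implemented. The federated path queries each source in its own schema at request time, while the warehouse path normalizes records once into a shared field and then runs one unified query.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A hypothetical data source that keeps its own native field names."""
    name: str
    records: list      # list of dicts in the source's native schema
    name_field: str    # native field that holds the compound name

    def search(self, term: str) -> list:
        # The source is queried in its own schema at request time.
        return [r for r in self.records
                if term.lower() in r[self.name_field].lower()]


def federated_search(sources, term):
    """Federated model: fan the query out; results stay in native schemas."""
    return {s.name: s.search(term) for s in sources}


@dataclass
class Warehouse:
    """Warehouse model: normalize once at load time, then query centrally."""
    rows: list = field(default_factory=list)

    def load(self, source: Source) -> None:
        for r in source.records:
            # Normalization: map the native field onto one shared field.
            self.rows.append({
                "source": source.name,
                "compound": r[source.name_field].strip().lower(),
            })

    def search(self, term: str) -> list:
        return [row for row in self.rows if term.lower() in row["compound"]]


# Invented sample data with deliberately different schemas.
eln = Source("ELN 1", [{"cmpd": "Aspirin "}, {"cmpd": "Caffeine"}], "cmpd")
catalog = Source("Catalog", [{"product_name": "aspirin"}], "product_name")

federated = federated_search([eln, catalog], "aspirin")

warehouse = Warehouse()
for src in (eln, catalog):
    warehouse.load(src)
unified = warehouse.search("aspirin")
```

In this sketch, adding a new source to the federated path needs no reload, but its results come back in whatever shape the source uses. The warehouse path pays the normalization cost up front and in return allows one filter over a single `compound` field.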
REAXYS EXTERNAL CONTENT INTEGRATION
[Diagram labels: End-User; Database; Indexed Storage; RX Content; External Content; ELN 1; ELN 2; Custom in-house reactions source]
REAXYS EXTERNAL CONTENT INTEGRATION: IN HOUSE SCENARIO
[Same diagram; hosting label: Customer Hosted]
REAXYS EXTERNAL CONTENT INTEGRATION: ELSEVIER HOSTED SCENARIO
[Same diagram; hosting labels: Customer Hosted, Elsevier Hosted]
REAXYS EXTERNAL CONTENT INTEGRATION: HYBRID HOSTING SCENARIO
[Same diagram; hosting labels: Customer Hosted, Elsevier Hosted]
REAXYS PROVIDES A UNIFIED INFORMATION PORTAL
• Provides a single powerful interface
• Can integrate several notebook systems
• Links chemistry, structures, sourcing, citations, and full text of articles
[Diagram labels: Structures, reactions, and full text; Licensed reaction and structure databases; E-Notebook; Binding, properties; E-Notebook; Patents]
INTEGRATED SOLUTION SEARCH
[Screenshot callouts]
• List of integrated sources
• Sources list can include licensed databases and multiple e-notebooks from organizational units
• All e-notebooks can be integrated and searched together
REACTION SEARCH RESULTS SEPARATED BY SOURCE
[Screenshot callouts]
• Results from each source on separate tab
• Show corresponding substances in …
• Cross link to substance in all other sources where it is found
[Source tabs shown: PubChem; eMolecules; Licensed; PharmaCo e-notebook; PharmaCo2 e-notebook; E-notebooks]
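The per-source tabs and the cross links described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Reaxys implementation: the hit records, the `key` field (standing in for whatever harmonized structure identifier an integration assigns), and the function names `tabs_by_source` and `cross_links` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical hits. "key" is an assumed harmonized structure identifier
# shared across sources; the values here are placeholders, not real IDs.
hits = [
    {"source": "PubChem", "key": "K1", "id": "pc-1"},
    {"source": "eMolecules", "key": "K1", "id": "em-7"},
    {"source": "PharmaCo e-notebook", "key": "K1", "id": "nb-42"},
    {"source": "PharmaCo e-notebook", "key": "K2", "id": "nb-43"},
]


def tabs_by_source(hits):
    """Group hits into one result list ("tab") per source."""
    tabs = defaultdict(list)
    for hit in hits:
        tabs[hit["source"]].append(hit)
    return dict(tabs)


def cross_links(hits, hit):
    """List the other sources in which the same substance key is found."""
    return sorted({
        h["source"] for h in hits
        if h["key"] == hit["key"] and h["source"] != hit["source"]
    })


tabs = tabs_by_source(hits)
links = cross_links(hits, hits[0])   # other sources holding substance K1
```

The cross link only works if every source is indexed with the same identifier, which is the role harmonized indexing plays in the integration.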
SUBSTANCE RESULTS
[Screenshot callouts]
• Results from each source on separate tab, including PubChem and eMolecules
• All filters fully active
INTEGRATION CASE STUDY: ROCHE IN HOUSE HOSTED
Integrated Reaxys with several data sources:
• Medicinal Chemistry E-notebooks
• Development Chemistry E-notebooks
• Several E-notebook systems of acquired organizations
• Licensed Databases
• Current Chemical Reactions
• Several other databases
Links to many more sources
• Roche stockroom availability
• Patent/Literature full text
• Link to original e-notebook pages
Reaxys integrates these e-notebooks with each other, while they are still maintained as separate systems
CASE STUDY: ROCHE KEY DRIVERS
From ACS Presentation by
Michael Kapler, Roche Pharma Research and Early Development
http://abstracts.acs.org/chem/245nm/program/view.php?obj_id=188977
INTEGRATION PROCESS: PROJECT OVERVIEW
PROCESS OVERVIEW FOR AN INTEGRATION PROJECT
Initialisation:
- Evaluation of data sources and needed resources
- Determine hosting scenario
- Commercial and legal framework
Kick-off:
- Requirements harvesting
- Establish milestones and top-down workstreams
- Refine & finalize plans
Execution:
- Implement automatised ETL process
- Implement application customisation
- Install IT infrastructure and interfaces
Delivery:
- BETA release
- Refinement
- GoLive
Sprint 1, Sprint 2, Sprint 3, Sprint 4
Hand over to BAU / Maintenance
- Sprint iterations of 3 weeks each
- Number of sprints dependent on complexity (5-…)
DATA MODEL AND USER INTERFACE PROCESS
- Determine data sources: E-notebooks, licensed databases
- Map to Reaxys Integration Data Model (XML)
- Unit conversions, data cleaning: °C, K; moles, grams
- Identify URL links to E-notebooks and other resources
- User interface configuration for new fields: design location, nature of displaying the fields, URLs, etc.
This is a key step for the integration project.
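The unit conversion and XML mapping steps can be sketched as follows. This is a minimal illustration only: the element names (`reaction`, `temperature`, `amount`, `link`), the record fields, and the URL are hypothetical and are not the actual Reaxys Integration Data Model, which the slides do not specify.

```python
import xml.etree.ElementTree as ET


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15


def grams_to_moles(mass_g: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass in grams to an amount in moles."""
    return mass_g / molar_mass_g_per_mol


def to_xml(record: dict) -> str:
    """Map one cleaned notebook record onto a hypothetical XML layout."""
    rx = ET.Element("reaction", attrib={"source": record["source"]})
    temp = ET.SubElement(rx, "temperature", attrib={"unit": "K"})
    temp.text = f"{celsius_to_kelvin(record['temp_c']):.2f}"
    amount = ET.SubElement(rx, "amount", attrib={"unit": "mol"})
    amount.text = f"{grams_to_moles(record['mass_g'], record['molar_mass']):.4f}"
    # Link back to the original notebook page.
    ET.SubElement(rx, "link").text = record["url"]
    return ET.tostring(rx, encoding="unicode")


# Invented sample record; water is used only because the arithmetic is easy.
record = {
    "source": "ELN 1",
    "temp_c": 25.0,
    "mass_g": 18.015,
    "molar_mass": 18.015,                       # g/mol
    "url": "https://eln.example.com/page/123",  # placeholder URL
}
xml_text = to_xml(record)
```

Storing one canonical unit per field (here K and mol) is what lets a single filter apply across records that were originally entered in °C or grams.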
AUTOMATISED FABRICATION PROCESS (ETL)
Sources: E-notebook 1, E-notebook 2, Bioassay db
- Daily extraction to XML using defined data model …
- Transmit to fabrication server (sftp, scp)
- Fabrication combines data with Reaxys data for production
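A rough Python sketch of the extraction and transfer steps is given below. It rests on several assumptions that are not in the slides: the XML layout, the file naming scheme, the host name, and the function names `extract_to_xml` and `scp_command` are hypothetical. The transfer command is only built, not executed, and scheduling the daily run (for example with cron) is left outside the sketch.

```python
import datetime
import pathlib
import tempfile
import xml.etree.ElementTree as ET


def extract_to_xml(records, out_dir, day):
    """Write one day's records to a dated XML file and return its path."""
    root = ET.Element("records", attrib={"date": day.isoformat()})
    for rec in records:
        item = ET.SubElement(root, "record", attrib={"source": rec["source"]})
        item.text = rec["value"]
    path = pathlib.Path(out_dir) / f"extract_{day.isoformat()}.xml"
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


def scp_command(path, destination):
    """Build (but do not run) an scp command line for the transfer step."""
    return ["scp", str(path), destination]


# Invented sample records from two of the source types named on the slide.
records = [
    {"source": "E-notebook 1", "value": "reaction A"},
    {"source": "Bioassay db", "value": "assay B"},
]
day = datetime.date(2014, 10, 13)
out_dir = tempfile.mkdtemp()

xml_path = extract_to_xml(records, out_dir, day)
# Placeholder destination; a real fabrication server address would differ.
command = scp_command(xml_path, "etl@fabrication.example.com:/incoming/")
```

Writing one dated file per run keeps each daily extract separate, so a failed transfer can be retried without re-extracting earlier days.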
THANK YOU – QUESTIONS?