ICIC 2014: Semantic Integration of Pharmaceutical Content: Blueprint and Examples
DESCRIPTION
Pharmaceutical research is becoming more translational, more collaborative and more distributed. This has brought on a new set of Information Management challenges for all involved parties: dealing with data that is not only Big, but also comes from a large variety of open and proprietary sources and needs to be viewed from the angle of multiple disciplines. In this context, semantic enrichment has emerged as a critical Information Systems capability enabling the integration of diverse content, and its access from diverse viewpoints, with productivity benefits at all stages of the lifecycle. Through the lens of a corresponding Maturity Model, this session will investigate representative applications of semantics for pharma information, and underscore the trends shaping how we will manage, distribute, and access it in the future.
TRANSCRIPT
Copyright © 2014 TEMIS - All Rights Reserved - Slide 1
Semantic Integration of Pharmaceutical Content : Blueprint and Examples
ICIC Heidelberg, October 15, 2014
Stefan Geißler, TEMIS Deutschland GmbH
www.temis.com
Slide 2
A pioneer in Content Enrichment since 2000
70
Slide 3
We’re from Heidelberg
And since we are from Heidelberg, this is not the first time I have visited the Stadthalle…
“Ball der Vampire” 2014
Slide 4
TEMIS @ Publishing
Slide 5
TEMIS @ Industry Accounts
Slide 6
TEMIS Mission: Structuring the unstructured
Extract domain-specific information from text
Example sentence: We report a 52 year-old man presenting an acute hair loss induced by carbamazepine (CBZ) in concentration of 8.6 microg/ml.
Annotation layers: Terms, Entities, Roles, Relations, Attributes
Terms (part-of-speech tags): Pro Verb Num Art N-P Noun Verb Art Adj Nn Nn Verb Pp PropNn Pp Noun Pp Num Unit Abbr
Entities: Drug Name
Roles: Subj, Verb, Patient, Symptom, Dosage information
Relation: Adverse Event
• Side Effect: Alopecia
• Cause: Carbamazepine
• Dosage: 8.6 microg/ml
• Patient: 52 year old male
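To make the step from sentence to structured record concrete, here is a minimal Python sketch. It is not the Luxid extraction engine: the `AdverseEvent` schema, the two tiny lexicons and the regular expressions are all assumptions made for illustration, and they only cover the example sentence from this slide.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdverseEvent:
    """Structured record for one adverse-event relation (hypothetical schema)."""
    side_effect: Optional[str]
    cause: Optional[str]
    dosage: Optional[str]
    patient: Optional[str]


# Toy lexicons standing in for real terminologies (assumptions for this sketch).
SIDE_EFFECTS = {"hair loss": "Alopecia"}
DRUGS = {"carbamazepine": "Carbamazepine"}
SEX = {"man": "male", "woman": "female"}


def extract_adverse_event(sentence: str) -> AdverseEvent:
    text = sentence.lower()
    # Lexicon lookup: map a surface phrase to its normalized term.
    side_effect = next((v for k, v in SIDE_EFFECTS.items() if k in text), None)
    cause = next((v for k, v in DRUGS.items() if k in text), None)
    # Dosage: a number followed by a concentration unit.
    dose = re.search(r"(\d+(?:\.\d+)?)\s*(microg/ml|mg/ml)", text)
    # Patient: an age followed by "year-old man" or "year-old woman".
    pat = re.search(r"(\d+)\s*year-old\s+(man|woman)", text)
    return AdverseEvent(
        side_effect=side_effect,
        cause=cause,
        dosage=f"{dose.group(1)} {dose.group(2)}" if dose else None,
        patient=f"{pat.group(1)} year old {SEX[pat.group(2)]}" if pat else None,
    )


SENTENCE = ("We report a 52 year-old man presenting an acute hair loss induced "
            "by carbamazepine (CBZ) in concentration of 8.6 microg/ml.")
event = extract_adverse_event(SENTENCE)
print(event)
```

A production system would rely on linguistic analysis and large vocabularies rather than substring matching; the point here is only the shape of the output record.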
Slide 7
Delivery of Analysis Results
Luxid WebFrontend: Highlighting, Linking, Charts, Graphs, …
Slide 8
Observation
Luxid WebFrontend: a powerful, rich tool for experienced expert users
Some users ask for a simpler, less sophisticated system
Quote from one client:
“The aim is not to make 5% of my staff 80% smarter, but 80% of my people 5% smarter!”
Slide 9
Introducing Luxid Navigator
TEMIS acquired “I3 Analytics” (Columbia, USA) in 2013
Large-scale crawling, enrichment, indexing and delivery of Life Science content in an intuitive web application
Designed and implemented in close cooperation with a big pharma pilot client
Deployments at INSERM and Prometic Life Sciences
Slide 10
Biopharma Edition Luxid® Navigator
Streamlines expert and competitive intelligence efforts by:
• Aggregating, enriching and integrating multiple sources of open but unstructured data on these topics
• Presenting information in structured yet easy-to-use interfaces
News sources: Fierce Bio, Fierce Pharma, Marketwire, PrNewswire, Biospace, Reuters, Linkoph, DrugDiscoveryNews, NewsMedical.net, pr.com
Databases and vocabularies: Medline, ChemIDPlus, MeSH, NCT, WHO, FDA, NIH …
Formats: Excel, HTML, XML, PDF
Processing stages:
• Content Aggregation: content merged into a single template
• Information Extraction: extract key items of interest
• Knowledge Integration: interconnect items with Linked Data
• Presentation: portal application
Slide 11
BioExpert Navigator:
Assists users in identifying experts and their published content in life sciences research.
ClinicalTrials Navigator:
Assists users in identifying competitive intelligence related to clinical trials.
BioNews Navigator:
Continuously scans news feeds to pick up news stories on target discovery, drug approvals, trial initiation, trial results, regulatory news and a variety of other topics.
Pharmacovigilance (Beta):
Provides an overview of adverse events.
Toxicology (in preparation)
Biopharma Edition Luxid® Navigator
Slide 12
Biopharma Navigator Modules
Slide 13
Data Breakdown
❖ ~750,000 News Articles (1,000 new per day)
❖ ~270,000 Clinical Trials (WHO + CT.gov)
❖ ~13,600,000 Medline Articles (1992-Today)
❖ ~18,000,000 FDA Adverse Events
❖ ~5,100,000 US Patents
❖ ~390,000 NIH Grants
❖ ~11,000 Conference Documents
Biopharma Edition Luxid® Navigator
Slide 14
Refreshing Documents
❖ ClinicalTrials.gov: fully recrawled daily (takes less than an hour)
❖ Bionews: refreshed weekly (takes a few hours)
❖ Medline (13.6 million): refreshed monthly (takes about 30 hours)
❖ Multiple refreshes can run in parallel with little time difference
Selective refreshes can be done for real-time results and are recommended
Biopharma Edition Luxid® Navigator
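The refresh cadence above can be written down as a small schedule table. The sketch below is illustrative only: the function name `sources_due`, the mapping of "monthly" to 30 days and the example dates are assumptions, not part of the Navigator product. It also works out the throughput implied by the Medline figures (13.6 million documents in about 30 hours).

```python
from datetime import date, timedelta

# Refresh intervals from the slide; treating "monthly" as 30 days is an assumption.
REFRESH_INTERVAL_DAYS = {
    "ClinicalTrials.gov": 1,   # fully recrawled daily
    "Bionews": 7,              # refreshed weekly
    "Medline": 30,             # refreshed monthly
}


def sources_due(last_refreshed: dict, today: date) -> list:
    """Return the sources whose refresh interval has elapsed, sorted by name."""
    due = []
    for source, interval in REFRESH_INTERVAL_DAYS.items():
        last = last_refreshed.get(source)
        # A source that was never refreshed is always due.
        if last is None or today - last >= timedelta(days=interval):
            due.append(source)
    return sorted(due)


# Throughput implied by the Medline refresh: documents per second.
medline_docs_per_second = 13_600_000 / (30 * 3600)

# Hypothetical example dates.
today = date(2014, 10, 15)
last = {
    "ClinicalTrials.gov": date(2014, 10, 14),
    "Bionews": date(2014, 10, 10),
    "Medline": date(2014, 9, 10),
}
due = sources_due(last, today)
print(due, round(medline_docs_per_second))
```

With these example dates, ClinicalTrials.gov and Medline are due while Bionews is not, and the Medline refresh works out to roughly 126 documents per second.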
Slide 15
Process: Capture, Extract, Normalize & Integrate Data
Sources:
• Clinical Trials
• News releases
• Scientific Publications
• Adverse Events
Extract, Normalize, and Integrate (Vocabularies)
Vocabularies:
• Drugs
• MOA
• Diseases
• Organizations
• Adverse Reactions
• …
Drug Profile: Integrated Information View
Factor Xa
Slide 16
Multiple Terminology & Multiple Values
Drug/Intervention/Product
• BAY59-7939 (molecule name)
• Rivaroxaban (chemical name)
• Xarelto (brand name)
Sponsor/Co-sponsor/Manufacturer/Organization/Affiliation
• Bayer
• Bayer Healthcare
• Berlex
• Berlis AG
Condition/Disease/Indication
• AF
• Atfb1
• Atrial Fibrillation
• Auricular Fibrillations
• Familial Atrial Fibrillation
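A vocabulary that maps every surface form to one preferred term is the simplest way to reconcile these variants. The following Python sketch uses only the examples from this slide; the choice of preferred terms ("Rivaroxaban", "Bayer", "Atrial Fibrillation") and the names `VOCABULARY`, `build_lookup` and `normalize` are assumptions for illustration, not the actual TEMIS vocabularies.

```python
# Hypothetical vocabulary: entity type -> preferred term -> known surface forms.
VOCABULARY = {
    "drug": {
        "Rivaroxaban": ["BAY59-7939", "Rivaroxaban", "Xarelto"],
    },
    "organization": {
        "Bayer": ["Bayer", "Bayer Healthcare", "Berlex", "Berlis AG"],
    },
    "condition": {
        "Atrial Fibrillation": [
            "AF", "Atfb1", "Atrial Fibrillation",
            "Auricular Fibrillations", "Familial Atrial Fibrillation",
        ],
    },
}


def build_lookup(vocabulary: dict) -> dict:
    """Invert the vocabulary into (entity type, lowercased form) -> preferred term."""
    lookup = {}
    for entity_type, concepts in vocabulary.items():
        for preferred, variants in concepts.items():
            for variant in variants:
                lookup[(entity_type, variant.casefold())] = preferred
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def normalize(entity_type: str, surface_form: str) -> str:
    """Return the preferred term, or the input unchanged if it is unknown."""
    key = (entity_type, surface_form.strip().casefold())
    return LOOKUP.get(key, surface_form)


print(normalize("drug", "xarelto"))
print(normalize("condition", "AF"))
```

Keying the lookup on the entity type as well as the string matters for short forms such as "AF", which could mean something else entirely outside the condition field.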
Slide 17
Multiple Data Sources
Case Study: Xarelto
Clinical Trials
• BAY59-7939 in Atrial Fibrillation Once Daily (OD)
• Oral Direct Factor Xa Inhibitor BAY59-7939 in Patients With Acute Symptomatic Proximal Deep Vein Thrombosis (ODIXa-DVT)
• Rivaroxaban for Antiphospholipid Antibody Syndrome (RAPS)
News Releases
• Xarelto Lawsuit Alleges Uncontrollable Bleeding Nearly Cost California Woman Her Life, Bernstein Liebhard LLP Reports
• Anticoagulant therapy Rivaroxaban for patients with AF undergoing cardioversion
• Xarelto is Alleged to Cause Serious and Uncontrolled Internal Bleeding
Scientific Publications
• Development of new anticoagulant highly honoured: Bayer's Xarelto recognised with 2010 international Prix Galien award.
• The discovery and development of Rivaroxaban, an oral, direct factor Xa inhibitor
• Antagonists of activated factor Xa and thrombin: innovative antithrombotic agents
FDA Adverse Events
• An adverse event of cerebral hemorrhage leading to death was reported to FDA while patient was on Xarelto for the treatment of atrial fibrillation
Slide 18
Normalized Yet Granular
Source Data: Medline Publications
• Search for BAY59-7939: 105 publications (non-normalized), 2055 publications (normalized)
• Search for Rivaroxaban: 2037 publications (non-normalized), 2055 publications (normalized)
• Search for Xarelto: 216 publications (non-normalized), 2055 publications (normalized)
Search modes: Granular Search, Normalized Search
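The difference between the two search modes can be reproduced on a toy corpus. In the sketch below, the four document titles, the `CONCEPT_OF` synonym table and the function names are made up for illustration; the real figures above come from Medline, not from this code.

```python
import re

# Toy synonym table (assumption): every surface form maps to one concept id.
CONCEPT_OF = {
    "bay59-7939": "rivaroxaban",
    "rivaroxaban": "rivaroxaban",
    "xarelto": "rivaroxaban",
}

# Four made-up titles standing in for Medline records.
DOCS = {
    1: "BAY59-7939 in atrial fibrillation",
    2: "Rivaroxaban for antiphospholipid antibody syndrome",
    3: "Xarelto recognised with Prix Galien award",
    4: "The discovery and development of rivaroxaban (BAY59-7939)",
}


def tokenize(text: str) -> set:
    """Lowercase word tokens; hyphens are kept so 'bay59-7939' stays whole."""
    return set(re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()))


def granular_search(term: str) -> list:
    """Match only the exact surface form that was typed."""
    needle = term.lower()
    return sorted(d for d, text in DOCS.items() if needle in tokenize(text))


def normalized_search(term: str) -> list:
    """Match any surface form that maps to the same concept as the query."""
    concept = CONCEPT_OF.get(term.lower(), term.lower())
    return sorted(
        d for d, text in DOCS.items()
        if any(CONCEPT_OF.get(tok, tok) == concept for tok in tokenize(text))
    )


for query in ("BAY59-7939", "Rivaroxaban", "Xarelto"):
    print(query, granular_search(query), normalized_search(query))
```

As on the slide, each granular query finds a different subset, while every normalized query returns the same full set of documents about the drug.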
Slide 19
Sample Analysis: Expert Finder
Find experts & documents based on Medline publications
Slide 20
Sample Analysis: Expert Finder
Slide 21
Sample Analysis: Expert Finder
Slide 22
Behind the scenes: Named Entity Recognition
Name Disambiguation
• Name ambiguity is a major challenge when combining data from multiple sources: not only do the naming schemes differ, but the form of the record often differs as well.
• Our name disambiguation algorithms overcome this challenge.
Why It Matters
• It is critical to present a holistic view when identifying experts across different data sources.
• Example: a person may be represented as
–“Smith J.” in Medline
–“Smith J., MD” in ClinicalTrials.gov, and
–“John Smith” in a patent database
Our system is able to make such connections.
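A first step towards such connections is to reduce each name variant to a comparable key. The Python sketch below is not the TEMIS algorithm: `name_key`, the list of credentials to ignore and the (surname, first initial) key are assumptions, and the heuristic handles only the three forms from the example.

```python
import re
from collections import defaultdict

# Credentials and titles to ignore (an illustrative, incomplete list).
CREDENTIALS = {"md", "phd", "dr", "prof"}


def name_key(raw: str) -> tuple:
    """Reduce a name variant to a (surname, first initial) key."""
    # Drop anything after a comma, e.g. the ", MD" suffix.
    name = raw.split(",")[0]
    tokens = [t for t in re.findall(r"[A-Za-z]+", name)
              if t.lower() not in CREDENTIALS]
    if not tokens:
        raise ValueError(f"no name found in {raw!r}")
    if len(tokens) == 1:
        return (tokens[0].lower(), "")
    if len(tokens[-1]) == 1:
        # "Smith J." style: surname first, initial last.
        surname, given = tokens[0], tokens[-1]
    else:
        # "John Smith" style: given name first, surname last.
        surname, given = tokens[-1], tokens[0]
    return (surname.lower(), given[0].lower())


records = [
    ("Medline", "Smith J."),
    ("clinicaltrials.gov", "Smith J., MD"),
    ("Patent database", "john Smith"),
]

# Group records from different sources that share the same key.
candidates = defaultdict(list)
for source, raw in records:
    candidates[name_key(raw)].append(source)

print(dict(candidates))
```

A shared key only makes the records candidates for the same person. Many different people are called "J. Smith", so a real system would need further evidence such as affiliations, co-authors or topics before merging them.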
Slide 23
Behind the scenes: Metadata creation from fulltext
Missing Information
• Data elements in trial registries (e.g. trial phase in WHO) are not always populated in their designated XML fields. However, such data elements may be buried in textual descriptions.
• Algorithms have been developed to extract such elements from textual descriptions: enrollment, country location, phase-level data, principal investigator, and inclusion and exclusion criteria.
• In addition, algorithms have been developed to extract pre-clinical studies and Phase I trials (whose inclusion in trial registries is not mandatory) from news releases.
Why It Matters
• Missing data can give an incomplete picture when analyzing the competitive landscape. We seek to bring the full picture to our end users.
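As an illustration of recovering metadata from free text, the sketch below fills an empty phase and enrollment field from a trial description. The regular expressions, the record layout and the sample description are assumptions made for this example; they are far simpler than production extraction rules.

```python
import re

# "Phase 3" or "Phase III" style mentions (digits or Roman numerals I to IV).
PHASE_RE = re.compile(r"\bphase\s+(1|2|3|4|I{1,3}|IV)\b", re.IGNORECASE)
# A number followed by a word for trial participants.
ENROLL_RE = re.compile(r"\b(\d[\d,]*)\s+(?:patients|participants|subjects)\b",
                       re.IGNORECASE)
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def extract_from_text(description: str) -> dict:
    """Pull the trial phase and enrollment out of a textual description."""
    found = {"phase": None, "enrollment": None}
    phase = PHASE_RE.search(description)
    if phase:
        value = phase.group(1).upper()
        found["phase"] = ROMAN[value] if value in ROMAN else int(value)
    enroll = ENROLL_RE.search(description)
    if enroll:
        found["enrollment"] = int(enroll.group(1).replace(",", ""))
    return found


def fill_missing(record: dict) -> dict:
    """Fill empty structured fields from the description; never overwrite."""
    extracted = extract_from_text(record.get("description", ""))
    filled = dict(record)
    for field, value in extracted.items():
        if filled.get(field) is None:
            filled[field] = value
    return filled


# Hypothetical registry record whose structured fields are empty.
record = {
    "phase": None,
    "enrollment": None,
    "description": "This randomized Phase III study will enroll 1,200 patients "
                   "with atrial fibrillation.",
}
filled = fill_missing(record)
print(filled["phase"], filled["enrollment"])
```

Values already present in the structured fields are kept as they are, so the text is used only to close gaps.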
Slide 24
Alerts
Staying up to date with your topics of interest
Slide 25
Comparison: Mechanism of action (MoA) search
Search Query: “Glucagon-like Peptide-1 (GLP-1) Agonists”
• ClinicalTrials.gov returns 96 trials
• Clinical Trials Navigator returns 231 trials
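The slide does not say how the additional trials are found. One plausible mechanism, sketched below as an assumption, is query expansion: a mechanism-of-action vocabulary lists the drugs in a class, so a class-level query also matches trials that mention only a drug name. The trial IDs and titles are invented; exenatide and liraglutide are used as two well-known GLP-1 receptor agonists.

```python
# Hypothetical MoA vocabulary: drug class -> member drugs.
MOA_VOCABULARY = {
    "glp-1 agonists": ["exenatide", "liraglutide"],
}

# Invented trial titles for illustration.
TRIALS = {
    "T1": "Exenatide once weekly in type 2 diabetes",
    "T2": "Study of GLP-1 agonists after bariatric surgery",
    "T3": "Liraglutide and weight loss",
    "T4": "Metformin in prediabetes",
}


def keyword_search(query: str) -> list:
    """Plain search: the query string must appear in the title."""
    q = query.lower()
    return sorted(t for t, title in TRIALS.items() if q in title.lower())


def moa_search(query: str) -> list:
    """Expanded search: the class name or any member drug may appear."""
    terms = [query.lower()] + MOA_VOCABULARY.get(query.lower(), [])
    return sorted(
        t for t, title in TRIALS.items()
        if any(term in title.lower() for term in terms)
    )


print(keyword_search("GLP-1 agonists"))
print(moa_search("GLP-1 agonists"))
```

On this toy data the plain search finds one trial and the expanded search finds three, which mirrors the direction of the difference reported above.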
Slide 26
Comparison
Search Query: “RG7652 AND PCSK9”. RG7652 (MPSK3169A) is an anti-PCSK9 antibody in development by Roche for the treatment of coronary diseases.
• ClinicalTrials.gov returns no trial
• Clinical Trials Navigator returns 1 trial
Slide 27
Comparison: Country Search
Search Query: “NCT00460265 AND Bulgaria”. NCT00460265 is a phase 3 trial conducted at multiple locations across the world. The Health Authority field specifies the country locations.
• ClinicalTrials.gov returns no trial
• Clinical Trials Navigator returns the relevant trial
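A country search of this kind needs the countries to be pulled out of the Health Authority field. The sketch below assumes the field is a semicolon-separated list of "Country: Authority" entries; that format, the sample value and the function names are assumptions for illustration, and the sample is not the actual content of NCT00460265.

```python
def countries_from_health_authority(field: str) -> list:
    """Extract country names from 'Country: Authority; Country: Authority'."""
    countries = []
    for entry in field.split(";"):
        country, sep, _authority = entry.partition(":")
        country = country.strip()
        # Skip entries without a colon and avoid duplicate countries.
        if sep and country and country not in countries:
            countries.append(country)
    return countries


def matches_country(field: str, country: str) -> bool:
    """Case-insensitive test whether a country is listed in the field."""
    wanted = country.strip().lower()
    return any(c.lower() == wanted
               for c in countries_from_health_authority(field))


# Hypothetical field value.
SAMPLE = ("United States: Food and Drug Administration; "
          "Bulgaria: Bulgarian Drug Agency; "
          "Germany: Federal Institute for Drugs and Medical Devices")

print(countries_from_health_authority(SAMPLE))
print(matches_country(SAMPLE, "Bulgaria"))
```

Once the countries are indexed as their own field, a query such as “NCT00460265 AND Bulgaria” can match even when no site in that country is listed in the structured location data.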
Slide 28
High Level Architecture
Data sources: Clinical Trials, FDA AERS, Medline, News RSS, NIH Grants, US Patents
Components: Crawlers, Document Processor, Luxid, Glossaries, Management UIs, Backend MongoDB, Frontend MongoDB ReplicaSet, Solr, Apache Load Balancer, Front End (two instances)
Users
Slide 29
Luxid Biopharma Navigator
Sign up for a free trial at
www.biopharmanavigator.com
Slide 30
Thank you
Your questions