ICIC 2014 Semantic Integration of Pharmaceutical Content: Blueprint and Examples

Copyright © 2014 TEMIS - All Rights Reserved - Slide 1

Semantic Integration of Pharmaceutical Content: Blueprint and Examples

ICIC Heidelberg, October 15, 2014

Stefan Geißler, TEMIS Deutschland GmbH

www.temis.com

Upload: dr-haxel-congress-and-event-management-gmbh

Post on 11-Jun-2015

DESCRIPTION

Pharmaceutical research is becoming more translational, more collaborative and more distributed. This has brought on a new set of information management challenges for all involved parties: dealing with data that is not only big, but also comes from a large variety of open and proprietary sources and needs to be viewed from the angle of multiple disciplines. In this context, semantic enrichment has emerged as a critical information systems capability that enables the integration of diverse content, and access to it from diverse viewpoints, with productivity benefits at all stages of the lifecycle. Through the lens of a corresponding Maturity Model, this session will investigate representative applications of semantics for pharma information, and underscore the trends shaping how we will manage, distribute, and access it in the future.

TRANSCRIPT

Page 2: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

A pioneer in Content Enrichment since 2000

70

Page 3: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

We're from Heidelberg

And since we are from Heidelberg, this is not the first time I have visited the Stadthalle…

„Ball der Vampire“ 2014

Page 4: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

TEMIS @ Publishing

Page 5: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

TEMIS @ Industry Accounts

Page 6: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

TEMIS Mission: Structuring the unstructured

We report a 52 year-old man presenting an acute hair loss induced by carbamazepine (CBZ) in concentration of 8.6 microg/ml.

Extract domain-specific information from text

Relations

We report a 52 year-old man presenting an acute hair loss induced by carbamazepine (CBZ) in concentration of 8.6 microg/ml.

Verb Patient Verb Symptom Verb Dosage information Subj

Entities

Drug Name

Terms

Pro Verb Num Art N-P Noun Verb Art Adj Nn Nn Verb Pp PropNn Pp Noun Pp Num Unit Abbr

Attributes

Roles

Adverse Event

Side Effect: Alopecia

Cause: Carbamazepine

Dosage: 8.6 microg/ml

Patient: 52 year old male
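To make the step from raw sentence to structured roles concrete, here is a minimal sketch in Python. It is not the Luxid extraction engine: the tiny lexicons, the regular expressions and the `AdverseEvent` record are all assumptions made for illustration, and a production system would rely on full linguistic analysis rather than string matching.

```python
import re
from dataclasses import dataclass
from typing import Optional

SENTENCE = (
    "We report a 52 year-old man presenting an acute hair loss induced by "
    "carbamazepine (CBZ) in concentration of 8.6 microg/ml."
)

# Hypothetical mini-lexicons mapping surface terms to normalized labels.
DRUGS = {"carbamazepine": "Carbamazepine"}
SYMPTOMS = {"hair loss": "Alopecia"}
SEX = {"man": "male", "woman": "female"}


@dataclass
class AdverseEvent:
    side_effect: Optional[str]
    cause: Optional[str]
    dosage: Optional[str]
    patient: Optional[str]


def extract_adverse_event(text: str) -> AdverseEvent:
    """Fill the four roles from one sentence using lexicon and regex lookups."""
    lowered = text.lower()
    side_effect = next((n for t, n in SYMPTOMS.items() if t in lowered), None)
    cause = next((n for t, n in DRUGS.items() if t in lowered), None)

    # Concentration such as "8.6 microg/ml"; the unit is kept exactly as written.
    dose = re.search(r"(\d+(?:\.\d+)?)\s*(microg|mg|g)/ml", lowered)
    dosage = f"{dose.group(1)} {dose.group(2)}/ml" if dose else None

    # Patient description such as "52 year-old man".
    pat = re.search(r"(\d+)\s*year-old\s+(man|woman)", lowered)
    patient = f"{pat.group(1)} year old {SEX[pat.group(2)]}" if pat else None

    return AdverseEvent(side_effect, cause, dosage, patient)


event = extract_adverse_event(SENTENCE)
print(event)
```

Keeping the unit string untouched matters here: microg/ml and mg/ml differ by a factor of 1000, so the sketch copies the unit from the text instead of rewriting it.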

Page 7: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Delivery of Analysis Results

Luxid WebFrontend: Highlighting, Linking, Charts, Graphs, …

Page 8: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Observation

Luxid WebFrontend: a powerful, rich tool for experienced expert users

Some users ask for a simpler, less sophisticated system

Quote from one client:

„The aim is not to make 5% of my staff 80% smarter, but 80% of my people 5% smarter!“

Page 9: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Introducing Luxid Navigator

TEMIS acquisition of „I3 Analytics“ from Columbia, USA, in 2013

Large-scale crawling, enrichment, indexing and delivery of Life Science content in an intuitive web application

Design and implementation in close cooperation with BigPharma pilot client

Deployments at INSERM and Prometic Life Sciences

Page 10: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Sources (news): Fierce Bio, Fierce Pharma, Marketwire, PrNewswire, Biospace, Reuters, Linkoph, DrugDiscoveryNews, NewsMedical.net, pr.com

Sources (databases and agencies): Medline, ChemIDPlus, MeSH, NCT, WHO, FDA, NIH …

Formats: Excel, HTML, XML, PDF

Streamlines expert and competitive intelligence efforts by:

• Aggregating, Enriching and Integrating multiple sources of open but unstructured data on these topics

• Presenting information in structured yet easy-to-use interfaces

Biopharma Edition Luxid® Navigator

Content Aggregation: Content merged into a single template

Information Extraction: Extract key items of interest

Knowledge Integration: Interconnect items with Linked Data

Presentation: Portal application

Page 11: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

BioExpert Navigator: Assists users in identifying experts and their published content in life sciences research.

ClinicalTrials Navigator: Assists users in identifying competitive intelligence related to clinical trials.

BioNews Navigator: Continuously scans news feeds to pick up news stories on target discovery, drug approvals, trial initiation, trial results, regulatory news and a variety of other topics.

Pharmacovigilance (Beta): Provides an overview of adverse events.

Toxicology (in preparation)

Page 12: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Biopharma Navigator Modules

Page 13: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Data Breakdown

❖ ~750,000 News Articles (1,000 new per day)

❖ ~270,000 Clinical Trials (WHO + CT.gov)

❖ ~13,600,000 Medline Articles (1992-Today)

❖ ~18,000,000 FDA Adverse Events

❖ ~5,100,000 US Patents

❖ ~390,000 NIH Grants

❖ ~11,000 Conference Documents

Page 14: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Refreshing Documents

❖ ClinicalTrials.gov fully recrawled daily (takes less than an hour)

❖ Bionews refreshed weekly (takes a few hours)

❖ Medline (13.6 million) refreshed monthly (takes about 30 hours)

❖ Multiple refreshes run in parallel with little time difference

Selective refreshes can be done for real-time results and are recommended
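A refresh policy like this one can be expressed as a small table of intervals plus a check for which sources are due. The sketch below is only an illustration: the source keys, the approximation of "monthly" as 30 days and the `sources_due` helper are assumptions, not part of the Navigator product.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Intervals taken from the slide; "monthly" is approximated as 30 days (assumption).
REFRESH_INTERVALS: Dict[str, timedelta] = {
    "clinicaltrials.gov": timedelta(days=1),
    "bionews": timedelta(weeks=1),
    "medline": timedelta(days=30),
}


def sources_due(last_refreshed: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the sources whose refresh interval has elapsed, sorted by name.

    A source that was never refreshed is always due.
    """
    return sorted(
        source
        for source, interval in REFRESH_INTERVALS.items()
        if now - last_refreshed.get(source, datetime.min) >= interval
    )


now = datetime(2014, 10, 15)
last = {
    "clinicaltrials.gov": datetime(2014, 10, 14),
    "bionews": datetime(2014, 10, 12),
    "medline": datetime(2014, 9, 1),
}
print(sources_due(last, now))
```

Because each source is checked independently, the due sources can be handed to separate workers, which matches the slide's note that several refreshes run in parallel.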

Page 15: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Process: Capture, Extract, Normalize & Integrate Data

Extract, Normalize, and Integrate (Vocabularies)

Clinical Trials

News releases

Scientific Publications

Adverse Events

Drug Profile: Integrated Information View

Factor Xa

Vocabularies: Drugs, MoA, Diseases, Organizations, Adverse Reactions, …

Page 16: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Multiple Terminology & Multiple Values

Drug/Intervention/Product

• BAY59-7939 (molecule name)

• Rivaroxaban (chemical name)

• Xarelto (brand name)

Sponsor/Co-sponsor/Manufacturer/Organization/Affiliation

• Bayer

• Bayer Healthcare

• Berlex

• Berlis AG

Condition/Disease/Indication

• AF

• Atfb1

• Atrial Fibrillation

• Auricular Fibrillations

• Familial Atrial Fibrillation
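One common way to handle such variant terms is a vocabulary that maps every surface form to a single concept identifier. The following sketch uses the terms from this slide; the identifiers such as `DRUG:rivaroxaban` and the `normalize` helper are hypothetical and only illustrate the idea, not the vocabularies actually used in the product.

```python
from typing import Dict, List, Optional

# Toy vocabulary built from the slide; concept identifiers are made up for this example.
VOCABULARY: Dict[str, List[str]] = {
    "DRUG:rivaroxaban": ["BAY59-7939", "Rivaroxaban", "Xarelto"],
    "ORG:bayer": ["Bayer", "Bayer Healthcare", "Berlex", "Berlis AG"],
    "DISEASE:atrial_fibrillation": [
        "AF",
        "Atfb1",
        "Atrial Fibrillation",
        "Auricular Fibrillations",
        "Familial Atrial Fibrillation",
    ],
}


def _key(term: str) -> str:
    """Case-insensitive lookup key with collapsed whitespace."""
    return " ".join(term.casefold().split())


# Reverse index: surface form -> concept identifier.
SURFACE_TO_CONCEPT: Dict[str, str] = {
    _key(form): concept for concept, forms in VOCABULARY.items() for form in forms
}


def normalize(term: str) -> Optional[str]:
    """Return the concept identifier for a term, or None if it is unknown."""
    return SURFACE_TO_CONCEPT.get(_key(term))


for term in ["Xarelto", "bay59-7939", "Berlex", "auricular fibrillations", "aspirin"]:
    print(term, "->", normalize(term))
```

Short abbreviations such as "AF" are ambiguous in running text, so a plain dictionary lookup like this would need contextual checks before being applied to full documents.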

Page 17: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Multiple Data Sources Case Study: Xarelto

Clinical Trials

• BAY59-7939 in Atrial Fibrillation Once Daily (OD)

• Oral Direct Factor Xa Inhibitor BAY59-7939 in Patients With Acute Symptomatic Proximal Deep Vein Thrombosis (ODIXa-DVT)

• Rivaroxaban for Antiphospholipid Antibody Syndrome (RAPS)

News Releases

• Xarelto Lawsuit Alleges Uncontrollable Bleeding Nearly Cost California Woman Her Life, Bernstein Liebhard LLP Reports

• Anticoagulant therapy Rivaroxaban for patients with AF undergoing cardioversion

• Xarelto is Alleged to Cause Serious and Uncontrolled Internal Bleeding

Scientific Publications

• Development of new anticoagulant highly honoured: Bayer's Xarelto recognised with 2010 international Prix Galien award.

• The discovery and development of Rivaroxaban, an oral, direct factor Xa inhibitor

• Antagonists of activated factor Xa and thrombin: innovative antithrombotic agents

FDA Adverse Events

• An adverse event of cerebral hemorrhage leading to death was reported to FDA while patient was on Xarelto for the treatment of atrial fibrillation

Page 18: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Normalized Yet Granular

Source Data: Medline Publications

Search for BAY59-7939: 105 publications (non-normalized, granular search); 2055 publications (normalized search)

Search for Rivaroxaban: 2037 publications (non-normalized, granular search); 2055 publications (normalized search)

Search for Xarelto: 216 publications (non-normalized, granular search); 2055 publications (normalized search)
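The difference between the two search modes can be shown on a toy corpus. In the sketch below, the five document strings are invented stand-ins loosely modelled on the titles in this deck, the synonym table is hypothetical, and plain substring matching replaces a real search index; the counts it prints therefore have nothing to do with the Medline figures above.

```python
from typing import Dict, List, Set

# Hypothetical synonym table: concept -> lowercased surface forms.
SYNONYMS: Dict[str, Set[str]] = {
    "rivaroxaban": {"bay59-7939", "rivaroxaban", "xarelto"},
}

# Toy corpus for illustration only.
DOCS: List[str] = [
    "BAY59-7939 in atrial fibrillation once daily",
    "The discovery and development of Rivaroxaban, an oral, direct factor Xa inhibitor",
    "Bayer's Xarelto recognised with 2010 international Prix Galien award",
    "Rivaroxaban (Xarelto) for patients with AF undergoing cardioversion",
    "Antagonists of activated factor Xa and thrombin",
]


def granular_search(query: str, docs: List[str]) -> List[str]:
    """Return documents that contain the query string itself."""
    q = query.casefold()
    return [d for d in docs if q in d.casefold()]


def normalized_search(query: str, docs: List[str]) -> List[str]:
    """Return documents that contain any synonym of the query's concept."""
    q = query.casefold()
    # Fall back to the query alone when it belongs to no known concept.
    forms = next((s for s in SYNONYMS.values() if q in s), {q})
    return [d for d in docs if any(f in d.casefold() for f in forms)]


for term in ["BAY59-7939", "Rivaroxaban", "Xarelto"]:
    print(term, len(granular_search(term, DOCS)), len(normalized_search(term, DOCS)))
```

Both functions are kept because the slide's point is that normalization should not replace the granular view: a user may still want only the documents that mention one specific name.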

Page 19: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Sample Analysis: Expert Finder

Find experts & documents based on Medline publications

Page 20: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Sample Analysis: Expert Finder

Page 21: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Sample Analysis: Expert Finder

Page 22: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Behind the scenes: Named Entity Recognition

Name Disambiguation

• Name ambiguity is a big challenge when we bring together data from multiple sources: not only do the naming schemes differ, but the form of the record often differs as well.

• Our name disambiguation algorithms overcome this challenge.

Why It Matters

• It is critical to present a holistic view when identifying experts across different data sources.

• Example: a person may be represented as

–“Smith J.” in Medline

–“Smith J., MD” in clinicaltrials.gov and

–“John Smith” in a patent database

Our system is able to make such connections.
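A first step toward such connections is often a blocking key that groups name variants as candidate matches. The sketch below is a simple heuristic and not the disambiguation algorithm used in the product: the title list and the rule for spotting trailing initials are assumptions.

```python
import re
from typing import Tuple

# Assumed list of titles and degrees to ignore.
TITLES = {"md", "phd", "dr", "prof"}


def name_key(raw: str) -> Tuple[str, str]:
    """Reduce a person name to (surname, first initial), both lowercased.

    Heuristic: a short uppercase final token ("J", "JA") is read as initials
    in "Surname Initials" order; otherwise "Given ... Surname" order is assumed.
    """
    tokens = [
        t for t in re.sub(r"[.,]", " ", raw).split() if t.casefold() not in TITLES
    ]
    if not tokens:
        return ("", "")
    if len(tokens) == 1:
        return (tokens[0].casefold(), "")
    last = tokens[-1]
    if len(last) <= 2 and last.isupper():
        surname, given = " ".join(tokens[:-1]), last
    else:
        surname, given = last, tokens[0]
    return (surname.casefold(), given[0].casefold())


for variant in ["Smith J.", "Smith J., MD", "john Smith"]:
    print(variant, "->", name_key(variant))
```

A shared key only makes two records candidates for the same person. Deciding whether two "J. Smith" records really refer to one expert typically needs further evidence such as affiliation, co-authors or research topic.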

Page 23: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Behind the scenes: Metadata creation from fulltext

Missing Information

• Data elements in trial registries (e.g. trial phase in WHO) are not always populated in the designated XML fields. However, such data elements may be buried in textual descriptions.

• Algorithms have been developed to extract such elements from textual descriptions: enrollment, country location, phase-level data, principal investigator, inclusion and exclusion criteria.

• In addition, algorithms have been developed to extract pre-clinical and other Phase I trials (not mandatory to be included in trial registries) from news releases.

Why It Matters

• Missing data can give an incomplete picture when analyzing the competitive landscape. We seek to bring the full picture to our end users.
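As an illustration of back-filling empty registry fields from free text, the sketch below recovers the phase and enrollment from a description. The record layout, the field names and the regular expressions are assumptions made for this example and cover only the simplest phrasings; the actual algorithms are not described in the slides.

```python
import re
from typing import Any, Dict

PHASE_RE = re.compile(r"\bphase\s+(I{1,3}|IV|[1-4])\b", re.IGNORECASE)
ENROLL_RE = re.compile(
    r"\b(\d[\d,]*)\s+(?:patients|participants|subjects)\b", re.IGNORECASE
)
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def fill_missing(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with empty fields filled from its description.

    Fields that are already populated are never overwritten.
    """
    filled = dict(record)
    text = record.get("description") or ""

    if not filled.get("phase"):
        match = PHASE_RE.search(text)
        if match:
            token = match.group(1).upper()
            filled["phase"] = f"Phase {ROMAN.get(token, token)}"

    if not filled.get("enrollment"):
        match = ENROLL_RE.search(text)
        if match:
            filled["enrollment"] = int(match.group(1).replace(",", ""))

    return filled


# Hypothetical registry record with empty structured fields.
record = {
    "id": "EXAMPLE-001",
    "phase": None,
    "enrollment": None,
    "description": "A randomized phase III study enrolling 1,200 patients "
    "with atrial fibrillation.",
}
print(fill_missing(record))
```

Leaving populated fields untouched keeps the registry's own structured data authoritative and uses text extraction only where the registry is silent.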

Page 24: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Alerts

Staying up to date with your topics of interest

Page 25: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Comparison

Mechanism of action (MoA) search

Search Query: “Glucagon-like Peptide-1 (GLP-1) Agonists”

Clinicaltrials.gov returns 96 trials

Clinical Trials Navigator returns 231 trials

Page 26: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Comparison

Search Query: “RG7652 AND PCSK9”. RG7652 (MPSK3169A) is an anti-PCSK9 antibody in development by Roche for the treatment of coronary diseases.

Clinicaltrials.gov returns no trial

Clinical Trials Navigator returns 1 trial

Page 27: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Comparison

Country Search

Search Query: “NCT00460265 AND Bulgaria”. NCT00460265 is a phase 3 trial conducted at multiple locations across the world. The Health Authority field specifies the country locations.

Clinicaltrials.gov returns no trial

Clinical Trials Navigator returns the relevant trial
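A country search of this kind works when countries mentioned only in the Health Authority field are indexed alongside the explicit location list. The sketch below assumes that the field holds entries of the form "Country: Agency" separated by semicolons; that format, the record layout and the sample record are assumptions for illustration and do not reproduce the actual registry entry.

```python
from typing import Any, Dict, Set


def countries_from_health_authority(value: str) -> Set[str]:
    """Parse country names from a string like 'Country: Agency; Country: Agency'."""
    countries = set()
    for entry in value.split(";"):
        country, sep, _agency = entry.partition(":")
        if sep and country.strip():
            countries.add(country.strip())
    return countries


def matches_country(record: Dict[str, Any], country: str) -> bool:
    """Check the explicit location list and the health authority field."""
    indexed = set(record.get("locations", []))
    indexed |= countries_from_health_authority(record.get("health_authority", ""))
    return country.casefold() in {c.casefold() for c in indexed}


# Hypothetical record: no explicit locations, countries only in the authority field.
trial = {
    "id": "EXAMPLE-TRIAL",
    "locations": [],
    "health_authority": "Bulgaria: Bulgarian Drug Agency; "
    "Germany: Federal Institute for Drugs and Medical Devices",
}
print(matches_country(trial, "Bulgaria"))
print(matches_country(trial, "France"))
```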

Page 28: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

High Level Architecture (diagram)

Sources: Clinical Trials, FDA AERS, Medline, News RSS, NIH Grants, US Patents

Components: Crawlers, Document Processor, Luxid, Glossaries, Management UIs, Backend MongoDB, Frontend MongoDB ReplicaSet, Solr, Apache Load Balancer, Front End (two instances), Users

Page 29: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Luxid Biopharma Navigator

Sign up for a free trial at

www.biopharmanavigator.com

Page 30: ICIC 2014 Semantic Integration of Pharmaceutical Content : Blueprint and Examples

Thank you

Your questions