SVMPharma Real World Evidence – The RWE Series – Part III: Big Data Analytics and Visualisation

SVMPharma Ltd, Landmark House Station Road, Hook, Hampshire, UK, RG27 9HA CONTACT US [email protected] +44(0) 1256 962 220 www.svmpharma.com THE RWE SERIES Providing the background, the research and the insights. PART III: BIG DATA ANALYTICS & VISUALISATION

Upload: svmpharma-limited

Post on 23-Jan-2018




RWE Series Part III: Big Data Analytics and Visualisation

© 2015 SVMPharma Ltd. All rights reserved.

Big Data in healthcare is a complex, rapidly evolving and exciting area. It is an important source of Real World Evidence (RWE): data which has been collected outside of a conventional clinical trial.

In Part III of the RWE Series we distil this wide-ranging and sometimes intimidating topic down to its key features to understand how we can take advantage of this data and maximise its potential.

We pay special attention to the visualisation methods for interpreting and presenting Big Data.

This is the third part of a regular series available at svmpharma.com

PART III: BIG DATA ANALYTICS & VISUALISATION

By Gem Auddy, [email protected]

SVMPharma Limited, Winchfield Lodge, Old Potbridge Rd, Winchfield, Hampshire, RG27 8BT, +44 (0)1252 848 848


The capabilities and potential of Big Data have become a major focus of attention in recent years across many industries. There is no shortage of astounding facts about Big Data: for example, by one 2013 estimate, 90% of the data in the world had been created in the preceding 2 years alone [1-3]. Furthermore, there have been frequent predictions that Big Data will solve persistent issues or lead to previously unimaginable cost reductions [4].

However, it can be difficult to relate to and contextualise these impressive facts and possibilities. First we need to gain a better understanding of the core principles of healthcare analytics and Big Data.

Big Data has been defined as ‘high volume, high velocity, and high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimisation’. This influential ‘3Vs’ model of Big Data was first envisioned in 2001 and has been referred to in the majority of articles since that time [5,6]. More recently a fourth ‘V’ has been incorporated, which emphasises the need for the data to be accurate; this is known as data veracity [7,8].

This model provides an excellent starting point for exploring the topic and, as you will see, healthcare data analytics provides the most powerful examples of the characteristics of Big Data.

In this article we propose, for the first time, a fifth ‘V’: visualisation. We will highlight the visualisation tools and techniques which are needed to drive meaningful outcomes from Big Data.

[Figure: Big Data & Healthcare Analytics, showing the five Vs]

Volume: “The sheer quantity of data is the eponymous characteristic of Big Data”

Velocity: “New health data is generated and captured continuously”

Variety: “Healthcare data exists in many forms and is often unstructured”

Veracity: “The veracity of data is its accuracy and is a key part of any discussion”

Visualisation: “Visualisation tools and techniques are vital in identifying relationships and to convey information”


Volume

The sheer quantity of data that is captured, stored and processed is the eponymous characteristic of Big Data. Compared to other industries, the healthcare services have always generated vast quantities of data. This is through meticulous record keeping, compliance and regulatory requirements, and comprehensive patient care notes [9]. As we move towards universal electronic health records in primary care and increasing digitisation of hospital data, we now have a remarkable volume of data [10]. Thanks to recent technological advancements, data storage capacity is now prepared for this, whilst networking capabilities can handle the movement of large quantities of data. In addition to clinical data, there has been a notable increase in the collection of consumer-driven lifestyle data, which can provide valuable insights into general health and wellbeing [11].

Velocity

New health data is generated and captured continuously and at a rapid pace. There is a constant flow of new data. The health service is moving away from traditional static data, such as x-ray films and scripts, to more dynamic systems involving multi-dimensional and regular measurements. Real-time data monitoring is associated with critical care units and the operating theatre but can increasingly be applied to care at home and in the community. Technology allows for immediate data analysis in real time from any location and for any number of patients. The general consensus is that real-time analytics of high volume data has the potential to revolutionise healthcare [12-14].
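As a rough illustration of what analysing a continuous flow of measurements can involve, the sketch below keeps a rolling window over a stream of readings and flags when the window average crosses a limit. It is a minimal sketch only: the function name `rolling_alerts`, the sample readings, the window size and the threshold are all hypothetical and carry no clinical meaning.

```python
from collections import deque


def rolling_alerts(readings, window, threshold):
    """Yield (index, mean) whenever the mean of the last `window`
    readings is above `threshold`."""
    recent = deque(maxlen=window)  # old readings drop out automatically
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield i, mean


# Hypothetical stream of measurements arriving one at a time.
heart_rate = [72, 75, 80, 110, 120, 125, 90, 70]
alerts = list(rolling_alerts(heart_rate, window=3, threshold=100))
alert_positions = [i for i, _ in alerts]
print(alert_positions)
```

Because only the last few readings are held in memory, the same pattern applies whether the stream holds eight values or runs indefinitely.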

Variety

Healthcare data exists in many forms, including written notes and prescriptions, medical imaging, laboratory, pharmacy, insurance and administrative data, in addition to sensor data. Increasingly, health information can be found within dynamic web content, including social media. Big Data can be coded and stored in a structured form, for example in large datasets such as Hospital Episode Statistics (HES) and the Clinical Practice Research Datalink (CPRD). However, a significant amount of data exists in unstructured or semi-structured formats, and the processing, storage and analysis of data is adapting to accommodate this [15].

Veracity

The veracity of data is its accuracy and conformity to facts. Veracity is an essential part of any discussion on data and is particularly important in healthcare, where the implications of inaccurate data can be severe. To attain accurate analyses in Big Data, the architectures, platforms, algorithms, methodologies and tools need to scale up in granularity and performance. Big Data analysis has been compared to financial data in this regard [9]. Because Big Data in healthcare measures real-world outcomes, it can offer a more accurate representation of real-world practice than controlled trials.

Visualisation

Due to its volume, variety and velocity, Big Data analysis and presentation can be difficult. Traditional written, tabular and graphical formats can be inadequate in conveying the findings, and key details can be lost in summarising and aggregating the data. There are a number of visualisation tools and solutions which have proven valuable in identifying relationships and conveying information. The two most commonly used for Big Data in healthcare are Cytoscape and Gephi. Cytoscape originated in bioinformatics (gene processing and molecular networks), whilst Gephi has often been used for consumer and social research. Both of these tools can be adapted to enhance healthcare data analytics, as our examples will show. Network graphs consist of nodes (a set of objects representing entities in a dataset) and edges (connections between nodes that provide visual cues as to the degree of connectedness) [16].
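To make the node and edge vocabulary concrete, the following minimal sketch stores a small network as a list of weighted edges and derives each node's weighted degree, one simple measure of connectedness. The node names, the weights and the helper `weighted_degree` are hypothetical and are not taken from the tools named above.

```python
from collections import defaultdict

# Hypothetical weighted edges: (node_a, node_b, weight).
edges = [
    ("A", "B", 5),
    ("A", "C", 2),
    ("B", "C", 1),
    ("C", "D", 4),
]


def weighted_degree(edge_list):
    """Sum the weights of all edges touching each node."""
    degree = defaultdict(int)
    for a, b, weight in edge_list:
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


degrees = weighted_degree(edges)
nodes = sorted(degrees)  # the node set is implied by the edges
print(nodes)
print(degrees)
```

A visualisation tool would typically map a value such as this weighted degree onto node size or colour.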


Visualisation in Cystic Fibrosis

SVMPharma carried out a data visualisation exercise using Gephi to show the multi-disciplinary nature of Cystic Fibrosis (CF) by highlighting its co-morbidities and associated health problems. 15,000 Cystic Fibrosis patients who visited hospital from 2010-2014 were selected by their ICD-10 code (diagnosis code) from the Hospital Episode Statistics (HES) dataset.

This provided 3,800 ICD-10 codes and over 20,000 different connections between them. The data was entered into Gephi, with each ICD-10 code represented by a node and each different connection between them by a line (edge). The radial layout was used in Gephi to create Image 5, which includes the entirety of the data.

To take a closer look at the data, the top 100 different connections leading from the CF nodes were identified. This is represented in Image 8, with both the node size and connecting line representing the strength and volume of the connection, and the nodes positioned and colour-coded according to disease group.

The final visual conveys, at a glance, the main co-morbidities of CF patients and associated health issues. This shows the strength of the relationships between CF and its secondary complications, and the directions in which the disease is likely to progress.
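The data preparation behind such a network can be sketched as follows. This is not the code used for the exercise described here; it is a minimal illustration, on made-up records, of counting how often pairs of diagnosis codes occur together, keeping the heaviest connections from a focal code, and writing them as an edge table. E84 is the ICD-10 category for cystic fibrosis; the other codes, the four sample patients and all function names are placeholders. The `Source,Target,Weight` header is assumed to match the edge-table layout accepted by Gephi's spreadsheet import and should be checked against the version in use.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical data: one set of ICD-10 codes per patient.
patients = [
    {"E84", "J18", "E10"},
    {"E84", "J18"},
    {"E84", "K86", "E10"},
    {"E84", "J18", "K86"},
]


def cooccurrence_edges(records):
    """Count how many patients share each unordered pair of codes."""
    counts = Counter()
    for codes in records:
        for a, b in combinations(sorted(codes), 2):
            counts[(a, b)] += 1
    return counts


def top_edges_from(counts, focal, n):
    """Return the n heaviest edges that touch the focal node."""
    touching = [(pair, w) for pair, w in counts.items() if focal in pair]
    # Descending weight, then pair order so that ties are deterministic.
    touching.sort(key=lambda item: (-item[1], item[0]))
    return touching[:n]


def to_edge_csv(edge_list):
    """Serialise edges as a Source,Target,Weight table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in edge_list:
        writer.writerow([a, b, weight])
    return buffer.getvalue()


edge_counts = cooccurrence_edges(patients)
top = top_edges_from(edge_counts, "E84", 2)
edge_table = to_edge_csv(top)
print(edge_table)
```

In the exercise described above the equivalent of `n` would be 100 and the focal node would be the CF code.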

[Images 1-8: 1. Data in Tables; 2. Data in Gephi; 3. Ranking (Colour); 4. Radial Layout; 5. Entire Data; 6. Top 100; 7. Labelling; 8. A Closer Look, annotated with Nodes and Connections (Edges)]


The Technology

To cope with Big Data, a number of hardware and software solutions are used and these have replaced the traditional Business Intelligence (BI) tools. The first step involves massively parallel computing and the processing of large datasets across clusters using a Hadoop-based program. Specialist tools are available for rapid processing and querying, including Spark for structured and semi-structured data, whilst unstructured data can be stored in the document database software MongoDB. Saiku allows users to create, visualise and analyse information in graphical and pivot table formats. These wholesale changes in data storage and processing architecture have been crucial and set the platform for future innovation and expansion.
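The idea of processing a large dataset across a cluster can be illustrated with the map, shuffle and reduce pattern that Hadoop popularised. The sketch below runs in a single plain Python process and does not use the Hadoop API; the records, the two-way split and the function names are hypothetical and only show how work on separate chunks is combined into one result.

```python
from collections import defaultdict

# Hypothetical admission records: (patient_id, diagnosis_code).
records = [
    ("p1", "E84"), ("p1", "J18"),
    ("p2", "E84"), ("p3", "J18"),
    ("p4", "E84"), ("p4", "K86"),
]


def map_phase(chunk):
    """Emit a (code, 1) pair for each record in one chunk of the data."""
    return [(code, 1) for _, code in chunk]


def shuffle(pairs):
    """Group the emitted values by key, as happens between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Two chunks stand in for data held on two cluster nodes.
chunks = [records[:3], records[3:]]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
totals = reduce_phase(shuffle(mapped))
print(totals)
```

On a real cluster each chunk would be mapped on the machine that stores it, so only the small intermediate pairs need to move across the network.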

Big Data analytics in healthcare is almost ready to fulfil its potential: this is due to the rapid digitisation of information, the adaptations in technological architecture and the increasing capabilities for real-time analysis. However, to maximise its potential we need to fully understand the key characteristics of Big Data: volume, variety and velocity. Together with a consideration of the veracity of the data and of visualisation techniques, these make up our ‘5Vs’ of Big Data in healthcare and provide a useful framework to explore the key issues.

1. ScienceDaily. Big Data, for better or worse: 90% of world's data generated over last two years. 2013. www.sciencedaily.com/releases/2013/05/130522085217.htm.

2. Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live, work, and think: Houghton Mifflin Harcourt; 2013.

3. Marr B. Big Data: The Eye-Opening Facts Everyone Should Know. 2014. www.linkedin.com/pulse/20140925030713-64875646-big-data-the-eye-opening-facts-everyone-should-know.

4. Gurnett J. Big data could dramatically cut £80bn NHS chronic care bill. 2014. http://www.theguardian.com/healthcare-network/emc-partner-zone/big-data-nhs-chronic-care.

5. Ward JS, Barker A. Undefined by data: a survey of big data definitions. arXiv preprint arXiv:1309.5821; 2013.

6. Diebold FX. On the Origin(s) and Development of the Term 'Big Data'. 2012.

7. Buhl HU, Röglinger M, Moser D-KF, Heidemann J. Big data. Business & Information Systems Engineering 2013; 5(2): 65-9.

8. Hitzler P, Janowicz K. Linked data, big data, and the 4th paradigm. Semantic Web 2013; 4(3): 233-5.

9. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2014; 2(1): 1-10.

10. Takian A, Cornford T. NHS information: Revolution or evolution? Health Policy and Technology 2012; 1(4): 193-8.

11. Johnson K, Jimison H, Mandl K. Consumer Health Informatics and Personal Health Records. In: Shortliffe EH, Cimino JJ, eds. Biomedical Informatics: Springer London; 2014: 517-39.

12. Blount M, Ebling MR, Eklund JM, et al. Real-Time Analysis for Intensive Care: Development and Deployment of the Artemis Analytic System. Engineering in Medicine and Biology Magazine, IEEE 2010; 29(2): 110-8.

13. Feldman B, Martin EM, Skotnes T. Big Data in Healthcare Hype and Hope. Dr Bonnie 360; October 2012.

14. Berry A, Milosevic Z. Real-Time Analytics for Legacy Data Streams in Health: Monitoring Health Data Quality. Enterprise Distributed Object Computing Conference (EDOC), 2013 17th IEEE International; 9-13 Sept. 2013. p. 91-100.

15. Sawant N, Shah H. Big Data Storage Patterns. Big Data Application Architecture Q & A: Apress; 2013: 43-56.

16. Cherven K. Network Graph Analysis and Visualization with Gephi: Packt Publishing Ltd; 2013.

Front Cover Image by Calvinius [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons