medical image analysis and big data evaluation infrastructures

Medical image analysis and big data evaluation infrastructures Henning Müller HES-SO & Martinos Center

Uploaded by: institute-of-information-systems-hes-so

Posted on 22-Feb-2017


Page 1: Medical image analysis and big data evaluation infrastructures

Medical image analysis and big data evaluation infrastructures

Henning Müller, HES-SO & Martinos Center

Page 2

Overview

• Medical image analysis & retrieval projects
• 3D texture modeling
– Graph models for data analysis
• Big data/data science evaluation infrastructures
– ImageCLEF
– VISCERAL
– EaaS – Evaluation as a Service
• What comes next?

Page 3

Henning Müller
• Studies in medical informatics in Heidelberg, Germany
– Work in Portland, OR, USA
• PhD in image processing in Geneva, focus on image analysis and retrieval
– Exchange at Monash Univ., Melbourne, Australia
• Prof. at Univ. of Geneva in medicine (2014)
– Medical image analysis and retrieval for decision support
• Professor at the HES-SO Valais (2007)
– Head of the eHealth unit
• Sabbatical at the Martinos Center, Boston, MA

Page 4

Medical imaging is big data!!

• Much imaging data is produced
• Imaging data are very complex
– And getting more complex
• Imaging is essential for diagnosis and treatment
• Images out of their context lose most of their meaning
– Clinical data are necessary
• Evidence-based medicine & case-based reasoning

Page 5

Medical image retrieval (history)

• MedGIFT project started in 2002
– Global image similarity
– Texture, grey levels
– Teaching files
– Linking text files and image similarity
• Often data not available
– Medical data hard to get
– Images and text are connected in cases
• Unrealistic expectations, high quality vs. browsing
– Semantic gap
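As an illustration of global grey-level similarity, the sketch below ranks images by the intersection of their normalized grey-level histograms. This is an assumed, simplified measure chosen only for illustration; it is not claimed to be the MedGIFT feature set. The function names and the 32-bin setting are hypothetical.

```python
import numpy as np

def grey_histogram(image: np.ndarray, bins: int = 32) -> np.ndarray:
    """Normalized grey-level histogram of an 8-bit image (values 0..255)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Similarity in [0, 1]; 1.0 means identical grey-level distributions."""
    return float(np.minimum(h1, h2).sum())

def rank_by_similarity(query: np.ndarray, collection: list, bins: int = 32) -> list:
    """Indices of `collection`, most similar to `query` first."""
    hq = grey_histogram(query, bins)
    scores = [histogram_intersection(hq, grey_histogram(img, bins)) for img in collection]
    return sorted(range(len(collection)), key=lambda i: scores[i], reverse=True)

# Usage with synthetic images: a dark query should match the dark image first.
rng = np.random.default_rng(0)
dark = rng.integers(0, 100, size=(64, 64))
bright = rng.integers(150, 256, size=(64, 64))
query = rng.integers(0, 100, size=(64, 64))
print(rank_by_similarity(query, [bright, dark]))  # [1, 0]
```

A global histogram ignores where grey levels occur, which is one reason such descriptors are usually combined with texture features.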

Page 6

• Mixing multilingual data from many resources and semantic information for medical retrieval
– LinkedLifeData.com

Allan Hanbury, Célia Boyer, Manfred Gschwandtner, Henning Müller, KHRESMOI: Towards a Multi-Lingual Search and Access System for Biomedical Information, Med-e-Tel, pages 412-416, Luxembourg, 2011.

Page 7

The informed patient

Page 8

Integrated interfaces

Page 9

Prototypes available

• http://everyone.khresmoi.eu/
• http://radiology.khresmoi.eu/
– Ask for login
• http://shangrila.khresmoi.eu/
• http://shambala.khresmoi.eu/

Page 10

Texture analysis (2D->3D->4D)

• Describe various tissue types
– Brain, lung, …
– Concentration on 3D and 4D data
– Mainly texture descriptors
• Extract visual features/signatures
– Learned, hence related to deep learning
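One classical way to carry a 2D texture descriptor over to 3D is a grey-level co-occurrence matrix (GLCM) computed along a 3D displacement vector. The sketch below is an assumed, minimal illustration and is not taken from the cited review. The quantization to 8 levels, the offsets and the function names are hypothetical choices.

```python
import numpy as np

def glcm_3d(volume: np.ndarray, offset=(0, 0, 1), levels: int = 8) -> np.ndarray:
    """Normalized grey-level co-occurrence matrix for one non-negative 3D offset (dz, dy, dx)."""
    # Quantize intensities into `levels` grey levels.
    vmin, vmax = float(volume.min()), float(volume.max())
    q = np.floor((volume - vmin) / (vmax - vmin + 1e-12) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)
    dz, dy, dx = offset
    # Pair every voxel with its neighbour displaced by (dz, dy, dx).
    a = q[: q.shape[0] - dz, : q.shape[1] - dy, : q.shape[2] - dx]
    b = q[dz:, dy:, dx:]
    counts = np.zeros((levels, levels))
    np.add.at(counts, (a.ravel(), b.ravel()), 1)
    return counts / counts.sum()

def glcm_contrast(p: np.ndarray) -> float:
    """Contrast: large when neighbouring voxels tend to have different grey levels."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

# Synthetic volume of planes alternating along z (axis 0): the texture is direction dependent.
z = np.indices((16, 16, 16))[0]
planes = (z % 2).astype(float)
print(glcm_contrast(glcm_3d(planes, offset=(0, 0, 1))))  # 0.0 (no change along x)
print(glcm_contrast(glcm_3d(planes, offset=(1, 0, 0))))  # about 49 (levels 0 and 7 alternate)
```

The same volume gives very different values along different axes, which is exactly the information lost when 3D tissue is analysed slice by slice.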

Adrien Depeursinge, Antonio Foncubierta-Rodriguez, Dimitri Van de Ville, and Henning Müller, Three-dimensional solid texture analysis and retrieval: review and opportunities, Medical Image Analysis, volume 18, number 1, pages 176-196, 2014.

Page 11

Database with CT images of interstitial lung diseases

• 128 cases with CT image series and biopsy-confirmed diagnosis
• Manually annotated regions for tissue classes (1946 regions)
– 6 of the 13 tissue types have a larger number of examples
• 159 clinical parameters extracted (sparse)
– Smoking history, age, gender, hematocrit, …
• Available after signing a license agreement

Page 12

Learned 3D signatures

• Learn combinations of Riesz wavelets (steerable filters) as digital signatures using SVMs
– Create signatures to detect small local lesions and visualize them

Adrien Depeursinge, Antonio Foncubierta-Rodriguez, Dimitri Van de Ville, and Henning Müller, Rotation-covariant feature learning using steerable Riesz wavelets, IEEE Transactions on Image Processing, volume 23, number 2, pages 898-908, 2014.
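Steerable filters such as Riesz wavelets are rotation covariant. The response of the filter rotated by any angle θ is a linear combination of a few base responses, so a signature can be aligned with the local orientation without filtering the image again. The sketch below shows this only for first-order 2D Riesz components, computed in the Fourier domain with the multiplier −iω_j/‖ω‖. The cited work uses higher-order (and 3D) Riesz components whose combination weights are learned with SVMs. This sketch replaces the learned weights with a plain search for the dominant angle, and all function names are hypothetical.

```python
import numpy as np

def riesz_first_order(image: np.ndarray):
    """First-order 2D Riesz components (r_x, r_y), computed in the Fourier domain."""
    rows, cols = image.shape
    wy = np.fft.fftfreq(rows)[:, None]  # vertical frequencies, shape (rows, 1)
    wx = np.fft.fftfreq(cols)[None, :]  # horizontal frequencies, shape (1, cols)
    norm = np.sqrt(wx**2 + wy**2)
    norm[0, 0] = 1.0  # DC term: both numerators are zero there, avoid 0/0
    f_hat = np.fft.fft2(image)
    r_x = np.real(np.fft.ifft2(-1j * wx / norm * f_hat))
    r_y = np.real(np.fft.ifft2(-1j * wy / norm * f_hat))
    return r_x, r_y

def steer(r_x: np.ndarray, r_y: np.ndarray, theta: float) -> np.ndarray:
    """Response of the first-order Riesz filter rotated by theta, from the two base responses."""
    return np.cos(theta) * r_x + np.sin(theta) * r_y

def dominant_angle(r_x: np.ndarray, r_y: np.ndarray, n_angles: int = 180) -> float:
    """Angle in [0, pi) that maximizes the energy of the steered response."""
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    energies = [np.sum(steer(r_x, r_y, t) ** 2) for t in thetas]
    return float(thetas[int(np.argmax(energies))])

n = 64
y, x = np.mgrid[0:n, 0:n]
stripes_x = np.cos(2 * np.pi * 4 * x / n)  # intensity varies along x only
stripes_y = np.cos(2 * np.pi * 4 * y / n)  # the same pattern rotated by 90 degrees
print(dominant_angle(*riesz_first_order(stripes_x)))  # 0.0
print(dominant_angle(*riesz_first_order(stripes_y)))  # about 1.5708 (pi/2)
```

Only two filtered images are computed per input. Every rotated response is then obtained by `steer`, which is what makes rotation-covariant signatures cheap to evaluate.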

Page 13

Learning Riesz in 3D

• Most medical tissues are naturally 3D
• But modeling gets much more complex
– Vertical planes
– 3-D checkerboard
– 3-D wiggled checkerboard
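The example textures named on this slide are synthetic, which makes them useful for checking that a 3D descriptor separates directional patterns. A minimal generator for comparable test volumes is sketched below. The exact definitions used for the slide are not given, so the period, the block size and especially the sinusoidal "wiggle" (an x shift that varies with z) are assumptions, and the function names are hypothetical.

```python
import numpy as np

def vertical_planes(n: int = 32, period: int = 8) -> np.ndarray:
    """Planes alternating along x only (constant along y and z)."""
    z, y, x = np.indices((n, n, n))
    return ((x // (period // 2)) % 2).astype(float)

def checkerboard_3d(n: int = 32, period: int = 8) -> np.ndarray:
    """Cubes alternating along x, y and z."""
    z, y, x = np.indices((n, n, n))
    half = period // 2
    return ((x // half + y // half + z // half) % 2).astype(float)

def wiggled_checkerboard_3d(n: int = 32, period: int = 8, amplitude: float = 2.0) -> np.ndarray:
    """Checkerboard whose x coordinate is shifted sinusoidally with z (assumed 'wiggle')."""
    z, y, x = np.indices((n, n, n))
    half = period // 2
    xs = np.floor(x + amplitude * np.sin(2 * np.pi * z / n)).astype(int)
    return ((xs // half + y // half + z // half) % 2).astype(float)

for name, volume in [("planes", vertical_planes()),
                     ("checkerboard", checkerboard_3d()),
                     ("wiggled", wiggled_checkerboard_3d())]:
    print(name, volume.shape, volume.mean())
```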

Page 14

Aiding clinical decisions

Page 15

Using graphs for lung data analysis

• Pulmonary hypertension (PH) and pulmonary embolism (PE)
– Dual-energy CT (DECT)
• Based on a simple lung atlas
– Not based on lobes
• Analyzing relationships between the lung areas and their perfusion
• Differences in statistical moments between areas as features
– Can easily be combined with texture
• Good quality for PH (77%) and PE (79%)
– DECT important for PE

Yashin Dicente Cid, Henning Müller, Alexandra Platon, JP Janssens, Frédéric Lador, PA Poletti, A. Depeursinge, A Lung Graph-Model for Pulmonary Hypertension and Pulmonary Embolism Detection on DECT images, submitted to MICCAI 2016, Athens, Greece, 2016.
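A minimal sketch of the graph idea, under stated assumptions. Each region of a lung atlas is a node, node attributes are statistical moments of the voxel values in that region, and edge weights are the differences of those moments between regions. The concatenated edge weights then form one feature vector that could be appended to texture features. The synthetic region labels, the choice of moments (mean, standard deviation, skewness), the complete graph over all region pairs and the function names are assumptions, not the exact model of the cited paper.

```python
from itertools import combinations

import numpy as np

def region_moments(values: np.ndarray, labels: np.ndarray, n_regions: int) -> np.ndarray:
    """Mean, standard deviation and skewness of the voxel values in each atlas region."""
    moments = np.zeros((n_regions, 3))
    for r in range(n_regions):
        v = values[labels == r]
        mean, std = v.mean(), v.std()
        skew = ((v - mean) ** 3).mean() / std**3 if std > 0 else 0.0
        moments[r] = (mean, std, skew)
    return moments

def graph_features(values: np.ndarray, labels: np.ndarray, n_regions: int) -> np.ndarray:
    """Edge weights = absolute moment differences for every pair of regions (complete graph)."""
    m = region_moments(values, labels, n_regions)
    edges = list(combinations(range(n_regions), 2))
    return np.concatenate([np.abs(m[i] - m[j]) for i, j in edges])

# Synthetic example: 4 hypothetical atlas regions, region 3 with reduced perfusion values.
rng = np.random.default_rng(1)
labels = rng.integers(0, 4, size=(20, 20, 20))
values = rng.normal(40.0, 5.0, size=labels.shape)
values[labels == 3] -= 30.0
features = graph_features(values, labels, n_regions=4)
print(features.shape)  # (18,): 6 edges x 3 moments
```

Because the features describe relations between regions rather than absolute values, a perfusion defect shows up as large weights on every edge that touches the affected region.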

Page 16

Scientific challenges/crowdsourcing

• Most conferences now organize challenge sessions (MICCAI, ISBI, Grand Challenges, …)
• Public administrations increasingly use them
– NCI uses them: Coding4Cancer
– http://www.challenge.gov/
• Commercial challenge platforms
– Kaggle, Topcoder, also the Netflix challenge
• Open innovation in data science
• Amazon Mechanical Turk for small tasks, including medical image analysis
– But also: https://www.cellslider.net/

Page 17

• Benchmark on multimodal image retrieval
– Run since 2003, medical task since 2004
– Part of the Cross Language Evaluation Forum (CLEF)
• Many tasks related to image retrieval
– Image classification
– Image-based retrieval
– Case-based retrieval
– Compound figure separation
– Caption prediction
– …
• Many old databases remain available, imageclef.org

Henning Müller, Paul Clough, Thomas Deselaers, Barbara Caputo, ImageCLEF – Experimental evaluation of visual information retrieval, Springer, 2010.
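Retrieval tasks in benchmarks of this kind are commonly scored with mean average precision (MAP) over a set of topics. The sketch below is an assumed, simplified implementation for illustration only. Official evaluations typically rely on dedicated tools such as trec_eval, and the `run` and `qrels` dictionaries and their ids are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids) -> float:
    """Average precision of one ranked list against the set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant hit
    return precision_sum / len(relevant)  # relevant items never retrieved contribute 0

def mean_average_precision(run: dict, qrels: dict) -> float:
    """run: topic -> ranked ids; qrels: topic -> relevant ids (every judged topic counts)."""
    return sum(average_precision(run.get(t, []), qrels[t]) for t in qrels) / len(qrels)

run = {"t1": ["img_a", "img_b", "img_c"], "t2": ["img_x", "img_y"]}
qrels = {"t1": {"img_a", "img_c"}, "t2": {"img_y", "img_z"}}
print(mean_average_precision(run, qrels))  # about 0.5417, i.e. (5/6 + 1/4) / 2
```

Dividing by the number of relevant items, not by the number retrieved, penalizes runs that miss relevant images entirely.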

Page 18

Challenges with challenges

• Difficult to distribute very big datasets
– Sending around hard disks? Risky, expensive
• Sharing confidential data
– Big data is impossible to anonymize automatically
• Quickly changing data sets
– Outdated by the time a test collection has been created
• Optimizations on the test data are possible
– Manual adaptations, etc.
– Often hard to fully reproduce results
• Groups without large computing resources are disadvantaged

Page 19

cloud

Page 20

Test

Page 21

Resources available

Page 22

Page 23

[Figure: evaluation architecture in the Microsoft Azure Cloud, with the components Training Data, Test Data, Participants, Organiser, Participant Virtual Machines, Registration System, Annotation Management System, Analysis System, Annotators (Radiologists) and Locally Installed Annotation Clients]

Page 24

Silver corpus (example trachea)

• Executable code of all participants
– Run it on new data, do label fusion

[Figure: participant segmentations (Dice 0.85, 0.71, 0.84, 0.83) fused into the Silver Corpus (Dice 0.92)]
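The silver corpus idea can be sketched with the simplest label fusion rule, a per-voxel majority vote, scored with the Dice coefficient 2|A∩B| / (|A| + |B|). The slide does not state which fusion method was used, so majority voting is an assumption here. On genuinely new data there is no manual reference; the `truth` mask below exists only to show that the fused mask can outperform every individual segmentation. Masks and function names are hypothetical.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)

def majority_vote(masks) -> np.ndarray:
    """Fused mask: a voxel is foreground if more than half of the masks mark it."""
    stack = np.stack([m.astype(bool) for m in masks])
    return stack.sum(axis=0) * 2 > len(masks)

truth = np.zeros((10, 10), dtype=bool)
truth[2:8, 2:8] = True
p1 = truth.copy()
p1[2:8, 2] = False  # misses the left column
p2 = truth.copy()
p2[2, 2:8] = False  # misses the top row
p3 = truth.copy()
p3[8, 2:8] = True   # adds an extra row
fused = majority_vote([p1, p2, p3])
print([round(dice(p, truth), 3) for p in (p1, p2, p3)])  # [0.909, 0.909, 0.923]
print(round(dice(fused, truth), 3))  # 0.986
```

Fusion helps when participants make different, uncorrelated errors, because each error is outvoted by the others.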

Page 25

Docker vs. Virtual Machines

[Figure: Containers vs. VM, comparing the Bins/Libs layers of each]

Page 26

Evaluation as a Service (EaaS)

• Moving the algorithms to the data, not vice versa
– Required when data are very large, change quickly, or are confidential (medical, commercial, …)
• Different approaches
– Source code submission, APIs, VMs (local or in the cloud), Docker containers, specific frameworks
• Allows for continuous evaluation, component-based evaluation, total reproducibility, updates, …
– Workshop on EaaS, March 2015, Sierre
– Workshop on cloud-based evaluation, November 2015, Boston (http://www.martinos.org/cloudWorkshop/)

Allan Hanbury, Henning Müller, Georg Langs, Marc André Weber, Bjoern H. Menze, and Tomas Salas Fernandez, Bringing the algorithms to the data: cloud–based benchmarking for medical image analysis, CLEF conference, Springer Lecture Notes in Computer Science, 2012.
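The "algorithms to the data" pattern can be sketched in a few lines. The organiser keeps the test data and labels, runs the participant's submitted algorithm next to them, and releases only aggregate scores. This is a hypothetical simplification: here a submission is a Python callable rather than a VM or Docker container, the underscore attributes are only a naming convention (real isolation comes from the VM or container boundary), and the class name and the accuracy metric are assumptions.

```python
from typing import Callable, Sequence

class PrivateEvaluator:
    """Runs on the organiser's side; holds confidential test data and labels."""

    def __init__(self, inputs: Sequence, labels: Sequence):
        self._inputs = list(inputs)  # never returned to participants
        self._labels = list(labels)

    def evaluate(self, submission: Callable) -> dict:
        """Execute the submitted algorithm next to the data; release only aggregate scores."""
        predictions = [submission(x) for x in self._inputs]
        n_correct = sum(p == y for p, y in zip(predictions, self._labels))
        return {"n_cases": len(self._labels), "accuracy": n_correct / len(self._labels)}

evaluator = PrivateEvaluator(inputs=[1, 4, 7, 10], labels=[0, 0, 1, 1])  # synthetic data
print(evaluator.evaluate(lambda x: int(x > 5)))  # {'n_cases': 4, 'accuracy': 1.0}
print(evaluator.evaluate(lambda x: 1))           # {'n_cases': 4, 'accuracy': 0.5}
```

Because participants only see the returned dictionary, the same design also limits manual optimization on the test data.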

Page 27

Sharing images, research data

• A very important aspect of research is to have solid methods and data, large if possible
– If data are not available, results cannot be reproduced
– If data are small, results may be meaningless
• Many multi-center projects spend most of their money on data acquisition, which is often delayed, leaving no time for analysis
– IRB approval takes long, and restrictions are sometimes strange
• Research is international!
• NIH & NCI are great at pushing data availability
– But data can be made available in an unusable way

Page 28

Political support for research infrastructures!

Page 29

Sustaining biomedical big data

Page 30

Microsoft Azure

Page 31

Intel's CCC

Page 32

Institutional support (NCI)

• Using crowdsourcing to link researchers & challenges

Page 33

Business models for these links

• Manually annotate large data sets for challenges
– Data need to be available in a secure space
• Have researchers work on the data (on the infrastructure)
– Deliver code in Docker containers
• Commercialize results and share benefits

Page 34

• Part of QIN – Quantitative Imaging Network (NCI)
– Cloud-Based Image Biomarker Optimization Platform
• Create challenges for QIN to validate tools
• Use CodaLab to run project challenges
– Run code in containers (Docker), well integrated
– Share code blocks across teams, evaluate combinations
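Sharing code blocks across teams and evaluating their combinations amounts to scoring every pipeline formed by picking one component per stage. The sketch below assumes each component is a plain Python function. On a platform such as CodaLab each block would instead run in its own Docker container, and the team and component names are hypothetical.

```python
from itertools import product

def evaluate_combinations(stages, data, labels):
    """Accuracy of every pipeline that picks one component from each stage, in order."""
    results = {}
    for combo in product(*(list(stage.items()) for stage in stages)):
        names = tuple(name for name, _ in combo)
        correct = 0
        for x, y in zip(data, labels):
            value = x
            for _, component in combo:  # apply the chosen components one after another
                value = component(value)
            correct += value == y
        results[names] = correct / len(labels)
    return results

preprocessing = {"team_a_raw": lambda x: x, "team_b_scaled": lambda x: x / 10}
classifiers = {"team_c_gt5": lambda v: int(v > 5), "team_d_gt0.5": lambda v: int(v > 0.5)}
scores = evaluate_combinations([preprocessing, classifiers], data=[1, 4, 7, 10], labels=[0, 0, 1, 1])
for names, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(names, acc)
```

Here neither team's classifier works with every preprocessing step; only the matched pairs reach full accuracy, which is the kind of interaction that component-based evaluation exposes.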

Page 35

CodaLab

• Open-source challenge platform supported by Microsoft
– Integrated with the Azure cloud infrastructure
• Easy creation of new challenges
• Participant registration, leaderboard of results

Page 36

CodaLab worksheets

• Running computational experiments
• Execute Docker containers
– Workflow of Docker containers
– Foster collaboration and component reuse

Page 37

Future of research infrastructures

• Much more centered around data!!
– Nature Scientific Data underlines the importance!
• Data need to be available, but in a meaningful way
– Infrastructure needs to be available, along with a way to evaluate on the data with specific tasks
• More work for data preparation, but in line with IRB requirements
– Analysis inside medical institutions
• Code will become even more portable
– Docker helps enormously and is developing quickly
• Public-private partnerships to be sustainable
• Total reproducibility, long term, sharing tools
• Much higher efficiency

Page 38

Conclusions

• Medicine is digital medicine
– More data and more complex links (genes, visual, signals, …)
• Medical data science requires new infrastructures
– Use routine data rather than manually extracted, curated data; curate at large scale and accommodate errors
• Active learning and interactive data curation
– Use large data sets from data warehouses
– Keep data where they are produced
• More “local” computation, i.e. where the data are
– Secure aggregation of results
• Sharing infrastructures, data and more
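Keeping data where they are produced and computing locally can be illustrated by hospitals that share only summary statistics (count, sum, sum of squares). From these, a coordinator reconstructs the pooled mean and variance exactly, without seeing any raw values. This is an assumed, simplified sketch. Real secure aggregation would additionally protect each site's summary (for example with cryptographic masking), which is not shown. The one-pass sum-of-squares formula can also lose precision for large values, and the site data and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SiteSummary:
    n: int
    total: float
    total_sq: float

def summarize_locally(values) -> SiteSummary:
    """Runs inside each hospital; raw values never leave the site."""
    return SiteSummary(len(values), float(sum(values)), float(sum(v * v for v in values)))

def aggregate(summaries) -> tuple:
    """Pooled mean and population variance computed from per-site summaries only."""
    n = sum(s.n for s in summaries)
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean**2
    return mean, variance

site_a = summarize_locally([1.0, 2.0, 3.0])
site_b = summarize_locally([4.0, 5.0])
print(aggregate([site_a, site_b]))  # (3.0, 2.0), same as pooling [1, 2, 3, 4, 5]
```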

Page 39

Contact

• More information can be found at
– http://khresmoi.eu/
– http://visceral.eu/
– http://medgift.hevs.ch/
– http://publications.hevs.ch/
• Contact:
– [email protected]

Page 40