
DEEP-HybridDataCloud

DEEP LEARNING APPLICATION FOR CLASSIFICATION OF DISEASE PROGRESSION OF DIABETIC RETINOPATHY

DELIVERABLE: D2.1 (ANNEX 1)

Document identifier: DEEP-NA2-D2.1-Annex1-V8.0.odt

Date: 15/05/2018

Activity: WP2

Lead partner: HMGU

Status: FINAL

Dissemination level: PUBLIC

Permalink: http://hdl.handle.net/10261/164311

DEEP-HybridDataCloud – 777435


Copyright Notice

Copyright © Members of the DEEP-HybridDataCloud Collaboration, 2017-2020.

Delivery Slip

Name Partner/Activity Date

From Wolfgang zu Castell HMGU / WP2 16/05/2018

Reviewed by Ignacio Blanquer / UPV, Fernando Aguilar / CSIC, Álvaro López / CSIC 25/04/2018

Approved by Steering Committee 07/05/2018

Document Log

Issue Date Comment Author/Partner

V1.0 26/02/2018 First template version Álvaro López / CSIC

V2.0 26/03/2018 TOC Wolfgang zu Castell / HMGU, Marcus Hardt / KIT, Lara Lloret / CSIC

V3.0 15/04/2018 First version Werner Dubitzky / HMGU

V4.0 22/04/2018 Use Case Description Updated Werner Dubitzky / HMGU

V5.0 24/04/2018 Use Case Requirements Updated Werner Dubitzky / HMGU

V6.0 25/04/2018 External Review Ignacio Blanquer / UPV, Fernando Aguilar / CSIC, Álvaro López / CSIC

V7.0 27/04/2018 Internal Review Ignacio Heredia / CSIC, Álvaro López / CSIC, Lara Lloret / CSIC

V8.0 07/05/2018 Final Version Werner Dubitzky / HMGU


Table of Contents

1. Executive Summary
1.1. Identification
1.2. Brief description of the Use Case
1.3. Expectations in the framework of the Deep Hybrid Datacloud Project
1.4 Expected results and derived impact
1.5 References useful to understand the Use Case
2. Introduction and Use Case
2.1. Presentation on the Use Case
2.2. Description of the research community
2.3. Current status and plan for this Use Case
2.4. Identification of the KEY scientific goals
2.5. Description of potential development
3. Technical description of the use case
3.1. User categories and roles
3.2. General description of datasets/formats/software used
3.3. Technical (S/T) requirements
3.4 Identification of required services
3.5 Description of the use case in terms of workflows
4. Data requirements
4.1. Access control
4.1.1. Privacy
4.1.2. Location
4.1.3. Sharing
4.2. Capacity (Data Volume)
4.2.1. Test data / production data
4.2.2. Transfer rate requirements
4.3. Preservation requirements
5. Infrastructure and technical requirements
5.1. Expectation regarding the advantage through the use of technology
5.2. Expectations regarding e-infrastructure use
5.2.1. Networking
5.2.2. Computing: clusters, grid, cloud, supercomputing resources
5.2.3. Storage
5.3. On authentication and authorization Infrastructure (AAI)
6. Formal list of requirements
7. Use case summary table
8. References


1. Executive Summary

This use case focuses on a deep learning approach to the automated classification of the stage and progression of retinopathy, based on a large set of color fundus retinal photography images [Eul2017]. We plan to develop an application that uses state-of-the-art convolutional neural networks to classify retinal images. Ultimately, this application could benefit large-scale screening programs in remote locations and facilitate deep learning on inherently distributed data.

1.1. Identification

Name Deep learning application for retinopathy detection

Institution/Partner HMGU

Contacts

• Wolfgang zu Castell (HMGU) [email protected]

• Werner Dubitzky (HMGU) [email protected]

1.2. Brief description of the Use Case

This use case focuses on a deep learning approach to automated classification of retinopathy based on color fundus retinal photography images [Eul2017].

The scientific goals of the use case are:

• To develop and evaluate a deep learning tool facilitating the classification of retinopathy stage and progression based on digital color fundus retinal photography images.

• To improve automated classification of retinopathy stage (Healthy, Mild, Medium, Severe) and reconstruct disease progression by means of deep learning.

• To explore construction (training) of deep learning models using inherently distributed training data.

• To address the need for a comprehensive and automated method for large-scale screening programs based on medical images.

1.3. Expectations in the framework of the Deep Hybrid Datacloud Project

• Integration of various tools that allow seamless and easy handling and management of large sets of image data for deploying, developing and using deep learning models.

• Provision of efficient data and model processing and transfer capabilities that facilitate "incremental" deep learning based on geographically distributed datasets that are logically viewed as one dataset.

• Support for various types of users and their data and system access and authorization needs.



1.4 Expected results and derived impact

Expected results

• An integrated set of easy-to-use tools (for end users, modellers and developers) facilitating effective and efficient deployment, use and construction of deep learning models based on large image datasets.

• A set of easy-to-use tools facilitating effective and efficient construction (training) of deep learning models from geographically distributed datasets.

Derived impact

• Enabling end user communities (e.g. experts running medical screening programs) with limited access to large-scale data and computing infrastructures to use deep learning models and services hosted/provided on a cloud e-infrastructure.

• Increasing proliferation of deep learning solutions to a wide range of disciplines and application domains.

• Enabling modellers and developers to develop and deploy their deep learning solutions through cloud-based e-infrastructures.

1.5 References useful to understand the Use Case

• Eulenberg, P., Köhler, N., Blasi, T., Filby, A., Carpenter, A.E., Rees, P., Theis, F.J., Wolf, F.A. (2017). Reconstructing cell cycle and disease progression using deep learning. Nature Communications 8:463.

• Yau, J.W., Rogers, S.L., Kawasaki, R., Lamoureux, E.L., Kowalski, J.W., et al. (2012). Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care 35: 556–564.

• https://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html

• https://www.kaggle.com/c/diabetic-retinopathy-detection

2. Introduction and Use Case

2.1. Presentation on the Use Case

Retinopathy is a fast-growing cause of blindness worldwide, with over 400 million people at risk from diabetic retinopathy alone [Yau2012]. The disease can be successfully treated if it is detected early. Color fundus retinal photography uses a fundus camera (a specialized low-power microscope with an attached camera) to record color images of the condition of the interior surface of the eye, in order to document the presence of disorders and monitor their change over time. Medical experts interpret such images and are able to detect the presence and stage of retinal eye disease such as diabetic retinopathy. However, due to a lack of suitably qualified medical specialists in many parts of the world, comprehensive detection and treatment of the disease is difficult. This use case focuses on the development of tools facilitating a deep learning approach to automated classification of retinopathy based on color fundus retinal photography images [Eul2017].

The use case is divided into three increasingly complex sub-use cases: (a) Deployment and use of a convolutional neural network classification model on the DEEP e-infrastructure. (b) Construction (training) and evaluation of a convolutional neural network classification model on the DEEP e-infrastructure based on an integrated set of 70,000 retinal photography images. (c) Construction (training) and evaluation of a convolutional neural network classification model on the DEEP e-infrastructure based on a distributed set of 70,000 retinal photography images.

Potential applications of this use case include (a) support of retinopathy screening programs in remote locations; (b) deep learning model development for image classification tasks requiring large-scale computing infrastructures in a wide range of application domains; (c) deep learning model construction (training) based on inherently distributed learning data sets (for example in areas where data integration is prohibited or infeasible due to technical, regulatory or other constraints). Besides acceptable classification effectiveness of the models, the main challenges of this use case are:

• Ease of use of the classification tool for end users such as medical experts.

• Ease of use and scalability of the e-infrastructure for deep learning model deployment and development for modellers and developers.

• Construction (learning) of effective and efficient deep learning models with inherently distributed data.

2.2. Description of the research community

• Domain expert/user: Uses a deployed deep learning model to classify images into predefined categories. For example, a medical expert who is part of a large-scale screening program that obtains image data from a particular population to assess their disease status (e.g. from a population whose individuals have a high risk of diabetic retinopathy).

• Modeller/developer: Develops and deploys a deep learning model based on a large set of image data.

• Algorithmician: Designs and evaluates novel deep learning algorithms for a specific class of problems.

• Scientist: E.g. a neuroscientist researching new models of cognition.

2.3. Current status and plan for this Use Case

Our starting point is a deep learning retinopathy classification and progression model developed by Eulenberg and co-workers [Eul2017]. The model graph and associated weight matrices are currently stored in a TensorFlow binary file [TF2018]. This file can be read and loaded through TensorFlow (e.g. using Python) and used to classify retinopathy images. In the first phase, we plan to deploy this model on the DEEP e-infrastructure and demonstrate its use.

For the second phase, we plan to re-learn and evaluate a deep learning retinopathy image classification model from scratch using the DEEP e-infrastructure. This requires a workflow capable of processing and handling substantial amounts of labelled image data, and an iterative training procedure that optimizes a large number of model parameters of the underlying convolutional neural network using GPUs.

In the final phase of this use case, we plan to realize a learning procedure on the DEEP system in which we assume that the underlying learning dataset is partitioned into several subsets located at geographically dispersed sites. Is it possible to efficiently train a CNN on such data by a procedure that moves partially learned CNNs instead of moving data? Such a procedure could be useful in various settings where physical integration of data is infeasible or impractical.

2.4. Identification of the KEY scientific goals

• To develop and evaluate a deep learning tool facilitating the classification of retinopathy stage and progression based on digital color fundus retinal photography images.

• To improve automated classification of retinopathy stage (Healthy, Mild, Medium, Severe) and reconstruct disease progression by means of deep learning.

• To explore construction (training) of deep learning models using inherently distributed training data.

• To address the need for a comprehensive and automated method for large-scale screening programs based on medical images.

2.5. Description of potential development

Convolutional neural networks for image classification have been applied to a wide range of problems in all kinds of domains and disciplines. Thus, the solutions provided through this use case on the DEEP e-infrastructure may potentially be used, adapted and further developed in many ways, depending on domain, problem and stakeholder.

3. Technical description of the use case

3.1. User categories and roles

• A subject matter or domain specialist/expert user, possibly located in a region where large-scale computing/data infrastructures are not readily available.

• A machine learning expert or medical image analyst who seeks to develop a deep learning image classifier from an image dataset.

3.2. General description of datasets/formats/software used

The dataset consists of high-resolution retina images taken under a variety of imaging conditions.

Datasets


• Training data: 35,126 JPEG image files (ca. 35 GB in total). Each file is named using the format ID_side.jpeg, where ID is a subject identifier number and side is either left or right, referring to the respective eye. Each image is associated with a disease stage label (number): 0=No, 1=Mild, 2=Moderate, 3=Severe, 4=Proliferative Retinopathy.

• Test data: ca. 30,000 JPEG image files (ca. 30 GB in total). These images are not labeled, i.e. their disease stage is not known.
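The naming convention and label scheme above can be handled with a few lines of Python. The following is a minimal sketch, not part of the use case software: the function and type names are hypothetical, and the example file name in the comment is invented for illustration.

```python
import re
from typing import NamedTuple

# Disease stage labels as listed in the dataset description (0 to 4).
STAGE_NAMES = {
    0: "No",
    1: "Mild",
    2: "Moderate",
    3: "Severe",
    4: "Proliferative Retinopathy",
}

# Matches ID_side.jpeg, e.g. "10_left.jpeg" (hypothetical example name).
_NAME_PATTERN = re.compile(r"^(\d+)_(left|right)\.jpeg$")


class RetinaImage(NamedTuple):
    subject_id: int
    side: str


def parse_image_name(filename: str) -> RetinaImage:
    """Split an ID_side.jpeg file name into subject ID and eye side."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return RetinaImage(int(match.group(1)), match.group(2))


def stage_name(label: int) -> str:
    """Translate a numeric disease stage label into its name."""
    return STAGE_NAMES[label]
```

Rejecting names that do not match the pattern makes it easier to spot files that were renamed or corrupted during transfer.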

Software

• Python: NumPy, SciPy, pandas, Matplotlib, os

• Image processing: OpenCV and PIL

• Deep learning: Start with TensorFlow and Keras

3.3. Technical (S/T) requirements

Scientific requirements

• Experts that validate the results produced by the developed models and tools.

Technological Requirements

• Temporary storage in the order of 100s of GBs to store intermediate data.

• Permanent storage in the order of 100s of GBs to store raw image data and possible intermediate data.

• CPUs with 64-128 GB of RAM for computation and processing.

• Powerful GPUs to efficiently train deep learning models.

3.4 Identification of required services

• Data orchestration to control the flow of data and responses from new classification requests.

• Data orchestration to control the flow of data for the construction (training) of a new deep learning model.

• Orchestration of model flow in an incremental/distributed approach to deep learning model construction (training).

3.5 Description of the use case in terms of workflows

The general workflows for the different use case scenarios are as follows:

Using a deployed CNN model: Task = Classify:

• Workflow: Data (1 x image) → Model → Class label

Training new CNN model (1 x integrated dataset): Task = Train:

• Workflow: Data (N x images in single dataset) → Algorithm → Model

Training new CNN model (K x datasets with M x images each): Task = Train (incremental/distributed):

• Workflow: Data (K x M x images) → Algorithm → Final Model

4. Data requirements

4.1. Access control

This use case needs proper access control mechanisms.

4.1.1. Privacy

This use case is representative of medical applications in which medical images are submitted to and processed via a remote cloud service. While the current use case does not include explicit information (name, address, phone number, email address, etc.) linking the data to the individual, it is still advisable that the data be handled in a manner that prevents subject identification.

The third scenario of this use case aims to construct (train) a deep learning classification model from geographically dispersed data. One application scenario of such distributed/incremental training of a deep learning model is privacy-preserving learning. We explore whether distributed/incremental training can be efficiently performed in the DEEP infrastructure.
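As one illustration of handling data in a way that hinders subject identification, the subject identifier in an ID_side.jpeg file name could be replaced by a keyed hash before upload. This is only a sketch under assumptions: the function name, the secret key handling and the 16-character truncation are hypothetical choices, and the approach covers the identifier in the file name only, not the image content or any embedded metadata.

```python
import hashlib
import hmac


def pseudonymize_name(filename: str, secret_key: bytes) -> str:
    """Replace the subject ID in an ID_side.jpeg name with a keyed hash.

    Both eyes of one subject map to the same pseudonym, so images can
    still be grouped per subject without exposing the original ID.
    """
    subject_id, side_and_ext = filename.split("_", 1)
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters to keep names short (illustrative choice).
    return f"{digest.hexdigest()[:16]}_{side_and_ext}"
```

The key would stay with the data owner, so that the mapping from pseudonym back to subject cannot be recomputed by the cloud service.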

4.1.2. Location

The principal assumption of this use case is that data (medical images such as retinopathy images) are generated in various locations (hospitals, clinics, mobile screening facilities). To provide and use a deep learning classifier for such data, we distinguish between the model construction (training) phase and the model application phase.

Model application: Here we consider the situation where the classification model exists in the cloud and the user uploads an image to be classified by the model. In this scenario, small amounts of data are uploaded to the cloud model but the management of the data remains local. Hence, no particular data management requirements need to be considered (other than data protection/privacy).

Model construction (training), scenario 1: Data is physically integrated in the cloud. This requires that data (medical images), potentially generated in dispersed geographic locations, be transferred to the cloud and (physically) integrated into a coherent dataset for training. The location of the integrated dataset should be such that efficient processing for deep learning (training) is possible.

Model construction (training), scenario 2: This approach assumes that the training data set is split into different subsets which reside in geographically dispersed locations in the cloud, and that it is infeasible or impractical to merge these subsets into a single integrated set. This use case aims to realize a training procedure in which multiple versions or portions of the final model are first created in a distributed/incremental fashion before the final model is formed. Thus, the data in this scenario remains in different cloud locations and the model (or parts of it) is moved between locations.
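The idea of moving the model rather than the data can be sketched in a few lines. The example below is a toy illustration under strong assumptions: a one-layer logistic model trained by stochastic gradient descent stands in for the CNN, each site is represented by an in-memory list, and all function names and hyperparameters are hypothetical. It shows one possible scheme (passing the weights from site to site in turn), not the procedure the use case will adopt.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {0, 1})


def predict(weights: List[float], features: Sequence[float]) -> float:
    """Logistic model: probability of class 1 (last weight is the bias)."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_locally(weights: List[float], site_data: List[Example],
                  lr: float = 0.5, epochs: int = 5) -> List[float]:
    """Continue training on one site's data and return the updated weights."""
    weights = list(weights)
    for _ in range(epochs):
        for features, label in site_data:
            error = predict(weights, features) - label
            for i, x in enumerate(features):
                weights[i] -= lr * error * x
            weights[-1] -= lr * error
    return weights


def train_incrementally(sites: List[List[Example]], n_features: int,
                        rounds: int = 10) -> List[float]:
    """Visit each site in turn; only the weights travel between sites."""
    weights = [0.0] * (n_features + 1)
    for _ in range(rounds):
        for site_data in sites:
            weights = train_locally(weights, site_data)
    return weights
```

In this scheme each exchange transfers only the weight vector, so the communication cost per hop depends on model size rather than on the size of the local image set.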

4.1.3. Sharing

In the standard scenarios (a single data owner creates the model and uses it) the data is not meant to be shared.

However, in the distributed/incremental scenario, multiple data owners "share" in the sense that they allow (potentially privacy-preserving) access for model training with the aim of sharing the final model.

4.2. Capacity (Data Volume)

4.2.1. Test data / production data

Essentially, this use case consists of approximately 100 GB of raw image data.

4.2.2. Transfer rate requirements

Data transfer between cloud storage and GPUs is the main bottleneck when training deep learning models, so this aspect should be highly efficient.

In the distributed/incremental model training scenario, partial/intermediate model versions may need to be frequently exchanged among dispersed storage locations.
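A rough back-of-the-envelope calculation helps to size these requirements. The sketch below uses the training set figures from the dataset description (35,126 images, ca. 35 GB, taken here as decimal gigabytes); the image throughput, link speed and model size used in the usage note are hypothetical values chosen for illustration only.

```python
TRAIN_IMAGES = 35_126   # number of training images (dataset description)
TRAIN_BYTES = 35e9      # ca. 35 GB, interpreted as decimal gigabytes


def mean_image_bytes() -> float:
    """Average size of one training image in bytes."""
    return TRAIN_BYTES / TRAIN_IMAGES


def required_read_rate(images_per_second: float) -> float:
    """Bytes per second that storage must deliver to feed raw JPEGs
    to a GPU consuming the given number of images per second."""
    return images_per_second * mean_image_bytes()


def transfer_seconds(n_bytes: float, bits_per_second: float) -> float:
    """Time to move n_bytes over a link of the given nominal speed."""
    return n_bytes * 8 / bits_per_second
```

The average image is just under 1 MB, so a GPU that consumed, say, 100 images per second would need roughly 100 MB/s from storage. Over an assumed 1 Gbit/s link, `transfer_seconds(35e9, 1e9)` gives 280 seconds for the whole training set, whereas a model of an assumed 100 MB would need under a second per exchange; the comparison shifts if models are exchanged very frequently.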

4.3. Preservation requirements

The raw image data of this use case is approximately 100 GB in size.

In addition, it may be useful to preserve data from various intermediate convolution/pooling layers, potentially from multiple models. This may amount to several hundred GB.

5. Infrastructure and technical requirements

5.1. Expectation regarding the advantage through the use of technology

5.2. Expectations regarding e-infrastructure use

5.2.1. Networking

Accessibility of the data from both CPU and GPU infrastructure, with low-latency interconnections among cores and nodes.


5.2.2. Computing: clusters, grid, cloud, supercomputing resources

Infrastructure for deep learning model development

• Servers with CPUs and ca. 64 GB RAM for various processing tasks.

• Servers with one or more GPU units for fast training of deep learning models.

Infrastructure for model deployment

The deployment of a deep learning model for image classification in this use case does not need special hardware. What is needed are servers for hosting the web services that will enable users to access the tools.

5.2.3. Storage

Infrastructure for deep learning model development

• 100s of GBs.

Infrastructure for model deployment

• No particular requirements.

• On (user-facing) monitoring (and accounting)

No particular needs.

5.3. On authentication and authorization Infrastructure (AAI)

See above.

6. Formal list of requirements

See table below.


7. Use case summary table

Use Case Deep learning application for classification of disease progression of diabetic retinopathy.

Area Biological and medical sciences

Software and services used

Python, orchestration of containers, data orchestration, web services that will enable users to access the tools.

Machine / Deep Learning tools

• Python: NumPy, SciPy, pandas, Matplotlib, os

• Image processing: OpenCV and PIL

• Deep learning: Start with TensorFlow and Keras

Computing CPUs and powerful GPUs for Deep Learning (8 GB GPU memory preferred)

Memory requirements 64-128 GB

Networking

As fast as possible, as the data transfer rate between cloud storage and GPU is the main bottleneck for the learning phase in Deep Learning. Decent network performance may also be required in the distributed Deep Learning sub-use case, where the training data resides in different locations.

Storage requirements (permanent, temporal)

Feasible approach:

• 100s of GBs as temporary storage (for intermediate representations, the code, and the neural network)

• 100s of GBs as permanent storage (raw data including raw image data).

External data access requirements Kaggle: https://www.kaggle.com/c/diabetic-retinopathy-detection

Privacy Sub-use case 3 explores Deep Learning model construction on inherently distributed training data. One reason why such a scenario might be important is privacy-preserving model construction.

Other requirements Currently none. Additional requirements may arise during the development, deployment and evaluation of the use case.

Other comments A node with single GPU is sufficient to start with.

Relevant references or URL

• Eul2017

• https://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html

• https://www.kaggle.com/c/diabetic-retinopathy-detection

8. References

1. [Eul2017] Eulenberg, P., Köhler, N., Blasi, T., Filby, A., Carpenter, A.E., Rees, P., Theis, F.J., Wolf, F.A. (2017). Reconstructing cell cycle and disease progression using deep learning. Nature Communications 8:463.

2. [Yau2012] Yau, J.W., Rogers, S.L., Kawasaki, R., Lamoureux, E.L., Kowalski, J.W., et al. (2012). Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care 35: 556–564.

3. [TF2018] TensorFlow website: Serving a TensorFlow model. https://www.tensorflow.org/serving/serving_basic

