(life(and medical(biology data accelerator (lambda)

19
Institute of Computing Technology, Chinese Academy of Sciences 1 Guangming Tan : Life and Medical Biology Data Accelerator (Lambda) Ins>tute of Compu>ng Technology, Chinese Academy of Sciences

Upload: others

Post on 15-Oct-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

1

Guangming  Tan  

:    Life  and  Medical  Biology  Data  Accelerator  

(Lambda)  

Ins>tute  of  Compu>ng  Technology,  Chinese  Academy  of  Sciences  

Page 2: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

2

Biological  Imaging  Data  Challenge

GAP:  O(years)  !

Moritz  Helmstaedter,  Cellular  resolu>on  connectomics:  challenges  of  dense  neural  circuit  reconstruc>on,  Nature  Method,  10(6),  2013      

High-Throughput Image Data Analysis is Required!

Higher  Resolu>on

Page 3: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

3

High  Spa>otemporal  Resolu>on  Two-­‐Photon  Microscope  Imaging  System

•  In  vivo  •  High  Dimension

Peking  University

Page 4: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

4

Event  Detec>on  at  Cellular  Level

Cheng,  H  Science  1993 Calcium  Spark Sparks  and  Transients

Cheng,  H  Cell  2008  

Superoxide  Flash      

5  mm

Superoxide  Flash  

Elementary  Events  of  Calcium  Signals  

Visualiza>on  of  Reac>ve  oxygen  species  (ROS)

5µm

(Zhuang Zhou, Xiaowei Can)

Dendrite  Calcium  Imaging  

Animal’s  dynamic  neural  signals  

2µm

Page 5: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

5

Life and Medical Biology Data Accelerator (Lambda, λ)

•  PostgreSQL

•  Bio-Format Data  

•  Domain-Specific Accelerator

•  Auto-tuning library Engine  

•  Built-in modules

•  Customizable framework Pipeline  

lambda

Page 6: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

6

λ-­‐Image    SoYware/Hardware  Stack

¢  High-­‐dimension  &  mul>-­‐mode  biological  image  data  system  ¢  Data  analysis  pipeline  for  massive  biological  image  ¢  Accelera>ng  data-­‐intensive  algorithms  for  biological  image  analysis

machine learning stencil denoising

Biological Data Analysis Pipeline (cell event detection, segmentation)

Biological Data Analysis Algorithm Toolkit

deconvolution

Database   Accelerator

MPI Spark CUDA OpenCL

RDMA  

Mouse  embryo  heart  image  cell  lineage  

Mice  brain  cell  Ca2+  spark  detec>on  

Islet  forming  in  pancrea>c  and  imaging  in  vivo  

Cardiovasology  

Brain

Endocrinology

Page 7: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

7

High-­‐throughput  Image  Processing  Algorithm

fMRI  ssTEM  sBEM  LSFM

O(N*P3)

Unbiased  Analysis  of  Events

Current  Compu>ng  Systems(SoYware/Hardware):O(Years)

Interac>ve

High  Performance  Compu>ng  Pla_orm:  O(Minutes)

High  accuracy

Machine  Learning

Page 8: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

8

Paralleliza>on  with  in-­‐memory  Compu>ng  Model

Raw  Data

Raw  Data

3D  Deconvolution

Intensity  Normalization

Subtract  Background

Preprocessed  Data

Left  Side

Match

Powell

Mutual  Information

Left  Side

Right  Side

Right  Side

Left  Side

Wavelet  Decomposition

Activity  Measure

Fusion  Decomposition

Fused  Data

Right  Side

Fused  Data

Planarity  Enhancement

Tensor  Voting

3D  Watershed

Labelmap  Image  Data

Image  R1

Preprocess

Registration

Fusion

Image  L1

Preprocess

PreprocessedImage  L1

Registration

Registered  Image  L1

Image  L2

Preprocess

Registration

Image  R2

Preprocess

Registration

Fused  Image1

Merge  Image  Stack

... ...

Fusion

Fused  Image2 ...

Fused  Image  Stack

Segmetation

Final  Result  For  Visualization  Process

Map

Reduce

PreprocessedImage  L2

PreprocessedImage  R1

PreprocessedImage  R2

Registered  Image  L2

Registered  Image  R1

Registered  Image  R2

Spark

Page 9: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

9

GPU  Accelera>on  of  Algorithm  Modules

0  

5  

10  

15  

20  

25  

30  

35  

40  

DeconvoluJon   Median  Filter   Objectness  Filter  IteraJve  Closing  

CPU   GPU  

Page 10: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

10

Image Processing & Analysis Pipeline

RL/Sparse

Mutual Information

Machine Learning (Event detection or Pattern Analysis)

Watershed segmentation Labelmap selection Particle analysis

Mutual Information is derived from Information Theory and its application to image registration has been proposed in different forms

2D and 3D iterative deconvolution.

Machine Learning

Mice Brain Cell Ca2+ Spike Detection Analysis  

Segm

enta>o

n  Re

gistra>o

n  De

convolu>

on  

Fusion

 

GPU

✔ ✔

✔ ✔

Spark

Fusion

 

Wavelet based ✔

use global five-level wavelet decomposition

Page 11: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

11

Deconvolu>on  of  Pancreas  Islet  Images

2  DAYS

4  GPUs  (K20)

Terabyte  EM  Images

Preprocessing

e:image p:psf for N iterations /*apple imaging model to estimate*/ E=gpu_fft(e, batch) B=gpu_multiply(E, PSF, batch) b=gpu_ifft(B, batch) /*captured image divided by                      blurred estimate*/ r=gpu_divide(o, b, batch) /*calculate correction vector*/ R=gpu_fft(r, batch) C=gpu_multiply(R, PSF, batch) c=gpu_ifft(C, batch) /*apply correction vector*/ e=gpu_multiply(e,c, batch) end

deconvolution

GPU

batch

name pixel(XYZ) #iters

size Fiji (JAVA)

GPU speedup

beta.tif 1024x2048x51 100 408MB 60m 22s 163

glucose_sequential2.tif

512x512x400 50 200MB 30m 10s 180

4.7YEARS

Page 12: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

12

Extrac>ng  Cells  from  Mouse  Embryos  Images

light-­‐sheet  microscopes  images

1.5  DAYS

culture Fast,    two-­‐side,    3D,    duel-­‐color  imaging

excitaJon

detecJon

detecJon ReconstrucJon    

Time1 Time2 Time3

………

200  Time  points  2x500  images  2048x2048  pixels  4GB*2*200  =  1.6TB

Page 13: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

13

Blitz:High  Performance  Machine  Learning  Toolkit

Algorithm  Interface

Layer  Opera>on

Virtual  Backend

NVIDIA  DIGITS  (Customized)

Sugon  Xmachine

Pipelining  Parallelism

Model  Parallelism

Data  Parallelism

RDMA

Classifica>on Clustering SVM K-­‐means

Accelerator: Programming    Hardware

Distributed  Parallelism

Operator  Language    (Linear  Algebra  /  Tensor  Primi>ves)  

Automa>c  Performance  Tuning

Performance    Interface

Communica>on  Avoiding

KNN

vectoriza>on

mul>thread

DNN CNN

Dimensionality PCA

Page 14: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

14

Convolu>onal  Nets  2012  (AlexNet)

Hardware  Environment  CPU:  Dual  Intel(R)  Xeon(R)  CPU  E5-­‐2680  v3,  28    CPU-­‐Memory:  128GB    GPU:  Tesla  K20    GPU-­‐Memory:6GB  

blitz   1310s    

caffe     1960s  

blitz   125ms caffe   196ms

13-­‐layer  architecture Layer Type Maps  and  neurons Kernel  size

0 Input 1  map  of  224*224  neurons

1 ConvoluJon 64  maps  of  55*55  neurons 11*11

2 Pooling 64  maps  of  27*27  neurons   3*3

3 ConvoluJon 192  maps  of  27*27  neurons 5*5

4 Pooling 192  maps  of  13*13  neurons 3*3

5 ConvoluJon 384  maps  of  13*13  neurons 3*3

6   ConvoluJon 256  maps  of  13*13  neurons   3*3

7   ConvoluJon 256  maps  of  13*13  neurons 3*3

8   Pooling 256  maps  of  6*6  neurons 3*3

9   Fully-­‐connected 4096  neurons 1*1

10   Dropout 4096  neurons 1*1

11   Fully-­‐connected 4096  neurons 1*1

12   Dropout 4096  neurons 1*1

13   Fully-­‐connected 1000  neurons 1*1

Input  3@224*224

Conv1  64@55*55

Convolu>ons Pooling

Pool1  64@27*27

Convolu>ons

Conv2  192@27*27 Pool2  192@13*13

Pooling

1  batch  size  running  >me

batch  size=128  1  epoch  running  >me  

Page 15: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

15

Flash  Detec>on

A B C

E.Coli,  Jme  series,  512X512X(100  frames).  

Intensity  increases  rapidly   Intensity  declines  obviously    averaged  intensi>es  change  con>nuously  

A  nonstandard  flash  is  not  found  by    either  expert  or  threshold-­‐based  method

Page 16: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

16

Automated  Flash  Detec>on  based  on  Blitz

Stack of fluorescence

images

Image registration

Cell segmentation

Intensity average

Data over-fitting check

If model accuracy is close to 100%

Feature addition

Data skew elimination

Test set upper bound estimation

Data ratio adjustmentCross validation

If model accuracy is close to upper bound

Model generation

Event detection Events output

Input Pre-processing

Feature selection

Model train

Model prediction

Resultcollection

Skewelimination

F  value: F=2×precision×recall/(precision+recall), where precision  =(no.  returned  flashes)/(no.  returned  peaks) recall  =(no.  returned  flashes)/(no.  all  the  flashes).  

¢  Use  cross  valida>on  to  find  parameters  to  train  a  model  which  can  get  beper  accuracy  and  F  value.    

¢  Use  MPI  +  CUDA  paralleliza>on  to  reduce  training  >me

•  9 features: 6 local, 3 global •  (amplitude, width, slope)*(left, right) •  Average intensity of trace, distance to

the (last, next) peak

Page 17: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

17

Membrane  Segmenta>on  based  on  Blitz

Group   Rand  error  [·∙10  −3  ]  

Warping  error  [·∙10  −6]  

Pixel  error  [·∙10  −3  ]   Training  Time  

Simple  Thresholding   445     15522   222  

NIPS  2012   48   434   60   7  Days    (Four  GPUs)  Our  approach 116   2865   95   2  Days    (One  GPUs)  

Deep  Neural  Network    

Brain Heart

Page 18: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

18

Conclusion

¢  Develop  a  Spark-­‐based  paralleliza>on  framework  for  high  throughput  image  analysis  pipelines  

¢  Op>miza>ons  on  GPU  § Core  algorithms  in  image  processing  (3x-­‐10x)  §  SGEMM  in  deep  learning  (      30%)  

¢  Achieve  significant  speedups  for  image  processing  §   Years  à  days

Page 19: (Life(and Medical(Biology Data Accelerator (Lambda)

Institute of Computing Technology, Chinese Academy of Sciences

19

Thanks!