Training deep convolutional neural networks for
classification of multi-scale, nonlocal data in fusion
energy, using the Pytorch framework
R.M. Churchill¹, the DIII-D team
Special thanks to:
● DIII-D team generally, specifically Ben Tobias¹, Yilun Zhu², Neville Luhmann², Dave Schissel³, Raffi Nazikian¹, Cristina Rea⁴, Bob Granetz⁴
● PPPL colleagues: CS Chang¹, Bill Tang¹, Julian Kates-Harbeck¹,⁵, Ahmed Diallo¹, Ken Silber¹
● Princeton University Research Computing⁶
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
The Princeton Plasma Physics Laboratory is a world-class fusion energy research laboratory dedicated to developing the scientific and technological knowledge base for fusion energy as a safe, economical and environmentally attractive energy source for the world’s long-term energy requirements.
Why you should use Pytorch...
Outline
● Neural network architectures for multi-scale, non-local data
● Initial results with ECEi for Disruption Prediction
● Using Pytorch
Motivation: Automating classification of fusion plasma phenomena is complicated
● Fusion plasmas exhibit a range of physics over different time and spatial scales
● Fusion experimental diagnostics are disparate, and have increasingly high time resolution
● How can we automate identification of important plasma phenomena, for example oncoming disruptions?
Figure: [F. Poli, APS DPP 2017]
Neural networks can be thought of as a series of filters whose weights are “learned” to accomplish a task
● Fusion experiments and simulations have a wide variety of data analysis pipelines that use prior knowledge to get a result
● Neural networks (NNs) have a number of layers of “weights”, which can be viewed as filters (especially in convolutional NNs). But these filters are trained to map a given input through a complicated non-linear function to a given output.
Figure: example analysis pipeline (Input (magnetics), Low pass filter, EFIT, Internal Inductance)
https://becominghuman.ai/deep-learning-made-easy-with-deep-cognition-403fbe445351
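To make the filter analogy concrete, here is a minimal sketch (our own illustration, not from the talk). A hand-designed low-pass filter, assumed here to be a 5-point moving average, is one fixed step in a traditional pipeline. The same operation written as an `nn.Conv1d` layer can instead *learn* its weights from input/output pairs. Starting from random weights, gradient descent recovers the moving-average filter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hand-designed filter, as in a traditional pipeline: a 5-point moving average (low-pass).
low_pass = torch.full((1, 1, 5), 1.0 / 5)
x = torch.randn(1, 1, 2000)                    # (batch, channels, time) input signal
y = F.conv1d(x, low_pass, padding=2)           # the pipeline step's desired output

# The same operation as a learnable layer, starting from random weights.
layer = nn.Conv1d(1, 1, kernel_size=5, padding=2, bias=False)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
for step in range(300):
    opt.zero_grad()
    loss = F.mse_loss(layer(x), y)             # how far the learned filter is from the target mapping
    loss.backward()
    opt.step()

print(layer.weight.detach().flatten())         # weights approach the moving average, 0.2 each
```

A real network stacks many such layers with non-linearities between them, so the mapping it learns is not limited to a single linear filter.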
Challenges for RNN/LSTM on long sequences
● Popular sequence NNs such as LSTMs are, in principle, sensitive to arbitrarily long sequences, due to their memory cell
● However, in practice they tend to “forget” phenomena in sequences longer than ~1000 steps (approximate; depends on data)
● If characterising a sequence requires $T_{long}$ seconds, and short-scale phenomena of time scale $T_{short}$ are important in the sequence, an LSTM must process sequences of length $N \sim T_{long}/T_{short}$, which in practice should stay below ~1000
● Various NN architectures enable learning on long sequences (CNN with dilated convolutions, attention, etc.)
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
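As a rough illustration, take numbers from the training setup used for the ECEi work in this talk. We assume the relevant short scale is the sample spacing after downsampling to 100 kHz ($T_{short} \approx 10\ \mu\text{s}$) and the long scale is the 300 ms window before a disruption ($T_{long}$). The required sequence length is then far beyond the ~1000-step range where LSTMs tend to forget:

$$
N \sim \frac{T_{long}}{T_{short}} = \frac{300\ \text{ms}}{10\ \mu\text{s}} = 3 \times 10^{4} \gg 10^{3}
$$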
Dilated convolutions enable efficient training on long sequences
● One difficulty using CNNs with causal filters is that they require large filters or many layers to learn from long sequences
○ Due to memory constraints, this becomes infeasible
● A seminal paper [*] showed that using dilated convolutions (i.e. convolutions with defined gaps) for time-series modeling increases the NN receptive field, reducing computational and memory requirements and allowing training on long sequences
Figure: normal convolution vs. dilated convolution
[* A. Van Den Oord, et al., WaveNet: A Generative Model for Raw Audio, 2016]
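A minimal sketch of a causal, dilated 1-D convolution in Pytorch, assuming the usual construction of padding only on the left (the past), so the output at time t never sees inputs after t. Stacking layers with doubling dilations (1, 2, 4, 8, as in WaveNet-style stacks) makes the receptive field grow exponentially with depth. The gradient of the last output with respect to the input shows exactly which inputs it depends on. `CausalDilatedConv1d` and `receptive_field` are illustrative names, not Pytorch APIs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv1d(nn.Module):
    """1-D convolution whose output at time t only sees inputs at times <= t."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation   # pad the past only, never the future
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                              # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.left_pad, 0)))

def receptive_field(kernel_size, dilations):
    """Number of inputs seen by one output of a stack of causal convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)

torch.manual_seed(0)
dilations = [1, 2, 4, 8]                               # doubling dilation per layer
stack = nn.Sequential(*[CausalDilatedConv1d(1, 1, 2, d) for d in dilations])
x = torch.randn(1, 1, 64, requires_grad=True)
stack(x)[0, 0, -1].backward()                          # gradient of the last output w.r.t. inputs
seen = (x.grad[0, 0] != 0).nonzero().flatten()         # input positions the last output depends on
print(receptive_field(2, dilations), len(seen))        # 16 16
```

With the same four layers but no dilation, `receptive_field(2, [1, 1, 1, 1])` is only 5, which is why undilated causal CNNs need very large filters or very many layers for long sequences.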
Temporal Convolutional Networks
● The Temporal Convolutional Network (TCN) architecture [*] combines causal, dilated convolutions with additional modern NN improvements (residual connections, weight normalization)
● Several beneficial aspects compared to RNNs:
○ Empirically, TCNs exhibit longer memory (i.e. better for long sequences)
○ Non-sequential, allowing parallelized training and inference
○ Require less GPU memory for training
[* S. Bai, J.Z. Kolter, V. Koltun, http://arxiv.org/abs/1803.01271 (2018)]
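A sketch of one TCN residual block, modeled on the structure described by Bai et al.: two weight-normalized causal dilated convolutions with ReLU and dropout, plus a residual connection (a 1x1 convolution when channel counts differ). This is not the talk's exact code. The 160 input channels (the 20 x 8 ECEi channels flattened, since no 2D convolutions were used) and the dropout value are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

try:  # newer Pytorch (2.1+)
    from torch.nn.utils.parametrizations import weight_norm
except ImportError:  # older Pytorch
    from torch.nn.utils import weight_norm

class TemporalBlock(nn.Module):
    """One TCN residual block: two weight-normalized causal dilated convolutions."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation, dropout=0.1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation        # left padding keeps each conv causal
        self.conv1 = weight_norm(nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(out_ch, out_ch, kernel_size, dilation=dilation))
        self.dropout = nn.Dropout(dropout)
        # 1x1 convolution so the residual path has matching channels
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def _causal(self, conv, x):
        return conv(F.pad(x, (self.pad, 0)))

    def forward(self, x):                              # x: (batch, channels, time)
        out = self.dropout(F.relu(self._causal(self.conv1, x)))
        out = self.dropout(F.relu(self._causal(self.conv2, out)))
        return F.relu(out + self.downsample(x))        # residual connection

torch.manual_seed(0)
block = TemporalBlock(in_ch=160, out_ch=80, kernel_size=15, dilation=10)
x = torch.randn(2, 160, 1000)                          # (batch, flattened ECEi channels, time)
print(block(x).shape)                                  # torch.Size([2, 80, 1000])
```

Because the output keeps the input's time length, a final 1x1 convolution can produce one prediction per time slice, and every time step of a subsequence is computed in parallel rather than sequentially as in an RNN.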
Outline
● Neural network architectures for multi-scale, non-local data
● Initial results with ECEi for Disruption Prediction
● Using Pytorch
Machine learning for Disruption Prediction
● Predicting (and understanding?) disruptions is a key challenge for tokamak operation, and a lot of ML research has been applied to it [Vega Fus. Eng. 2013, Rea FST 2018, Kates-Harbeck Nature 2019, D. Ferreira arXiv 2018]
○ Most ML methods use processed 0-D signals (e.g. line-averaged density, locked mode amplitude, internal inductance, etc.)
○ Can we apply deep CNNs directly to diagnostic outputs for improved disruption prediction?
● The Electron Cyclotron Emission imaging (ECEi) diagnostic has temporal & spatial sensitivity to disruption markers [Choi NF 2016]
Figure: ITER Physics Basis, Chapter 3, Nucl. Fusion 39 (1999) 2251–2389.
Figure: ECEi data near disruption
DIII-D Electron Cyclotron Emission Imaging (ECEi)
● ECEi characteristics:
○ Measures electron temperature, Te
○ 1 MHz time resolution, enabling measurement of δTe on turbulent timescales
○ Digitizer able to record an entire DIII-D discharge (~O(5 s))
○ 20 x 8 channels for spatial resolution
○ Some limitations due to signal cutoff above certain densities
● Sensitive to a number of plasma phenomena, e.g.
○ Sawteeth
○ Tearing modes
○ ELMs
https://sites.google.com/view/mmwave/research/advanced-mmw-imaging/ecei-on-diii-d
● Due to its high temporal resolution (long time sequences) and spatial resolution, ECEi is a good candidate for applying an end-to-end TCN
[B. Tobias et al., RSI (2010)]
Dataset and computation
● Database of ~3000 shots (~50/50 non-disruptive/disruptive) with good ECEi data, created from the OMFIT DISRUPTIONS module shot list [E. Kolemen, et al.]
○ “Good” data defined as all channels having SNR>3, avoiding discharges with 2nd harmonic ECE cutoff
● ECEi data (~10 TB) transferred to the Princeton TigerGPU cluster for distributed training (320 NVIDIA P100 GPUs, 4 GPUs per compute node)
Setup for training neural network
● Each time point is labeled as “disruptive” or “non-disruptive”. For a disruptive shot, all time points 300 ms or closer to the disruption are labeled “disruptive”
○ Times more than ~350 ms before the disruption have a distribution similar to non-disruptive discharges [Rea FST 2018]
● Binary classification problem (disruptive/non-disruptive time slice)
● Overlapping subsequences of length >> receptive field are created; the length is mainly set by GPU memory constraints
Figure: Rea, FST, 2018 (near disruption, far from disruption, non-disrupted)
Figure: receptive field (# inputs needed to make 1 prediction)
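The labeling rule and the subsequence split can be sketched in a few lines of NumPy. The 300 ms window comes from the slide. The amount of overlap is our assumption: consecutive subsequences overlap by (receptive field − 1) samples, so every time slice after the first receptive field of a shot gets a prediction with full context. `label_time_slices` and `overlapping_starts` are hypothetical helper names.

```python
import numpy as np

def label_time_slices(t, t_disrupt=None, window=0.300):
    """Label each time point: 1 ('disruptive') if within `window` seconds of the
    disruption, else 0. Non-disruptive shots (t_disrupt=None) are all 0."""
    if t_disrupt is None:
        return np.zeros(len(t), dtype=np.int64)
    return ((t_disrupt - t) <= window).astype(np.int64)

def overlapping_starts(n_samples, seq_len, receptive_field):
    """Start indices of length-seq_len subsequences that overlap by receptive_field - 1
    samples (assumed overlap), so later windows keep full context for their first outputs."""
    stride = seq_len - (receptive_field - 1)
    return list(range(0, n_samples - seq_len + 1, stride))

t = np.linspace(0.0, 1.0, 11)                  # toy time base in seconds
print(label_time_slices(t, t_disrupt=1.05))    # last three points fall inside 300 ms
print(overlapping_starts(100, seq_len=40, receptive_field=11))
```

With the settings on the next slide (subsequences of 78,125 samples and a receptive field of ~30,000 samples), this overlap would advance each window by roughly 48,000 samples.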
Setup for training neural network
● Starting with smaller subsets of data, working up
● Data setup:
○ Downsampled to 100 kHz
○ Sequences broken up into subsequences of 78,125 samples (781 ms)
○ Training dataset of subsequences undersampled to give a 50/50 split between non-disruptive and disruptive subsequences (the natural class imbalance is ~5% disruptive subsequences)
○ Weighted loss function for 50/50 balancing of time-slice classes
○ Full 20 x 8 channels used (but no 2D convolutions)
○ Data normalized with z-normalization, (y − mean(y))/std(y)
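The three data-setup steps (z-normalization, undersampling to a 50/50 split, and a class-weighted loss) might look like the sketch below. The talk does not specify these details, so several choices here are assumptions. Normalization is per channel over time, undersampling drops random majority-class subsequences, and the loss uses `nn.BCEWithLogitsLoss` with `pos_weight = n_negative / n_positive`, so both time-slice classes contribute equal total weight. In practice the normalization statistics would usually come from the training set.

```python
import numpy as np
import torch
import torch.nn as nn

def z_normalize(y, eps=1e-8):
    """Per-channel z-normalization, (y - mean(y)) / std(y); y has shape (channels, time)."""
    mean = y.mean(axis=-1, keepdims=True)
    std = y.std(axis=-1, keepdims=True)
    return (y - mean) / (std + eps)

def undersample_majority(labels, rng=None):
    """Indices giving a 50/50 split, by randomly dropping majority-class subsequences.
    labels: 1-D array of 0/1 subsequence labels."""
    rng = np.random.default_rng() if rng is None else rng
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n = min(len(pos), len(neg))
    keep = np.concatenate([rng.choice(pos, n, replace=False), rng.choice(neg, n, replace=False)])
    return np.sort(keep)

def balanced_bce_loss(targets):
    """BCE-with-logits loss whose positive-class weight is n_negative / n_positive,
    so disruptive and non-disruptive time slices carry equal total weight."""
    n_pos = targets.sum()
    n_neg = targets.numel() - n_pos
    return nn.BCEWithLogitsLoss(pos_weight=(n_neg / n_pos.clamp(min=1)).reshape(1))

rng = np.random.default_rng(0)
signal = z_normalize(rng.normal(3.0, 2.0, size=(160, 1000)))    # 20 x 8 channels, flattened
subseq_labels = np.array([1] * 5 + [0] * 95)                     # ~5% disruptive subsequences
kept = undersample_majority(subseq_labels, rng)                  # 5 disruptive + 5 non-disruptive
slice_targets = torch.tensor([1.0, 0.0, 0.0, 0.0])
loss_fn = balanced_bce_loss(slice_targets)                       # pos_weight = 3
print(loss_fn(torch.zeros(4), slice_targets))
```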
● TCN setup:
○ Receptive field ~30,000 samples, i.e. 300 ms (each time-slice prediction is based on its receptive field)
○ 4 layers, dilation 10, kernel size 15, 80 hidden nodes per layer
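The quoted receptive field can be checked with a few lines of arithmetic. We assume "dilation 10" means layer i uses dilation 10^i (1, 10, 100, 1000), and that each layer is a TCN residual block containing two causal convolutions, as in the Bai et al. design. Under those assumptions the result lands close to the slide's ~30,000 samples (~300 ms at 100 kHz).

```python
def tcn_receptive_field(kernel_size, n_layers, dilation_base, convs_per_block=2):
    """Receptive field of a TCN whose layer i uses dilation dilation_base**i and
    contains convs_per_block causal convolutions (assumed 2, as in Bai et al.)."""
    dilations = [dilation_base ** i for i in range(n_layers)]
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)

rf = tcn_receptive_field(kernel_size=15, n_layers=4, dilation_base=10)
sample_rate_hz = 100e3                      # data downsampled to 100 kHz
print(rf, rf / sample_rate_hz)              # 31109 samples, i.e. about 0.31 s
```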
Current, initial results
● When training on the data subset, the loss decreases continually, suggesting the network has the capacity necessary to capture and model disruptions with the ECEi data
● F1-score is ~91% and accuracy ~94% on individual time slices
○ Additional regularization and/or training with a larger dataset may improve these further
● Run on 16 GPUs for 2 days.
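The F1-score and accuracy above are computed per time slice. A minimal sketch of those metrics for a binary classifier that outputs one logit per time slice is shown below. The 0.5 probability threshold is an assumption; the talk does not state the alarm threshold.

```python
import torch

def time_slice_metrics(logits, targets, threshold=0.5):
    """F1-score and accuracy over individual time slices (1 = disruptive)."""
    preds = (torch.sigmoid(logits) > threshold).long()
    targets = targets.long()
    tp = ((preds == 1) & (targets == 1)).sum().item()
    fp = ((preds == 1) & (targets == 0)).sum().item()
    fn = ((preds == 0) & (targets == 1)).sum().item()
    accuracy = (preds == targets).float().mean().item()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return f1, accuracy

logits = torch.tensor([2.0, -2.0, 2.0, -2.0])    # model outputs for four time slices
targets = torch.tensor([1, 1, 0, 0])             # true labels
print(time_slice_metrics(logits, targets))       # (0.5, 0.5)
```

Because the subsequences were balanced to 50/50, accuracy is meaningful here; on the natural ~5% disruptive distribution, F1 is the more informative of the two.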
Future Possibilities
● Deep CNN architectures (e.g. TCN) can be applied to many fusion sequence diagnostics, e.g. magnetics, bolometry, etc.
○ Tying together multiple diagnostics in a single network or in multiple neural networks could open up further possibilities
○ Can be used to create an “automated logbook”, enabling researchers to quickly find discharges with phenomena of interest. Especially important for longer pulses, more diagnostics, and higher time resolution diagnostics
● Transfer learning can be explored for quickly re-training a CNN on a different machine with few examples
● Large batch training for distributed learning
○ Very hot topic now in the ML/DL community: how to train NNs quickly and efficiently? Especially important since training often requires many iterations on hyperparameters
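One common way to set up the transfer-learning idea is to freeze the layers trained on one machine and re-train only a small output head on the few examples from the new machine. The sketch below shows that pattern in Pytorch; it is a generic technique, not something tested in this work. The two-part model (`body`, `head`) is hypothetical, and a diagnostic on another machine may also need a different input layer if its channel count differs.

```python
from collections import OrderedDict
import torch
import torch.nn as nn

def freeze_for_transfer(model, trainable_prefixes=("head",)):
    """Freeze all parameters except those whose names start with one of
    trainable_prefixes; return the parameters that remain trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)
    return [p for p in model.parameters() if p.requires_grad]

# Hypothetical model: "body" stands in for a network pretrained on one machine's data,
# "head" is the per-time-slice classifier re-trained with a few examples from a new machine.
model = nn.Sequential(OrderedDict([
    ("body", nn.Conv1d(160, 80, kernel_size=15)),
    ("head", nn.Conv1d(80, 1, kernel_size=1)),
]))
trainable = freeze_for_transfer(model)
optimizer = torch.optim.Adam(trainable, lr=1e-3)    # only the head's weights get updated
```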
Outline
● Neural network architectures for multi-scale, non-local data
● Initial results with ECEi for Disruption Prediction
● Using Pytorch
Getting started with Pytorch
● Once you know Python (+NumPy), most of Pytorch will be intuitive (with some exceptions)
● Pytorch.org has great documentation, decent tutorials (some outdated), and a generally useful user forum
● For TigerGPU, make sure you load:
○ anaconda3
○ cudatoolkit/10.0
○ cudnn/cuda-10.0
○ then install Pytorch according to the website
● For distributed training examples, I highly recommend the Pytorch ImageNet example (https://github.com/pytorch/examples/tree/master/imagenet)
● Lots of good examples are available (see e.g. https://paperswithcode.com/, or the new Pytorch Hub)
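After loading the modules and installing Pytorch, a quick sanity check from Python can confirm that the build sees CUDA, cuDNN, and the node's GPUs. The function name is ours; the attributes it reads are standard Pytorch.

```python
import torch

def report_torch_setup():
    """Summarize what the installed Pytorch build can see."""
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,            # None for a CPU-only build
        "cuda_available": torch.cuda.is_available(),
        "cudnn": torch.backends.cudnn.version(),     # None if cuDNN is unavailable
        "n_gpus": torch.cuda.device_count(),         # expect 4 on a TigerGPU compute node
    }

print(report_torch_setup())
```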
Custom Datasets in Pytorch
● If your dataset isn’t ImageNet or another predefined dataset that Pytorch offers, creating a custom Dataset is straightforward:
○ Define your “__init__” method, with any parameters or metadata
○ Define a “__len__” method, returning the number of samples
○ Define a “__getitem__(index)” method, which returns the sample for a given index
● This allows a lot of flexibility in terms of data file type, size, etc. Any way of reading data in Python can be used within a custom Dataset.
● Any correctly created Dataset object can be passed to the Pytorch DataLoader, which handles things like batch_size, samplers, the number of data loader workers (num_workers), etc.
○ One caution: there still appear to be occasional bugs with num_workers>0
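A minimal custom Dataset following the three-method recipe above is sketched here. It is hypothetical: in-memory NumPy arrays stand in for the real ECEi files, and `__init__` stores only metadata, namely which shot and start sample each subsequence comes from. In a real pipeline `__getitem__` could instead read the slice from disk (e.g. an HDF5 file), which is the flexibility the slide refers to.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SubsequenceDataset(Dataset):
    """Hypothetical dataset of overlapping subsequences from several shots.
    shots: list of (signal, labels); signal is (channels, time), labels is (time,)."""

    def __init__(self, shots, seq_len, stride):
        self.shots = shots
        self.seq_len = seq_len
        # metadata only: (shot index, start sample) for every subsequence
        self.index = [(i, start)
                      for i, (signal, _) in enumerate(shots)
                      for start in range(0, signal.shape[-1] - seq_len + 1, stride)]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        i, start = self.index[idx]
        signal, labels = self.shots[i]
        stop = start + self.seq_len
        x = torch.tensor(signal[:, start:stop], dtype=torch.float32)
        y = torch.tensor(labels[start:stop], dtype=torch.float32)
        return x, y

rng = np.random.default_rng(0)
shots = [(rng.standard_normal((160, 1000)), np.zeros(1000)) for _ in range(2)]
dataset = SubsequenceDataset(shots, seq_len=400, stride=300)
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)
x, y = next(iter(loader))
print(len(dataset), x.shape, y.shape)  # 6 torch.Size([4, 160, 400]) torch.Size([4, 400])
```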
Custom Data Samplers
● The torch DataLoader (highly recommended!) accepts a sampler object, which defines how samples are drawn from the dataset each epoch
○ Pytorch already has a number of predefined samplers, including a DistributedSampler
● Custom samplers can be written by inheriting from the existing Samplers and rewriting the methods needed (__init__, __iter__, __len__)
● This made it possible to create a stratified sampler, ensuring that each batch receives balanced labels and that nothing bleeds from the training set into the validation and test sets.
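A sketch of what such a stratified, distributed sampler could look like follows; it is our reconstruction, not the author's code. Every consecutive block of `batch_size` indices is half disruptive and half non-disruptive. Blocks are dealt out round-robin across ranks using a shuffle seeded identically on every rank, and only the indices passed in (e.g. the training split) are ever drawn, which is what prevents leakage into validation and test sets. `set_epoch` mirrors the `DistributedSampler` convention.

```python
import numpy as np
from torch.utils.data import Sampler

class StratifiedDistributedSampler(Sampler):
    """Hypothetical sketch: each block of batch_size indices holds batch_size/2 label-1
    and batch_size/2 label-0 samples; blocks are split across ranks.
    Only `indices` (e.g. the training split) are ever drawn."""

    def __init__(self, labels, indices, batch_size, num_replicas=1, rank=0, seed=0):
        if batch_size % 2:
            raise ValueError("batch_size must be even")
        labels = np.asarray(labels)
        indices = np.asarray(indices)
        self.pos = indices[labels[indices] == 1]
        self.neg = indices[labels[indices] == 0]
        self.batch_size, self.num_replicas, self.rank = batch_size, num_replicas, rank
        self.seed, self.epoch = seed, 0
        n_batches = min(len(self.pos), len(self.neg)) // (batch_size // 2)
        # drop leftover batches so every rank runs the same number of steps
        self.batches_per_rank = n_batches // num_replicas

    def set_epoch(self, epoch):
        """Call once per epoch (as with DistributedSampler) to get a new shuffle."""
        self.epoch = epoch

    def __iter__(self):
        rng = np.random.default_rng(self.seed + self.epoch)  # identical shuffle on all ranks
        half = self.batch_size // 2
        pos, neg = rng.permutation(self.pos), rng.permutation(self.neg)
        n_total = self.batches_per_rank * self.num_replicas
        for b in range(self.rank, n_total, self.num_replicas):  # this rank's blocks only
            yield from (int(i) for i in pos[b * half:(b + 1) * half])
            yield from (int(i) for i in neg[b * half:(b + 1) * half])

    def __len__(self):
        return self.batches_per_rank * self.batch_size
```

Pass it as `DataLoader(dataset, batch_size=batch_size, sampler=sampler)` with the same `batch_size`, so each DataLoader batch is exactly one balanced block.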
Pytorch model size estimation
● A common error you might get is “OOM” (out-of-memory), due to your neural network size
○ This is (usually) dominated by the intermediate activations stored for backpropagation, not by the parameters themselves
● There are ways to estimate the required memory beforehand
○ See this great blog: http://jacobkimmel.github.io/pytorch_estimating_model_size/
○ Code implemented here: https://github.com/sksq96/pytorch-summary (though its estimates tended to be too high in practice for me)
● The brute-force method is to load a model onto the GPU, begin training, and monitor GPU memory usage with “nvidia-smi”
○ This command can be added to your bash script to run in parallel with your Pytorch code
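Pytorch can also report memory directly, which complements watching `nvidia-smi`. The sketch below computes the memory held by parameters, then runs one forward/backward pass and reads the peak memory allocated by Pytorch tensors. That peak excludes the CUDA context and cached-but-free allocator memory, so `nvidia-smi` will show more. The toy model and the batch shape are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

def parameter_megabytes(model):
    """Memory held by the parameters alone (their gradients need the same amount again)."""
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 2**20

def peak_training_megabytes(model, example_batch, device="cuda"):
    """Peak GPU memory allocated by Pytorch tensors during one forward/backward pass."""
    model = model.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    out = model(example_batch.to(device))
    out.float().mean().backward()          # stored activations dominate this peak
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 2**20

model = nn.Sequential(nn.Conv1d(160, 80, 15, dilation=10), nn.ReLU(), nn.Conv1d(80, 1, 1))
print(f"parameters: {parameter_megabytes(model):.2f} MB")
if torch.cuda.is_available():
    batch = torch.randn(4, 160, 78_125)    # hypothetical batch of full-length subsequences
    print(f"peak during one training step: {peak_training_megabytes(model, batch):.0f} MB")
```

Comparing the two printed numbers for a real model usually shows the activation-driven peak far exceeding the parameter memory, which is the point of the bullet above.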
Distributed training in Pytorch
Figure, “How Pytorch distributed recommends”: one rank per compute node (Rank0, Rank1, ..., RankN), each running the main Pytorch code across GPU0 to GPU3. torch.multiprocessing launches one process for each of the node's 4 GPUs, which allows sharing memory between processes.
Figure, “How I could get Pytorch distributed to work on TigerGPU”: one rank per GPU, no multiprocessing (Rank0 to Rank3 on GPU0 to GPU3 of the first node, Rank4 to Rank7 on the second node, ..., RankN-4 to RankN-1 on the last node).
Main issue with torch.multiprocessing was deadlocks
Distributed training on TigerGPU with Pytorch
● Use the Pytorch distributed package
○ Easy SLURM integration
○ If using the file init_method, make sure the file name changes each run (otherwise there can be issues)
● Start with DistributedDataParallel
○ Efficient for data-parallel training (I haven’t tried the new model parallelism just released in Pytorch 1.1)
○ Should be able to do everything DataParallel does, and more
● Stay away from torch.multiprocessing (or tell me how you got it to work)
● The Pytorch distributed package recommends the NCCL backend for GPU systems with Infiniband, and it also works on TigerGPU
○ GLOO often gave deadlocks and was very opaque to debug. NCCL, once working, worked consistently and was ~1.5x faster
Use NCCL, not GLOO
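The rank-per-GPU setup can be sketched as follows, assuming SLURM's `srun` starts one task per GPU (e.g. 4 tasks per TigerGPU node) and sets the standard `SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`, and `SLURM_JOB_ID` variables. The rendezvous file is named after the job ID so it changes every run, per the tip above. The helper names and file location are our own choices, not the talk's code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def slurm_ranks(env):
    """Read this task's ranks from variables that srun sets for every task."""
    rank = int(env["SLURM_PROCID"])         # global rank, 0 .. world_size - 1
    world_size = int(env["SLURM_NTASKS"])   # one task per GPU across all nodes
    local_rank = int(env["SLURM_LOCALID"])  # task index on this node, used as the GPU index
    return rank, world_size, local_rank

def unique_init_file(job_id, directory="."):
    """A fresh rendezvous file per run, since a stale file from an earlier run causes problems."""
    return os.path.abspath(os.path.join(directory, f"ddp_init_{job_id}"))

def setup_distributed(env=os.environ):
    rank, world_size, local_rank = slurm_ranks(env)
    init_file = unique_init_file(env["SLURM_JOB_ID"])
    dist.init_process_group(backend="nccl", init_method=f"file://{init_file}",
                            world_size=world_size, rank=rank)
    torch.cuda.set_device(local_rank)       # bind this process to one GPU, no torch.multiprocessing
    return rank, world_size, local_rank

def wrap_model(model, local_rank):
    """Each rank holds a replica; DDP all-reduces gradients over NCCL during backward()."""
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```

With this layout, each rank's data can come from `torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=world_size, rank=rank)` (or a custom sampler built the same way), calling `set_epoch(epoch)` at the start of every epoch.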
Conclusions
● Deep convolutional neural networks offer the promise of identifying multi-scale plasma phenomena, using end-to-end learning to work with diagnostic output directly
○ The TCN architecture, with causal, dilated convolutions, allows predictions to be sensitive to longer time sequences while maintaining computational efficiency
● Initial work training a TCN on reduced ECEi datasets yields promising results for the ability of the TCN architecture to train a disruption predictor based on the ECEi data alone.
● Pytorch offers a flexible, performant framework for tackling YOUR problem
References
1. https://deepmind.com/blog/wavenet-generative-model-raw-audio/
2. F. Poli, APS DPP (2017)
3. https://becominghuman.ai/deep-learning-made-easy-with-deep-cognition-403fbe445351
4. http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
5. https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
6. A. Van Den Oord, et al., https://arxiv.org/pdf/1609.03499.pdf (2016).
7. S. Bai, et al., http://arxiv.org/abs/1803.01271 (2018).
8. ITER Physics Basis, Chapter 3, Nucl. Fusion 39 (1999) 2251–2389.
9. J. Vega, et al., Fusion Eng. Des. 88 (2013) 1228–1231.
10. C. Rea, et al., Fusion Sci. Technol. (2018) 1–12.
11. J. Kates-Harbeck, et al., submitted (2018).
12. D. Ferreira, http://arxiv.org/abs/1811.00333 (2018).
13. M.J. Choi, et al., Nucl. Fusion 56 (2016) 066013.
14. https://sites.google.com/view/mmwave/research/advanced-mmw-imaging/ecei-on-diii-d
15. B. Tobias, et al., Rev. Sci. Instrum. 81 (2010) 10D928.
END PRESENTATION
Deep learning enables end-to-end learning
● Traditional machine learning focused on hand-developed features (e.g. shapes in an image) to train shallow NNs or other ML algorithms
● Deep learning (multi-layer NNs) enables end-to-end learning, where higher-dimensional raw inputs (e.g. the pixels of an image) are fed directly to the NN
Convolutional neural networks (CNN)
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
● CNNs are very successful in image classification, whereas recurrent NNs (e.g. LSTMs) are often used for sequence classification (e.g. time series)
● But viewing NNs as “filters”, there is no reason CNNs can’t also be applied to sequence machine learning
Transfer learning
https://www.mathworks.com/help/deeplearning/examples/transfer-learning-using-alexnet.html
Alex Krizhevsky, et al., NIPS 2012
The first layers of a CNN often exhibit basic filter characteristics, e.g. edge or color filters
● Shot where the disruption alarm triggered >> 300 ms before the disruption, because the ECEi data in that time region behaved very similarly to the data just before the disruption (a drop in Te, followed by recovery)
Figure: target and prediction for a disruptive shot (panels: ECEi data; target/prediction vs. sequence index)
(previous results)
● F1-score is ~86% and accuracy ~91% on individual time slices
○ Additional regularization and/or training with a larger dataset may improve these further
● Run on 16 GPUs for 2 days.
Training deep convolutional neural networks for classification of multi-scale, nonlocal data in fusion energy, using the Pytorch framework
R. Michael Churchill, Princeton Plasma Physics Laboratory
Fusion plasmas exhibit phenomena over a wide range of time and spatial scales, and a number of different sensors are used in fusion energy experiments to observe these phenomena. Recently, many deep neural network architectures have been developed to work directly with such multi-scale, nonlocal data, including deep convolutional neural networks with dilated convolutions [1,2] and transformer networks [3]. I’ll show how these networks have been applied to the raw data of a high-sample-rate fusion diagnostic (ECEi) to predict one of the most pressing issues in magnetic confinement fusion experiments: sudden, violent instabilities known as disruptions. Along the way I’ll share how Pytorch was used, and lessons learned with the framework, including custom data classes, custom DistributedSamplers, and distributed training on TigerGPU.
[1] A. Van Den Oord, et al., arXiv e-prints (2016) arXiv:1609.03499.
[2] S. Bai, et al., arXiv e-prints (2018) arXiv:1803.01271.
[3] R. Child, et al., arXiv e-prints (2019) arXiv:1904.10509.