building deep learning powered big data: spark summit east talk by jiao wang and yiheng wang
Post on 12-Apr-2017
200 Views
Preview:
TRANSCRIPT
Intel® Confidential — INTERNAL USE ONLY
Building Deep Learning PoweredBig Data Analytics using BigDLYiheng Wang, Jennie Wang
BDT / SSG / Intel
3
Big data boost deep learning Production ML/DL system is Complex
Why BigDL?
Andrew NG, Baidu, NIPS 2015 Paper
4
Why BigDL?
BigDL open sourced on Dec 30, 2016
§ Write deep learning applications as standard Spark programs
§ Run on top of existing Spark or Hadoop clusters(No change to the clusters)
§ Rich deep learning support
§ High performance powered by Intel MKL and multi-threaded programming
§ Efficient scale-out with an all-reduce communications on Spark
6
Fraud Transaction Detection
Fraud transaction detection is very import to finance companies. A good fraud detection solution can save a lot of money.
ML solution challenge§ Data cleaning§ Feature engineering§ Unbalanced data § Hyper parameter
7
Fraud Transaction Detection
§ History data is stored on Hive
§ Easily data preprocess/cleaning with Spark-SQL
§ Spark ML pipeline for complex feature engineering
§ Under sample + Bagging solve unbalance problem
§ Grid search for hyper parameter tuning
Powered by BigDL
8
Product Defect Detection and Classification
Data source
§ Cameras installed on manufactory pipeline
Task
§ Detect defect from the photos
§ Classify the defect
11
Fast-RCNN
§ Faster-RCNN is a popular object detection framework
§ It share the features between detection network and region proposal network
Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
12
Object Detection with Fast-RCNN
See the code at: https://github.com/intel-analytics/BigDL/pull/387
13
Language Model with RNN
TextPreprocessing
RNNModelTraining
SentenceGenerating
§ Sentence Tokenizer
§ Dictionary Building
§ Input Document Transformer
Generated sentences with regard to trigger words.
14
RNN Model
See the code at: https://github.com/intel-analytics/BigDL/tree/master/dl/src/main/scala/com/intel/analytics/bigdl/models/rnn
15
Learn from Shakespeare Poems Output of RNN:
Long live the King . The King and Queen , and the Strange of the Veils of the rhapsodic . and grapple, and the entreatments of the pressure .
Upon her head , and in the world ? `` Oh, the gods ! O Jove ! To whom the king : `` O friends !
Her hair, nor loose ! If , my lord , and the groundlings of the skies . jocund and Tasso in the Staggering of the Mankind . and
16
Fine-tune Caffe/Torch Model on Spark
BigDL Model
Fine-tune
Melancholy
Sunny
Macro
CaffeModel
BigDLModel
TorchModel
Load
• Train on different dataset based on pre-trained model• Predict image style instead of type• Save training time and improve accuracy
Image source: https://www.flickr.com/photos/
18
Integration with Spark Streaming
Spark Streaming RDDs
EvaluatorBigDLModel
StreamWriter
BigDL integarates with Spark Streaming for runtime training and prediction
HDFS/S3
Kafka
Flume
Kinesis
Train
Predict
19
Tight Integration with SparkSQLand DataFrames
df.select($’image’).withColumn(
“image_type”, ImgClassifier(“image”))
.filter($’image_type’ == ‘dog’)
.show()
Image classification on ImageNet(http://www.image-net.org)
20
More BigDL Examples
BigDL provide examples to help developer play with bigdl and start with popular models.
https://github.com/intel-analytics/BigDL/wiki/Examples
Models(Train and Inference example code):§ LeNet, Inception, VGG, ResNet, RNN, Auto-encoder
Examples:
• Text Classification
• Image Classification
• Load Torch/Caffe model
top related