Wrangleconf Big Data Malaysia 2016
TRANSCRIPT
![Page 1: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/1.jpg)
![Page 2: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/2.jpg)
Overview
● Brief Skymind intro
● Deep learning outside research
● Core trends for ROI in deep learning
● Anomaly detection with deep learning
● Simbox fraud detection for telco
● Network intrusion
● Fintech securities churn prediction
● Real-time corporate campus security: detecting dangerous objects
![Page 3: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/3.jpg)
Distributed Deep RL on Spark
We built Deeplearning4j
![Page 4: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/4.jpg)
SKYMIND INTELLIGENCE LAYER (SKIL) REFERENCE ARCHITECTURE
![Page 5: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/5.jpg)
Deep Learning outside research
● Too much hype
● Most companies rarely do machine learning, let alone deep learning
● Beginners try to jump to deep learning after Andrew Ng's Coursera class without first principles
This is not deep learning.
This is deep learning.
![Page 6: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/6.jpg)
Deep Learning outside research
● Mostly Python and R on Kaggle
● Many learning from Udacity
● Most deep learning is at the research or enthusiast stage
● Salaried engineers doing DL are mostly publishing papers
● Large fight for talent (see Google fellowship)
![Page 7: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/7.jpg)
Deep Learning outside research
● Deep learning hasn't penetrated the Fortune 2000
● The Fortune 2000 wants ROI, not cat pictures
● Many organizations are just NOW starting to take software seriously, let alone data science
● Use cases for deep learning are still not widely understood
● Large fight for talent (see Google fellowship)
![Page 8: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/8.jpg)
Core trends for ROI in DL
● Mostly funded by adtech companies
● Companies doing DL have lots of media data (audio, image, video)
● Many companies use DL for ad targeting
● Best use cases target understanding large-scale hidden patterns in data (often cross-domain)
● Time series has largely been ignored
![Page 9: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/9.jpg)
Core trends for ROI in DL
● First attempts at deep learning follow papers (no other examples)
● Many companies end up sticking to simpler techniques after trying DL
● Expectations for DL tend to match hype, not reality
● Some rare cases exist outside this trend (mainly in Asia)
![Page 10: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/10.jpg)
Core trends for ROI in DL
For more trends see: https://www.oreilly.com/ideas/the-current-state-of-machine-intelligence-3-0
![Page 11: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/11.jpg)
Anomaly Detection
● “Find the needle in the haystack”
● “Find the bad guy”
● “The machine's about to break!”
● “Find the next market rally”
● “Take action on said anomaly”
![Page 12: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/12.jpg)
Anomaly Detection with deep learning
● Both unsupervised and supervised techniques
● LSTMs (time series neural net)
● Autoencoders (unsupervised)
● Expectations for DL tend to match hype, not reality
● Some rare cases exist outside this trend (mainly in Asia)
LSTM
AutoEncoder
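The unsupervised autoencoder idea on this slide can be sketched as follows: fit a model to reconstruct normal data, then flag inputs whose reconstruction error is unusually high. This is a minimal sketch, not the talk's implementation. It assumes a tiny linear autoencoder with a one-dimensional code, trained in pure Python on hypothetical 2-D points, and a mean-plus-three-standard-deviations threshold; the names `train_linear_autoencoder`, `reconstruction_error`, and `is_anomaly` are illustrative only.

```python
import statistics

def encode(x, e):
    """Project input x onto the encoder weights e to get a 1-D code."""
    return sum(ei * xi for ei, xi in zip(e, x))

def reconstruction_error(x, e, d):
    """Squared error between x and its reconstruction code * d."""
    h = encode(x, e)
    return sum((h * di - xi) ** 2 for di, xi in zip(d, x))

def train_linear_autoencoder(data, lr=0.01, epochs=2000):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    dim, n = len(data[0]), len(data)
    e = [0.1] * dim  # encoder weights (fixed init keeps the run deterministic)
    d = [0.1] * dim  # decoder weights
    for _ in range(epochs):
        grad_e = [0.0] * dim
        grad_d = [0.0] * dim
        for x in data:
            h = encode(x, e)
            residual = [h * di - xi for di, xi in zip(d, x)]
            rd = sum(ri * di for ri, di in zip(residual, d))
            for j in range(dim):
                grad_d[j] += 2 * h * residual[j] / n
                grad_e[j] += 2 * rd * x[j] / n
        e = [ei - lr * g for ei, g in zip(e, grad_e)]
        d = [di - lr * g for di, g in zip(d, grad_d)]
    return e, d

# Hypothetical "normal" data: points close to the line y = 2x.
normal = [(-1.0, -1.95), (-0.5, -1.05), (0.0, 0.05), (0.5, 0.95), (1.0, 2.05)]
e, d = train_linear_autoencoder(normal)

# Assumed rule: anything far above the training errors counts as an anomaly.
train_errors = [reconstruction_error(x, e, d) for x in normal]
threshold = statistics.mean(train_errors) + 3 * statistics.pstdev(train_errors)

def is_anomaly(x):
    return reconstruction_error(x, e, d) > threshold

print(is_anomaly((0.3, 0.6)), is_anomaly((2.0, -4.0)))
```

A point that follows the learned pattern reconstructs well, while a point far off the line does not. A real deployment would use a deeper nonlinear network and a threshold tuned on held-out data.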
![Page 13: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/13.jpg)
Simbox fraud for telco
● Costs telcos over 3 billion yearly
● Route calls for free over a carrier network
● Need to mine raw call detail records (CDRs) to find
● Find and cluster fraudulent CDRs with autoencoders (unsupervised)
● Beats current rule-based and supervised approaches
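Before an autoencoder can cluster CDRs, the raw records have to become numeric vectors. The sketch below shows one possible way to do that by aggregating per SIM. The record keys (`caller`, `callee`, `cell_id`) and the four features are assumptions chosen for illustration, not the feature set used in the talk.

```python
from collections import defaultdict

def sim_features(cdrs):
    """Aggregate CDRs into one feature vector per calling SIM.

    cdrs: iterable of dicts with hypothetical keys 'caller', 'callee', 'cell_id'.
    Returns {sim: [outgoing_calls, incoming_per_outgoing,
                   distinct_callee_ratio, distinct_cells]}.
    """
    outgoing = defaultdict(int)
    incoming = defaultdict(int)
    callees = defaultdict(set)
    cells = defaultdict(set)
    for rec in cdrs:
        outgoing[rec["caller"]] += 1
        incoming[rec["callee"]] += 1
        callees[rec["caller"]].add(rec["callee"])
        cells[rec["caller"]].add(rec["cell_id"])

    features = {}
    for sim, out in outgoing.items():
        features[sim] = [
            out,                        # outgoing call volume
            incoming.get(sim, 0) / out, # incoming calls per outgoing call
            len(callees[sim]) / out,    # share of distinct callees
            len(cells[sim]),            # number of cells the SIM was seen in
        ]
    return features

# Made-up records for illustration only.
cdrs = [
    {"caller": "A", "callee": "B", "cell_id": 1},
    {"caller": "A", "callee": "C", "cell_id": 1},
    {"caller": "A", "callee": "D", "cell_id": 1},
    {"caller": "A", "callee": "E", "cell_id": 1},
    {"caller": "B", "callee": "C", "cell_id": 2},
    {"caller": "C", "callee": "B", "cell_id": 3},
    {"caller": "B", "callee": "C", "cell_id": 4},
]
features = sim_features(cdrs)
print(features["A"])
```

Vectors like these could then be fed to an autoencoder, with the learned codes or reconstruction errors used to group suspicious SIMs.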
![Page 14: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/14.jpg)
Network Intrusion
● Raw web log traffic
● Detect attacks at points of origin
● Typically supervised learning
● Goal: Classify raw time series to find attacks
● Optional: Detect *kind* of attack
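To classify a raw time series with a supervised model, the series is usually cut into fixed-length labelled windows first. The following is a minimal sketch of that preparation step, assuming a per-second request count and a per-second attack flag as inputs; `make_windows` and the sample numbers are hypothetical.

```python
def make_windows(series, labels, width, stride=1):
    """Slice a series into overlapping windows for a sequence classifier.

    A window is labelled 1 if any time step inside it is flagged as an attack.
    """
    windows, targets = [], []
    for start in range(0, len(series) - width + 1, stride):
        windows.append(series[start:start + width])
        targets.append(int(any(labels[start:start + width])))
    return windows, targets

# Made-up traffic: a burst of requests at seconds 3 and 4 is marked as an attack.
requests_per_second = [3, 4, 2, 90, 95, 3]
attack_flags = [0, 0, 0, 1, 1, 0]

windows, targets = make_windows(requests_per_second, attack_flags, width=3)
for w, t in zip(windows, targets):
    print(w, t)
```

Each window and target pair is one training example for a recurrent net. Detecting the kind of attack would replace the binary target with a class label.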
![Page 15: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/15.jpg)
Fintech securities churn prediction
● Predict when a user is going to leave the service
● Using recurrent nets, find the likelihood of leaving
● Using lift curves, identify the budget for sending discounts to the percentage of users “worth” saving
● Optional: use autoencoders with k-means to identify groups of users wanting to leave
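The lift-curve step can be sketched as follows: rank users by predicted churn likelihood, then compare the churn rate among the top-ranked users with the overall rate. This is an illustrative sketch under stated assumptions. The scores stand in for recurrent-net outputs, and `cumulative_lift`, `users_within_budget`, and all numbers are hypothetical.

```python
def cumulative_lift(scores, churned):
    """Return [(fraction_targeted, lift)] with users ranked by descending score.

    lift = churn rate among the targeted users / overall churn rate.
    Assumes at least one user in `churned` actually left.
    """
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    base_rate = sum(churned) / len(churned)
    curve, hits = [], 0
    for k, (_, label) in enumerate(ranked, start=1):
        hits += label
        curve.append((k / len(ranked), (hits / k) / base_rate))
    return curve

def users_within_budget(budget, cost_per_discount):
    """How many top-ranked users a discount budget can cover."""
    return int(budget // cost_per_discount)

# Made-up model scores and observed outcomes (1 = user left).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

curve = cumulative_lift(scores, churned)
k = users_within_budget(budget=100, cost_per_discount=25)
fraction, lift = curve[k - 1]
print(f"Target top {fraction:.0%} of users, lift {lift:.2f}")
```

A lift well above 1 for the affordable fraction suggests the discounts are reaching users who are much more likely to leave than average.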
![Page 16: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/16.jpg)
Corporate campus security
● At 30 FPS or more, find dangerous objects in a crowd
● Identify a target object and send an immediate report
● Uses variants of convolutional nets
● Imagine hooking this up to a real camera
![Page 17: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/17.jpg)
Conclusion
● Deep learning is still young
● Many use cases are not being tried
● Research is moving faster every year
● Talent is still hard to find
● Will become more common with time
![Page 18: Wrangleconf Big Data Malaysia 2016](https://reader034.vdocuments.mx/reader034/viewer/2022051318/587504b21a28ab29208b5f6d/html5/thumbnails/18.jpg)