Micro Architecture for Machine Learning/Big Data Adam Gibson - East Bay JUG Sep 2015

Me:

CTO and co-founder of Skymind

GU Faculty Advisor

Book Author - Deep Learning: A Practitioner's Approach

Microservices

Benefits/Trade-offs

Monolithic - one app, easy to update at first

Microservices - meant for scale, modular components, easier for bigger teams

Software Development Life Cycle

Starts small

Early - minimum viable product / move fast and break things

Mid stage - the company will actually last, so now focus a bit on scale; expect some growing pains

Late stage - too many cooks in the kitchen; needs separation of concerns

Credit: http://startupquote.com/post/1624569753

No Silver Bullet - Different stages/sizes

In the SDLC - teams and companies of different sizes have different incentives

Microservices can also go wrong: http://martinfowler.com/articles/microservice-trade-offs.html

Takeaway - Do what’s right for your product

Data Science and ML

Stats + Software engineering

(Not actually true: we WISH it were)

Analytics and Products

A/B Testing (does this button increase my revenue/CTR?)

Data/Analytics Products: Think BI + some machine learning depending on the application
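For the A/B testing question on this slide ("does this button increase my CTR?"), one common way to check whether two click-through rates really differ is a two-proportion z-test. The sketch below is a minimal illustration, not the method from the talk; it assumes we only have click and view counts per variant, and the counts are made up.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for H0: CTR_A == CTR_B. Returns (z, p_value)."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants perform the same
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value for a standard normal Z: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: old button (A) vs. new button (B)
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The z-test relies on a normal approximation, so it assumes reasonably large counts; fixing the sample size before looking at the results also avoids inflating false positives.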

Machine Learning

Given some observations, make inferences based on trends in the data

Label stuff (supervised learning)

Predict something (regression)

Group stuff (clustering)

Regression

Given some attributes (features), predict a continuous value

Attributes of a house - predict its price

Price movements in the stock market
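As a concrete instance of the house-price example, here is a minimal sketch of ordinary least-squares regression with a single feature (size in square feet). The data points are hypothetical; real models usually use many attributes, which generalizes this to multiple regression.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size in square feet -> sale price
sizes = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]
slope, intercept = fit_simple_linear_regression(sizes, prices)
print(slope * 1800 + intercept)  # → 320000.0
```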

Classification

Churn prediction (will churn or not churn)

Big spender or not big spender

Spam or not spam

Fraud or not fraud

Picture of? (cat/dog/cow)
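A binary classifier like "spam or not spam" can be sketched with logistic regression trained by batch gradient descent. This is only an illustration: the single feature (a count of "spammy" words per email) and the toy data are hypothetical, and production systems would use many features and a library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.1, epochs=5000):
    """Fit p(spam | x) = sigmoid(w * x + b) with batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (spam) if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical feature: number of "spammy" words in an email; label 1 = spam
xs = [0, 1, 2, 5, 6, 7]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print([predict(w, b, x) for x in xs])
```

The same code covers churn or fraud by swapping in different features and labels; the threshold can be moved when false positives and false negatives have different costs.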

Clustering
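Clustering ("group stuff") can be illustrated with k-means, one widely used clustering algorithm among many. The sketch below assumes 2-D points and uses the naive initialization of taking the first k points as starting centroids; the points themselves are made up.

```python
import math

def kmeans(points, k, max_iters=100):
    """Plain k-means: alternate assigning points to the nearest centroid and recomputing centroids."""
    # Naive initialization: the first k points are the starting centroids
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for each point
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

# Hypothetical 2-D data with two obvious groups
points = [(1, 1), (8, 8), (1.5, 2), (8.5, 9), (1, 1.5), (9, 8)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Results depend on initialization, so practical implementations typically use smarter seeding (such as k-means++) and several restarts.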

Workflow

Some problem needs to be solved

Exploratory Data Analysis

Extract, Transform, Load (ETL)

Normalize (may be part of the loading process, depending on the data warehousing process, if any)

Visualize

Determine a way to solve the problem, if there is one

Get some cursory results
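The "Normalize" step in this workflow is not pinned down on the slide; one common choice is min-max scaling of each feature column into [0, 1]. A minimal sketch, assuming the data is a list of numeric rows (the example rows are hypothetical):

```python
def min_max_normalize(values):
    """Scale a list of numbers into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def normalize_columns(rows):
    """Apply min-max scaling independently to each column of a row-oriented table."""
    columns = list(zip(*rows))
    scaled = [min_max_normalize(col) for col in columns]
    return [list(row) for row in zip(*scaled)]

# Hypothetical rows: [square_feet, bedrooms]
print(normalize_columns([[1000, 3], [1500, 2], [2000, 4]]))
```

Scaling matters because features on very different scales (square feet vs. bedroom counts) can dominate distance-based methods such as k-means. In production, the min and max should be computed on training data and reused at serving time.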

EDA Cont.

Validate results

Scope out problem

Deploy results
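One standard way to "validate results" before deploying is to hold out part of the data, fit only on the rest, and measure performance on the held-out part. A minimal sketch, assuming classification labels and plain accuracy as the metric; the function names and the 20% split are illustrative choices, not from the talk.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split off a held-out test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

The key property is that the test rows are never seen during fitting, so the held-out score is a less optimistic estimate of how the model will do in production.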

Parallels to Software Engineering

Both have a lifecycle to follow, with different standards for different stages

Data science teams are also similar in that you start by building the data infrastructure and add the analysis later

Machine Learning + Software

Machine learning is integrated into software

Often a second-class citizen when it comes to engineering standards

Data scientists often don’t think about production

ML Software

Data pipelines (the complete path of the data all the way through to the model) are often messy/ad hoc

Doing it right involves 2+ teams: data engineers AND data scientists

Data engineers often don’t know much ML, and data scientists don’t know much about software infrastructure

Components

ETL/Vectorization

Data Manipulation

Data Exploration

Model building

Model integration with serving system (lambda architecture)
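The ETL/Vectorization component turns raw records into fixed-length numeric vectors that a model can consume. A minimal sketch, assuming a hypothetical customer record with one numeric field (`monthly_spend`) and one categorical field (`plan`) that gets one-hot encoded; the schema is invented for illustration.

```python
def build_vocabulary(records, field):
    """Collect the distinct values of a categorical field, in a stable (sorted) order."""
    return sorted({r[field] for r in records})

def vectorize(record, plan_vocab):
    """Turn one raw record into a fixed-length numeric vector.

    Layout (hypothetical schema): [monthly_spend, one-hot(plan)...].
    A plan value not seen in plan_vocab becomes all zeros.
    """
    one_hot = [1.0 if record["plan"] == v else 0.0 for v in plan_vocab]
    return [float(record["monthly_spend"])] + one_hot

raw = [
    {"plan": "basic", "monthly_spend": 20},
    {"plan": "pro", "monthly_spend": 50},
    {"plan": "team", "monthly_spend": 120},
]
vocab = build_vocabulary(raw, "plan")  # ['basic', 'pro', 'team']
print([vectorize(r, vocab) for r in raw])
```

Because the vocabulary is built once and then reused, the same vector layout is produced at training time and at serving time.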

Should be separated

ETL not tied to machine learning models

Everything should be swappable

Should run interchangeably on different platforms
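The "separated, swappable" idea can be sketched by having the ETL step and the model depend only on small interfaces, so either side can be replaced without touching the other. This is a hypothetical design in plain Python using `typing.Protocol`, not Deeplearning4j's API; all class names and the record fields are invented for illustration.

```python
from typing import Any, Dict, List, Protocol, Sequence

Record = Dict[str, Any]

class Vectorizer(Protocol):
    def transform(self, records: Sequence[Record]) -> List[List[float]]: ...

class Model(Protocol):
    def fit(self, X: List[List[float]], y: List[int]) -> None: ...
    def predict(self, X: List[List[float]]) -> List[int]: ...

class FieldVectorizer:
    """Hypothetical ETL step: pulls named numeric fields out of raw records."""
    def __init__(self, fields: Sequence[str]):
        self.fields = list(fields)

    def transform(self, records: Sequence[Record]) -> List[List[float]]:
        return [[float(r[f]) for f in self.fields] for r in records]

class MajorityClassModel:
    """Baseline model: always predicts the most common training label."""
    def fit(self, X: List[List[float]], y: List[int]) -> None:
        self.label = max(set(y), key=y.count)

    def predict(self, X: List[List[float]]) -> List[int]:
        return [self.label for _ in X]

class NearestCentroidModel:
    """Predicts the class whose mean feature vector is closest (squared Euclidean distance)."""
    def fit(self, X: List[List[float]], y: List[int]) -> None:
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, X: List[List[float]]) -> List[int]:
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return [min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab])) for x in X]

class Pipeline:
    """Wires any Vectorizer to any Model; neither knows about the other."""
    def __init__(self, vectorizer: Vectorizer, model: Model):
        self.vectorizer = vectorizer
        self.model = model

    def fit(self, records: Sequence[Record], labels: List[int]) -> "Pipeline":
        self.model.fit(self.vectorizer.transform(records), labels)
        return self

    def predict(self, records: Sequence[Record]) -> List[int]:
        return self.model.predict(self.vectorizer.transform(records))

# Hypothetical churn data: 1 = churned
records = [
    {"spend": 80, "tickets": 0}, {"spend": 75, "tickets": 1},
    {"spend": 20, "tickets": 5}, {"spend": 70, "tickets": 0},
    {"spend": 15, "tickets": 6},
]
labels = [0, 0, 1, 0, 1]
etl = FieldVectorizer(["spend", "tickets"])
for model in (MajorityClassModel(), NearestCentroidModel()):  # swap models, same ETL
    print(type(model).__name__, Pipeline(etl, model).fit(records, labels).predict([{"spend": 18, "tickets": 4}]))
```

Swapping the baseline for the nearest-centroid model changes one constructor argument; the ETL code is untouched, which is the kind of separation the slide argues for.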

Deeplearning4j

Questions?