Machine Learning in Production with scikit-learn

Machine Learning in Production with scikit-learn. Jeff Klukas - Data Engineer at Simple

Upload: jeff-klukas

Post on 21-Jan-2018


TRANSCRIPT

Page 1: Machine learning in production with scikit-learn

Machine Learning in Production with scikit-learn

Jeff Klukas - Data Engineer at Simple

1


Page 3: Machine learning in production with scikit-learn

3

Overview

• What’s the problem we’re solving?

• Why machine learning?

• Walkthrough of developing the model

• ✨ Live demo ✨

• Complications of moving this workflow to production

• Other potential approaches


Page 5: Machine learning in production with scikit-learn

5

Categorizing chats

# SELECT subject, body, category FROM chats;

 subject       | body                      | category
---------------+---------------------------+-----------
 Check deposit | Hi how are you? I was…    | education
 Lost Card     | Can you send me a new…    | urgent
 my transfer   | My transfer of $10 isn’t… | education
 Mail deposits | I have a large check…     | education

urgent, customer education, new product, incidents, other


Page 8: Machine learning in production with scikit-learn

8

✨ 💖 Machine Learning 💖 ✨


Page 11: Machine learning in production with scikit-learn

11

sklearn.pipeline

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer)
from xgboost import XGBClassifier

stopwords, lemmatizer = …

# MessagePreprocessor and TextProcessor are custom steps (not shown in the slides)
pipeline = Pipeline([
    ('preprocess', MessagePreprocessor(subject_weight=2)),
    ('text', TextProcessor(stopwords, lemmatizer)),
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', XGBClassifier(objective='multi:softmax')),
])
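The slides do not show MessagePreprocessor, so the following is only a sketch of what such a step might look like. It assumes each message is a dict with "subject" and "body" keys and that subject_weight simply repeats the subject text; both are assumptions, as is the sample message. It is written in plain Python so that it runs standalone; a real pipeline step would normally also subclass sklearn.base.BaseEstimator and TransformerMixin so that its parameters can be tuned.

```python
class MessagePreprocessor:
    """Hypothetical sketch: merge subject and body into one string per message."""

    def __init__(self, subject_weight=1):
        self.subject_weight = subject_weight

    def fit(self, X, y=None):
        # Stateless step: nothing is learned from the training data.
        return self

    def transform(self, X):
        # Repeat the subject so its words count more in later term counts.
        return [
            " ".join([msg["subject"]] * self.subject_weight + [msg["body"]])
            for msg in X
        ]


# Made-up sample message, not taken from the talk's data.
messages = [{"subject": "Lost Card", "body": "I need a replacement"}]
texts = MessagePreprocessor(subject_weight=2).fit(messages).transform(messages)
print(texts[0])
```

Exposing fit and transform is what lets a step like this sit in front of CountVectorizer, which expects one string per document.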

Page 12: Machine learning in production with scikit-learn

12

Training the model

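import pandas as pd
from sklearn.model_selection import train_test_split

# pandas expects the query first, then the connection
data_frame = pd.read_sql(
    "SELECT category, subject, body FROM chats;", redshift_connection)

X = data_frame[['subject', 'body']]
y = data_frame['category']

# hold out a third of the chats for testing
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.33, random_state=0)

pipeline.fit(X_train, y_train)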
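To make the role of test_size and random_state concrete, here is a minimal plain-Python sketch of a seeded hold-out split. It illustrates the idea only and is not scikit-learn's implementation; the function name, the rounding rule, and the sample rows are assumptions.

```python
import random


def split_train_test(rows, test_size=0.33, seed=0):
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)  # assumed rounding rule
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = [f"chat-{i}" for i in range(100)]
train, test = split_train_test(rows)
again_train, again_test = split_train_test(rows)
print(len(train), len(test))
```

Fixing the seed means the same chats land in the test set on every run, so scores from different training runs stay comparable.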

Page 13: Machine learning in production with scikit-learn

13

Overfitting

https://en.wikipedia.org/wiki/Overfitting

Page 14: Machine learning in production with scikit-learn

14

Testing the model

from sklearn.metrics import classification_report

y_predicted = pipeline.predict(X_test)

print(classification_report(y_test, y_predicted))

             precision    recall  f1-score   support

    class 0       0.67      1.00      0.80         2
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.50      0.67         2

avg / total       0.67      0.60      0.59         5
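The columns of the report can be recomputed by hand from counts. The sketch below does that in plain Python; the label lists are an assumption, chosen only because they reproduce the numbers in the report above, and are not the talk's data.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} computed from counts."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1, support)
    return metrics


# Assumed labels that reproduce the report's per-class rows.
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
metrics = per_class_metrics(y_true, y_pred)
for label, (p, r, f1, n) in metrics.items():
    print(f"class {label}  {p:.2f}  {r:.2f}  {f1:.2f}  {n}")
```

Precision asks how many predictions of a class were right; recall asks how many true members of a class were found. The support column is the number of true examples per class.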

Page 15: Machine learning in production with scikit-learn

15

Serving the model in Flask

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/chat-classification-api/messages', methods=['POST'])
def classify_messages():
    """Classify given chat messages"""
    messages = request.get_json()

    y = pipeline.predict(messages)

    # join class labels back with identifiers
    predictions = [{"chat_id": message["chat_id"], "class_label": label}
                   for message, label in zip(messages, y)]

    return jsonify(predictions)
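The step that joins labels back to chat identifiers can be tried without Flask or a trained model. This is a minimal sketch: stub_predict is a hypothetical stand-in for pipeline.predict, the length check is an added assumption, and the sample messages are made up.

```python
def join_predictions(messages, labels):
    """Pair each predicted label with the chat_id of the message it came from."""
    if len(messages) != len(labels):
        raise ValueError("expected exactly one label per message")
    return [
        {"chat_id": message["chat_id"], "class_label": label}
        for message, label in zip(messages, labels)
    ]


def stub_predict(messages):
    # Hypothetical stand-in for pipeline.predict: flags subjects containing "lost".
    return ["urgent" if "lost" in m["subject"].lower() else "education"
            for m in messages]


messages = [
    {"chat_id": 1, "subject": "Lost Card", "body": "..."},
    {"chat_id": 2, "subject": "Mail deposits", "body": "..."},
]
predictions = join_predictions(messages, stub_predict(messages))
print(predictions)
```

The explicit length check is there because zip silently stops at the shorter input, which would otherwise drop messages without any error.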

Page 16: Machine learning in production with scikit-learn

16

Live Demo

Page 17: Machine learning in production with scikit-learn

17

How do we take this to production?


Page 19: Machine learning in production with scikit-learn

Step 1: Separate training and serving

19

Page 20: Machine learning in production with scikit-learn

20

Model Persistence

import pickle
import boto3

def write_to_s3(pipeline, key, bucket):
    s3_client = boto3.client("s3")
    kms_client = boto3.client("kms")

    # serialize the fitted pipeline, then encrypt it before upload
    pkl = pickle.dumps(pipeline)
    enc_pkl = my_encrypt_function(pkl, kms_client)

    s3_client.put_object(Bucket=bucket, Key=key, Body=enc_pkl,
                         ServerSideEncryption="AES256")

Page 21: Machine learning in production with scikit-learn

21

Model Persistence

import pickle
import boto3
from flask import current_app

def load_message_classifier(app):
    conf = app.config["MESSAGE_CLASSIFIER"]

    s3_client = boto3.client("s3")
    kms_client = boto3.client("kms")

    resp = s3_client.get_object(Bucket=conf["bucket"], Key=conf["path"])
    untrusted_bytes = resp["Body"].read()
    pkl = decrypt(untrusted_bytes, kms_client)

    with app.app_context():
        current_app._message_classifier = pickle.loads(pkl)
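The name untrusted_bytes points at a real concern: pickle.loads can execute arbitrary code when given malicious input, so the bytes should be checked before they are unpickled. The talk does this with KMS-based encryption through helpers that are not shown. The sketch below swaps in a different, simpler technique (an HMAC-SHA256 tag from the standard library) purely to illustrate the verify-before-unpickle pattern; the function names, the key, and the stand-in model object are all assumptions.

```python
import hashlib
import hmac
import pickle


def sign_and_dump(obj, secret_key):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(secret_key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(blob, secret_key):
    """Unpickle only if the tag matches; raise ValueError otherwise."""
    tag, payload = blob[:32], blob[32:]  # a SHA-256 digest is 32 bytes
    expected = hmac.new(secret_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("model blob failed integrity check")
    return pickle.loads(payload)


key = b"example-key-not-for-production"
model = {"labels": ["urgent", "education"], "version": 1}  # stand-in for a pipeline
blob = sign_and_dump(model, key)
restored = verify_and_load(blob, key)
print(restored == model)
```

An HMAC tag only proves the bytes were written by someone holding the key; unlike the encryption used in the talk, it does not hide the model's contents.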

Page 22: Machine learning in production with scikit-learn

Step 2: Provide an environment for batch training and evaluation

22

Page 23: Machine learning in production with scikit-learn

23

Optimizing Parameter Values

from sklearn.model_selection import GridSearchCV

params = {
    'preprocess__subject_weight': (1, 2, 3, 4, 5),
    'text__stopwords': ([], IGNORE, PUNCTUATION),
    'vect__max_df': (0.5, 0.75, 1.0),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'tfidf__use_idf': (True, False),
    'tfidf__norm': ('l1', 'l2'),
}

search = GridSearchCV(pipeline, params)
search.fit(X_train, y_train)
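A grid search tries every combination of the listed values, which is why it belongs in a batch environment. The sketch below counts the combinations for this grid using only the standard library. IGNORE and PUNCTUATION are replaced by placeholder strings because the talk does not show them, and the fold count is an assumed value, not one stated in the slides.

```python
from itertools import product

# Placeholders for the talk's IGNORE and PUNCTUATION stopword lists (not shown).
IGNORE, PUNCTUATION = "IGNORE", "PUNCTUATION"

params = {
    'preprocess__subject_weight': (1, 2, 3, 4, 5),
    'text__stopwords': ([], IGNORE, PUNCTUATION),
    'vect__max_df': (0.5, 0.75, 1.0),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'tfidf__use_idf': (True, False),
    'tfidf__norm': ('l1', 'l2'),
}

# One candidate per element of the Cartesian product of all value lists.
candidates = [dict(zip(params, values)) for values in product(*params.values())]
n_folds = 3  # assumed number of cross-validation folds
total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)
```

With 360 candidates, even a few folds means over a thousand pipeline fits, plus the final refit on the full training set that GridSearchCV performs by default.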

Page 24: Machine learning in production with scikit-learn

Step 3: Monitor performance, adapt to production load, degrade gracefully

24
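One way to read "monitor performance" and "degrade gracefully" is sketched below: time every prediction call and fall back to a default label when the model fails. This is an illustration under assumptions rather than the approach used at Simple; the function names are hypothetical, and "other" is used as the fallback only because it appears in the category list earlier in the talk.

```python
import time


def classify_with_fallback(predict, messages, fallback_label="other"):
    """Return (labels, elapsed_seconds, degraded) for one prediction call."""
    start = time.perf_counter()
    try:
        labels = list(predict(messages))
        degraded = False
    except Exception:
        # Degrade gracefully: callers still get one label per message.
        labels = [fallback_label] * len(messages)
        degraded = True
    elapsed = time.perf_counter() - start
    return labels, elapsed, degraded


def broken_predict(messages):
    # Hypothetical failure, e.g. the model could not be loaded.
    raise RuntimeError("model unavailable")


labels, elapsed, degraded = classify_with_fallback(broken_predict, ["a", "b"])
ok_labels, _, ok_degraded = classify_with_fallback(
    lambda msgs: ["urgent"] * len(msgs), ["a"])
print(labels, degraded, ok_labels, ok_degraded)
```

The elapsed time and the degraded flag are the kind of values that could be sent to whatever metrics system the service already uses.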

Page 25: Machine learning in production with scikit-learn

Other Approaches

25

Page 26: Machine learning in production with scikit-learn

26

Considerations

• How big is your team?

• How large of a problem space do you need to cover?

• What is your existing stack?

Page 27: Machine learning in production with scikit-learn

27

Off-the-Shelf


Page 30: Machine learning in production with scikit-learn

30

Recap

• Train and test in a batch environment

• Output serialized model and classification report

• sklearn.pipeline is convenient for storing code+params

• Serve on-demand predictions separately

• Treat this like any production service

Page 31: Machine learning in production with scikit-learn

Thank You

31

Page 32: Machine learning in production with scikit-learn

32

Questions ✨💖
