operationalizing security data science for the cloud: challenges, solutions, and trade-offs

Upload: ram-shankar-siva-kumar

Post on 11-Apr-2017


TRANSCRIPT

Page 1: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs
Page 2: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Choosing the Learner

Binary Classification

Regression

Multiclass Classification

Unsupervised

Ranking

Anomaly Detection

Collaborative Filtering

Sequence Prediction

Reinforcement Learning

Representation Learning

Page 3: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Choosing the Learning Task: Binary Classification • Anomaly Detector • Ranking

Defining Data Input: Data loaders (text, binary, SVM light, Transpose loader) • Data type

Applying Data Transforms: Cleaning missing data • Dealing with categorical data • Dealing with text data • Data normalization

Choosing the Learner: Binary Classification • Regression • Multiclass • Unsupervised • Ranking • Anomaly Detection • Collaborative Filtering • Sequence Prediction

Choosing Output: Save the features of a model? • Save the model as text? • Save the model as binary? • Save the per-instance results?

Choosing Run Options: Run locally? • Run distributed on an HPC cluster? • Are all paths in the experiment node-accessible? • Priority? • Max concurrent processes?

View Results: Too large? Sample it. Right size? Load the data • Histograms, per feature • Sampled instances

Debug and Visualize Errors: Error in the data • Error in the learner • Error in the optimizer • Error in the experimentation setup

Analyze Model Predictions: Root-cause analysis • Grading
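
To make the flow above concrete, here is a minimal sketch of one pass through it using scikit-learn; the file name, feature columns, and choice of learner are illustrative assumptions, not from the deck.

```python
# Sketch of the pipeline: define input, transform, choose a learner, save output.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Defining data input: load a (hypothetical) labeled log sample.
df = pd.read_csv("security_logs.csv")
X, y = df.drop(columns=["label"]), df["label"]

numeric = ["bytes_out", "login_hour"]      # assumed numeric features
categorical = ["country", "user_agent"]    # assumed categorical features

# Applying data transforms: clean missing data, encode categoricals, normalize.
transforms = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

# Choosing the learner: binary classification.
model = Pipeline([("transforms", transforms),
                  ("learner", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model.fit(X_train, y_train)

# Choosing output: save the model as binary, plus the per-instance results.
joblib.dump(model, "model.bin")
pd.DataFrame({"score": model.predict_proba(X_test)[:, 1]}).to_csv("scores.csv", index=False)
```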

Page 4: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

(Page 4 repeats the Page 3 pipeline as an animated build, stepping through Choosing the Learning Task → Defining Data Input → Applying Data Transforms → Choosing the Learner → Choosing Output → Choosing Run Options → View Results → Debug and Visualize Errors → Analyze Model Predictions.)

Page 5: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Operationalizing Security Data Science

Ram Shankar Siva Kumar (@ram_ssk), Andrew Wicker

Microsoft

Page 6: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Security data science projects are different
• Traditional programming projects: spec/prototype → implement → ship
• Data science projects: at each stage, relabel, refeaturize, retrain
  • With data-driven features, all components drift:
    • Learner: more accurate / faster / lower memory footprint / …
    • Features: there are always better ones
    • Data: all distributions drift
• Security projects: at each stage, assess the threat, build detections, respond
  • All components drift:
    • Threat: new attacks constantly come out
    • Detection: newer log sources
    • Response: better tooling, newer TSGs

So wait… when do we ship?

Page 7: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

You ship when your solution is operational

Security Experts

Engineers

Legal

Service Engineers

Product Managers

Machine Learning Experts


Page 8: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Operational is more than your “model is working”…

Detect unusual user activity to prevent data exfiltration

Detect unusual user activity using Application logs, with a false positive rate < 1%, for all Azure customers, in near real-time


Page 9: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Detect unusual user activity using Application logs, with a false positive rate < 1%, for all Azure customers, in near real-time

=> The Problem => Data => Model Evaluation => Model Deployment => Model Scale-out

Operationalize Security Data Science: Components


Page 10: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Evaluation: How do you know your system works?

Page 11: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs


Page 12: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs


Page 13: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Evaluation Metrics
• E.g., false positives
• Makes your customer (and ergo, your business) happy
• How do you measure this?

Model Usage Metrics
• E.g., call rate: how much is the model in use?
• Makes your division happy
• Collected by your pipeline after deployment

Model Validation Metrics
• E.g., MSE, reconstruction error, …
• How well does the model generalize?
• Makes the data scientist happy
• Comes pre-built with ML frameworks (scikit-learn, CNTK)
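
A toy sketch of how the first and third metric families are computed (usage metrics, as the slide notes, come from post-deployment telemetry); the labels and scores below are made up.

```python
# Evaluation vs. validation metrics on toy labels/predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, mean_squared_error

y_true = np.array([0, 0, 0, 1, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 1, 0, 0, 0, 1])
y_score = np.array([.1, .8, .2, .9, .4, .3, .1, .7])

# Model evaluation metric (makes the customer happy): false positive rate.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("FP rate:", fp / (fp + tn))

# Model validation metric (makes the data scientist happy): e.g. MSE of the scores.
print("MSE:", mean_squared_error(y_true, y_score))

# Model usage metrics (call rate, etc.) are collected by your pipeline after
# deployment, e.g. scoring requests per hour from service telemetry.
```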


Page 14: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Evaluation: How do you gather an evaluation dataset?
• Good: Use benchmark datasets
  • List of curated datasets: www.secrepo.com
  • Con: Remember, attackers have them too!
• Better: Use previous Indicators of Compromise (IOCs)
  • Honeypots, commercial IOC feeds
  • Steps: gather confirmed IOCs; "backprop" them through the generated alerts (see the sketch below); this lets you calculate FP and FN rates
• Best: Curate your own dataset

(The options run from least to most specialized.)
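
As referenced above, a minimal sketch of the IOC "backprop" step, assuming alerts and IOCs can each be reduced to sets of entity identifiers:

```python
# Backprop confirmed IOCs through generated alerts to estimate FP/FN.
# The entities below are illustrative, not from the deck.
alerts = {"userA", "userB", "10.0.0.5"}    # entities your model alerted on
iocs = {"userB", "10.0.0.5", "10.0.0.9"}   # confirmed indicators of compromise

true_positives = alerts & iocs    # alerts that match a confirmed IOC
false_positives = alerts - iocs   # alerts with no confirmed compromise
false_negatives = iocs - alerts   # compromises the model missed

print(f"TP={len(true_positives)} FP={len(false_positives)} FN={len(false_negatives)}")
```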


Page 15: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Curating your own dataset, option 1: Inject fake malicious data

How: Label data as "eviluser" and check whether "eviluser" pops to the top of the reports every day.

Pro: Low overhead; you don't have to depend on a red team to test your detection.

Con: The injected data may not be representative of true attacker activity.

(Diagram: synthetic data is injected into storage, flows through the model, and should surface in the alerting system.)
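
A minimal sketch of that injection check, assuming a hypothetical scored daily report with "user" and "score" columns:

```python
# Inject a synthetic "eviluser" and verify it surfaces in the daily report.
import pandas as pd

def inject_eviluser(df: pd.DataFrame) -> pd.DataFrame:
    """Append synthetic records labeled 'eviluser' with exaggerated activity."""
    fake = pd.DataFrame([{"user": "eviluser", "files_downloaded": 10_000,
                          "login_hour": 3, "new_country": True}])
    return pd.concat([df, fake], ignore_index=True)

def eviluser_detected(report: pd.DataFrame, top_n: int = 10) -> bool:
    """Daily check: does 'eviluser' pop to the top of the scored report?"""
    return "eviluser" in set(report.nlargest(top_n, "score")["user"])
```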


Page 16: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Curating your own dataset, option 2: Employ commonly used attacker tools

How: Spin up a malicious process using Metasploit, PowerSploit, or Veil in your environment, then look for traces in your logs.

Pro: Easy to implement; with a little tutorial, your development team can run the tool and generate attack data in the logs.

Con: The machine learning system will only learn to detect known attacker toolkits and will not generalize over the attack methodology.

(Diagram: tainted data from the tool run lands in storage, flows through the model, and into the alerting system.)


Page 17: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Curating your own dataset, option 3: A red team pentests your environment

How: A red team attacks the system, and we collect the logs from the attacks as tainted data.

Pro: The closest technique to real-world attacks.

Con: Red team engagements are point-in-time exercises, and they are expensive.

(Diagram: tainted data from the red team exercise lands in storage, flows through the model, and into the alerting system.)


Page 18: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Growing your dataset: Generative Adversarial Networks

Source: https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f#.djcfc6eo0

Source: http://www.evolvingai.org/ppgn
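
In the spirit of the first linked post ("GANs in 50 lines of code"), a minimal PyTorch sketch of growing a tabular dataset with a GAN; the feature width, network sizes, and the stand-in "real" data are all illustrative assumptions.

```python
# Minimal GAN over feature vectors: G learns to mimic real attack features.
import torch
import torch.nn as nn

FEAT = 8                                   # assumed feature-vector width
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, FEAT))
D = nn.Sequential(nn.Linear(FEAT, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
loss = nn.BCELoss()
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)

real_data = torch.randn(256, FEAT)         # stand-in for real attack features

for step in range(1000):
    # Train D: real samples -> 1, generated samples -> 0.
    fake = G(torch.randn(64, 16)).detach()
    real = real_data[torch.randint(0, 256, (64,))]
    d_loss = loss(D(real), torch.ones(64, 1)) + loss(D(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Train G: fool D into scoring generated samples as real.
    fake = G(torch.randn(64, 16))
    g_loss = loss(D(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

synthetic = G(torch.randn(100, 16)).detach()   # new synthetic evaluation samples
```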

Page 19: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Deployment: Tailoring alerts based on customers' geographic location

Page 20: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Azure has data centers all around the world!


Page 21: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Localization affects model building
• Privacy laws vary across the board: an IP address is treated as EII in some regions but not in others
• "Anyone logging into the corporate network at midnight during the weekend is anomalous": the weekend in the Middle East != the weekend in the Americas; seasonality varies


Page 22: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Option 1: Shotgun Deployment
• How: Deploy the same model code across different regions
• Pros: Easy deployment; uniform metrics; a single TSG to debug all service incidents
• Cons: You lose macro trends in favor of micro trends; model-region incompatibility

(Diagram: identical copies of the model deployed to Region 1, Region 2, and Region 3.)


Page 23: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Option 2: Tiered Modeling
• How:
  • Federated models: each region is modeled separately; results are scrubbed according to compliance laws and privacy agreements; the scrubbed results are used as input to "Model Prime"
  • Model Prime: results are collated to search for global trends (see the sketch below)
• Pros: Bespoke modeling for every region; a balance between micro and macro modeling
• Cons: Complicated deployment; depending on the agreements, Model Prime may not be possible

(Diagram: Models 1-3, one per region, each send scrubbed results up to Model Prime.)
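
As referenced above, a minimal sketch of the tiered flow; the function names and per-region scrubbing rules are illustrative assumptions, not from the deck.

```python
# Tiered modeling: regional results are scrubbed, then collated by Model Prime.

def run_regional_model(region: str) -> list[dict]:
    """Stand-in for a region's federated model producing per-user results."""
    return [{"user_id": 1, "score": 0.9, "ip_address": "10.0.0.5", "region": region}]

def scrub(results: list[dict], region: str) -> list[dict]:
    """Drop fields a region's privacy rules treat as EII (e.g. IP addresses)."""
    banned = {"eu": {"ip_address", "user_name"}, "us": {"user_name"}}.get(region, set())
    return [{k: v for k, v in r.items() if k not in banned} for r in results]

def model_prime(scrubbed_by_region: dict[str, list[dict]]) -> list[dict]:
    """Collate scrubbed regional results to look for global trends."""
    return [r for results in scrubbed_by_region.values() for r in results]

regional = {"eu": run_regional_model("eu"), "us": run_regional_model("us")}
global_view = model_prime({r: scrub(res, r) for r, res in regional.items()})
```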


Page 24: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Scale-Out A Case Study

Page 25: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Detecting Malicious Activities: Detect risky or malicious activity in SharePoint Online activity logs, with precision > 90%, for all SPO users, in near real-time

=> The Problem => Data => Model Evaluation => Model Deployment => Model Scale-out


Page 26: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Exploratory Analysis
• Typical data science work: sample data; a script for preprocessing the data; summary statistics; a script for evaluating approaches
• All done locally on a dev machine using R/Python: this facilitates quick turnaround and avoids having to debug at scale
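
A sketch of that local loop in Python/pandas; the file name and columns are assumed for illustration.

```python
# Local exploratory pass: sample, preprocess, summarize.
import pandas as pd

df = pd.read_csv("spo_activity_sample.csv")       # small local sample, not prod data
df = df.dropna(subset=["user_id", "operation"])   # quick preprocessing pass
print(df.describe(include="all"))                 # summary statistics
print(df["operation"].value_counts().head(10))    # most common activities
```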


Page 27: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Evaluation
• Labels from known incidents and investigations
• Inject labels by mimicking malicious activity: the SPO team helps us understand the malicious activity; the red team helps us simulate it
• Target: > 90% precision


Page 28: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model: Bayesian Network
• A probabilistic graphical model, related to GMMs, CRFs, and MRFs
• Represents variables and conditional-independence assertions in a directed acyclic graph
  • Directed edges encode conditional dependencies
  • A conditional probability distribution for each variable

(Diagram: the classic alarm network, with Burglary and Earthquake as parents of Alarm, which in turn is the parent of JohnCalls and MaryCalls.)
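
A minimal sketch of inference by the chain rule over the diagrammed network, using the textbook CPT values (Russell & Norvig) for illustration; the deck does not give its own numbers.

```python
# Joint probability over the burglary/earthquake alarm network.
P_B, P_E = 0.001, 0.002                           # P(Burglary), P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                   # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Chain rule over the DAG: P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(no burglary, no earthquake, alarm, John calls, Mary calls) ~= 0.000628
print(joint(False, False, True, True, True))
```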


Page 29: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Initial Prototype (v0.1)
• One activity model for all users
• Run the model in a cloud environment on an Azure Worker Role
• Storage accounts for the input data and the output scores
• Pros: Easy to manage; small memory footprint
• Cons: Does not scale; low throughput

(Diagram: activity data for Users 1-3 flows from a storage account into the single activity model on the worker role, which writes scores back to storage.)


Page 30: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Improved Approach
• One model for each user
  • Personalized activity suspiciousness
  • Cluster low-activity users for better model results
• Replace storage accounts with Azure Event Hubs: low-latency, cloud-scale "queues" (see the consumer sketch below)

(Diagram: events for Users 1-3 enter an Event Hub, the worker role scores them against per-user Models 1…n, and the scores flow out through a second Event Hub.)
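
As referenced above, a minimal consumer sketch using the azure-eventhub Python SDK; the connection string, hub name, and routing step are placeholders.

```python
# Consume per-user activity events from an Azure Event Hub.
from azure.eventhub import EventHubConsumerClient

def on_event(partition_context, event):
    activity = event.body_as_json()   # one user-activity record
    # ... route the record to that user's model for scoring (not shown) ...

client = EventHubConsumerClient.from_connection_string(
    "<connection-string>", consumer_group="$Default", eventhub_name="user-activity")
with client:
    client.receive(on_event=on_event, starting_position="-1")  # from stream start
```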


Page 31: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Scale-Out: Memory
• Millions of per-user models: more than can fit in the worker role's memory
• Store the models in a storage account and load them as needed

(Diagram: the same pipeline, with Models 1…n loaded on demand from a model storage account.)


Page 32: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Model Scale-Out: Latency
• The model storage account adds too much latency
• A Redis cache minimizes model-loading latency
• An LRU policy evicts models as we process user activity events (see the sketch below)

(Diagram: the same pipeline, with a Redis cache sitting between the worker role and model storage.)
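
As referenced above, a minimal sketch of the cache-aside pattern this implies, using redis-py; LRU eviction would be Redis server configuration (e.g. maxmemory-policy allkeys-lru), and load_model_from_storage is a hypothetical helper.

```python
# Check the Redis cache before hitting the (slow) model storage account.
import pickle
import redis

r = redis.Redis(host="localhost", port=6379)

def load_model_from_storage(user_id: str):
    """Stand-in for fetching a serialized per-user model from blob storage."""
    ...

def get_user_model(user_id: str):
    cached = r.get(f"model:{user_id}")
    if cached is not None:
        return pickle.loads(cached)             # cache hit: skip storage latency
    model = load_model_from_storage(user_id)    # cache miss: slow path
    r.set(f"model:{user_id}", pickle.dumps(model))
    return model
```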


Page 33: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Data Compliance
• Models cannot use certain PII
• Balkanized cloud environments
• Tiered model development
• Resolve user information for the UX: UserID -> User Name


Page 34: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Data Compliance

(Diagram: the scoring pipeline extended with a User Account DB and a second Redis cache to resolve UserID -> User Name.)


Page 35: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Cloud Resource Competition

(Diagram: Signals 1…m all competing for the shared User Account DB and Redis cache.)

Page 36: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Cloud Resource Competition

(Repeat of the previous slide's diagram.)

Page 37: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

From v0.1 to v1.0


Page 38: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Conclusion

Page 39: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

Operationalize Security Data Science: Components

=> Model Evaluation => Model Deployment => Model Scale-out


Page 40: Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

The Rand Test: a test to see whether your security data science solution is operational

Answer yes/no to the following:
1) Do you have an established pipeline to collect relevant security data?
2) Do you have established SLAs/data contracts with partner teams?
3) Can you seamlessly update the model with new features and retrain?
4) Did you evaluate the model with real attack data?
5) Does your model respect the different privacy laws across all regions?
6) Do you account for model localization?
7) Is your model scalable, end to end?
8) Do you hold live-site meetings about your solution?
9) Can security responders leverage the model for insights during an investigation?
10) Do you have a framework to collect feedback on the results from security analysts?

By @ram_ssk, Andrew Wicker

Scoring: each Yes = 1 point
• 10: All systems operational!
• 5: One small step…
• 0: Houston! We have a problem
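
A trivial scorer for the checklist; the score-to-verdict thresholds are a guess at the slide's gauge, not stated in the deck.

```python
# Sum the yes answers (Yes = 1 point) and map the total to the slide's verdicts.
answers = [True, True, False, True, True, True, False, True, True, True]  # your 10 yes/no answers
score = sum(answers)
print(f"{score}/10:", "All systems operational!" if score == 10
      else "One small step…" if score >= 5
      else "Houston! We have a problem")
```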
