Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Alexandre Pinto, Chief Data Scientist | MLSec Project | @alexcpsec @MLSecProject

Upload: alex-pinto

Post on 20-May-2015


DESCRIPTION

Video (at YouTube): http://bit.ly/19TNSTF

Big Data Security Analytics, Data Science and Machine Learning are a few of the new buzzwords that have invaded our industry of late. Most of what we hear are promises of a unicorn-laden, silver-bullet panacea from heavy-handed marketing folks, provoking the expected pushback from the more enlightened members of our community. This talk will help parse what we as a community need to know about these concepts: the technical details, their actual capabilities, where they fail, and how an attacker can exploit and fool them. The talk will also share results from the author's ongoing research (the MLSec Project) on applying machine learning techniques to information security monitoring.

TRANSCRIPT

Page 1: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Applying Machine Learning to Network Security Monitoring

Alexandre Pinto
Chief Data Scientist | MLSec Project
@alexcpsec @MLSecProject

Page 2: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

WARNING!

• This is a talk about BUILDING, not breaking
  – NO systems were harmed in the development of this talk.
  – This is NOT about 1337 Android Malware
• The only thing we are likely to break here is the time limit on the talk
• This talk includes more MATH than the daily intake recommended by the FDA.
• All stunts described in this talk were performed by trained professionals.

Page 3: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Who's Alex?

• 13 years in Information Security, done a little bit of everything.
• Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US.
  – If there is any way a SIEM can hurt you, it did to me.
• Researching machine learning and data science in general for the past year or so, and presenting on their intersection with InfoSec throughout the year.
• Created MLSec Project in July 2013 to give structure to the research being done.

Page 4: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Agenda

• Definitions
• Big Data
• Data Science
• Machine Learning
• Y U DO DIS?
• Network Security Monitoring
• PoC || GTFO
• Feature Intuition
• How to get started?

Page 5: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Big Data + Machine Learning + Data Science

Page 6: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Big Data + Machine Learning + Data Science

Page 7: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Big Data

Page 8: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

(Security) Data Scientist

[Figure: Data Science Venn Diagram by Drew Conway]

• “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”
  -- Josh Wills, Cloudera

Page 9: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Enter Machine Learning

• “Machine learning systems automatically learn programs from data” (*)
• You don’t really code the program, but it is inferred from data.
• Intuition of trying to mimic the way the brain learns: that's where terms like artificial intelligence come from.

(*) CACM 55(10) - A Few Useful Things to Know about Machine Learning (Domingos 2012)

Page 10: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Kinds of Machine Learning

• Supervised Learning:
  – Classification (NN, SVM, Naïve Bayes)
  – Regression (linear, logistic)
• Unsupervised Learning:
  – Clustering (k-means)
  – Decomposition (PCA, SVD)

Source: scikit-learn.github.io/scikit-learn-tutorial/general_concepts.html

Page 11: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Classification Example

VS

Page 12: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Regression Example

Page 13: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Considerations on Data Gathering

• Models will (generally) get better with more data
  – But we always have to consider bias and variance as we select our data points
  – Also adversaries: we may be force-fed “bad data”, find signal in weird noise or design bad (or exploitable) features
• “I’ve got 99 problems, but data ain’t one”

Domingos, 2012; Abu-Mostafa, Caltech, 2012

Page 14: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Applications of Machine Learning

• Sales
• Trading
• Image and Voice Recognition

Page 15: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Y U DO DIS?

• Common reactions from Security Professionals:
  – “Eh, cool…” *blank stare* *walks away*
  – “Are you high, bro?”
  – “Why aren’t you doing some cool research like Android Malware?”

Page 16: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Math is HARD

Page 17: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Security Applications of ML

• Fraud detection systems:
  – Is what he just did consistent with past behavior?
• Network anomaly detection (?):
  – More like bad statistical analysis
  – Did not advance a lot, IMO
• Predicting likelihood of attack actors
  – Create different predictive models and chain them to gain more confidence in each step.
• SPAM filters

Page 18: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Considerations on Data Gathering

• Adversaries - Exploiting the learning process
• Understand the model, understand the machine, and you can circumvent it
• Something the InfoSec community knows very well
• Any predictive model in InfoSec will be pushed to the limit
• Again, think back on the way SPAM engines evolved.

Page 19: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Network Security Monitoring

Page 20: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Correlation Rules: A Primer

• Rules in a SIEM solution invariably are:
  – “Something” has happened “x” times;
  – “Something” has happened and another “something2” has happened, with some relationship (time, same fields, etc.) between them.
• Configuring SIEM = iterate on combinations until:
  – Customer or management is foole.. I mean satisfied;
  – Consulting money runs out
• Behavioral rules (anomaly detection) help a bit with the “x”s, but are still very laborious and time-consuming.
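A minimal sketch of the first rule shape above (“something” has happened “x” times), assuming events arrive as time-sorted (timestamp, key) pairs. The function name, the event format and the example key are illustrative assumptions, not any SIEM product's rule language.

```python
from collections import defaultdict, deque


def threshold_rule(events, threshold, window_seconds):
    """Return (timestamp, key) alerts for every event at which `key` has been
    seen at least `threshold` times within the last `window_seconds`.

    `events` is an iterable of (timestamp_seconds, key) pairs, sorted by time.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    alerts = []
    for ts, key in events:
        window = recent[key]
        window.append(ts)
        # Drop occurrences that have fallen out of the time window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((ts, key))
    return alerts


# Hypothetical events: repeated firewall blocks for one source address.
sample_events = [
    (0, "block:192.0.2.10"),
    (10, "block:192.0.2.10"),
    (20, "block:192.0.2.10"),
    (500, "block:192.0.2.10"),
    (505, "block:198.51.100.7"),
]
sample_alerts = threshold_rule(sample_events, threshold=3, window_seconds=60)
```

The `threshold` and `window_seconds` arguments are exactly the “x”s that have to be tuned by hand for each environment, which is the laborious part described above.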

Page 21: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Kinds of Network Security Monitoring

• Alert-based:
  – “Traditional” log management
  – SIEM
  – Using “Threat Intelligence” (i.e. blacklists) for about a year or so
  – Lack of context
  – Low effectiveness
  – You get the results handed over to you
• Exploration-based:
  – Network Forensics tools (2/3 years ago)
  – Elasticsearch-based LM systems
  – High effectiveness
  – Lots of people necessary
  – Lots of HIGHLY trained people
• Big Data Security Analytics (BDSA):
  – Run exploration-based monitoring on Hadoop
  – More like Big Data Security Monitoring (BDSM)

Page 22: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Alert-based + Exploration-based

Page 23: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

A wild army of robots appears

Page 24: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Using robots to catch bad guys

Page 25: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

PoC || GTFO

• We developed a set of algorithms to detect malicious behavior from log entries of firewall blocks
• Over 6 months of data from SANS DShield (thanks, guys!)
• After a lot of statistics-based math (true positive ratio, true negative ratio, odds likelihood), it could pinpoint actors that would be 13x-18x more likely to attack you.
• Today more like 30x on the SANS data, and finding around 80% of “badness” in participant deployments.
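The slide does not spell out the math, so the following is only a sketch of one standard way the listed quantities fit together: the positive likelihood ratio, TPR / (1 - TNR), compares how often a real attacker gets flagged with how often a benign actor gets flagged. The confusion-matrix counts are made up for illustration and are not results from the talk.

```python
def rates(tp, fn, tn, fp):
    """Return (true positive rate, true negative rate) from confusion counts."""
    tpr = tp / (tp + fn)  # share of real attackers that were flagged
    tnr = tn / (tn + fp)  # share of benign actors that were not flagged
    return tpr, tnr


def positive_likelihood_ratio(tp, fn, tn, fp):
    """Likelihood ratio of a positive result: TPR / (1 - TNR).

    (1 - TNR) is the false positive rate, computed here directly as
    fp / (fp + tn) to avoid a floating-point subtraction.
    """
    false_positive_rate = fp / (fp + tn)
    if false_positive_rate == 0:
        raise ValueError("no false positives: the ratio is unbounded")
    return (tp / (tp + fn)) / false_positive_rate


# Hypothetical counts: 100 attackers (80 flagged), 1000 benign actors (50 flagged).
example_tpr, example_tnr = rates(tp=80, fn=20, tn=950, fp=50)
example_ratio = positive_likelihood_ratio(tp=80, fn=20, tn=950, fp=50)
```

With these invented counts a flag is about 16 times more likely for an attacker than for a benign actor, which is the kind of "Nx more likely" figure quoted above.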

Page 26: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Feature Intuition: IP Proximity

• Assumptions to aggregate the data
• Correlation / proximity / similarity BY BEHAVIOR
• “Bad Neighborhoods” concept:
  – Spamhaus x CyberBunker
  – Google Report (June 2013)
  – Moura 2013
• Group by Geolocation
• Group by Netblock (/16, /24)
• Group by ASN
  – (thanks, Team Cymru)
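Grouping by netblock can be sketched with Python's standard `ipaddress` module. The helper names are hypothetical and the addresses come from the reserved documentation ranges; geolocation and ASN grouping would need external data and are not shown.

```python
import ipaddress
from collections import Counter


def netblock(ip, prefix_len):
    """Return the enclosing network of `ip` for the given prefix length."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def count_by_netblock(ips, prefix_len):
    """Count how many of the observed addresses fall into each netblock."""
    return Counter(str(netblock(ip, prefix_len)) for ip in ips)


# Hypothetical source addresses seen in firewall block logs.
observed = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
by_24 = count_by_netblock(observed, 24)
by_16 = count_by_netblock(observed, 16)
```

The per-block counts can then serve as a "neighborhood" feature for any address in the same /24 or /16.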

Page 27: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Map of the Internet

• (Hilbert Curve)
• Block port 22
• 2013-07-20

[Figure: Hilbert-curve map of the IPv4 address space. Labels: 0, 10, 127, MULTICAST AND FRIENDS, CN, RU, "CN, BR, TH", You are here]
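A Hilbert curve keeps numerically adjacent address blocks next to each other on a 2D grid, which is why it suits this kind of map. The sketch below uses the standard distance-to-coordinates conversion and places each /8 (first octet) on a 16x16 grid; the grid size and the helper names are assumptions, and the resolution and orientation of the map in the talk are not stated.

```python
def hilbert_d2xy(side, d):
    """Convert distance `d` along a Hilbert curve to (x, y) on a side x side
    grid, where `side` is a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square.
                x = s - 1 - x
                y = s - 1 - y
            # Transpose the current sub-square.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def first_octet_cell(ip):
    """Grid cell of the /8 containing `ip` on a 16x16 Hilbert map (assumed layout)."""
    first_octet = int(ip.split(".")[0])
    return hilbert_d2xy(16, first_octet)


# One cell per /8: 256 cells in curve order.
cells = [hilbert_d2xy(16, d) for d in range(256)]
```

Finer maps follow the same pattern with a larger grid, for example 256x256 cells for one cell per /16.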

Page 28: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Feature Intuition: Temporal Decay

• Even bad neighborhoods renovate:
  – Attackers may change ISPs/proxies
  – Botnets may be shut down / relocate
  – A little paranoia is OK, but not EVERYONE is out to get you (at least not all at once)
• As days pass, let's forget, bit by bit, who attacked
• When did I last see this actor, and how often did I see them?
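The slides do not give the decay function, so exponential decay with a half-life is assumed here purely for illustration. Each past sighting of an actor contributes a weight that halves every `half_life_days`, so the score reflects both how recently and how often the actor was seen. The function name and the 7-day default are hypothetical.

```python
def decayed_score(event_ages_days, half_life_days=7.0):
    """Sum of per-sighting weights, each halving every `half_life_days`.

    `event_ages_days` lists how many days ago each sighting happened.
    """
    return sum(0.5 ** (age / half_life_days) for age in event_ages_days)


# An actor seen today, one week ago and two weeks ago ...
frequent_recent = decayed_score([0, 7, 14])
# ... versus an actor seen three times, but all about two months ago.
frequent_stale = decayed_score([60, 61, 62])
```

Under this assumption an actor who attacked often but long ago fades toward zero, while a single sighting today still counts in full.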

Page 29: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

MLSec Project

• Behavior: block on port 22
• Trial inference on 100k IP addresses per Class A subnet
• Logarithmic scale: brightest tiles are 10 to 1000 times more likely to attack.

Page 30: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Feature Intuition: DNS features

• Who resolves to this IP address?
• Number of domains that resolve to the IP address
• Distribution of their lifetime
• Entropy, size, ccTLDs
• Registrar information
• Reverse DNS information…
• History of DNS registration…
• (Thanks, DNSDB!)
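A few of these features can be sketched in plain Python. The slide does not say how entropy is computed; character-level Shannon entropy of the domain name is assumed here, and the feature names and example domains are hypothetical. Lifetime, registrar and history features need passive DNS data and are not shown.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def domain_features(domains):
    """Simple per-IP features from the list of domains that resolve to it."""
    if not domains:
        return {"count": 0, "mean_entropy": 0.0, "mean_length": 0.0}
    return {
        "count": len(domains),
        "mean_entropy": sum(shannon_entropy(d) for d in domains) / len(domains),
        "mean_length": sum(len(d) for d in domains) / len(domains),
    }


# Hypothetical domains resolving to one IP address.
features = domain_features(["example.com", "xk7qz2vd9w.example.net"])
```

Names that look machine-generated tend to use more distinct characters and so score higher on this measure than dictionary-like names, although entropy alone is a weak signal.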

Page 31: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Training the Model

• YAY! We have a bunch of numbers per IP address/domain!
• How do you define what is malicious or not?
• “Advanced expertise in both information security and data science will be a necessary ingredient in enabling accurate discrimination between malicious and benign activity.”
  - Anton Chuvakin, Gartner
• Kinda easy for security tools (if you trust them)
• Web application logs need deeper statistical analysis
• Not normal / standard deviation thing

Page 32: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

How do I get started on this?

• Programming is a must (Python / R)
• Statistical knowledge keeps you from making dumb mistakes
• Specific machine learning courses and books:
  – Coursera (ML / Data Analysis / Data Science)
• Practice, Practice, Practice:
  – Explore your data!
  – (Security Onion)
  – Kaggle
  – KDD, VAST, VizSec

Page 33: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

MLSec Project

• Sign up, send logs, receive reports generated by machine learning models!
• Working with several companies on trying out these models in their environments with their data
• We are hiring (KINDA)
• Visit https://www.mlsecproject.org, message @MLSecProject or just e-mail me.

Page 34: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

MLSec Project - Current Research

• Inbound attacks on exposed services (DEFCON/BH 2013):
  – Information from inbound connections on firewalls, IPS, WAFs
  – Feature extraction and supervised learning
• Malware Distribution and Botnets:
  – Information from outbound connections on firewalls, DNS and Web Proxy
  – Initial labeling provided by intelligence feeds and AV/anti-malware
  – Semi-supervised learning involved
• Kill-chain Ensemble Models:
  – Increased precision by composing different behaviors
  – Web server path -> go through Firewall, then IPS, then WAF
  – Early confirmation of attack failure or success
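One simple way to picture "composing different behaviors" is to multiply a prior odds of maliciousness by a likelihood ratio from each stage of the path (firewall, then IPS, then WAF). This is only an illustrative sketch that assumes the stages are conditionally independent; it is not presented as the ensemble method MLSec Project uses, and the prior and ratios below are invented.

```python
def combine_likelihood_ratios(prior_probability, likelihood_ratios):
    """Update a prior probability with one likelihood ratio per stage.

    Assumes the stages are conditionally independent, so their likelihood
    ratios can simply be multiplied into the odds.
    """
    odds = prior_probability / (1 - prior_probability)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


# Hypothetical: 1% prior, then evidence from firewall, IPS and WAF models.
after_firewall = combine_likelihood_ratios(0.01, [16.0])
after_all_stages = combine_likelihood_ratios(0.01, [16.0, 5.0, 3.0])
```

Each additional stage that agrees (ratio above 1) raises the combined probability, and a stage that disagrees (ratio below 1) lowers it, which matches the idea of gaining or losing confidence at each step of the chain.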

Page 35: Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Thanks!

• Q&A?
• Feedback?

Alexandre Pinto
@alexcpsec
@MLSecProject
https://www.mlsecproject.org/

"Essentially, all models are wrong, but some are useful."
  - George E. P. Box