how machine learning can fence-in bad actors
TRANSCRIPT
© 2020 SPLUNK INC.
How Machine Learning Can Fence-in Bad Actors
WSTA | September 2020
Julio Gomez
Financial Services Strategist
During the course of this presentation, we may make forward‐looking statements
regarding future events or plans of the company. We caution you that such statements
reflect our current expectations and estimates based on factors currently known to us
and that actual events or results may differ materially. The forward-looking statements
made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, it may not contain current or
accurate information. We do not assume any obligation to update
any forward‐looking statements made herein.
In addition, any information about our roadmap outlines our general product direction
and is subject to change at any time without notice. It is for informational purposes only,
and shall not be incorporated into any contract or other commitment. Splunk undertakes
no obligation either to develop the features or functionalities described or to include any
such feature or functionality in a future release.
Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the
United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2020
Splunk Inc. All rights reserved
Forward-Looking Statements
Agenda
● Key challenges with predictive cybersecurity defense
● Opportunity for data analytics, machine learning, and AI
● Lessons learned from the field
Why Do Organizations Struggle to Get “Predictive” with Security?
● Data Volume, Variety, Velocity, and Veracity
● Machine Learning Expertise: Difficult to Obtain
● Analytic Skill Set and Data Literacy
● Data Access and Integration is Difficult
● Security Domain Expertise
● Adequate Tools
● Complicated Data Structure
What Data Scientists Really Do
Data preparation accounts for about 80% of the work of data scientists
“Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, Forbes, Mar 23, 2016
Statistical Analysis Is A Start
Account Enumeration & Credential Testing
● Abnormally high number of failed logins from device or IP
● Abnormally high number of account accesses from device or IP
ATM Transactions & Wire Transfers
● Anomalously high number of transactions by merchant
● Anomalously high transaction by account
Data Exfiltration & Access
● User with high reads & writes to database compared to others in the same role
● Servers or users with high bytes_out in comparison to peers
IP Theft
● High number of requests to API service
● Speed violations: accounts requesting data at machine speed
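A minimal sketch of the first check above (abnormally high failed logins per IP), assuming login events arrive as (source IP, outcome) pairs. The function name, the sample data, and the 3.5 cutoff on the modified z-score are illustrative assumptions, not part of the presentation.

```python
from collections import Counter
from statistics import median


def flag_high_failed_logins(events, threshold=3.5):
    """Return source IPs whose failed-login count is abnormally high.

    events: iterable of (source_ip, outcome), outcome is "success" or "failure".
    Scores each IP with the modified z-score, which is based on the median and
    the median absolute deviation (MAD), so one noisy IP does not distort the
    baseline the way it would distort a mean.
    """
    events = list(events)
    failures = Counter(ip for ip, outcome in events if outcome == "failure")
    # Include IPs with zero failures so the baseline reflects every source.
    counts = {ip: failures.get(ip, 0) for ip, _ in events}
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # Assumption: fall back to a scale of 1 when all counts are nearly identical.
    scale = mad if mad > 0 else 1.0
    return sorted(
        ip for ip, count in counts.items()
        if 0.6745 * (count - med) / scale > threshold
    )


# Hypothetical sample: five ordinary sources and one credential-testing source.
sample_counts = {
    "10.0.0.1": 2, "10.0.0.2": 3, "10.0.0.3": 1,
    "10.0.0.4": 2, "10.0.0.5": 4, "203.0.113.9": 60,
}
events = [(ip, "failure") for ip, n in sample_counts.items() for _ in range(n)]
events += [(ip, "success") for ip in sample_counts]

flagged = flag_high_failed_logins(events)
print(flagged)
```

The same pattern applies to the other bullets by swapping the counted field, for example transactions per merchant or bytes_out per server.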
Prediction & Time Series Forecasting
Use Case: As a network analyst for an investment bank, I want to forecast IP traffic patterns to ensure proper resourcing and identify
denial-of-service attacks.
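One way to picture this use case is a seasonal baseline: forecast each time slot from the same slot in earlier cycles, then flag traffic far above the forecast. The sketch below is a simplified stand-in under that assumption and is not the forecasting method used in the presentation; the function names, the four-slot cycle, and the traffic numbers are hypothetical.

```python
from statistics import mean, stdev


def seasonal_forecast(history, period):
    """Forecast one cycle as (mean, stdev) per slot, learned from past cycles."""
    if len(history) < 2 * period or len(history) % period != 0:
        raise ValueError("history must hold at least two complete cycles")
    cycles = [history[i:i + period] for i in range(0, len(history), period)]
    forecast = []
    for slot in range(period):
        values = [cycle[slot] for cycle in cycles]
        forecast.append((mean(values), stdev(values)))
    return forecast


def flag_surges(observed, forecast, k=3.0, min_band=1.0):
    """Return slot indexes where traffic exceeds forecast mean + k * stdev.

    min_band keeps the tolerance band from collapsing to zero when past
    cycles happened to be identical for a slot.
    """
    return [
        i for i, (value, (mu, sigma)) in enumerate(zip(observed, forecast))
        if value > mu + k * max(sigma, min_band)
    ]


# Hypothetical traffic (requests per slot), three past cycles of four slots.
history = [100, 200, 300, 150,
           110, 190, 310, 140,
           90, 210, 290, 160]
forecast = seasonal_forecast(history, period=4)

# A new cycle in which the third slot spikes, as it might during a flood.
observed = [105, 195, 900, 155]
surges = flag_surges(observed, forecast)
print(surges)
```

The forecast means also serve the resourcing half of the use case, since they estimate expected load per slot.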
Enterprises Want Answers from their Data
Anomaly Detection
► Deviation from past behavior
► Deviation from peers (aka Multivariate AD or Cohesive AD)
► Unusual change in features
Predictive Analytics
► Predict Service Health Score/Churn
► Predicting Events
► Trend Forecasting
► Detecting influencing entities
► Early warning of failure
Clustering
► Identify peer groups
► Event Correlation
► Reduce alert noise
► Behavioral Analytics
Every Search Can Use Machine Learning
[Diagram. Data sources: OT (Industrial Assets), IT, Consumer and Mobile Devices. Processing: Real Time Analytics, Machine Learning Tool, Alert. Alert actions: Send an …, File a ticket, Send a text, Flash lights, Trigger automated response & other integrations. Destinations: SOAR & 3rd party applications, Smartphones and Devices, Tickets.]
Supervised Machine Learning Process
Clean, Transform Data
● Structured Data, Unstructured Data
● Data Prep / Pre-Processing
● Training Set, Test Set
● Attribute Selection
Model Creation
● Predictor Field
● Algorithm Selection
● Run the Model (Fit)
Validate & Refine the Model
● Apply Predictions against the Test Set
● Measure Model Accuracy
● Modify Attribute Selection
● Re-Run the Model
● Measure Model Accuracy
Productionize
● Deploy the Model to Unseen Data
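The supervised process above can be sketched end to end in a few lines. This is an illustration under stated assumptions, not the tooling shown in the presentation: the data is synthetic, the field names (hour, failed_logins, bytes_out_mb, label) are hypothetical, and a nearest-centroid classifier stands in for whatever algorithm would be selected in practice.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=7):
    """Shuffle a copy of the rows and split into training and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_centroids(train, features):
    """Fit: average each selected attribute per value of the predictor field."""
    sums, counts = {}, {}
    for row in train:
        label = row["label"]
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, name in enumerate(features):
            acc[i] += row[name]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, row, features):
    """Apply: return the label whose centroid is closest to the row."""
    def distance(centroid):
        return sum((row[name] - centroid[i]) ** 2 for i, name in enumerate(features))
    return min(model, key=lambda label: distance(model[label]))


def accuracy(model, test, features):
    """Measure: fraction of test rows predicted correctly."""
    hits = sum(predict(model, row, features) == row["label"] for row in test)
    return hits / len(test)


# Data prep: synthetic, already-clean session records (hypothetical fields).
rng = random.Random(42)
rows = []
for _ in range(20):
    rows.append({"hour": rng.randint(0, 23), "failed_logins": rng.randint(0, 3),
                 "bytes_out_mb": rng.randint(1, 20), "label": "benign"})
    rows.append({"hour": rng.randint(0, 23), "failed_logins": rng.randint(25, 35),
                 "bytes_out_mb": rng.randint(300, 400), "label": "malicious"})
train, test = train_test_split(rows)

# First pass: a weak attribute selection, then fit and measure on the test set.
first_features = ["hour"]
first_accuracy = accuracy(fit_centroids(train, first_features), test, first_features)

# Modify attribute selection, re-run the model, measure again.
refined_features = ["failed_logins", "bytes_out_mb"]
model = fit_centroids(train, refined_features)
refined_accuracy = accuracy(model, test, refined_features)
print(first_accuracy, refined_accuracy)

# Productionize: apply the refined model to a record it has never seen.
unseen = {"hour": 3, "failed_logins": 30, "bytes_out_mb": 350}
print(predict(model, unseen, refined_features))
```

The synthetic classes are deliberately well separated, so the refined attributes classify the test set cleanly; real security data is rarely this tidy, which is why the measure and refine loop matters.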
Benefits of an ML Strategy for Security
and the opportunity for technology and data
[Figure panels: BI Rule-Based Clustering; ML-Based Clustering. Annotation: Possible Fraud Ring]
Lessons Learned from the Field
Customer Success Stories
● ML used to help automate threat hunting and 90% of the security metrics process in two months
● Security analysts given back 30+ hours a month to focus on proactive security, instead of manual data collection and reporting
● ML used to detect insider and external threats.
● Analyst efficiency to gather data and conduct security investigations increased by 50%
● Provides deep reusable correlation rules across all support engineer levels
What are ML/AI Practitioners Detecting?
Account Misuse
● Privilege abuse of admin account
● Detection of account sharing
● Detection of Shadow IT Servers
● Privilege operations on self
● Short lived accounts on Box & AD
● Interactive logins by service accounts
Compromised User Account
● Unauthorized password change attempt
● Critical file access by service accounts
● Unauthorized file access
● Unauthorized application usage
● Compromised service accounts that enabled remote access
● Usage of co-worker’s machine by user
● Query blank password on admin acct
Compromised / Infected Machine
● Malware communication where the standard security tools failed to detect (multiple customers)
● Exposed company's sensitive information on public web services
● Compromised mobile phone generating suspicious outbound connection
● Infected machine based on alert correlation & internal detection
Lateral Movement
● Creation of temporary local accounts across multiple machines
● Unusual process creating sockets across internal machines
What are ML/AI Practitioners Detecting?
Data Exfiltration
▶ User forwarding all corporate emails to personal email address
▶ User copying email archive data that is a few years old from central repository
▶ High volume of data downloads from Box and previews indicating data gathering/snooping
▶ Users exfiltrating data out of organization, detected by deviations from user’s and peer group’s profiles
Suspicious Behavior / Unknown Threat
▶ Users with web proxy disabled
▶ Users logging in from unusual/unauthorized geo locations
▶ Suspicious call home activity
▶ Users encountering malvertising
▶ Suspicious account lockout
▶ Misconfigured services using expired credentials
▶ Accounts impersonating user logins
▶ Misconfigured/corrupted VDI profile
▶ Unauthorized use of P2P software
▶ Unauthorized corporate resource usage for Bitcoin mining
▶ DNS abuse activity
Contextual Intelligence
▶ Automatic detection of user/account related details
- Account types: domain administrators, service accounts
- User risk scoring: internal & external risk
▶ Automatic detection of device types in environment
- DCs, exchange servers, email servers, DNS servers, personal laptops, web servers, NTP servers
▶ Popularity of external domains & IPs
External Attack
▶ Detection of directory traversal attacks based on malicious strings in Web URL requests to web servers
▶ RDP traffic from suspicious sources
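The directory traversal item under External Attack is at heart a string-matching check on requested URLs. A minimal sketch follows, assuming the URLs come from web server logs; the pattern list, the function name, and the decode limit are illustrative assumptions, and a production detector would need a broader set of signatures.

```python
import re
from urllib.parse import unquote

# Assumption: a small illustrative set of traversal markers, not a complete list.
TRAVERSAL = re.compile(r"\.\./|\.\.\\|/etc/passwd|boot\.ini", re.IGNORECASE)


def looks_like_traversal(url, max_decodes=3):
    """Return True if the URL contains a traversal marker.

    Attackers often percent-encode the payload once or twice to slip past
    naive filters, so the URL is decoded repeatedly (up to max_decodes times)
    and checked after each pass.
    """
    decoded = url
    for _ in range(max_decodes):
        if TRAVERSAL.search(decoded):
            return True
        next_decoded = unquote(decoded)
        if next_decoded == decoded:
            break  # nothing left to decode
        decoded = next_decoded
    return bool(TRAVERSAL.search(decoded))


# Hypothetical request paths as they might appear in a web access log.
requests = [
    "/index.html?page=home",
    "/download?file=../../etc/passwd",
    "/download?file=%2e%2e%2f%2e%2e%2fetc%2fpasswd",
    "/img?name=%252e%252e%252fsecret",
    "/static/..hidden/file.css",
]
for path in requests:
    print(looks_like_traversal(path), path)
```

Signature checks like this catch known patterns; the behavioral items in the list above are what call for the statistical and ML techniques covered earlier.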
To Learn More...
● Apps: MLTK / DLTK / Fraud
● Workshops: Fraud, Security, and Compliance
● Essential Guide: This is your map
● Blogs: Learn from the experts
All of these are complimentary
The Data-to-Everything Platform
[Diagram labels: Data Lakes, Master Data Management, ETL, Point Data Management Solutions, Data Silos, Business Processes; IT, Security, DevOps]