ANOMALY DETECTION AT SCALE: A CYBERSECURITY STREAMING DATA PIPELINE USING KAFKA AND AKKA CLUSTERING O'Reilly Security Conference NYC, November 2, 2016 Jeff Henrikson Groovescale http://www.groovescale.com

OUTLINE

Framing
Problem statement
Streaming tech concepts
Outline of solution
Architecture, learnings

FRAMING

Why build predictive models?

Models continue to do useful work after humans are not looking

Models are based on assumptions

Only humans can make assumptions

INTRUSION DETECTION

1) Log data
2) Configure rules
3) Human awareness examines alarms and logs
4) Quick action taken (e.g. deauthorize)
5) Re-authorize once human awareness deems longer-term mitigation is adequate

Sometimes for high-confidence rules we allow 2) to trigger 4) without human intervention

HOW CAN A SKILLED PERSON'S AWARENESS BE MORE EFFECTIVELY GUIDED?

1) Matching of network behavior against localized rules

2) Predictive modeling of the aggregate network behavior

Hypothesis: Let's see if 2 is better.

AI Artificial Intelligence

"IA" Intelligence Augmented

From "Building practical AI systems", Adam Cheyer (Siri, Sentient, and Viv Labs), Strata 2016

INTRUSION DETECTION TOOLS AS "INTELLIGENCE AUGMENTED"

Intruders are trying to evade detection.

Let's not worry about making the human protector of the network go away. That is probably not possible given the intruders' evasive response.

PROBLEM STATEMENT

NETWORK PACKET BROKER

CAPTURE SERVER

dumpcap (from Wireshark)

NETFLOW (V5) BASICS

Attributes: Source/Destination IP, Source/Destination Port, Input interface

Metrics: Number of Packets, Sum of Bytes, Start Time, End Time

IPv4 only

https://nsrc.org/workshops/2015/sanog25-nmm-tutorial/materials/netflow.pdf
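
As a rough illustration of how the attributes and metrics above fit together, the sketch below groups already-parsed packets into flow records. The `Packet` type and its field names are hypothetical stand-ins for whatever a pcap parser would produce, and flow expiry by timeout is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class FlowKey(NamedTuple):
    """Attributes from the slide that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    input_interface: int


@dataclass
class FlowMetrics:
    """Metrics from the slide, accumulated per flow."""
    packets: int = 0
    byte_count: int = 0
    start_time: float = float("inf")
    end_time: float = float("-inf")


class Packet(NamedTuple):
    """Hypothetical parsed packet header (not a real pcap library type)."""
    key: FlowKey
    length: int        # bytes on the wire
    timestamp: float   # capture time in seconds


def aggregate(packets: Iterable[Packet]) -> Dict[FlowKey, FlowMetrics]:
    """Fold packets into one metrics record per flow key."""
    flows: Dict[FlowKey, FlowMetrics] = {}
    for p in packets:
        m = flows.setdefault(p.key, FlowMetrics())
        m.packets += 1
        m.byte_count += p.length
        m.start_time = min(m.start_time, p.timestamp)
        m.end_time = max(m.end_time, p.timestamp)
    return flows
```

Timestamps here come from the capture itself rather than the wall clock, so replaying the same capture yields the same flow records.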

Functional Requirements

Produce netflow from PCAP
Score netflow for anomalies
Control the number of anomalous events brought to the human expert's attention

Nonfunctional Requirements

Process line rate 10Gb/s
Be within 2x perf of tcpdump
Be within 4x of netflow latency
Do not add single points of failure

SOLUTION OUTLINE

OVERVIEW OF SERVICES

EXTERNAL DESIGN

System coupling:

Do not prescribe deploying Kafka upstream or downstream (Which Kafka version? Which language binding?)

External APIs:

Ingress HTTP POST octet encoding

Egress HTTP GET Long Polling
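
The long-polling behavior behind the egress API can be sketched without the HTTP layer. In this minimal sketch, `LongPollBuffer`, `post`, `poll`, and the offset convention are all hypothetical names chosen for illustration; the talk does not describe the actual endpoint or its parameters.

```python
import threading
from typing import List, Tuple


class LongPollBuffer:
    """In-memory stand-in for an ingress POST / egress long-poll GET pair."""

    def __init__(self) -> None:
        self._items: List[bytes] = []
        self._cond = threading.Condition()

    def post(self, body: bytes) -> int:
        """Ingress: append one octet payload and wake any waiting readers."""
        with self._cond:
            self._items.append(body)
            self._cond.notify_all()
            return len(self._items)

    def poll(self, offset: int, timeout: float) -> Tuple[int, List[bytes]]:
        """Egress: return payloads at index >= offset.

        If none exist yet, block for up to `timeout` seconds before
        returning an empty list, which is the long-polling behavior.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) > offset, timeout=timeout)
            items = self._items[offset:]
            return offset + len(items), items
```

A client would call `poll` in a loop, passing back the offset it last received. A request that times out simply returns an empty list and is reissued.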

INTERNAL DESIGN

Record state only in:

Kafka
Pcap temporary files on local fs

Need to write block id to EFH and dedupe for sums to be correct in the presence of retries
Prefer late delivery to dropping data
Prefer reading capture time in data stream to wall clock time
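
Why deduplication by block id keeps sums correct under retries can be shown with a small sketch. The `DedupingSum` class and its in-memory set of seen ids are assumptions made for illustration; in the design described here, durable state would live in Kafka rather than in process memory.

```python
from typing import Set


class DedupingSum:
    """Accumulate byte and packet totals, counting each block id at most once."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()   # block ids already applied
        self.total_bytes = 0
        self.total_packets = 0

    def apply(self, block_id: str, byte_count: int, packet_count: int) -> bool:
        """Add a block's contribution; return False if it was a duplicate."""
        if block_id in self._seen:
            return False               # retry of a block already counted
        self._seen.add(block_id)
        self.total_bytes += byte_count
        self.total_packets += packet_count
        return True
```

With this in place, an upstream stage can safely resend a block after a timeout: the second delivery is recognized by its id and leaves the totals unchanged.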

Akka-cluster in one slide:

Framework for Actor-based concurrency
Program in Scala or Java
Akka-cluster is more general than map reduce or data pipelines
Makes the use of local and remote resources work the same

MINIMUM VIABLE PREDICTIVE MODEL

1) Take Netflow metrics: sum(bytes), sum(packets), count

2) For each metric, compute mean and variance

3) Emit an "anomaly" when signal exceeds (mean + 3.0*sqrt(variance))

Meets minimum requirement: controls the number of events brought to the human expert's attention
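
The three steps above can be sketched as a streaming computation. This is a minimal sketch, not the talk's implementation: it assumes Welford's online algorithm for the running mean and variance, and the `warmup` parameter and the choice to test each sample before folding it into the statistics are illustrative assumptions.

```python
import math


class RunningStats:
    """Running mean and population variance via Welford's online algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.n if self.n > 0 else 0.0


def score(stats: RunningStats, x: float, k: float = 3.0, warmup: int = 10) -> bool:
    """Return True when x exceeds mean + k * sqrt(variance), then record x.

    The sample is tested before it is added, so it does not influence
    its own threshold. `warmup` (an assumption) suppresses alarms until
    enough samples have been seen.
    """
    is_anomaly = stats.n >= warmup and x > stats.mean + k * math.sqrt(stats.variance)
    stats.update(x)
    return is_anomaly
```

One `RunningStats` instance would be kept per metric: sum(bytes), sum(packets), and count. Raising `k` lowers the number of events emitted, which is the control the requirement asks for.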

EXERCISE FOR THE READER

Model for periodicity:

Ihler et al, Adaptive Event Detection with Time–Varying Poisson Processes, ACM SIGKDD 2006

http://www.datalab.uci.edu/papers/event_detection_kdd06.pdf

DEPLOYMENT

Symmetrical mapping of docker containers to hosts

RESULTS

Qualitatively, users can find relevant anomalies in a reasonably sized stream
System operates reliably
Numbers are correct within assumptions

ARCHITECTURE, LEARNINGS

SO WHY KAFKA VS ANY OTHER STREAMING COMPONENT?

https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/comment-page-1/

HOW DOES YOUR ORGANIZATION PICK COMPONENTS?

STREAMING DATA LITERATURE:

"A data entity is created by one module, is passed from module to module until it is no longer needed and is then destroyed. . . . Punched card accounting systems exemplify this environment."

J. P. Morrison, "Data Stream Linkage Mechanism", IBM Systems Journal, 1978.

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=45DED06EC91474F5938A9E05CC3D5A61?doi=10.1.1.89.2601&rep=rep1&type=pdf

BIND ARCHITECTURAL COUPLINGS EARLY SO THAT ARCHITECTURAL COMPONENTS CAN BE CHOSEN WITH AMPLE EVIDENCE

Examples of components:

which database
which streaming engine

Examples of couplings:

format of data (e.g. newline delimited json)
how to notify
how to checkpoint
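
As an example of a data-format coupling, newline-delimited JSON puts one JSON object on each line, so any component that can read lines can consume the stream. The helper names and the record fields used below are hypothetical; this is a sketch of the format only, not of the system's actual schema.

```python
import json
from typing import Dict, Iterable, Iterator


def encode_ndjson(records: Iterable[Dict]) -> str:
    """Serialize each record as compact JSON terminated by a newline."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def decode_ndjson(text: str) -> Iterator[Dict]:
    """Parse one JSON object per non-empty line."""
    for line in text.split("\n"):
        if line.strip():
            yield json.loads(line)


# Hypothetical flow records, for illustration only
example = encode_ndjson([
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 400},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "bytes": 60},
])
```

Because `json.dumps` escapes newlines inside strings, a record never spans more than one line, and a reader can resume from any line boundary.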

HTTP COUPLING: WINS

Win #1: Can't get access to pcap over API
Win #2: Only RHEL-distributed reqs (perl-core, curl) required for ingress
Win #3: Upgrade Kafka when improved

HTTP COUPLING: WIN #3: UPGRADE WHEN READY

Kafka Version | 0.9.0 | 0.10.0.1 | 0.10.1.0
Partition by Hash | x | x | x
Write timestamp to message | - | x | x
Read seek by timestamp | - | - | x

LEARNING #1

https://github.com/akka/reactive-kafka

Using this library in place of KafkaConsumer

LEARNING #2, HIDING IN PLAIN SIGHT

http://www.reactive-streams.org/

FAVOR INTEGRATION TESTING OVER UNIT TESTING

Ingress, egress have optional flag placebo={true,false}. Default to true.
Every deployment simulates low volume placebo sinks, sources.
Transmit heartbeats when each component is sure to have made forward progress.

ON EVALUATING FAULT TOLERANCE AND SCALABILITY

My smart buddy

LinkedIn runs it in production

The NSA

Can we do better?

ON EVALUATING FAULT TOLERANCE AND SCALABILITY:

The idea:

Create linked containers for app
Use tc (the netem queueing discipline) to drop and/or delay packets
Run simulated data source

ON EVALUATING FAULT TOLERANCE AND SCALABILITY:

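Hands on, create the container:

docker run -it --rm ubuntu:14.04.2 bash

Hands on with the container:

root@07e330775e98:/# apt-get update && apt-get install -y ethtool
root@07e330775e98:/# ethtool -S eth0
NIC statistics:
     peer_ifindex: 875

Hands on with the host:

(docker-machine's boot2docker has tc built-in)

# Find the host-side veth whose interface index matches peer_ifindex above
dev=$(ip link | grep '^875:' | awk -F': ' '{print $2}' | cut -d@ -f1)
# Install netem on that interface: 100ms delay, 20ms jitter, normal distribution
tc qdisc add dev "$dev" root netem delay 100ms 20ms distribution normal
# Reconfigure the existing netem qdisc to drop 0.1% of packets
tc qdisc change dev "$dev" root netem loss 0.1%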

Myth: Code should always go into docker containers through an image

Alternative:

docker run -v $dirSrc:$dirSrc # to convey source code
docker exec # to restart program

Myth: A docker image is something that came from a Dockerfile:

Alternative:

docker run
ansible-playbook -c local
docker commit

ACKNOWLEDGEMENTS

Ilya Levner
Gunjan Gupta, Lightsphere AI
Trey Blalock, Firewall Consulting

RECOMMENDED READING

I Heart Logs, Jay Kreps (creator of Kafka)
https://www.amazon.com/Heart-Logs-Stream-Processing-Integration/dp/1491909382

Akka in Action, Roestenburg et al. Released Sept 30, 2016
https://www.amazon.com/Akka-Action-Raymond-Roestenburg/dp/1617291013

Scala for the Impatient, 1e, Cay Horstmann. Second edition coming December 2016
https://www.amazon.com/Scala-Impatient-Cay-S-Horstmann/dp/0321774094

READINGS ON LOW LATENCY DATA ENGINEERING (ORGANIZED BY COMMUNITY)

Community | Title | URL
Reactive | The Reactive Manifesto | http://www.reactivemanifesto.org/
Reactive | Reactive Streams | http://www.reactive-streams.org/
Kafka | I Heart Logs, Jay Kreps, 2014 | https://www.amazon.com/Heart-Logs-Stream-Processing-Integration/dp/1491909382
Kafka | Kafka: The Definitive Guide, prerelease/2017 | https://www.amazon.com/Kafka-Definitive-Real-time-stream-processing/dp/1491936169
NiFi | The core concepts of NiFi | http://nifi.apache.org/docs/nifi-docs/html/overview.html#the-core-concepts-of-nifi
Flow Based Programming | Flow-Based Programming, J. Paul Morrison, 2010 | https://www.amazon.com/Flow-Based-Programming-2nd-Application-Development/dp/1451542321
Storm | Big Data, Nathan Marz, 2015 | https://www.amazon.com/Big-Data-Principles-practices-scalable/dp/1617290343

QUESTIONS?