analyzing 1.2 million network packets per second in real-time

47
OpenSOC The Open Security Operations Center for Analyzing 1.2 Million Network Packets per Second in Real Time James Sirota, Big Data Architect Cisco Security Solutions Practice jsirota @ cisco.com Sheetal Dolas Principal Architect Hortonworks [email protected] June 3, 2014

Upload: hadoopsummit

Post on 27-Jan-2015

111 views

Category:

Technology


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Analyzing 1.2 Million Network Packets per Second in Real-time

OpenSOCThe Open Security Operations

Centerfor

Analyzing 1.2 Million Network Packets per Second in Real Time

James Sirota, Big Data ArchitectCisco Security Solutions [email protected]

Sheetal DolasPrincipal [email protected]

June 3, 2014

Page 2: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 2© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Problem Statement & Business Case for OpenSOC

Solution Architecture and Design

Best Practices and Lessons Learned

Q & A

Over Next Few Minutes

Page 3: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 3© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Business Case

Page 4: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 4© 2013-2014 Cisco and/or its affiliates. All rights reserved.

“There's now a growing sense of fatalism:

It's no longer if or when you get hacked,

but the assumption is that you've already been hacked,

with a focus on minimizing the damage.”

Source: Dark Reading / Security’s New Reality: Assume The Worst

Page 5: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 5© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Breaches Happen in Hours…But Go Undetected for Months or Even Years

Source: 2013 Data Breach Investigations Report

Seconds

Minutes

Hours Days WeeksMonth

sYears

Initial Attack to Initial Compromise 10% 75% 12% 2% 0% 1% 1%

Initial Compromise to Data Exfiltration 8% 38% 14% 25% 8% 8% 0%

Initial Compromise to Discovery 0% 0% 2% 13% 29% 54% 2%

Discovery to Containment/ Restoration 0% 1% 9% 32% 38% 17% 4%

Timespan of events by percent of breaches

In 60% of breaches, data

is stolen in hours

54% of breaches are not

discovered for months

Page 6: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 6© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Cisco Global Cloud Index

Source: 2014 Cisco Global Cloud Index

Page 7: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 7© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Introducing OpenSOCIntersection of Big Data and Security Analytics

Multi Petabyte StorageInteractive Query

Real-Time Search

Scalable Stream Processing

Unstructured Data

Data Access Control

Scalable Compute

OpenSOC

Real-Time Alerts

Anomaly Detection

Data Correlation

Rules and Reports

Predictive Modeling

UI and Applications

Big Data Platform

HadoopStorm

Elastic

SearchKafka

Page 8: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 8© 2013-2014 Cisco and/or its affiliates. All rights reserved.

OpenSOC Journey

Sept 2013

First Prototype

Dec 2013

Hortonworks joins the project

March 2014

Platform developmen

t finished

Sept 2014

General Availability

May 2014

CR Work off

April 2014

First beta test at

customer site

Page 9: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 9© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Solution Architecture & Design

Page 10: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 10© 2013-2014 Cisco and/or its affiliates. All rights reserved.

OpenSOC Conceptual Architecture

Raw Network Stream

Network Metadata Stream

Netflow

Syslog

Raw Application Logs

Other Streaming Telemetry

HiveHBase

Raw Packet Store

Long-Term Store

Elastic Search

Real-Time Index

Network Packet

Mining and PCAP

Reconstruction

Log Mining and Analytics

Big Data Exploration,Predictive Modeling

Applications + Analyst Tools

Pars

e +

Form

at

En

rich

Ale

rt

Threat IntelligenceFeeds

Enrichment Data

Page 11: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 11© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Raw Network Packet Capture, Store, Traffic

Reconstruction

Telemetry Ingest, Enrichment and Real-Time Rules-

Based Alerts

Real-Time Telemetry Search and Cross-Telemetry

Matching

Automated Reports, Anomaly Detection and

Anomaly Alerts

Rich Analytics Apps and Integration with Existing

Analytics Tools

Key Functional Capabilities

Page 12: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 12© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Fully-Backed by Cisco and Used Internally for Multiple Customers

Free, Open Source and Apache Licensed Built on Highly-Scalable and Proven Platforms

(Hadoop, Kafka, Storm)

Extensible and Pluggable Design Flexible Deployment Model (On-Premise or Cloud) Centralize your processes, people and data

The OpenSOC Advantage

Page 13: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 13© 2013-2014 Cisco and/or its affiliates. All rights reserved.

OpenSOC Deployment at CiscoHardware footprint (40u)

14 Data Nodes (UCS C240 M3)

3 Cluster Control Nodes (UCS C220

M3)

2 ESX Hypervisor Hosts (UCS C220

M3)

1 PCAP Processor (UCS C220 M3

+ Napatech NIC)

2 SourceFire Threat alert

processors

1 Anue Network Traffic splitter

1 Router

1 48 Port 10GE Switch

Software Stack

HDP 2.1

Kafka 0.8

Elastic Search

1.1

MySQL 5.5

Page 14: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 14© 2013-2014 Cisco and/or its affiliates. All rights reserved.

OpenSOC - Stitching Things Together AccessMessaging

SystemData

CollectionSource Systems StorageReal Time

Processing

StormKafka

B Topic

N Topic

Elastic Search

Index

Web Services

Search

PCAP Reconstruction

HBase

PCAP Table

Analytic Tools

R / Python

Power Pivot

Tableau

Hive

Raw Data

ORC

Passive Tap

PCAP Topic

DPI Topic

A Topic

Telemetry Sources

Syslog

HTTP

File System

Other

Flume

Agent A

Agent B

Agent N

B Topology

N Topology

A Topology

PCAP

Traffic Replicator

PCAP Topology

DPI Topology

Page 15: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 15© 2013-2014 Cisco and/or its affiliates. All rights reserved.

OpenSOC - Stitching Things Together AccessMessaging

SystemData

CollectionSource Systems StorageReal Time

Processing

StormKafka

B Topic

N Topic

Elastic Search

Index

Web Services

Search

PCAP Reconstruction

HBase

PCAP Table

Analytic Tools

R / Python

Power Pivot

Tableau

Hive

Raw Data

ORC

Passive Tap

PCAP Topic

DPI Topic

A Topic

Telemetry Sources

Syslog

HTTP

File System

Other

Flume

Agent A

Agent B

Agent N

B Topology

N Topology

A Topology

PCAP

Traffic Replicator

Deeper Look

PCAP Topology

DPI Topology

Page 16: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 16© 2013-2014 Cisco and/or its affiliates. All rights reserved.

PCAP TopologyStorageReal Time Processing

Storm

Elastic Search

Index

HBase

PCAP Table

Hive

Raw Data

ORC

Kafka

Spout

Parser

Bolt

HDFSBolt

HBase

Bolt

ESBolt

Page 17: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 17© 2013-2014 Cisco and/or its affiliates. All rights reserved.

DPI Topology & Telemetry Enrichment StorageReal Time Processing

Storm

Elastic Search

Index

HBase

PCAP Table

Hive

Raw Data

ORC

Kafka

Spout

Parser

Bolt

GEO Enric

h

Whois

Enrich

CIF Enric

h

HDFS

Bolt

ESBolt

Page 18: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 18© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Enrichments

Parser

Bolt

GEOEnric

h

RAW Message

{“msg_key1”: “msg value1”,“src_ip”: “10.20.30.40”,“dest_ip”: “20.30.40.50”,“domain”: “mydomain.com”}

Who Is

Enrich

"geo":[ {"region":"CA","postalCode":"95134","areaCode":"408","metroCode":"807","longitude":-121.946,"latitude":37.425,"locId":4522,"city":"San Jose","country":"US" }]

CIFEnric

h

"whois":[ {"OrgId":"CISCOS","Parent":"NET-144-0-0-0-0","OrgAbuseName":"Cisco Systems Inc","RegDate":"1991-01-171991-01-17","OrgName":"Cisco Systems","Address":"170 West Tasman Drive","NetType":"Direct Assignment"} ],“cif”:”Yes”

EnrichedMessage

Cache

MySQL

Geo Lite Data

Cache

HBase

Who Is Data

Cache

HBase

CIF Data

Page 19: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 19© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Applications: Telemetry Matching and DPI

Step1: Search

Step2: Match

Step3: Analyze

Step4: Build PCAP

Page 20: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 20© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Integration with Analytics Tools

Dashboards Reports

Page 21: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 21© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Best Practices and

Lessons Learned

Page 22: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 22© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Journey Towards Highly Scalable

Application

Page 23: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 23© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Kafka Tuning

Page 24: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 24© 2013-2014 Cisco and/or its affiliates. All rights reserved.

This is where we began

Page 25: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 25© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Some code optimizations and increased parallelism

Page 26: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 26© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Is Disk I/O heavy

Kafka 0.8+ supports replication and JBOD Better performance compared to RAID

Parallelism is largely driven by number of disks and partitions per topic

Key configuration parameters: num.io.threads - Keep it at least equal to number of disks provided

to Kafka num.network.threads - adjust it based on number of concurrent

producers, consumers and replication factor

Kafka Tuning

Page 27: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 27© 2013-2014 Cisco and/or its affiliates. All rights reserved.

After Kafka Tuning

Page 28: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 28© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Bottleneck Isolation, Resource Profiling, Load Balancing

Page 29: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 29© 2013-2014 Cisco and/or its affiliates. All rights reserved.

HBase Tuning

Page 30: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 30© 2013-2014 Cisco and/or its affiliates. All rights reserved.

This is where we began

Page 31: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 31© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Row Key design is critical (gets or scans or both?) Keys with IP Addresses

Standard IP addresses have only two variations of the first character : 1 & 2

Minimum key length will be 7 characters and max 15 with a typical average of 12

Subnet range scans become difficult – range of 90 to 220 excludes 112 IP converted to hex (10.20.30.40 => 0a141e28)

gives 16 variations of first key character consistently 8 character key Easy to search for subnet ranges

Row Key Design

Page 32: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 32© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Experiments with Row Key

Page 33: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 33© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Know your data Auto split under high workload can result into hotspots and split

storms Understand your data and presplit the regions Identify how many regions a RS can have to perform optimally. Use

the formula below(RS memory)*(total memstore fraction)/((memstore size)*(# column families))

Region Splits

Page 34: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 34© 2013-2014 Cisco and/or its affiliates. All rights reserved.

With Region Pre-Splits

Page 35: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 35© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Enable Micro Batching (client side buffer)

Smart shuffle/grouping in storm

Understand your data and situationally exploit various WAL options

Watch for many minor compactions For heavy ‘write’ workload Increase hbase.hstore.blockingStoreFiles (we used 200)

Know Your Application

Page 36: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 36© 2013-2014 Cisco and/or its affiliates. All rights reserved.

And Finally

Page 37: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 37© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Kafka Spout

Page 38: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 38© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Parallelism is controlled by number of partitions per topic Set Kafka spout parallelism equal to number of

partitions in topic Other key parameters that drive performance

fetchSizeBytes

bufferSizeBytes

Kafka Spout

Page 39: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 39© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Mysteriously Missing Data

Page 40: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 40© 2013-2014 Cisco and/or its affiliates. All rights reserved.

A bug in Kafka spout that used to miss out some partitions and loose data It is now fixed and available from Hortonworks repository (

http://repo.hortonworks.com/content/repositories/releases/org/apache/storm/storm-Kafka )

Mysteriously Missing Data Root Cause

Page 41: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 41© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Storm

Page 42: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 42© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Every small thing counts at scale Even simple string operations can slowdown throughput

when executed on millions of Tuples

Storm

Page 43: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 43© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Error handling is critical Poorly handled errors can lead to topology failure and

eventually loss of data (or data duplication)

Storm

Page 44: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 44© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Tune & Scale individual spout and bolts before performance testing/tuning entire topology Write your own simple data generator spouts and no-op

bolts

Making as many things configurable as possible helps a lot

Storm

Page 45: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 47© 2013-2014 Cisco and/or its affiliates. All rights reserved.

When it comes to Hadoop…partner up Separate the hype from the opportunity Start small then scale up Design Iteratively It doesn’t work unless you have proven it at

scale Keep an eye on ROI

Lessons Learned

Page 46: Analyzing 1.2 Million Network Packets per Second in Real-time

Cisco Confidential 48© 2013-2014 Cisco and/or its affiliates. All rights reserved.

How can you contribute? Technology Partner Program – contribute

developers to join the Cisco and Hortonworks team

Looking for Community PartnersCisco + Hortonworks + Community Support for OpenSOC

Page 47: Analyzing 1.2 Million Network Packets per Second in Real-time

Thank you!

We are hiring:

[email protected]@hortonworks.com