The Network Knows
Avi Freedman
Kentik
CEO & Co-founder
All contents © Kentik Inc. 2
Tools, tools, everywhere…
• Active Testing (ping/traceroute)
• APM
• BI
• Metric (App/SNMP/Server)
• BGP Hijack detection
• NPM
• Config Management
• Policy Analysis
• Event Correlation
• Routing Analytics
• Forensics
• Flow Tools
• Logging
• Traffic Engineering
• Threat Intelligence
With all those tools, can you:
• See when there’s a real problem?
• And where the problem is – app, server, network?
• Let the network group understand if there are app issues?
• Let non-network groups understand the network’s impact (or not)?
• Automatically detect traffic anomalies, attacks, and shifts?
• Debug CDNs, cloud delivery, and the path to API partners?
• And… How often do you hear “is it the network?”
The Network Knows
The Network Knows
• Apps generate traffic
• But the network delivers it
• And can see authorized/specified
• And unauthorized/unspecified traffic
• Often including performance and Layer 7 info
• And it knows the ‘routing’ – the path traffic will take
• And if it’s internal, external, or your or others’ infrastructures
Network Traffic Instrumentation
• Modern network devices can send traffic summaries = “NetFlow”
• (Or, often, sFlow, or IPFIX)
• Which are all different protocols but have similar info
• [PROTOCOL, SRC/DST IP, PORT, MAC, VLAN, …]
• These are continuous streams of samples of traffic (*)
• Usually just from the headers - though more advanced implementations can watch perf and L7 info
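Because flow exports are samples of traffic, byte counts usually need to be scaled by the sampling rate before they are compared with anything else. A minimal sketch, assuming 1-in-N packet sampling and a hypothetical `FlowRecord` type whose fields loosely mirror the tuple above (real NetFlow/sFlow/IPFIX templates differ):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical layout for illustration; not taken from any real exporter.
    protocol: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    sampled_bytes: int
    sampling_rate: int  # 1-in-N packet sampling; 1 means unsampled


def estimate_bytes_by_dst_port(records):
    """Scale each sampled byte count by its sampling rate and sum per destination port."""
    totals = defaultdict(int)
    for r in records:
        totals[r.dst_port] += r.sampled_bytes * r.sampling_rate
    return dict(totals)


records = [
    FlowRecord(6, "10.0.0.1", "192.0.2.10", 51000, 443, 1500, 1000),
    FlowRecord(6, "10.0.0.2", "192.0.2.10", 51001, 443, 500, 1000),
    FlowRecord(17, "10.0.0.3", "192.0.2.53", 40000, 53, 80, 1),
]
estimates = estimate_bytes_by_dst_port(records)
```

The result is an estimate, not a count: the lower the traffic volume behind a key, the noisier the scaled number.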
+ Other Network Telemetry
• There’s also SNMP (you can think of NetFlow as a double-click into SNMP data)
• As well as logs – interface up/down, fan+cpu+optic failures, re-config, routing up/down, memory or CPU issues
• And a lot of work being done on “streaming telemetry” of every detail of a device and its software – will need modern time-series backends
• And configs
• And topology
Network Nerd Use Cases for Network Knowledge
Anomaly Detection
Planning and Peering
Traffic Engineering
DDoS Defense
Performance Analytics
Threat Analytics
Service Creation
Digital Forensics
Customer Cost, Prospecting
But Not Just for Network Nerds!
• But systems and app folks should be able to debug also
• And network people should be able to know if the blip matters to production traffic
• So how do we tie systems together?
• Make flow look like metrics and correlate there
• Expose via APIs
• Last resort – train others in flow usage
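One way to "make flow look like metrics" is to roll flows up onto a small, bounded set of labels before they reach a metrics system. A minimal sketch, assuming flows arrive as plain dicts; the label choice (destination /24 plus a coarse service name) and the names `WELL_KNOWN_PORTS`, `flow_to_labels` and `flows_to_metrics` are illustrative, not from any particular tool:

```python
import ipaddress
from collections import Counter

WELL_KNOWN_PORTS = {53: "dns", 80: "http", 443: "https"}  # illustrative subset


def flow_to_labels(flow, prefix_len=24):
    """Map one flow to a bounded label pair: (destination prefix, service)."""
    network = ipaddress.ip_network(f"{flow['dst_ip']}/{prefix_len}", strict=False)
    service = WELL_KNOWN_PORTS.get(flow["dst_port"], "other")
    return (str(network), service)


def flows_to_metrics(flows):
    """Sum bytes per label pair so the output looks like ordinary metric points."""
    metrics = Counter()
    for flow in flows:
        metrics[flow_to_labels(flow)] += flow["bytes"]
    return dict(metrics)


flows = [
    {"dst_ip": "192.0.2.10", "dst_port": 443, "bytes": 1200},
    {"dst_ip": "192.0.2.77", "dst_port": 443, "bytes": 800},
    {"dst_ip": "192.0.2.10", "dst_port": 50123, "bytes": 100},
    {"dst_ip": "198.51.100.5", "dst_port": 53, "bytes": 60},
]
metrics = flows_to_metrics(flows)
```

The trade-off is deliberate: per-host and per-port detail is lost in the metric, so the raw flow still has to live somewhere else for drill-down.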
OSS and Vendor Options for Flow
• There are open source flow tools: pmacct, NFDUMP/NfSen, SiLK
• And vendors (Kentik as SaaS, Arbor as appliance)
• And you can DIY: pmacct front-ending Hadoop-ish SQL, or Elastic
• NetFlow is UDP so it’s easy to replicate (samplicator) and send to multiple places
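The replication idea is simple enough to sketch: receive each datagram once and send the same bytes to every collector. This is a toy illustration of the concept, not how samplicator is implemented; `replicate` and `run_replicator` are hypothetical names, and the collector addresses are whatever you pass in:

```python
import socket


def replicate(datagram, targets, send):
    """Send one received datagram, unchanged, to every target; return how many were sent."""
    sent = 0
    for target in targets:
        send(datagram, target)
        sent += 1
    return sent


def run_replicator(listen_addr, targets, max_datagrams=None):
    """Receive UDP datagrams on listen_addr and fan each one out to all targets."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(listen_addr)
        handled = 0
        while max_datagrams is None or handled < max_datagrams:
            datagram, _source = sock.recvfrom(65535)
            replicate(datagram, targets, sock.sendto)
            handled += 1
```

Usage would look like `run_replicator(("0.0.0.0", 2055), [("192.0.2.1", 2055), ("192.0.2.2", 2055)])`, with the port and addresses being placeholders. Note that packets forwarded this way carry the replicator's source address rather than the router's, which matters for collectors that identify exporters by source IP.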
OK, What’s so Hard?
Awesome! What’s so hard?
• Often requires fusing (geo, routing, app ID, threat intelligence …)
• Flow can be trillions of records/day – think of it as a sampled superset of all of your logs
• The OSS flow tools don’t cluster, so can’t store at scale
• And don’t integrate with other systems
• Metrics systems often choke on the high cardinality of IP addresses and port #s
• DIY is hard but possible (usually pmacct+Elastic)
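The cardinality problem is easy to see with a toy count. A minimal sketch using synthetic flows (the addresses and ports are made up for illustration): each distinct label combination is one time series a metrics backend would have to create.

```python
def series_count(flows, label_names):
    """Count distinct label combinations, i.e. the time series a TSDB would create."""
    return len({tuple(flow[name] for name in label_names) for flow in flows})


# Synthetic data: 50 clients, each using 20 ephemeral source ports, to one HTTPS server.
flows = [
    {"src_ip": f"10.0.0.{host}", "src_port": 40000 + port,
     "dst_ip": "192.0.2.10", "dst_port": 443}
    for host in range(1, 51)
    for port in range(20)
]

per_service = series_count(flows, ["dst_ip", "dst_port"])
per_client = series_count(flows, ["src_ip", "dst_ip", "dst_port"])
per_flow = series_count(flows, ["src_ip", "src_port", "dst_ip", "dst_port"])
```

Here one service becomes 50 series once the client IP is a label, and 1,000 once the source port is added; real traffic multiplies this across every server and client.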
And DIY is hard
Required areas of expertise (because every presentation needs a Venn diagram)
Roles:
• Network Engineers
• Distributed Systems Engineers
• SREs
• Low level network developers
Tasks:
• Resilience / Reliability
• Geo-distributed ingest
• Flow friendly data-store
• BGP Daemon
• Flow inspection & conversion
• Network protocols hacking
• Make all of the above work reliably
• Train all the other teams on the involved network protocols and their usage
But don’t give up…
• It’s still better to get started!
• Even if aggregate-based in a flow tool
• I can provide a host agent that will generate metrics along with flow (but be careful if you store IPs/ports in TSDBs)
How To:
Get the data.
Fuse the data.
Store the data.
Use the data.
Share the data.
Where to find this data?
• Flow data: NetFlow, sFlow, IPFIX
• SNMP, Streaming telemetry
• Sys/Event logs: TACACS & Syslog
• BGP, IGP path info
• TCP stats data / app specific data: App, Server, Logs, Metrics
• + User tags, Threat Intel, SDN Control, DNS, ping/trace
[Diagram: these sources are drawn from the network (routers) and a PCAP agent and added together]
= Combinatorially useful!
A Broader View of “NetFlow”
You can ALSO get performance data from the infrastructure:
• Queue Depth
• Retransmits per flow
• TCP latency
• Application Latency
From:
• Host software (nProbe)
• Sensors / Taps
• Webserver logs (Nginx)
• Cisco AVC supported routers
Fusing data for richer traffic analytics
Flow, BGP, SNMP, DNS, or logs alone are not enough. Each becomes much richer when combined with:
• Performance and layer 7 information
• BGP attributes
• Geography
• Tags (rack, department, customer…)
• Config changes and software versions
• Threat intelligence and known-bad IPs
Fusing should be near real-time, performed at ingest, and data-specific
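Fusing at ingest can be as simple as attaching extra fields to each record before it is stored. A minimal sketch, assuming flows arrive as dicts; `PREFIX_TAGS` and `KNOWN_BAD` are hypothetical stand-ins for a tag source (IPAM, CMDB) and a threat feed, and the linear scan would be replaced by a prefix trie at any real scale:

```python
import ipaddress

# Hypothetical tag table; illustrative values only.
PREFIX_TAGS = [
    (ipaddress.ip_network("10.1.0.0/16"), {"department": "payments"}),
    (ipaddress.ip_network("10.1.2.0/24"), {"rack": "r12"}),
]
KNOWN_BAD = {ipaddress.ip_address("203.0.113.66")}  # stand-in for a threat feed


def enrich(flow):
    """Return a copy of the flow with tag and threat fields attached at ingest time."""
    src = ipaddress.ip_address(flow["src_ip"])
    dst = ipaddress.ip_address(flow["dst_ip"])
    enriched = dict(flow)
    for network, tags in PREFIX_TAGS:
        if src in network:
            for key, value in tags.items():
                enriched[f"src_{key}"] = value
    enriched["threat_match"] = src in KNOWN_BAD or dst in KNOWN_BAD
    return enriched


flow = {"src_ip": "10.1.2.7", "dst_ip": "203.0.113.66", "bytes": 4200}
enriched = enrich(flow)
```

Doing this at ingest means the stored record reflects the tags and threat lists as they were when the traffic happened, rather than as they are at query time.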
Summary and Take-Aways
Quick Demos:
Grafana
Kentik
Host Agent
Overview
Kentik is the network traffic intelligence company.
• Founded 2014
• HQ: San Francisco
• 100+ Customers
• $38M in Funding
• 60+ Team Members
• 600% Growth in 2016