Monitoring microservices: Docker, Mesos and Kubernetes visibility at scale

Posted on 06-Jan-2017

TRANSCRIPT

  • Monitoring microservices: Docker, Mesos and Kubernetes visibility at scale

  • Me

    Alessandro Gallotta, Software Engineer @sysdig

    @alex_gallotta

    @sysdig

  • Introducing Sysdig

    Capture system events, filter them, run useful scripts; Lua scripting; open source; nice curses UI

    lsof, netstat, tcpdump, htop, ps, strace

  • and more

    Track user activity; top files, processes and connections by CPU and bytes

    Logs, containers, tracers: you name it, we track it
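    A "top files/processes/connections by bytes" view boils down to summing a metric per key over a stream of captured events and sorting. The Python sketch below shows only that aggregation; the event record layout (`proc`, `file`, `bytes`) is a hypothetical stand-in, not sysdig's event format, and sysdig itself implements such views as Lua scripts.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_by_bytes(events: Iterable[Mapping], key: str = "proc", n: int = 3) -> List[Tuple[str, int]]:
    """Sum the 'bytes' field of each event per `key` and return the n largest totals."""
    totals: Counter = Counter()
    for ev in events:
        totals[ev[key]] += ev.get("bytes", 0)
    return totals.most_common(n)


# Hypothetical captured I/O events: which process moved how many bytes on which file.
events = [
    {"proc": "nginx", "file": "/var/log/nginx/access.log", "bytes": 4096},
    {"proc": "mysqld", "file": "/var/lib/mysql/ibdata1", "bytes": 16384},
    {"proc": "nginx", "file": "/var/www/index.html", "bytes": 1024},
    {"proc": "redis-server", "file": "/data/dump.rdb", "bytes": 2048},
]

print(top_by_bytes(events))                   # [('mysqld', 16384), ('nginx', 5120), ('redis-server', 2048)]
print(top_by_bytes(events, key="file", n=1))  # [('/var/lib/mysql/ibdata1', 16384)]
```

    Changing `key` switches the same aggregation from processes to files; a connection view would key on a connection tuple instead.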

  • Design Goals

    Production-ready, simple, lightweight

    Rich data, natural workflow, native support for containers, native support for ... and more

  • Demo time

  • Containers are Great

    Simple, scalable, isolated, service-oriented, elastic, flexible, separation of concerns

  • But Some Things Are Becoming More Complex

    Cache, Webserver, Database

    Legacy Monolithic App

  • But Some Things Are Becoming More Complex

    [Diagram: a container-based app, with Service1, Service2 and Service3 spread across six computing nodes]

    Two Problems

  • Problem #1: How Do We Get Data Out of These Guys?

    [Diagram: the container-based app, with Service1, Service2 and Service3 across six computing nodes]

    System, network, process, JVM, response time, requests, errors

  • Problem #2: How Do We Make Sense of the Data?

    [Diagram: the container-based app, with Service1, Service2 and Service3 across six computing nodes]

  • Complexity Calls for Great Monitoring

    Isolated, automated, orchestration-aware, simple, scalable

  • The Orchestrated Version of This

  • Complexity Also Calls for Great Troubleshooting

    What's the network activity of my Marathon group?

    What's using the CPU in the WordPress task?

    How the hell does my Mesos task work?!

    Where's the bottleneck?

    What's the response time of my login service?

    What transactions is my Redis service serving?
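    Questions like "what's the network activity of my Marathon group?" require orchestration-aware aggregation: per-container metrics are rolled up by the orchestrator's grouping rather than by host. The Python sketch below illustrates the idea; the container records, the `marathon.group` label key and the `net_bytes` metric name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def rollup(containers: Iterable[Mapping], label: str, metric: str) -> Dict[str, int]:
    """Sum `metric` over containers, grouped by the value of an orchestrator `label`."""
    totals: Dict[str, int] = defaultdict(int)
    for c in containers:
        # Containers without the label (e.g. started outside the orchestrator) get their own bucket.
        group = c["labels"].get(label, "<none>")
        totals[group] += c[metric]
    return dict(totals)


# Hypothetical per-container samples, wherever on the cluster each container runs.
containers = [
    {"id": "a1", "labels": {"marathon.group": "/web"}, "net_bytes": 1200},
    {"id": "b2", "labels": {"marathon.group": "/web"}, "net_bytes": 800},
    {"id": "c3", "labels": {"marathon.group": "/db"}, "net_bytes": 5000},
    {"id": "d4", "labels": {}, "net_bytes": 300},
]

print(rollup(containers, "marathon.group", "net_bytes"))  # {'/web': 2000, '/db': 5000, '<none>': 300}
```

    The same function answers the Kubernetes variant of the question by passing a different label key.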

  • How Do I Get Data Out of These Things: VMs

    [Diagram: VM1, VM2 and VM3 running on a hypervisor]

  • Monitoring VMs, Option 1

    Hypervisor-level instrumentation, Amazon CloudWatch

  • Monitoring VMs, Option 2

    Monitoring agent

  • Monitoring Containers

    [Diagram: Container1, Container2 and Container3 running on one OS]

  • Monitoring Containers, Option 1

    Monitoring agent inside each container

    Not scalable; not composable; adds dependencies/size; kills the concept of one process per container

  • Monitoring Containers, Option 2

    Container runtime-level monitoring; kernel-level instrumentation
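    Kernel-level instrumentation sees processes, not containers, so each event's PID has to be attributed to a container. One common way to do that on Linux is to read `/proc/<pid>/cgroup` and extract the Docker container ID from the cgroup path. The sketch below assumes Docker's usual path layouts (`/docker/<id>` with the cgroupfs driver, `docker-<id>.scope` with the systemd driver) and full 64-hex-digit IDs; it is not necessarily how sysdig performs the mapping.

```python
import re
from typing import Optional

# Assumed Docker cgroup path layouts:
#   cgroupfs driver: .../docker/<64-hex-id>
#   systemd driver:  .../docker-<64-hex-id>.scope
_DOCKER_ID = re.compile(r"docker[-/]([0-9a-f]{64})")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first Docker container ID found in /proc/<pid>/cgroup contents, or None for host processes."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _DOCKER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None


def container_id_for_pid(pid: int) -> Optional[str]:
    """Map a PID to its container by reading /proc/<pid>/cgroup (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return container_id_from_cgroup(f.read())


cid = "ab" * 32  # a made-up 64-hex-digit container ID
print(container_id_from_cgroup(f"12:memory:/docker/{cid}\n11:cpu:/docker/{cid}") == cid)  # True
print(container_id_from_cgroup("0::/user.slice/user-1000.slice"))                        # None
```

    Parsing is kept separate from file access so the mapping logic can be checked on sample strings without a running container.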

  • Monitoring Containers, Option 3

    [Diagram: a dedicated monitoring container running alongside Container1 and Container2]

  • Sysdig Data Collection

    [Diagram: one kernel hosting Container1 (Docker), Container2 (Docker) and Container3 (LXC), with apps running inside]

    Instrumentation through kernel module

    sysdig, itself running in a Docker container: capture and analysis

  • Sky cloud is the limit

    Correlate data; scale with your infrastructure; alerts, notifications, visualization tools; continuous data collection and retention from production systems

  • Sysdig Cloud

    The evolution of sysdig for the cloud, preserving the original premises: production-ready, natural workflow, ease of use, zero to low configuration needed

  • Out-of-the-box support

  • Demo time 2

  • How About Security?

    Did someone log into one of our containers?

    Has something been installed in one of the containers?

    Have we been hacked?

    Were configuration files changed?

  • Falco: an anomaly detection system built on top of the sysdig engine

  • Falco Architecture

    [Diagram: one kernel hosting Container1 (Docker), Container2 (rkt) and Container3 (LXC), with apps running inside; the Falco rule system runs in a Docker container]

    File activity, network activity, user activity, process execution, IPC

  • Rules Examples

    rule: shell_in_container
    desc: a shell running in a container
    condition: container.id != host and proc.name = bash
    output: Shell running in container (user=%user.name container_id=%container.id container_name=%container.name shell=%proc.name parent=%proc.pname)
    priority: WARNING
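    To make the shell_in_container rule concrete, the Python sketch below mirrors its condition as a predicate over one event and fills the output template by replacing each `%field` token with the event's value. It is only an illustration of the rule's semantics, not Falco's rule engine; the flat event dictionary and its values are hypothetical, and `host` is treated as the container ID of non-containerized processes, as the rule implies.

```python
import re
from typing import Mapping

OUTPUT = ("Shell running in container (user=%user.name container_id=%container.id "
          "container_name=%container.name shell=%proc.name parent=%proc.pname)")


def shell_in_container(ev: Mapping[str, str]) -> bool:
    # Mirrors: container.id != host and proc.name = bash
    return ev["container.id"] != "host" and ev["proc.name"] == "bash"


def render(template: str, ev: Mapping[str, str]) -> str:
    # Replace each %class.field token with the event's value ("<NA>" if the field is missing).
    return re.sub(r"%([a-z]+\.[a-z]+)", lambda m: str(ev.get(m.group(1), "<NA>")), template)


# Hypothetical event: a bash process started inside a container named "web".
event = {
    "container.id": "3ad7b26ded6d",
    "container.name": "web",
    "proc.name": "bash",
    "proc.pname": "runc",
    "user.name": "root",
}

if shell_in_container(event):
    print("WARNING:", render(OUTPUT, event))
# WARNING: Shell running in container (user=root container_id=3ad7b26ded6d container_name=web shell=bash parent=runc)
```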

  • Rules Examples

    rule: mysqld_spawn_process
    desc: mysqld spawning a new process after startup.
    condition: spawn_process and proc.name = mysqld and not proc_is_new
    output: mysqld spawned new process after startup (user=%user.name command=%proc.cmdline file=%fd.name)
    priority: WARNING
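    The mysqld_spawn_process condition depends on two macros, spawn_process and proc_is_new, whose definitions the slide does not show. The Python sketch below uses hypothetical stand-ins: spawn_process matches the return ("<") of a process-creating syscall, and proc_is_new means the process has been running for at most 5 seconds, with durations assumed to be in nanoseconds. It shows why the rule ignores spawning during startup but fires afterwards.

```python
from typing import Mapping

NEW_PROCESS_WINDOW_NS = 5_000_000_000  # hypothetical "new process" threshold: 5 s


def spawn_process(ev: Mapping) -> bool:
    # Hypothetical macro: the exit ("<") of a process-creating syscall.
    return ev["syscall.type"] in ("clone", "fork", "vfork") and ev["evt.dir"] == "<"


def proc_is_new(ev: Mapping) -> bool:
    # Hypothetical macro: the process has existed for at most the window above.
    return ev["proc.duration"] <= NEW_PROCESS_WINDOW_NS


def mysqld_spawn_process(ev: Mapping) -> bool:
    # Mirrors: spawn_process and proc.name = mysqld and not proc_is_new
    return spawn_process(ev) and ev["proc.name"] == "mysqld" and not proc_is_new(ev)


startup = {"syscall.type": "clone", "evt.dir": "<", "proc.name": "mysqld", "proc.duration": 1_000_000_000}
later = {**startup, "proc.duration": 3_600_000_000_000}  # mysqld has been running for an hour

print(mysqld_spawn_process(startup))  # False: spawning during startup is expected
print(mysqld_spawn_process(later))    # True: spawning after startup triggers the rule
```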

  • Rules Examples

    macro: open_connection
    condition: syscall.type=connect and evt.dir=< and fd.sockfamily=ip

    rule: system_binaries_network_activity
    desc: any network connection initiated by system binaries that are not expected to send or receive any network traffic
    condition: open_connection and proc.name in (ls, ps, mkdir, ...)
    output: Known system binary made network connection (user=%user.name command=%proc.cmdline connection=%fd.name)
    priority: WARNING
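    The open_connection macro is a named, reusable condition: the system_binaries_network_activity rule combines it with a list check instead of repeating the syscall test. The Python sketch below models macros as plain predicates that rules compose. It is illustrative only; the event dictionaries are hypothetical, and the binary list contains just the entries visible on the slide, since the rest of that list is elided there.

```python
from typing import Mapping

# Only the entries visible on the slide; the full list is elided there.
SYSTEM_BINARIES = {"ls", "ps", "mkdir"}


def open_connection(ev: Mapping[str, str]) -> bool:
    # Mirrors the macro: syscall.type=connect and evt.dir=< and fd.sockfamily=ip
    return ev["syscall.type"] == "connect" and ev["evt.dir"] == "<" and ev["fd.sockfamily"] == "ip"


def system_binaries_network_activity(ev: Mapping[str, str]) -> bool:
    # The rule reuses the macro, then restricts it to the listed binaries.
    return open_connection(ev) and ev["proc.name"] in SYSTEM_BINARIES


# Hypothetical events: ls opening an IP connection, curl doing the same, ls on a UNIX socket.
ls_ip = {"syscall.type": "connect", "evt.dir": "<", "fd.sockfamily": "ip", "proc.name": "ls"}
curl_ip = {**ls_ip, "proc.name": "curl"}
ls_unix = {**ls_ip, "fd.sockfamily": "unix"}

print(system_binaries_network_activity(ls_ip))    # True
print(system_binaries_network_activity(curl_ip))  # False: curl is expected to use the network
print(system_binaries_network_activity(ls_unix))  # False: not an IP connection
```

    Keeping the syscall test in one macro means any other rule about outbound connections can reuse it unchanged.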

  • Thank You!

    www.sysdig.org

    www.sysdig.org/falco

    @alex_gallotta

    @sysdig

    github.com/draios

    www.sysdig.com
