OSDC 2014: Christian Kniep - Understand your data center by overlaying multiple information layers

OSDC 2014: Overlay Datacenter Information, Christian Kniep (Bull SAS), 2014-04-10

Uploaded by netways on 06-May-2015

DESCRIPTION

Today's data center managers are burdened by a lack of aligned information across multiple layers. Workflow events such as 'job starts', aligned with performance metrics and with events extracted from log facilities, are low-hanging fruit that is on the verge of becoming usable thanks to open-source software like Graphite, StatsD, logstash and the like. This talk shows the benefits of merging multiple layers of information within an InfiniBand cluster, using use cases for level 1/2/3 personnel.

TRANSCRIPT

Page 1: OSDC 2014: Christian Kniep -  Understand your data center by overlaying multiple information layers

OSDC 2014

Overlay Datacenter Information

Christian Kniep, Bull SAS, 2014-04-10

About Me

❖ Me (>30y)
❖ SysOps (>10y)
❖ SysOps v1.1 (>8y)
❖ BSc (2008-2011)
❖ DevOps (>4y)
❖ R&D [OpsDev?] (>1y)

Agenda

❖ Cluster Stack
❖ Motivation (InfiniBand use-case)
❖ QNIB/ng
❖ QNIBTerminal (virtual cluster using docker)

Cluster Stack: Work Environment

Cluster?

„A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system.“ - wikipedia.org

HPC-Cluster

High Performance Computing

❖ HPC: surfing the bottleneck
❖ The weakest link breaks performance

Cluster Layers (rough estimate)

Each layer is a source of both events and metrics:

Hardware: IPMI, lm_sensors, IB counters
Operating System: kernel, userland tools
Middleware: MPI, ISV libs
Services: storage, job scheduler, sshd
Software: end-user application
Excel: KPI, SLA

Audiences mapped onto these layers: EndUser, Power User/ISV, SysOps L1, SysOps L2, SysOps L3, SysOps Mgmt, ISV Mgmt, Mgmt

Layer^n

❖ Every layer is composed of layers
❖ How deep to go?

Little Data w/o Connection

❖ Multiple data sources
❖ No way of connecting them
❖ Connecting is manual labour
❖ Experience driven
❖ Niche solutions misleading
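What "connecting" two data sources means can be sketched in a few lines: take a metric time series from one source and workflow events (e.g. 'job starts') from another, and attach to every event the metric value that was current when it happened. This is a minimal sketch; the tuple layouts and the helper name `overlay` are assumptions for illustration, not part of the talk's tooling.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical layouts: a metric sample is (unix_timestamp, value),
# an event is (unix_timestamp, description).
Sample = Tuple[int, float]
Event = Tuple[int, str]


def overlay(samples: List[Sample], events: List[Event]) -> List[Tuple[int, str, Optional[float]]]:
    """Pair each event with the latest metric value recorded at or before it.

    `samples` must be sorted by timestamp. Events older than the first
    sample get None because no metric value is known for them yet.
    """
    times = [ts for ts, _ in samples]
    aligned = []
    for ts, what in sorted(events):
        idx = bisect_right(times, ts) - 1  # index of last sample with timestamp <= ts
        aligned.append((ts, what, samples[idx][1] if idx >= 0 else None))
    return aligned


if __name__ == "__main__":
    # Made-up data: one node's memory usage sampled every 60 s, plus scheduler events.
    memory = [(0, 2.0), (60, 2.1), (120, 9.5), (180, 9.4)]
    jobs = [(100, "job 42 starts"), (190, "job 42 ends")]
    for row in overlay(memory, jobs):
        print(row)
```

Tools like Graphite do this alignment visually by drawing events on top of metric graphs; the sketch only shows the underlying join on time.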

IB + QNIBng: Motivation

Modular Switch

❖ Looks like one „switch“
❖ Composed of a network itself
❖ Which route is taken is transparent to the application
  ❖ LB1<>FB1<>LB4
  ❖ LB1<>FB2<>LB4
  ❖ LB1 -> FB1 -> LB4 / LB1 <- FB2 <- LB4 (forward and return paths can differ)
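The asymmetric route on the last bullet follows from how InfiniBand forwards: switches look up the outgoing port by destination LID, so the path LB1 to LB4 and the path LB4 to LB1 are chosen independently. A minimal sketch, assuming every leaf board (LB) connects to every fabric board (FB) and a hypothetical "destination LID modulo number of fabric boards" spreading rule; the real choice is made by the subnet manager's routing engine.

```python
from typing import List


def internal_paths(src_leaf: str, dst_leaf: str, fabric_boards: List[str]) -> List[List[str]]:
    """All candidate internal routes between two leaf boards: one per fabric board."""
    return [[src_leaf, fb, dst_leaf] for fb in fabric_boards]


def chosen_path(src_leaf: str, dst_leaf: str, dst_lid: int, fabric_boards: List[str]) -> List[str]:
    """Pick the fabric board from the destination LID only (destination-based forwarding).

    The modulo spreading rule is a hypothetical stand-in for a real routing engine.
    """
    fb = fabric_boards[dst_lid % len(fabric_boards)]
    return [src_leaf, fb, dst_leaf]


if __name__ == "__main__":
    fbs = ["FB1", "FB2"]
    print(internal_paths("LB1", "LB4", fbs))
    # Made-up LIDs: the host behind LB4 has LID 4, the host behind LB1 has LID 1.
    print("forward:", chosen_path("LB1", "LB4", dst_lid=4, fabric_boards=fbs))
    print("return: ", chosen_path("LB4", "LB1", dst_lid=1, fabric_boards=fbs))
```

With two fabric boards the forward direction lands on FB1 and the return direction on FB2, which is exactly the situation in which a single bad internal link hurts only one direction of a conversation.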

Debug-Nightmare

❖ Job seems to fail due to a bad internal link
❖ 96-port switch
❖ Multiple autonomous job-cells
❖ Relevant information:
  ❖ Job status (resource scheduler)
  ❖ Routes (IB subnet manager)
  ❖ IB counters (command line)
❖ Changing one plug recomputes the routes :)
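The manual debugging step, combining job status, routes and counters, is at its core a join over three data sets. A minimal sketch; all structures and names are hypothetical stand-ins for what the resource scheduler, the subnet manager and the counter readout would deliver.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # a directed internal link, e.g. ("FB2", "LB1")


def suspicious_jobs(
    jobs: Dict[str, List[str]],                 # job id -> nodes (resource scheduler)
    routes: Dict[Tuple[str, str], List[Link]],  # (src node, dst node) -> links used (subnet manager)
    error_delta: Dict[Link, int],               # link -> error counter increase (counter readout)
    threshold: int = 0,
) -> Dict[str, Set[Link]]:
    """Per job, the links on its node-to-node routes whose error counters grew beyond threshold."""
    bad_links = {link for link, delta in error_delta.items() if delta > threshold}
    result: Dict[str, Set[Link]] = {}
    for job, nodes in jobs.items():
        hits: Set[Link] = set()
        for src in nodes:
            for dst in nodes:
                if src != dst:
                    hits.update(link for link in routes.get((src, dst), []) if link in bad_links)
        if hits:
            result[job] = hits
    return result


if __name__ == "__main__":
    # Made-up snapshot of two job-cells on one modular switch.
    jobs = {"job1": ["node01", "node02"], "job2": ["node03", "node04"]}
    routes = {
        ("node01", "node02"): [("LB1", "FB1"), ("FB1", "LB4")],
        ("node02", "node01"): [("LB4", "FB2"), ("FB2", "LB1")],
        ("node03", "node04"): [("LB2", "FB1"), ("FB1", "LB3")],
    }
    errors = {("FB2", "LB1"): 17, ("FB1", "LB3"): 0}
    print(suspicious_jobs(jobs, routes, errors))
```

Because changing one plug recomputes the routes, the route snapshot has to come from the same time window as the counter deltas; otherwise the join points at the wrong links.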

Poster: IBPM: Open-Source-Based InfiniBand Performance Monitoring (Communication Networks, University of Tuebingen)

Michael Hoefling, Michael Menth, Christian Kniep, and Marcus Camen: "IBPM: An Open-Source-Based Framework for InfiniBand Performance Monitoring", in Proceedings of the 16th GI/ITG Conference on Measurement, Modeling, and Evaluation of Computer and Communication Systems (MMB) and Dependability and Fault Tolerance (DFT), March 2012, Kaiserslautern, Germany

Authors: Michael Hoefling (1), Michael Menth (1), Christian Kniep (2), Marcus Camen (2)
(1) University of Tuebingen, Tuebingen, Germany
(2) science+computing ag, Tuebingen, Germany

Contact: University of Tuebingen · Sand 13 · 72076 Tübingen, phone +49-7071-29-70507, http://kn.inf.uni-tuebingen.de/staff/hoefling

Background: InfiniBand (IB)
❖ State-of-the-art communication technology for interconnection in high-performance computing data centers
❖ Point-to-point bidirectional links
❖ High throughput (40 Gbit/s with QDR)
❖ Low latency
❖ Dynamic online network reconfiguration

Rate Measurement in IB Networks
❖ Idea: extract raw network information from the IB network, analyze the output, derive statistics about the performance of the network
❖ Topology extraction: subnet discovery using ibnetdiscover produces a human-readable file of the network topology; the output is processed into a graphical representation of the network
❖ Remote counter readout: each port has its own set of performance counters; the counters measure, e.g., transferred data, congestion, errors and link state changes

IBPM: Demo Overview
❖ Features: automatic topology extraction and visualization; visualization of traffic locality, link utilization, congestion and port performance history
❖ ibsim-based network simulation: ibsim simulates an IB network; simple topology changes are possible (GUI); limitations: no performance simulation and no data rate changes possible
❖ Real IB network: physical network; allows performance measurements; GUI-controlled traffic scenarios

IBPM: Demo Scenarios
❖ Scenario 1, topology changes: a node and/or switch becomes unavailable; the connectivity state is represented in the topology map
❖ Scenario 2, port performance and link utilization: nodes communicate with each other; port performance is accessible through a simple point-and-click interface on a node or switch; link utilization is visualized through utilization-based coloring of the links in the performance map
❖ Scenario 3, traffic locality: nodes use pre-defined traffic patterns; traffic locality is visualized through locality-based coloring of the switches in the locality map
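Rate measurement from the remote counter readout comes down to differencing two readouts of the same port. A minimal sketch, assuming the standard IB PortXmitData semantics (data counted in units of 4 octets) and the classic 32-bit PortCounters, which stop at their maximum instead of wrapping; the function names and the link rate passed in are illustrative.

```python
from typing import Optional

COUNTER_MAX_32 = 2**32 - 1  # classic PortCounters width; the counter stops here instead of wrapping


def xmit_rate_bytes_per_s(prev: int, curr: int, interval_s: float,
                          counter_max: int = COUNTER_MAX_32) -> Optional[float]:
    """Bytes per second between two PortXmitData readouts taken interval_s apart.

    PortXmitData counts in units of 4 octets, hence the factor 4. Returns None
    if the counter saturated or was reset in between, since the delta is then
    meaningless.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if curr >= counter_max or curr < prev:
        return None
    return (curr - prev) * 4 / interval_s


def utilization(rate_bytes_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of the given link data rate in use (the link rate is supplied by the caller)."""
    return rate_bytes_per_s * 8 / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    # Made-up readouts 10 s apart on one port.
    rate = xmit_rate_bytes_per_s(0, 250_000_000, 10)
    print(rate, utilization(rate, 32))
```

A 4x QDR link signals at 40 Gbit/s, but 8b/10b encoding leaves 32 Gbit/s for data, which is why the example divides by 32 rather than 40. Saturation is also the reason monitoring tools reset or use the 64-bit extended counters when polling frequently.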

OpenSM

(Figure: OpenSM and its Performance Manager (PerfMgmt) attached to a fabric of switches (Sw) and nodes)

❖ OpenSM Performance Manager
  ❖ Sends token to all ports
  ❖ All ports reply with metrics
  ❖ Callback triggered for every reply
❖ osmeventplugin
  ❖ Dumps info to file
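The interplay between the Performance Manager and the event plugin is a plain callback pattern. The sketch below mimics only that control flow in Python: a poller queries every port, each reply is handed to all registered callbacks, and a file-dumping callback plays the role of osmeventplugin. The real plugin is a C shared object loaded by OpenSM; none of the class or function names below are OpenSM's API.

```python
import io
from typing import Callable, Dict, List, TextIO

Reply = Dict[str, int]               # hypothetical reply layout: counter name -> value
Callback = Callable[[str, Reply], None]


class PerfManagerSketch:
    """Toy stand-in for a performance manager: polls ports, fans replies out to callbacks."""

    def __init__(self, query_port: Callable[[str], Reply]):
        self._query_port = query_port
        self._callbacks: List[Callback] = []

    def register(self, cb: Callback) -> None:
        self._callbacks.append(cb)

    def sweep(self, ports: List[str]) -> None:
        for port in ports:                  # "sends token to all ports"
            reply = self._query_port(port)  # "all ports reply with metrics"
            for cb in self._callbacks:      # "callback triggered for every reply"
                cb(port, reply)


def file_dumper(out: TextIO) -> Callback:
    """Plugin-like callback that writes each reply as one line of text."""
    def _cb(port: str, reply: Reply) -> None:
        fields = " ".join(f"{name}={value}" for name, value in sorted(reply.items()))
        out.write(f"{port} {fields}\n")
    return _cb


if __name__ == "__main__":
    fake_counters = {"sw1:1": {"xmit_data": 10, "rcv_data": 7},
                     "node01:1": {"xmit_data": 3, "rcv_data": 9}}
    buf = io.StringIO()
    pm = PerfManagerSketch(lambda port: fake_counters[port])
    pm.register(file_dumper(buf))
    pm.sweep(list(fake_counters))
    print(buf.getvalue(), end="")
```

Swapping `file_dumper` for a callback that ships data over the network is the step from "dumps info to file" to feeding external tools.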

OpenSM

(Figure: OpenSM PerfMgmt feeding qnib, later qnibng)

❖ qnib
  ❖ sends metrics to RRDtool
  ❖ events to PostgreSQL
❖ qnibng
  ❖ sends metrics to Graphite
  ❖ events to logstash
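Moving from RRDtool/PostgreSQL to Graphite/logstash mostly means emitting text lines over the network. A minimal sketch, assuming Graphite's plaintext protocol (`<path> <value> <timestamp>` per line, by default on TCP port 2003) and a logstash TCP input configured to accept one JSON document per line; the metric path layout and event fields are hypothetical, not qnibng's actual naming.

```python
import json
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """One line of Graphite's plaintext protocol: '<path> <value> <timestamp>' plus newline."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def logstash_event(host: str, message: str, **fields: str) -> str:
    """One JSON event per line, e.g. for a logstash tcp input with a json_lines codec (assumed setup)."""
    event = {"host": host, "message": message, **fields}
    return json.dumps(event, sort_keys=True) + "\n"


def send(line: str, server: str, port: int) -> None:
    """Ship a single line over TCP (no retries or batching in this sketch)."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(line.encode("utf-8"))


if __name__ == "__main__":
    print(graphite_line("ib.sw1.port1.xmit_data", 1234, 1397088000), end="")
    print(logstash_event("sw1", "port 1 down", type="ib_event"), end="")
```

Keeping formatting separate from `send` makes the payloads easy to check without a running Graphite or logstash instance.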

Graphite Events: port is up/down
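Port state changes can be stored in Graphite as events and drawn over the counter graphs. A minimal sketch that builds a JSON body for graphite-web's events endpoint and a render query that overlays tagged events on a metric; it assumes graphite-web's `/events/` API with `what`/`tags`/`data` fields and its `events()` and `drawAsInfinite()` render functions. The accepted tag format has changed between graphite-web releases (older ones expect a space-separated string), so check against the installed version. Tags and metric paths are hypothetical.

```python
import json
from typing import Sequence
from urllib.parse import urlencode


def port_event_payload(node: str, port: int, up: bool, extra_tags: Sequence[str] = ()) -> str:
    """JSON body describing an IB port state change as a Graphite event."""
    state = "up" if up else "down"
    body = {
        "what": f"{node} port {port} {state}",
        "tags": ["ib", f"port_{state}", *extra_tags],
        "data": f"IB port {port} on {node} changed state to {state}",
    }
    return json.dumps(body)


def overlay_render_query(metric: str, tag: str) -> str:
    """Render API query string: the metric plus tagged events drawn as vertical lines."""
    targets = [metric, f'drawAsInfinite(events("{tag}"))']
    return urlencode([("target", t) for t in targets])


if __name__ == "__main__":
    print(port_event_payload("node01", 1, up=False))
    print(overlay_render_query("ib.node01.port1.xmit_data", "ib"))
```

The second target is what makes a port flap visible as a vertical line at the exact moment the traffic graph drops.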


QNIBTerminal: Proof of Concept

Cluster Stack Mock-Up

❖ IB events and metrics are not enough
❖ How to get real-world behavior?
❖ Wanted:
  ❖ Slurm (resource scheduler)
  ❖ MPI-enabled compute nodes
  ❖ As much additional cluster stack as possible (Graphite, elasticsearch/logstash/kibana, Icinga, cluster FS, …)

Classical Virtualization

❖ Big overhead for a simple node
❖ Resources provisioned in advance
❖ Host resources allocated

LXC (docker)

❖ Minimal overhead (a couple of MB)
❖ No resource pinning
❖ cgroups option
❖ Highly automatable

NOW: Watch the OSDC 2014 talk 'Docker' by Tobias Schwab

Virtual Cluster Nodes

(Figure: one host running the containers master, monitoring, log mgmt, compute0, compute1, …, computeN)

❖ Master node (etcd, DNS, slurmctld)
❖ Monitoring (Graphite + statsd)
❖ Log mgmt (ELK)
❖ Compute nodes (slurmd)
❖ Alarming (Icinga) [not integrated]

Master Node

❖ Takes care of inventory (etcd)
❖ Provides DNS (+PTR)
❖ Integrate Rudder, ansible, chef, …?
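Keeping inventory and DNS on the same node means forward (A) and reverse (PTR) records can be derived from one data set. A minimal sketch, assuming the inventory is a plain hostname-to-IPv4 mapping (in the talk it lives in etcd) and that the result feeds a zone-file based DNS server; the domain name is hypothetical.

```python
import ipaddress
from typing import Dict, List


def dns_records(inventory: Dict[str, str], domain: str) -> List[str]:
    """Zone-file lines: one A record and one matching PTR record per inventory entry."""
    lines = []
    for host, ip in sorted(inventory.items()):
        addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed addresses
        fqdn = f"{host}.{domain}."
        lines.append(f"{fqdn} IN A {addr}")
        # reverse_pointer yields e.g. "3.0.17.172.in-addr.arpa" (without trailing dot)
        lines.append(f"{addr.reverse_pointer}. IN PTR {fqdn}")
    return lines


if __name__ == "__main__":
    inventory = {"master": "172.17.0.2", "compute0": "172.17.0.3"}  # made-up addresses
    print("\n".join(dns_records(inventory, "qnib")))
```

Generating both record types from the same entry avoids the classic drift where a node resolves forward but not in reverse, which tools like sshd and Slurm can be picky about.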

Non-Master Nodes (in general)

❖ Are started with the master as DNS server
❖ Mount /scratch and /chome (on SSDs)
❖ supervisord kicks in and starts services and setup scripts
❖ Send metrics to Graphite
❖ Send logs to logstash
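The start-up convention for non-master nodes can be captured in a helper that assembles the `docker run` arguments: hostname set, the master as DNS server, and /scratch and /chome bind-mounted. A minimal sketch using only the standard `docker run` flags `-d`, `--name`, `-h`, `--dns` and `-v`; the helper, the host-side paths and the image name are hypothetical.

```python
import shlex
from typing import List


def docker_run_args(name: str, image: str, master_ip: str,
                    host_scratch: str = "/data/scratch",  # hypothetical host path (e.g. on SSD)
                    host_chome: str = "/data/chome") -> List[str]:  # hypothetical host path
    """Argument list for starting one virtual cluster node."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-h", name,             # container hostname equals node name
        "--dns", master_ip,     # the master node answers DNS queries
        "-v", f"{host_scratch}:/scratch",
        "-v", f"{host_chome}:/chome",
        image,
    ]


if __name__ == "__main__":
    for i in range(3):
        args = docker_run_args(f"compute{i}", "example/compute", "172.17.0.2")
        print(" ".join(shlex.quote(a) for a in args))
```

Returning a list instead of a shell string lets the arguments go straight to `subprocess.run` without quoting issues; inside the container, supervisord then takes over starting services.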

docker-compute

❖ slurmd
❖ sshd
❖ logstash-forwarder
❖ openmpi
❖ qperf

docker-graphite (monitoring)

❖ Full Graphite stack + statsd
❖ Stresses IO (<3 SSDs)
❖ Needs more care (optimize IO)
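StatsD sits in front of Graphite and aggregates small UDP messages of the form `<bucket>:<value>|<type>` (`c` counter, `ms` timer, `g` gauge, `s` set) before flushing them to carbon. A minimal fire-and-forget client sketch; port 8125 is StatsD's usual default, and the bucket names are hypothetical.

```python
import socket

VALID_TYPES = {"c", "ms", "g", "s"}  # counter, timer, gauge, set


def statsd_packet(bucket: str, value: float, metric_type: str = "c") -> bytes:
    """Encode one StatsD metric as '<bucket>:<value>|<type>'."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unknown statsd type: {metric_type}")
    return f"{bucket}:{value}|{metric_type}".encode("ascii")


def statsd_send(packet: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send over UDP: no connection and no acknowledgement, so a lost packet is silent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


if __name__ == "__main__":
    print(statsd_packet("slurm.jobs.started", 1))
    print(statsd_packet("mpi.job.runtime", 1532, "ms"))
```

Because UDP is fire-and-forget, a script that emits such metrics cannot block on a slow or unreachable monitoring container.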

docker-elk (Log Mgmt)

❖ elasticsearch, logstash, kibana
❖ Inputs: syslog, lumberjack
❖ Filters: none
❖ Outputs: elasticsearch

It’s alive!

Start Compute Node

Check Slurm Config

Run MPI-Job

TCP benchmark

QNIBTerminal: Future Work

docker-icinga

❖ Icinga to provide a state-of-the-cluster overview
❖ Bundle with Graphite/ELK
❖ No big deal…
❖ Is this going to scale?

docker-(GlusterFS, Lustre)

❖ Cluster scratch file system to integrate with
❖ Kernel modules are needed, which stalls the attempt (containers share the host kernel)
❖ Might be moved into VirtualBox (Vagrant)

Humans!

❖ How do SysOps/DevOps/Mgmt
  ❖ react to the changes?
  ❖ adopt them?
  ❖ fear them?

Big Data!

❖ Truckload of
  ❖ Events
  ❖ Metrics
  ❖ Interaction

Metrics named per node; summing a job's memory usage needs an explicit node list:

node01.system.memory.usage 9
node13.system.memory.usage 14
node35.system.memory.usage 12
node95.system.memory.usage 11

target=sumSeries(node{01,13,35,95}.system.memory.usage)

Metrics prefixed with the job; one wildcard selects all of the job's nodes:

job1.node01.system.memory.usage 9
job1.node13.system.memory.usage 14
job1.node35.system.memory.usage 12
job1.node95.system.memory.usage 11

target=sumSeries(job1.*.system.memory.usage)
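The effect of the naming scheme can be reproduced without a Graphite server. The sketch emulates only the subset of Graphite path matching used above (`*` within one dot-separated segment and a single `{a,b}` list per segment) with `fnmatchcase`, and sums the latest value of each matching series; real `sumSeries` adds whole series point by point, and this is not Graphite's implementation.

```python
from fnmatch import fnmatchcase
from typing import Dict


def _segment_matches(pattern: str, segment: str) -> bool:
    """Match one dot-separated segment; supports '*' and a single '{a,b,...}' list."""
    if "{" in pattern and "}" in pattern:
        head, rest = pattern.split("{", 1)
        options, tail = rest.split("}", 1)
        return any(fnmatchcase(segment, head + opt + tail) for opt in options.split(","))
    return fnmatchcase(segment, pattern)


def match(pattern: str, path: str) -> bool:
    """Graphite-style match: same number of segments, each segment matched on its own."""
    p_parts, parts = pattern.split("."), path.split(".")
    return len(p_parts) == len(parts) and all(
        _segment_matches(p, s) for p, s in zip(p_parts, parts))


def sum_series(pattern: str, latest: Dict[str, float]) -> float:
    """Sum the latest values of all series whose path matches the pattern."""
    return sum(value for path, value in latest.items() if match(pattern, path))


if __name__ == "__main__":
    per_node = {"node01.system.memory.usage": 9, "node13.system.memory.usage": 14,
                "node35.system.memory.usage": 12, "node95.system.memory.usage": 11,
                "node02.system.memory.usage": 5}  # node02 is not part of the job
    print(sum_series("node{01,13,35,95}.system.memory.usage", per_node))
    per_job = {"job1." + path: v for path, v in per_node.items() if not path.startswith("node02")}
    print(sum_series("job1.*.system.memory.usage", per_job))
```

Both queries select the same four series, but only the job-prefixed one keeps working when the scheduler places the next job on different nodes. The wildcard must use exactly the prefix the metrics were written with: `job01.*` would match nothing here.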

pipework / mininet

❖ Currently all containers are bound to the docker0 bridge
❖ Creating a topology with virtual/real switches would be nice
❖ A first iteration might use pipework
❖ A more complete one should use vSwitches (mininet?)

Dockerfiles

❖ Only 3 images are fd20 (Fedora 20) based

Questions?

Picture credits:
❖ p2: http://de.wikipedia.org/wiki/Datei:Audi_logo.svg, http://commons.wikimedia.org/wiki/File:Daimler_AG.svg, http://ffb.uni-lueneburg.de/20JahreFFB/
❖ p4: https://www.flickr.com/photos/adeneko/4229090961
❖ p6: cae t100, https://www.flickr.com/photos/losalamosnatlab/7422429706
❖ p8: http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf
❖ p9: https://www.flickr.com/photos/riafoge/6796129047
❖ p10: https://www.flickr.com/photos/119364768@N03/12928685224/
❖ p11: http://www.mellanox.com/page/products_dyn?product_family=74
❖ p23: https://www.flickr.com/photos/jaxport/3077543062
❖ p25/26: https://blog.trifork.com/2013/08/08/next-step-in-virtualization-docker-lightweight-containers/
❖ p33: https://www.flickr.com/photos/fkehren/5139094564
❖ p39: https://www.flickr.com/photos/brizzlebornandbred/12852909293