draft: python for system administrators

DRAFT Python for System Administrators. EuroPython 2014, 24th July - Berlin. Roberto Polli - [email protected] Babel Srl, P.zza S. Benedetto da Norcia, 33, 00040 Pomezia (RM) - www.babel.it. 24 July 2014.

Upload: roberto-polli

Post on 10-May-2015


DESCRIPTION

Draft of the EP14 Training

TRANSCRIPT

Page 1: DRAFT: Python for System Administrators

Python for System Administrators

EuroPython 2014, 24th July - Berlin

Roberto Polli - [email protected]

Babel Srl, P.zza S. Benedetto da Norcia, 33, 00040 Pomezia (RM) - www.babel.it

24 July 2014

Page 2: DRAFT: Python for System Administrators

Agenda

• Intro
• ipython
• Path management: 10’
• Encoding: 10’
• Data Gathering: 20’
    module: psutil
    module: subprocess
    The /proc filesystem
• Parsing: 60’
    Regular Expressions
• Nosetest Intermezzo: 15’
• Processing: 45’
    Distributions
    Deviation
    Correlation
    Plotting Time
• End

Page 3: DRAFT: Python for System Administrators

Who? What? Why?

• Use Python to replace Grep, Awk, Sed and Perl. Speed up your daily job.
• Roberto Polli - Community Manager @ Babel.it. Loves writing in C, Java and Python. Red Hat Certified Engineer and Virtualization Administrator.
• Babel - Proud sponsor of this talk ;) Delivers large mail infrastructures based on Open Source software for Italian ISPs and PA. Contributes to various FLOSS projects.

Page 4: DRAFT: Python for System Administrators

Requirements

• python 2.7+, ipython
• course code from github
# git clone https://github.com/ioggstream/python-course
• test your environment (eg. psutil, numpy, scipy, matplotlib)
# nosetests -vs test_prerequisites.py
• first part: nose, psutil
• second part: scipy, numpy, matplotlib
• ♦ optional/advanced content ♦

Page 5: DRAFT: Python for System Administrators

How

• Get ready before starting: the code is on github!
• Type everything but #comments and try/except
• Type fast with tab-completion and copy-paste
• Be curious: inspect and print returned variables
• Never* close your iPython session: you’ll lose your precious variables

* (ok, sometimes you can).

Page 6: DRAFT: Python for System Administrators

References

• irc.freenode.net #python - The Python Community :D
• Python Cookbook, 3rd ed., O’Reilly - David Beazley and Brian K. Jones
• Programming Python, 4th ed., O’Reilly - Mark Lutz
• Dive into Python 3, 2nd ed., Apress - Mark Pilgrim
• nose.readthedocs.org
• github.com/ioggstream/python-course

Page 7: DRAFT: Python for System Administrators

iPython I

• Interactive interpreter with tons of functionality, and the main tool of our training.
• The most fun way to learn and use python!
• Supports tab-completion, readline, inline help
• Allows pasting from clipboard with %paste, and multi-line editing with %edit
• Run it enabling plotting support:
# ipython --pylab

Page 8: DRAFT: Python for System Administrators

iPython II

# iPython supports inline help: append ? to an object
str?

# We can run commands and capture the output in a variable
# with the ! magic: no need to quote on unix
ret = !cat /etc/hosts

# windows has etc\hosts too ;)
ret = !type c:\windows\system32\drivers\etc\hosts

Page 9: DRAFT: Python for System Administrators

iPython III

# returned objects can be filtered with
ret.grep('localhost')
# Now get the first space-separated column of the output
ret.fields(0)
ret.grep('localhost').fields(0)

# And the last returned value is stored in
localip = _

# We can type long commands in an editor like `vi` using
%edit mytmp.py   # type print(ret[0]), then exit (eg. :wq!)
> Editing... done. Executing edited code...

Page 10: DRAFT: Python for System Administrators

Path management: Goal

• Normalize paths on different platforms
• Create, copy and remove folders
• Handle errors

modules: os, os.path, shutil, errno
see also: pathlib on Python 3.4+

Page 11: DRAFT: Python for System Administrators

Path management: os.path, sys

basedir, hosts = "/", "etc/hosts"
# Check the hosting platform with the sys module
from sys import platform
if platform.startswith('win'):
    basedir = 'c:/windows/system32/drivers'

# Always use the os.path module!
from os.path import join, normpath
hosts = join(basedir, hosts)
hosts = normpath(hosts)
print("Normalized path is", hosts)

Page 12: DRAFT: Python for System Administrators

Path management: os.path, sys

• os.path is the best way to manage paths!
    • multiplatform
    • safe
• join adds a separator only where one is missing
• normpath fixes "/" orientation and collapses redundant "/" and ".."
• realpath resolves symlinks
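The behaviour listed above can be checked interactively. This is a minimal sketch, assuming illustrative sample paths; it uses posixpath and ntpath (the POSIX and Windows implementations behind os.path) so that the results do not depend on the machine running it.

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the POSIX implementation of os.path

# join adds a separator only where one is missing
joined = posixpath.join("/", "etc/hosts")

# normpath collapses doubled separators and resolves ".."
collapsed = posixpath.normpath("/etc//network/../hosts")

# the Windows flavour of normpath also turns "/" into "\"
win_path = ntpath.normpath("c:/windows/system32/drivers/etc/hosts")

print(joined, collapsed, win_path)
```

In everyday code you would simply import os.path, which picks the right flavour for the running platform.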

And now, a quick glance at other tools.

Page 13: DRAFT: Python for System Administrators

Move trees: shutil, os, os.path

from os import makedirs       # ...tree creation...
from os.path import isdir     # ...checking...
from shutil import copytree, rmtree
makedirs("/tmp/py/foo/bar")

# We can copy a whole tree and test it
copytree("/tmp/py/foo", "/tmp/py/foo2")
assert isdir("/tmp/py/foo2/bar")

rmtree("/tmp/py/foo")   # ... and finally delete it
assert not isdir("/tmp/py/foo/bar")

Page 14: DRAFT: Python for System Administrators

Move trees: errno

# We can use exception handlers to investigate errors
try:
    # python2 cannot ignore existing directories...
    makedirs("/tmp/py/foo/bar")
    # ...and raises an OSError
except OSError as e:
    # Just use the errno module to check the error value
    import errno
    assert e.errno == errno.EEXIST

help(makedirs)
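A self-contained variant of the errno check, sketched under the assumption that a temporary scratch directory is acceptable (the slides use /tmp/py). It creates the same tree twice, records the error code of the second attempt, and then shows the exist_ok flag available from Python 3.2.

```python
import errno
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                  # scratch directory, removed below
target = os.path.join(base, "foo", "bar")

os.makedirs(target)                        # first call creates the tree
try:
    os.makedirs(target)                    # second call fails: it already exists
    error_code = None
except OSError as e:
    error_code = e.errno                   # numeric reason of the failure

already_there = (error_code == errno.EEXIST)

# Python 3.2+ can skip the check altogether
os.makedirs(target, exist_ok=True)

shutil.rmtree(base)
print(already_there)
```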

Page 15: DRAFT: Python for System Administrators

Encoding: Goal

• A string is more than a sequence of bytes
• A string is a pair (bytes, encoding)
• Use unicode literals in python2
• Manage differently encoded filenames
• A string is not a sequence of bytes

modules: os, os.path, glob

Page 16: DRAFT: Python for System Administrators

Song of Childhood

Als das Kind Kindwar, ging es mithangenden Armen,wollte der Bach sei einFluß, der Flußsei einStrom, und diesePfutze das Meer.Als das Kind Kindwar, wues nicht, daßesKind war, alles warihm beseelt, und alleSeelen waren eins.Als das Kind Kindwar, hatte es vonnichts eine Meinung,hatte keineGewohnheit, saßoft imSchneidersitz, lief ausdem Stand, hatteeinen Wirbel im Haarund machte keinGesicht beimfotografieren.

"When the child was a child, characters were bytes, and strings lists of bytes"

Als das Kind Kindwar, fielen ihm dieBeeren wie nurBeeren in die Handund jetzt immer noch,machten ihm diefrischen Walnusse einerauhe Zunge und jetztimmer noch, hatte esauf jedem Berg dieSehnsucht nach demimmer hoheren Berg,und in jeder Stadt dieSehnsucht nach dernoch groStadt, unddas ist immer nochso, griff im Wipfeleines Baums nachdem Kirschen ineinemHochgefuhl wieauch heute noch, eineScheu vor jedemFremden und hat sieimmer noch, wartetees auf den erstenSchnee, und wartet soimmer noch.


Page 17: DRAFT: Python for System Administrators

Encoding is a map

# Py3 doesn't need the u
the_string = u"S\u00fcd"   # Süd

# can be encoded in different ways
in_utf8 = the_string.encode('utf-8')
in_win = the_string.encode('cp1252')

type(in_utf8) == bytes   # byte-sequences

# Decoding bytes using the wrong map..
# ...gives sad results ;)
in_utf8.decode('cp1252')   # SÃ¼d

• Encoding is a one-to-one map between a typographical character and a byte-sequence
• Decoding is its reverse map

char   ascii   utf-8        cp1252
a      [97]    [97]         [97]
ü      -       [195, 188]   [252]
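The table above can be reproduced in a few lines. A minimal sketch that runs on both Python 2.7 and Python 3; the variable names are illustrative.

```python
text = u"S\u00fcd"                     # "Süd"

as_utf8 = text.encode("utf-8")         # two bytes for the u-umlaut
as_cp1252 = text.encode("cp1252")      # one byte for the u-umlaut

# byte values of the single character, as in the table
umlaut_utf8 = list(bytearray(u"\u00fc".encode("utf-8")))
umlaut_cp1252 = list(bytearray(u"\u00fc".encode("cp1252")))

# decoding with the map used for encoding gives the text back...
round_trip = as_utf8.decode("utf-8")
# ...while the wrong map silently produces garbage
mojibake = as_utf8.decode("cp1252")

print(umlaut_utf8, umlaut_cp1252, repr(mojibake))
```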

Page 18: DRAFT: Python for System Administrators

Enters Encoding

# Filenames are binary data! Be careful when reading from
# a (eg. vfat) filesystem!
# To make python2 encoding-aware we should
from __future__ import unicode_literals

# Create 3 windows-encoded filenames in
basedir = "/tmp/py"

# using the provided function
from course import create_wuerstelstrasse
create_wuerstelstrasse(basedir)

Page 19: DRAFT: Python for System Administrators

Encoded filenames: glob

from glob import glob as ls   # expands wildcards like a shell.

files = ls("/tmp/py/*.txt")   # To avoid encoding issues ...
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xFC
0xFC == 252   # remember the ü in cp1252 map?

files = ls(b"/tmp/py/*.txt")   # ..we explicitly use bytes

Page 20: DRAFT: Python for System Administrators

Data Gathering: Goal

Gathering system data with multiplatform and platform-dependent tools.
• Get info from files, /proc and /sys
• Capture command output
• Use psutil to get IO, CPU and memory data
• Parse files with a strategy

modules: psutil, subprocess, os

Page 21: DRAFT: Python for System Administrators

Data Gathering: grep

def grep(needle, fpath):
    """A minimal grep implementation.

    goal: open() is iterable and doesn't need splitlines()
    goal: comprehension can filter iterables
    """
    return [x for x in open(fpath) if needle in x]

# Do we have "localhost" in our "/etc/hosts"?
grep("localhost", "/etc/hosts")

Page 22: DRAFT: Python for System Administrators

Data Gathering: psutil

# The psutil module is very nice!
import psutil

# Works on Windows, Linux and MacOS
psutil.cpu_percent()

# And its output is easy to manage
psutil.disk_io_counters()

Exercise: Which other information does psutil provide?

Page 23: DRAFT: Python for System Administrators

Data Gathering: Exercises

Write a vmstat-like function printing every second:
• cpu usage %;
• bytes read and written in the given interval;
• Hint: use psutil, time.sleep(1)
• Hint: try on ipython and then write the function using %edit vmstat.py

Page 24: DRAFT: Python for System Administrators

Data Gathering: subprocess

# The check_output function returns the command stdout
from subprocess import check_output

# It takes a list as an argument!
out = check_output("ping -w1 -c1 www.google.com".split())

# and returns a string (a bytes object on Python 3)
print(out)
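To try check_output without network access, the sketch below runs the current Python interpreter as the child process; that command is an assumption chosen only because it exists on every platform. It also shows the decode step needed on Python 3, where the captured output is bytes.

```python
import sys
from subprocess import check_output

# The command is a list: program first, then one item per argument
cmd = [sys.executable, "-c", "print('hello from a child process')"]

raw = check_output(cmd)               # bytes on Python 3
text = raw.decode("utf-8").strip()    # text, without the trailing newline
lines = text.splitlines()

print(lines)
```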

Page 25: DRAFT: Python for System Administrators

Data Gathering: subprocess, sys

def sh(cmd, shell=False, timeout=0):
    """Returns an iterable output of a command string, checking ... """
    from sys import version_info as python_version
    if python_version < (3, 3):   # ..before using...
        if timeout:
            raise ValueError("Timeout not supported")
        output = check_output(cmd.split(), shell=shell)
    else:
        # timeout=0 means "no timeout": pass None in that case
        output = check_output(cmd.split(), shell=shell, timeout=timeout or None)
    return output.splitlines()

Page 26: DRAFT: Python for System Administrators

Data Gathering: Exercises

Write a simple pgrep-like function for your OS:
• the ppgrep signature is the following

def ppgrep(program):
    """@param program - eg. firefox, explorer.exe"""
    raise NotImplementedError

• it prints a list of processes executing `program`;
• Hint: use subprocess, os, and list-comprehension

items = [x for x in a_list if 'firefox' in x]

Page 27: DRAFT: Python for System Administrators

♦ Data Gathering: Parsing /proc I ♦

def linux_threads(pid):
    """The Linux /proc filesystem is a cool place to get info."""
    from glob import glob   # replaces * and ?
    path = "/proc/{}/task/*/status".format(pid)

    # Pick a set of fields to gather...
    t_info = ('Pid', 'Tgid', 'voluntary')   # a tuple
    for t_path in glob(path):
        # ...and use comprehension to get interesting data.
        print([x for x in open(t_path)
               if x.startswith(t_info)])   # accepts tuples!

Page 28: DRAFT: Python for System Administrators

Data Gathering: Parsing /proc II

# On Linux, /proc/diskstats is the source of I/O info
disk_l = grep("sda", "/proc/diskstats")

# To gather that data we put the headers in a multi-line string
from course import diskstats_headers as headers

disk_info = disk_l[0].split()   # Take the 1st entry, split the data...
zip(headers, disk_info)         # ...and tie them with the headers
list(_)                         # On py3 you need to iterate the generator!

Page 29: DRAFT: Python for System Administrators

Data Gathering: Parsing /proc III

# Or create a reusable commodity class with
from collections import namedtuple
# using headers as attributes
# like the one provided by psutil
DiskStats = namedtuple('DiskStats', headers)

# ... and disk_info as values
dstat = DiskStats(*disk_info)
dstat.device, dstat.writes_ms

# Homework: check further features with
import collections
help(collections)
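The zip and namedtuple steps can be tried without a Linux box. In this sketch both the field names and the sample line are assumptions: the course's diskstats_headers may use different names, and the line is made up following the classic 14-column /proc/diskstats layout.

```python
from collections import namedtuple

# Hypothetical field names: the course's diskstats_headers may differ
headers = ("major minor device reads reads_merged sectors_read reads_ms "
           "writes writes_merged sectors_written writes_ms "
           "io_in_progress io_ms io_weighted_ms")

# Made-up sample line, 14 columns
sample = "   8       0 sda 1000 200 30000 400 500 60 7000 80 0 90 100"

# namedtuple accepts the field names as one space-separated string
DiskStats = namedtuple("DiskStats", headers)
disk_info = sample.split()

as_dict = dict(zip(headers.split(), disk_info))   # plain mapping
dstat = DiskStats(*disk_info)                     # attribute access

print(dstat.device, dstat.writes_ms, as_dict["reads"])
```

All values are still strings here; convert them with int() before doing arithmetic.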

Page 30: DRAFT: Python for System Administrators

Parsing: Goal

• Plan a parsing strategy
• Use basic regular expressions: match, search, sub
• Benchmark a parser
• Run nosetests
• Write a simple parser

modules: re, nose, %timeit

Page 31: DRAFT: Python for System Administrators

Parsing is hard...

"System Administrators spent 24.3% of their work-life parsing files."*

* Independent analysis by The GASP(1) Society ;)

(1) Grep Awk Sed Perl

Page 32: DRAFT: Python for System Administrators

...use a strategy!

1. Collect parsing samples
2. Play in ipython and collect %history
3. Write tests, then the parser
4. Finally, benchmark if needed

Page 33: DRAFT: Python for System Administrators

Parsing postfix logs

# Before writing the parser, collect samples of
# the interesting lines. For now just
from course import mail_sent, mail_delivered

# and %edit a simple
def test_sent():
    hour, host, to = parse_line(mail_sent)
    assert hour == '08:00:00'
    assert to == '[email protected]'

Page 34: DRAFT: Python for System Administrators

Parsing lines: split, zip

May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: to=<[email protected]>, relay=mx2.foo.it[10.0.4.5]:25, ...

mail_sent.split()   # Start using basic strings in ipython

# Then tie them with zip()
fields, counting = _, zip(range(20), _)
fields = fields[:7]   # We just care for the first 7 values

# and pick fields singularly
hour, host, dest = fields[2], fields[3], fields[6]
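Outside ipython there is no `_` variable, so the same exploration can be done with enumerate(). A sketch on a hypothetical sample line: the recipient address, the relay name and the trailing fields are placeholders, not taken from the course data.

```python
# Hypothetical sample line: address and relay are placeholders
mail_sent = ("May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: "
             "to=<user@example.com>, relay=mx.example.com[10.0.4.5]:25, delay=1")

fields = mail_sent.split()

# enumerate() numbers the fields, like zip(range(20), fields)
for position, value in enumerate(fields[:7]):
    print(position, value)

# pick the interesting ones by position
hour, host, dest = fields[2], fields[3], fields[6]
```

Note that dest still carries the `to=<...>,` wrapping: stripping it is the job of the regular expressions introduced next.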

Page 35: DRAFT: Python for System Administrators

Parse: Exercise I

In another window
• edit 03_parsing_test.py
• complete the parse_line(line) function

def parse_line(line):
    """Write your function and test it with test_sent()"""
    raise NotImplementedError

%paste your solution’s code in iPython and manually run the test functions

Page 36: DRAFT: Python for System Administrators

Python Regexp

# Python supports regular expressions via
import re

# We start showing a grep-reloaded function
def grep(expr, fpath):
    one = re.compile(expr)     # ...has two lookup methods...
    assert (one.match          # which searches from ^ the beginning
            and one.search)    # that searches anywhere

    with open(fpath) as fp:
        return [x for x in fp if one.search(x)]
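The difference between the two lookup methods is easiest to see on a single line. A minimal sketch, with a sample hosts entry as the only assumption:

```python
import re

pattern = re.compile("localhost")
line = "127.0.0.1 localhost"

at_start = pattern.match(line)     # None: the line starts with the address
anywhere = pattern.search(line)    # a match object: found further on
also_start = pattern.match("localhost.localdomain")   # matches at position 0

print(at_start, anywhere.start(), also_start.group())
```

This is why the grep function uses search: match would only keep lines that begin with the expression.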

Page 37: DRAFT: Python for System Administrators

Splitting with re.split

import sys
from re import split   # is a very nice function

# Let's gather some ping stats
if sys.platform.startswith('win'):
    cmd = "ping -n10 www.google.it"
else:
    cmd = "ping -c10 -w10 www.google.it"

# Split for both space and =
ping_output = [split("[ =]", x) for x in sh(cmd)]
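To see what the split produces without running ping, here is a sketch on one made-up reply line in the usual Linux ping format (an assumption: other platforms print a different layout).

```python
from re import split

# Made-up reply line, Linux ping style
sample = "64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.045 ms"

# One character class splits on both spaces and "="
tokens = split("[ =]", sample)

# The value follows its key, so look one position after "time"
rtt = float(tokens[tokens.index("time") + 1])

print(tokens)
print(rtt)
```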

Page 38: DRAFT: Python for System Administrators

Splitting with re.findall

from re import findall   # can be misused too ;)

# eg. for adding the ":" to a
mac = "00""24""e8""b4""33""20"

# ...using this
re_hex = '[0-9A-Fa-f]{2}'
mac_address = ':'.join(findall(re_hex, mac))
print("The mac address is ", mac_address)

Actually this does a bit of validation, requiring all chars to be in the 0-F range.
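The validation side effect can be made explicit. In this sketch format_mac is a hypothetical helper, not part of the course code: findall silently skips characters outside the hex range, so counting the pairs tells whether the input was clean.

```python
from re import findall

RE_HEX = "[0-9A-Fa-f]{2}"

def format_mac(raw):
    """Return raw as aa:bb:cc:dd:ee:ff, or None if it is not 12 hex digits."""
    pairs = findall(RE_HEX, raw)
    # a stray character breaks at least one pair, leaving fewer than six
    if len(raw) != 12 or len(pairs) != 6:
        return None
    return ":".join(pairs)

good = format_mac("0024e8b43320")
bad = format_mac("0024e8b4332z")

print(good, bad)
```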

Page 39: DRAFT: Python for System Administrators

Benchmarking in iPython I

• Parsing big files needs benchmarks. iPython %timeit magic is a good starting point.

test_regexps = ("..", "[a-fA-F0-9]{2}")
for re_s in test_regexps:
    %timeit ':'.join(findall(re_s, mac))

• We can even compare compiled and inline regexp

import re
for re_s in test_regexps:
    re_c = re.compile(re_s)
    %timeit ':'.join(re_c.findall(mac))

Page 40: DRAFT: Python for System Administrators

Benchmarking in iPython II

Or find other methods:
• complex...

from re import sub as sed
%timeit sed(r'(..)', r'\1:', mac)   # note: leaves a trailing ":"

• ...or simple

%timeit ':'.join([mac[i:i+2] for i in range(0, 12, 2)])

• Outside iPython check the timeit module
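Outside iPython, the same comparison could look like the sketch below. It uses timeit.timeit from the standard library; the labels and the number of repetitions are arbitrary choices, and the timings will differ on every machine.

```python
import timeit

# Executed once before each measurement
setup = "import re; mac = '0024e8b43320'; re_c = re.compile('..')"

candidates = {
    "inline regexp": "':'.join(re.findall('..', mac))",
    "compiled regexp": "':'.join(re_c.findall(mac))",
    "slicing": "':'.join([mac[i:i+2] for i in range(0, 12, 2)])",
}

results = {}
for name, stmt in candidates.items():
    # total seconds spent running the statement 10000 times
    results[name] = timeit.timeit(stmt, setup=setup, number=10000)

# fastest first
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print("{:16s} {:.4f}s".format(name, seconds))
```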

Page 41: DRAFT: Python for System Administrators

♦ Parsing: a real world Example ♦

# Don't need to type this VSAN configuration script
# which uses linux FC information from /sys filesystem
import re
from glob import glob
fc_id_path = "/sys/class/fc_host/host*/port_name"
for x in glob(fc_id_path):
    # ...we boldly skip an explicit close()
    pwwn = open(x).read()   # 0x500143802427e66c
    pwwn = pwwn[2:]
    # ...and even use the slower but readable
    pwwn = re.findall(r'..', pwwn)
    print("member pwwn ", ':'.join(pwwn))

Page 42: DRAFT: Python for System Administrators

Parsing logs: a simple solution

def parse_line(line):
    import re
    # using _ we improve readability
    _, _, hour, host, _, _, dest = line.split()[:7]
    try:
        # and if dest isn't what we expect...
        dest = re.split(r'[<>]', dest)[1]
    except IndexError:
        # ...we set it to None
        dest = None
    return (hour, host, dest)

Page 43: DRAFT: Python for System Administrators

Parsing logs: II

# Now another test for the delivered messages
# %edit 03_parsing_test
def test_delivered():
    hour, host, destination = parse_line(test_str_2)
    assert hour == '08:00:00'
    # Delivery logs should have destination == None
    assert destination is None

# Exercise: fix parse_line to work with both tests
# and save test

Page 44: DRAFT: Python for System Administrators

Running nosetest

• Now run the following command from a shell
# nosetests -vs 03_parsing_test.py
03_parsing_test.test_sent ... ok
03_parsing_test.test_delivered ... ok
Ran 2 tests in 0.001s

• Nose is a test framework.
• Nose runs every file matching test_*
• Nose runs every function matching test_*

Page 45: DRAFT: Python for System Administrators

Simple Test Script

• Open the 02_nosetests_simple.py file

def setup():
    print("is run before the testsuite, while")

def teardown():
    print("after all tests")

def test_one():
    # name a function like test_* to run it!
    assert 1 == 1

def test_two():
    # and use assert to test for success
    assert 1 == 0, "I was expecting 0"

Page 46: DRAFT: Python for System Administrators

♦ Complete Test Script: I ♦

• A more flexible script is 02_nosetests_full.py which uses a Test class

class Test(object):
    @classmethod
    def setup_class(self):   # is run once at startup,
        # ..eg. to create database structure
        print("setup testsuite environment")
        open("/tmp/test2.out", "w").write("0")

    @classmethod
    def teardown_class(self):   # is run once after all tests to...
        print("cleanup testsuite environment")
        os.unlink("/tmp/test2.out")

Page 47: DRAFT: Python for System Administrators

♦ Complete Test Script: II ♦

• allowing pre-post testsuite and pre-post test fixtures

class Test(object):
    ...
    # Using a Test class...
    def setup(self):
        print("is_run_before_every_test")   #..and..

    def teardown(self):
        print("after_every_test")   # eg truncate a table

    # each test can use the prepared environment
    def test_a(self):
        assert os.path.isfile("/tmp/test2.out")

Page 48: DRAFT: Python for System Administrators

Simple processing: Goal

• Handle gathered data with dict() and zip()
• Find data relations with scipy
• Get essential information like standard deviation σ and distributions δ
• Linear correlation: what it is, and when it can help
• Plotting

modules: numpy, scipy, scipy.stats.stats, collections, random, time

Page 49: DRAFT: Python for System Administrators

The Chicken Paradox

"According to the latest statistics,
it appears that you eat one chicken per year:
and, if that doesn’t fit your budget,
you’ll fit into the statistics anyway,
because someone will eat two."
C. A. Salustri

Page 50: DRAFT: Python for System Administrators

Simple processing: Exercise

How to dismantle the chicken paradox? Gather data!

• Write the following function using our parsing strategy

def ping_rtt(seconds=10):
    """@return: a list of ping RTT"""
    from course import sh
    # get sample output
    # find a solution in ipython
    # test and paste the code
    raise NotImplementedError

• Gather 10 seconds of ping output
• Hint: reuse the sh() function
• Hint: slice and filter lists using comprehension

Page 51: DRAFT: Python for System Administrators

Distributions: set, defaultdict

A distribution or δ shows the frequency of events, like how many people ate x chickens ;)

# Create a simple δ with set and dict
d = {x: rtt.count(x) for x in set(rtt)}

# We can even use
from collections import defaultdict
d = defaultdict(int)
for x in rtt:
    d[x] += 1

Distributions and Mean are both important!
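The two approaches, plus collections.Counter from the same module, give the same distribution. A sketch on made-up round trip times (the rtt values are assumptions, not real ping output):

```python
from collections import Counter, defaultdict

rtt = [10, 12, 10, 11, 10, 12]   # made-up round trip times, in ms

# 1. set + dict comprehension: rescans the list once per distinct value
with_set = {x: rtt.count(x) for x in set(rtt)}

# 2. defaultdict: a single pass, missing keys start from int() == 0
with_defaultdict = defaultdict(int)
for x in rtt:
    with_defaultdict[x] += 1

# 3. Counter: the same single pass, ready-made
with_counter = Counter(rtt)

print(with_set, with_counter.most_common(1))
```

On long series the single-pass versions matter, because list.count() walks the whole list for every distinct value.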

Page 52: DRAFT: Python for System Administrators

Standard Deviation: scipy

• Standard deviation or σ formula is
  $\sigma^2(X) := \frac{\sum (x - \bar{x})^2}{n}$
• σ tells if δ is fair or not, and how much the mean $\bar{x}$ is representative
• matplotlib.mlab.normpdf is a smooth function approximating the histogram

from scipy import std, mean
fair = [1, 1]     # chickens
unfair = [0, 2]   # chickens
assert mean(fair) == mean(unfair)

# Use standard deviation!
std(fair)     # 0
std(unfair)   # 1
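The same check can be done with the standard library only. This sketch assumes Python 3.4+ for the statistics module and uses pstdev, the population standard deviation, because it divides by n like the formula above.

```python
from math import sqrt
from statistics import mean, pstdev

fair = [1, 1]     # chickens
unfair = [0, 2]   # chickens

same_mean = mean(fair) == mean(unfair)   # both eat one chicken "on average"

sigma_fair = pstdev(fair)
sigma_unfair = pstdev(unfair)

# the formula, spelled out for the unfair case
m = mean(unfair)
by_hand = sqrt(sum((x - m) ** 2 for x in unfair) / len(unfair))

print(same_mean, sigma_fair, sigma_unfair, by_hand)
```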

Page 53: DRAFT: Python for System Administrators

Simple processing: scipy

Check your computed values vs the σ returned by ping (did you notice ping returned it?)

"""
goal: remember to convert to numeric / float
goal: use scipy
goal: check stdev
"""
from scipy import std, mean   # max, min are builtin
rtt = ping_rtt()

print(max(rtt), min(rtt), mean(rtt), std(rtt))

Page 54: DRAFT: Python for System Administrators

Time Distributions: Exercise

• Parse the provided maillog in ipython using its ! magic and get an hourly email δ
• Expected output:

time_d = {   # mail delivered (removed) between
    0: xxx   # 00:00 - 00:59
    1: xxx   # 01:00 - 01:59
    ..
}

Page 55: DRAFT: Python for System Administrators

Time Distributions: Exercise Solution

# delivered emails are like the following
# May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed

ret = !grep removed maillog   # get the interesting lines

ts = ret.fields(2)   # find the timestamp (3rd column)

hours = [int(x.split(':')[0]) for x in ts]   # keep only the hour
time_d = {x: hours.count(x) for x in set(hours)}

Page 56: DRAFT: Python for System Administrators

Plotting distributions

# To plot data..
from matplotlib import pyplot as plt
# and set the interactive mode
plt.ion()

# Plotting an histogram...
frequency, bins, _ = plt.hist(hours)

# .. returns a
distribution = dict(zip(bins, frequency))

This server works mostly at night...

Page 57: DRAFT: Python for System Administrators

Size Distributions: Exercise

• Create a size δ using hist(..., bins=...)
• Hint: help(hist)

size_d = {   # mail size between
    0: xxx   # 0 - 10k
    1: xxx   # 10k - 20k
    ..
}

• Homework: Use the size δ to find size mean and size sigma and compare with σ and mean evaluated from the original data-series

Page 58: DRAFT: Python for System Administrators

♦ Simulating data with σ and x̄ ♦

Mean and stdev are a useful starting point to simulate data using the gaussian distribution.

# A mail load generator creating attachments of a given size...
from random import gauss
mail_size = gauss(mean, sigma_s)   # a random number

# and use time_d to simulate the load during the day
from time import localtime
hour = localtime().tm_hour
mail_per_minute = time_d[hour] / 60   # minutes in hour
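A quick way to check that simulated sizes resemble the target is to generate many of them and measure them again. In this sketch the mean, the sigma and the seed are hypothetical values chosen for illustration; it assumes Python 3.4+ for the statistics module.

```python
import random
from statistics import mean, pstdev

random.seed(2014)                  # fixed seed, only to make the run repeatable
mean_kb, sigma_kb = 50.0, 10.0     # hypothetical size mean and sigma, in kB

# gauss() can return negative numbers: clamp them, a size cannot be below 0
sizes = [max(0.0, random.gauss(mean_kb, sigma_kb)) for _ in range(1000)]

# the measured values should land close to the requested ones
print(round(mean(sizes), 1), round(pstdev(sizes), 1))
```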

Page 59: DRAFT: Python for System Administrators

Linear Correlation

# Let's plot the following datasets
# taken from a 4-hour distribution
mail_sent = [1, 5, 500, 250, 100, 7]
kB_s = [70, 300, 29000, 12500, 450, 500]

# A scatter plot can suggest relations
# between data
plt.scatter(mail_sent, kB_s)

[Figure: scatter plot "Correlating mail and thruput"; x axis: Mail sent, y axis: Thruput kB/s]

Page 60: DRAFT: Python for System Administrators

Linear Correlation

The Pearson Coefficient ρ is a relation indicator.

 0  no relation
 1  direct relation (both datasets increase together)
-1  inverse relation (one increases as the other decreases)

$$\rho(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \qquad (1)$$

from scipy.stats.stats import pearsonr
ret = pearsonr(mail_sent, kB_s)
print(ret)
> (0.9823, 0.0004)
correlation, probability = ret

Page 61: DRAFT: Python for System Administrators

You must (scatter) plot!

ρ does not detect non-linear correlation
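Both points can be checked by hand. The sketch below is a plain-Python transcription of the Pearson formula (the pearson function is an illustration, not scipy's implementation, and it returns no probability); it is applied to the mail datasets and to a parabola, whose relation is perfect but not linear.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson coefficient of two equally long series (no p-value)."""
    mx = sum(xs) / float(len(xs))
    my = sum(ys) / float(len(ys))
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sqrt(sum((x - mx) ** 2 for x in xs)) *
           sqrt(sum((y - my) ** 2 for y in ys)))
    return num / den

# the datasets from the slides: a strong linear relation
mail_sent = [1, 5, 500, 250, 100, 7]
kB_s = [70, 300, 29000, 12500, 450, 500]
rho_linear = pearson(mail_sent, kB_s)

# y = x**2 on symmetric points: fully dependent, yet rho vanishes
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
rho_parabola = pearson(xs, ys)

print(round(rho_linear, 2), rho_parabola)
```

A scatter plot of the second dataset shows the parabola at a glance, which is exactly what the coefficient alone cannot tell.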

Page 62: DRAFT: Python for System Administrators

Combinations

# Given a table with many data series
from course import table
table = {...
    'cpu_usr': [10, 23, 55, ..],
    'byte_in': [2132, 3212, 3942, ..], }

# We can combine all their names with
from itertools import combinations
list(combinations(table, 2))
> [('swap_in', 'cpu_sys'), ('swap_in', 'csw'), ('cpu_sys', 'csw') ... ]

Combining 4 suits, 2 at a time:
♥♠ ♥♣ ♥♦ ♠♣ ♠♦ ♣♦
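The card example can be reproduced directly. A minimal sketch, with the suits spelled out as strings:

```python
from itertools import combinations

suits = ["hearts", "spades", "clubs", "diamonds"]

# every unordered pair, each one listed once, in input order
pairs = list(combinations(suits, 2))

# n items taken 2 at a time give n * (n - 1) / 2 pairs
expected = len(suits) * (len(suits) - 1) // 2

print(len(pairs), pairs)
```

With the table of data series this count grows quickly, so the number of plots or correlations to compute is worth estimating first.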

Page 63: DRAFT: Python for System Administrators

Netfishing correlation

We can try every combination between data series and check if there’s some ρ.

for k1, k2 in combinations(table, 2):
    corr, probability = pearsonr(table[k1], table[k2])
    if corr < 0.5:
        # I'm *still* not interested in data under this threshold
        continue
    print("linear correlation between {} and {} is {}".format(k1, k2, corr))

Page 64: DRAFT: Python for System Administrators

Correlating I/O and Context Switch

Now we’ll generate some correlation plots from table data, like this one.

Page 65: DRAFT: Python for System Administrators

Netfishing correlation II

# create all combined plots
for k1, k2 in combinations(table, 2):
    corr, probability = pearsonr(table[k1], table[k2])
    plt.scatter(table[k1], table[k2])

    # 3 digit precision on title
    plt.title("R={:0.3f}".format(corr))
    plt.xlabel(k1); plt.ylabel(k2)

    # save and close the plot
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()

Page 66: DRAFT: Python for System Administrators

Mark time with colors

# Use 3 colors to mark time-slots
from itertools import cycle
colors = cycle('rgb')   # Red Green Blue
my_list = range(10)

# then import a function to chunk datasets
from course import in_chunks
in_chunks(my_list, size=4)   # returns a <generator object ...>
list(_)                      # ... which iterates to...
> [[0, 1, 2, 3],   # Plotted in Red
   [4, 5, 6, 7],   # ..Green
   [8, 9]]         # ..Blue
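The course module provides in_chunks; its source is not shown in the slides, so the generator below is only a guess at an equivalent implementation based on slicing. It is paired with cycle() to show how each chunk gets its color.

```python
from itertools import cycle

def in_chunks(sequence, size):
    """Yield consecutive slices of `sequence`, each at most `size` long.

    Hypothetical stand-in for the helper shipped with the course code.
    """
    for start in range(0, len(sequence), size):
        yield sequence[start:start + size]

colors = cycle("rgb")            # r, g, b, r, g, b, ...
my_list = list(range(10))

chunks = list(in_chunks(my_list, size=4))
colored = [(next(colors), chunk) for chunk in chunks]

print(colored)
```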

Page 67: DRAFT: Python for System Administrators

Mark time with colors

# Get combined data directly via items
for (k1, v1), (k2, v2) in combinations(table.items(), 2):
    corr, probability = pearsonr(v1, v2)

    # Two nice generators
    time_chunked = zip(in_chunks(v1, size=8*3600),
                       in_chunks(v2, size=8*3600))
    [plt.scatter(t1, t2, color=next(colors))   # iterate colors!
     for t1, t2 in time_chunked]

    # save and close the plot
    plt.savefig("timed_{}_{}.png".format(k1, k2)); plt.close()

Page 68: DRAFT: Python for System Administrators

That’s all folks!

Thank you for the attention!
Roberto Polli - [email protected]