Servers and Processes: Behavior and Analysis

The Next 90 Minutes
Introduction
Servers, a mental model
Getting hands on
Processes
Wrapping it up

Caveats
Tutorial aimed at people barely familiar with Linux consoles
Little server knowledge is assumed
Many advanced things are glossed over
...but feel free to ask!
The slides will be available online

Your Presenter
Mark Smith
Co-founded Dreamwidth Studios, but works at Bump Technologies (http://bu.mp/)
Spent time at Google, Mozilla, others
Sysadmin, MySQL DBA, engineer, ...

Servers
Machines that take input and make output
Made up of components: RAM, CPU, I/O
Each component has various capacities
Systems Administration: the understanding, care, and feeding of all these disparate components (among other things)

Components
Capacity
Latency
Throughput
Full state
Thrash state

RAM
Capacity measured in bytes (GB usually)
Latency measured in nanoseconds
Throughput measured in bytes/second
Full state: can’t add more, but no real loss of performance
Thrash state: not very relevant

Disk (Rotational)
Capacity measured in bytes (GB or TB)
Latency measured in milliseconds
Throughput measured in bytes/second
Full state: can’t add more, but otherwise fine
Thrash state: server and process starvation, performance drops drastically

Disk (SSD)
Capacity measured in bytes (GB or TB)
Latency measured in microseconds (about 60x faster than rotational disks)
Throughput measured in bytes/second
Full state: can’t add more, but otherwise fine
Thrash state: obviated by lack of rotation

CPU
Capacity measured in clock cycles per second, i.e., hertz (MHz, GHz, etc.)
Throughput and latency of a CPU are very advanced things most sysadmins don’t need to worry about (e.g., optimizing for L1 cache and local RAM in NUMA systems)
Full/thrash state: system/process starvation

Network
Capacity not relevant
Latency measured in milliseconds (usually)
Throughput measured in bits/second; usually 1 Gbps (10 Gbps becoming common)
Full state: dropped packets; behavior depends on protocol (e.g., TCP or UDP)
Thrash state: not relevant
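Network throughput is quoted in bits/second while RAM and disk throughput are quoted in bytes/second, so comparing them needs a factor of eight. A minimal sketch of that conversion follows; the function names are made up for illustration, and the transfer time is an ideal figure that ignores protocol overhead and latency.

```python
GBPS = 1_000_000_000  # 1 Gbps expressed in bits per second


def link_bytes_per_second(bits_per_second):
    """Convert a link speed in bits/second to bytes/second."""
    return bits_per_second / 8


def ideal_transfer_seconds(payload_bytes, bits_per_second):
    """Best-case time to move a payload over the link, ignoring overhead."""
    return payload_bytes / link_bytes_per_second(bits_per_second)


if __name__ == "__main__":
    print(link_bytes_per_second(GBPS))                 # bytes per second on a 1 Gbps link
    print(ideal_transfer_seconds(1_000_000_000, GBPS)) # seconds to move 1 GB
```

A 1 Gbps link therefore tops out at 125 MB/second, and moving 1 GB takes at least 8 seconds.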

Timing Comparisons
1 second - tick, tock, tick, tock, ...
1,000 milliseconds (ms) per second
1,000,000 microseconds (µs) per second
1,000,000,000 nanoseconds (ns) per second

Timing (Part 2)
One seek on a rotational disk is ~6ms
SSD seeks are about 100µs: 60x faster than a rotational seek
RAM seeks are about 60ns: about 1,667x faster than an SSD seek (100,000x faster than a rotational seek!)
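The ratios on this slide can be checked with a few lines of arithmetic. This sketch uses only the approximate figures quoted above, converted to nanoseconds; the constant and function names are illustrative.

```python
# Approximate access times from the slide, all in nanoseconds.
ROTATIONAL_SEEK_NS = 6_000_000  # ~6 ms
SSD_SEEK_NS = 100_000           # ~100 µs
RAM_ACCESS_NS = 60              # ~60 ns


def times_faster(slow_ns, fast_ns):
    """How many times faster the second operation is than the first."""
    return slow_ns / fast_ns


if __name__ == "__main__":
    print(times_faster(ROTATIONAL_SEEK_NS, SSD_SEEK_NS))    # SSD vs rotational
    print(times_faster(SSD_SEEK_NS, RAM_ACCESS_NS))         # RAM vs SSD
    print(times_faster(ROTATIONAL_SEEK_NS, RAM_ACCESS_NS))  # RAM vs rotational
```

Scaled up so that one RAM access takes one second, a single rotational seek would take 100,000 seconds, which is a bit over a day.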

Hands On Time!

SSH to the VM
Open your local terminal (PuTTY in Windows, iTerm/Terminal/etc in Mac OS X, whatever you like in Linux)
ssh -p 2222 [email protected]
Password is “demo”
Please be nice :)

It’s dark in here.
Heartbeat the machine
uptime: How’s it doing?
free -m: How’s the RAM?
df -h: How’re the disks?
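The same heartbeat can be scripted if you want to run it on a schedule. This is a rough sketch using only the Python standard library, not a replacement for the commands above: `heartbeat` is a made-up name, the standard library has no direct equivalent of `free -m`, and the disk percentage is computed slightly differently from the Use% column in `df`.

```python
import os
import shutil


def heartbeat(path="/"):
    """Collect load averages and disk usage for one filesystem."""
    usage = shutil.disk_usage(path)
    try:
        load = os.getloadavg()  # 1, 5 and 15 minute load averages
    except (AttributeError, OSError):
        load = None  # not available on this platform
    used_percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {"load_average": load, "disk_used_percent": used_percent}


if __name__ == "__main__":
    print(heartbeat("/"))
```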

Load Average
It’s a seat-of-the-pants number
Rule of thumb: low is good, high might be bad
You have to learn how your machines work for this number to mean much
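One common heuristic, offered here as an assumption rather than a rule from this talk, is to compare the load average with the number of CPUs. On Linux the load average counts tasks that are runnable or stuck in uninterruptible (usually disk) wait, so a high value can point at either CPU or I/O pressure. The sketch below parses text in the format of /proc/loadavg; the sample line is invented.

```python
# Invented sample in the /proc/loadavg format:
# 1, 5 and 15 minute averages, running/total tasks, most recent PID.
SAMPLE_LOADAVG = "4.00 2.50 1.25 3/180 4242\n"


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages as floats."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of CPUs."""
    return load / cpu_count


if __name__ == "__main__":
    one_minute = parse_loadavg(SAMPLE_LOADAVG)[0]
    print(load_per_cpu(one_minute, 8))
```

A load of 4 on an 8-CPU machine works out to 0.5 per CPU. Whether that is normal still depends on what the machine usually looks like.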

Top of the World
Easy way to see what’s running and what is consuming the most resources
top
Press “P” to sort by Processor usage
Press “M” to sort by Memory usage

Exhibit #1
Now I will do something on the machine
Run through your heartbeat steps again: uptime, free -m, df -h, top
Remember to sort top by P and M
What has changed? What is going on?

Results #1
You probably noticed 1-cpu.pl
It’s pushing the CPU to 100%
Is it broken? Is this bad?
Know your software and systems (very important to know what normal is)

Exhibit #2
Now I will do something else
Run through your heartbeat steps again: uptime, free -m, df -h, top
Remember to sort top by P and M
What has changed? What is going on?

Results #2
Lots of memory is being consumed
It’s some 2-memory.pl command
Does the machine feel sluggish? Each command takes a second to start and stop?
What is going on here?

vmstat
The vmstat tool tells us useful things about the state of the kernel and resource usage
Try: vmstat -SM 1
Watch while I run the test again
Note the si/so (swap in/out) and bi/bo (blocks in/out) columns
Now notice the CPU columns on the right
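If you capture vmstat output to a file, the columns to watch can be picked out by name. This sketch parses a captured sample and flags rows with swap activity; the sample numbers are invented for illustration and the function names are hypothetical.

```python
# Invented sample in the layout printed by "vmstat -SM 1".
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0    410     20    310    0    0     0     4  120  210  3  1 96  0  0
 0  3    512      5      1     12   48   96  9800 19600  900 1500  2  8 30 60  0
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of {column name: value} rows."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = next(cols for cols in lines if "si" in cols and "so" in cols)
    rows = []
    for cols in lines:
        # Data rows have one number per header column.
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            rows.append(dict(zip(header, map(int, cols))))
    return rows


def is_swapping(row):
    """True when the row shows pages moving to or from swap."""
    return row["si"] > 0 or row["so"] > 0


if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE_VMSTAT):
        print(is_swapping(row), "wa =", row["wa"])
```

In the invented second row, nonzero si/so sits alongside a high wa (I/O wait) value in the CPU columns.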

Swap
RAM is a finite resource
Not all RAM is used equally
Kernel tracks usage of pages
Kernel can write RAM to disk and free it up
This is called swapping: you store RAM on disk. Remember the timing slide!

Swap (Part 2)
Swap is useful mostly on consumer machines
In most server environments, swap is death
Disks are hundreds to thousands of times (or more!) slower than RAM
Generally, any active swapping is bad

Exhibit #3
Try uptime, free -m, df -h, top again
Also, try: iostat -kx 1
Watch the %util column as this test runs
Also the bi/bo columns in vmstat
What is going on here?

Results #3
Disk usage is high
RAM is not full
CPU is not pegged
Machine responds well
Disk utilization at 100%

What does it mean?
Based on the various data you’ve gathered, is the machine healthy and happy with this program running on it?
Why or why not?
Discussion.

Solutions?
This program is using more RAM or CPU than the machine has available
Program can be optimized to use less
Machine can be upgraded to have more
Simple problem, straightforward solutions
(Straightforward does not always mean easy)

Programs
Software that runs on a machine
Has traits such as single- or multi-threaded, compiled or interpreted, etc
Requires certain resources and inputs
Makes certain outputs

More Constraints
Programs have more constraints to consider
Open files and sockets (file descriptors)
Permissions (depend on user/group)
CPU limits (depend on threads)

Exhibit #4
There’s a program running now, but something is wrong with it
Use the usual tools (uptime, free -m, df -h, top)
System looks OK...

File Limits
Programs have certain limits
Get the PID of the 4-files.pl program
ps aufx | grep 4-files
cat /proc/PID/limits
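The line to look for in that file is "Max open files", which lists a soft and a hard limit. A small sketch of pulling those two numbers out of the text follows; the sample is an invented excerpt in the format of /proc/PID/limits, and the function name is made up.

```python
# Invented excerpt in the layout of /proc/PID/limits.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""


def _to_int(value):
    """Convert a limit field to an int, or None for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' row found")


if __name__ == "__main__":
    print(max_open_files(SAMPLE_LIMITS))
```

The soft limit is the one the program actually runs into.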

lsof
See what files a program has open
lsof -np PID
Whoa, lots! At the limit? Count them:
lsof -np PID | wc -l
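One caveat with the line count: lsof prints a header row and also lists entries such as cwd, txt, and mem, which are not file descriptors, so wc -l overstates the number that counts against the limit. The sketch below counts only rows whose FD column starts with a digit. The sample output, including the PID, user, and paths, is invented.

```python
# Invented sample of "lsof -np PID" output.
SAMPLE_LSOF = """\
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
4-files.pl 123 demo  cwd    DIR    8,1     4096    2 /home/demo
4-files.pl 123 demo  txt    REG    8,1    10376   77 /usr/bin/perl
4-files.pl 123 demo  mem    REG    8,1  1815224   88 /lib/libc.so.6
4-files.pl 123 demo    0u   CHR  136,0      0t0    3 /dev/pts/0
4-files.pl 123 demo    1u   CHR  136,0      0t0    3 /dev/pts/0
4-files.pl 123 demo    3r   REG    8,1        0   99 /tmp/example-a
"""


def count_descriptors(lsof_text):
    """Count rows whose FD column is a numbered descriptor (0u, 3r, ...)."""
    count = 0
    for line in lsof_text.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) > 3 and cols[3][0].isdigit():
            count += 1
    return count


if __name__ == "__main__":
    print(count_descriptors(SAMPLE_LSOF))
```

On Linux, listing /proc/PID/fd is another way to see one entry per open descriptor.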

But... a problem?
But is this a problem? Well, it is if the program is trying to open more files
How do we tell?
Software calls open, which is a system call

System Calls
The kernel provides certain services
Almost all I/O goes through the kernel
Current time, fork, chdir (what cd uses), exec, etc.
Requires a small context switch
Can lead to “sys” CPU usage

strace
System calls made by a process can be traced
Let’s look at 4-files again:
sudo strace -p PID
Look at the “open” line, is it OK?
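To see what a failing open looks like from the program's side, the sketch below reproduces the condition in a small Python process. It is a stand-in written for illustration, not the 4-files.pl script from the demo: it lowers its own soft file limit, opens /dev/null until the kernel refuses, and reports the error code. Under strace, the refused call shows up with a return value of -1 and the error name EMFILE.

```python
import errno
import os
import resource


def open_until_failure(soft_limit=32):
    """Lower the soft file limit, open files until it is hit, then restore it.

    Returns (errno of the failed open, number of files opened before it).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = soft_limit
    if old_soft != resource.RLIM_INFINITY:
        new_soft = min(soft_limit, old_soft)  # never raise the limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, old_hard))
    opened = []
    try:
        while True:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno, len(opened)
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    code, count = open_until_failure()
    print(errno.errorcode[code], "after", count, "opens")
```

The count is lower than the limit because stdin, stdout, stderr, and anything else already open use descriptors too.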

Results #4
Clearly this program is broken
Several fixes... open fewer files, raise your limits, etc
(We won’t cover the specifics of raising limits; search Google if you need them)

It’s all turtles.
Linux uses “files” and “filesystems” a lot
Sockets are just “files”: they use the same file descriptor number space
Result: “Max open files” includes sockets
They show up in lsof, too!

Exhibit #5
Let me give us a new program
Get the PID, remember how?
ps aufx | grep 5-network
Look at the files: lsof -np PID
Note the “TCP” file!

Test the Server
telnet 182.255.123.52 7000
(This server is slow, it might take a bit)
A very simple timeserver
Now: strace -p PID

The Trace
accept(3, {sa_family=AF_INET, sin_port=htons(39474), sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff73f27608) = -1 ENOTTY ...
lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff73f27608) = -1 ENOTTY ...
lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
write(4, "The time is: Wed Feb 6 13:34:22"..., 38) = 38
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7fff73f28880) = 0
write(4, "Thank you for visiting!\n", 24) = 24
close(4) = 0
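Reading the trace top to bottom: accept returns a new descriptor (4) for the client, the program writes the time to it, sleeps for one second (the nanosleep call), writes a goodbye, and closes the descriptor. A rough Python sketch of a server with that shape is below. It is a reconstruction from the trace for illustration, not the demo's 5-network program; the function names and the use of an ephemeral port are assumptions.

```python
import socket
import threading
import time


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket (port 0 lets the OS pick a free port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_once(listener, delay=1.0):
    """Handle one client: accept, write the time, sleep, say goodbye, close."""
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(("The time is: " + time.ctime() + "\n").encode())
        time.sleep(delay)
        conn.sendall(b"Thank you for visiting!\n")


def fetch(address):
    """Connect like telnet would and read until the server closes."""
    chunks = []
    with socket.create_connection(address) as client:
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    listener = make_listener()
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    print(fetch(listener.getsockname()), end="")
    worker.join()
    listener.close()
```

Running this under strace should show the same broad sequence of accept, write, sleep, write, close, though the exact calls will differ from the Perl trace above.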

Results #5
Tracing shows you data, too
Can be very valuable for finding moving parts that aren’t moving well
Combined with the other tools you can really see what is going on in your system

Kernel

Invisible Glue
Kernel issues are fairly rare, but usually frustrating if they show up
Usually the result of some sort of limit hit
Tons of caches, buckets, and limits
Be suspicious of “powers of two” numbers

Common Checks
Try: sudo dmesg
Kernel message log shows many problems
Look for suspicious messages

“Suspicious”
Out of memory: Kill process 19393 (2-memory.pl) score 90 or sacrifice child
nf_conntrack: Table full, dropping packet
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
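Scanning for messages like these can be automated once you know what your own machines tend to log. The sketch below matches the three examples from this slide; the pattern set and labels are illustrative assumptions, not a complete list, and the exact wording of kernel messages varies between kernel versions.

```python
import re

# Illustrative patterns based on the three example messages above.
SUSPICIOUS_PATTERNS = {
    "out of memory": re.compile(r"Out of memory"),
    "conntrack table full": re.compile(r"nf_conntrack: .*table full", re.IGNORECASE),
    "disk/ATA error": re.compile(r"ata\d+(\.\d+)?: exception"),
}

SLIDE_LINES = [
    "Out of memory: Kill process 19393 (2-memory.pl) score 90 or sacrifice child",
    "nf_conntrack: Table full, dropping packet",
    "ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
]


def classify(lines):
    """Return (label, line) pairs for every line matching a known pattern."""
    hits = []
    for line in lines:
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line))
    return hits


if __name__ == "__main__":
    for label, line in classify(SLIDE_LINES):
        print(label, "->", line)
```

In practice you would feed it the lines printed by dmesg and grow the pattern list as you meet new failure modes.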

More Places to Look
The /var/log directory has much data
In a problem state, look for recently updated files: ls -lart
Loud logs are often unhappy logs
Hardware failure is often noted in one of the log files
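The ls -lart listing puts the most recently modified files at the bottom. The same idea as a small Python sketch, with a made-up function name, returns the newest files first instead; it only looks at regular files directly inside the directory and may need elevated permissions for some entries under /var/log.

```python
import os


def recently_modified(directory, limit=5):
    """Return up to `limit` file names in `directory`, newest first."""
    entries = []
    with os.scandir(directory) as listing:
        for entry in listing:
            if entry.is_file():
                entries.append((entry.stat().st_mtime, entry.name))
    entries.sort(reverse=True)  # largest modification time first
    return [name for _mtime, name in entries[:limit]]


if __name__ == "__main__":
    for name in recently_modified("/var/log"):
        print(name)
```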

Summary

Process
Check the components: CPU, RAM, disks
Find what limits are being hit and by what
If the system is fine, it’s probably software
Trace the program, check the logs
Analyze well before you fix

Familiarity!
Systems administration done only as an afterthought will be painful and hard
Be familiar with your servers and your software
Keep a shell open, watch top throughout the day, watch the disks, etc

Next Steps
Certain tools make life easier
Nagios for monitoring (e.g., alert you when CPU exceeds 90%)
Cacti/Ganglia/OpenTSDB for trending
Fabric for multiple machine operations
Puppet/Chef for configuration management

Thanks!
Questions?