process monitoring in unix shell scripting
DESCRIPTION
Quick overview of how to write a process monitor in Unix shell scripting.
TRANSCRIPT
CIS 216
Dan Morrill
Process Monitoring in Shell Scripting
top: Gets you a list of processes that are consuming the CPU
htop: Near real-time list of running processes by CPU; includes scrolling and mouse support
vmstat: Provides information about processes, memory, paging, I/O, traps and CPU
w/who/finger: Provides information about users that are consuming resources on the computer
ps (ps -ef): Lists all the currently running processes on a Linux computer
How to get a process list
pgrep/pkill:
pgrep <process name>: lists the PID of each process whose name matches
pkill <process name>: sends a signal (SIGTERM by default, a request to terminate) to each matching process
free: Shows the current memory usage of the system, both physical and swap memory
mpstat:
mpstat 2 5: shows five sets of global statistics across all processors at two-second intervals.
mpstat -P ALL 2 5: shows five sets of statistics for each individual processor at two-second intervals.
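As a rough illustration of the pgrep and pkill entries above, the sketch below starts a throwaway process, looks it up by name, and stops it with SIGTERM. The use of sleep as a stand-in target and the variable names are assumptions made for the demo; -x (exact name match) and -P (only children of the given parent PID) are used to keep the match narrow.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a real service: a background sleep that we own.
sleep 30 &
target_pid=$!

# pgrep -x prints the PIDs of processes whose name is exactly "sleep".
pgrep -x sleep || echo "pgrep found nothing (or is not installed)"

# pkill sends SIGTERM by default; -P "$$" limits it to children of this shell.
# Fall back to plain kill if pkill is unavailable or matched nothing.
pkill -TERM -P "$$" -x sleep || kill -TERM "$target_pid"

# A process ended by SIGTERM (signal 15) reports status 128 + 15.
wait "$target_pid"
status=$?
echo "exit status: $status"
```

Restricting pkill with -P is a deliberate choice here: an unqualified pkill by name signals every matching process the user is allowed to signal, not only the one the script started.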
Other types of process mapping
iostat: Reports CPU statistics and input/output statistics for devices and partitions (including NFS and Samba partitions)
pmap: Reports the memory map of a process. This can be used to find the memory usage of that process.
Other types of process mapping
Set the debug mode for this; you will want it. Remember what each debug mode switch does.
# set -n : Uncomment to check script syntax, without execution.
# Note: Do not forget to put the comment back in or
# the shell script will not execute!
# set -x : Uncomment to debug this shell script
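The two debug switches can also be tried from the command line without editing a script, since bash -n and bash -x correspond to set -n and set -x. The following is a minimal sketch; the deliberately broken temporary script and the variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical broken script: the "if" is never closed with "fi".
broken=$(mktemp)
printf 'if true; then\n  echo ok\n' > "$broken"

# bash -n parses the file without running it; a non-zero status means a syntax error.
if bash -n "$broken" 2>/dev/null; then
    syntax="ok"
else
    syntax="error"
fi
echo "syntax check: $syntax"
rm -f "$broken"

# bash -x runs the commands and writes each one to stderr before executing it.
trace=$(bash -x -c 'PROCESS="ssh"' 2>&1)
echo "trace output: $trace"
```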
Script 4A (Choose either 4A or 4B)
PROC_MON=$(basename "$0")    # Name of this script, used to exclude it from the process match
LOGFILE="/home/ganesh/procmon.log"    # Log file and where it is located
[[ ! -s $LOGFILE ]] && touch "$LOGFILE"    # If the log file is missing or empty, create it
TTY=$(tty)    # Current tty or pty
PROCESS="ssh"    # This will define which process to monitor
SLEEP_TIME="1"    # This is the sleep time in seconds between monitoring checks
txtred=$(tput setaf 1)    # Red: indicates a failed process
txtgrn=$(tput setaf 2)    # Green: indicates successful process information
txtylw=$(tput setaf 3)    # Yellow: indicates cautionary information
txtrst=$(tput sgr0)    # Resets text attributes
Declare your variables
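The tput colour variables assume the script is attached to a terminal. When a monitor like this is run from cron or has its output piped to a file, tput may fail or the escape codes end up in the output. One possible guard, offered as a sketch rather than as part of the original scripts, is to fall back to empty strings:

```shell
#!/usr/bin/env bash
# Use colour only when stdout is a terminal and tput works for the current TERM.
if [ -t 1 ] && tput setaf 1 >/dev/null 2>&1; then
    txtred=$(tput setaf 1); txtgrn=$(tput setaf 2)
    txtylw=$(tput setaf 3); txtrst=$(tput sgr0)
else
    # Not a terminal: leave the variables empty so messages stay plain text.
    txtred=""; txtgrn=""; txtylw=""; txtrst=""
fi

message="${txtgrn}ssh is currently RUNNING...${txtrst}"
echo "$message"
```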
function exit_trap    # This is the behavior of the trap signal
{
    # Log an ending time for process monitoring
    DATE=$(date +%D)
    TIME=$(date +%T)    # Get a new timestamp...
    echo "$DATE @ $TIME: Monitoring for $PROCESS terminated" >> $LOGFILE &    # Creates an entry in the logfile
    echo "$DATE @ $TIME: ${txtred}Monitoring for $PROCESS terminated${txtrst}"
    # Kill all background jobs started by this script
    kill -9 $(jobs -p) 2>/dev/null
}
Define the function to monitor a process
Set the trap to see if the process exits
trap 'exit_trap; exit 0' 1 2 3 15    # 1 2 3 15 = HUP INT QUIT TERM

# This will see if the process is running; if not, it will start it
ps aux | grep "$PROCESS" | grep -v "grep $PROCESS" \
    | grep -v "$PROC_MON" >/dev/null
Set the trap for the process
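To see the trap mechanism in isolation, the sketch below runs a small child shell that installs a handler for the same four signals and then sends itself SIGTERM. The temporary log file and the shortened handler body are assumptions for the demo; the handler name mirrors exit_trap from the slides.

```shell
#!/usr/bin/env bash
LOGFILE=$(mktemp)    # hypothetical log location for the demo
export LOGFILE

# Run the demo in a child shell so that the signal does not hit this shell.
sh -c '
    exit_trap() {
        echo "$(date +%D) @ $(date +%T): Monitoring terminated" >> "$LOGFILE"
    }
    # Same signal set as the slides: 1 2 3 15 = HUP INT QUIT TERM.
    trap "exit_trap; exit 0" HUP INT QUIT TERM
    kill -TERM "$$"    # simulate an operator stopping the monitor
    sleep 5            # not reached: the trap handler exits first
'
rc=$?
echo "child exit status: $rc"
cat "$LOGFILE"
```

Because the handler ends with exit 0, the child reports success even though it was stopped by a signal, and the log still receives its closing entry.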
if (( $? != 0 ))
then
    DATE=$(date +%D)
    TIME=$(date +%T)
    echo
    echo "$DATE @ $TIME: $PROCESS is NOT active...starting $PROCESS.." >> $LOGFILE &    # Creates an entry in the logfile
    echo "$DATE @ $TIME: ${txtylw}$PROCESS is NOT active...starting $PROCESS..${txtrst}"
    echo
    sleep 1
    service $PROCESS start &
    echo "$DATE @ $TIME: $PROCESS has been started..." >> $LOGFILE &    # Puts an entry in the logfile
else    # This will say what to do if the process is already running
    echo -e "\n"    # A blank line
    DATE=$(date +%D)
    TIME=$(date +%T)
    echo "$DATE @ $TIME: $PROCESS is currently RUNNING..." >> $LOGFILE &    # Puts an entry in the logfile
    echo "$DATE @ $TIME: ${txtgrn}$PROCESS is currently RUNNING...${txtrst}"
fi
Set user output
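Every status message in the script is written twice, once plain to the log and once coloured to the screen, each time with a fresh DATE and TIME. One way to cut the repetition is a small helper; the function name log_msg and the temporary log file are hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
LOGFILE=$(mktemp)    # hypothetical; the slides use /home/ganesh/procmon.log

# log_msg COLOUR TEXT: one timestamp, plain text to the log, coloured text to the screen.
log_msg() {
    local colour=$1; shift
    local stamp
    stamp="$(date +%D) @ $(date +%T)"
    echo "$stamp: $*" >> "$LOGFILE"
    echo "$stamp: ${colour}$*${txtrst:-}"
}

# txtgrn, txtred and txtrst default to empty if the colour variables are not set.
log_msg "${txtgrn:-}" "ssh is currently RUNNING..."
log_msg "${txtred:-}" "ssh has STOPPED..."
```

Writing to the log in the foreground, rather than with a trailing &, also keeps the log entries in the order in which they were produced.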
while (( RC == 0 ))    # Loops until the return code is not zero; RC is never set, so this runs until the script exits
do
    ps aux | grep $PROCESS | grep -v "grep $PROCESS" \
        | grep -v $PROC_MON >/dev/null 2>&1
    if (( $? != 0 ))    # Check the return code
    then
        echo
        DATE=$(date +%D)
        TIME=$(date +%T)
        echo "$DATE @ $TIME: $PROCESS has STOPPED..." >> $LOGFILE &    # Entry in logfile
        echo "$DATE @ $TIME: ${txtred}$PROCESS has STOPPED...${txtrst}"
        echo
        service $PROCESS start &
        echo "$DATE @ $TIME: $PROCESS has RESTARTED..." >> $LOGFILE &    # Entry in logfile
        echo "$DATE @ $TIME: ${txtgrn}$PROCESS has RESTARTED...${txtrst}"
        sleep 1
Loop the process so it always monitors
        ps aux | grep $PROCESS | grep -v "grep $PROCESS" \
            | grep -v $PROC_MON >/dev/null 2>&1
        if (( $? != 0 ))    # This will check the return code
        then
            echo
            DATE=$(date +%D)    # New time stamp
            TIME=$(date +%T)
            echo "$DATE @ $TIME: $PROCESS failed to restart..." >> $LOGFILE &    # Entry in logfile
            echo "$DATE @ $TIME: ${txtred}$PROCESS failed to restart...${txtrst}"
            exit 0
        fi
    fi
    sleep $SLEEP_TIME    # This is needed to reduce CPU load!
done
Check to see if the process restarted
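The check, restart, check-again sequence in the loop can be expressed as one function whose exit status says whether the process ended up running. This is a sketch under stated assumptions: ensure_running, check_demo and start_demo are hypothetical names, and a flag file stands in for the real service so that the example runs anywhere.

```shell
#!/usr/bin/env bash
# ensure_running CHECK START: returns 0 if CHECK passes, either at once or after START.
ensure_running() {
    local check=$1 start=$2
    "$check" && return 0    # already running, nothing to do
    "$start"                # one restart attempt
    sleep 1                 # give the process a moment to come up
    "$check"                # final status: 0 = restarted, non-zero = failed
}

# Hypothetical stand-ins: a flag file plays the role of the service.
flag=$(mktemp -u)
check_demo() { [ -e "$flag" ]; }
start_demo() { touch "$flag"; }

if ensure_running check_demo start_demo; then
    result="running"
else
    result="failed to restart"
fi
echo "$result"
rm -f "$flag"
```

In the real monitor, the check would be the ps and grep pipeline and the start command would be the service call; only the control flow is shown here.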
Process is hard coded in the script
# Process to be monitored
target="ssh"
Script 4b Select the process
wait_time="10"    # This is in seconds
Select the wait time to check
log_file="procmon.log"
Select where the output log goes
script_failure="0"
Check to see if the script restarted the process successfully
# Monitor process and restart if necessary
script_name=$(basename "$0")    # Name of this script, used to exclude it from the process match
process_monitoring(){
for attempt in 1 2 3
do
    ps aux | grep "$target" | grep -v "grep $target" \
        | grep -v "$script_name" >/dev/null
    if [ $? != 0 ]
    then
        log_time=$(date)
        echo
        echo "$(tput setaf 3)$target is not running. Attempt will be made to restart. This is attempt $attempt of 3.$(tput sgr0)"
        echo >>$log_file
        echo "$log_time: $target is not running. Restarting. Attempt $attempt of 3.">>$log_file
        echo
        service $target start &
        sleep 2    # Pause to prevent false positives from restart attempt.
    else
        break    # Process is running, so skip the remaining attempts.
    fi
done
sleep 2    # Pause to prevent false positives from restart attempt.
}
Core process monitor and restart
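A bounded number of attempts, like the three restart attempts above, can also be written as a general helper that stops at the first success. The sketch below is illustrative only: retry and flaky are hypothetical names, and a counter file simulates a command that fails once before succeeding.

```shell
#!/usr/bin/env bash
# retry MAX CMD...: run CMD up to MAX times, stopping at the first success.
retry() {
    local max=$1; shift
    local attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt of $max"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1    # every attempt failed
}

# Hypothetical flaky command: fails on the first call, succeeds on the second.
counter=$(mktemp)
flaky() {
    echo x >> "$counter"
    [ "$(grep -c x "$counter")" -ge 2 ]
}

outcome=$(retry 3 flaky)
echo "$outcome"
rm -f "$counter"
```

Returning from the function at the first success avoids relying on changes to the loop variable, which do not end a for loop over a fixed list.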
detect_failure(){
ps aux | grep "$target" | grep -v "grep $target" \
    | grep -v $script_name >/dev/null
if [ $? != 0 ]
then
    log_time=$(date)
    echo
    echo "$(tput setaf 1)$target is not running after 3 attempts. Process has failed and cannot be restarted. $(tput sgr0)"    # Report failure to user
    echo "This script will now close."
    echo "">>$log_file
    echo "$log_time: $target cannot be restarted.">>$log_file    # Log failure
    script_failure="1"    # Set failure flag
else
    log_time=$(date)
    echo
    echo "$log_time : $target is running."
    echo "$log_time : $target is running." >> $log_file
fi
}
Core restart
program_closing(){
# Report and log script shutdown
log_time=$(date)
echo
echo "Closing ProcMon script. No further monitoring of $target will be performed."    # Reports closing of ProcMon to user
echo
echo "$(tput setaf 1)$log_time: Monitoring for $target terminated. $(tput sgr0)"
echo
echo "$log_time: Monitoring for $target terminated.">>$log_file    # Logs closing of ProcMon to log_file
echo >> $log_file
echo "***************" >> $log_file
echo >> $log_file
# Ensure this script is properly killed
kill -9 $(jobs -p) 2>/dev/null    # As in Script 4A: kill any background jobs started by this script
}
Core script stop
# Trap shutdown attempts to enable logging of shutdown
trap 'program_closing; exit 0' 1 2 3 15

# Inform user of purpose of script
clear
echo
echo "This script will monitor $target to ensure that it is running,"
echo "and attempt to restart it if it is not. If it is unable to"
echo "restart after 3 attempts, it will report failure and close."
sleep 2

# Perform monitoring
while [ $script_failure != "1" ]
do
    process_monitoring    # Monitors process and attempts 3 restarts if it fails.
    detect_failure    # Reports failure in the event that the process does not restart.
    if [ $script_failure != "1" ]
    then
        sleep $wait_time
    fi
done

sleep 2
program_closing    # Logs script closure
exit 0
Core set traps
Questions?