...making Linux just a little more fun!
Ben Okopnik [ben at linuxgazette.net]
----- Forwarded message from Allan Peda <tl082@yahoo.com> -----
From: Allan Peda <tl082@yahoo.com>
To: tag@lists.linuxgazette.net
Sent: Wednesday, May 20, 2009 11:34:27 AM
Subject: Two Cent tip

I have written previously on other topics for LG, and then IBM, but it's been a while, and I'd like to first share this without creating a full article (though I'd consider one).
This is a bit long for a two cent tip, but I wanted to share a solution I came up with for long-running processes that sometimes hang for an indefinite period of time. The solution I envisioned was to launch the process with a specified timeout period, so instead of running the problematic script directly, I would "wrap" it within a timeout shell function, which is not coincidentally called "timeout". This function signals a reluctant process once its time is up, allowing the calling procedure to catch the resulting error status and respond appropriately.
Say the process that sometimes hung was called "long_data_load"; instead of running it directly from the command line (or a calling script), I would call it using the function defined below.
The unwrapped program might be:
long_data_load arg_one arg_two .... etc
which, with a timeout limit of 10 minutes, would then become:
timeout 10 long_data_load arg_one arg_two .... etc
So, in the example above, if the script failed to complete within ten minutes, it would instead be killed (using a hard SIGKILL), and an error would be returned. I have been using this on a production system for two months, and it has turned out to be very useful in re-attempting network-intensive procedures that sometimes seem never to complete. Source code follows:
#!/bin/bash
#
# Allan Peda
# April 17, 2009
#
# function to call a long running script with a
# user set timeout period
# Script must have the executable bit set
#
# Note that "at" rounds down to the nearest minute
# best to use full path
function timeout {
    if [[ ${1//[^[:digit:]]} != ${1} ]]; then
        echo "First argument of this function is timeout in minutes." >&2
        return 1
    fi
    declare -i timeout_minutes=${1:-1}
    shift
    # sanity check, can this be run at all?
    # ("$1" is quoted so that a missing command is caught here too)
    if [ ! -x "$1" ]; then
        echo "Error: attempt to locate background executable failed." >&2
        return 2
    fi
    "$@" &
    declare -i bckrnd_pid=$!
    declare -i jobspec=$(echo kill -9 $bckrnd_pid |\
        at now + $timeout_minutes minutes 2>&1 |\
        perl -ne 's/\D+(\d+)\b.+/$1/ and print')
    # echo kill -9 $bckrnd_pid | at now + $timeout_minutes minutes
    # echo "will kill -9 $bckrnd_pid after $timeout_minutes minutes" >&2
    wait $bckrnd_pid
    declare -i rc=$?
    # cleanup unused batch job
    atrm $jobspec
    return $rc
}

# test case:
# ask child to sleep for 163 seconds
# putting process into the background, then reattaching
# but kill it after 2 minutes, unless it returns
# before then
# timeout 2 /bin/sleep 163
# echo "returned $? after $SECONDS seconds."
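The re-attempting mentioned above could be wrapped up along the following lines. This is only a sketch, not part of the original tip: "retry" is a hypothetical helper name, and it assumes that the function above hands back the status from "wait" unchanged, so that a child killed with SIGKILL shows up as 137 (bash reports 128 plus the signal number).

```shell
#!/bin/bash
# retry: hypothetical helper, not part of the original tip.
# Usage: retry MAX_ATTEMPTS command [args...]
# Runs the command until it succeeds or MAX_ATTEMPTS is reached,
# and returns the status of the last attempt.
retry() {
    local -i max_attempts=$1 attempt rc=1
    shift
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        "$@"
        rc=$?
        (( rc == 0 )) && return 0
        if (( rc == 137 )); then
            # 128 + 9: the child was ended by SIGKILL, which is what
            # the scheduled "kill -9" produces when the timeout expires.
            echo "attempt $attempt: killed by SIGKILL" >&2
        else
            echo "attempt $attempt: failed with status $rc" >&2
        fi
    done
    return $rc
}

# Example (assumes the timeout function above has been sourced and
# that the path to long_data_load is adjusted to a real location):
# retry 3 timeout 10 /path/to/long_data_load arg_one arg_two
```

Because "retry" simply runs whatever follows the attempt count, it works with shell functions such as "timeout" as well as with ordinary commands.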
----- End forwarded message -----
-- * Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *