Admin Mistakes: Solaris killall
With this post I’m introducing a new category named “mistakes”, where I’ll be posting mistakes I have made in the hope that they help other sysadmins avoid them.
Background
It is late afternoon and you have about 15 SSH sessions open on various servers. The operating systems include Linux, AIX, Solaris and a couple of BSD derivatives. While running a benchmark you realize that the benchmark application is probably stuck in an infinite loop and is eating up all of the system’s memory (which was not expected). So, you try to kill it…
killall bench_application
Since this was not a production system we didn’t really care, and as we all know a kill can take some time to terminate the application, depending on the application’s signal handler. So, after issuing the command you go back to the other urgent tasks you were doing on the other servers.
Then you start getting alert emails about the memory usage on this server, so you switch back to the SSH session and accidentally type…
killall
Instead of:
killall -s KILL bench_application
But then you realize that you’re on a Solaris server (not on the Linux box you thought you were on)! And more specifically, on a development server where software developers have shell access for testing and building.
Problem
On Solaris, ‘killall’ terminates all active processes. This meant that all users got disconnected and every running build process was terminated.
Mistake
So, of course, the mistake was that I was not paying enough attention to the SSH sessions I had open.
Resolution
Just think at least twice before you hit that return key. :P
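One way to make that second thought automatic is to stop typing killall by hand and go through a small wrapper instead. What follows is only a minimal sketch, not something taken from the incident above: the function name kill_by_name is hypothetical, and it assumes a POSIX-style shell plus a pkill that accepts a signal name and the -x (exact match) option. It refuses to do anything unless a process name is given, so a bare invocation is harmless on any of the systems mentioned.

```shell
# Hypothetical helper (an illustration, not part of the original post):
# kill processes by exact name via pkill, and refuse to act without a name.
kill_by_name() {
    if [ "$#" -lt 1 ] || [ -z "$1" ]; then
        echo "usage: kill_by_name NAME [SIGNAL]" >&2
        return 2
    fi
    name=$1
    sig=${2:-TERM}    # default to SIGTERM; pass KILL to force termination
    # -x asks pkill to match the process name exactly, not as a substring
    pkill "-$sig" -x "$name"
}
```

For example, kill_by_name some_app KILL would send SIGKILL to processes named exactly some_app. One caveat I am aware of on Linux: pkill matches against a process name that is truncated to 15 characters unless -f is used, so very long names may need a different match.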
Correct solution: use pkill instead of killall.
Mr Admin
October 19, 2011 at 21:15
Although the above is just a typographic mistake that accidentally happened to be a valid command, I might suggest to all busy sysadmins that they distinguish 15 different ssh sessions by 15 different prompts, each one with the hostname of the machine included.
uh
October 26, 2011 at 20:43
I totally agree with you, and whenever I can make such changes it’s the first thing I do. However, in this case, as in many others, it’s a company (more specifically, IT department) policy that cannot be changed that easily.
As you probably already know, most companies have policies for everything, including naming (hostnames, aliases, etc.), shell prompts, encodings/locales, etc.
Finally, keep in mind that in most of these tasks not all of the servers you’re administering belong to your company. You also have access to systems hosted by other companies, where you have to follow some of their policies.
xorl
October 27, 2011 at 08:37
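handler() {
    # Ask for confirmation before running the wrapped command.
    # Note: read -N is a bash feature, so this needs bash rather than plain sh.
    echo -n "Do you really, really want to killall? [Y/N] "
    read -N 1 inp
    echo
    # Run the command only on an explicit "Y"; anything else aborts.
    if test "$inp" = "Y"; then
        "$@"
    fi
}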
ret
November 1, 2011 at 02:22
Agreed, I would use pkill rather than killall!
kcope
November 10, 2011 at 15:22