Admin Mistakes: Solaris killall
So, with this post I’m introducing another new category named mistakes, where I’ll be posting some mistakes I have made that will hopefully help other sysadmins avoid them.
It is late afternoon and you have about 15 SSH sessions open on various servers. The operating systems include Linux, AIX, Solaris and a couple of BSD derivatives. While running a benchmark you realize that the benchmark application is probably stuck in an infinite loop and is eating up all of the system’s memory (which was not expected). So, you try to kill it…
Since this was not a production system we didn’t really care, and as we all know ‘kill’ can take some time to terminate an application, depending on the application’s signal handler. So, after calling ‘kill’, you go back to the other urgent tasks you were doing on the other servers.
Then you start getting alert emails about the memory usage on this server, so you open up the SSH session and you accidentally type…
killall -s KILL bench_application
But then you realize that you’re on a Solaris server (not on the Linux box you thought you were on)! And more specifically, on a development server where software developers have shell access for testing and building.
On Solaris, ‘killall’ does not match processes by name. It is the tool the shutdown procedure uses to terminate all active processes. This meant that all users got disconnected and every running build process was terminated.
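One way to guard against this kind of slip is a small wrapper that checks the operating system before it lets ‘killall’ run. The snippet below is only a rough sketch, not something I had in place at the time. The function names safe_killall and killall_matches_by_name are made up for this post, and the whitelist is deliberately conservative: it assumes that only Linux is known to match by process name and refuses everywhere else.

```shell
# Hypothetical helper: succeeds only for an OS name (as printed by `uname -s`)
# where killall is assumed to match processes by name.
# Assumption: only Linux is whitelisted; anything else is treated as unsafe.
killall_matches_by_name() {
    case "$1" in
        Linux) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical wrapper: run the real killall only on a whitelisted OS,
# otherwise print a warning naming the OS and host, and do nothing.
safe_killall() {
    _os=$(uname -s)
    if killall_matches_by_name "$_os"; then
        command killall "$@"
    else
        echo "safe_killall: refusing to run killall on $_os host $(uname -n)" >&2
        return 1
    fi
}
```

With this in your shell profile, typing safe_killall -s KILL bench_application on a Solaris box (where uname -s prints SunOS) would only print the warning. It does not help if you type plain ‘killall’ out of habit, so it is more of a reminder than a real safety net. ‘pkill’, which selects processes by name on both Solaris and Linux, may be a better habit, but check the local man page first.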
So, of course the mistake was that I was not paying enough attention to the SSH sessions I had open.
Just think at least twice before you hit that return key. :P