xorl %eax, %eax

Archive for the ‘mistakes’ Category

Admin Mistake: F5 Load Balancer SNAT IP Address Apache Logging

Assume that you have a common three-tier architecture on a web farm, with web, application and database server layers. Load balancing is performed by an F5 BIG-IP LTM 1600, and logging takes place on the web tier, which runs Apache web servers.

When you review the access logs of the Apache web servers, the only client IP address recorded for every request is that of the F5 load balancer. With the load balancer's address in the client field, the log entries always look like this:

- - [28/Sep/2012:15:06:18 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"
- - [28/Sep/2012:15:06:31 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"

By default this F5 load balancer performs SNAT (Source Network Address Translation), which is why the requestor IP address seen by the web servers is always that of the load balancer.

The solution is to use the X-Forwarded-For (XFF) HTTP header field. On the load balancer side, first follow these steps in the BIG-IP configuration utility:
– Go to “Local Traffic”
– Select “Profiles”
– On the “Services” menu choose “HTTP”
– Create a new profile by clicking on “Create”
– Activate “Insert X-Forwarded For” check box and select “Enabled” from the menu
– Finally click on “Update”
Finally, assign this new HTTP profile to the virtual servers that should carry the XFF HTTP header field.
On the web server side, create a custom log format on the virtual hosts that need proper source IP address logging. Here is an example custom log format that includes the XFF field.

LogFormat "%v %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" CUST_F5_XFF_LOG
CustomLog /somewhere/access_log CUST_F5_XFF_LOG

The log entries will now record the real client address taken from the XFF header instead of the load balancer's:

- - [28/Sep/2012:16:48:25 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"
- - [28/Sep/2012:16:41:28 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"
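
Once the XFF value is in the log, it can be pulled out with standard tools. The following is only a minimal sketch, assuming the CUST_F5_XFF_LOG format above (virtual host first, XFF value second). The function name and the sample addresses are hypothetical. If the header carries a comma-separated list of addresses, the sketch keeps the first one.

```shell
# Print the client address from one line logged with CUST_F5_XFF_LOG.
# Field 1 is %v (virtual host), field 2 is the X-Forwarded-For value.
# A multi-address header looks like "a, b", so a trailing comma is stripped.
xff_client_ip() {
    printf '%s\n' "$1" | awk '{ sub(/,$/, "", $2); print $2 }'
}

# Usage (hypothetical log line):
#   xff_client_ip 'www.example.com 203.0.113.7 - - [28/Sep/2012:16:48:25 +0000] "GET / HTTP/1.0" 200 228 "-" "-"'
```

If a request arrives without the header, Apache logs "-" in that position and the function prints it unchanged.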

Written by xorl

September 28, 2012 at 10:00

Admin Mistake: VMware ESX DRS Error

After hardware maintenance performed by a company, some virtual machines could not be backed up by the VMware solution of Symantec NetBackup.

All the affected virtual machines were hosted on the same VMware ESX server, and logging in to VMware vSphere showed the error below.

Unable to apply DRS resource settings on host 'hostname.somewhere' in SomeDatacenter'(Reason:A general system error occured: Invalid fault). This can significantly reduce the effectiveness of DRS.

After the hardware maintenance the engineers did not check that all the VMware services were running properly. In this case Distributed Resource Scheduler (DRS) had some issues on this specific server.

The fix is very simple: a restart of the hostd daemon will almost certainly resolve the problem. In this case, after restarting the management services everything went back to normal operation.

[root@somewhere ~]# service mgmt-vmware restart
Stopping VMware ESX Server Management services:
   VMware ESX Server Host Agent Watchdog                   [  OK  ]
   VMware ESX Server Host Agent                            [  OK  ]
Starting VMware ESX Server Management services:
   VMware ESX Server Host Agent (background)               [  OK  ]
   Availability report startup (background)                [  OK  ]
[root@somewhere ~]#

Written by xorl

September 27, 2012 at 13:53

Admin Mistake: Dell OMSA Not Running Properly on CentOS

The scenario is that you have a Dell R610 server running CentOS 5.8 and you are using the Dell OMSA command line utilities for hardware monitoring.

The monitoring checks are failing with “Unknown” status and if you attempt to locally execute the equivalent commands there is no response. For example:

[root@somewhere ~]# omreport chassis

For further help, type the command followed by -?
[root@somewhere ~]#

Which of course is not the correct output.

My initial thought was that it was missing the compatibility C++ standard library but this was not the case.

[root@somewhere ~]# rpm -qa|grep compat-libstdc++
[root@somewhere ~]#

The problem was that the “Systems Management Data Engine” init script was not configured to start on boot. Consequently, the required services stayed stopped after a reboot.

[root@somewhere ~]# service dataeng status
dsm_sa_datamgrd is stopped
dsm_sa_eventmgrd is stopped
dsm_sa_snmpd is stopped
[root@somewhere ~]#

Quite simple… First start the init script.

[root@somewhere ~]# service dataeng start
Starting Systems Management Data Engine:
Starting dsm_sa_datamgrd:                                  [  OK  ]
Starting dsm_sa_eventmgrd:                                 [  OK  ]
Starting dsm_sa_snmpd:                                     [  OK  ]
[root@somewhere ~]#

And then make it start on boot…

[root@somewhere ~]# chkconfig dataeng on
[root@somewhere ~]# chkconfig --list dataeng
dataeng         0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@somewhere ~]#
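
The same check can be scripted so that a monitoring job notices when the init script is not enabled. This is a rough sketch that only parses a line in the `chkconfig --list` format shown above. The function name is hypothetical.

```shell
# Succeed if a line of `chkconfig --list <service>` output shows the
# service switched on for the given runlevel.
# $1: the output line, $2: the runlevel (0-6)
enabled_in_runlevel() {
    printf '%s\n' "$1" |
        awk -v want="$2:on" '
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit !found }'
}

# Usage:
#   enabled_in_runlevel "$(chkconfig --list dataeng)" 3 || echo "dataeng is not enabled"
```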

Obviously, the utilities are now working properly.

[root@somewhere ~]# omreport chassis

Main System Chassis

Ok       : Fans
Ok       : Intrusion
Ok       : Memory
Ok       : Power Supplies
Ok       : Power Management
Ok       : Processors
Ok       : Temperatures
Ok       : Voltages
Ok       : Hardware Log
Ok       : Batteries

For further help, type the command followed by -?

[root@somewhere ~]#

Written by xorl

August 6, 2012 at 15:05

Admin Mistakes: GNU, BSD TAR and POSIX Compatibility

So, you’re writing a simple shell script to archive and move some files to another host. For the archives you’re using the tar command. Simple, isn’t it?

After a couple of days you have to extract the data from an archive to search for something but when you attempt to extract them you get errors similar to the following.

tar: Ignoring unknown extended header keyword `XXXXXXXXX'
tar: Ignoring unknown extended header keyword `XXXXXXXXX'
tar: Ignoring unknown extended header keyword `XXXXXXXXX'

And of course the data are not extracted properly.

The files were archived on a Mac (Snow Leopard), which uses BSD TAR, and the destination host was Linux (which uses GNU TAR). As you might have guessed, there is an incompatibility between BSD and GNU TAR regarding the handling of vendor extended attributes. Specifically, BSD TAR supports them (as defined in IEEE Std 1003.1-2001 (POSIX.1-2001)) while GNU TAR doesn’t.

There are a few different options to avoid this mistake. The best one is to simply use either BSD or GNU TAR on both ends rather than mixing them. The other option is to use the “--format” option to select an archive format that both systems handle. Here is the equivalent documentation for BSD TAR:

     --format format
             (c, r, u mode only) Use the specified format for the created archive.  Supported formats
             include ``cpio'', ``pax'', ``shar'', and ``ustar''.  Other formats may also be supported; see
             libarchive-formats(5) for more information about currently-supported formats.  In r and u
             modes, when extending an existing archive, the format specified here must be compatible with
             the format of the existing archive on disk.

And for GNU TAR:

              like --format=posix

       --format FORMAT
              selects output archive format
              v7 - Unix V7
              oldgnu - GNU tar <=1.12
              gnu - GNU tar 1.13
              ustar - POSIX.1-1988
              posix - POSIX.1-2001
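
Since “ustar” appears in both lists, one way to apply this is to always create archives in that format. The following is a minimal sketch, assuming the file names are short enough for ustar. The function name and the paths in the usage line are hypothetical.

```shell
# Create an archive in the plain ustar format, which both BSD and GNU tar
# list as supported and which carries no extended header keywords.
# $1: archive to create, $2: directory holding the files, $3...: members to add
make_portable_archive() {
    archive=$1
    dir=$2
    shift 2
    tar --format=ustar -cf "$archive" -C "$dir" "$@"
}

# Usage:
#   make_portable_archive /tmp/site.tar /var/www site1
```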

Finally, you could use the “--pax-option” option of GNU TAR to delete these attributes. Here is its man page documentation:

       --pax-option KEYWORD-LIST
	      used only with POSIX.1-2001 archives to modify the way tar handles
	      extended header keywords

For example, if your warnings were like:

tar: Ignoring unknown extended header keyword `somefile.ino'
tar: Ignoring unknown extended header keyword `somefile.nlink'

You could pass “--pax-option” a “delete=” keyword pattern that matches those keywords in order to delete them.
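
As a sketch of how that could be automated, the function below turns the warnings into a ready-made argument. It assumes GNU tar's documented `delete=pattern` keyword and its comma-separated keyword list. The function name is hypothetical, and the keywords are the ones from the warnings above.

```shell
# Read GNU tar "Ignoring unknown extended header keyword" warnings on stdin
# and print one --pax-option argument that deletes every keyword mentioned.
pax_delete_option() {
    # The keyword sits between the opening and closing quote characters.
    sed -n 's/^tar: Ignoring unknown extended header keyword .\(.*\).$/\1/p' |
        sort -u |
        awk '{ printf "%sdelete=%s", (NR > 1 ? "," : ""), $0 }
             END { if (NR) print "" }' |
        sed 's/^/--pax-option=/'
}

# Usage (archive.tar is a placeholder):
#   opt=$(tar -tf archive.tar 2>&1 >/dev/null | pax_delete_option)
#   tar $opt -xf archive.tar
```

If there are no warnings the function prints nothing, so the unquoted `$opt` expands to no argument at all.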

Written by xorl

May 15, 2012 at 16:58

Admin Mistakes: Apache Reload and Log Files

So, you have a request to upload and configure a new website on some specific web server. The policy is to have a separate configuration file for each website (each new virtual host) under /etc/httpd/conf.d/ directory.

After writing the configuration file (about 200 lines, due to numerous special requirements) you run the following command

# /etc/init.d/httpd configtest
Syntax OK

to check that there are no syntax errors. Then you reload Apache’s configuration…

# /etc/init.d/httpd reload
Reloading httpd:                                          [  OK  ]

However, when you check for the running Apache processes you see that it is not running.

# ps -C httpd
  PID TTY          TIME CMD

Now, let’s move to the next section to see what caused this problem.

After having another look at the newly added configuration I noticed that the ‘ErrorLog’ directive was pointing to an invalid directory due to a typo. If Apache is not able to access the configured log files, it won’t start and this is what happened.

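Since each web server could host numerous websites and these were maintained by many different people, I wrote the following simple shell script that reports any missing log files.

#!/bin/bash
# Directory holding the per-site configuration files
HTTPD_CONFS=/etc/httpd/conf.d

# Set RET to 1 if the given log file is missing, 0 otherwise
function test_if_exists ()
{
	if [ -f "$1" ]; then
		RET=0
	else
		RET=1
	fi
}

# Collect the ErrorLog/CustomLog targets of a configuration file in LFILES
function gimmie_the_dirs ()
{
	LFILES=$(egrep '^[[:space:]]*(ErrorLog|CustomLog)[[:space:]]' "$1" | awk '{print $2}' | tr '\n' ' ')
}

for i in "$HTTPD_CONFS"/*.conf; do
	gimmie_the_dirs "$i"
	for j in $LFILES; do
		test_if_exists "$j"
		if [ $RET -eq 1 ]; then
			echo "ERROR: $j does not exist"
		fi
	done
done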
This was later integrated in some shell scripts used for adding new websites and we never had this problem again.
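
Since the typo in this case was in the directory part of the path, a variation would be to test the directory rather than the file. The following is only a sketch under a few assumptions: the paths are absolute, unquoted, and not piped to a program. Both function names are hypothetical.

```shell
# Succeed if the directory that would hold the given log file exists.
log_dir_exists() {
    [ -d "$(dirname -- "$1")" ]
}

# Print an error for every ErrorLog/CustomLog path in a configuration file
# whose directory is missing. $1: the configuration file to inspect
report_missing_log_dirs() {
    awk '$1 == "ErrorLog" || $1 == "CustomLog" { print $2 }' "$1" |
        while read -r path; do
            log_dir_exists "$path" ||
                echo "ERROR: directory for $path does not exist"
        done
}

# Usage (hypothetical file name):
#   report_missing_log_dirs /etc/httpd/conf.d/newsite.conf
```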

Written by xorl

December 9, 2011 at 09:31

Admin Mistakes: Solaris killall

So, with this post I’m introducing another new category named mistakes, where I’ll be posting some mistakes I have made that will hopefully help other sysadmins avoid them.

It is late afternoon and you have about 15 SSH sessions open on various servers. The operating systems include Linux, AIX, Solaris and a couple of BSD derivatives. While running a benchmark you realize that the benchmark application is probably stuck in an infinite loop and is eating up all of the system’s memory (which was not expected). So, you try to kill it…

killall bench_application

Since this was not a production system we didn’t really care, and as we all know ‘kill’ can take some time to terminate an application depending on its signal handler. So, after calling ‘killall’ you go back to some other urgent tasks you were doing on the other servers.
Then you start getting alert emails about the memory usage on this server, so you open up the SSH session and you accidentally type…


Instead of:

killall -s KILL bench_application

But then you realize that you’re on a Solaris server (not on the Linux you thought you were)! And more specifically, on a development server where software developers have shell access for testing and building.

On Solaris, ‘killall’ is used to terminate all active processes, meaning that all users got disconnected and any running build process was terminated.

So, of course the mistake was that I was not paying enough attention to the SSH sessions I had open.

Just think at least twice before you hit that return key. :P
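
One way to make that second thought automatic is a small wrapper that checks the operating system first. This is only a sketch with a hypothetical function name. It assumes that only Linux is known to accept a process name for ‘killall’ and refuses everywhere else.

```shell
# Print the killall command for a process name, but only on an OS where
# killall is known to take a name. Refuse on anything else.
# $1: process name, $2: OS name as printed by `uname -s` (defaults to the local one)
safe_killall_cmd() {
    os=${2:-$(uname -s)}
    case "$os" in
        Linux)
            echo "killall $1"
            ;;
        *)
            # On Solaris, killall terminates all active processes.
            echo "refusing: killall on $os may not take a process name" >&2
            return 1
            ;;
    esac
}

# Usage:
#   cmd=$(safe_killall_cmd bench_application) && $cmd
```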

Written by xorl

October 19, 2011 at 09:27

