xorl %eax, %eax

Archive for the ‘administration’ Category

Admin Mistake: F5 Load Balancer SNAT IP Address Apache Logging

Background
Assume that you have a common three-tier architecture on a web farm with layers being web, application and database servers. The load balancing is performed by an F5 BIG-IP LTM 1600 load balancer and the logging takes place on the web farm that uses Apache web servers.

Problem
When you review the access logs of the Apache web servers, the only client IP address recorded for every request is that of the F5 load balancer. Assuming that the load balancer address is 10.10.10.10, the log entries will always look like this:

10.10.10.10 - - [28/Sep/2012:15:06:18 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"
10.10.10.10 - - [28/Sep/2012:15:06:31 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"

Mistake
By default this F5 load balancer performs SNAT (Source Network Address Translation), which is why the requester's IP address always appears as the load balancer's.

Resolution
The solution is to use the X-Forwarded-For (XFF) HTTP header field. On the load balancer side, first follow the steps below in the BIG-IP configuration utility:
– Go to “Local Traffic”
– Select “Profiles”
– On the “Services” menu choose “HTTP”
– Create a new profile by clicking on “Create”
– Tick the “Insert X-Forwarded For” check box and select “Enabled” from the menu
– Finally click on “Update”
Then assign this new HTTP profile to the virtual servers that should carry the XFF HTTP header field.
Moving to the web server side, you have to define a custom log format on the virtual hosts where you want proper source IP address logging. Here is an example custom log format that includes the XFF field.

LogFormat "%h %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" CUST_F5_XFF_LOG
CustomLog /somewhere/access_log CUST_F5_XFF_LOG

Assuming that the real client IP address is 2.2.2.2 and the load balancer’s is 10.10.10.10, the log entries will be:

10.10.10.10 2.2.2.2 - - [28/Sep/2012:16:48:25 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"
10.10.10.10 2.2.2.2 - - [28/Sep/2012:16:41:28 +0000] "GET / HTTP/1.0" 200 228 "-" "Wget/1.12 (linux-gnu)"
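
With the real client address in the second field, the log becomes useful for per-client statistics again. Here is a minimal sketch, assuming the custom format above (XFF value in the second whitespace-separated field); the function name count_clients is made up for illustration. If a request passed through more than one proxy, the header holds a comma-separated list and only its first entry ends up in the second field, so the sketch strips the trailing comma.

```shell
# count_clients: read access-log lines in the custom XFF format on stdin
# and print "<requests> <client ip>" pairs, busiest client first.
count_clients() {
    awk '{
        ip = $2
        sub(/,$/, "", ip)   # first entry of a multi-proxy XFF list ends with a comma
        n[ip]++
    }
    END { for (ip in n) print n[ip], ip }' | sort -rn
}
```

For example: count_clients < /somewhere/access_log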

Written by xorl

September 28, 2012 at 10:00

Admin Mistake: VMware ESX DRS Error

Background
After hardware maintenance performed by a company, some virtual machines could no longer be backed up by the VMware solution of Symantec NetBackup.

Problem
All the affected virtual machines were hosted on the same VMware ESX server, and logging in to VMware vSphere showed the error below.

Unable to apply DRS resource settings on host 'hostname.somewhere' in SomeDatacenter'(Reason:A general system error occured: Invalid fault). This can significantly reduce the effectiveness of DRS.

Mistake
After the hardware maintenance, the engineers did not check that all the VMware services were running properly. In this case the Distributed Resource Scheduler (DRS) had issues on this specific server.

Resolution
This is very simple. A restart of the hostd daemon will almost certainly fix the problem. In this case after restarting the management services everything went back to normal operation.

[root@somewhere ~]# service mgmt-vmware restart
Stopping VMware ESX Server Management services:
   VMware ESX Server Host Agent Watchdog                   [  OK  ]
   VMware ESX Server Host Agent                            [  OK  ]
Starting VMware ESX Server Management services:
   VMware ESX Server Host Agent (background)               [  OK  ]
   Availability report startup (background)                [  OK  ]
[root@somewhere ~]#

Written by xorl

September 27, 2012 at 13:53

IBM MegaRAID BIOS Config Utility RAID-10 Configuration

So, a lot of people have difficulties configuring RAID-10 using MegaRAID because if you have for example four hard disks and you add them to a Disk Group, the only available options are RAID-0 and RAID-1.
Here is how to do this on an IBM System x3650 server with four 300GB SAS hard disks.

During the system boot you will be given the following options.



Select “Diagnostics” (in this case using the F2 key) and, when MegaRAID is loaded, use the “Ctrl+H” key combination to enter the WebBIOS configuration utility.



From the above capture you can also see that there is no virtual disk configured and the controller detected four JBOD disks.
When you enter the utility, you will have the ability to select the adapter you want to configure. In our case we only have one adapter so it is very straightforward.



Next, we have the MegaRAID BIOS configuration main menu for the selected adapter.



From the main menu you select option “Configuration Wizard” and you will get the following screen.



From the above configuration types, we select “New Configuration” since there is no prior configuration. This can also be used if you want to replace the existing configuration with a new one. Before proceeding you will get a warning that this selection will erase the current configuration as shown below.



Next, select the hard disk drives you want to add to the RAID array. To select more than one, hold down the CTRL key. In our case all four disks are selected in order to implement RAID-10.



In the next window you will have to select “Manual Configuration” if you want to create a RAID-10 array.



Below you select (once again using the CTRL key) the first two disks and click “Add To Array” in order to add them to “Drive Group0” on the right side panel.



After adding the first two disks you click on “Accept DG” to complete the setup of this drive group and create a new one.



Follow the same procedure and add the other two disks to the new drive group and then click “Accept DG” and “Next” to continue.



You add the two newly created arrays to a span by selecting each one of them and then clicking on “Add to SPAN”.



And as you can see below, the selected RAID level is 10. Here you can tune your RAID-10 configuration and when you are ready you click on “Update Size” and then “Accept” button.



Finally, you can continue by hitting “Next” and after the usual warning messages and final review of the configuration, the RAID will start initializing.



On the bottom right you have some additional options that you can use but in any case, when the initialization process is completed the RAID-10 virtual disk will be ready to use.

Written by xorl

August 30, 2012 at 12:22

Posted in administration, ibm

Admin Mistake: Dell OMSA Not Running Properly on CentOS

Background
The concept is that you have a Dell R610 server running the CentOS 5.8 operating system and you are using the Dell OMSA command line utilities to perform hardware monitoring.

Problem
The monitoring checks are failing with “Unknown” status, and if you run the equivalent commands locally they return no useful output. For example:

[root@somewhere ~]# omreport chassis
Health

For further help, type the command followed by -?
[root@somewhere ~]#

Which of course is not the correct output.

Mistake
My initial thought was that it was missing the compatibility C++ standard library but this was not the case.

[root@somewhere ~]# rpm -qa|grep compat-libstdc++
compat-libstdc++-33-3.2.3-61
compat-libstdc++-296-2.96-138
compat-libstdc++-33-3.2.3-61
[root@somewhere ~]#

The problem was that “Systems Management Data Engine init script” was not configured to start on boot. Consequently, the required services were stopped after a reboot.

[root@somewhere ~]# service dataeng status
dsm_sa_datamgrd is stopped
dsm_sa_eventmgrd is stopped
dsm_sa_snmpd is stopped
[root@somewhere ~]#

Resolution
Quite simple… First start the init script.

[root@somewhere ~]# service dataeng start
Starting Systems Management Data Engine:
Starting dsm_sa_datamgrd:                                  [  OK  ]
Starting dsm_sa_eventmgrd:                                 [  OK  ]
Starting dsm_sa_snmpd:                                     [  OK  ]
[root@somewhere ~]#

And then make it start on boot…

[root@somewhere ~]# chkconfig dataeng on
[root@somewhere ~]# chkconfig --list dataeng
dataeng         0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@somewhere ~]#
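
Since the root cause was a service silently missing from the boot sequence, this is easy to turn into a check. Below is a minimal sketch that parses the chkconfig output shown above; the function name enabled_in_runlevel is made up for illustration, and it assumes the "N:on" / "N:off" column layout of chkconfig on CentOS 5.

```shell
# enabled_in_runlevel LEVEL: read one `chkconfig --list SERVICE` line on stdin
# and succeed only if the service is marked "on" for the given runlevel.
enabled_in_runlevel() {
    grep -q "[[:space:]]$1:on"
}
```

For example: chkconfig --list dataeng | enabled_in_runlevel 3 || echo "dataeng will not start on boot"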

Obviously, the utilities are now working properly.

[root@somewhere ~]# omreport chassis
Health

Main System Chassis

SEVERITY : COMPONENT
Ok       : Fans
Ok       : Intrusion
Ok       : Memory
Ok       : Power Supplies
Ok       : Power Management
Ok       : Processors
Ok       : Temperatures
Ok       : Voltages
Ok       : Hardware Log
Ok       : Batteries

For further help, type the command followed by -?

[root@somewhere ~]#

Written by xorl

August 6, 2012 at 15:05

Admin Mistakes: GNU, BSD TAR and POSIX Compatibility

Background
So, you’re writing a simple shell script to archive and move some files to another host. For the archives you’re using TAR command. Simple, isn’t it?

Problem
After a couple of days you need to search for something in one of the archives, but when you attempt to extract it you get errors similar to the following.

tar: Ignoring unknown extended header keyword `XXXXXXXXX'
tar: Ignoring unknown extended header keyword `XXXXXXXXX'
tar: Ignoring unknown extended header keyword `XXXXXXXXX'

And of course the data are not extracted properly.

Mistake
The files were archived on a Mac (Snow Leopard), which uses BSD TAR, and the destination host was Linux (which uses GNU TAR). As you might have guessed, there is an incompatibility between BSD and GNU TAR regarding the handling of vendor extended attributes. Specifically, BSD TAR writes them as extended header keywords (as defined in IEEE Std 1003.1-2001 (POSIX.1-2001)) while GNU TAR does not recognize them.

Resolution
There are a few different options to avoid this mistake. The best one is simply to use either BSD or GNU TAR on both ends, not a mix of the two. The other option is to use the “--format” option to choose a format that both systems handle. Here is the relevant documentation for BSD TAR:

     --format format
             (c, r, u mode only) Use the specified format for the created archive.  Supported formats
             include ``cpio'', ``pax'', ``shar'', and ``ustar''.  Other formats may also be supported; see
             libarchive-formats(5) for more information about currently-supported formats.  In r and u
             modes, when extending an existing archive, the format specified here must be compatible with
             the format of the existing archive on disk.

And for GNU TAR:

       --posix
              like --format=posix

       --format FORMAT
              selects output archive format
              v7 - Unix V7
              oldgnu - GNU tar <=1.12
              gnu - GNU tar 1.13
              ustar - POSIX.1-1988
              posix - POSIX.1-2001

Finally, you could use the “--pax-option” option of GNU TAR to delete these attributes. Here is its man page documentation:

       --pax-option KEYWORD-LIST
              used only with POSIX.1-2001 archives to modify the way tar
              handles extended header keywords

For example, if your warnings were like:

tar: Ignoring unknown extended header keyword `somefile.ino'
tar: Ignoring unknown extended header keyword `somefile.nlink'

You could use the option:

--pax-option="delete=somefile.ino,delete=somefile.nlink"

To delete them.
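
To make the “--format” approach concrete, here is a minimal sketch of an archiving helper that always writes ustar, a format both BSD and GNU TAR accept for this option according to the documentation quoted above. The function name make_portable_archive is made up for illustration. Keep in mind that ustar has no extended headers at all, so it is also more limited (for example in path length and file size) than the pax format.

```shell
# make_portable_archive ARCHIVE DIR: archive the contents of DIR in ustar
# format so that no vendor-specific extended header keywords are written.
make_portable_archive() {
    tar --format=ustar -cf "$1" -C "$2" .
}
```

For example: make_portable_archive /tmp/data.tar /path/to/data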

Written by xorl

May 15, 2012 at 16:58

Admin Mistakes: Apache Reload and Log Files

Background
So, you have a request to upload and configure a new website on some specific web server. The policy is to have a separate configuration file for each website (each new virtual host) under /etc/httpd/conf.d/ directory.

Problem
After you finish writing the configuration file (which was about 200 lines due to numerous special requirements), you run the following command to check that there is no syntax error.

# /etc/init.d/httpd configtest
Syntax OK

And then you reload Apache’s configuration…

# /etc/init.d/httpd reload
Reloading httpd:                                          [  OK  ]

However, when you check for running Apache processes you see that there are none.

# ps -C httpd
  PID TTY          TIME CMD
#

Now, let’s move to the next section to see what caused this problem.

Mistake
After having another look at the newly added configuration I noticed that the ‘ErrorLog’ directive was pointing to an invalid directory due to a typo. If Apache is not able to access the configured log files, it won’t start and this is what happened.

Resolution
Since each web server could host numerous websites and these were maintained by many different people, I wrote the following simple shell script that reports any missing log files.

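#!/bin/sh
# Report ErrorLog/CustomLog targets that do not exist on disk.

HTTPD_CONFS="/etc/httpd/conf.d/*.conf"
HTTPD_DIR="/etc/httpd"

# Relative log paths are resolved against the server root.
cd "$HTTPD_DIR" || exit 1

# Succeed if the given path is an existing regular file.
test_if_exists ()
{
	[ -f "$1" ]
}

# Store the log targets (second field) of all ErrorLog/CustomLog
# directives of the given file in LFILES. Indented directives are
# matched too and surrounding double quotes are stripped.
gimmie_the_dirs ()
{
	LFILES=$(awk '$1 == "ErrorLog" || $1 == "CustomLog" { gsub(/"/, "", $2); print $2 }' "$1" | tr '\n' ' ')
}

for i in $HTTPD_CONFS; do
	# Skip the literal pattern if no configuration file matched.
	[ -f "$i" ] || continue
	gimmie_the_dirs "$i"
	for j in $LFILES; do
		if ! test_if_exists "$j"; then
			echo "ERROR: $j does not exist"
		fi
	done
done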

This was later integrated in some shell scripts used for adding new websites and we never had this problem again.
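
Since the original failure was a mistyped directory rather than a missing file, a complementary check is whether the parent directory of each log target exists. Here is a minimal sketch, assuming log paths without spaces and treating piped logs (targets starting with "|") and syslog targets as out of scope; the function name check_log_dirs is made up for illustration.

```shell
# check_log_dirs CONF...: print an error for every ErrorLog/CustomLog target
# whose parent directory does not exist. Returns non-zero if any is missing.
# Assumption: relative paths are resolved against the current directory, so
# cd into the server root (e.g. /etc/httpd) before calling it.
check_log_dirs() {
    status=0
    for conf in "$@"; do
        [ -f "$conf" ] || continue
        # Take the second field of each directive and strip double quotes.
        for target in $(awk '$1 == "ErrorLog" || $1 == "CustomLog" { gsub(/"/, "", $2); print $2 }' "$conf"); do
            case "$target" in
                "|"*|syslog*) continue ;;   # piped or syslog targets: nothing to check on disk
            esac
            dir=$(dirname "$target")
            if [ ! -d "$dir" ]; then
                echo "ERROR: $conf: directory $dir does not exist"
                status=1
            fi
        done
    done
    return $status
}
```

For example: cd /etc/httpd && check_log_dirs conf.d/*.conf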

Written by xorl

December 9, 2011 at 09:31

Admin Mistakes: Solaris killall

So, with this post I’m introducing another new category named mistakes, where I’ll be posting some mistakes I have made that will hopefully help other sysadmins avoid them.

Background
It is late afternoon and you have about 15 SSH sessions open on various servers. The operating systems include Linux, AIX, Solaris and a couple of BSD derivatives. While running a benchmark you realize that the benchmark application is probably stuck in an infinite loop and is eating up all of the system’s memory (which was not expected). So, you try to kill it…

killall bench_application

Since this was not a production system we didn’t really care, and as we all know ‘kill’ can take some time to terminate an application depending on its signal handler. So, after running the command you go back to the other urgent tasks you were doing on the other servers.
Then you start getting alert emails about the memory usage on this server, so you switch back to the SSH session and accidentally type…

killall

Instead of:

killall -s KILL bench_application

But then you realize that you’re on a Solaris server (not on the Linux you thought you were)! And more specifically, on a development server where software developers have shell access for testing and building.

Problem
On Solaris, ‘killall’ is used to terminate all active processes. This means that all users got disconnected and every running build process was terminated.

Mistake
So, of course the mistake was that I was not paying enough attention to the SSH sessions I had open.

Resolution
Just think at least twice before you hit that return key. :P
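
One way to take the platform difference out of the picture is to stop typing killall altogether and use pkill, which ships with both Solaris and Linux and only signals processes matching a name. Below is a minimal sketch of a wrapper that refuses to run without both a signal and a process name; the function name kill_by_name is made up for illustration.

```shell
# kill_by_name SIGNAL PROCESS_NAME: send SIGNAL to processes whose name is
# exactly PROCESS_NAME. Refuses to do anything unless both arguments are given.
kill_by_name() {
    if [ "$#" -ne 2 ] || [ -z "$2" ]; then
        echo "usage: kill_by_name SIGNAL PROCESS_NAME" >&2
        return 2
    fi
    # -x: match the whole process name, not a substring of it.
    pkill -"$1" -x "$2"
}
```

For example: kill_by_name KILL bench_application. Note that on Linux pkill matches against the process name, which is truncated to 15 characters, so a long name like this one may need pkill’s -f option (match the full command line) instead.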

Written by xorl

October 19, 2011 at 09:27
