xorl %eax, %eax

Linux kernel clone(2)/execve(2) Userspace Memory Corruption


Two days ago I read this email on oss-security by Eugene Teo, and it is definitely an interesting bug. It is a design flaw in the way a clone(2) option is carried across execve(2). As you can read in the clone(2) man page, the system call supports the following flags:

CLONE_CHILD_SETTID (since Linux 2.5.49)
     Store child thread ID at location child_tidptr in child memory.

CLONE_CHILD_CLEARTID (since Linux 2.5.49)
     Erase child thread ID at location child_tidptr in child memory 
     when the child exits, and do a wakeup on the
     futex  at that address.  The address involved may be changed by 
     the set_tid_address(2) system call. This is used by threading 
     libraries.
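To see what CLONE_CHILD_CLEARTID does from userspace, here is a minimal sketch, assuming a Linux system with glibc's clone() wrapper and a stack that grows downwards. The names tid_word, child_fn and cleartid_demo are made up for this illustration. CLONE_VM is used so that parent and child share memory, which lets the parent observe the kernel's write.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Word the kernel is asked to zero when the child exits (hypothetical name). */
static volatile pid_t tid_word;

/* Child body: do nothing and exit straight away. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: returns the value of tid_word after the child
 * has been reaped, or -1 if something failed along the way.
 */
int cleartid_demo(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack;
    pid_t pid;

    assert(stack_size % 16 == 0);   /* keep the stack top aligned */
    stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    tid_word = 12345;               /* sentinel, so the zeroing is visible */

    /*
     * Assumes a downward-growing stack, hence stack + stack_size.
     * The last argument is child_tidptr, which the kernel stores as
     * clear_child_tid because CLONE_CHILD_CLEARTID is set.
     */
    pid = clone(child_fn, stack + stack_size,
                CLONE_VM | CLONE_CHILD_CLEARTID | SIGCHLD,
                NULL, NULL, NULL, (pid_t *)&tid_word);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, NULL, 0) != pid) {
        free(stack);
        return -1;
    }

    free(stack);
    return (int)tid_word;
}
```

If everything works as the man page describes, cleartid_demo() should come back with 0, since the kernel zeroes the word before the parent can reap the child. The futex wakeup is not observed here because waitpid(2) is used for synchronization instead.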

Now, both flags are handled in kernel/fork.c by the copy_process() routine like this…

/*
 * This creates a new process as a copy of the old one,
 * but does not actually start it yet.
 *
 * It copies the registers, and all the appropriate
 * parts of the process environment (as per the clone
 * flags). The actual kick-off is left to the caller.
 */
static struct task_struct *copy_process(unsigned long clone_flags,
                                        unsigned long stack_start,
                                        struct pt_regs *regs,
                                        unsigned long stack_size,
                                        int __user *child_tidptr,
                                        struct pid *pid,
                                        int trace)
{
        int retval;
        struct task_struct *p;
        int cgroup_callbacks_done = 0;

   ... Skipping LOTS of lines of code ...

        p->tgid = p->pid;
        if (clone_flags & CLONE_THREAD)
                p->tgid = current->tgid;
   ...
        p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
        /*
         * Clear TID on mm_release()?
         */
        p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr: NULL;
   ...
        return ERR_PTR(retval);
}

So, if CLONE_CHILD_SETTID is set then set_child_tid is set to the user controlled child_tidptr, and if CLONE_CHILD_CLEARTID is set then so is clear_child_tid. The kernel doesn’t reset this pointer, as you can see here. During the execve(2) of the new process, the kernel replaces the process’ memory with the new image, but current->clear_child_tid survives and still holds the user controlled child_tidptr, which now refers to an address inside the new program’s memory!
As Michael K. Johnson of rPath pointed out, execve(2) does not reset the clear_child_tid pointer for SUID applications either! In addition, Eugene Teo gave an outline of the events needed to trigger this bug, which you can find here. To fix this bug, mm_release() was updated like this:

 	 * trouble otherwise.  Userland only wants this done for a sys_exit.
 	 */
-	if (tsk->clear_child_tid
-	    && !(tsk->flags & PF_SIGNALED)
-	    && atomic_read(&mm->mm_users) > 1) {
-		u32 __user * tidptr = tsk->clear_child_tid;
+	if (tsk->clear_child_tid) {
+		if (!(tsk->flags & PF_SIGNALED) &&
+		    atomic_read(&mm->mm_users) > 1) {
+			/*
+			 * We don't check the error code - if userspace has
+			 * not set up a proper pointer then tough luck.
+			 */
+			put_user(0, tsk->clear_child_tid);
+			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
+					1, NULL, NULL, 0);
+		}
 		tsk->clear_child_tid = NULL;
-
-		/*
-		 * We don't check the error code - if userspace has
-		 * not set up a proper pointer then tough luck.
-		 */
-		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
 	}

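As you can see, it still writes 0 to the address in clear_child_tid and wakes up its futex(2) only under the original conditions, but clear_child_tid itself is now always reset to NULL, so a stale pointer can no longer survive execve(2). Really interesting vulnerability. :)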
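The difference between the two versions is easier to see in a toy userspace model. This is only a sketch and not kernel code: struct toy_task, toy_clear_and_wake() and the two toy_mm_release_*() functions are made-up names, PF_SIGNALED is given a stand-in value, and the futex wakeup is left out.

```c
#include <assert.h>
#include <stddef.h>

#define PF_SIGNALED 0x1u    /* stand-in value, not the kernel's */

/* Toy stand-in for the few task_struct/mm_struct fields involved. */
struct toy_task {
    int *clear_child_tid;   /* models tsk->clear_child_tid */
    unsigned int flags;     /* models tsk->flags */
    int mm_users;           /* models atomic_read(&mm->mm_users) */
};

/* Models put_user(0, ...); the FUTEX_WAKE call is omitted. */
static void toy_clear_and_wake(int *tidptr)
{
    assert(tidptr != NULL);
    *tidptr = 0;
}

/* Old logic: the pointer is reset only when the write happens. */
void toy_mm_release_old(struct toy_task *tsk)
{
    if (tsk->clear_child_tid
        && !(tsk->flags & PF_SIGNALED)
        && tsk->mm_users > 1) {
        int *tidptr = tsk->clear_child_tid;

        tsk->clear_child_tid = NULL;
        toy_clear_and_wake(tidptr);
    }
}

/* Fixed logic: the pointer is reset whenever it was set. */
void toy_mm_release_fixed(struct toy_task *tsk)
{
    if (tsk->clear_child_tid) {
        if (!(tsk->flags & PF_SIGNALED) && tsk->mm_users > 1)
            toy_clear_and_wake(tsk->clear_child_tid);
        tsk->clear_child_tid = NULL;
    }
}
```

With a task that is the only user of its mm (mm_users == 1), the old function leaves clear_child_tid untouched, which models the stale pointer outliving the old memory image. The fixed function resets it to NULL in that same case and still performs the write when mm_users is greater than 1.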

Written by xorl

August 6, 2009 at 13:41

Posted in bugs, linux
