xorl %eax, %eax

Linux kernel vm86(2) GS Register Pointer Dereference


This cool bug was reported by Lubomir Rintel and it affects at least kernels 2.6.24 through 2.6.30 inclusive. However, as L. Rintel states, it only affects kernels built with both the CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR options enabled. The vm86(2) system call internally invokes do_sys_vm86(), which can be found in arch/x86/kernel/vm86_32.c. sys_vm86() initializes a kernel_vm86_struct with the registers for virtual 8086 mode and then hands it to the following code (taken from 2.6.30):

static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk)
{
        struct tss_struct *tss;
/*
 * make sure the vm86() system call doesn't try to do anything silly
 */
        info->regs.pt.ds = 0;
        info->regs.pt.es = 0;
        info->regs.pt.fs = 0;

/* we are clearing gs later just before "jmp resume_userspace",
 * because it is not saved/restored.
 */

/*
 * The flags register is also special: we cannot trust that the user
 * has set it up safely, so this makes sure interrupt etc flags are
 * inherited from protected mode.
 */
        VEFLAGS = info->regs.pt.flags;
        info->regs.pt.flags &= SAFE_MASK;
        info->regs.pt.flags |= info->regs32->flags & ~SAFE_MASK;
        info->regs.pt.flags |= X86_VM_MASK;

        switch (info->cpu_type) {
     ...
/*
 * Save old state, set default return value (%ax) to 0
 */
        info->regs32->ax = 0;
     ...
        __asm__ __volatile__(
                "movl %0,%%esp\n\t"
                "movl %1,%%ebp\n\t"
                "mov  %2, %%gs\n\t"
                "jmp resume_userspace"
                : /* no outputs */
                :"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));
        /* we never return here */
}
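
The flag handling in the listing above can be tried out in user space. What follows is only a minimal sketch: the function name sanitize_vm86_flags is hypothetical, and the SAFE_MASK and X86_VM_MASK values (0xDD5 and 0x00020000) are assumptions written down from memory of the 2.6.30-era sources, so check them against your own tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, not copied from a specific kernel tree. */
#define SAFE_MASK   0xDD5u        /* flag bits the vm86 caller may control */
#define X86_VM_MASK 0x00020000u   /* EFLAGS.VM, virtual 8086 mode */

/*
 * Hypothetical user-space model of the three flag statements in
 * do_sys_vm86(): user_flags stands for info->regs.pt.flags and
 * pm_flags stands for info->regs32->flags.
 */
static uint32_t sanitize_vm86_flags(uint32_t user_flags, uint32_t pm_flags)
{
        uint32_t flags = user_flags;

        flags &= SAFE_MASK;             /* keep only the bits the user may set */
        flags |= pm_flags & ~SAFE_MASK; /* inherit the rest from protected mode */
        flags |= X86_VM_MASK;           /* force virtual 8086 mode */
        return flags;
}
```

Under these assumed values, a caller that passes all ones while the protected-mode flags are 0x202 ends up with 0x00020fd7: the interrupt flag (0x200) comes from protected mode and cannot be changed by the caller.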

First, it clears the saved DS, ES and FS segment registers to avoid any access to invalid locations, and a comment notes that GS is cleared later, just before the jump to resume_userspace, since it isn't saved/restored. It then sanitizes the flags passed by the user: only the bits in SAFE_MASK are kept, the remaining bits are inherited from the protected mode flags in regs32, and the VM flag is set. Next, it enters a switch statement that sets the appropriate options depending on the CPU type. When finished, it sets AX (the return value) to zero and then loads ESP with the address of info->regs, EBP with the task's thread_info pointer and GS with 0. At last, it jumps to resume_userspace, which can be found in arch/x86/kernel/entry_32.S.

ENTRY(resume_userspace)
        LOCKDEP_SYS_EXIT
        DISABLE_INTERRUPTS(CLBR_ANY)    # make sure we don't miss an interrupt
                                        # setting need_resched or sigpending
                                        # between sampling and the iret
        TRACE_IRQS_OFF
        movl TI_flags(%ebp), %ecx
        andl $_TIF_WORK_MASK, %ecx      # is there any work to be done on
                                        # int/exception return?
        jne work_pending
        jmp restore_all
END(ret_from_exception)

As you can see, if CONFIG_LOCKDEP is enabled, LOCKDEP_SYS_EXIT invokes lockdep_sys_exit() to check for locks that are still held before returning to user space. Now, on 32-bit x86 the Linux kernel normally leaves the GS register unused, but the stack protector (CONFIG_CC_STACKPROTECTOR) uses it to select the segment that holds the stack canary. Because of this, when execution reaches lockdep_sys_exit(), the canary is read through a GS register that contains an invalid value (zero instead of the canary segment selector). To fix this, the following patch was applied:

@@ -287,10 +287,9 @@  static void do_sys_vm86(struct kernel_vm
 	info->regs.pt.ds = 0;
 	info->regs.pt.es = 0;
 	info->regs.pt.fs = 0;
-
-/* we are clearing gs later just before "jmp resume_userspace",
- * because it is not saved/restored.
- */
+#ifndef CONFIG_X86_32_LAZY_GS
+	info->regs.pt.gs = 0;
+#endif
 
 /*
  * The flags register is also special: we cannot trust that the user
@@ -343,7 +342,9 @@  static void do_sys_vm86(struct kernel_vm
 	__asm__ __volatile__(
 		"movl %0,%%esp\n\t"
 		"movl %1,%%ebp\n\t"
+#ifdef CONFIG_X86_32_LAZY_GS
 		"mov  %2, %%gs\n\t"
+#endif
 		"jmp resume_userspace"
 		: /* no outputs */
 		:"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));

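Now, if CONFIG_X86_32_LAZY_GS is enabled (meaning no stack protector), GS is still set to zero right before the jump, as before. Otherwise, the zero is written to the saved gs slot (info->regs.pt.gs) instead, so the live GS register is left alone and the stack protector keeps working while resume_userspace runs.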
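
To make the effect of the patch easier to follow, here is a small user-space model of the three cases: unpatched with stack protector, patched with stack protector, and patched with lazy GS. It is a toy sketch and not kernel code. Every name in it (model_state, model_enter_vm86, model_exit_path_survives, the MODEL_* constants) is hypothetical, and it assumes the only thing that matters is whether GS is nonzero when stack-protected C code runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selector values, for illustration only. */
#define MODEL_KERNEL_GS 0x00e0u  /* stand-in for the canary segment selector */
#define MODEL_USER_GS   0x0033u  /* stand-in for whatever user space had in GS */

struct model_state {
        uint16_t live_gs;   /* GS register while still executing in the kernel */
        uint16_t saved_gs;  /* gs slot of the saved frame (info->regs.pt.gs) */
};

/*
 * Models the end of do_sys_vm86(). lazy_gs stands for
 * CONFIG_X86_32_LAZY_GS, patched for whether the fix is applied.
 */
static struct model_state model_enter_vm86(bool patched, bool lazy_gs)
{
        struct model_state s;

        /* With lazy GS the kernel never loads its own selector into GS. */
        s.live_gs = lazy_gs ? MODEL_USER_GS : MODEL_KERNEL_GS;
        s.saved_gs = MODEL_USER_GS;

        if (!patched || lazy_gs)
                s.live_gs = 0;   /* "mov %2, %%gs" right before the jump */
        else
                s.saved_gs = 0;  /* info->regs.pt.gs = 0 */
        return s;
}

/*
 * Models lockdep_sys_exit(): code built with the stack protector needs
 * a usable GS to fetch the canary, other code does not care.
 */
static bool model_exit_path_survives(struct model_state s, bool stackprotector)
{
        return !stackprotector || s.live_gs != 0;
}
```

In this model only the unpatched kernel with the stack protector fails the check. The patched variant with the stack protector keeps the live GS and zeroes the saved slot, and the lazy GS variant still zeroes the live register but has no canary read to trip over.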

Written by xorl

July 15, 2009 at 23:07

Posted in bugs, linux
