xorl %eax, %eax

Archive for the ‘linux’ Category

Linux kernel 0days without code auditing

leave a comment »

The recent post of grsecurity “The Life of a Bad Security Fix” was the inspiration for this blog post which is somehow common sense among the community, but it’s not explicitly mentioned a lot outside of that.

I remember the first time I discovered a 0day privilege escalation in the Linux kernel many years ago. My process back then was fairly trivial and partially driven by my fear of this monster codebase called the upstream kernel. Here is what I did back then.

  1. Find a relatively simple kernel module that comes with the upstream kernel
  2. Dissect and learn every bit and piece of it
  3. Go through each function that involved user data and look for mistakes
  4. Test my hypothesis
  5. If valid, keep it aside to start working on an exploit for it

Later on, I found that you can only focus on the interesting code paths. The ones involving specific functions, logic, or interaction with user-provided or user-derived data. But all of that involves some sort of code auditing, right? Can we do it without it?

I have many such examples in my blog here going all the way back to 2006. Just like in the grsecurity’s blog post you can just read the ChangeLog or kernel commits. Magic 0days are hidden in there. ;)

ChangeLogs and commits are great and only require you to validate whether this is a vulnerability or a bug. The hard part of the code auditing and debugging has already been done, and in some cases reporters even include PoC code to reproduce the issue which gives you a starting point for your exploit development.

Another approach is going through the discoveries of syzkaller kernel fuzzer. Again, you usually have the trigger code and a pretty complete debugging output. This means, you know there is something interesting going on there, you just have to trace it down, and find a way to exploit it.

Lastly, you can keep an eye on malware sandboxes for Linux samples for potential 0days used in the wild. It is extremely rare that those samples will include Linux kernel 0days but it is also a very low effort task from our side to have some automated monitoring for those like a couple of YARA rules on VirusTotal API.

I hate to say it, but it’s true, nowadays you can build a decent stash of 0days for the Linux kernel with way less, in some cases zero, manual code auditing. I still enjoy going through the Linux kernel but if this is your job, then you might as well take advantage of work that has already been done by others.

Written by xorl

January 28, 2020 at 11:39

Posted in linux

CVE-2018-7273: Linux kernel floppy information leak

leave a comment »

This was a simple information leak vulnerability in the floppy driver that was reported by Brian Belleville to the Linux kernel. Technically, an attacker could trigger show_floppy() calls in the floppy driver to reveal sensitive kernel addresses (such as global variable pointers and function addresses) which can be used to identify the required offsets and consequently, bypass protections like KASLR. You can see drivers/block/floppy.c below.

static void show_floppy(void)
{
    int i;

    pr_info("\n");
    pr_info("floppy driver state\n");
   ...
    if (delayed_work_pending(&fd_timer))
        pr_info("delayed work.function=%p expires=%ld\n",
               fd_timer.work.func,
               fd_timer.timer.expires - jiffies);
    if (delayed_work_pending(&fd_timeout))
        pr_info("timer_function=%p expires=%ld\n",
               fd_timeout.work.func,
               fd_timeout.timer.expires - jiffies);
    pr_info("cont=%p\n", cont);
   ... 
    pr_info("\n");
}

As you can see above, there are a few instances where pr_info() functions are called passing the “%p” format specifier for some function pointers. The issue is that this leaks the address of those function pointers which an attacker can use to calculate the offsets of other, more important pointers, that can later be used to effectively bypass protections like KASLR.

One of the extended format specifiers is the “%fp” which stands for “function pointer” and it ensures that it will not leak a function pointer address which can be used to bypass protections such as KASLR. This is exactly what the patch was to fix this vulnerability.

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index eae484a..e29d417 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -1819,15 +1819,14 @@ static void show_floppy(void)
    if (work_pending(&floppy_work))
        pr_info("floppy_work.func=%pf\n", floppy_work.func);
    if (delayed_work_pending(&fd_timer))
-       pr_info("delayed work.function=%p expires=%ld\n",
+       pr_info("delayed work.function=%pf expires=%ld\n",
               fd_timer.work.func,
               fd_timer.timer.expires - jiffies);
    if (delayed_work_pending(&fd_timeout))
-       pr_info("timer_function=%p expires=%ld\n",
+       pr_info("timer_function=%pf expires=%ld\n",
               fd_timeout.work.func,
               fd_timeout.timer.expires - jiffies);

-   pr_info("cont=%p\n", cont);
    pr_info("current_req=%p\n", current_req);
    pr_info("command_status=%d\n", command_status);
    pr_info("\n");

Written by xorl

March 18, 2018 at 16:54

Posted in linux

LKM loading kernel restrictions

with one comment

Loading arbitrary kernel modules dynamically has always been a gray area between usability oriented and security oriented Linux developers & users. In this post I will present what options are available today from the Linux kernel and the most popular kernel hardening patch, the grsecurity. Those will give you some ideas on how those projects deal with the threat of Linux kernel’s LKMs (Loadable Kernel Modules).

Threat
This can be split to two main categories, allowing dynamic LKM loading introduces the following two threats:

  • Malicious LKMs. That’s more or less rootkits or similar malware that an adversary can load for various operations, most commonly to hide specific activities from the user-space.
  • Vulnerable LKM loading. Imagine that you have a 0day exploit on a specific network driver but this is not loaded by default. If you can trigger a dynamic loading then you can use your code to exploit it and compromise the system. This is what this vector is about.

Linux kernel and KSPP
The KSPP (Kernel Self-Protection Project) of the Linux kernel tried to fix this issue with the introduction of the kernel modules access restriction. Below you can see the exact description that Linux kernel’s documentation has for this restriction.

Restricting access to kernel modules

The kernel should never allow an unprivileged user the ability to load specific
kernel modules, since that would provide a facility to unexpectedly extend the
available attack surface. (The on-demand loading of modules via their predefined
subsystems, e.g. MODULE_ALIAS_*, is considered “expected” here, though additional
consideration should be given even to these.) For example, loading a filesystem
module via an unprivileged socket API is nonsense: only the root or physically
local user should trigger filesystem module loading. (And even this can be up
for debate in some scenarios.)

To protect against even privileged users, systems may need to either disable
module loading entirely (e.g. monolithic kernel builds or modules_disabled
sysctl), or provide signed modules (e.g. CONFIG_MODULE_SIG_FORCE, or dm-crypt
with LoadPin), to keep from having root load arbitrary kernel code via the
module loader interface.

The most restrictive way is via modules_disabled sysctl variable which is available by default on the Linux kernel. This can either be set dynamically as you see here.

sysctl -w kernel.modules_disabled=1

Or permanently as part of the runtime kernel configuration as you can see here.

echo 'kernel.modules_disabled=1' >> /etc/sysctl.d/99-custom.conf

In both cases, the result is the same. Basically, the above change its default value from “0” to “1”. You can find the exact definition of this variable in kernel/sysctl.c.

kernel/sysctl.c
    {
        .procname    = "modules_disabled",
        .data        = &modules_disabled,
        .maxlen        = sizeof(int),
        .mode        = 0644,
        /* only handle a transition from default "0" to "1" */
        .proc_handler    = proc_dointvec_minmax,
        .extra1        = &one,
        .extra2        = &one,
    },

If we look into kernel/module.c we will see that if modules_disabled has a non-zero value it is not allowing LKM loading (may_init_module()) or even unloading (delete_module() system call) of any LKM. Below you can see the module initialization code that requires both the SYS_MODULE POSIX capability, and modules_disabled to be zero.

static int may_init_module(void)
{
    if (!capable(CAP_SYS_MODULE) || modules_disabled)
        return -EPERM;

    return 0;
}

Similarly, the delete_module() system call has the exact same check as shown below. In both cases, failure of those will result in a Permission Denied (EPERM) error.

SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
        unsigned int, flags)
{
    struct module *mod;
    char name[MODULE_NAME_LEN];
    int ret, forced = 0;

    if (!capable(CAP_SYS_MODULE) || modules_disabled)
        return -EPERM;

Looking in kernel/kmod.c we can also see another check, before the kernel module loading request is passed to call_modprobe() to get loaded in the kernel, the __request_module() function verifies that modprobe_path is set, meaning the LKM is not loaded via an API or socket instead of /sbin/modprobe command.

int __request_module(bool wait, const char *fmt, ...)
{
    va_list args;
    char module_name[MODULE_NAME_LEN];
    int ret;

    /*
     * We don't allow synchronous module loading from async.  Module
     * init may invoke async_synchronize_full() which will end up
     * waiting for this task which already is waiting for the module
     * loading to complete, leading to a deadlock.
     */
    WARN_ON_ONCE(wait && current_is_async());

    if (!modprobe_path[0])
        return 0;

The above were the features that Linux kernel had for years to protect against this threat. The downside though is that completely disabling loading and unloading of LKMs can break some legitimate operations such as system upgrades, reboots on systems that load modules after boot, automation configuring software RAID devices after boot, etc.

To deal with the above, on 22 May 2017 the KSPP team proposed a patch to __request_module() (still to be added to the kernel) which follows a different approach.

int __request_module(bool wait, int allow_cap, const char *fmt, ...)
{
    ...

    if (!modprobe_path[0])
        return 0;

    ...

    ret = security_kernel_module_request(module_name, allow_cap);
    if (ret)

What you see here is that in the very early stage of the kernel module loading security_kernel_module_request() is invoked with the module to be loaded as well as allow_cap variable which can be set to either “0” or “1”. If its value is positive, the security subsystem will trust the caller to load modules with specific predifned (hardcoded) aliases. This should allow auto-loading of specific aliases. This was done to close a design flaw of the Linux kernel where although all modules required the CAP_SYS_MODULE capability to load modules (which is already checked as shown earlier), the network modules required the CAP_NET_ADMIN capability which completely bypassed the previously described controls. Using this modified __request_module() it is ensured that only specific modules that are allowed by the security subsystem will be able to auto-load. However, it is also crucial to note that to this date, the only security subsystem that utilizes security_kernel_module_request() hook is the SELinux.

Before we move on with grsecurity, it is important to note that in 07 November 2010 Dan Rosenberg proposed a replacement of modules_disabled, the modules_restrict which was a copy of grsecurity’s logic. It had three values, 0 (disabled), 1 (only root can load/unload LKMs), 2 (no one can load/unload – same as modules_disabled). You can see the check that it was adding to __request_module() below.

	/* Can non-root users cause auto-loading of modules? */
	if (current_uid() && modules_restrict)
		return -EPERM;

However, this was never added to the upstream kernel so there is no need to dive more into the details behind it. Just as an overview, here is the proposed kernel configuration option documentation for modules_restrict.

modules_restrict:
 
A toggle value indicating if modules are allowed to be loaded
in an otherwise modular kernel.  This toggle defaults to off
(0), but can be set true (1).  Once true, modules can be
neither loaded nor unloaded, and the toggle cannot be set back
to false.
A value indicating if module loading is restricted in an 
otherwise modular kernel.  This value defaults to off (0), 
but can be set to (1) or (2).  If set to (1), modules cannot 
be auto-loaded by non-root users, for example by creating a 
socket using a packet family that is compiled as a module and 
not already loaded.  If set to (2), modules can neither be 
loaded nor unloaded, and the value can no longer be changed.

grsecurity
Unfortunately, grsecurity stable patches are no longer publicly available. For this reason, in this article I will be using the grsecurity patch for kernel releases 3.1 to 4.9.24. For the LKM loading hardening grsecurity offers a kernel configuration option known as MODHARDEN (Harden Module Auto-loading). If we go back to ___request_module() in kernel/kmod.c we will see how this feature works.

    ret = security_kernel_module_request(module_name);
    if (ret)
        return ret;

#ifdef CONFIG_GRKERNSEC_MODHARDEN
    if (uid_eq(current_uid(), GLOBAL_ROOT_UID)) {
        /* hack to workaround consolekit/udisks stupidity */
        read_lock(&tasklist_lock);
        if (!strcmp(current->comm, "mount") &&
            current->real_parent && !strncmp(current->real_parent->comm, "udisk", 5)) {
            read_unlock(&tasklist_lock);
            printk(KERN_ALERT "grsec: denied attempt to auto-load fs module %.64s by udisks\n", module_name);
            return -EPERM;
        }
        read_unlock(&tasklist_lock);
    }
#endif

The check in this case is relatively simple, it verifies that the caller’s UID is the same as the static global UID of root user. This ensure that only users with UID=0 can load kernel modules which completely eliminates the cases of unprivileged users exploiting flaws that are allowing them to request kernel module loading. To overcome the network kernel modules issue grsecurity followed a different approach which maintains the capability check (which is currently used by a very limited amount of security subsystems) but redirects all loading to the ___request_module() function to ensure that only root can load them.

void dev_load(struct net *net, const char *name)
{
    ...

    no_module = !dev;
    if (no_module && capable(CAP_NET_ADMIN))
        no_module = request_module("netdev-%s", name);
    if (no_module && capable(CAP_SYS_MODULE)) {
#ifdef CONFIG_GRKERNSEC_MODHARDEN
        ___request_module(true, "grsec_modharden_netdev", "%s", name);
#else
        request_module("%s", name);
#endif
}

Furthermore, grsecurity identified that a similar security design flaw also exists in the filesystem modules loading (still to be identified and fixed in the upstream kernel), which was fixed in a similar manner. Below is the grsecurity version of fs/filesystems.c’s get_fs_type() function which is ensuring that filesystem modules are loaded only by root user.

    fs = __get_fs_type(name, len);
#ifdef CONFIG_GRKERNSEC_MODHARDEN
    if (!fs && (___request_module(true, "grsec_modharden_fs", "fs-%.*s", len, name) == 0))
#else
    if (!fs && (request_module("fs-%.*s", len, name) == 0))
#endif
fs = __get_fs_type(name, len);

This Linux kernel design flaw allows loading of non-filesystem kernel modules via mount. How grsecurity detects those is quite clever and can be found in simplify_symbols() function of kernel/module.c. What it does is ensuring that the the arguments of the module are copied to the kernel side, and then checks the module’s loading information in the symbol table to ensure that the loaded module is trying to register a filesystem instead of any arbitrary kernel module.

#ifdef CONFIG_GRKERNSEC_MODHARDEN
    int is_fs_load = 0;
    int register_filesystem_found = 0;
    char *p;

    p = strstr(mod->args, "grsec_modharden_fs");
    if (p) {
        char *endptr = p + sizeof("grsec_modharden_fs") - 1;
        /* copy \0 as well */
        memmove(p, endptr, strlen(mod->args) - (unsigned int)(endptr - mod->args) + 1);
        is_fs_load = 1;
    }
#endif

    for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
        const char *name = info->strtab + sym[i].st_name;

#ifdef CONFIG_GRKERNSEC_MODHARDEN
        /* it's a real shame this will never get ripped and copied
           upstream! ;(
        */
        if (is_fs_load && !strcmp(name, "register_filesystem"))
            register_filesystem_found = 1;
#endif

switch (sym[i].st_shndx) {

To help in detection of malicious users trying to exploit this Linux kernel design flaw, grsecurity has also an alerting mechanism in place which immediately logs any attempts to load Linux kernel modules that are not filesystems using this design flaw. Meaning loading a kernel module via “mount” without that being an actual filesystem module.

#ifdef CONFIG_GRKERNSEC_MODHARDEN
    if (is_fs_load && !register_filesystem_found) {
        printk(KERN_ALERT "grsec: Denied attempt to load non-fs module %.64s through mount\n", mod->name);
        ret = -EPERM;
    }
#endif

    return ret;
}

Security wise grsecurity’s approach is by far the most complete but unfortunately, by default the Linux kernel doesn’t have anything even close to this.

Written by xorl

February 17, 2018 at 15:07

Posted in linux

CVE-2013-3228: Linux kernel IrDA Information Leak

with 3 comments

This is another simple kernel memory information leak fixed by Mathias Krauss. Here is the exact code where this bug is located in net/irda/af_irda.c code.

/*
 * Function irda_recvmsg_dgram (iocb, sock, msg, size, flags)
 *
 *    Try to receive message and copy it to user. The frame is discarded
 *    after being read, regardless of how much the user actually read
 */
static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
			      struct msghdr *msg, size_t size, int flags)
{
	struct sock *sk = sock->sk;
	struct irda_sock *self = irda_sk(sk);
	struct sk_buff *skb;
	size_t copied;
	int err;

	IRDA_DEBUG(4, "%s()\n", __func__);

	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
				flags & MSG_DONTWAIT, &err);
   ...
	skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);

	skb_free_datagram(sk, skb);
   ...
	return copied;
}

This is a command which is defined as shown below.

static const struct proto_ops irda_seqpacket_ops = {
  ...
	.recvmsg =	irda_recvmsg_dgram,
  ...
};

static const struct proto_ops irda_dgram_ops = {
  ...
	.recvmsg =	irda_recvmsg_dgram,
  ...
};

#ifdef CONFIG_IRDA_ULTRA
static const struct proto_ops irda_ultra_ops = {
  ...
	.recvmsg =	irda_recvmsg_dgram,
  ...
};
#endif /* CONFIG_IRDA_ULTRA */

As Mathias Krausse pointed out, the ‘msg_namelen’ member of the ‘msghdr’ structure remains uninitialized resulting in kernel information leak. Below is how this structure is defined in include/linux/socket.h header file.

/*
 *      As we do 4.4BSD message passing we use a 4.4BSD message passing
 *      system, not 4.3. Thus msg_accrights(len) are now missing. They
 *      belong in an obscure libc emulation or the bin.
 */
  
struct msghdr {
        void    *       msg_name;       /* Socket name                  */
        int             msg_namelen;    /* Length of name               */
        struct iovec *  msg_iov;        /* Data blocks                  */
        __kernel_size_t msg_iovlen;     /* Number of blocks             */
        void    *       msg_control;    /* Per protocol magic (eg BSD file descriptor passing) */
       __kernel_size_t msg_controllen; /* Length of cmsg list */
       unsigned int    msg_flags;
};

And the fix was to add the missing initialization.

 	IRDA_DEBUG(4, "%s()\n", __func__);
 
+	msg->msg_namelen = 0;
+
 	skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
 				flags & MSG_DONTWAIT, &err);

Written by xorl

May 26, 2013 at 14:18

Posted in linux, vulnerabilities

CVE-2013-1798: Linux kernel KVM IOAPIC_REG_SELECT Invalid Memory Access

leave a comment »

This was very nice vulnerability reported by Andrew Honig of Google. The bug is triggered when a user specifies an invalid IOAPIC_REG_SELECT value which is reachable via read KVM I/O device operation as you can see below.

static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
			    void *val)
{
	struct kvm_ioapic *ioapic = to_ioapic(this);
	u32 result;
   ...
	switch (addr) {
	case IOAPIC_REG_SELECT:
		result = ioapic->ioregsel;
		break;

	case IOAPIC_REG_WINDOW:
		result = ioapic_read_indirect(ioapic, addr, len);
		break;
   ...
	return 0;
}
   ...
static const struct kvm_io_device_ops ioapic_mmio_ops = {
	.read     = ioapic_mmio_read,
	.write    = ioapic_mmio_write,
};

Additionally, if a user makes a read by invoking IOAPIC_REG_WINDOW it will result in calling ioapic_read_indirect(). Here is what this function does.

static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
					  unsigned long addr,
					  unsigned long length)
{
	unsigned long result = 0;

	switch (ioapic->ioregsel) {
  ...
	default:
		{
			u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
			u64 redir_content;

			ASSERT(redir_index < IOAPIC_NUM_PINS);

			redir_content = ioapic->redirtbl[redir_index].bits;
			result = (ioapic->ioregsel & 0x1) ?
			    (redir_content >> 32) & 0xffffffff :
			    redir_content & 0xffffffff;
			break;
		}
	}

	return result;
}

It calculates and initializes the value of ‘redir_index’ from the user controlled ‘ioapic->ioregsel’ variable and then uses it as an index to ‘ioapic->redirtbl[]’ array. If this value is larger than IOAPIC_NUM_PINS it will result in invalid memory access. Here is how IOAPIC_NUM_PINS is defined in virt/kvm/ioapic.h header file.

#define IOAPIC_NUM_PINS  KVM_IOAPIC_NUM_PINS

And this is because it is architecture specific. For IA64 is defined in include/uapi/asm/kvm.h as 48 and for x86 in arch/x86/include/uapi/asm/kvm.h as 24. As you might have noticed there is an ASSERT() call to make this check but of course, this will only take effect in the debug builds.
The fix was to replace that ASSERT() call with a range check like this.

 			u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
 			u64 redir_content;
-			ASSERT(redir_index < IOAPIC_NUM_PINS);
+			if (redir_index < IOAPIC_NUM_PINS)
+				redir_content =
+					ioapic->redirtbl[redir_index].bits;
+			else
+				redir_content = ~0ULL;
-			redir_content = ioapic->redirtbl[redir_index].bits;
 			result = (ioapic->ioregsel & 0x1) ?
 			    (redir_content >> 32) & 0xffffffff :

Written by xorl

May 23, 2013 at 22:44

Posted in linux, vulnerabilities