CVE-2013-1798: Linux kernel KVM IOAPIC_REG_SELECT Invalid Memory Access
This was a very nice vulnerability reported by Andrew Honig of Google. The bug is triggered when a user sets an invalid IOAPIC_REG_SELECT value and then reads through the KVM I/O device read operation, as you can see below.
static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
void *val)
{
struct kvm_ioapic *ioapic = to_ioapic(this);
u32 result;
...
switch (addr) {
case IOAPIC_REG_SELECT:
result = ioapic->ioregsel;
break;
case IOAPIC_REG_WINDOW:
result = ioapic_read_indirect(ioapic, addr, len);
break;
...
return 0;
}
...
static const struct kvm_io_device_ops ioapic_mmio_ops = {
.read = ioapic_mmio_read,
.write = ioapic_mmio_write,
};
Additionally, a read of IOAPIC_REG_WINDOW results in a call to ioapic_read_indirect(). Here is what this function does.
static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
unsigned long addr,
unsigned long length)
{
unsigned long result = 0;
switch (ioapic->ioregsel) {
...
default:
{
u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
u64 redir_content;
ASSERT(redir_index < IOAPIC_NUM_PINS);
redir_content = ioapic->redirtbl[redir_index].bits;
result = (ioapic->ioregsel & 0x1) ?
(redir_content >> 32) & 0xffffffff :
redir_content & 0xffffffff;
break;
}
}
return result;
}
It calculates ‘redir_index’ from the user controlled ‘ioapic->ioregsel’ variable and then uses it as an index into the ‘ioapic->redirtbl[]’ array. If this value is greater than or equal to IOAPIC_NUM_PINS, the read goes past the end of the array and results in an invalid memory access. Here is how IOAPIC_NUM_PINS is defined in the virt/kvm/ioapic.h header file.
#define IOAPIC_NUM_PINS KVM_IOAPIC_NUM_PINS
The indirection is there because the value is architecture specific. For IA64 it is defined in include/uapi/asm/kvm.h as 48 and for x86 in arch/x86/include/uapi/asm/kvm.h as 24. As you might have noticed, there is an ASSERT() call that makes this check but, of course, it only takes effect in debug builds.
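To see why an assertion of this kind offers no protection in production builds, here is a minimal userspace sketch of a macro that is only active when a debug symbol is defined. TOY_DEBUG, TOY_ASSERT(), TOY_NUM_PINS and toy_index_is_safe() are names made up for illustration; the kernel's own macro may differ in detail.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug-only assertion, modelled on the usual pattern. */
#ifdef TOY_DEBUG
#define TOY_ASSERT(x) \
	do { \
		if (!(x)) { \
			fprintf(stderr, "assertion failed: %s\n", #x); \
			abort(); \
		} \
	} while (0)
#else
#define TOY_ASSERT(x) do { } while (0)
#endif

#define TOY_NUM_PINS 24 /* x86 value quoted in the text */

/*
 * Returns 1 if idx may be used as a table index, 0 otherwise.
 * Without TOY_DEBUG the assertion expands to nothing, so an
 * out-of-range idx is only caught by the explicit comparison in
 * the return statement. Code that relies on the assertion alone
 * would carry on with the bad index.
 */
static int toy_index_is_safe(unsigned int idx)
{
	TOY_ASSERT(idx < TOY_NUM_PINS);
	return idx < TOY_NUM_PINS;
}
```

Built without -DTOY_DEBUG, toy_index_is_safe(24) returns 0 instead of aborting, which shows that the assertion never ran.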
The fix was to replace that ASSERT() call with a range check like this.
u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
u64 redir_content;
- ASSERT(redir_index < IOAPIC_NUM_PINS);
+ if (redir_index < IOAPIC_NUM_PINS)
+ redir_content =
+ ioapic->redirtbl[redir_index].bits;
+ else
+ redir_content = ~0ULL;
- redir_content = ioapic->redirtbl[redir_index].bits;
result = (ioapic->ioregsel & 0x1) ?
(redir_content >> 32) & 0xffffffff :
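The index arithmetic and the range check can be tried out in userspace. The following is only a sketch under stated assumptions: struct toy_ioapic and the toy_* helpers are hypothetical stand-ins for the kernel structures, and TOY_NUM_PINS uses the x86 value of 24 mentioned earlier.

```c
#include <stdint.h>

#define TOY_NUM_PINS 24 /* x86 value of KVM_IOAPIC_NUM_PINS quoted above */

/* Hypothetical, simplified stand-in for struct kvm_ioapic. */
struct toy_ioapic {
	uint32_t ioregsel;
	uint64_t redirtbl[TOY_NUM_PINS];
};

/* Same arithmetic as the kernel snippet: two 32-bit registers per pin. */
static uint32_t toy_redir_index(uint32_t ioregsel)
{
	return (ioregsel - 0x10) >> 1;
}

/* Range-checked read in the spirit of the fix: all ones when out of range. */
static uint32_t toy_read_indirect(const struct toy_ioapic *ioapic)
{
	uint32_t idx = toy_redir_index(ioapic->ioregsel);
	uint64_t content;

	if (idx < TOY_NUM_PINS)
		content = ioapic->redirtbl[idx];
	else
		content = ~0ULL;

	/* Odd selectors return the high half, even ones the low half. */
	return (ioapic->ioregsel & 0x1) ?
		(uint32_t)(content >> 32) : (uint32_t)content;
}
```

With 24 pins the last valid selector is 0x3f, so 0x40 already yields index 24. Because the subtraction is unsigned, a selector below 0x10 that reaches this code path wraps around to a huge index rather than a negative one, which is why a single `idx < TOY_NUM_PINS` comparison covers both directions.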
CVE-2013-1796: Linux kernel KVM MSR_KVM_SYSTEM_TIME Buffer Overflow
This is a really nice vulnerability killed by Andy Honig. It is particularly interesting because it allows host kernel memory corruption through GPA (Guest Physical Address) manipulation. If we have a look at arch/x86/kvm/x86.c we can see the following code.
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
bool pr = false;
u32 msr = msr_info->index;
u64 data = msr_info->data;
switch (msr) {
...
case MSR_KVM_SYSTEM_TIME: {
kvmclock_reset(vcpu);
vcpu->arch.time = data;
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
/* we verify if the enable bit is set... */
if (!(data & 1))
break;
/* ...but clean it before doing the actual write */
vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
vcpu->arch.time_page =
gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
if (is_error_page(vcpu->arch.time_page))
vcpu->arch.time_page = NULL;
break;
}
...
return 0;
}
EXPORT_SYMBOL_GPL(kvm_set_msr_common);
So by writing to the ‘MSR_KVM_SYSTEM_TIME’ kvmclock MSR a user can set both ‘vcpu->arch.time_page’, through the gfn_to_page() call, and ‘vcpu->arch.time_offset’ from the user derived ‘data’ value. As Andy Honig mentioned in his commit, the invalid write occurs when KVM uses kmap_atomic() to obtain a pointer to the time structure page and then performs a memcpy() to it starting at the user controlled offset. If that offset lies close to the end of the page, the copy runs past the page boundary. The fix was to add a check that the offset is aligned to the size of struct pvclock_vcpu_time_info, so the structure can never straddle a page boundary.
/* ...but clean it before doing the actual write */
vcpu->arch.time_offset = data & ~(PAGE_MASK | 1);
+ /* Check that the address is 32-byte aligned. */
+ if (vcpu->arch.time_offset &
+ (sizeof(struct pvclock_vcpu_time_info) - 1))
+ break;
+
vcpu->arch.time_page =
gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
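To make the effect of the alignment test concrete, here is a small userspace sketch of the offset arithmetic. It assumes a 4096-byte page and a 32-byte time structure (the size the patch comment implies); the TOY_* constants and toy_* functions are hypothetical names, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE      4096u /* assumed page size */
#define TOY_TIME_INFO_SIZE 32u   /* assumed sizeof(struct pvclock_vcpu_time_info) */

/* Offset inside the page with the enable bit (bit 0) cleared,
 * equivalent to data & ~(PAGE_MASK | 1) in the kernel snippet. */
static uint32_t toy_time_offset(uint64_t data)
{
	return (uint32_t)(data & (TOY_PAGE_SIZE - 1) & ~(uint64_t)1);
}

/* Would a copy of the time structure at this offset leave the page? */
static bool toy_write_straddles_page(uint64_t data)
{
	return toy_time_offset(data) + TOY_TIME_INFO_SIZE > TOY_PAGE_SIZE;
}

/* The test added by the fix: offset must be a multiple of the size. */
static bool toy_offset_is_aligned(uint64_t data)
{
	return (toy_time_offset(data) & (TOY_TIME_INFO_SIZE - 1)) == 0;
}
```

A value such as 0x1ff1 (enable bit set, offset 0xff0) would place the last 16 bytes of the structure on the following page, and it fails the alignment test. The largest offset that passes is 4064, where the structure ends exactly at the page boundary.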
CVE-2013-1848: Linux kernel EXT3 ext3_msg() Format String
Recently Lars-Peter Clausen committed a change to the Linux kernel that fixes a format string vulnerability in the EXT3 filesystem code. The susceptible code resides in fs/ext3/super.c, but to better understand it we first need to look at how ext3_msg() is defined.
void ext3_msg(struct super_block *sb, const char *prefix,
const char *fmt, ...)
{
struct va_format vaf;
va_list args;
va_start(args, fmt);
vaf.fmt = fmt;
vaf.va = &args;
printk("%sEXT3-fs (%s): %pV\n", prefix, sb->s_id, &vaf);
va_end(args);
}
So, it should be called passing the following three mandatory arguments:
- Pointer to the super-block structure
- Prefix string
- Format string
And of course, any variables to be printed. As Lars-Peter Clausen noticed, there were two cases where no prefix was passed. As a result, the format string is taken as the prefix and the first variable to be printed is interpreted as the format string. Here are these two cases:
/*
* Open the external journal device
*/
static struct block_device *ext3_blkdev_get(dev_t dev, struct super_block *sb)
{
...
fail:
ext3_msg(sb, "error: failed to open journal device %s: %ld",
__bdevname(dev, b), PTR_ERR(bdev));
return NULL;
}
And…
static ext3_fsblk_t get_sb_block(void **data, struct super_block *sb)
{
ext3_fsblk_t sb_block;
...
if (*options && *options != ',') {
ext3_msg(sb, "error: invalid sb specification: %s",
(char *) *data);
...
return sb_block;
}
The fix was to add the missing prefix argument to the function call like this.
@@ -353,7 +353,7 @@ static struct block_device *ext3_blkdev_get(dev_t dev, struct super_block *sb)
return bdev;
fail:
- ext3_msg(sb, "error: failed to open journal device %s: %ld",
+ ext3_msg(sb, KERN_ERR, "error: failed to open journal device %s: %ld",
__bdevname(dev, b), PTR_ERR(bdev));
return NULL;
@@ -887,7 +887,7 @@ static ext3_fsblk_t get_sb_block(void **data, struct super_block *sb)
/*todo: use simple_strtoll with >32bit ext3 */
sb_block = simple_strtoul(options, &options, 0);
if (*options && *options != ',') {
- ext3_msg(sb, "error: invalid sb specification: %s",
+ ext3_msg(sb, KERN_ERR, "error: invalid sb specification: %s",
(char *) *data);
return 1;
}
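The argument shift is easy to reproduce outside the kernel. The sketch below is a userspace approximation only: toy_msg() is a hypothetical function that mimics the shape of ext3_msg() with vsnprintf() instead of printk() and %pV, and the "<3>" prefix is just a placeholder for a log level string.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace look-alike of ext3_msg(): prints the prefix,
 * a device id and then the message formatted from fmt and its arguments.
 */
static void toy_msg(char *out, size_t outsz, const char *id,
		    const char *prefix, const char *fmt, ...)
{
	char body[128];
	va_list args;

	va_start(args, fmt);
	vsnprintf(body, sizeof(body), fmt, args);
	va_end(args);

	/* The prefix goes through %s, so any '%' in it is printed literally. */
	snprintf(out, outsz, "%sTOY-fs (%s): %s\n", prefix, id, body);
}
```

Calling toy_msg(out, sizeof(out), "sda1", "error: invalid sb specification: %s", "1x") compiles without complaint, yet the intended format string lands in the prefix slot and "1x" is interpreted as the format. In this sketch the data is harmless, but if that string were attacker supplied and contained conversion specifiers, the formatter would consume arguments that were never passed.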
CVE-2013-1774: Linux kernel Edgeport USB Serial Converter NULL Pointer Dereference
This is a vulnerability fixed by Wolfgang Frisch. The buggy code resides in drivers/usb/serial/io_ti.c, as you can see below.
static void chase_port(struct edgeport_port *port, unsigned long timeout,
int flush)
{
int baud_rate;
struct tty_struct *tty = tty_port_tty_get(&port->port->port);
struct usb_serial *serial = port->port->serial;
wait_queue_t wait;
unsigned long flags;
...
remove_wait_queue(&tty->write_wait, &wait);
...
tty_kref_put(tty);
...
}
If the corresponding /dev/ttyUSB device file is in use while the device is disconnected, then any call to chase_port() (used to chase the port, close and flush it) will lead to a NULL pointer dereference since there is no longer a ‘tty’ associated with it. The fix was to add a simple check for this case.
unsigned long flags;
+ if (!tty)
+ return;
+
if (!timeout)
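The pattern behind the fix, bailing out early when the reference lookup comes back empty, can be sketched in plain C. All toy_* names below are hypothetical, and the int return value exists only so the behaviour can be observed; the real chase_port() returns void.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the tty objects. */
struct toy_tty {
	int waiters;
	int refs;
};

struct toy_port {
	struct toy_tty *tty; /* NULL once the device is gone */
};

/* Takes a reference if a tty is still attached, else returns NULL. */
static struct toy_tty *toy_tty_get(struct toy_port *port)
{
	if (port->tty)
		port->tty->refs++;
	return port->tty;
}

/* Returns 0 on success, -1 if there was no tty to work with. */
static int toy_chase_port(struct toy_port *port)
{
	struct toy_tty *tty = toy_tty_get(port);

	if (!tty) /* the check the fix adds */
		return -1;

	tty->waiters++; /* stands in for joining the wait queue */
	tty->waiters--; /* stands in for remove_wait_queue() */
	tty->refs--;    /* stands in for tty_kref_put() */
	return 0;
}
```

Without the early return, the first access to `tty->waiters` would dereference NULL for a port whose tty has already gone away.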
CVE-2013-1819: Linux kernel XFS _xfs_buf_find() NULL Pointer Dereference
On 21 January 2013 Dave Chinner of Red Hat committed a change that fixes a NULL pointer dereference vulnerability in the XFS filesystem. The routine below is located in the fs/xfs/xfs_buf.c file.
/*
* Finding and Reading Buffers
*/
/*
* Look up, and creates if absent, a lockable buffer for
* a given range of an inode. The buffer is returned
* locked. No I/O is implied by this call.
*/
xfs_buf_t *
_xfs_buf_find(
struct xfs_buftarg *btp,
struct xfs_buf_map *map,
int nmaps,
xfs_buf_flags_t flags,
xfs_buf_t *new_bp)
{
size_t numbytes;
struct xfs_perag *pag;
...
/* get tree root */
pag = xfs_perag_get(btp->bt_mount,
xfs_daddr_to_agno(btp->bt_mount, blkno));
/* walk tree */
...
return bp;
}
First of all, the xfs_daddr_to_agno() C macro is defined as follows in the fs/xfs/xfs_mount.h header file.
#define xfs_daddr_to_agno(mp,d) \
((xfs_agnumber_t)(XFS_BB_TO_FSBT(mp, d) / (mp)->m_sb.sb_agblocks))
As Dave Chinner pointed out, if we try to walk a filesystem and the extent map has a corrupted block number (out of range address), the call to xfs_perag_get() above will lead to a NULL pointer dereference.
/*
* Reference counting access wrappers to the perag structures.
* Because we never free per-ag structures, the only thing we
* have to protect against changes is the tree structure itself.
*/
struct xfs_perag *
xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno)
{
struct xfs_perag *pag;
int ref = 0;
rcu_read_lock();
pag = radix_tree_lookup(&mp->m_perag_tree, agno);
if (pag) {
ASSERT(atomic_read(&pag->pag_ref) >= 0);
ref = atomic_inc_return(&pag->pag_ref);
}
rcu_read_unlock();
trace_xfs_perag_get(mp, agno, ref, _RET_IP_);
return pag;
}
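The radix_tree_lookup() call uses the allocation group number, which xfs_daddr_to_agno() derives from the invalid block number and ‘sb_agblocks’ (size of an allocation group), as the index key into the ‘mp->m_perag_tree’ radix tree. No per-AG structure exists for that key, so the lookup returns NULL and _xfs_buf_find() ends up dereferencing a NULL ‘pag’ pointer.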
The fix to this bug was to add a new variable to the susceptible routine:
xfs_buf_t *bp;
xfs_daddr_t blkno = map[0].bm_bn;
+ xfs_daddr_t eofs;
int numblks = 0;
And add a check that the block number does not reach or exceed the end of the filesystem.
ASSERT(!(BBTOB(blkno) & (xfs_off_t)btp->bt_smask));
+ /*
+ * Corrupted block numbers can get through to here, unfortunately, so we
+ * have to check that the buffer falls within the filesystem bounds.
+ */
+ eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
+ if (blkno >= eofs) {
+ /*
+ * XXX (dgc): we should really be returning EFSCORRUPTED here,
+ * but none of the higher level infrastructure supports
+ * returning a specific error on buffer lookup failures.
+ */
+ xfs_alert(btp->bt_mount,
+ "%s: Block out of range: block 0x%llx, EOFS 0x%llx ",
+ __func__, blkno, eofs);
+ return NULL;
+ }
+
/* get tree root */
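The relationship between the block number, the allocation group number and the end-of-filesystem check can be modelled in a few lines of userspace C. This is only a sketch: struct toy_mount, its fields and the toy_* functions are hypothetical simplifications, and a plain array takes the place of the per-AG radix tree.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_perag {
	int ref;
};

/* Hypothetical, reduced view of the mount structure. */
struct toy_mount {
	uint32_t blkbb_log;      /* log2(512-byte basic blocks per fs block) */
	uint32_t agblocks;       /* fs blocks per allocation group */
	uint32_t agcount;        /* number of allocation groups */
	uint64_t dblocks;        /* fs blocks in the data device */
	struct toy_perag *perag; /* agcount entries, stands in for the tree */
};

/* Same idea as xfs_daddr_to_agno(): disk address -> fs block -> AG number. */
static uint32_t toy_daddr_to_agno(const struct toy_mount *mp, uint64_t daddr)
{
	return (uint32_t)((daddr >> mp->blkbb_log) / mp->agblocks);
}

/* Lookup that, like a radix tree miss, yields NULL for an unknown AG. */
static struct toy_perag *toy_perag_get(struct toy_mount *mp, uint32_t agno)
{
	if (agno >= mp->agcount)
		return NULL;
	mp->perag[agno].ref++;
	return &mp->perag[agno];
}

/* Bounds check in the spirit of the fix, done before the AG lookup. */
static struct toy_perag *toy_buf_find_ag(struct toy_mount *mp, uint64_t blkno)
{
	uint64_t eofs = mp->dblocks << mp->blkbb_log;

	if (blkno >= eofs)
		return NULL; /* block out of range */
	return toy_perag_get(mp, toy_daddr_to_agno(mp, blkno));
}
```

For a toy filesystem with 4 groups of 1000 blocks and 8 basic blocks per filesystem block, disk address 40000 maps to group 5, which does not exist. The check against `eofs` rejects it before the lookup result is ever used.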
