Last time I wrote about the strcpy() function and said that it's unsafe. But why exactly is it unsafe? Let us see the details of what is going on under the hood when strcpy() is called. To do so, we will dive down to the machine level and have a look at what is happening in the stack memory.
In the C language, strings have no explicit length. Instead, the length is determined by a terminating NUL character. Therefore, strcpy() copies bytes until it sees a zero byte:
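char *strcpy(char *dst, const char *src) {
    char *ret = dst;
    /* copy bytes from src to dst until the NUL terminator is reached */
    while (*src)
        *dst++ = *src++;
    *dst = '\0';    /* the terminator has to be copied as well */
    return ret;
}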
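To make the NUL-termination rule concrete, here is a minimal sketch. The helper name visible_length and the sample array are made up for illustration. It walks a buffer the same way strcpy() does: everything after the first zero byte is invisible to the string functions, even though it is still part of the array.

```c
#include <stddef.h>

/* Hypothetical helper: count bytes up to, but not including, the first
 * zero byte. This is what strlen() does; nothing tells it how large the
 * underlying array really is. */
size_t visible_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* An 8-byte array: "abc", an embedded NUL, "def", and the terminating
 * NUL that the compiler appends to every string literal. */
const char sample[] = "abc\0def";
```

Here visible_length(sample) stops at the embedded NUL, while sizeof sample still counts the whole array. The string functions only ever see the former.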
Consider the following function, which includes the common programming mistake of assuming that the input will fit into the buffer:
void func(char *input) {
    char buf[256];
    strcpy(buf, input);
}
Now let's examine what happens at the machine level when this baby executes. When a subroutine is called, the CPU pushes the address of the next instruction onto the stack. This address is the return address. To return from a subroutine, the return address is popped off the stack and loaded as the current instruction pointer. Thus the program jumps back and resumes execution at the point right after the subroutine was called. Using a stack allows for nested subroutine calls.
Local variables and function parameters are placed on the stack as well. This is great because now when a subroutine ends, the local variables go out of scope as the stack frame is ‘cleaned up’ (in fact, the data is still there, but the stack pointer is moved).
This amounts to the following picture of the stack while func is executing:
stack pointer ->
+------------------------------+
| local var: char buf[256] |
|------------------------------|
| return address |
|------------------------------|
| parameter: char *input |
|------------------------------|
| local vars of caller |
|------------------------------|
| return address of caller |
|------------------------------|
| parameters of caller |
|------------------------------|
| ... |
end of stack
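If you want to see stack frames for yourself, the following is a rough sketch that records the address of one local variable per nesting level. The function name record_frames and its parameters are my own invention. It only shows that every active call has its own frame. It says nothing about the exact layout, which depends on the compiler and the platform.

```c
#include <stdint.h>

/* Record the address of one local variable per nesting level.
 * The volatile write after the recursive call keeps the compiler from
 * turning the recursion into a loop, so every level has a live frame
 * of its own while the deeper levels run. */
void record_frames(uintptr_t *out, int depth, int levels) {
    volatile char marker = 0;
    out[depth] = (uintptr_t)&marker;
    if (depth + 1 < levels)
        record_frames(out, depth + 1, levels);
    marker = 1;
}
```

Call it with an array of, say, three elements and print the results. On x86-64 the addresses typically get smaller with every level because the stack grows towards lower addresses, but C itself makes no such promise.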
You can see that an attacker can overwrite the return address by supplying an input string that is longer than buf: strcpy() keeps writing past the end of buf, straight into the return address that sits right behind it. Not only can he overwrite the return address, he can also insert a specially crafted mini-program into buf. What many exploits do is set the return address to the address of buf, so that when func returns, the program jumps to the payload in buf and executes it.
To prove that this actually works, we can do a little experiment. Write a small program that fills a buffer with garbage and overwrites the return address as described above. Guess what, it doesn't work! The program gets killed by a run-time check, cleverly inserted by the compiler. Here is a postmortem stack trace:
(gdb) bt
#0 0x00007fff9a45cd46 in __kill ()
#1 0x00007fff9ad98ec0 in __abort ()
#2 0x00007fff9ad5a77d in __chk_fail ()
#3 0x00007fff9ad5aa4f in __strcpy_chk ()
#4 0x0000000100000de4 in func (input=0x7fff5fbff7e0 'x', "\030??_") at hijack.c:19
#5 0x0000000100000e5e in main () at hijack.c:29
This is on OS X using clang. Googling turns up __memcpy_chk for gcc:
“GCC implements a limited buffer overflow protection mechanism that can prevent some buffer overflow attacks.”
void *
__memcpy_chk (void *__restrict__ dest,
              const void *__restrict__ src,
              size_t len, size_t slen)
{
  if (len > slen)
    __chk_fail ();
  return memcpy (dest, src, len);
}
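The same idea can be sketched for strcpy(). This is not the library's actual __strcpy_chk, just an illustration under my own assumptions. The names sketch_strcpy_chk and SKETCH_STRCPY are hypothetical, and the sketch returns NULL where the real routine aborts the program. The destination size comes from __builtin_object_size, a builtin that both gcc and clang provide.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fortified strcpy: refuse the copy when
 * the source, including its NUL terminator, does not fit into dstlen. */
char *sketch_strcpy_chk(char *dst, const char *src, size_t dstlen) {
    size_t need = strlen(src) + 1;
    if (need > dstlen)
        return NULL;    /* the library routine would abort here instead */
    memcpy(dst, src, need);
    return dst;
}

/* __builtin_object_size yields the destination size when the compiler
 * can determine it, and (size_t)-1 when it cannot, in which case the
 * check above never triggers. */
#define SKETCH_STRCPY(dst, src) \
    sketch_strcpy_chk((dst), (src), __builtin_object_size((dst), 0))
```

Note that the check is only as good as the compiler's knowledge: if the destination is just an anonymous pointer, the size is unknown and the copy goes through unchecked.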
As you can see, the compiler inserts a run-time check for the size of the buffer. On top of this, a second run-time check verifies the integrity of the stack. This technique is known as inserting a ‘stack canary’ and can be observed by studying a disassembly of our func below. Here you can see that 288 bytes (or in hexadecimal notation, 0x120) of space is taken from the stack for local variables. This is more than the 256 we actually requested. Of the remaining 32 bytes, 8 hold the stack canary at -0x8(%rbp), 8 hold a copy of the input pointer at -0x10(%rbp), and 8 hold the return value of the copy at -0x118(%rbp). The last 8 are most likely alignment padding.
Then it calls __strcpy_chk() rather than strcpy(), passing 0x100 (256), the size of buf, as an extra argument. Finally, the stack canary is compared against its original value, which may result in __stack_chk_fail() being called. Otherwise, the stack frame is cleaned up and the function returns normally.
0x0000000100000d90 push %rbp
0x0000000100000d91 mov %rsp,%rbp
0x0000000100000d94 sub $0x120,%rsp
0x0000000100000d9b mov 0x26e(%rip),%rax
0x0000000100000da2 mov (%rax),%rax
0x0000000100000da5 mov %rax,-0x8(%rbp)
0x0000000100000da9 mov $0x100,%rdx
0x0000000100000db3 lea -0x110(%rbp),%rax
0x0000000100000dba mov %rdi,-0x10(%rbp)
0x0000000100000dbe mov -0x10(%rbp),%rsi
0x0000000100000dc2 mov %rax,%rdi
0x0000000100000dc5 callq 0x100000e8c <__strcpy_chk>
0x0000000100000dca mov 0x23f(%rip),%rdx
0x0000000100000dd1 mov (%rdx),%rdx
0x0000000100000dd4 mov -0x8(%rbp),%rsi
0x0000000100000dd8 cmp %rsi,%rdx
0x0000000100000ddb mov %rax,-0x118(%rbp)
0x0000000100000de2 jne 0x100000df1 <func+97>
0x0000000100000de8 add $0x120,%rsp
0x0000000100000def pop %rbp
0x0000000100000df0 retq
0x0000000100000df1 callq 0x100000e86 <__stack_chk_fail>
So, the compiler does a lot of work for us in order to prevent simple buffer overflows. The canary is initialized with a random value before main() runs, so an attacker cannot simply guess it. But beware, an attacker may still influence the program's behavior in other ways that deliberately leave the canary untouched, for instance by overwriting only the local variables that sit between the buffer and the canary.
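The canary check itself is easy to imitate in plain C. The following is a toy model, not what the compiler generates: the struct fake_frame, the constant SKETCH_CANARY and the function canary_survives are all made up for illustration, and the guard value is fixed here whereas a real canary is random.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CANARY UINT64_C(0x0123456789abcdef)  /* fixed only for the demo */

/* A pretend stack frame: the guard value sits right behind the buffer. */
struct fake_frame {
    char     buf[16];
    uint64_t canary;
};

/* Copy len bytes of input to the start of the frame without looking at
 * the size of buf, as a careless copy would, then report whether the
 * canary survived: 1 if intact, 0 if overwritten. len is clamped to the
 * size of the whole struct so that the demo itself stays in bounds. */
int canary_survives(const char *input, size_t len) {
    struct fake_frame frame;
    memset(&frame, 0, sizeof frame);
    frame.canary = SKETCH_CANARY;
    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(&frame, input, len);
    return frame.canary == SKETCH_CANARY;
}
```

Feeding it 16 bytes of 'x' leaves the canary alone; a 17th byte already changes it and the check fails. The compiler-generated version does the same comparison just before the function returns.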
Other techniques that help prevent buffer overflow attacks are ASLR (address space layout randomization) and DEP (data execution prevention) or NX pages (non-executable memory pages). Although they make exploitation more difficult, these too can be circumvented, for example with return-oriented programming, which reuses code that is already present in the program.
Mind ye that none of these attacks is possible in the first place if only you (or the high-level language itself!) always properly check the size of the buffer and array bounds. It is something that C normally does not do for you, so be mindful that the compiler will not always be able to save you from writing insecure code.
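For completeness, here is one way the func from the beginning could check the size itself. Treat it as a sketch: the name func_bounded and the choice to reject oversized input (rather than truncate it silently) are mine. It relies on snprintf() returning the length the full string would have had.

```c
#include <stddef.h>
#include <stdio.h>

/* Bounded variant of func(): returns 0 when the whole input fits into
 * the buffer and -1 when it does not. */
int func_bounded(const char *input) {
    char buf[256];
    int n = snprintf(buf, sizeof buf, "%s", input);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;    /* input would have been truncated, or formatting failed */
    /* ... work with buf here ... */
    return 0;
}
```

snprintf() never writes more than sizeof buf bytes and always terminates the result, so even the rejected case leaves the stack intact.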