Sunday, April 7, 2013

strcpy(): The safety of an unsafe string copy function

Last time I wrote about the strcpy() function and said that it's unsafe. But why exactly is it unsafe? Let us see the details of what is going on under the hood when strcpy() is called. To do so, we will dive down to the machine level and have a look at what is happening in the stack memory.

In the C language, strings have no explicit length. Instead, the length is determined by a terminating NUL character. Therefore, strcpy() copies bytes until it sees a zero byte:
 void strcpy(char *dst, char *src) {
     /* copy src to dst until src[0] == 0 */
     while(*src)
         *dst++ = *src++;
 }
Consider the following function, which includes the common programming mistake of assuming that the input will fit into the buffer:
 void func(char *input) {
    char buf[256];

    strcpy(buf, input);
 }
Now let's examine what happens at the machine level when this baby executes. When a subroutine is called, the CPU pushes the address of the next instruction onto the stack. This address is the return address. To return from a subroutine, the return address is popped off the stack and loaded as the current instruction pointer. Thus the program jumps back and resumes execution at the point right after the subroutine was called. Using a stack allows for nested subroutine calls.
Local variables and function parameters are placed on the stack as well. This is great because now when a subroutine ends, the local variables go out of scope as the stack frame is ‘cleaned up’ (in fact, the data is still there, but the stack pointer is moved).
This amounts to the following picture of a stack frame when we are executing the func:
 stack pointer ->
 +------------------------------+
 | local var:  char buf[256]    |
 |------------------------------|
 |        return address        |
 |------------------------------|
 | parameter:  char *input      |
 |------------------------------|
 | local vars       of caller   |
 |------------------------------|
 | return address   of caller   |
 |------------------------------|
 | parameters       of caller   |
 |------------------------------|
 | ...                          |
 end of stack
You can see that an attacker can overwrite the return address when he supplies an input string that is longer than buf. Not only can he overwrite the return address, he can insert a specially crafted mini-program into buf. What many exploits do is put the return address as the address of buf so that the program will jump back to execute the payload in buf.

To prove that this actually works, we can do a little experiment. Write a small program that fills a buffer with garbage and overwrites the return address as described above. Guess what, it doesn't work! The program gets killed by a run-time check, cleverly inserted by the compiler. Here is a postmortem stack trace:
(gdb) bt
#0  0x00007fff9a45cd46 in __kill ()
#1  0x00007fff9ad98ec0 in __abort ()
#2  0x00007fff9ad5a77d in __chk_fail ()
#3  0x00007fff9ad5aa4f in __strcpy_chk ()
#4  0x0000000100000de4 in func (input=0x7fff5fbff7e0 'x' , "\030??_") at hijack.c:19
#5  0x0000000100000e5e in main () at hijack.c:29
This is in OS X using clang. Googling turns up __memcpy_chk for gcc:

“GCC implements a limited buffer overflow protection mechanism that can prevent some buffer overflow attacks.”

void *
__memcpy_chk (void *__restrict__ dest,
              const void *__restrict__ src,
              size_t len, size_t slen)
{
    if (len > slen)
        __chk_fail ();
    return memcpy (dest, src, len);
}
As you can see, the compiler inserts a run-time check for the size of the buffer. On top of this, a second run-time check is made that checks the integrity of the stack. This technique is known as inserting a ‘stack canary’ and can be observed by studying a disassembly of our func below. Here you can see that 288 bytes (or in hexadecimal notation, 0x120) of space is taken from the stack for local variables. This is more than the 256 we actually requested. The remaining 32 bytes are used for the stack canary.
Then it calls strcpy_chk() rather than strcpy(). Finally, the stack canary is checked, and may result in stack_chk_fail() being called. Otherwise, the stack frame is cleaned up and the function returns normally.
0x0000000100000d90  push   %rbp
0x0000000100000d91  mov    %rsp,%rbp
0x0000000100000d94  sub    $0x120,%rsp
0x0000000100000d9b  mov    0x26e(%rip),%rax
0x0000000100000da2  mov    (%rax),%rax
0x0000000100000da5  mov    %rax,-0x8(%rbp)
0x0000000100000da9  mov    $0x100,%rdx
0x0000000100000db3  lea    -0x110(%rbp),%rax
0x0000000100000dba  mov    %rdi,-0x10(%rbp)
0x0000000100000dbe  mov    -0x10(%rbp),%rsi
0x0000000100000dc2  mov    %rax,%rdi
0x0000000100000dc5  callq  0x100000e8c <__strcpy_chk>
0x0000000100000dca  mov    0x23f(%rip),%rdx
0x0000000100000dd1  mov    (%rdx),%rdx
0x0000000100000dd4  mov    -0x8(%rbp),%rsi
0x0000000100000dd8  cmp    %rsi,%rdx
0x0000000100000ddb  mov    %rax,-0x118(%rbp)
0x0000000100000de2  jne    0x100000df1 <func+97>
0x0000000100000de8  add    $0x120,%rsp
0x0000000100000def  pop    %rbp
0x0000000100000df0  retq  
0x0000000100000df1  callq  0x100000e86 <__stack_chk_fail>
So, the compiler does a lot of work for us in order to prevent simple buffer overflows. The canary is initialized with a random value before main() runs, so it practically can not be defeated. But beware, an attacker may still influence the program behavior in different ways and deliberately not touch the canary.

Other techniques that help prevent buffer overflow attacks are ASLR (address space layout randomization) and DEP (data execution prevention) or NX pages (non-executable memory pages). Although they make it more difficult, these too can be circumvented by trickery.

Mind ye that all of this is plain impossible if only you (or the high-level language itself!) always properly check the size of the buffer and array bounds. It is something that C normally does not do for you, so be mindful that the compiler will not always be able to save you from writing insecure code.