Practical ARM64 (Subroutines)

Aug 26, 2022


Calling subroutines in higher-level programming languages is trivial: the developer simply references the name of a subroutine, supplies some arguments (if any) and handles the result. Doing the same in assembly language can sometimes be overwhelming, as the developer has to take care of a lot of details and comply with the calling convention of each processor family.

A calling convention defines how arguments are passed to subroutines and how the results are returned. These “rules” are not enforced by hardware, but they must be followed during development so that the code can be called by, and can call, code written by other programmers (libraries included).

When it comes to AArch64, the rules for calling a subroutine are the following:

  • The first eight parameters are passed in registers x0-x7.
  • Any additional parameters are passed on the stack, laid out so that the ninth argument sits at the lowest address (which means pushing them in reverse order).
  • The subroutine’s result (if there is any) is returned in the x0 register.

Marshalling is the process of placing each argument in its corresponding location:

1st argument → x0, 2nd argument → x1, …, 8th argument → x7

Additionally, there are volatile (caller-saved) and non-volatile (callee-saved) registers. Simply put, when you store data in a volatile register, don’t assume that it will survive a subroutine call. Conversely, a subroutine must save the contents of a non-volatile register before using it and restore them afterwards. For AArch64 we have the following conventions:

  • x0-x7 are volatile, and x0 is also used to return the result of a subroutine
  • x8-x18 are also volatile; during a system call, x8 holds the (Linux) system call number (x18 is reserved as a platform register on some systems and is best avoided)
  • x19-x28 are non-volatile
  • x29, x30 and sp are the Frame Pointer (FP), the Link Register (LR) and the Stack Pointer (SP), respectively.

Volatile and non-volatile registers

Calling a subroutine

Let’s first see the steps that we should take when calling a subroutine.

Arguments to registers

Let’s start with a simple case where we have at most eight arguments to take care of. In the example below, we call the printf function, passing the address of the format string in x0 (line 10) and the rest of the parameters in the w1-w7 registers (lines 12–15):

Compiling and running the program ($as nstack.s -o nstack.o && ld nstack.o -o nstack -lc)

As we discussed in the previous posts, the bl instruction stores the return address (the address of the instruction that follows the bl) in the link register (lr) and sets the program counter (pc) to the address of the first instruction of the subroutine that we are calling. According to the printf manual page, this subroutine expects the format string as its first argument and the values to display as the 2nd, 3rd and so on:

Since we comply with the calling convention, printf executes as expected, printing the given values to the standard output.

Arguments to the stack

When calling a subroutine that takes more than eight arguments, the extra ones must be stored on the stack. Pushing values on to the stack and popping them back involves two steps:

  • First, the developer allocates space on the stack by modifying the value of the stack pointer (sp).
  • Then, a value is stored to, or loaded from, the memory address that sp points to.

Allocating space

This is done by subtracting the number of bytes that we need from the value of the stack pointer, while taking care of the stack alignment. In AArch64 the stack pointer must be 16-byte aligned whenever it is used to access memory.

Although this seems confusing, think of the stack as the pile depicted below:

AArch64 16 Bytes alignment requirement

In order to store 16 bytes the Stack TOP must be placed one position lower, for 32 bytes two positions, and so on. To store a number of bytes that is not a multiple of 16, we round the size up to the next multiple of 16 and move the Stack TOP by that amount. This means that in order to store 8 bytes the Stack TOP must still be placed one position lower, for 24 bytes two positions, and so on.
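The rounding described above is a one-line computation. The following C sketch returns the number of bytes by which sp would have to be lowered to hold a given payload; the function name stack_reserve is made up for illustration, and small sizes are assumed (no overflow handling):

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte count up to the next multiple of 16.
 * Adding 15 and clearing the low four bits is the usual trick
 * for rounding up to a power-of-two boundary. */
size_t stack_reserve(size_t bytes)
{
    size_t reserved = (bytes + 15u) & ~(size_t)15u;

    /* Sanity checks: the result is aligned and large enough. */
    assert(reserved % 16 == 0);
    assert(reserved >= bytes);
    return reserved;
}
```

For example, stack_reserve(8) and stack_reserve(16) both give 16, while stack_reserve(24) gives 32, matching the "one position" and "two positions" cases above.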

In the example below, we need to store 24 bytes in total (8 for each register):

The instructions at lines 7 and 8 will modify the stack as follows:

More specifically, for the sake of simplicity, assume that sp holds 0 when entering the main function. The pre-indexed addressing mode [sp, #-32]! first sets sp to sp - 32; x29 is then stored at sp[-32:-25] and x30 at sp[-24:-17], with offsets measured from the entry value of sp. Finally, x19 is stored at the new sp + 16, that is at sp[-16:-9], and the sp value is not modified. The remaining 8 bytes, sp[-8:-1], are unused; they are there only to keep sp 16-byte aligned.

Now that (hopefully) this step is clear, let’s see an example that makes use of these concepts. We will use the notation sp[a:b] to indicate stack offsets, and start by storing an array of 8 integers on the stack:

Compile the program above, load it in gdb and set a breakpoint at *main+0. Then step through the instructions one at a time to observe the changes in the stack:
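The effect of these two stores can be modelled in C. This is only a sketch under assumptions: the struct machine type and the helper names are hypothetical, the stack is a small byte array, and the two instructions are taken to be stp x29, x30, [sp, #-32]! followed by str x19, [sp, #16], which is what the description suggests:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { STACK_SIZE = 64 };

/* Toy model: sp is an index into a byte array and grows downwards. */
struct machine {
    uint8_t stack[STACK_SIZE];
    size_t sp;
};

/* Models "stp a, b, [sp, #-frame]!": pre-index addressing lowers sp
 * first, then stores the pair at the new sp and at sp + 8. */
void stp_preindex(struct machine *m, uint64_t a, uint64_t b, size_t frame)
{
    assert(frame % 16 == 0 && frame >= 16 && frame <= m->sp);
    m->sp -= frame;
    memcpy(&m->stack[m->sp], &a, sizeof a);
    memcpy(&m->stack[m->sp + 8], &b, sizeof b);
}

/* Models "str v, [sp, #offset]": stores at sp + offset, sp unchanged. */
void str_offset(struct machine *m, uint64_t v, size_t offset)
{
    assert(m->sp + offset + sizeof v <= STACK_SIZE);
    memcpy(&m->stack[m->sp + offset], &v, sizeof v);
}
```

Starting with sp equal to STACK_SIZE, calling stp_preindex(&m, x29, x30, 32) and then str_offset(&m, x19, 16) leaves sp 32 bytes lower, with the three values in the first 24 bytes of the new frame and the last 8 bytes untouched.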

sp is set to sp-32, x29 will be stored at 0x7..ffb10 → 0x7..ffb17 and X30 at 0x7..ffb18
sp is not modified and X19 is stored at 0x7..ffb20
sp is set to sp-32
sp is restored to the entry value

Notice that the instruction at line 25 allocates 32 bytes of space and the stp instructions that follow store the array elements on the stack. The instruction at line 33 restores the stack pointer to its value before the array elements were saved, and lines 34 and 35 recover the rest of the values.

In the next example, we are calling printf once again, passing more than 8 arguments this time:

A few things to notice:

  • At line 12 we save x19, as we are going to use it and it is non-volatile
  • printf takes 12 arguments in total, including the format string, which means that 4 arguments have to be passed on the stack
  • At lines 25 and 26, x11 is stored at sp[16:23], x12 at sp[24:31], x9 at sp[0:7] and x10 at sp[8:15]
  • Although the extra arguments are 4 bytes each, we store them on the stack as 8-byte values before the call to printf
  • At line 28 we set the return value to 0, and at line 29 we restore sp
  • Finally, we restore x19, x29 and x30 from the stack and return to the address held in x30
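For comparison, this is roughly what a call of that shape looks like from C, where the compiler performs the marshalling. The function name, the format string and the values are hypothetical (they are not the ones used in the example), and the register assignments in the comments assume the standard AAPCS64 convention as used on Linux:

```c
#include <assert.h>
#include <stdio.h>

/* Calls printf with 12 arguments: the format string plus 11 integers.
 * Returns the number of characters printed. */
int print_eleven(void)
{
    int n = printf("%d %d %d %d %d %d %d %d %d %d %d\n", /* x0: format */
                   1, 2, 3, 4, 5, 6, 7,  /* w1-w7 */
                   8, 9, 10, 11);        /* stack, one 8-byte slot each,
                                            with 8 at the lowest address */

    /* printf reports an output error with a negative return value. */
    assert(n > 0);
    return n;
}
```

Compiling a function like this with gcc -S on an AArch64 Linux machine is an easy way to inspect the marshalling that the compiler generates and to compare it with the hand-written version.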

Implementing a subroutine

We saw the steps that we should take when calling a subroutine; now it is time to see the conventions from the perspective of the called subroutine. From what has been discussed so far, you have probably already figured out that:

  • We can safely assume that the first 8 arguments are passed in the registers x0 to x7 and the extra ones on the stack.
  • The returned value must be stored in x0
  • When using a non-volatile register we must save its value before we use it and restore it before exiting
  • Volatile registers can be used without any need to restore their value
  • The link register (x30) and frame pointer (x29) must also be saved when entering a subroutine and restored before exiting, at least whenever the subroutine itself calls other subroutines (each bl overwrites x30).

Example u-itoa

To close this post, we are going to write a program that converts an unsigned integer to a null-terminated string using a specified base and prints the result to the screen. More specifically, our main function calls scanf to get an unsigned value and a base. It then calls our subroutine uitoa, which does the conversion, and prints the returned result to the screen. We are going to break our program into 3 parts in order to make it easier to understand.

The first part, which is the simplest one, asks the user to enter an unsigned integer and a base, and then calls the standard scanf function to read the input. It then calls our subroutine uitoa, which takes three arguments: the address where it should save the result, the integer to be converted and the conversion base:

Our simplified version of itoa checks that the base is between 2 and 16 and that the input is greater than 0. It then runs a loop that divides the input by the base and, on every iteration, stores the remainder at position result[i]:

When this function exits, the result is stored in reverse order at the memory address that x0 points to, and the length of the result is in the x1 register. Finally, we print the result in reverse order:
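The algorithm may be easier to follow in C first. The sketch below is a model of the behaviour described here, not a translation of the assembly: the name uitoa_reversed and the digit table are assumptions, each remainder is assumed to be stored as its ASCII digit, and the length is returned as the function result rather than in x1:

```c
#include <assert.h>
#include <stddef.h>

/* Writes the digits of `value` in the given base into `result`,
 * least significant digit first, followed by a null terminator.
 * Returns the number of digits written, or 0 when the input is
 * rejected (base outside 2..16, or value equal to 0).
 * `result` needs room for 33 bytes: 32 binary digits plus '\0'. */
size_t uitoa_reversed(char *result, unsigned int value, unsigned int base)
{
    static const char digits[] = "0123456789abcdef";
    size_t i = 0;

    assert(result != NULL);
    if (base < 2 || base > 16 || value == 0)
        return 0;

    while (value != 0) {
        result[i++] = digits[value % base]; /* remainder -> digit */
        value /= base;                      /* quotient feeds the next round */
    }
    result[i] = '\0';
    return i;
}
```

A caller would then walk the buffer backwards, from index length - 1 down to 0, to print the digits in the usual order. For instance, 255 in base 16 is stored as "ff" with length 2, and 10 in base 2 is stored as "0101" with length 4.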

The overall program structure is as follows:

Compile and run: