ARM 64 Assembly Series — Branch

In the previous post we talked about the ldr and str instructions, which transfer data between a memory address and a register (ldp and stp do the same for a pair of registers).

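Appending b or h to the instruction mnemonic (ldrb, ldrh, strb, strh) selects an unsigned byte or half word, while a word is selected by using a w register as the operand. Putting an s in front of the size letter (ldrsb, ldrsh, ldrsw) forces the CPU to handle the loaded data as signed and sign-extend it into the destination register.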

In this post we are going to talk about branch instructions and how they can be used in order to change the address of the next instruction that will be executed.


Branching is one of the most important concepts in programming as it allows the developer to define alternative code paths depending on various conditions. In high level languages, these conditions are evaluated using control flow statements like the if, for, while or even goto. Similarly, in low level languages there are special instructions that may be used in order to achieve the same result and route the code flow to a different path.

AArch64 defines a set of branch instructions which can be used to perform conditional or unconditional jumps within a function (branch) or calls to other functions (branch and link). Let’s see the most important of them as well as their usage.

Conditional and Unconditional Branches

Starting with the simplest case, a conditional or unconditional branch instruction looks as follows:

b.<c> label

It can be interpreted as: if <c> then pc = new_address. The A64 syntax writes the condition after a dot (b.eq, b.ne and so on). If the <c> parameter is omitted (e.g. b label), the instruction simply sets pc = new_address. The label is an immediate which is encoded as an offset relative to the program counter. This immediate is sign-extended and multiplied by 4 in order to calculate the offset that will be added to the address of the branch instruction itself:

offset = immX * 4

Where X is 19 bits for conditional branches (a range of ±1MB) and 26 bits for unconditional ones (a range of ±128MB). Finally, the symbol <c> is a mnemonic which denotes a condition on the flags (N, Z, C, V) of the PSTATE register. The possible values of <c>, as well as their meaning in terms of the PSTATE flags, are depicted below:

Table 1: Condition modifiers for AArch64

That being said, the instruction b.vs checks the overflow flag V in order to decide whether or not to follow a new code path, while b.ne takes the branch when the Zero flag is not set. In the following example, the cmp instruction at line 5 will set the Z flag to 1 if w1 is equal to zero. As a result the b.eq will succeed, and execution will continue from the address indicated by the foo label. If w1 is not equal to zero, the code will continue up to line 9, where the unconditional branch will redirect the code flow to the address indicated by the label bar:

Conditional and Unconditional branch
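To make the relation between cmp, the flags and a few of the condition mnemonics more concrete, here is a minimal C sketch. It is only a model of the behaviour described above, not how the hardware is implemented. The Flags struct and the helper names (flags_after_cmp, cond_eq and so on) are made up for this illustration, and only four of the conditions are covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four PSTATE condition flags. */
typedef struct {
    bool n; /* Negative: bit 31 of the result */
    bool z; /* Zero: the result is zero */
    bool c; /* Carry: for a subtraction, set when no borrow occurred */
    bool v; /* oVerflow: signed overflow */
} Flags;

/* Hypothetical helper: models "cmp wA, wB", a 32-bit subtraction
 * that discards the result and only updates the flags. */
Flags flags_after_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (a >= b);
    /* Signed overflow: the operands have different signs and the
     * sign of the result differs from the sign of the first operand. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}

/* A few of the condition mnemonics expressed on the flags. */
bool cond_eq(Flags f) { return f.z; }        /* b.eq: Z == 1 */
bool cond_ne(Flags f) { return !f.z; }       /* b.ne: Z == 0 */
bool cond_vs(Flags f) { return f.v; }        /* b.vs: V == 1 */
bool cond_lt(Flags f) { return f.n != f.v; } /* b.lt: N != V */
```

For instance, cond_eq(flags_after_cmp(w1, 0)) is true exactly when w1 is zero, which is the case where the b.eq in the example above is taken.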

Branch to register

When the address of the next instruction is taken from a register, the branch instruction has the following forms:

br  Rn      //meaning that pc will be set to Rn
ret <Rn>    //meaning that if Rn is omitted pc = lr else pc = Rn

Although the instructions above are self explanatory, it is worth clarifying that in the case of ret the <Rn> parameter is optional, and if it is omitted the value will be taken from the link register (x30).

Branch and link

The main difference from the previous cases is that before taking the new branch, the address of the next instruction (the current address plus 4) will be copied to the link register:

bl  label   //meaning that lr = pc+4 and pc = new_address*
blr Rn      //meaning that lr = pc+4 and pc = Rn

*in this case the immediate is 26 bits and multiplied by 4
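As a rough illustration of what bl does with its immediate, the following C sketch sign-extends a 26-bit field, multiplies it by 4 and computes both the new program counter and the value of the link register. The function and struct names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to a signed 64-bit integer. */
int64_t sign_extend(uint64_t value, unsigned bits)
{
    uint64_t sign = 1ULL << (bits - 1);
    value &= (sign << 1) - 1;               /* keep only the immediate field */
    return (int64_t)((value ^ sign) - sign);
}

typedef struct {
    uint64_t pc; /* address of the next instruction to execute */
    uint64_t lr; /* return address saved in the link register */
} BranchResult;

/* Hypothetical model of "bl label" located at address `pc`,
 * where `imm26` is the raw 26-bit immediate of the instruction. */
BranchResult branch_and_link(uint64_t pc, uint32_t imm26)
{
    BranchResult r;
    r.lr = pc + 4;                                       /* lr = pc + 4 */
    r.pc = pc + (uint64_t)(sign_extend(imm26, 26) * 4);  /* pc = pc + offset */
    return r;
}
```

With pc = 0x1000 and imm26 = 2 the sketch yields lr = 0x1004 and pc = 0x1008, while an immediate with all 26 bits set (that is, -1) branches backwards by one instruction to 0xffc.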

Compare and branch

These are conditional branches where the decision to continue the execution from a new address depends on the value of the register which is given as parameter. Their general form is depicted below:

cbz  Rn, label         //if Rn == 0 then pc = new_address*
cbnz Rn, label         //if Rn != 0 then pc = new_address*
tbz  Rn, #imm6, label  //if Rn[#imm6] == 0 then pc = new_address**
tbnz Rn, #imm6, label  //if Rn[#imm6] != 0 then pc = new_address**

*the immediate is 19 bits and multiplied by 4

**the immediate is 14 bits and multiplied by 4

The #imm6 is an integer ranging from 0 to 63, indicating a specific bit of the register which is given as a parameter. For example, the following instruction checks whether the value contained in X0 is even (bit 0 is clear) and, if so, takes the branch to the address indicated by the label even:

tbz X0, 0, even          //if X0 % 2==0 then pc = even
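The decision each of these four instructions makes can be written as a one-line predicate. The C sketch below only models the "is the branch taken?" part and uses hypothetical function names; the calculation of the target address is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* cbz: taken when the whole register is zero. */
bool cbz_taken(uint64_t rn)  { return rn == 0; }

/* cbnz: taken when the register is not zero. */
bool cbnz_taken(uint64_t rn) { return rn != 0; }

/* tbz: taken when the selected bit (0..63) is clear. */
bool tbz_taken(uint64_t rn, unsigned bit)
{
    return ((rn >> (bit & 63u)) & 1u) == 0;
}

/* tbnz: taken when the selected bit (0..63) is set. */
bool tbnz_taken(uint64_t rn, unsigned bit)
{
    return !tbz_taken(rn, bit);
}
```

Under this model tbz_taken(x0, 0) is true for every even value of x0, which matches the tbz X0, 0, even example above.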

PC relative address calculation

The adr and adrp instructions can be used to calculate an address associated with a label and store the result to a general register which is given as a parameter. Their general form is as follows:

adr Rn, label and adrp Rn, label

In the first case a 21-bit signed immediate is added to the current address, giving a range of ±1MB around it. In the second case the 21-bit immediate is shifted left by 12 bits and added to the current address with its 12 least significant bits cleared, so the result is the address of a 4KB page within a range of ±4GB. In both cases the result is stored to the general purpose register which is given as a parameter.
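The difference between the two calculations is easier to see side by side. The following C sketch assumes the raw 21-bit immediate is already available as a number (the way the bits are split inside the instruction encoding is ignored), and the function names are made up for this example.

```c
#include <stdint.h>

/* Sign-extend a 21-bit immediate to a signed 64-bit integer. */
int64_t sext21(uint32_t imm21)
{
    int64_t v = imm21 & 0x1FFFFF;
    return (v & 0x100000) ? v - 0x200000 : v;
}

/* adr: byte-granular offset from the address of the instruction. */
uint64_t adr_result(uint64_t pc, uint32_t imm21)
{
    return pc + (uint64_t)sext21(imm21);
}

/* adrp: page-granular offset from the 4KB page containing the instruction.
 * The low 12 bits of pc are cleared and the immediate counts 4KB pages. */
uint64_t adrp_result(uint64_t pc, uint32_t imm21)
{
    return (pc & ~0xFFFULL) + (uint64_t)(sext21(imm21) * 4096);
}
```

For an instruction at 0x400123, adr with an immediate of 0x10 gives 0x400133, while adrp with an immediate of 1 gives 0x401000, the start of the next page. The low 12 bits of the final address are then typically supplied by a following add or by the offset of a load.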


Here is a summary table to help you keep track of what has been discussed so far in regard to the branch instructions:


Here is a simple loop and its arm equivalent:

x = 3;
while (x > 17) {

And here is a simple C program which makes use of the concepts that we discussed so far:

After compiling it, load it into gdb and disassemble its main function:

We will go through each line explaining what the corresponding command is doing:

  • +0 stp x29, x30, [sp, #-32]!

Push the frame pointer (fp) and link register (lr) to the stack. Before executing this instruction, the stack pointer (sp) points to 0x7ffffff9f0. The instruction will be completed in the following steps:

  1. sp -= 0x20 => sp = 0x7ffffff9d0
  2. Store x29 at 0x7ffffff9d0
  3. Store x30 at 0x7ffffff9d8
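The three steps above can be sketched in C as follows. This is just a model of the address arithmetic of a pre-indexed stp; the StpResult struct and the function name are hypothetical, and the actual stores to memory are not simulated.

```c
#include <stdint.h>

typedef struct {
    uint64_t sp;       /* stack pointer after the write-back */
    uint64_t x29_addr; /* address where the first register is stored */
    uint64_t x30_addr; /* address where the second register is stored */
} StpResult;

/* Hypothetical model of "stp x29, x30, [sp, #offset]!".
 * Pre-indexing means sp is updated first and then used as the base. */
StpResult stp_pre_index(uint64_t sp, int64_t offset)
{
    StpResult r;
    r.sp = sp + (uint64_t)offset; /* step 1: sp = sp + offset */
    r.x29_addr = r.sp;            /* step 2: first register at [sp] */
    r.x30_addr = r.sp + 8;        /* step 3: second register at [sp + 8] */
    return r;
}
```

Calling stp_pre_index(0x7ffffff9f0, -32) gives sp = 0x7ffffff9d0, with x29 stored at 0x7ffffff9d0 and x30 at 0x7ffffff9d8.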
  • +8 mov w1, #0x1 and +12 mov w0, #0xa

The instructions above will prepare the call to the looper function by storing its parameters looper(10,1) to w0 and w1.

  • +16 bl 0x5555550774 <looper>

Notice that before branching to 0x5555550774 the program counter points to 0x…07cc:

The bl instruction will first store the return address to the link register thus

lr = pc + 4 => lr= 0x…7d0

And finally take the branch:

Inside the looper function we have the following:

The instructions:

  • <+0> sub sp, sp, #0x20, <+4> str w0, [sp, #12] and <+8> str w1, [sp, #8]

will set up the stack and store the function's parameters on it. Similarly, str wzr, [sp, #28] will store the value zero on the stack, which after this instruction will be as follows:

Next comes the actual loop. The w1 register will hold our integer variable i, and so we have the following:

The green block increases the second parameter of the function by one (b++). Indeed, the value at address sp+8 holds this parameter, which is loaded to w0, increased by one (at +24) and stored back to sp+8. The yellow block does exactly the same thing for the local variable i. Finally, at offsets 44 → 56 the incremented value of i is loaded to w1, the threshold is loaded to w0, the two values are compared, and if w1 < w0 the loop continues. When w1 becomes equal to w0 the b.lt is not taken, and the code continues in order to store the return value to w0, restore the sp and use ret to set the program counter to the value stored in the link register:
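Putting the pieces of the walkthrough together, the loop behaves roughly like the C function below. Keep in mind that this is a reconstruction from the description above and not the original source: the function name looper_sketch, the parameter name limit and the choice of b as the return value are all assumptions.

```c
/* Hypothetical reconstruction of the loop described above.
 * limit : first parameter, the threshold (kept at sp+12)
 * b     : second parameter, incremented on every iteration (kept at sp+8)
 * i     : local counter, initialised with wzr (kept at sp+28) */
int looper_sketch(int limit, int b)
{
    int i = 0;          /* str wzr, [sp, #28] */
    while (i < limit) { /* compare i with the threshold, b.lt back to the body */
        b++;            /* load from sp+8, add 1, store back */
        i++;            /* load from sp+28, add 1, store back */
    }
    return b;           /* assumption: the value loaded to w0 before ret is b */
}
```

Under these assumptions a call such as looper_sketch(10, 1) runs the body ten times, so b goes from 1 to 11.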

The last instruction will bring us back to main:

Not much different than the previous call, our printf takes two parameters:


As happened before, the address of the “%d\n” string will be stored to x0 and the result from the looper will be stored to w1:

Finally, after returning from printf and then returning from main, we have the call to exit:


