Saturday, May 6, 2017

ARM Assembly: Conditional Branching

One of the most common uses of branching in programs is if statements.  If statements require conditional branches.  Conditional branches only branch if certain conditions are met, otherwise they do nothing.

ARM has a list of 17 condition codes.  One, AL, means "always".  It makes an instruction run unconditionally, but since this is the default, it is implied and we never actually write it.  Two of the others are aliases for other condition codes.  This tutorial is not going to cover all of the remaining 16 condition codes.  We will look at the two most common ones, plus a few others.  If you want a complete list, ARM has a quick reference card that lists them all, which can be found with a simple Google search for "ARM Thumb2 QRC".  (I won't guarantee this link will never break, but as of this writing, it can be downloaded from ARM's website here.  The condition codes are on the last page.)

The processor has some behind-the-scenes mechanics going on to make all of this work.  We will look at it later.  Right now, we will just discuss the conditions that satisfy some of the condition codes.

The two most common condition codes are EQ and NE.  EQ means equal, and NE means not equal.  An if statement that is testing equality will use one of these.  Condition codes generally apply to some previous instruction that does some operation.  If that operation satisfied the conditions of the code, then the instruction the condition code applies to is executed, otherwise it is not.

ARM allows condition codes on a lot of instructions, but right now we are only going to look at conditional branches, as most assembly languages have very few conditional instructions that are not branches.

Let's write a program.
.text
.balign 4
.global _start
_start:
    mov r0, #12

    cmp r0, #10
    bne then
    mov r0, #1
    b endif
then:
    mov r0, #0
endif:

    mov r7, #1
    svc #0
This program starts by putting 12 in r0.  Then we use the cmp instruction.  This instruction subtracts the second value from the first value, but it does not store the result.  So, in this case, the instruction is doing 12 - 10 = 2.  The cmp instruction tells the processor some information about the result of the operation.  The NE condition code checks whether the cmp instruction resulted in a 0.  If it did not, then the bne instruction branches to the then label.  This is equivalent to a C or C++ if (12 == 10) with an else: the code right after bne runs when the values are equal, and the code at the then label runs when they are not.  It may seem counterintuitive to use the not equal condition, but if we want the "if" code to be positioned before the "then" code, we have to do it this way.  Of course, we could reverse the if and then code, and then we would be able to use beq instead of bne.  This is often a case of personal preference, but if you know one block will be run significantly more often, position it second.  This is because the processor tries to predict whether the branch will happen or not, and some processors just assume it will.  This means that when it does not, the processor has to back up and redo some work, wasting some time.  If the more frequently run block is where the conditional branch goes to (the "then" block, in this case), your program may perform better on such a processor.
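
As a rough sketch of the reversed layout mentioned above, here is the same program using beq.  The label name equal is my own choice for illustration, and lines starting with @ are comments in GNU assembler syntax.
.text
.balign 4
.global _start
_start:
    mov r0, #12

    cmp r0, #10
    beq equal          @ branch only when r0 == 10
    mov r0, #0         @ not equal: falls through to here
    b endif
equal:
    mov r0, #1         @ equal: only reached through the beq
endif:

    mov r7, #1
    svc #0
This should behave the same as the bne version.  Since 12 is not equal to 10, the beq is not taken and the program exits with 0 in r0.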

EQ and NE are the most frequently used condition codes, because testing for equality is the most common condition.  What if you need to test for inequality, for example, less than or greater than?  There are condition codes for this as well.  In fact, most of the condition codes are for some kind of inequality.  The MI (minus) condition code tests for the result to be negative.  HI (higher) is for unsigned greater than.  GT (greater than) is for signed greater than.  There are also codes for both signed and unsigned less than, and signed and unsigned greater than or equal and less than or equal.  There are also a couple for checking for overflow, so that we can catch it when an integer rolls over.
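
To illustrate why there are separate signed and unsigned codes, here is a small sketch that runs the same comparison twice, once with GT and once with HI.  The label names are hypothetical, and I am assuming the same Linux exit convention as the program above, where the value in r0 becomes the exit status.
.text
.balign 4
.global _start
_start:
    mov r0, #0
    mvn r1, #0          @ r1 = 0xFFFFFFFF: -1 if signed, 4294967295 if unsigned

    cmp r1, #1
    bgt signed_gt       @ signed: -1 > 1 is false, so this is not taken
    b signed_done
signed_gt:
    add r0, r0, #1
signed_done:

    cmp r1, #1
    bhi unsigned_gt     @ unsigned: 4294967295 > 1 is true, so this is taken
    b unsigned_done
unsigned_gt:
    add r0, r0, #2
unsigned_done:

    mov r7, #1
    svc #0
The bits in r1 are the same for both comparisons.  Only the condition code decides whether they are treated as signed or unsigned.  Here only the HI branch is taken, so the exit status should be 2.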

Let's try the signed greater than (GT).  I also want to introduce another mechanic here.  The cmp instruction does subtraction, and it does not store the result.  This is often what we want, but not always.  Sometimes we do want to save the result.  Sometimes we don't want to do subtraction.  Many of the math instructions accept an optional S at the end, which tells the processor to keep track of the same things it does for the cmp instruction.  To demonstrate these things, we will also test EQ, but after an add instruction instead of a cmp.
.text
.balign 4
.global _start
_start:
    mov r1, #17
    mov r2, #23
    subs r2, r2, r1
    bgt if_0
    b endif_0
if_0:
    add r0, r0, #1
endif_0:

    adds r1, r1, r2
    beq if_1
    b endif_1
if_1:
    add r0, r0, #2
endif_1:

    mov r7, #1
    svc #0
We start by putting some values in r1 and r2.  I am saving r0 for keeping track of what succeeded and what did not (because we are not using gcc, the initial value in r0 is 0).  Notice that the sub instruction has an extra s at the end of it this time.  This tells it to update the processor state with some information about the result of the operation.  Since we subtracted 17 from 23, we ended up with a positive value, which indicates that r2 was indeed greater than r1.  Because the result is greater than 0, the GT condition is satisfied and we add 1 to r0.  My if statements are a bit ugly.  In real life, we would optimize this by just skipping the body of the if statement with a less-than-or-equal-to condition, instead of using the GT condition.  Next, we do an add, with the s, and then check the EQ condition.  Note that EQ is only true when the result is 0, so using add may give unintuitive results.  Here, 17 + 6 = 23, which is not 0, so the second if body is skipped.  I could add 5 to 5, and it would not satisfy an EQ condition.  If I added 5 and -5, then it would satisfy that condition.  We typically use sub or cmp for conditionals, but once you understand the different condition codes, you can get away with using other math instructions, so long as you understand the consequences.
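
Here is a sketch of the tidier form described above, where each conditional branch skips the body instead of jumping into it.  To also show EQ being satisfied after an add, I changed the second test to add -6 to the 6 left in r2.  That change, the use of r3, and setting r0 to 0 explicitly are my own choices for illustration.
.text
.balign 4
.global _start
_start:
    mov r0, #0          @ set explicitly instead of relying on the initial value
    mov r1, #17
    mov r2, #23
    subs r2, r2, r1     @ r2 = 23 - 17 = 6
    ble endif_0         @ skip the body unless the result is greater than 0
    add r0, r0, #1
endif_0:

    mvn r3, #5          @ r3 = -6 (bitwise NOT of 5)
    adds r3, r3, r2     @ -6 + 6 = 0
    bne endif_1         @ skip the body unless the result is 0
    add r0, r0, #2
endif_1:

    mov r7, #1
    svc #0
Each if now needs one branch instead of two.  Both bodies run in this version, so the exit status should be 3.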

The other branch instructions can also be conditional.  Just add the appropriate condition code to the end of the instruction.  So, bl might be bleq, if you only want to call a function if two values are equal.
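
As a minimal sketch of a conditional call, the program below only calls a function when the comparison is equal.  The function name set_flag is hypothetical, and I am assuming the code is assembled as ordinary ARM instructions (no .thumb directive), where a condition code can be attached to bl directly.
.text
.balign 4
.global _start
_start:
    mov r0, #0
    mov r1, #7
    cmp r1, #7
    bleq set_flag       @ bl + eq: call set_flag only when r1 == 7

    mov r7, #1
    svc #0

set_flag:
    mov r0, #1          @ record that the function was called
    bx lr               @ return to the instruction after the bleq
Since r1 is 7, the call happens and the exit status should be 1.  If you change the cmp to compare against a different value, the bleq does nothing and the exit status stays 0.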

Conditional branching is something you will want to get comfortable with, because it is the basis of if statements and loops.  You will likely find that the most efficient way to use these is not the most intuitive.  This is one place that is very commonly optimized, because the best solutions are often very unintuitive.
