Saturday, February 25, 2017

ARM Assembly: Basic Math

Programming is all about math.  Every program uses copious amounts of math.  Even a simple program that prints out a single line of text is using math somewhere, to figure out how long the text is and when it is done printing all of the characters.  While traditional programming languages may take care of some of this math for you, assembly language leaves most of the math to the programmer.  This means we need to learn how to do simple math in assembly, before we can really do much else.

The ARMv7 architecture has a large collection of math instructions.  It has several addition and subtraction instructions, and it has many different multiply instructions.  The ARMv7-R architecture also includes two division instructions.  Our Pi2 uses ARMv7-A, which does not require division instructions.  However, the specific processor, the Cortex-A7, implements something called the Virtualization Extensions, which add those division instructions.  This means we can add, subtract, multiply, and divide with fairly simple instructions.

We will use GCC again for this one, because we don't know how to print stuff to the screen without it yet.  We really need to learn to do math before we can stop relying on the C libraries for things.

There are five math instructions we care about right now.  The architecture has several times that many, but most of them are for specific cases that are less common.  We will look at more of them later.  For now, we care about the add, sub, mul, udiv, and sdiv instructions.  These will allow us to add, subtract, multiply, and divide.  Notice there are two divide instructions, one for unsigned math and the other for signed math.  The other three operations only have one instruction each, because two's complement math just works out for them, without the need to distinguish between signed and unsigned values.
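To see why division needs two instructions while addition gets by with one, here is a small sketch in C that models the operations on raw 32-bit patterns.  The function names are made up for illustration, and the signed version assumes the compiler converts between uint32_t and int32_t using two's complement, which GCC does.

```c
#include <stdint.h>

/* Models add: the result bits are the same whether we think of the
   inputs as signed or unsigned, so one instruction is enough. */
uint32_t add_model(uint32_t rn, uint32_t op2)
{
    return rn + op2;            /* wraps around modulo 2^32 */
}

/* Models udiv: treats both bit patterns as unsigned numbers.
   Assumes rm is not zero. */
uint32_t udiv_model(uint32_t rn, uint32_t rm)
{
    return rn / rm;
}

/* Models sdiv: treats the same bit patterns as signed numbers.
   Assumes rm is not zero and that we are not dividing INT32_MIN by -1. */
uint32_t sdiv_model(uint32_t rn, uint32_t rm)
{
    return (uint32_t)((int32_t)rn / (int32_t)rm);
}
```

Take the bit pattern 0xFFFFFFF8, which is -8 if we read it as signed.  udiv_model(0xFFFFFFF8, 2) gives 0x7FFFFFFC, while sdiv_model(0xFFFFFFF8, 2) gives 0xFFFFFFFC (which is -4).  add_model(0xFFFFFFFF, 1) gives 0 either way.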

The syntax for add and sub is the same.  According to ARM's documentation the syntax looks like this:
ADD{S} Rd, Rn, <Operand2>
SUB{S} Rd, Rn, <Operand2>
It helps to understand the notation the documentation uses.  The part before the {S} is the actual instruction.  The {S} is an optional character that means the instruction should update the condition flags, which we will discuss later.  Next is a list of what are essentially arguments.  Rd always represents the destination register.  Rn is just a register containing an input value.  <Operand2> can be a number of things.  The two most important are another register containing an input value and an immediate value of type #<imm8m> (a number that fits certain requirements).  We will look at other things <Operand2> can be later.
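Those "certain requirements" are worth a quick sketch.  As far as I know, in ARM (not Thumb) instructions the immediate has to be an 8-bit value rotated right by an even number of bits.  The C function below is a hypothetical helper (it is not part of any toolchain) that checks a constant against that rule.  The assembler has a few tricks for constants that fail it, so treat this as a rough guide rather than the final word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* Returns true if value can be written as an 8-bit number rotated
   right by an even amount (0, 2, 4, ..., 30).  Rotating the value
   left by the same amount must leave something that fits in 8 bits. */
bool fits_imm8m(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(value, rot) <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

Small numbers like 12 and 14 pass, and so do values like 0x3FC or 0xFF000000.  Something like 0x101 does not, because its set bits are too far apart to fit in 8 bits.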

The syntax for mul, udiv, and sdiv is different:
MUL{S} Rd, Rm, Rs
UDIV Rd, Rn, Rm
SDIV Rd, Rn, Rm
You will notice that mul has the {S} option, but the divide instructions do not.  As with the other instructions, Rd is the destination register.  Rn and Rm are registers containing values.  Rs is also just a register containing a value.  Unlike <Operand2>, none of these can be an immediate value or a shifted register, so every input has to be in a plain register.

So now we know the syntax for the basic math instructions.  How do we actually use them?  Well, let's look at one more thing from the documentation, and then we will try it out.

The documentation describes the action of these instructions using mathematical notation, and understanding this can really help you understand what an instruction is doing.  The five instructions above have their actions defined as follows (the DIV line covers both udiv and sdiv):
ADD - Rd := Rn + <Operand2>
SUB - Rd := Rn - <Operand2>
MUL - Rd := (Rm * Rs)[31:0]
DIV - Rd := Rn / Rm
These all mean pretty much the same thing: the operation is applied to the second and third operands, and the result is stored in Rd.  For subtraction and division, keep in mind that order matters (take a peek at the math notation for the RSB instruction, and compare it to SUB).  In the multiplication one, notice the [31:0].  This specifies that only the lowest 32 bits of the result are stored in Rd (which makes sense, given that Rd is a 32 bit register).  The same is true of addition: when the result does not fit in 32 bits, only the lowest 32 bits are kept.
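If the notation feels abstract, it may help to see it modeled in C.  This is only a sketch with made-up function names: it computes the full 64 bit product and then keeps bits [31:0], and it shows how SUB and RSB differ only in the order of their operands.

```c
#include <stdint.h>

/* MUL - Rd := (Rm * Rs)[31:0]
   Compute the full 64-bit product, then keep only the low 32 bits. */
uint32_t mul_model(uint32_t rm, uint32_t rs)
{
    uint64_t full = (uint64_t)rm * rs;
    return (uint32_t)full;
}

/* SUB - Rd := Rn - <Operand2> */
uint32_t sub_model(uint32_t rn, uint32_t op2)
{
    return rn - op2;
}

/* RSB - Rd := <Operand2> - Rn  (same operands, reversed order) */
uint32_t rsb_model(uint32_t rn, uint32_t op2)
{
    return op2 - rn;
}
```

For example, mul_model(0x10000, 0x10000) is 0, because the true product is 2^32 and none of it fits in the low 32 bits.  sub_model(12, 7) is 5, while rsb_model(12, 7) is 0xFFFFFFFB, which is -5 in two's complement.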

Actually using these is pretty simple.  Let's say we want to add 12 and 14.  We would start by putting one of the numbers in a register.  Then we could use an add instruction with an immediate value for the other one.  Alternatively, we could just put both values in registers and add them.
mov r1, #12
add r0, r1, #14
This code will put the number 12 into register r1.  Then it will add the value (12) in r1 to 14 and store the result (26) in r0.  Note that r1 will still have 12 in it when this is done.  We could have written add r1, r1, #14, and it would have overwritten r1 with the result of 12 + 14 (26).  If you don't care about intermediate values, you can just reuse their registers like this, but if you need to keep an intermediate value, make sure you save the result somewhere else.

We can write a simple program that does this addition with both input values in registers like this:
.text
.global main
main:
    mov r0, #12       @ r0 = 12
    mov r1, #14       @ r1 = 14
    add r0, r0, r1    @ r0 = r0 + r1 = 26
    bx lr             @ return from main; r0 holds the return value
Since we don't need to store anything in memory, we will start with the text section.  Because we are using GCC to compile, we need to start with a main function.  Then, we put 12 in r0 and 14 in r1.  Next we add r0 and r1 and store the result in r0.  The last line, bx lr, returns from the function.  You may recall from a previous article that lr is the link register, where the return address is stored when we call a function.  main is just a function called by the C startup code, so the return address is stored in lr.  When we are done, we return with bx, which takes a register containing an address to go to.  So bx lr returns from main.  Once you are done, save your program as add.s.

Now we can compile the program.  Run gcc -o add add.s to compile it, then run ./add.  The anticlimactic result is...nothing.  The program does not print anything to the screen, because we did not tell it to.  It did return something though.  Linux programs return an error code (also called an exit status) when they exit, and for a program like ours, that is whatever main leaves in r0 when it returns.  Notice our add instruction stored its result in r0, so the error code should be 26.  We can view this error code by running echo $?.  (Note that every program returns an error code, so this will only work if you run it directly after add.)  This should display the number 26.
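One thing to keep in mind if you try other numbers: on Linux, only the lowest 8 bits of the value make it back to the shell, so this trick only works for results from 0 to 255.  Here is a tiny C model of that, with a made-up function name, assuming the program exits normally by returning from main.

```c
/* Models what echo $? shows after a program returns r0_value from main.
   Only the low 8 bits of the value are reported to the shell. */
unsigned shell_status_model(int r0_value)
{
    return (unsigned)r0_value & 0xFFu;
}
```

So 26 comes back as 26, but a result of 300 would show up as 44, and 256 would show up as 0.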

From here the rest of the math instructions are fairly simple.  Multiplication and division require all operands to be in registers.  Subtraction works like addition.  Now let's write a bigger program that will print the values to the screen, instead of just returning one value as an error code.

Because we are compiling with GCC, we have free access to built-in C functions.  This means we can use printf() to output text to the console.  According to the documentation, the signature for this function is int printf(const char *format, ...);.  To call this function, we need to know a little bit about C calling conventions for ARM.  Right now, there are two important things we need to know.  The first is that the return value is always placed in r0.  This is less important than the second, which is that arguments are passed in r0 to r3.  If there are more than four arguments, additional ones are passed on the stack.  Since we have not learned this yet, we will stick to four arguments or fewer.  One more important thing to know is that arguments are ordered in the registers the same as they would be ordered in the function call in C.

Knowing this, here is what we can get from the function signature of printf():
  • When it returns, the return value will be in r0.
  • The first argument must be placed in r0 before calling printf().
  • The first argument is a pointer to a cstring.  Note that cstrings are null terminated.
  • The second argument must be placed in r1, the third in r2, and the fourth in r3.
  • We are going to avoid more than four arguments, because we have not learned to pass arguments on the stack yet.
Let's examine a simple program that prints out a single string using printf().
.data
string:
    .asciz "Print me!\n"

.text
.global main
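main:
    push {r12, lr}    @ save lr; r12 is pushed only to keep the stack 8-byte aligned
    ldr r0, =string   @ r0 = address of string (first argument)
    bl printf         @ call printf(); this overwrites lr
    pop {r12, lr}     @ restore lr (and r12)
    bx lr             @ return from main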

Save this as print.s, then compile it with gcc -o print print.s.  Now you can run ./print, and it will display the text "Print me!" followed by a newline.

What, exactly, is going on here?  We start by defining a data section.  Programs are composed of several sections.  The text section is where the executable code goes.  The data section is where global variables and constants go.  There is also a section for uninitialized data, and there are special sections that can be used for other things.  In fact, we could have put our string in a special section for read-only data, but we won't worry about that right now.  In the data section we create a label, which is nothing more than a symbol for referencing a specific location in memory.  Then we put some data in the data section.  The .asciz directive tells the assembler that we want to create a null (or zero) terminated string.  Next, we define the string.  The assembler will transparently add a null character to the end when it assembles the program.  Later we will need to create strings that are not null terminated, and we will use the .ascii directive for that.
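The only difference between the two directives is that one extra byte.  As a rough sketch in C (the function names are invented for this example), the number of bytes each directive would emit for a given string looks like this:

```c
#include <stddef.h>
#include <string.h>

/* Bytes emitted by .ascii: just the characters themselves. */
size_t ascii_bytes(const char *text)
{
    return strlen(text);
}

/* Bytes emitted by .asciz: the characters plus one terminating zero byte. */
size_t asciz_bytes(const char *text)
{
    return strlen(text) + 1;
}
```

For "Print me!\n" that works out to 10 bytes with .ascii and 11 bytes with .asciz.  printf() relies on that last zero byte to know where the string ends.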

Once our data is in the program, we will write our code in the text section, in the main function.  Now, I mentioned earlier that when a function is called, the call overwrites the link register with the new function's return point.  Since main() is a function, its own return location is currently stored in lr.  When we call printf(), this will be overwritten, and if we don't save it somewhere else, main() won't know where to return to.  The first line of our main() function takes care of this.  It stores the contents of r12 and lr on the stack, and it updates the stack pointer, decreasing it by 8.  We will look at exactly why it does this later.  Next, we load the address of string into r0.  This is the cstring pointer that printf() expects for its first argument.  We don't have any more arguments for printf(), so next we call it.  Because we are using GCC, we don't have to do anything special; it knows where to find printf() for us.  printf() stores its return value in r0 before it returns, but we don't really care about that.  (Note, however, that since we don't change r0 after this, whatever it left there will be the error code for our program.)  Now, we need to get the saved value of lr back off of the stack, so we use the pop instruction.  Now we can return, using bx on the link register.

There are a few things you may have noticed here.  First, we are pushing and popping two registers.  Why are we storing r12, when it really never gets used anywhere?  The answer is that printf() and many other C functions and external interfaces (like the OS) expect the stack to be 8-byte aligned.  We will talk about alignment later, but the important part here is that if we push just one register, the stack will only be 4-byte aligned, so we are pushing an extra one to keep it 8-byte aligned.
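The arithmetic behind this is simple enough to sketch in C.  The functions and the starting address below are made up for illustration; the only facts they rely on are that each pushed register takes 4 bytes and that the stack grows toward lower addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stack pointer after pushing reg_count 32-bit registers. */
uint32_t sp_after_push(uint32_t sp, unsigned reg_count)
{
    return sp - 4u * reg_count;
}

/* An address is 8-byte aligned when its lowest three bits are zero. */
bool is_8_byte_aligned(uint32_t sp)
{
    return (sp & 7u) == 0;
}
```

If sp starts at an 8-byte aligned address such as 0x7EFFF000 (a hypothetical value), pushing two registers leaves it at 0x7EFFEFF8, which is still aligned.  Pushing only lr would leave it at 0x7EFFEFFC, which is not.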

Now that we know how to print stuff to the screen, let's do some subtraction!
.data
string:
    .asciz "%d - %d = %d\n"

.text
.global main
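main:
    push {r12, lr}    @ save lr; r12 keeps the stack 8-byte aligned
    ldr r0, =string   @ r0 = address of the format string (first argument)
    mov r1, #12       @ r1 = 12 (second argument)
    mov r2, #7        @ r2 = 7 (third argument)
    sub r3, r1, r2    @ r3 = r1 - r2 = 5 (fourth argument)
    bl printf         @ call printf()
    pop {r12, lr}     @ restore lr (and r12)
    bx lr             @ return from main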

Save this as sub.s, then compile with gcc -o sub sub.s.  Now run ./sub.  Verify that the output is correct.

This is very similar to the previous program, except that we are having printf() display the values used in our math.  Now, we could have done the math in any registers, but I specifically chose the ones I did, to avoid having to rearrange things later for printf().  As before, we load the address of string into r0.  Then we put our operands into r1 and r2 in the order we want printf() to display them.  Lastly, we do our subtraction, placing the result in r3, as the fourth argument of printf().  Everything after this is identical to our previous program.  Being able to print out results like this is a major step in being able to debug larger programs, though once we let go of the C library, we will have to find other ways.
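For comparison, here is roughly what the same call looks like in C.  The wrapper function is made up for this example; the comments show which register each argument lands in under the calling convention described earlier.

```c
#include <stdio.h>

/* Prints "a - b = result" the same way the assembly program does.
   Returns whatever printf() returns (the number of characters printed). */
int print_subtraction(int a, int b)
{
    return printf("%d - %d = %d\n",  /* r0: pointer to the format string */
                  a,                 /* r1: first operand */
                  b,                 /* r2: second operand */
                  a - b);            /* r3: result of the subtraction */
}
```

Calling print_subtraction(12, 7) prints "12 - 7 = 5" followed by a newline, which is what ./sub should display.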

Multiplication will be left as an exercise for the reader.  Really, this should be trivial, given what you have already done.  Division will be discussed in a separate post, as there are some additional requirements to use the division instructions.
