Saturday, May 6, 2017

ARM Assembly: Function Pointers

Now that we have been introduced to pointers, it is time to look at function pointers.  An important fact to keep in mind here is that the processor does not distinguish between instructions and data.  If you tell the processor to run code at a particular location, it will interpret whatever is there as code, even if it is not intended to be code.  Types only really exist for the programmer.  It might help if you think of instructions as just another type.  Integers are 4 byte values that represent whole numbers.  Floats are 4 byte values that represent numbers which can have fractional parts.  Instructions are 4 byte values that are intended to be executed.  All of these are stored exactly the same way, as a sequence of 32 bits.  This means that interpretation depends on how you use them.  If you set the pc to the address of some data, the processor will interpret that data as instructions.  If you perform a floating point operation on an integer, it will happily interpret it as whatever floating point value has that bit pattern.  The processor does not know or care what a given item in memory represents.  It will treat it however you tell it to.
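To make this concrete, here is a minimal sketch in which one instruction is written as a plain data word instead of a mnemonic.  It assumes 32 bit ARM state on little-endian Linux, as in the rest of this series.  The constant is the standard ARM encoding of mov r0, #13, so the processor should execute it exactly as if the mnemonic had been written.

.text
.balign 4
.global _start
_start:
    .word 0xe3a0000d    @ a data word with the same bits as: mov r0, #13

    mov r7, #1          @ exit system call
    svc #0              @ exit status is whatever is in r0

The assembler sees a number and the processor sees an instruction.  Neither one is wrong, because the bits are the same either way.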

This means that function pointers are just pointers.  When used correctly, they will point to values that you intend to be interpreted as instructions.  The processor does not know this though.  It just knows that you have a value in a register, and when you use it to call a function, it will interpret it as a pointer to executable code.

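We typically get a function pointer the same way we get any other pointer.  We load the address into a register using the label of the function and a memory read.  (The ldr r4, =fun form used here tells the assembler to store the address of fun in a literal pool and load it from there.)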
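
.text
.balign 4
.global _start
_start:
    ldr r4, =fun    @ r4 = address of fun (our function pointer)
    blx r4          @ call the function r4 points to, return address goes in lr

    mov r7, #1      @ exit system call
    svc #0          @ exit status is the value fun left in r0

fun:
    mov r0, #13     @ return value
    bx lr           @ return to the caller
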
We could just call the function with bl fun.  This is significantly more efficient, because the offset to the function is encoded directly in the instruction, so we don't need an extra memory read.  What if our program is really big though, and the function is too far away in memory to encode the relative address in the bl instruction?  (Memory addresses are 32 bits.  Instructions are also 32 bits.  This means that an instruction cannot contain a whole memory address.  The ARM processor deals with this by using relative addresses.  Instead of writing the new address to the pc, it will add or subtract some value.  This value has a limited size though: bl can reach roughly 32 MB in either direction.  Calls to addresses farther away than that must use a register that holds the entire address.)

The program starts by loading the memory address of our function into r4.  This is our function pointer.  To call a function through a function pointer, we use a blx instruction with the register as its operand.  The function returns 13 in r0, which becomes the exit status of the program.
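For comparison, here is a sketch of the same program using the direct call mentioned above.  Nothing is new apart from replacing the ldr and blx pair with a single bl; the assembler and linker work out the relative offset.

.text
.balign 4
.global _start
_start:
    bl fun          @ the offset to fun is encoded in the instruction itself

    mov r7, #1      @ exit system call
    svc #0          @ exit status is r0, so 13 again

fun:
    mov r0, #13
    bx lr

Both versions should behave identically.  The difference is only in where the target address comes from.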

You can do this with any function in your file, or any function declared as global in another file.  If we were using GCC to link against the C library, we could get a function pointer for printf.  Just like any other value in memory, we can get the address of a function using its label.
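As a rough sketch of the printf case, the following assumes the file is assembled in ARM state and built with gcc on 32 bit ARM Linux, so the entry point is main rather than _start.  The msg label and its text are made up for this example.  Toolchains that build position independent executables by default may need the -no-pie option for the absolute addresses used here.

.data
msg:
    .asciz "called through a pointer\n"   @ hypothetical message

.text
.balign 4
.global main
.type main, %function
main:
    push {r4, lr}       @ save r4 and the return address (keeps sp 8 byte aligned)
    ldr r4, =printf     @ function pointer to the C library's printf
    ldr r0, =msg        @ first argument: the format string
    blx r4              @ call printf through the pointer
    mov r0, #0          @ return 0 from main
    pop {r4, pc}        @ restore r4 and return

The call itself looks exactly like the one to fun.  The only extra work is following the C calling convention, because printf expects it.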

Long jumps are not the only application for function pointers.  Function pointers can be used to select behavior without repeated conditionals.  For example, perhaps I am writing a program that needs to be able to use several different kinds of networks, but it will only use one at a time.  I could have if statements every time there is a network access.  This is inefficient, because every access pays for the comparison and branch, and the same selection logic is duplicated throughout the program.  What if instead I set some function pointers for the kind of network I want?  That takes one conditional, and then every network access can just use whatever function pointers I set.  If I need to change the type of network, I can replace the function pointers for the first type with the ones for the new type.
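Here is a minimal sketch of that idea.  The names send_fn, send_wired, and send_wireless are hypothetical, the two functions are stand-ins that only return a number, and the network type is hard coded where a real program would read a setting.

.data
.balign 4
send_fn:
    .word 0                     @ hypothetical slot for the selected function pointer

.text
.balign 4
.global _start
_start:
    @ The one conditional: pick the network type once.
    mov r0, #1                  @ made-up setting: 0 = wired, 1 = wireless
    cmp r0, #0
    ldreq r1, =send_wired
    ldrne r1, =send_wireless
    ldr r2, =send_fn
    str r1, [r2]                @ remember the choice

    @ Any later network access: no conditional, just call the pointer.
    ldr r4, =send_fn
    ldr r4, [r4]                @ load the stored function pointer
    blx r4

    mov r7, #1                  @ exit with whatever the chosen function returned
    svc #0

send_wired:                     @ stand-in for a real wired send routine
    mov r0, #10
    bx lr

send_wireless:                  @ stand-in for a real wireless send routine
    mov r0, #20
    bx lr

With the setting at 1, the program should exit with status 20.  Switching networks later would only mean storing a different address in send_fn; none of the call sites change.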

In assembly, you can also make anonymous functions, just by not specifying a label.  These must be called using function pointers, because there is no label to reference.  (This is a pretty advanced topic that I plan to cover much later in this series.)

There is one more kind of pointer to instructions that is not technically a function pointer, because it does not point to the start of a function.  In the above program, the last line is bx lr.  The lr register contains the address of an instruction somewhere within a function (in our program, the instruction right after blx r4 in _start).  This is the address that we want the function to return to.  Since we are returning to the middle of a function rather than calling one, we use bx instead of blx: blx would store a new return address in lr, which we do not need.  We can store memory addresses that are not the beginning of a function in registers, creating pointers to instructions.  Aside from returning from functions, this is not terribly common.  If we want to navigate within a function, we generally just use labels, as this is more efficient.  When we want to go to another function, we almost always want to go to the beginning or to a return point.  There may be a few other esoteric applications for this, but for the most part your program will be better structured and far easier to read if you avoid this.
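To see that lr really is just a pointer to an instruction, here is a sketch that sets it by hand and then calls the function with a plain bx.  It assumes ARM state, where reading the pc gives the address of the current instruction plus 8.  This is not a recommendation over blx, only an illustration of what the call and return are doing.

.text
.balign 4
.global _start
_start:
    ldr r4, =fun
    mov lr, pc      @ pc reads as this instruction's address + 8,
                    @ which is the instruction right after the bx below
    bx r4           @ plain branch: lr is left alone

    mov r7, #1      @ fun returns here, because this is where lr points
    svc #0

fun:
    mov r0, #13
    bx lr           @ jump to the instruction address held in lr

The program should still exit with status 13.  The bx lr at the end of fun neither knows nor cares how lr was filled in.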
