Saturday, May 6, 2017

ARM Assembly: Arrays and Structs

If you are doing these in order, you just learned about loops and indexing modes.  You now have the foundation for working with arrays and structs.

Structs are heterogeneous data structures.  This means that elements of a struct can and typically do have different data types or at least different meanings.  When we are mixing data types in a struct, there are some rules we need to follow.  The most important rule is that each variable should be aligned to its own size.  So, a short should be 2 byte aligned, an integer should be 4 byte aligned, and so on.  This may leave voids in our struct.  The rule with these voids is that they should never be used.

Technically, we already made a struct when we learned about global memory.  That struct contained 4 integers that represented an IP address.  Let's make a more complicated struct.
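.data
.balign 4               @ align the struct to its largest member (4 bytes)
struct:
    .word 32            @ offset 0: 4 byte integer
    .byte 12            @ offset 4: 1 byte value
    .skip 1             @ offset 5: void, so the short is 2 byte aligned
    .short 64           @ offset 6: 2 byte short
    .word 169           @ offset 8: 4 byte integer

.text
.global _start
_start:
    ldr r4, =struct     @ r4 = address of the struct
    ldr r0, [r4]        @ load the first integer
    ldrb r1, [r4, #4]   @ load the byte, zero extended
    ldrh r2, [r4, #6]   @ load the short, zero extended
    ldr r3, [r4, #8]    @ load the last integer

    add r0, r0, r1      @ sum everything into r0
    add r0, r0, r2
    add r0, r0, r3

    mov r7, #1          @ syscall 1 = exit, with r0 as the exit code
    svc #0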
We start by 4 byte aligning the beginning of the struct.  Typically we want to align the struct by the largest data type it contains.  So, if we were holding a 64 bit integer, we would want to 8 byte align it.  The first value in the struct is an integer, which is 4 bytes.  The second is a 1 byte value.  The next value is a 2 byte short, but the byte messed up our alignment.  So, we will skip 1 byte, which will 2 byte align the short.  The next value is another 4 byte integer.  If you add up the bytes from the byte, the void, and the short, you will find that we are already 4 byte aligned, so we don't need to skip anything here.
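
As a rough sketch of the 8 byte case, here is a hypothetical struct that holds a 64 bit value.  The label wide_struct and all of the values are made up for illustration, and the comments on the two halves assume a little-endian system, which is the usual setup for ARM Linux.

```asm
.data
.balign 8                   @ largest member is 8 bytes
wide_struct:
    .byte 7                 @ offset 0: 1 byte value
    .skip 3                 @ offsets 1-3: void, so the word is 4 byte aligned
    .word 1000              @ offset 4: 4 byte integer
    .quad 5000000000        @ offset 8: 8 byte integer, already 8 byte aligned

.text
.global _start
_start:
    ldr r4, =wide_struct
    ldrb r0, [r4]           @ the byte at offset 0
    ldr r1, [r4, #4]        @ the integer at offset 4
    ldr r2, [r4, #8]        @ low half of the 64 bit value
    ldr r3, [r4, #12]       @ high half of the 64 bit value

    mov r7, #1              @ exit, returning the byte (7)
    svc #0
```

The byte, the 3 byte void, and the integer add up to 8 bytes, so the 64 bit value lands on an 8 byte boundary without any extra skip.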

The code starts by getting the address of the struct.  We can use the offset indexing mode to pull out specific elements of the struct.  When calculating the offsets, we have to take into account the voids as well as the data elements.
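
One way to keep those offsets readable is to give them names with .equ.  This is only a sketch: the names are hypothetical, since the struct above never says what its fields mean, and the halfword is loaded with ldrh.

```asm
.equ FIRST_WORD, 0          @ 4 byte integer
.equ SMALL_BYTE, 4          @ 1 byte value
                            @ offset 5 is the void
.equ SMALL_SHORT, 6         @ 2 byte short
.equ LAST_WORD, 8           @ 4 byte integer

.data
.balign 4
struct:
    .word 32
    .byte 12
    .skip 1
    .short 64
    .word 169

.text
.global _start
_start:
    ldr r4, =struct
    ldr r0, [r4, #FIRST_WORD]
    ldrb r1, [r4, #SMALL_BYTE]
    ldrh r2, [r4, #SMALL_SHORT]
    ldr r3, [r4, #LAST_WORD]

    mov r7, #1              @ exit, returning the first integer
    svc #0
```

If the layout ever changes, only the .equ lines need to be updated, and the void is accounted for in one place.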

You may notice that I used ldrb and ldrh in there.  (The halfword load is spelled ldrh, or ldrsh when the value should be sign extended; there is no plain ldrs instruction.)  If we just do an ldr, we will end up telling our program to read 4 bytes every time.  For the byte, this means we will read the byte, the void, and the short, all into one register.  For the short, we would end up reading the short and then the lower two bytes of the integer right after it.  If we want ldr to behave correctly with smaller data types, we have to tell it when we need these types, otherwise it will assume we always want to read a full integer.
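
The smaller loads also come in signed forms.  Here is a small sketch of the difference, using a hypothetical pair of negative values.  The label values and the numbers are mine, not part of the struct above.

```asm
.data
.balign 4
values:
    .short -2               @ stored as 0xFFFE
    .byte -1                @ stored as 0xFF
    .skip 1                 @ void, pads the data out to 4 bytes

.text
.global _start
_start:
    ldr r4, =values
    ldrh r0, [r4]           @ zero extended: r0 = 0x0000FFFE (65534)
    ldrsh r1, [r4]          @ sign extended: r1 = 0xFFFFFFFE (-2)
    ldrb r2, [r4, #2]       @ zero extended: r2 = 0x000000FF (255)
    ldrsb r3, [r4, #2]      @ sign extended: r3 = 0xFFFFFFFF (-1)

    mov r0, #0              @ exit with 0
    mov r7, #1
    svc #0
```

Which form is right depends on how you intend to treat the value.  The bytes in memory are the same either way.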

Hopefully it is now obvious why the indexing mode we used is so valuable.

Arrays are typically homogeneous data.  They usually contain data that all has the same type and all has the same significance.  In assembly it is possible to mix data types in an array-like data structure, but if you do, you have to keep track of the types, so you can treat them properly.  We won't be doing that here.

An array is just a chunk of memory whose size is a multiple of the size of the data type we are storing in it.  The defining factor of an array is that the data elements stored all have the same significance.  When we learned about global variables, we created a struct containing an IP address.  The data type was homogeneous, but the significance of each element was different.  Each element was a different value in an IP address.  A trick for determining whether a data structure is an array or a struct is to consider what would happen if the order of elements was changed.  If changing the order would have no effect or only affect things like sorting, then it is probably an array.  If changing the order would completely change the meaning of the data, then it is probably a struct (with the IP address, changing order would make it an entirely different IP address).  That said, as with everything else related to data in assembly, what the data means depends entirely on how you treat it.  There are not really arrays or structs.  They are collections of data in memory that we use as arrays and structs.

Let's make an array.
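.bss
.balign 4
array:
    .space 64           @ 16 integers * 4 bytes each

.text
.global _start
_start:
    ldr r4, =array      @ r4 = base address of the array
    mov r0, #0          @ r0 = index, starting at element 0

    b init_loop_test
init_loop:
    str r0, [r4, r0, LSL #2]    @ array[r0] = r0 (offset = index * 4)
    add r0, r0, #1
init_loop_test:
    cmp r0, #16
    bne init_loop

    mov r7, #1          @ syscall 1 = exit, with r0 as the exit code
    svc #0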
We are going to create an integer array with 16 elements (16 * 4 = 64).  Because I don't want to type in .word and a value 16 times, I stuck it in the bss section, and I initialized it in a for loop.  The loop iterates through the elements using an index stored in r0.  I am also using the index as the value to store in each element.  When the loop finishes, each element of the array will contain its own index.  I am using the register offset indexing mode with a shift, to convert the indices into the right offset for each element.
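
To check that the loop really did fill the array, one option is to read the elements back.  The sketch below adds a second loop that sums them.  The labels sum_loop and sum_test and the use of r1 and r2 are my own choices for illustration.

```asm
.bss
.balign 4
array:
    .space 64

.text
.global _start
_start:
    ldr r4, =array
    mov r0, #0              @ index
    b init_test
init_loop:
    str r0, [r4, r0, LSL #2]    @ array[index] = index
    add r0, r0, #1
init_test:
    cmp r0, #16
    bne init_loop

    mov r0, #0              @ running sum
    mov r1, #0              @ index
    b sum_test
sum_loop:
    ldr r2, [r4, r1, LSL #2]    @ r2 = array[index]
    add r0, r0, r2
    add r1, r1, #1
sum_test:
    cmp r1, #16
    bne sum_loop

    mov r7, #1              @ exit, returning the sum
    svc #0
```

The elements are 0 through 15, so the sum returned as the exit code should be 120.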

When this program terminates, the number 16 will be returned as the exit code.  We will eventually learn to use the debugger.  Once we do that, it might be good to come back to this lesson and examine the contents of the array with the debugger.

In this case, we retained the original array address.  Sometimes this is what you want to do.  If we did not care about this, we could have used the [Rm], #<offset> mode, with an offset of 4.  This would have updated the pointer to point to the next element after writing to each one.  We could have eliminated the add instruction if we had done this.  The reason I did not is that I wanted to store the index of each element in it, so I had to keep track of the current index.
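
For comparison, here is roughly what the loop could look like with that post-indexed mode.  This sketch still keeps r0 as the index, so the add stays, and it moves a copy of the pointer in r5 so that r4 still holds the start of the array.  The choice of r5 is an assumption for illustration.

```asm
.bss
.balign 4
array:
    .space 64

.text
.global _start
_start:
    ldr r4, =array          @ r4 keeps the start of the array
    mov r5, r4              @ r5 is the pointer that moves
    mov r0, #0              @ r0 = index and value to store

    b init_loop_test
init_loop:
    str r0, [r5], #4        @ store at r5, then r5 = r5 + 4
    add r0, r0, #1
init_loop_test:
    cmp r0, #16
    bne init_loop

    mov r7, #1              @ exit, returning 16
    svc #0
```

Notice that no shift is needed here.  The pointer already moves one element at a time, so the index is only used as the value to store and as the loop counter.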

Also note that if I wanted to get a specific element out of the array, without traversing the entire thing, I could use the indexing mode we used above, and just set r0 to the index of the element.  This is something you generally cannot do with the immediate offset indexing mode, because an immediate offset is fixed when the program is assembled.
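
A minimal sketch of that kind of direct access, assuming a hypothetical pre-filled array called table in the data section:

```asm
.data
.balign 4
table:
    .word 10, 20, 30, 40, 50, 60, 70, 80

.text
.global _start
_start:
    ldr r4, =table
    mov r1, #5                  @ index of the element we want
    ldr r0, [r4, r1, LSL #2]    @ r0 = table[5], at byte offset 5 * 4 = 20

    mov r7, #1                  @ exit, returning the element (60)
    svc #0
```

Because the index lives in a register, it could just as easily come from a calculation or from user input instead of a mov.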
