Technium Adeptus: ARM Assembly: Advanced Integer Math

The ARM architecture provides some additional capabilities in its math instructions that allow multiple operations to be done in a single instruction. These capabilities also let us replace slower instructions with faster ones for certain math operations when one of the operands is a constant. Knowing how to use these instructions to their fullest will help you write faster, more compact programs.

In a previous article, we saw that some of the math instructions have an operand listed in the documentation as <Operand2>. This is a multi-purpose, or "flexible", operand. As we saw, it can be a register that holds a value, or an immediate value that fits certain constraints (in the 32-bit ARM instruction set, an 8-bit value rotated right by an even number of bits). It can also be a register whose value is shifted left or right by a number of bits before the instruction uses it.

You probably already know that arithmetic shifts are equivalent to multiplication and division by powers of 2: shifting left by n bits multiplies by 2^n, and shifting right by n bits divides by 2^n, rounding down. Since a logical left shift is the same as an arithmetic left shift, ARM just uses the term "logical shift left" and the syntax LSL for both. For right shifts, keep in mind that a logical shift (LSR) should be used for unsigned math, while an arithmetic shift (ASR) should be used for signed math.

We can also use add and subtract instructions with a shift on the last operand to multiply by values that are close to powers of 2. Why not just use a MUL instruction? Instruction timing is processor specific, and the processor we are using does not seem to have good published timing data, but it is generally safe to assume that multiplication, and especially division, is slower than addition, subtraction, and bit shifts. The MUL instruction is best used when the multiplier is not a constant, or when the equation cannot be reduced to a shifted ADD or SUB instruction.
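As a quick sanity check of these identities, here is a minimal C sketch (C is used only so the arithmetic can be checked on any machine; the function names are hypothetical). A compiler targeting ARM would typically turn each function into a single MOV with LSL or LSR, but that is an assumption about code generation, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 2^k with a left shift (the C counterpart of LSL #k).
   Bits shifted past bit 31 are lost, just as the high bits of a
   32-bit MUL result are. Shifting by 32 or more is undefined in C. */
uint32_t mul_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x << k;
}

/* Divide an unsigned value by 2^k with a logical right shift (LSR #k).
   The result is rounded down, the same as unsigned integer division. */
uint32_t udiv_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}
```

For example, mul_pow2(3, 4) is 3 * 16, and udiv_pow2(100, 3) is 100 / 8 with the fraction discarded.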

The first example is multiplication by 4. We won't even need ADD or SUB for this one, because the MOV instruction has a flexible operand as well. (Note that the 32-bit ARM instruction set does not have any dedicated shift instructions. The assembler accepts shift pseudo-instructions, but the machine code it generates is simply a MOV with a shifted operand.) To multiply the value in r0 by 4 and store the result in r1, use the following:

mov r1, r0, LSL #2

This multiplies the value in r0 by 4 and stores the result in r1. The shift amount can be anything from 0 to 31, so we can multiply by any power of 2 up to 2^31, as long as the result fits in 32 bits. If we want to multiply by 5, we use an ADD instruction:

add r1, r0, r0, LSL #2

This adds the value in r0 to the value in r0 multiplied by 4. In math, it looks like this: r1 = r0 + (r0 * 2^2), which is the same as r1 = r0 + (r0 * 4), or r1 = r0 * 5. Using the ADD instruction, we can multiply by one more than any power of 2. So, what can we do with SUB? We cannot use SUB directly to multiply by one less than a power of 2, because the operand that can be shifted is the one being subtracted: sub r1, r0, r0, LSL #n computes r1 = r0 - (r0 * 2^n), which negates the value and multiplies it by one less than a power of 2. We rarely need that. Far more often, we want to multiply by one less than a power of 2 without negating the value, and ARM has a special subtraction instruction for exactly that.
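The ADD-with-shift trick works for any multiplier of the form 2^k + 1. This C sketch (hypothetical function names, for illustration) mirrors add r1, r0, r0, LSL #k. Unsigned arithmetic is used so that overflow wraps modulo 2^32, exactly as it would in a 32-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* x * (2^k + 1), computed as x + (x << k).
   Mirrors: add r1, r0, r0, LSL #k */
uint32_t mul_pow2_plus1(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x + (x << k);
}

/* x * 5 is the k = 2 case: add r1, r0, r0, LSL #2 */
uint32_t mul5(uint32_t x)
{
    return mul_pow2_plus1(x, 2);
}
```

The same function gives x * 3 with k = 1, x * 9 with k = 3, and so on.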

With addition, order does not matter, but with subtraction it does, which is why ARM provides the RSB (reverse subtract) instruction. If the math for the SUB instruction looks like r0 = r1 - r2, the math for RSB looks like r0 = r2 - r1. If this does not make sense, consider that in normal subtraction, we subtract the second operand from the first, and in assembly, the second source operand is the flexible one. This means we can take a value in a register and subtract an immediate or shifted value from it, but we cannot start with an immediate or shifted value and subtract a register from it without an extra instruction. RSB lets us do this in a single instruction. So, let's multiply a value by 7 using RSB.

rsb r0, r1, r1, LSL #3

In math, we are doing this: r0 = (r1 * 2^3) - r1 = (r1 * 8) - r1 = r1 * 7. When we want to multiply by a constant that is 1 less than a power of 2, we will typically use the RSB instruction to do that.
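Here is the same idea sketched in C for multipliers of the form 2^k - 1, with the operand order of RSB spelled out in the comments. It also includes the plain SUB form described earlier, which produces the negated product. The function names are hypothetical, and unsigned arithmetic stands in for the 32-bit registers.

```c
#include <assert.h>
#include <stdint.h>

/* x * (2^k - 1), computed as (x << k) - x.
   Mirrors: rsb r0, r1, r1, LSL #k   (r0 = (r1 << k) - r1) */
uint32_t mul_pow2_minus1(uint32_t x, unsigned k)
{
    assert(k < 32);
    return (x << k) - x;
}

/* x - (x << k), which is -(x * (2^k - 1)): the negated product.
   Mirrors: sub r0, r1, r1, LSL #k */
uint32_t neg_mul_pow2_minus1(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x - (x << k);
}

/* x * 7 is the k = 3 case of the RSB form. */
uint32_t mul7(uint32_t x)
{
    return mul_pow2_minus1(x, 3);
}
```

The SUB version's result is the two's complement bit pattern of the negative product, so reading it as a signed value gives, for example, -42 for x = 6 and k = 3.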

Division is less flexible, because adding or subtracting another copy of the original value does not change the divisor the way it changes the multiplier. Using right shifts, though, we can still divide by powers of 2. For example, to divide by 16, we would use this:

mov r0, r1, LSR #4

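This works only for unsigned values. For signed values, use ASR instead of LSR. Be aware that ASR rounds toward negative infinity, while SDIV rounds toward zero, so the two give different results for negative values that are not exact multiples of the divisor: -7 shifted right by 1 with ASR gives -4, but -7 divided by 2 with SDIV gives -3.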
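This C sketch compares three ways of dividing by a power of 2: a logical shift for unsigned values, an arithmetic shift for signed values (rounding toward negative infinity), and a corrected signed version that rounds toward zero like SDIV. The correction, adding 2^k - 1 to negative dividends before shifting, is one common approach rather than the only one. The sketch assumes, as GCC and Clang do, that >> on a negative signed value is an arithmetic shift; the C standard leaves this implementation-defined. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned divide by 16: mov r0, r1, LSR #4 */
uint32_t udiv16(uint32_t x)
{
    return x >> 4;
}

/* Signed shift, like ASR #k: rounds toward negative infinity.
   Assumes >> on a negative int32_t is arithmetic (implementation-defined in C). */
int32_t asr_div(int32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}

/* Signed divide by 2^k that rounds toward zero, matching SDIV and C's '/'.
   For negative x, adds the bias 2^k - 1 before the arithmetic shift. */
int32_t sdiv_pow2(int32_t x, unsigned k)
{
    assert(k < 32);
    int32_t bias = (x >> 31) & (int32_t)((1u << k) - 1u); /* 0 when x >= 0 */
    return (x + bias) >> k;
}
```

Adding the bias cannot overflow, because it is only nonzero when x is negative.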

There is also a special technique for dividing by constants. It matters most on processors without integer division instructions, although compilers commonly use it even when a divide instruction is available, since a multiply is usually faster than a divide. It works by multiplying by a "magic number", a scaled fixed-point approximation of the reciprocal of the divisor, and then shifting the result right. We won't go any further into this, as we have access to integer division instructions, but you can learn more by Googling "magic number division -sports".
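For the curious, here is a minimal C sketch of the idea for one specific case, unsigned division by 3. The constant 0xAAAAAAAB is 2^33 / 3 rounded up; multiplying by it and discarding the low 33 bits of the 64-bit product gives x / 3 for every 32-bit x. This shows one well-known constant, not a general method for choosing magic numbers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x / 3 using a multiply by ceil(2^33 / 3) and a shift.
   On ARM, a compiler would typically emit a UMULL (32 x 32 -> 64-bit
   multiply) and then shift the high word right by 1. */
uint32_t udiv3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Spot-check the magic-number result against real division for 0..n-1. */
void check_udiv3(uint32_t n)
{
    for (uint32_t x = 0; x < n; ++x)
        assert(udiv3(x) == x / 3);
}
```

Why it is exact: writing x = 3q + r with r in {0, 1, 2}, the product divided by 2^33 equals q + r/3 + x/(3 * 2^33), and the fractional part stays below 1 for every 32-bit x, so the shift always lands on q.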

Note that the above strategies only work when one of the values is a constant (for division, the divisor). If neither value will be known until runtime, you will have to use MUL, SDIV, or UDIV. If your processor does not have integer division instructions, you will have to fall back on long division or repeated subtraction.
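When no divide instruction is available, the usual fallback is shift-and-subtract long division: binary long division done one quotient bit at a time. Below is a minimal C sketch for unsigned 32-bit values (udiv_long is a hypothetical name). It always takes 32 iterations, whereas simple looped subtraction can take up to dividend / divisor iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary long division: returns dividend / divisor and, if remainder is
   non-NULL, stores dividend % divisor there. The divisor must be nonzero. */
uint32_t udiv_long(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    assert(divisor != 0);
    uint32_t quotient = 0;
    uint64_t rem = 0;  /* 64 bits so (rem << 1) cannot overflow */

    for (int bit = 31; bit >= 0; --bit) {
        /* Bring down the next dividend bit, as in pencil-and-paper division. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        if (rem >= divisor) {      /* the divisor "goes into" rem once */
            rem -= divisor;
            quotient |= 1u << bit;
        }
    }
    if (remainder != NULL)
        *remainder = (uint32_t)rem;
    return quotient;
}
```

In assembly, the same loop maps onto shifts, a compare, a conditional subtract, and a conditional ORR, all instructions covered so far.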

You should spend some time looking over ARM's quick reference card for ARMv7 to see what other math instructions are available. It includes some more explicit combined-operation instructions, like multiply and subtract (MLS) and dual signed multiply and add (SMUAD, which multiplies two pairs of signed 16-bit values and then adds the products), as well as several other instructions that combine 16- and 32-bit multiplication with other operations. Most of these exist because they are useful in common applications, so it is well worth knowing what is available to you.
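To make those two combined instructions concrete, here is a C sketch of what multiply and subtract (MLS) and dual signed multiply and add (SMUAD) compute. These are simplified models written for illustration, not ARM's definitions: the SMUAD model ignores the Q (saturation) flag that the real instruction sets when the final addition overflows, and it assumes two's complement conversions between signed and unsigned types, as on GCC and Clang. The function names are hypothetical.

```c
#include <stdint.h>

/* MLS Rd, Rn, Rm, Ra  ->  Rd = Ra - (Rn * Rm), keeping the low 32 bits. */
uint32_t mls_model(uint32_t n, uint32_t m, uint32_t a)
{
    return a - n * m;
}

/* SMUAD Rd, Rn, Rm  ->  Rd = (lo16(Rn) * lo16(Rm)) + (hi16(Rn) * hi16(Rm)),
   where each halfword is treated as a signed 16-bit value. */
int32_t smuad_model(uint32_t n, uint32_t m)
{
    int32_t lo = (int16_t)(n & 0xFFFFu) * (int16_t)(m & 0xFFFFu);
    int32_t hi = (int16_t)(n >> 16) * (int16_t)(m >> 16);
    /* Add as unsigned so an overflowing sum wraps instead of being undefined. */
    return (int32_t)((uint32_t)lo + (uint32_t)hi);
}
```

For example, with the halfword pairs (2, 3) and (4, 5) packed into two registers, SMUAD yields 3 * 5 + 2 * 4 = 23 in one instruction, which is why it shows up in dot products and filters on 16-bit data.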

If you have been doing these tutorials in order, you should know enough by now to write a program or two that demonstrates your understanding of the things we just covered. I encourage you to do so, as practice helps reinforce what you learn.

Technium Adeptus

Monday, March 6, 2017
