Tuesday, April 25, 2017

ARM Assembly: Integer Division

Division is discussed briefly in a previous post, but no example was given.  Originally, I left figuring out division to the reader, but I forgot something critical, so we are going to look at integer division separately.

ARM is an architecture that has advanced and evolved over time.  Some of the things you have already learned did not exist in early versions of ARM.  Because division is expensive, in the literal cost of transistors on the chip, it has commonly been left out of many architectures.  Division is then implemented in software, as successive subtractions, long division, or using so-called "magic numbers" (multiplying by a precomputed reciprocal).  The ARMv6 architecture used on the original Raspberry Pi is one architecture that does not have an integer division instruction.  In fact, even base ARMv7-A (the architecture used for the original Pi 2) does not guarantee an integer division instruction.  The Cortex-A7 core, however, implements something called the Virtualization Extensions, which add two integer division instructions, udiv and sdiv.  The original Pi 2 uses the Cortex-A7 implementation of ARMv7-A, which means that it does have these instructions.

There is still a problem.  If you attempt to compile a program using the udiv or sdiv instructions, you will likely get an error message saying that this processor does not support the instruction.  This is because gcc and as compile for ARMv6 by default, to ensure compatibility with older Pi versions.  (Note that this means that if you do want your program to run on the older Pis, you will have to use some other method for doing division.)  To compile a program using these instructions, you must tell the compiler that you want to compile for the ARMv7-A Cortex-A7 processor architecture.  This is done with directives.
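As an aside, if you do need your program to run on the older Pis, here is a rough sketch of one such "other method": shift-and-subtract long division, using only instructions available on ARMv6.  The routine name udiv_soft, the register assignments, and the choice to return 0 for a zero divisor are all my own assumptions for illustration, not a standard library routine.

```asm
.text
.global main

@ udiv_soft (hypothetical name): unsigned 32-bit division without udiv.
@ In:  r1 = dividend, r2 = divisor
@ Out: r0 = quotient, r1 = remainder
@ Clobbers r2 and r3. Returns 0 for a zero divisor (arbitrary choice).
udiv_soft:
    mov r0, #0              @ quotient starts at 0
    cmp r2, #0              @ guard against dividing by zero
    bxeq lr
    mov r3, #1              @ r3 = quotient bit matching the current shift
1:  cmp r2, #0x80000000     @ stop shifting once the top bit is set...
    cmpcc r2, r1            @ ...or once the divisor reaches the dividend
    movcc r2, r2, lsl #1    @ otherwise shift divisor and bit left together
    movcc r3, r3, lsl #1
    bcc 1b
2:  cmp r1, r2              @ does the shifted divisor fit in what is left?
    subcs r1, r1, r2        @ yes: subtract it...
    orrcs r0, r0, r3        @ ...and set this bit of the quotient
    movs r3, r3, lsr #1     @ move to the next lower bit; Z set when done
    movne r2, r2, lsr #1    @ shift divisor back down to match
    bne 2b
    bx lr

main:
    push {r4, lr}           @ save lr, since bl overwrites it
    mov r1, #13             @ dividend
    mov r2, #4              @ divisor
    bl udiv_soft            @ r0 = 3, r1 = 1
    pop {r4, pc}            @ return; exit code is the quotient, 3
```

This assembles with the default ARMv6 settings, so it needs no special directives.  It is much slower than a hardware divide, which is the trade-off for running everywhere.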

Before I demonstrate the directives, let's briefly talk about the division instructions.  If you have done the basic math tutorial, you may recall that these instructions only take registers; the third argument cannot be an immediate value.  This means that the numerator (dividend) and the denominator (divisor) must both be placed in registers.
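
.arch armv7-a          @ target architecture
.cpu cortex-a7         @ specific ARM core (has udiv/sdiv)

.text
.global main
main:
    mov r1, #12        @ numerator (dividend)
    mov r2, #2         @ denominator (divisor)
    udiv r0, r1, r2    @ r0 = r1 / r2, unsigned

    bx lr              @ return; r0 becomes the exit code
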
The first thing you will notice is the two new directives at the top.  The first tells the assembler what architecture you are targeting.  The second tells it the specific ARM core you are targeting.  The rest of the program is trivial: it puts two values into registers, then does unsigned division on them.  If you compile this, run it, and echo the exit code (echo $?), it will be 6 (12 divided by 2).  Try changing the values to see what behavior you get when the dividend is not evenly divisible by the divisor.
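If you want to check your results from that experiment, here is a minimal sketch that divides 13 by 4 and returns the remainder instead of the quotient.  There is no remainder instruction, so this version recovers it with mls (multiply and subtract), which computes the fourth operand minus the product of the second and third.  The values and register choices are just my picks for illustration.

```asm
.arch armv7-a
.cpu cortex-a7

.text
.global main
main:
    mov r1, #13          @ dividend
    mov r2, #4           @ divisor
    udiv r3, r1, r2      @ r3 = 13 / 4 = 3 (the fraction is discarded)
    mls r0, r3, r2, r1   @ r0 = r1 - r3 * r2 = 13 - 12 = 1 (remainder)

    bx lr
```

Running this and echoing the exit code gives 1.  The sdiv instruction takes the same operands for signed values and also rounds toward zero, but keep in mind that the shell only shows the low 8 bits of the exit code, so negative results will not print as negative numbers.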

The directives used to tell the assembler what architecture to target can be very valuable, but they should also be treated with care.  When you use instructions that are limited to a particular architecture, you limit what your program can run on.  For the Raspberry Pi, the consequences are very clear-cut, and it is easy to make a decision.  If you want your program to run on older Pis, you can keep the default.  If you don't need it to run on older Pis, you can compile for ARMv7, using instructions that are not available in older versions of ARM.

Note that some instructions also have multiple encodings, and they may not all be available on all architectures.  This means, even if you don't use ARMv7 specific instructions, using these directives may produce code that won't run on ARMv6.  ARM processors are typically backward compatible (at least for a few versions back), so if you are using a more recent Pi 2 or a Pi 3, with ARMv8 processors, compiling for ARMv7 will be fine.

If you were writing assembly for a wider range of devices though, it would be less clear what those devices can support.  If we were writing Intel assembly to run on Intel processors, we might expect some customers to run our code on Pentium 4 computers or perhaps even Pentium 3s.  Maybe someone would eventually want to try it on a Pentium 1 even.  We would need to decide what we are willing to support.  If the odds of someone needing to run it on a Pentium 4 or lower are very low, we could compile for the Core 2 architecture, using more advanced instructions than we would have access to for a Pentium 1 processor.  We are lucky we are doing this on the Pi, where things are clear-cut, but in real life applications, we cannot always just compile for the architecture we are developing on, or we might severely limit our market.  Sometimes it may be better to forgo convenient instructions to allow clients with older machines access to our software.

There are a few more directives we will look at later, but you might find it valuable to know the correct directives for the ARMv8 processors currently shipping on new Pi 2s as well as on Pi 3s.
.arch armv8-a
.arch_extension crc
.cpu cortex-a53
There is one additional directive here, which you will probably never need.  The Pi 3 (and new Pi 2) processor has a CRC extension that adds a few instructions (that will not be covered in this series).  You can look up these instructions in the ARMv8 manual, if you are interested.  If you do use them, you will need the .arch_extension directive above.

Later on, we will add another directive for the floating point unit, but for now, these ARMv7 and ARMv8 ones should be sufficient.
