Monday, December 12, 2016

ARM Assembly: The ARM Architecture

The history of the ARM architecture is quite interesting, though we will only look at it very briefly.  ARM started with a company named Acorn Computers, which initially produced simple computers at a time when all computers were pretty simple.  Acorn produced several machines that were successful in Europe, including the BBC Micro series.  When it needed a more capable processor for more business oriented applications, it found the processors available at the time unsuitable and decided to design its own, in the Acorn RISC Machine project, to go with the BBC Micro.  Early ARM processors compared quite favorably to contemporary Intel processors, with the ARM2 outperforming the Intel 80286 while using less power.  In the end, the ARM processor line was turned into its own company, Advanced RISC Machines.

The unusual thing about ARM as a company is that it does not manufacture processors.  It designs processor architectures and licenses them to other companies to manufacture.  The consequence is that ARM processors are ubiquitous in the mobile device industry, because licensing and manufacturing an existing design is far cheaper than designing a custom processor from scratch.  This is especially valuable in an industry that is constantly pushing technological boundaries to keep up with the competition.  In this business model, one company does all of the design work, often working with larger clients to include valuable new features, and the entire mobile device industry licenses the designs to produce all sorts of different devices.

In addition to mobile devices, some companies are attempting to use the ARM architecture to produce server processors.  They believe that the lower power consumption and higher performance that may be possible with RISC processors could be very popular in a market that values processors that run cheaper and cooler but still need plenty of processing power.

Instruction Sets

The part of the ARM architecture that matters to us is ARMv7.  ARMv7 is divided into 3 profiles: A, R, and M.  The A (application) profile is general purpose.  The M (microcontroller) profile covers mostly low power processors designed for embedded applications and lower powered mobile devices.  The R (real-time) profile is designed for real-time embedded applications, where responsiveness and performance are extremely important.  We are mostly going to focus on the A profile, with a little bit on the M profile.  For the purposes of these articles, the main difference between the two is that the M profile supports only ARM's Thumb/Thumb2 instruction set (mostly 16 bit instructions, along with the 32 bit Thumb2 instructions), while the A profile supports both the 32 bit ARM instruction set and the mostly 16 bit Thumb2 instruction set.

Later in this series we will also learn about another instruction set that the processor in the Pi 2 supports: the Neon/VFP instruction set, which provides instructions for floating point math as well as parallel integer and floating point math.

Registers

Pretty much all processors need some sort of memory to operate on.  Modern processors use several levels of memory.  The lowest level is registers, each of which can store one value.  The size of the value is generally the word size of the processor.  The word size of the ARMv7 architecture is 32 bits, meaning that each register can hold a 32 bit value.

The ARMv7 architecture claims to have 16 general purpose registers, but this is not strictly true.  No fewer than 3 of these registers have dedicated purposes, and attempting to use them for anything else is likely to cause problems.  Some programs may use a 4th register for a dedicated purpose as well.

ARM generally names its general purpose registers rx, where x is the register number.  Thus, ARMv7 processors have registers r0 through r15.  We never use any registers above r12 for general use though, nor do we refer to them as r13, r14, or r15.  Instead, we use sp for r13, lr for r14, and pc for r15.

The pc register is the program counter, also known as the instruction pointer.  It tracks the memory address of the next instruction to be run.  With a few exceptions, whenever an instruction is executed, the program counter is updated to point at the instruction after it.  Modifying the program counter changes where the program is executing.

The sp register is the stack pointer.  It contains the memory address of the most recently added entry on the stack, also called the top of the stack.  It is often used as a reference point for accessing data stored on the stack.  Modification of the stack pointer should be done carefully and predictably, because otherwise the top of the stack can be lost, which can be disastrous.
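
As a rough sketch of careful, predictable stack use, the following saves a value on the stack, reads it back using sp as the reference point, and then puts sp back where it started.  It assumes the program is assembled and linked with GCC, so that main is the entry point and whatever is left in r0 becomes the exit status.  The values are arbitrary and only there for illustration.

```asm
        .text
        .global main
main:
        mov     r0, #5          @ a value we want to keep safe
        push    {r0}            @ sp drops by 4; the value now sits at [sp]
        mov     r0, #0          @ clobber r0 to show the saved copy survives
        ldr     r1, [sp]        @ read the top entry using sp as the reference
        pop     {r0}            @ restore r0; sp is back where it started
        add     r0, r0, r1      @ r0 = 5 + 5 = 10
        bx      lr              @ return to the caller with exit status 10
```

Every push here is matched by exactly one pop before the function returns, which is what keeps the top of the stack from being lost.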

The lr register is the link register, which stores the return point for a function.  When a function is called (with the bl or blx instructions), the address of the instruction directly after the function call is stored in lr, so that the function being called knows where to return to when it is done.  Technically, if the address stored in lr is saved somewhere else and restored later, lr can be used as a general purpose register, though I have not found this to be terribly common.  The most important thing to remember about lr is that if you are going to call a function from inside of another function, you need to store the value in lr somewhere else first, because it is going to get overwritten.  Don't worry too much about this right now though, as we will discuss it again when we start calling functions.
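
Here is a minimal sketch of that rule, assuming a GCC-built program where main is itself called as a function.  The function double_it is a made-up helper for illustration.  Because main calls another function, it saves lr on the stack first and uses the saved copy to return.

```asm
        .text
        .global main

@ double_it: hypothetical leaf function, returns r0 * 2 in r0.
double_it:
        add     r0, r0, r0
        bx      lr              @ return to the address that bl left in lr

main:
        push    {r4, lr}        @ save lr (r4 is included only to keep sp 8 byte aligned)
        mov     r0, #21
        bl      double_it       @ bl overwrites lr with the address of the next instruction
        pop     {r4, pc}        @ load the saved lr straight into pc to return, r0 = 42
```

Without the push and pop, the bl would destroy the only copy of main's return address and main would have no way to get back to its caller.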

In partially compiled C programs, you may also see an fp register.  This is the frame pointer register, and when used, it is r11.  The frame pointer can be really handy for certain kinds of debugging, and it can be used as an alternative reference to sp, but it is not strictly necessary.  In fact, GCC normally omits it as soon as any level of optimization is enabled, so you will generally only find it in unoptimized, partially compiled C code.  We won't be using it, but it is good to be aware of, as you may see it in assembly code generated by GCC.

Memory

Something that sets ARM apart from architectures like Intel's is that it does not have any instructions that directly operate on data in memory.  In fact, the only memory access instructions ARM has are those for loading data from memory into registers and storing data from registers back into memory.  Many processor architectures allow at least a few instructions besides load and store instructions to access data in memory directly.  ARM does not, and this means that if you want to operate on a value stored in memory, you must first load it into a register, and if you want to put the result of an operation in memory, you have to store it after the operation is complete.  This keeps the instruction set fairly simple, and despite the fact that it means you have to use more instructions when accessing data in memory, ARM processors still have excellent performance.
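
To make the load, operate, store pattern concrete, here is a small sketch that adds 1 to a value in memory.  The variable name counter and its starting value are invented for this example, and the program again assumes a GCC build where the value left in r0 becomes the exit status.

```asm
        .data
counter:
        .word   41              @ hypothetical variable living in memory

        .text
        .global main
main:
        ldr     r1, =counter    @ r1 = address of counter
        ldr     r0, [r1]        @ load: copy the value from memory into r0
        add     r0, r0, #1      @ operate: arithmetic only works on registers
        str     r0, [r1]        @ store: write the result back to memory
        bx      lr              @ exit status is the new value, 42
```

There is no single instruction that could replace the middle three lines, because add cannot take a memory operand.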

The general workflow on 32 bit Intel processors involves a lot of memory accesses, because they have a very limited number of registers (8 general purpose registers, some of which also have special purposes that often limit their usefulness).  Consequently, instructions that can access memory directly have been seen as essential to good performance.  ARM, however, has plenty of registers, which means that intermediate values rarely have to be stored in memory.  Thus the value of instructions that access memory directly is very limited.  Historically, this has been such a strong point of ARM that it has been able to outperform Intel and other processors with significantly lower power consumption, even though RISC processors by definition maintain smaller instruction sets of simpler instructions.  In fact, this is such a potent strength that modern Intel 64 bit architectures have doubled the number of registers from 8 to 16 (by comparison, 64 bit ARM processors now have 31 general purpose registers, with pc and sp as separate dedicated registers, meaning that ARM more than doubled its number of usable registers).  The general workflow with ARM programs should take advantage of the large number of general purpose registers to avoid unnecessary memory accesses.
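
As an illustration of that workflow, the sketch below sums a short array.  The array values and the label names are made up for the example, and a GCC build is assumed.  The only memory accesses are the loads of the input data.  The running total, the pointer, and the loop counter all stay in registers.

```asm
        .data
values:
        .word   1, 2, 3, 4      @ hypothetical input data

        .text
        .global main
main:
        ldr     r1, =values     @ r1 = address of the next element
        mov     r2, #4          @ r2 = number of elements remaining
        mov     r0, #0          @ r0 = running total, never written to memory
loop:
        ldr     r3, [r1], #4    @ load one element, then advance r1 by 4
        add     r0, r0, r3      @ accumulate in a register
        subs    r2, r2, #1      @ count down and set the flags
        bne     loop            @ repeat until r2 reaches zero
        bx      lr              @ exit status is the total, 10
```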

Peripherals

Peripherals won't matter much here with respect to processor architecture, because we will be accessing them through the operating system, not directly.  That said, they are still worth spending a paragraph on.

Peripherals for ARM processors are memory mapped.  This is the most common way of handling peripherals.  Memory mapped peripherals are assigned to otherwise unused memory addresses (the Pi 2 processor can address up to 1TB of memory, but it only has 1GB, which leaves a huge number of addresses free).  In many systems, peripherals are hard coded to specific memory addresses, but in some (desktops and laptops, for example), they are mapped by the BIOS during boot based on a number of factors.  The Pi has its memory mappings hard coded, since its hardware is not changeable.  A peripheral is controlled by reading and writing the memory addresses associated with it.  The catch is that on the Pi this is handled by Linux, and our programs don't have direct access to the device's physical memory.  We access memory through a memory manager that is controlled by the operating system.  We will look at this more later, but the important thing is that no matter what memory addresses we use in our programs, we cannot directly reach memory that is mapped to peripherals.  We must go through the operating system.  This is partly for security reasons and partly because an operating system running multiple processes simultaneously has to implement some kind of access control to prevent programs from stepping on each other's toes.
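
To give a feel for what going through the operating system looks like, here is a hedged sketch that asks Linux to write a few bytes to standard output instead of touching any hardware addresses itself.  It assumes the 32 bit ARM EABI system call convention on Linux (system call number in r7, arguments starting in r0, then svc #0) and a GCC build.  The message and label names are invented for the example.

```asm
        .data
msg:
        .ascii  "hello\n"
        .equ    msg_len, . - msg    @ length of the message in bytes

        .text
        .global main
main:
        push    {r7, lr}        @ r7 must be preserved for our caller
        mov     r7, #4          @ system call number for write
        mov     r0, #1          @ file descriptor 1 is standard output
        ldr     r1, =msg        @ address of the bytes to write
        mov     r2, #msg_len    @ number of bytes to write
        svc     #0              @ hand control to the kernel
        mov     r0, #0          @ exit status 0
        pop     {r7, pc}        @ restore r7 and return
```

Our program never learns which memory addresses the output device is mapped to.  The kernel receives the request and talks to the hardware on our behalf.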


Now that we have a basic understanding of the processor architecture in our Pi 2, we can get back to writing assembly code!