Monday, March 6, 2017

ARM Assembly: Memory Architecture

One of the biggest strengths of the ARM architecture is that it has a lot of registers.  This allows it to avoid wasting a lot of time waiting for memory transactions.  Any serious program is going to have to use memory though.  Before we learn how to use memory, we need to learn about the memory architecture of the Raspberry Pi (which is very similar to most modern systems).

The first thing we need to learn about is virtual memory.  Nearly all modern systems use memory virtualization, for two main reasons.  The first is security.  Early computers allowed programs to access memory directly, but this led to major security problems once multiple programs could run at the same time.  A program could easily look at and change the memory used by another program.  A background program could watch memory being used by another program to capture a username and password, and then report that information somewhere another user had access to, allowing a nefarious user to easily hack other users' accounts (in modern applications, this could allow a program to see credit card information entered into a web page).  The second reason is memory addressing.  When multiple programs have direct access to the same memory space, it can be difficult for each to know what memory is being used by the others.  This is a bad situation when 20 or more programs could be running at the same time (as is common in modern computers).

Protected memory was created to deal with these two problems.  Protected memory is a processor level technology that allows the operating system to run at a different access level than other processes.  When the processor starts up, it is in unrestricted mode.  The operating system starts up with direct access to memory, as well as direct access to all other hardware.  Once the OS has started, it starts up other programs in a restricted mode.  When the OS starts a program, it sets up the program's memory address space in the memory management unit (MMU) within the processor.  When the program attempts to access a memory address, the CPU consults the MMU to determine where that address points to in physical memory.  This is often called virtual memory.

On a 32 bit system, the program sees 4GB of memory space that it can access.  Part of this is where the program data is stored.  The space also includes stack space, global memory, heap space, linked libraries, and some space reserved for the kernel to keep track of metadata about the program.  While the program sees all of this memory space, it cannot actually use all of it.  On the Raspberry Pi, this should be obvious, as it has only 1GB of memory.  When a program starts running, it can read and write its allocated stack space and global memory, and it can read (and, on some systems but not others, write) the text area, where the executable code is.  If it tries to read or write anywhere else, including the reserved kernel space, it will fail with a segfault.  If the program wants more memory, it must ask the operating system for it.
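To make the address lookup a little more concrete, here is a minimal sketch of the idea in Python.  It is a toy model, not how the hardware actually stores its tables: the 4KB page size, the `translate` function, the `SegmentationFault` exception, and the mappings in `page_table` are all assumptions made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size (4KB)


class SegmentationFault(Exception):
    """Toy stand-in for a segfault: the address has no mapping."""


def translate(page_table, virtual_address):
    """Translate a virtual address to a physical one.

    page_table maps virtual page numbers to physical frame numbers.
    """
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise SegmentationFault(hex(virtual_address))
    return page_table[page_number] * PAGE_SIZE + offset


# Hypothetical mappings the "OS" set up for one program.
page_table = {0x10: 0x2A3, 0x11: 0x07F}

physical = translate(page_table, 0x10004)
print(hex(physical))  # prints 0x2a3004
```

The program only ever sees the address 0x10004.  Where that lands in physical memory is decided entirely by the table, and any address without an entry simply fails.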

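[Diagram: how memory is mapped to programs in Linux.  From the highest addresses down: kernel space, stack, memory mapping segment, heap, BSS, data, and text, with gaps between several of them.]

The kernel space is the space reserved by the kernel for metadata.  In 32 bit Linux it is typically 1GB; in 32 bit Windows it is 2GB by default.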

The stack is used for storing local variables.  It grows downwards.  The stack has a maximum size limit, often 8MB.  This means that it is not suitable for storing large amounts of data.  It is mostly pre-allocated though, which means that it is readily available.
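If you are curious what the stack limit is on your own system, the following sketch asks the OS for it.  It assumes a Unix-like system such as Raspbian, since Python's `resource` module is not available on Windows, and `describe` is just a helper name made up for this example.

```python
import resource


def describe(limit):
    """Format a resource limit given in bytes as text."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024}KB"


# The soft limit is the one enforced; it can be raised up to the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("soft stack limit:", describe(soft))
print("hard stack limit:", describe(hard))
```

A soft limit of 8192KB corresponds to the 8MB figure mentioned above, though your system may be configured differently.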

The memory mapping area is used for two things.  One is memory mapped files.  This includes dynamically linked libraries as well as files that are mapped to memory explicitly by the program, so it can access the file as if it were just memory addresses.  The benefit of mapping dynamically linked libraries this way is that multiple programs can have their virtual memory mapped to the same physical memory, allowing the operating system to load a single copy of each library for multiple programs to use.  The second thing this area is used for is anonymous mappings, which is essentially like mapping a file to memory without actually having a file.  We will discuss this more later, but this is one method of getting access to more memory from the OS.  This section grows downward.
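As a small preview of anonymous mappings, here is a sketch using Python's standard `mmap` module.  Passing -1 in place of a file descriptor requests memory with no file behind it.  The variable names are only illustrative, and the details of how Python sets up the mapping underneath are not something this example depends on.

```python
import mmap

# -1 instead of a file descriptor asks for an anonymous mapping:
# memory obtained from the OS with no file behind it.
region = mmap.mmap(-1, mmap.PAGESIZE)

fresh = region[:4]        # anonymous mappings start out zero-filled
region[:5] = b"hello"     # use the region like a mutable byte array
stored = region[:5]
print(fresh, stored)

region.close()            # hand the memory back to the OS
```

The mapping is one page long here, but the same call works for much larger requests, which is what makes this a useful way to get more memory.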

The heap is the other method of getting access to more memory.  When the program starts, no heap memory is allocated.  The program must ask the OS to increase the size of the heap, which grows upward.

The BSS and data areas are global memory allocated when the program starts up.  The difference is that the data area contains variables that have been initialized.  In other words, they have values in them when the program starts.  The BSS area holds uninitialized global variables, which are generally filled with zeroes when the program starts.

The text segment is one of the most important parts of the program.  It contains the executable code.


Now let's look at the gaps between the segments.  Most of these gaps represent random offsets.  Bugs in programs sometimes allow malicious code to be injected.  When this happens, it is very convenient for the malicious code if the various sections always start at the same memory addresses.  Modern operating systems make it more difficult for malicious code to exploit buggy programs by randomly picking starting addresses for the stack, the memory mapping segment, and the heap.  There are two other gaps, one between the heap and the memory mapping segment, and one at the beginning of the program.  The one between the heap and memory mapping segment is generally the largest, and it leaves room for those sections to grow.  On the Raspberry Pi, they will not meet in practice, because the Pi will run out of memory before that ever happens.  On a system with more memory though, if those two sections meet, the OS will refuse the request to allocate more memory.  A robust program will deal with this gracefully.  The gap at the beginning of the program exists for another reason.  It is mostly historical: the starting address was borrowed from an ancient system where it gave certain performance gains (in the classic 32 bit x86 layout, this gap is about 128MB).  There is, however, a very important reason the text segment cannot start at 0.  Address 0 is used to indicate a null pointer, and if the text area started at address 0, then the null pointer would point to a valid memory location.
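On Linux you can see where the segments of a running program landed by reading /proc/self/maps.  The sketch below parses that format and prints the heap and stack ranges for the Python process itself.  The `parse_maps_line` helper and the sample line are made up for illustration; real addresses will differ from the sample.

```python
import os


def parse_maps_line(line):
    """Split one line of /proc/<pid>/maps into (start, end, permissions, name)."""
    fields = line.split(None, 5)
    start_text, end_text = fields[0].split("-")
    name = fields[5].strip() if len(fields) > 5 else ""
    return int(start_text, 16), int(end_text, 16), fields[1], name


# A made-up line in the format the kernel uses.
sample = "7efc1000-7efe2000 rw-p 00000000 00:00 0          [stack]"
sample_region = parse_maps_line(sample)
print(sample_region)

# On Linux, show where this process's own heap and stack landed.
if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as maps:
        for line in maps:
            start, end, perms, name = parse_maps_line(line)
            if name in ("[heap]", "[stack]"):
                print(name, hex(start), hex(end), perms)
```

If address randomization is enabled, running this several times should show the heap and stack at different addresses on each run.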

Aside from this, there are a few other segment types that can be used, though none of them are very important for this system.


Like most modern systems, the Raspberry Pi uses memory mapped hardware.  Many 32 bit processors can address more physical memory than the 4GB a 32 bit pointer can reach: x86 processors have supported 36 bit physical addresses (64GB) through PAE since the mid-90s, and ARM cores with the LPAE extension support 40 bit physical addresses (1TB).  (Yes, the 4GB memory limit of 32 bit systems came from Windows, not from limitations of the processors; case in point, I have a 32 bit Xeon server that was running Windows Server 2003 Enterprise (essentially XP with a few extra features, like access to all of your memory) with 6GB of memory and full access to all of it; it is now running Linux, with full access to all 6GB.)  Even a plain 32 bit physical address space is four times larger than the 1GB of memory the Pi has, so it just makes sense to use the memory system for accessing peripherals.  So, access to the GPIO pins, USB, Ethernet, audio, video, and everything else happens by accessing physical memory addresses.  Unfortunately (or, perhaps, fortunately) programs are limited to their virtual memory space, which is not mapped to this hardware.  The kernel, however, has direct access to this memory space, so programs must interact with hardware through the kernel.
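The sizes above come straight from the number of address bits.  Here is a quick sketch of the arithmetic; the 40 bit width is used because it is the width that corresponds to the 1TB figure, and `addressable_bytes` is just an illustrative helper name.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


GIB = 2 ** 30

for bits in (32, 40):
    print(bits, "bit addresses:", addressable_bytes(bits) // GIB, "GiB")
# prints:
# 32 bit addresses: 4 GiB
# 40 bit addresses: 1024 GiB
```

With only 1GB of RAM installed, most of either address space is left over, which is why there is plenty of room to give peripherals addresses of their own.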


Some of this information is academic, unless you end up programming embedded systems, operating systems, or drivers.  The important things to remember are that the stack grows down and the heap grows up, and the stack, the heap, and the memory mapping segment start at random addresses.  Also keep in mind that when a program accesses a memory address, it is not accessing that address on the actual memory of the Raspberry Pi.  It is essentially using a lookup table to find the data in physical memory.  And, of course, while the Pi's hardware is memory mapped, you won't be able to access it directly.  Instead, you will have to do that through the operating system.
