Wednesday, April 26, 2017

ARM Assembly: Ditching GCC

We started this series using gcc to build our assembly programs.  There are significant benefits to doing this, but it also comes with some costs.  In addition, one of the greatest values of programming in assembly is optimization, and if you rely on a lot of C functions in your code, you defeat that purpose.  Whether to use gcc or not depends on a number of factors.

GCC makes building assembly programs slightly easier, because it can be done in one command.  Building without gcc requires two commands, one to assemble and one to link.  That said, if this is a problem, it is probably better to use a makefile anyway, which eliminates this trivial inconvenience.  GCC also gives you access to C functions, like printf(), malloc(), and others.  Avoiding the need to rewrite existing C functions can be very valuable, saving a great deal of time.

GCC adds some startup code to your binary.  In reality, all programs start at the _start label, which the linker uses by default as the address where execution begins.  GCC supplies its own _start routine for our programs.  After some setup, that routine calls the main() function, which is why our assembly programs have needed one.  GCC's startup code increases the size of our binaries and the time required to start our programs.  The time is short, but it can still make a difference, and both the time and the size can be a big deal for embedded devices.

Also, while being able to use C functions can save substantial amounts of time, optimization is often an important reason for using assembly, and C's library functions are not optimal for every application.  They are designed for the most generic case possible, which means that they are big and full of features.  When size or speed optimization is important, it is often better to write application specific versions of these functions yourself.  In that case, it may become entirely unnecessary to have access to C functions in your program at all.  Additionally, libraries for embedded devices may not have some C functions that are available for other platforms.
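For comparison, here is a minimal sketch of the gcc style used so far in this series.  The entry point is main, and returning with bx lr hands control back to GCC's startup code, which then exits using the value left in r0.  The file name main.s is only an assumption for illustration.

```asm
@ main.s (hypothetical file name), built with: gcc main.s -o main
.text
.global main            @ GCC's startup code looks for this symbol
main:
    mov r0, #10         @ r0 = 10
    add r0, r0, #3      @ r0 = 13, the return value of main
    bx lr               @ return to the startup code that called main
```

Building this and a gcc-free version of the same logic, then comparing the two files with ls -l, is a quick way to see how much the startup code adds on your own system.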

There is always a trade-off.  If we abandon C, we have to talk to the operating system ourselves to get things done.  Things like I/O and memory allocation are protected by the operating system, for security and stability reasons.  If we want to talk to a peripheral device, we have to do it through the operating system.  If we want to allocate more memory, we have to ask the operating system for it.  C provides convenient wrappers for these requests, but without C, we won't have access to them.

We talk to the operating system through system calls.  We will discuss those more in later tutorials, but there is one system call that is essential when ditching gcc: the exit system call.  Our program does not even have the ability to terminate itself.  We have to ask the operating system to do that for us.

Here is a simple pure assembly program for you:

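.text
.global _start          @ make _start visible to the linker
_start:
    mov r0, #10         @ r0 = 10
    add r0, r0, #3      @ r0 = 10 + 3 = 13

    mov r7, #1          @ system call number 1: exit
    svc #0              @ hand control to the operating system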

Assuming this is saved as pure.s, you can assemble and link it with the following pair of commands.

as pure.s -o pure.o
ld pure.o -o pure

As with gcc compiled programs, the value in r0 when the program terminates becomes its exit code.  This means you can run the program with ./pure and then run echo $? to check that it works as expected.  For this program, the shell should print 13.

The first part of the program should be pretty familiar by now.  The main difference here is that we are using _start instead of main().  Note that _start is not a function, so you cannot return from it with bx lr.  Attempting to do this will cause a segfault, because the address in lr is 0 when the program starts, making it a null pointer.

If you don't exit from the program, it will continue trying to execute whatever lies beyond the end of what you wrote, which will either be data in the data area or a bunch of null instructions (empty space full of zeros).  Eventually it will get to the end of allocated memory and segfault, or it will interpret some data as an invalid instruction and crash.

The last two lines of the program make a system call.  Linux uses r7 to hold the number of the system call you want.  In this case, we want the exit system call, which is number 1.  The last line, svc #0, is the system call instruction.  It turns control over to the operating system, which checks the value in r7 and performs the requested operation, if the user has permission to do so.  In this case, it terminates the program.


(Side note: r0 is not some magic register for returning the error code of your program.  Technically, it is just the argument for the exit system call.  System call arguments work just like those for normal functions, starting at r0.  Unlike normal functions, which pass only their first four arguments in registers (r0 through r3), Linux system calls can have up to 6 arguments in registers, starting at r0 and ending at r5.  The exit system call takes one argument, which is the exit code for the program.  C happens to use the return value from main() as the argument for its own call to exit.)
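To illustrate how system call arguments are laid out in registers, here is a minimal sketch that uses the write system call (number 4 on ARM Linux) before exiting.  The message label and its text are assumptions made up for this example, and the length is counted by hand.

```asm
.text
.global _start
_start:
    mov r0, #1          @ argument 1: file descriptor 1 (standard output)
    ldr r1, =message    @ argument 2: address of the bytes to write
    mov r2, #14         @ argument 3: number of bytes in the message
    mov r7, #4          @ system call number 4: write
    svc #0

    mov r0, #0          @ argument 1: exit code 0
    mov r7, #1          @ system call number 1: exit
    svc #0

.data
message:                @ hypothetical label for this example
    .ascii "Hello, world!\n"
```

It can be built with the same as and ld commands shown earlier.  Note that write leaves its result in r0, which is why r0 is set again before the exit call.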
