Wednesday, April 26, 2017

ARM Assembly: Ditching GCC

We started this using gcc to compile our assembly programs.  There are significant benefits to doing this, but it also comes with some costs.  In addition, the greatest value in programming in assembly is optimization, and if you are using a lot of C functions in your code, it defeats this point.  The choice to use gcc or not depends on a number of factors.

GCC makes compiling assembly slightly easier, because it can be done in one command.  Compiling without gcc requires two commands.  That said, if this is a problem, it is probably better to use a makefile anyway, which eliminates this trivial inconvenience.  GCC also gives you access to C functions, like printf(), malloc(), and others.  Avoiding the need to rewrite existing C functions can be very valuable, saving a great deal of time.

GCC adds some startup code to your binary.  In reality, all programs start at the _start label.  This label tells the linker that this is where your program should begin execution.  GCC adds its own _start section to our programs.  After some startup code, _start calls the main() function, which is why our assembly programs need it.  GCC's startup code increases the size of our binaries, and it increases the time required to start our programs.  This time is short, but it can still make a difference.  Both the time and size can be a big deal for embedded devices.

Also, while being able to use C functions can save substantial amounts of time, optimization is often an important reason for using assembly, and C's built-in functions are not optimal for most applications.  C functions are designed for the most generic case possible.  This means that they are big and full of features.  When size or speed optimization is important, it is often better to write application specific versions of these functions yourself.  In that case, it may become entirely unnecessary to have access to C functions in your program at all.  Additionally, libraries for embedded devices may not have some C functions that are available for other platforms.

There is always a trade off.  If we abandon C, we have to talk to the operating system ourselves to get things done.  Things like I/O and memory allocation are protected by the operating system, for security and stability reasons.  If we want to talk to a peripheral device, we have to do it through the operating system.  If we want to allocate more memory, we have to ask the operating system for more memory.  C provides convenient wrappers for this, but without C, we won't have access to those.

We talk to the operating system through system calls.  We will discuss those more in later tutorials, but there is one system call that is essential when ditching gcc.  That is the exit system call.  Our program does not even have the ability to terminate itself.  We have to ask the operating system to do that for us.  This is done with the exit system call.

Here is a simple pure assembly program for you:

.text
.global _start
_start:
    mov r0, #10
    add r0, r0, #3

    mov r7, #1
    svc #0

Assuming this is saved as pure.s, you can compile it with the following pair of commands.

    as pure.s -o pure.o
    ld pure.o -o pure

As with gcc compiled programs, the value in r0 is the return value or error code for the program, when it terminates.  This means you can run this and then echo the error code to make sure it works as expected.
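
Running a program and echoing its error code can be wrapped in a small shell helper.  This is only a sketch, and report_status is a made up name, not part of any toolchain.  It runs whatever command you give it, then prints the value the shell stored in $? afterward.

```shell
# report_status CMD [ARGS...]
# Runs the given command, then prints the exit status it terminated with.
# (report_status is a hypothetical helper name used for illustration.)
report_status() {
    "$@"
    echo "exit status: $?"
}
```

Assuming the program above was built as pure in the current directory, report_status ./pure should report 13, since r0 holds 10 + 3 when the exit system call is made.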

The first part of the program should be pretty familiar by now.  The only difference here is that we are using _start instead of main().  Note that _start is not a function, so you cannot return from it with bx lr.  Attempting to do this will cause a segfault, because the address in lr is 0 when the program starts, making it a null pointer.  If you don't exit from the program, it will continue trying to execute code beyond the end of what you wrote, which will either be data in the data area or a bunch of null instructions (empty space full of zeros).  Eventually it will get to the end of allocated memory and segfault, or it will interpret some data as an invalid instruction and crash.

The last two lines of the program are a system call.  r7 is used by Linux to pass the constant for the system call you want.  In this case, we want the exit system call, which is number 1.  The last line, svc #0, is the system call instruction.  It turns over control to the operating system, which checks the value in r7, and performs the requested operation, if the user has permissions to do so.  In this case, it terminates the program.


(Side note: r0 is not some magic register for returning the error code of your program.  Technically, it is just the argument for the exit system call.  System call arguments work just like those for normal functions, starting at r0.  Unlike normal functions, Linux system calls can have up to 6 arguments in registers, starting at r0 and ending at r5.  The exit system call takes one argument, which is the exit code for the program.  C happens to use the return value from main() as the argument for its own call to exit.)

Tuesday, April 25, 2017

ARM Assembly: Integer Division

Division is discussed briefly in a previous post, but no example is given.  Originally, I left figuring out division to the reader, but I forgot something critical.  So, we are going to look at integer division separately.

ARM is an architecture that has advanced and evolved over time.  Some of the things you have already learned did not exist in early versions of ARM.  Because division is expensive, in the literal cost of transistors on the chip, it has commonly been left out of many architectures.  Division is often implemented in software, as successive subtractions, long division, or using so-called "magic numbers".  The ARMv6 architecture used on the original Raspberry Pi is one architecture that does not have an integer division instruction.  In fact, even ARMv7-A (the architecture used for the original Pi 2) does not require an integer division instruction.  The Cortex-A7 core, however, implements something called the Virtualization Extensions, and any ARMv7-A processor with those extensions must also provide two integer division instructions.  The original Pi 2 happens to have a Cortex-A7 version of the ARMv7-A architecture, which means that it does have these instructions.

There is still a problem.  If you attempt to compile a program using the udiv or sdiv instructions, you will likely get an error message saying that this processor does not support the instruction.  This is because gcc and as compile for ARMv6 by default, to ensure compatibility with older Pi versions.  (Note that this means that if you do want your program to run on the older Pis, you will have to use some other method for doing division.)  To compile a program using these instructions, you must tell the compiler that you want to compile for the ARMv7-A Cortex-A7 processor architecture.  This is done with directives.
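
To make "some other method" concrete, here is a minimal sketch of division by successive subtraction.  It is written as a shell function rather than ARM assembly so you can try the logic anywhere, and the name udiv_sketch and its variables are made up for this example.  It only handles non-negative integers, and it is slow when the quotient is large, which is why real software division routines tend to use long division or magic numbers instead.

```shell
# udiv_sketch DIVIDEND DIVISOR
# Prints "quotient remainder" for two non-negative integers, using only
# comparison and subtraction (no division operator).
# (udiv_sketch is a hypothetical name used for illustration.)
udiv_sketch() {
    ud_quotient=0
    ud_remainder=$1
    # Dividing by zero would loop forever, so report failure instead.
    [ "$2" -gt 0 ] || return 1
    # Subtract the divisor until what is left is smaller than the divisor.
    while [ "$ud_remainder" -ge "$2" ]; do
        ud_remainder=$((ud_remainder - $2))
        ud_quotient=$((ud_quotient + 1))
    done
    echo "$ud_quotient $ud_remainder"
}

udiv_sketch 12 2    # prints "6 0"
udiv_sketch 13 4    # prints "3 1"
```

The same loop translates fairly directly into assembly with cmp, sub, add, and a conditional branch.  Note that the quotient is truncated and the remainder is tracked separately.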

Before I demonstrate the directives, let's briefly talk about the division instructions.  If you have done the basic math tutorial, you may recall that these instructions only take registers, instead of an optional immediate value for the third argument.  This means that the numerator (dividend) and the denominator (divisor) must both be placed in registers.

.arch armv7-a
.cpu cortex-a7

.text
.global main
main:
    mov r1, #12
    mov r2, #2
    udiv r0, r1, r2

    bx lr

The first thing you will notice is the two new directives at the top.  The first tells the assembler what architecture you are compiling for.  The second tells it the specific ARM core you are compiling for.  The rest of the program is trivial.  It puts two values into registers, then it does unsigned division on them.  If you compile this, run it, and echo the error code, it will be 6 (12 divided by 2).  Try changing the values to see what behavior you get when the dividend is not evenly divisible by the divisor.

The directives used to tell the compiler what architecture to compile for can be very valuable, but they should also be treated with care.  When you use instructions that are limited to a particular architecture, you limit what your program can run on.  For the Raspberry Pi, the consequences are very clear cut, and it is easy to make a decision.  If you want your program to run on older Pis, you can keep the default.  If you don't need it to run on older Pis, you can compile for ARMv7, using instructions that are not available in older versions of ARM.  Note that some instructions also have multiple encodings, and they may not all be available on all architectures.  This means, even if you don't use ARMv7 specific instructions, using these directives may produce code that won't run on ARMv6.  ARM processors are typically backward compatible (at least for a few versions back), so if you are using a more recent Pi 2 or a Pi 3, with ARMv8 processors, compiling for ARMv7 will be fine.

If you were writing assembly for a wider range of devices though, it would be less clear what those devices can support.  If we were writing Intel assembly to run on Intel processors, we might expect some customers to run our code on Pentium 4 computers or perhaps even Pentium 3s.  Maybe someone would eventually want to try it on a Pentium 1 even.  We would need to decide what we are willing to support.  If the odds of someone needing to run it on a Pentium 4 or lower are very low, we could compile for the Core 2 architecture, using more advanced instructions than we would have access to for a Pentium 1 processor.  We are lucky we are doing this on the Pi, where things are clear cut, but in real life applications, we cannot always just compile for the architecture we are developing on, or we might severely limit our market.  Sometimes it may be better to forgo convenient instructions to allow clients with older machines access to our software.

There are a few more directives we will look at later, but you might find it valuable to know the correct directives for the ARMv8 processors currently shipping on new Pi 2s as well as on Pi 3s.

.arch armv8-a
.arch_extension crc
.cpu cortex-a53

There is one additional directive here, which you will probably never need.  The Pi 3 (and new Pi 2) processor has a CRC extension that adds a few instructions (that will not be covered in this series).  You can look up these instructions in the ARMv8 manual, if you are interested.  If you do use them, you will need the .arch_extension directive above.

Later on, we will add another directive for the floating point unit, but for now, these ARMv7 and ARMv8 ones should be sufficient.

Monday, April 17, 2017

Raspberry Pi SSH Over Bluetooth

I have been trying to do this for years, always failing.  In theory, it is possible to set up a network, called a Personal Area Network, using Bluetooth.  Once a PAN is set up, it should be easy to SSH into a machine on the network that is running an SSH server, for example a Raspberry Pi.  This is very desirable when the only wireless network does not allow computers on it to see each other, for example a public network at a college.  In the past, my solution was using an ethernet cable to set up an ad hoc network, which works but can be very inconvenient.  I recently got a Raspberry Pi 3, and after several hours of research, I finally got it working!

The main problem I have run into with this is that most instructions for using Bluetooth on the Pi are geared toward connecting to Bluetooth devices offering a service, for example Bluetooth speakers or a Bluetooth keyboard.  It should not be terribly difficult to set up a Bluetooth PAN this way, using a Windows or Mac computer as the service provider and the Pi as a client.  There are a few problems with this though.  The first is that getting the Pi to log onto the computer typically requires access to the Pi.  In other words, you need either a monitor and keyboard or an SSH connection over some other network.  This defeats the purpose.  The second is that the Pi is offering the SSH service.  It does not make sense for the device offering the desired service to have to initiate the connection to the client.  This may seem trivial, but in practice it means that to automate the process you would have to run something on the Pi to constantly try to connect until it is successful, and that is wasteful and inefficient, not to mention error prone.

So, my goal was to set up a Raspberry Pi as a PAN server that my Windows laptop can connect to and then access the SSH service running on the Pi.

What you will need for this:

A Pi 3, or a Pi or Pi 2 with a USB Bluetooth dongle, and a laptop or desktop computer with Bluetooth (USB dongle or built in).  Don't get a cheap USB Bluetooth dongle from China.  It turns out that is probably why I have failed so often in the last several years.  The $1 Bluetooth dongles I got from China don't seem to support the mode required to host a PAN.  The Pi 3's built in Bluetooth works fine, and I have a more expensive USB Bluetooth dongle I got from Walmart 5 or 6 years ago that also works fine (on an old Pi B+, no less).

Boot up your Pi and log in.  You will need direct access for the setup, so either connect it to a keyboard and monitor, or SSH into it through other means.

Now, we need to do a lot of things as root, so we are going to log into the root account.

    sudo su

From here on out, be very careful.  As root, you can do pretty much anything to your Pi, and Linux does not typically ask you to confirm before it does something destructive.

Now update your Pi and install some packages you will need to continue (you will need internet access for this).
  
    apt-get update
    apt-get upgrade

    apt-get install bluetooth bluez-tools python-dev libsystemd-dev bridge-utils python-pip libdbus-1-dev libdbus-glib-1-dev
    pip install systemd-python

    pip install dbus-python

Each of those will take 30 seconds to a few minutes.

Next we need to create some files.  Each entry below will start with the file name surrounded by some dashes.  Below that is what the contents of the file should look like.  When you are done, you should have created four new files.

-----/etc/systemd/network/pan.netdev-----
[NetDev]
Name=pan
Kind=bridge
ForwardDelaySec=0



-----/etc/systemd/network/pan.network-----
[Match]
Name=pan

[Network]
Address=0.0.0.0/24
DHCPServer=yes
IPMasquerade=yes


-----/etc/systemd/system/pan.service-----
[Unit]
Description=Bluetooth Personal Area Network
After=bluetooth.service systemd-networkd.service
Requires=systemd-networkd.service
PartOf=bluetooth.service

[Service]
Type=notify
ExecStart=/usr/local/sbin/pan

[Install]
WantedBy=bluetooth.target


-----/usr/local/sbin/pan-----
#!/bin/sh
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

exec /usr/local/sbin/bt-pan --systemd --debug server pan


Once you have created these four files, you need to make the final one executable.

    chmod +x /usr/local/sbin/pan

Now you will need to install a script that handles some communication with the Bluetooth system.

    wget https://raw.githubusercontent.com/mk-fg/fgtk/master/bt-pan -O /usr/local/sbin/bt-pan
    chmod +x /usr/local/sbin/bt-pan


Now we are on the home stretch.  The last thing is making it all start up when the Pi boots.  First we will enable the Bluetooth PAN service, so that it starts at boot.

    systemctl enable pan.service

Next we need to tell it to listen for incoming connections.  Before we do this, we need to find out the MAC address of the Bluetooth adapter of the client machine (in my case, my Windows laptop).  In Windows 10, you can find this by opening the Properties of the Bluetooth adapter in the Device Manager.  One of the tabs will list the address of the device.  For other operating systems, you should be able to consult Google to find instructions for finding the MAC address of your Bluetooth adapter.

Once you know the device address, you can continue.

    crontab -e

The first time you run this, it will ask what editor you want to use.  Pick the one you are most comfortable with.  At the end of the file, add the following line:

    @reboot printf 'power on\ndiscoverable on\nagent on\ntrust 00:00:00:00:00:00\nquit\n' | bluetoothctl

Replace the colon separated zeros with the address of your client Bluetooth adapter.  This will tell the Pi that it should allow connections from that device.  Without the correct address here, the Pi will reject connections over Bluetooth.  If you need more than one device to be able to connect, you should be able to add additional trust lines to this string, with the Bluetooth adapter MAC addresses for those devices.  (Each should start with \n, followed by trust, a space, and then a MAC address.)  Save the file to the default location and exit the editor.
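
Writing that string by hand gets error prone as the list of devices grows.  As a rough sketch, a small shell function can build the same list of bluetoothctl commands with one trust line per address.  The name bt_commands is made up for this example, and the addresses you pass it are your own.

```shell
# bt_commands MAC [MAC...]
# Prints the bluetoothctl commands, one per line, with a trust line
# for each Bluetooth adapter address given as an argument.
# (bt_commands is a hypothetical name used for illustration.)
bt_commands() {
    printf 'power on\ndiscoverable on\nagent on\n'
    for bt_mac in "$@"; do
        printf 'trust %s\n' "$bt_mac"
    done
    printf 'quit\n'
}
```

You would pipe its output into bluetoothctl, for example bt_commands 00:00:00:00:00:00 | bluetoothctl with your real address or addresses in place of the zeros.  To use it from the @reboot line, the function and that call would need to live in a script of your own that cron runs instead.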

Lastly, if you have not already enabled SSH on the Pi, you will need to do that.  Run raspi-config.  Choose option 5, Interfacing Options, and then pick SSH, which is P2 in that menu.  Enable it, then close the configuration program. 

Now, reboot the Pi!  (I run init 6, but there is also a reboot program you can run from the command line.)


From here, you will need to look up how to log into a PAN for the operating system on your client computer.  In Windows 10, you right click the Bluetooth icon in the system menu, add a Bluetooth device, and connect to your Raspberry Pi.  Once connected, you can right click the icon again, click Join a Personal Area Network, right click on the icon for the Pi in the window that opens, then click Access Point in the Connect Using menu.  The default IP address for the Pi seems to be 10.0.0.1, though I am looking into changing this, as it is commonly used in business networks and may make it difficult to connect to the Pi while on some wireless networks.

I have found that this is sometimes a bit unreliable.  I do not claim to be an expert on Bluetooth, and I am still learning, but I believe the Pi may reject the trust command if it cannot detect the device when it boots.  If you have issues, make sure the client computer is running before turning on the Pi, and make sure the Pi is reasonably close when it boots (Bluetooth has a very short range).  This will ensure that the Pi can detect the client computer when it runs the trust line, hopefully consistently allowing you to log into the PAN and SSH into the Pi.


I'll update this as I learn more.  There are still some challenges that I want to see if I can overcome to make things work more consistently and smoothly.  Some of this seems kind of hacky, but there is not enough information available on using Bluez5 to do this, so it is all I have right now.