Technium Adeptus: September 2014

Friday, September 26, 2014

C Programming: Singleton Design Pattern

The Singleton design pattern is a common pattern used in object oriented programming. To use the pattern, any constructors of the singleton object must be private. The user must not be able to create new instances of the object explicitly. In this design pattern, the class must only be able to have a single instance. Often this instance is created the first time it is requested, but it may be created at startup time, depending on the programming language. Subsequent requests will be provided with the already existing instance. This design pattern is primarily useful in languages that require object orientation, as a place to collect related global variables and functions, where only one instance of the collection should ever exist. It is less often used in languages that allow object orientation but do not enforce it. There are some cases, however, where it is useful regardless of the language.

One place where the Singleton design pattern is useful regardless of language is the case where a single instance of a global variable is necessary, but it is also necessary to limit how the user may interact with that variable. In 3D graphics, the main camera is one of these global variables. The camera can be stored as a pair of vectors, one representing "up" and the other representing the direction the camera is facing. The third vector, the facing of one of the sides, can easily be calculated from the other two. It is essential, however, that the "up" and "facing" vectors always be perpendicular to each other. If they ever become parallel, the third vector cannot be calculated, and the camera math starts to get zeros and infinities where they do not belong. This makes it impossible for the computer to render graphics that make sense. The Singleton pattern can be used to solve this problem. A single instance of a camera class can be made where the main camera is a private variable of the class. The setter for the camera can ensure that changes to the camera never allow invalid states. Further, methods can be added to the Singleton that allow the user to apply specific transformations to the camera, which removes the burden (and risk) of users trying to do the math for the transforms themselves.
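
To make the failure mode concrete, here is a minimal sketch of the side-vector calculation as a cross product. The `vec3` type and the `cross` and `is_zero` helpers are hypothetical names chosen for this illustration; they do not come from any particular graphics library. When "up" and "facing" are parallel, the cross product collapses to the zero vector, which has no direction and cannot be normalized.

```c
#include <stdbool.h>

/* Hypothetical 3D vector type, used only for this illustration. */
typedef struct { double x, y, z; } vec3;

/* Cross product: a vector perpendicular to both a and b. */
vec3 cross(vec3 a, vec3 b) {
    vec3 r = {
        a.y * b.z - a.z * b.y,
        a.z * b.x - a.x * b.z,
        a.x * b.y - a.y * b.x
    };
    return r;
}

/* True when every component is exactly zero. */
bool is_zero(vec3 v) {
    return v.x == 0.0 && v.y == 0.0 && v.z == 0.0;
}
```

Calling `cross` on two parallel vectors, such as (0, 1, 0) and (0, 2, 0), returns a vector for which `is_zero` is true. That is exactly the state a validating setter would need to reject.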

In most cases, the Singleton design pattern is used to hold global things where the language does not provide a better option. In some cases though, this design pattern can be useful in its own right. A problem occurs when the benefits of this design pattern are necessary in a language that does not support object orientation. For example, the C programming language has no object orientation support, but embedded systems often have limited support for languages other than C (or assembly). This may not be true of all non-object oriented languages, but the Singleton design pattern is actually possible in C.

This C Programming series is going to discuss how to use object oriented principles in the C language. In most cases, it is probably a bad idea to use these principles if any other option is available, but in cases like embedded systems, where an object oriented language is not available, it may be necessary, or at least substantially more efficient, to use these principles. The remainder of this article will discuss using the Singleton pattern in C and demonstrate how it can be done.

In C, encapsulation and hiding sensitive data is generally considered impossible. Very basic encapsulation can be accomplished with structs, but the language does not have any built-in mechanism for preventing a user from changing any variable that is in scope. This means that protecting a global variable in a getter/setter fashion is impossible. This leads to several difficulties. The first is that it is impossible to enforce data validation. A well designed library might offer setters and getters, but a user of the library might choose to go around them, accessing the variable directly. This puts the burden of correctness on the user, which has proven problematic enough to justify the wide adoption of private and protected variables in object oriented languages.

There is a simple way of making private global variables in external libraries. This is probably nothing new, and has likely been used in many C libraries that use internal state machines. It is not, however, often taught in computer science classes. In C, libraries are contained in separate files from the main program. Each library has at least one source code file as well as a header file. The header file exposes interfaces contained in the library to the program that is using the library. Global variables are exposed with an "extern" statement. If they are not exported, the main program does not even know they exist and thus cannot access them. This does not mean that they do not exist though. The library where the variables are declared can still access them. If this library has exposed functions that can change the hidden variables, then the main program can still access them indirectly. This technique can be used for functions as well. Following is some example code for a C library using what amounts to the Singleton design pattern.

private.c:

// This variable is subject to strict
// requirements.
int private_variable = 5;

// private_variable must be between 5 and 10
// inclusive. Invalid input will be ignored.
void set_private(int input) {
    if (input < 5 || input > 10)
        return;
    else
        private_variable = input;
}

// We don't want to expose the variable or its
// memory address, so we use a getter to return
// by value.
int get_private(void) {
    return private_variable;
}

private.h:

#ifndef PRIVATE_H
#define PRIVATE_H
// The hidden variable must be between 5 and 10
// inclusive. Invalid input will be ignored.
void set_private(int input);
int get_private(void);
#endif

The source code is pretty straightforward. It has a single global variable, a setter, and a getter. For some reason, it is necessary to restrict what the variable is allowed to be, so the setter handles that by ignoring invalid input. The header file is pretty straightforward as well. It exposes the two functions but not the variable. To expose the variable, "extern int private_variable;" could be added to the header file. Notice also that the comment in the header file does not name the variable. If this were distributed as a header and a precompiled object file, the user would not be able to figure out the name of the variable without searching through the object file for intelligible text and then guessing. If the header reveals the name of the variable though, an injudicious user might add an "extern" statement to the header to gain access. Of course, any user that goes to this effort deserves whatever problems it causes, but there is no reason to make it easy. Here is a driver program to test the library with.

main.c:

#include <stdio.h>
#include "private.h"

int main(void) {
    printf("Private = %i\n", get_private());
    printf("Setting Private to 10\n");
    set_private(10);
    printf("Private = %i\n", get_private());
    printf("Setting Private to 30\n");
    set_private(30);
    printf("Private = %i\n", get_private());
    printf("Setting Private to 0\n");
    set_private(0);
    printf("Private = %i\n", get_private());
    printf("Setting Private to 7\n");
    set_private(7);
    printf("Private = %i\n", get_private());
}

Try adding some code to access private_variable directly. It will not compile. The main program does not even know that variable exists! It can still change and read the variable indirectly through the setter and getter functions though.
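
The variable in private.c above still has external linkage, so a user who guesses its name could declare it "extern" in their own code and reach it anyway. One way to close that loophole, offered here as a variant sketch and not as part of the original library, is to mark the variable `static`. That gives it internal linkage, so an "extern" declaration in another file fails at link time.

```c
/* Variant of private.c: `static` gives the variable internal
 * linkage, so no other source file can link against it, even
 * with a hand-written extern declaration. */
static int private_variable = 5;

/* private_variable must be between 5 and 10 inclusive.
 * Invalid input is ignored. */
void set_private(int input) {
    if (input >= 5 && input <= 10)
        private_variable = input;
}

/* Return by value so the address is never exposed. */
int get_private(void) {
    return private_variable;
}
```

The header and the driver program stay exactly the same, because the public interface has not changed.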

This is not all. Using this same technique, it is possible to put private functions in the library (perhaps for implementation hiding, or maybe just to keep the namespace uncluttered). Any function in the library can be called by other functions in the library, but they can only be called externally if the function prototype is included in the header file. This makes it easy to use the object oriented ideas behind private variables and functions in C. The library represents the object in this case, and the header file determines what is exposed and what is hidden.
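
As a rough sketch of a private function, consider a hypothetical volume.c library (the file name and the `clamp`, `set_volume`, and `get_volume` functions are invented for this example). Leaving a prototype out of the header already hides it from well-behaved users. Marking the helper `static` as well, which is an addition to the technique described above, keeps the linker from exposing it at all.

```c
/* volume.c (hypothetical library, for illustration only) */

/* Private state: not declared in the header. */
static int volume = 0;

/* Private helper: `static` keeps it out of the global namespace,
 * so code outside this file cannot call it. */
static int clamp(int value, int low, int high) {
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

/* Public interface: only these two prototypes would go in volume.h. */
void set_volume(int value) {
    volume = clamp(value, 0, 100);
}

int get_volume(void) {
    return volume;
}
```

Because `clamp` is private, the main program is free to define its own function with the same name without any conflict.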

This is not one of the object oriented principles that should be avoided if possible. This method of encapsulation and data protection is very straightforward. It is not prone to abuse or errors (and in fact, it is actually designed to reduce the potential for errors). Most of the rest of this series will discuss less stable and manageable techniques that should be used only when absolutely necessary.

Thursday, September 18, 2014

Object Oriented Programming

In my studies, work, and research, I have discovered some important things about Object Oriented Programming, Objects, and how each should be used. I want to examine some misconceptions and less well known facts about objects in programming.

Objects are an abstract data type. At the deepest level, an object is a highly flexible template for creating custom data types. This makes objects a data type of data types or a meta data type. Objects are far more than this though. Objects are an amalgam of useful ideas commonly used in programming and programming languages.

Simply put, objects are containers. Objects can contain data and functions. This last part seems pretty novel. An object is a data type that can contain functions. Further, when a contained function is called, it automatically knows which instance of the object it belongs to. These ideas seem very novel. It turns out that they are not.

Objects have some dirty secrets. Objects are hiding places for global variables. In some cases, like the Singleton design pattern, this is easy to see. In other cases it is not. Objects are also containers that often hide the passing of large argument sets to functions. When used properly, this does not usually cause problems, but it can easily hide massive coupling issues. Objects can easily hide poor programming practices, and some common uses for objects would be considered poor programming if they were done without objects.

It turns out that in many cases objects are unnecessary. Because objects have higher overhead than more primitive data types that can be used for the same things, it is important to know where objects will be beneficial and where they may be detrimental. In some cases, it is a matter of trade-off between development time and performance, and a judgment call must be made. In many cases, however, objects are unnecessarily used in places where performance is harmed, but no benefits to development time are gained.

Now I want to look at some examples of gaining some of the benefits of objects without actually using objects. This can result in improved performance without sacrificing anything for it.

A few months ago, I was writing a C program where I needed to keep track of a camera in 3D space. It was important that the user be able to create new camera instances, but it was also important that a main camera exist. The main camera would be used for all graphics calculations, and the user could load different camera instances into the main camera.

This design had several benefits. One was that the user did not have to pass a camera to the graphics functions every time they were called. Since the graphics functions would typically be called many times per video frame, the overhead of argument passing was an important bottleneck. The other benefit was that the camera required some internal consistency to work properly. The camera consisted of two vectors that were absolutely required to be perpendicular to each other. Allowing the user to directly modify these vectors would make the graphics functions prone to user error, and it would further put a burden on the programmer to ensure that any direct modifications would maintain the vectors properly.

In an OOP paradigm, this is an easy problem. The main camera could be made a private member variable in a singleton object, and all access would be controlled with getters and setters. In C, however, objects are not supported. Instead I had to use a novel approach that turned out to be at least as easy as an object but with lower memory and argument passing overhead. The camera handling functions were already contained in a separate file from the main program (to facilitate reuse). So, I put a (global) struct instance for the camera in the .c file for the camera library, but I did not export it in the header file. All of the graphics functions that required a camera were contained in the .c file, so they had direct access to the main camera struct. The main program did not have access to it though. I added some getters and setters to the camera library to allow restricted access to the main camera.

The result of this was that I used the Singleton design pattern and I even encapsulated the main camera data, all in a programming language that does not have any support for objects. It also improved program efficiency in several areas.
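
The following is a minimal sketch of what such a camera library could look like. It is a reconstruction for illustration, not the original code: the `vec3` and `camera` types, the function names, the tolerance value, and the use of `static` on the hidden instance are all assumptions.

```c
#include <stdbool.h>

/* Hypothetical types for this sketch. */
typedef struct { double x, y, z; } vec3;
typedef struct { vec3 up; vec3 facing; } camera;

/* The single main camera. It is never declared in the header,
 * and `static` keeps other files from linking against it. */
static camera main_camera = { {0.0, 1.0, 0.0}, {0.0, 0.0, -1.0} };

/* Private helper: dot product of two vectors. */
static double dot(vec3 a, vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Load a user camera into the main camera. The input is rejected
 * (and false returned) if either vector is zero-length or if the
 * two vectors are not perpendicular within a small tolerance. */
bool set_main_camera(camera c) {
    const double eps = 1e-9;  /* assumed tolerance */
    double d = dot(c.up, c.facing);
    if (dot(c.up, c.up) < eps || dot(c.facing, c.facing) < eps)
        return false;
    if (d > eps || d < -eps)
        return false;
    main_camera = c;
    return true;
}

/* Return a copy, so the caller never holds a pointer to the original. */
camera get_main_camera(void) {
    return main_camera;
}
```

Graphics functions placed in the same file can read `main_camera` directly, so callers never pass a camera argument, while the main program can only change it through the validating setter.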

This experience led me to another conclusion: Objects are syntactic sugar. In my instance, with the Singleton, I literally avoided passing arguments to frequently used functions. When using large numbers of the same object type, this is impossible. If there had been some benefit to having and using multiple cameras, and if it was normal to use a different camera for each call, passing cameras as arguments would have been more efficient, and in most uses of objects, this is the case. Objects contain references to functions associated with that object type. One benefit of using objects is that the compiler handles the problem of which object instance belongs to which function call. In C, I would have to pass the appropriate data each time I called a function, even if I contained the function references in a struct with the data. The benefit of objects does not, however, improve efficiency. Instead it hides the passing of the argument. This is called "syntactic sugar," because it makes the syntax shorter and faster to type, presumably without making it harder to understand. Syntactic sugar is typically good when it is done well, and in most OOP languages, it is done fairly well. It is important to understand that syntactic sugar does not actually affect program performance though.
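
A small sketch may make the hidden argument visible. The `counter` type and its functions below are hypothetical, invented for this illustration. The struct holds its data alongside a pointer to its "method", yet the instance still has to be passed explicitly.

```c
/* Hypothetical "object": data plus a pointer to its "method". */
typedef struct counter counter;
struct counter {
    int count;
    void (*increment)(counter *self);  /* explicit "this" parameter */
};

/* The "method" body. It only knows which instance to act on
 * because the caller hands it one. */
static void counter_increment(counter *self) {
    self->count += 1;
}

/* "Constructor": fills in the data and the function pointer. */
counter counter_new(void) {
    counter c = { 0, counter_increment };
    return c;
}
```

With `counter c = counter_new();`, a call looks like `c.increment(&c);`. The instance is named twice because C has no implicit "this". An object oriented language would let the caller write `c.increment()` and would pass the instance behind the scenes, which is the same work with shorter syntax.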

Now, as I mentioned before, objects are a meta data type. They are a data type for defining new data types. In this way, they can be very useful. When used properly, they can make programs much easier to develop, read, and understand. Objects in programming are very valuable. High value, however, does not mean that it is appropriate to use objects exclusively. Imagine using structs exclusively in C. We could definitely "encapsulate" all of our functions and variables into structs. We could even define a secondary main function, contain it in a struct, and then call it from main as soon as the program starts (and, in fact, in Java and highly object oriented uses of C++, this pattern is highly recommended). The result would be extremely difficult to read and understand, it would waste substantial amounts of memory, and it would destroy the ability of the compiler to optimize memory usage for cache efficiency (this last one is a problem of all object oriented languages). There are some places where using objects just does not make sense. For some program types, encapsulating most things might work well. For others, it is a waste of time. Selective use of objects can optimize design time where necessary while allowing for optimized performance where it is important. It also turns out that for many tasks, writing the paperwork for the object takes far more time than writing the executable code. In short, objects should not be used where they do not make logical sense. There is no benefit derived from using objects where they are unnecessary and make no sense. Using objects exclusively is like using structs, trees, or any other data structure exclusively. It might be a fun exercise for a challenge, but there is almost no practical application where it is appropriate.

Nearly all of the elements of object oriented programming can be used separately in most modern programming languages. Most newer languages already contain large amounts of syntactic sugar, and many older ones do as well (for instance, the ++ operator in C and C++ is syntactic sugar for simple incrementation). Encapsulation can be attained by grouping things by file (this may not literally make a variable private, but private variables themselves are artificially limited variables that the compiled program knows nothing about). Arrays, structs, tuples, dictionaries, and other data structures can be used to group data, and most languages have some tuple-like mechanic for grouping heterogeneous data. In cases where part of the object paradigm makes sense, but not all of it, it is often possible to get the benefits of the parts you need without actually using objects. This may not always make sense to do, but when it does, it is often better than paying full price for objects when you only need part of them.

Objects and Object Oriented Programming can be very useful in designing and writing applications, but they should never be treated as a complete programming style. Like any other data structure, objects have their place and in their place provide very valuable benefits. Again, like other data structures, overuse of objects results in programs that are inefficient and that do not make logical sense. Objects in programming were designed to model how we imagine real world objects to be. Computers do not think in objects though, and some parts of programming will never fit an object model. Trying to force those things into an object model will ultimately come with extra costs in time, money, and performance. OOP can be very valuable when used properly, but it can cost when it is overused or misused.