Thursday, September 18, 2014

Object Oriented Programming

In my studies, work, and research, I have discovered some important things about Object Oriented Programming, Objects, and how each should be used.  I want to examine some misconceptions and less well known facts about objects in programming.

Objects are an abstract data type.  At the deepest level, an object is a highly flexible template for creating custom data types.  This makes objects a data type of data types or a meta data type.  Objects are far more than this though.  Objects are an amalgam of useful ideas commonly used in programming and programming languages.

Simply put, objects are containers.  Objects can contain data and functions.  This last part seems pretty novel.  An object is a data type that can contain functions.  Further, when a contained function is called, it automatically knows which instance of the object it belongs to.  These ideas seem very novel.  It turns out that they are not.

Objects have some dirty secrets.  Objects are hiding places for global variables.  In some cases, like the Singleton design pattern, this is easy to see.  In other cases it is not.  Objects are also containers that often hide the passing of large argument sets to functions.  When used properly, this does not usually cause problems, but it can easily hide massive coupling issues.  Objects can easily hide poor programming practices, and some common uses for objects would be considered poor programming if they were done without objects.

It turns out that in many cases objects are unnecessary.  Because objects have higher overhead than more primitive data types that can be used for the same things, it is important to know where objects will be beneficial and where they may be detrimental.  In some cases, it is a matter of trade-off between development time and performance, and a judgment call must be made.  In many cases, however, objects are unnecessarily used in places where performance is harmed, but no benefits to development time are gained.

Now I want to look at some examples of gaining some of the benefits of objects without actually using objects.  This can result in improved performance without sacrificing anything for it.

A few months ago, I was writing a C program where I needed to keep track of a camera in 3D space.  It was important that the user be able to create new camera instances, but it was also important that a main camera exist.  The main camera would be used for all graphics calculations, and the user could load different camera instances into the main camera.

This design had two benefits.  One was that the user did not have to pass a camera to the graphics functions every time they were called.  Since the graphics functions would typically be called many times per video frame, the overhead of argument passing was an important bottleneck.  The other benefit was that the camera required some internal consistency to work properly.  The camera consisted of two vectors that were absolutely required to be perpendicular to each other.  Allowing the user to directly modify these vectors would make the graphics functions prone to user error, and it would further put a burden on the programmer to ensure that any direct modifications would maintain the vectors properly.

In an OOP paradigm, this is an easy problem.  The main camera could be made a private member variable in a singleton object, and all access would be controlled with getters and setters.  In C, however, objects are not supported.  Instead I had to use a novel approach that turned out to be at least as easy as an object but with lower memory and argument passing overhead.  The camera handling functions were already contained in a separate file from the main program (to facilitate reuse).  So, I put a (global) struct instance for the camera in the .c file for the camera library, but I did not export it in the header file.  All of the graphics functions that required a camera were contained in the .c file, so they had direct access to the main camera struct.  The main program did not have access to it though.  I added some getters and setters to the camera library to allow restricted access to the main camera.

The result of this was that I used the Singleton design pattern and I even encapsulated the main camera data, all in a programming language that does not have any support for objects.  It also improved program efficiency in several areas.
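
A minimal sketch of that arrangement might look like the following.  It is not the original code, which is not shown here; every name in it (Vec3, Camera, camera_load_main, camera_get_main, camera_depth_of) is hypothetical, and marking the struct instance `static` is an assumption that goes one step further than simply leaving it out of the header.  Everything is shown in one listing for brevity, with comments marking what would sit in the header and what would stay in the .c file.

```c
/* --- what a hypothetical camera.h would expose --- */
typedef struct { double x, y, z; } Vec3;

typedef struct {
    Vec3 position;
    Vec3 forward;   /* must stay perpendicular to `up` */
    Vec3 up;
} Camera;

int    camera_load_main(const Camera *src); /* 1 on success, 0 if rejected */
Camera camera_get_main(void);
double camera_depth_of(Vec3 point);

/* --- what a hypothetical camera.c would keep to itself --- */

/* `static` gives the variable internal linkage (an assumption of this
 * sketch), so no other .c file can refer to it by name. */
static Camera main_camera = {
    {0.0, 0.0, 0.0},    /* position */
    {0.0, 0.0, -1.0},   /* forward  */
    {0.0, 1.0, 0.0}     /* up       */
};

static double dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Setter: copies a user camera into the main camera, but only if its two
 * vectors are perpendicular (dot product close to zero). */
int camera_load_main(const Camera *src)
{
    double d = dot(src->forward, src->up);
    if (d > 1e-9 || d < -1e-9)
        return 0;
    main_camera = *src;
    return 1;
}

/* Getter: returns a copy, so callers cannot modify the original. */
Camera camera_get_main(void)
{
    return main_camera;
}

/* Example graphics routine: it reads the main camera directly, so the
 * caller never passes a camera argument. */
double camera_depth_of(Vec3 point)
{
    Vec3 d = {
        point.x - main_camera.position.x,
        point.y - main_camera.position.y,
        point.z - main_camera.position.z
    };
    return dot(d, main_camera.forward);
}
```

The main program can only reach the camera through the three functions, so the perpendicularity check in camera_load_main is the single place where the invariant has to be enforced.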

This experience led me to another conclusion: Objects are syntactic sugar.  In my case, with the Singleton, I literally avoided passing arguments to frequently used functions.  When using large numbers of the same object type, this is impossible.  If there had been some benefit to having and using multiple cameras, and if it was normal to use a different camera for each call, passing cameras as arguments would have been more efficient, and in most uses of objects, this is the case.  Objects contain references to functions associated with that object type.  One benefit of using objects is that the compiler handles the problem of which object instance belongs to which function call.  In C, I would have to pass the appropriate data each time I called a function, even if I contained the function references in a struct with the data.  The benefit of objects does not, however, improve efficiency.  Instead it hides the passing of the argument.  This is called "syntactic sugar," because it makes the syntax shorter and faster to type, presumably without making it harder to understand.  Syntactic sugar is typically good when it is done well, and in most OOP languages, it is done fairly well.  It is important to understand that syntactic sugar does not actually affect program performance though.
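
To make the hidden argument visible, here is a small sketch of a C struct that carries a function reference alongside its data.  The Counter type and its functions are hypothetical names invented for illustration.  The point is only that the instance has to be passed by hand.

```c
typedef struct Counter Counter;

struct Counter {
    int value;
    /* Function reference stored alongside the data.  The instance it
     * works on still has to arrive as an explicit first argument. */
    int (*add)(Counter *self, int amount);
};

/* The "method": nothing tells it which Counter it belongs to except
 * the pointer the caller passes in. */
static int counter_add(Counter *self, int amount)
{
    self->value += amount;
    return self->value;
}

/* The "constructor": fills in the data and the function reference. */
Counter counter_make(int start)
{
    Counter c = { start, counter_add };
    return c;
}
```

Given `Counter c = counter_make(10);`, the call is written `c.add(&c, 5)`.  An OOP language lets you write something like `c.add(5)`, but the instance is still handed to the function behind the scenes, so the shorter form saves typing, not work.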

Now, as I mentioned before, objects are a meta data type.  They are a data type for defining new data types.  In this way, they can be very useful.  When used properly, they can make programs much easier to develop, read, and understand.  Objects in programming are very valuable.  High value, however, does not mean that it is appropriate to use objects exclusively.  Imagine using structs exclusively in C.  We could definitely "encapsulate" all of our functions and variables into structs.  We could even define a secondary main function, contain it in a struct, and then call it from main as soon as the program starts (and, in fact, in Java and highly object oriented uses of C++, this pattern is highly recommended).  The result would be extremely difficult to read and understand, it would waste substantial amounts of memory, and it would destroy the ability of the compiler to optimize memory usage for cache efficiency (this last one is a problem of all object oriented languages).  There are some places where using objects just does not make sense.  For some program types, encapsulating most things might work well.  For others, it is a waste of time.  Selective use of objects can optimize design time where necessary while allowing for optimized performance where it is important.  It also turns out that for many tasks, writing the paperwork for the object takes far more time than writing the executable code.  In short, objects should not be used where they do not make logical sense; nothing is gained by it.  Using objects exclusively is like using structs, trees, or any other data structure exclusively.  It might be a fun exercise for a challenge, but there is almost no practical application where it is appropriate.

Nearly all of the elements of object oriented programming can be used separately in most modern programming languages.  Most newer languages already contain large amounts of syntactic sugar, and many older ones do as well (for instance, the ++ operator in C and C++ is syntactic sugar for simple incrementation).  Encapsulation can be attained by grouping things by file (this may not literally make a variable private, but private variables themselves are artificially limited variables that the compiled program knows nothing about).  Arrays, structs, tuples, dictionaries, and other data structures can be used to group data, and most languages have some tuple-like mechanic for grouping heterogeneous data.  In cases where part of the object paradigm makes sense, but not all of it, it is often possible to get the benefits of the parts you need without actually using objects.  This may not always make sense to do, but when it does, it is often better than paying full price for objects when you only need part of them.
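
As one illustration of encapsulation by file, C lets a header declare a struct type without revealing its fields, so that only the .c file that defines it can touch the data.  The sketch below uses a hypothetical Stack type with made-up function names and an arbitrary capacity of 16.  It is shown as a single listing, with comments marking where the header would end and the .c file would begin.

```c
#include <stdlib.h>

/* --- what a hypothetical stack.h would expose --- */
typedef struct Stack Stack;             /* incomplete type: fields hidden */
Stack *stack_create(void);
int    stack_push(Stack *s, int value); /* 1 on success, 0 if full  */
int    stack_pop(Stack *s, int *out);   /* 1 on success, 0 if empty */
void   stack_destroy(Stack *s);

/* --- what a hypothetical stack.c would keep to itself --- */
#define STACK_CAPACITY 16               /* arbitrary size for the sketch */

struct Stack {
    int items[STACK_CAPACITY];
    int count;
};

Stack *stack_create(void)
{
    /* calloc zeroes the memory, so count starts at 0. */
    return calloc(1, sizeof(Stack));
}

int stack_push(Stack *s, int value)
{
    if (s->count >= STACK_CAPACITY)
        return 0;
    s->items[s->count++] = value;
    return 1;
}

int stack_pop(Stack *s, int *out)
{
    if (s->count == 0)
        return 0;
    *out = s->items[--s->count];
    return 1;
}

void stack_destroy(Stack *s)
{
    free(s);
}
```

Code that includes only the header can hold a `Stack *` and call the four functions, but it cannot read or write `items` or `count`, which is most of what a private member variable provides.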

Objects and Object Oriented Programming can be very useful in designing and writing applications, but they should never be treated as a complete programming style.  Like any other data structure, objects have their place and in their place provide very valuable benefits.  Again, like other data structures, overuse of objects results in programs that are inefficient and that do not make logical sense.  Objects in programming were designed to model how we imagine real world objects to be.  Computers do not think in objects though, and some parts of programming will never fit an object model.  Trying to force those things into an object model will ultimately come with extra costs in time, money, and performance.  OOP can be very valuable when used properly, but it can cost when it is overused or misused.

3 comments:

  1. One place where objects are never appropriate is the program entry point. Many languages (Python and QBasic, for instance) set the entry point at the first executable line of code that is not contained within any function (indeed, in a compiled program, the entry point is the first instruction of executable machine code). Some languages (like C and C++) have a specially named function as the entry point. Purely object oriented languages, however, cannot do either of these. These languages (most notably Java) must have an entry object that contains an entry function.

    Besides the fact that a trivial program like "Hello world!" requires many lines of bookkeeping code to create a new object and function in such languages, this also breaks the object paradigm. A program entry point is, by definition, the first line of code executed. In Java, the documented entry point is the main() method in the primary object for the program. This is a lie though.

    Here is the problem: Before execution can start, the primary class must be loaded. A purely OOP language must treat classes as objects. Classes, as objects, must have constructors that are called when the class is loaded. In Java, this constructor is the static initializer block (the source linked below calls it the "free floating static block," and I will use that name here), and it runs before main(). Java documentation may claim that main() is the entry point, but it clearly is not.

    Here is how to do it: Define a "free floating static block" in the primary class (see the source listed at the end for an example), and put code in it. You could run the entire program in there. Java allows files without entry points to be compiled, on the assumption that those files define auxiliary objects that will be used by the main program. Once this program is compiled, it can be run. It will crash, saying that there is no main() method, but not before it runs the code in the constructor. In a normal Java program, the free floating static block is empty and does nothing. If there is one defined though, it will run before the interpreter even looks for main(). If the constructor does not return until the program is finished running, the interpreter will not even notice the error until the program is finished. If the program exits without returning (System.exit(0)), it will never try to run main(), and no error will be thrown.

    Maybe there is a better design for this? Perhaps the primary object should not be allowed to have a free floating static block. Of course, this would violate OOP principles. Every object (and Java treats classes as objects) must have a constructor, even if it is empty. Maybe the constructor could be the entry point. This causes other problems though. The constructor's job is to initialize the object. Combining it with the program entry point violates principles of modularity (which are the basis of OOP). Ultimately, putting the entry point in an object at all turns the object into little more than excessive amounts of paperwork to accomplish a normally trivial task.

    The only viable conclusion is that the program entry point should not be contained within an object. Even containing the entry point in a function is unnecessary, though it does make the entry point extremely unambiguous for the compiler and those who might maintain the code in the future. Putting the program entry point in a place that requires other code to run before it is accessible, however, is both prone to abuse and a violation of the definition of an entry point. (Just for the record, I am now tempted to write a Java program that does not contain a main() method, to show how abuse of this "feature" can make a program less maintainable and harder to read. If I can overcome my distaste for Java, maybe I will write an article demonstrating this.)


    This is where I discovered this:
    http://stackoverflow.com/a/3161298

    Replies
    1. There is one place this might be useful. In auxiliary objects, the free floating static block could output a more specialized error message (similar to a "usage" message in a command line program that requires arguments; it could also output copyright and version information). It could also be used for unit testing, but integrating your unit testing code into the program code (especially in a compiled language) is considered bad form (it leads to unnecessary bloat that the end user does not need).

    2. It has been pointed out to me that Java 8 checks for the main() method before it actually loads the class, so the free floating static block is not run in the entry class if main() does not exist.

      Of course, it still stands that in a purely OOP language, some principle of OOP must be violated for an entry point to exist.
