Monday, June 20, 2016

Video Game Development: Efficiency Part 2

Memory Profiling

We won't go deep into this subject, but it is good to be familiar with the idea.  Memory profiling is a method of tracking and viewing the memory usage statistics of your program.  It can be used to track down memory leaks and to find parts of the program that are using more memory than expected.  In general, it is good to use some kind of process viewer to see how much memory your game is using, but if you find it using more memory than you expect, you will need to do some memory profiling to figure out where and why it is using so much memory.

There is a Python library called memory_profiler that can be installed with the command pip install -U memory_profiler.  (Python generally comes with pip, but under certain circumstances, you may find you need to install it yourself.)  You should also run pip install -U psutil, or memory_profiler may take an excessively long time calculating the results (it also may not be able to calculate some of the results).  How to use this module is beyond the scope of this article, but a quick Google search should offer more than enough documentation.

Data Structures

Perhaps the two most common data structures used in software development are the array and the linked list.  Understanding how these work can provide you with tools for customizing performance to your target platform.  An array has a static size, and it is fast for random access.  Because the size is static, if you need the size to be changed, you have to create a new, larger array and then copy all of the data, which is quite expensive.  Linked lists are slow for random access, and they use more memory than arrays, but they are dynamically sizable, and resizing is cheap.  Neither data structure is perfect for every situation, but arrays tend to be better when your list of data will not be increasing or decreasing often, and you need to be able to quickly access any element.  Linked lists tend to be better when the size of your list will be changing frequently and most of the access is sequential (like, in a for loop).  These are not the only data structures with this kind of tradeoff, though they are the most commonly used data structures in games.

Objects

Games seem like the perfect place to use objects, but there are some important things to realize about objects and efficiency.  To start with, objects are a meta-data type.  In other words, objects define new data types.  As with any data type, objects should be used when appropriate and avoided when not.  Common wisdom teaches that the best programming style for object oriented programs is to encase everything in an object.  This makes about as much sense as saying that every piece of data in a program should be stored in the form of a string, a char, or an integer, especially with reference to game programming.

Objects tend to cause caching issues.  Caching prefers data that is used sequentially, for example, an array of integers that is being processed in a for loop.  An array of integers where only every 5th element is being processed in the loop is far less efficient for caching than an array where every element is being accessed, because we are causing data to be loaded into the cache, but we are not using it.  Objects are the epitome of this problem.  A game entity object might contain the coordinates of the object, the facing of the object, the current action of the object, the destination of the object, and a pointer to the sprite that will represent the object on the screen.  A collision detection algorithm might only care about the coordinates and destination of the object, but if it loads an array of objects, the facing, current action, and sprite pointer will also all be loaded into cache.  That leaves maybe half of the data in cache actually being used.  The renderer will only care about location and sprite, so again, we have cache coherency issues slowing down our program.  If, instead, we stored these entities in 5 arrays, one for each element of the object, only the appropriate data would be loaded into cache, which could significantly improve performance (at the cost of making the program a bit more difficult to understand).

Objects also use additional memory.  A simple struct holding the same data will only have the data overhead.  Depending on the language, an object may be storing type information, class meta-data, and some kind of function table (a VTABLE, in C++).  In some languages, objects have a selection of inherently built in data and functions that can make objects quite expensive.  In addition, looking up functions from a VTABLE requires some pointer lookups, which make using objects more expensive for the CPU as well.  Overall, objects are just an expensive way of making programming resemble the real world a bit more, but the cost is that it makes it harder for the computer to understand, which results in inefficiencies.

In some cases, objects can actually be good for performance.  If most accesses to an object use all of the data contained in the object, objects can actually improve cache coherency.  There may still be higher function call overhead, but that is generally less expensive than cache coherency problems.

Objects can be bad for performance, and this is a more serious problem for games.  Excessive use of objects, or using objects where it is not appropriate can come at a big cost.

In general, and especially in games, you should only use objects where they make sense.  Objects should not be used as containers for a bunch of random information.  They should encapsulate closely related data and functions used to manipulate that data.  They should also avoid being bottlenecks in your game.  Some inefficiency can be tolerated, if it is not somewhere that is used so frequently that it becomes a problem.  When dealing with parts of the game that are used frequently though, it is important to consider things like cache coherency and how the use of objects will affect memory usage and performance, and avoid using objects entirely where they don't make sense.

Compiler Optimization and Programmer Intent

One of the most valuable functions of a modern compiler is optimizing the machine code it produces.  A well designed compiler can do things like taking a short loop with a predictable size and unroll it into a sequence of instructions without any branches.  Another optimization is putting the code for small functions directly into the code where it is used.  Because branching is expensive, these reductions in branching improve the performance of the code without making it any less reliable.  When a compiler does this, it is determining the intent of the programmer by the code that it is given, and it is producing different but equivalent code that is more efficient.  Compiler optimization is an essential tool in modern application development, but it still has its limitations.

Compiler optimization is all about programmer intent.  If the compiler can figure out what the programmer wants, then how to do it becomes very flexible.  Unfortunately, imperative programming languages are all about the how, and determining what the programmer wants based on how the programmer says to do it is very difficult.  Declarative programming languages, including functional languages, focus far more on the what than the how, which allows for far more optimization, however programming large games requires a familiarity with the programming paradigm that very few programmers ever get with non-imperative languages.  So, unless you want to learn to write games in languages like Haskell (which is not only possible, but honestly not that hard once you gain a bit of experience), it is important to understand what assumptions a compiler can and cannot make about the intent of your code.

Unless you are using a functional language, it is rarely profitable to put a lot of effort into giving your compiler information about your intent.  C99 introduced the "restrict" keyword into the C language to tell the compiler that certain kinds of pointer ambiguity do not exist, which allows it to optimize more.  The "volatile" keyword in C and C++ exists to allow the compiler to optimize certain pointer operations by default, while still allowing the programmer to tell the compiler when not to optimize.  Beyond this though, the build in compiler optimization is generally enough, and when it is not, you have to optimize in assembly, find another solution to the problem, or just deal with it.  There are some valuable things to keep in mind that can help avoid making it difficult for the compiler to optimize.

One of the biggest hurdles to optimization is the unnecessary use of objects.  By encapsulating unlike but related data, you force the compiler to group that data together in memory, and you tell the compiler than you intend on using that information as a group.  When your actual usage patterns do not fit this assumption and do not work well with grouping dissimilar data, you put the compiler in a position where it will misinterpret your intent, resulting in significantly worse performance.  Even when encapsulation seems very appropriate, it can be the source of a lot of performance problems, because it sends the compiler mixed messages, and the compiler essentially has to guess.

Sometimes built in language features can help you provide information about intent, but it is also important to understand that built in features are often so generic that they are not optimal for any specific application.  The cout function in C++ is a great example of this.  It can do anything you need, but because you do not ever need everything it does, all of that extra functionality is wasted cost.  On the other hand, Python has a feature called a list comprehension that not only reduces what would normally be a for loop generating a list by appending items to the end of it into a single line statement, it also tells the compiler that your only intent is to create a list.  This allows for much more optimization, and the result is that list comprehensions are almost always faster than regular for loops for generating lists.

Because imperative language compilers optimize by trying to determine the programmer's intent, it is good practice to include as much intent information as you can.  Unfortunately, the amount of intent you can include is very limited and can cost far more time than it is worth.  Sometimes the best way to optimize is to write up some simple code for different ways of solving the problem and then performance testing them.

Bottlenecks

Bottlenecks have been mentioned several times so far, but what are they?  Every program has places where it spends a lot of time.  In a video game, this might be the main update function, the graphics rendering function, and the event handler.  These are bottlenecks, because if one of these things is slow, it slows down the whole program.  Specific operations can also be bottlenecks, if they happen a lot.  In general, anything that needs to happen every single game loop is a bottleneck, and if a bottleneck is slow, it does not matter how fast the rest of your program runs, your program will be slow.  This means that bottlenecks are the most important places for optimization.

The first major bottleneck is memory access.  Things like allocating memory, looking up pointers, and even reading or writing memory take a long time.  Avoiding unnecessary memory access can be hard, but with modern processors often having to wait for up to 200 cycles for a memory access, this is definitely a good place to consider optimization.

In a major loop that happens a lot, poorly optimized code can be a significant bottleneck.  If your program spends 50% of its time in a single function and the code in that function takes twice as long as it should, your program is wasting 25% of its time on poorly optimized code.

I/O is a major bottleneck as well.  Graphics I/O not only requires a lot of memory access, it also requires even slower communication with the graphics card.  Since rendering needs to happen at a very high rate, the cost of this bottleneck can be very high.

Some bottlenecks can be optimized away.  For things like I/O that cannot, concurrency is a possible solution.  Concurrency means running two different tasks at the same time, so that one task can keep running while another task is waiting for a slow operation.  Concurrency comes with its own costs, and it can sometimes be its own bottleneck, but a well designed game can put the rendering in a separate thread to eliminate it as a bottleneck for the rest of the game engine.

When optimizing, bottlenecks should always be the biggest focus.  Optimizing a function that takes only 1% of the execution time has almost no effect on performance, even if you optimize it by 90%.  Optimizing a function that takes 50% of the execution time can make a difference, even if you only optimize it by 10%.  It can be tempting to try to optimize everything, but the best strategy is to only optimize where it makes the greatest difference, and this is where the bottlenecks are.

Multitasking

Perhaps one of the biggest excuses in software development for lazy optimization is the enormous amount of resources available in modern computers.  Try telling a programmer he or she should be using a short instead of and int, and you will get a lecture about how modern computers have so much memory that will never even matter.  In some cases this may be true, but it is good practice to avoid flagrant waste.  Likewise, inefficient algorithms are quite common, and they are often written by people who believe that modern processors can handle anything, so it does not matter.  If the sort takes 0.002 seconds instead of 0.001, no one is ever going to notice right?  Unfortunately, this idea that modern computers have unlimited resources is just not true.

It is fairly common now to find computers with 8GB of ram.  Most of the offerings in stores have at least that much memory, and even most online offerings have plenty of memory.  So, if you are making a new game, and you are fine with limiting your market to people with 8GB of memory, you can use most of that right?  After the OS, which uses maybe 1GB or less if it is Linux or between 1GB and 2GB if it is Windows, your game should be fine using 6GB to 6.5GB.  Admittedly, most games are not up to this point yet, but it does not matter.  For an immersive, full screen game, you can probably expect to have around 3GB to 4GB of that 8GB.  Why?  What else is your customer using on their computer at the same time?  Almost no one, not even most gamers, have absolutely nothing running in the background while playing.  For games like World of Warcraft, Minecraft, and Terraria, which are all fairly immersive, full screen games (though, not always played literally full screen), the typical player also has a browser opened in the background, and it is pretty routine for modern browsers to use about half a GB of memory just for the browser and an empty tab.  If you have one or two pages open, with information about the game you are playing, you can expect closer to 1GB, and don't forget, the OS is still probably taking between 1GB and 2GB.  Few gamers have only the tabs they need opened though.  They might have a tab for email, a tab for Facebook, and one or two tabs for other things, which leaves you with even less memory.  Even if your game style does not lend itself to this kind of play, you should at least expect anti-virus software to be using some resources, and maybe things like a video driver control tool, a game launcher program (Steam or Battle.net launcher), a control tool for a touch pad, and any number of other things.  More casual gamers might even leave very heavy weight programs like MS Office open in the background, so they don't have to wait for it to start when they go back to what they were doing before.  In other words, your customer's computer is not your game's playground.  It is a place where your game is sharing resources with a lot of other things.

Processor power is harder to measure, but it is equally important to play nicely.  If your game never sleeps, you will likely end up consuming all of the processor power of at least one core.  If your game is multithreaded, you could consume one core for each thread.  Even a simple program that just loops repeatedly, waiting for an event can slow down your customers' computers.  Most programming languages have some kind of sleep function, and pretty much all game libraries have their own sleep function that is better.  These sleep functions are not merely loops that keep checking if enough time has passed.  They actually give up control to the OS, while requesting the OS to awaken them once the specified time has passed.  This allows other programs (including OS processes) to use the processor while the game is waiting (for example, for the next frame).  Just as with memory, your customers' computers almost certainly have other programs running in the background that need CPU time, and often your game is relying on some of these (which means that if you don't give up unneeded CPU time, it may, ironically, cause your game to lag and perform poorly).  Don't forget that your game will always have to play nice with other processes, unless it is the only thing running on an embedded system.

Another concern with attitude that every modern computer has plenty of resources is that it is a lie.  A vast majority of modern computers have less than 4GB of memory, and most of them have very weak processors compared to what you might have in your laptop or desktop.  I am talking about mobile devices, like smart phones and tablets.  Not only do these devices typically have the resources you might expect to find on a Pentium 2 or Pentium 3, they also tend to have a lot more things running in the background.  Most mobile device users do not realize that they can close programs, and even those that do often don't realize that it is a good idea to do so.  There is no way to get around the problem where a user has so many open applications that your game cannot run, but you can put effort into making sure your game is not squandering resources unnecessarily.

Writing games with the expectation of having exclusive access to all system resources will result in failure.  If you write a game that can take up to 8GB of memory to run, you should not expect it to run well for your customers on less than 12GB or 16GB of memory.  If your game performs fine on an average processor with nothing else running, don't expect your users to get good performance without a fairly high end processor.  If you really want to do it right though, instead of doing tests on a clean system running only the OS, try running it with all of the applications you normally have opened when working.  A high end IDE might be comparable to having a few MS Office documents open.  Leave your browser open with all of tabs you are currently using (if you only have a few, open 3 or 4 more).  Make sure your anti-virus program is doing whatever it does by default.  If your game runs perfectly fine under these conditions, you can probably expect anyone with an equally powerful computer to also run it fine.

The important part to remember is that modern computer users multitask, and if it is even possible that the user could be running something else at the same time as the game, you should assume that your game will be just one program in the ecosystem that has to get along and share with many others.


Efficiency has largely been thrown aside in software development and even much of computer science education, but it is no less important than it has been in the past.  As computers have become more powerful, programs have also become more hungry and users have increased their multitasking.  In the past, software developers and especially game designers have been lucky, because even though resources were far more limited, at least they could rely on the fact that all available resources were fair game.  Modern computers are more like a habitat for many programs inhabit at the same time.  If your programs don't play nice with everything else there, then you will find it hard to get and keep customers.  Instead of looking at 16GB of ram and thinking that nothing could ever use it all, consider how many other programs will running, each taking a chunk of that.  Playing nice may not make your game successful, but no one wants to play a game that requires them to shut everything else down just so it will run smoothly.

No comments:

Post a Comment