Thursday, June 9, 2016

Video Game Development: Random Number Generation

Random number generation is an important part of most games.  In tabletop games, it supplies an element of chance that can keep the game fresh and give less skilled players a chance at winning.  In video games, it plays many important roles.


Randomness in Video Games

Random number generation is used in many places in video games.  It is often used to add an element of chance.  It is also used to provide variation in game play.  The most common uses of random numbers in video games are in combat and loot generation.  A very common pattern for video game combat is for a unit or weapon to deal a certain set amount of damage along with an additional random amount of damage.  For example, a sword might deal 4 damage plus 0-4 additional damage (giving it a damage range of 4 to 8).  Loot generation is probably the most well known use of random numbers in video games.  It has become a common cliche for gamers to blame the RNG (random number generation) when they have a hard time obtaining an item that is supposed to be common (or at least, more common than their experience seems to suggest).  Many games use random numbers for far more, however.  Map generation is a common use of random numbers that tends to increase replay value.  Many games use random numbers in character generation, and some even use random numbers to customize how characters (including NPCs and mobs) look and act.  Understanding the process of random number generation can be very valuable in crafting the experience you want your game to provide.


Pseudo-random Weakness

First, I want to share an experience I had many years ago.  I had recently taught a friend some QBasic, and he called me up one day to ask if I could help him figure out why his program was not working.  He explained that he had written a program to roll characters for Dungeons & Dragons.  The traditional mechanic for rolling D&D characters is to roll 3 6-sided dice, and sum the results, for each of 6 character stats.  He had written a program that would do this.  He was looking for something specific though.  He wanted to roll stats suitable for a Paladin character, and the version of D&D he was using had strict requirements.  He told me he wanted a character where each stat was at least 16.  After rolling a character, his program would test the stats.  If they did not meet this requirement, the stats would be discarded and the program would try again.  When he called me, his program had been running for well over 24 hours, and it had not generated a single suitable character.  The computer he was using was a Pentium (if I recall correctly).  He read the simple program off to me, and I typed it into my 486.  I promised to do some testing and get back to him.

The result of this was quite amusing.  First, I ran the program on my computer.  After 24 hours, it had not produced any results for me either.  While I waited, I decided to do the math to figure out the probability of rolling 6 * 3d6 and getting all 16s or higher.  Following is that math:

First, there are 216 different possible rolls on 3d6 (3 six sided dice).

Second, there are 3 ways of getting 16 ([5, 5, 6], [5, 6, 5], and [6, 5, 5]).  There are 3 ways of getting 17 (3 combinations of [5, 6, 6]).  There is only one way of getting 18.  So, with 3d6, there are 7 ways out of 216 possibilities of rolling 16 or higher.

Third, since we are rolling 6 times, the odds of getting all 16 or higher are (7/216)^6.

The math for this is fairly simple: (7/216)^6 = 0.0324^6 = 0.0000000011584.  This comes out to about 1/863,245,388.

The problem was not the program.  The problem was that the probability of getting the desired character was close to 1 in one billion.  My friend's Pentium and my 486 could not roll fast enough to have even small odds of getting that roll, even given days or weeks to try.

It turns out, however, that a second problem may have also been a factor. This is a common problem for an pseudo-random number generators.  Computers don't generally generate truly random numbers.  Instead they start with a somewhat random seed (the system timer is a common seed, and some people like to create a pointer and use the memory address of that pointer as the seed), and then they use a deterministic (ie, non-random) algorithm to generate a seemingly random series of values.  There are certain requires a pseudo-random number generator must meet to be considered valid.  One essential requirement is that it needs to have an equal distribution of values.  In other words, it cannot favor certain values or ranges over others.  The results are also expected to be reasonably well distributed over short ranges.  A random number generator that rolls a lot of 1s and 2s for a while, and then rolls a lot of 3s-6s for a while is not considered good, even if the rolls are ultimately well distributed in the long range.  Contrasted with real life, sometimes randomness will end up with a series of similar results, even when it is unlikely.   As a result, pseudo-random number generators don't behave exactly like real life randomness.  This is important to understand, as it can have serious implications on very low probability events in your games.

I have written up a program to illustrate this.  I actually wrote two, but the Python one is too slow to see the effect in decent time.  Following is the code to the  C solution:
#include <math.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

#define COINS 33

char and(char flips[COINS]);
char flip();

void main(){
    int *seed;
    srand((unsigned long)time(NULL) + (unsigned long)&seed);

    // If your compiler supports it, this allows
    // %ld to be replaced with %'ld to add commas
    // separating thousands in printf.
    setlocale(LC_NUMERIC, "en_US.utf8");

    long odds = pow(2, COINS);
    long tries = 0;
    char flips[COINS];
    flips[0] = 0;

    printf("Looking for all heads\n");
    printf("Flipping %d coins\n", COINS);
    printf("Odds: %ld to 1 against\n", odds - 1);

//    fprintf(stderr, "Looking for all heads\n");
//    fprintf(stderr, "Flipping %d coins\n", COINS);
//    fprintf(stderr, "Odds: %ld to 1 against\n", odds - 1);

    int i;
    while (!and(flips)) {
        tries += 1;
        for (i = 0; i < COINS; i++) {
            flips[i] = flip();
        }

        if (tries % odds == 0) {
            double prob = 1.0 - pow((double)(odds - 1)/(double)odds, (double)tries);

            printf("Tries: %ld, Probability of success: %f\n", tries, prob);

//            fprintf(stderr, "Tries: %ld, Probability of success: %f\n", tries, prob);
        }
        }

    printf("We have a winner after %ld tries\n", tries);

//    fprintf(stderr, "We have a winner after %ld tries\n", tries);
}

// and a list of coin flips; returns 1 if all coins are heads
char and(char flips[COINS]) {
    int i;
    for (i = 0; i < COINS; i++)
        if (flips[i] == 0)
            return 0;

    return 1;
}

// Generate a random integer between 1 and 0 (1 is heads, 0 is tails)
char flip() {
    return rand() % 2 == 0;
}

Uncomment the fprintf lines if you want to redirect the results to a file, but you still want to see them on the screen as well.

Near the top of the program is a line that tells the program how many coins to flip.  It is set to 33 in this program, though note that it may not work correctly on 32 bit systems (or compiled as a 32 bit program) if COINS is set higher than 31 (technically, this is only affects the odds calculations).  I have found that 33 is the sweet spot on my Linux computer (my Windows computer compiling with 32 bit MinGW is hitting it at 16).  This is where the random number generator will no longer ever flip heads consistently that many times in a row.  This program also periodically outputs the odds of having flipped all heads, given the number of tries.  Generally, it is pretty unusual for the program to go beyond about 0.86 probability, though on my Linux computer, it has gotten to 0.95 before getting all heads.  If it consistently goes past 0.86 though, you are starting to hit the limit of the RNG.  Given my experience with my Linux computer, using 64 bit gcc, and my Windows 10 computer, using MinGW 32 bit gcc, the limited seems to depend largely on what random number generation algorithm the compiler uses.

The point of this activity is to show that pseudo-random number generators tend to break down at the extremes.  When an event depends on a series of numbers generating correctly, it is very likely that the event will be less probable than the math would suggest.  Interestingly, generating a single number with the same probability will generally behave as expected.  For example, in the D&D problem, we were rolling 18 dice, looking for a fairly limited set of results.  On the older computers, presumably with advanced pseudo-random number generators, the chance of getting this combination was probably substantially less likely than the 1/863,245,388 that the math indicated was the probability.  It is even possible that this roll would never have happened, even given infinite time.  If, instead, we had just generated a single random number between 0 and 863,245,388, the odds would almost certainly have been precisely 1/863,245,388.  Unfortunately, the only way to be sure a series of rolls will have the expected probability is to test, and this can be excessively time consuming.  In general, if you want an event to occur at a specific probability, it is best to just generate a single number.  If there is some reason you cannot do this, avoid having the event depend on more than 8 or 10 random factors, and take some time to do the math to make sure the probability you are expecting is actually what you should be getting.


Probability

Understanding the outcomes of your random number generation is also very important.  For example, what outcomes would you expect to come from rolling 3d6 (3 6-sided dice)?  It should be obvious that the minimum roll is 3, and the maximum roll is 18.  This gives a range of 16 possible values.  What might not be obvious is that that chance of rolling a 3 is not 1/16.  It is much smaller.  It turns out there are 216 possible rolls with 3d6.  This is not obvious if the 3 dice all look that same, but a roll of 1, 1, 2 is not the same as 1, 2, 1 or 2, 1, 1 (even though we only care that the sum of all three is 4).  The only way to roll a 3 is 1, 1, 1, and the only way to roll 18 is 6, 6, 6.  As we have just seen though, there are 3 ways of rolling a 4.  This means that you are 3 times more likely to roll a 4 than a 3!  For 3d6, the most probable roll is 10 and 11 (to find this, add the lowest roll to the highest roll and divide by 2; if the result is a fraction, then the integers on either side are the most probable rolls).  The distribution is actually a bell curve, and the more dice you roll, the taller the curve.  In other words, summing more randomly generated numbers favors the average value more.  This can be problematic if you don't realize that this is happening, but if you understand this, you can use it to tailor and customize your probabilities.


Real Randomness

Contrary to popular belief, there is such a thing as real randomness.  When Albert Einstein said, "God does not play dice with the universe," he was referencing a common theory in quantum physics that said that actual real randomness does occur on a quantum level.  Eventually that theory was proven correct.  The universe is not deterministic at the quantum level.  It turns out that we can harness this randomness in several ways.  The traditional way is to measure radioactive decay, which is the most direct manifestation of this randomness, but this is not something most people can do easily.  Other means of collecting randomness include observing certain weather phenomena and measuring temperatures to a very high precision.  Some degree of real randomness can even be collected by observing certain properties of computer processors and their activity.  Most Unix based operating systems include a device file, /dev/random, that provides access to randomness collected from temperature and processor activity data.  In general, there are very few applications that need real randomness though, and even those typically use the randomness to produce a seed that is used by a pseudo-random number generator.  The only common application where randomness is important is cryptographics, including things like generating encryption keys.  For video games, you will probably only ever need this for secure network connections.

This brings us to seeds.  Pseudo-random number generators generate sequences of seemingly random numbers.  In reality, the current state of the RNG determines what the next value it produces will be.  This means that if a random number generator always starts with the same state, it will always produce the same sequence of random numbers.  By itself, this is a problem.  In the context of a video game, this means that the game will always generate the same map, monsters will always have the same behavior, and the entire purpose of the randomness will be defeated.  To fix this problem, the starting state of pseudo-random number generators can be set by the user.  A seed value is used to do this.  Given the same seed, the RNG will always produce the same series of values.  This means that we need to pick a different seed every time, but it also means that we can record the seed we use, in case we need to replicate some behavior when debugging.  In some games (like Minecraft), the players can even get or set the seed, so that players can share seeds for generating worlds that they particularly like (YouTubers sometimes share seeds so that their viewer can follow along in an identical world).  The problem now is getting a unique seed every time.  This is technically impossible, as the data type of the seed limits possible options, but we can get close enough.  The most common way of generating a random seed is to get the current value of the system timer.  This timer is based on the time of day or the number of seconds passed since a certain date.  It also tends to be a floating point value.  The odds of two people getting a seed at exactly the same time, down to millisecond or finer resolution is incredibly small, so this strategy tends to be especially effective.  While this is not generally a problem for games, this strategy has one weakness: If two instances of the program are started simultaneously (for example, using the & operator in a Bash shell), they can easily get the same time.  Another somewhat common way of getting a seed (in languages that support this) is to create an unused pointer, and then use the memory address of that pointer as the seed.  This can work quite well, as programs rarely get the same memory mapping twice, but it also has a weakness.  If the program is run twice in succession, without much happening between, it can sometimes get the same memory address for the pointer.  This is more of a concern for games than running the game twice simultaneously, which is why time is more commonly used.  If you want to avoid both of these problems though, you can instead add the time to the memory address, which will eliminate both of the problem.  Of course, if you are using a Unix based system, you can always get the seed from /dev/urandom, which is a less secure version of /dev/random (don't use /dev/random unless you really need the true randomness, because if it does not have enough randomness collected, it will make your program wait until it does; /dev/random is typically used by cryptographic programs and nothing else).  Sadly, using /dev/urandom will make your program platform dependent, and it won't be able to run on Windows machines or other OSs that don't have /dev/urandom.


The important things to remember when using randomness in a video game are understand the probability of important events (and do the math if you have to), understand the weakness of the distribution requirement and avoid situations where an important event is dependent on too many random factors, and make sure you use a good system for selecting a random seed.  (Note that Python's builtin random module automatically seeds for you, using the best method provided by whatever OS you are using.)

No comments:

Post a Comment