Thursday, April 12, 2018

Video Game Development: Cameras

If you have not read Video Game Development: Relativity, you might want to do so before reading this article.  If you don't have a solid grasp of basic relativity, this article may be confusing and hard to follow.

While the concept of cameras is generally only a major topic in 3D graphics, this discussion is going to be limited to cameras looking at 2D game worlds.  The reason for this is that cameras in 3D space are a fairly complex topic of their own, and a full explanation would take many fairly long articles.  The purpose of this article is to explain how cameras work in 2D and why we need them.

One of the most important places for understanding relativity in video game development is in cameras.  The most obvious reference frame for games is the screen or window, where the upper left corner is (0, 0).  This works fine until your game area is larger than the screen.  What happens when your game is on an 800 by 600 screen, but the game map is several thousand pixels wide and tall?  In most video games, a majority of the game map is off the screen at any given time.  To view the entire map, we have to be able to change where on the map the screen is looking.  In my experience teaching video game design, the most obvious solution is to move the map around, keeping the screen as the reference frame.  This is like the pre-Copernican geocentric model of the solar system though.  Every time the player wants to see a different area of the map, every single entity on the map has to be moved.  This even includes entities that are not supposed to be able to move, like trees or buildings.  As with the geocentric model of the solar system, this results in a lot of extra math, and that can be very expensive when it comes to performance.  The fact is, the screen makes a pretty poor reference frame when the map is larger than the screen.

If there is a main character, it might be tempting to make the main character the reference frame.  Unfortunately, this is no better than making the screen the reference frame.  In fact, the two are often identical, as the screen is typically centered on the character, and if the character is the reference frame, the position of the screen never changes.  So again, we are back to the geocentric analog, where we have to move everything, including stationary objects, whenever the character moves.  The character is a pretty poor reference frame as well.

What if we make the reference frame the thing that is supposed to be stationary?  If everything is supposed to appear to be moving relative to the map, then maybe the map should be the reference frame.  This allows stationary things to remain stationary, instead of wasting processing time moving relative to the character or screen.  But now we have a problem: How do we move everything relative to the screen, so we can see different parts of the map?  This is the wrong question.  It is like ancient astronomers asking, "How does everything move relative to the Earth?", and it is what resulted in the geocentric model in the first place.  The answer that led to the truth was that we are moving, relative to something else that makes a better reference frame.

What if we allow the player to move?  (It is important to distinguish here between character and player.  The character is an entity in the game that is controlled by the player.  The player is the person sitting at the desk in front of the computer.)  The player's eyes, in the game world, are essentially the screen.  So what I am suggesting is, what if the screen moves relative to the world?  It turns out this is the right solution.

In a video game, the screen represents the player looking into the game world.  It essentially represents a camera in the game world that is transmitting what it sees to the player's screen.  In this context, if we want the player to see a different part of the world, we just move the camera.  In 2D games, the camera can be represented by a rectangle.  Typically the rectangle will be the size of the screen (if it is not, that generally represents scaling), and the position of the rectangle in the game world determines what part of the world the player sees.

Implementation of a camera is actually fairly simple.  The camera data structure contains four elements: an x and y coordinate that together represent its position in the game world, and a width and height that represent the area of the game world it can see.  If that area is larger than the screen, that represents scaling the image down to fit onto the screen (essentially zooming out), and if that area is smaller than the screen, that represents scaling the image up to fill the screen area (zooming in).  Typically, however, the camera is either the same size as the screen, or the size of the portion of the screen where the game world will be displayed.
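To make this concrete, here is a minimal sketch of such a data structure in Python.  (The language and the names Camera, x, y, width, and height are my own choices for illustration; any equivalent record or struct will do.)

    from dataclasses import dataclass

    @dataclass
    class Camera:
        x: float       # horizontal position of the camera's top left corner, in world coordinates
        y: float       # vertical position of the camera's top left corner, in world coordinates
        width: float   # width of the area of the game world the camera can see
        height: float  # height of the area of the game world the camera can see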

Once you have a rectangular representation of your camera, all that is necessary is a little bit of additional math when rendering.  Consider: if an entity in the game is at (0, 0), and the camera is at (10, 10), then the entity should be rendered 10 pixels above the top edge and 10 pixels left of the left edge of the camera's view.  In screen coordinates, this position is (-10, -10).  This suggests that applying the camera is just a matter of subtracting the camera's position from the position of each entity as it is being rendered, and it turns out this gives the correct results.  All you need to implement a camera is a representation of the camera as a rectangle and two subtractions for each entity being rendered.
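In code, the subtraction might look something like the sketch below, reusing the Camera structure from earlier.  (The name world_to_screen is hypothetical; the actual draw call depends on your rendering library.)

    def world_to_screen(camera, entity_x, entity_y):
        # Subtract the camera's position to convert world
        # coordinates into screen coordinates.
        return (entity_x - camera.x, entity_y - camera.y)

    # Example from the text: an entity at (0, 0) with the camera
    # at (10, 10) lands at (-10, -10), just off the top left
    # corner of the screen.
    camera = Camera(10, 10, 800, 600)
    print(world_to_screen(camera, 0, 0))  # (-10, -10)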

Scaling requires a little bit more work.  Typically, the camera is the same size as the area of the screen where the image will be rendered, but sometimes we want to make the camera smaller or larger, to scale the image.  This allows the ability to zoom in or out.  To scale, we divide the screen width and height by the camera width and height, and then scale the image being rendered by those values.  For example, if our window is 100 by 100, and the camera is 200 by 200, then we need to scale the image by 0.5 by 0.5 (that is, (100/200, 100/200)) for it to fit.  Some rendering libraries will handle this automatically, if you tell them what size you want the source and destination to be.  Others provide scaling functions that may be used by the developer, if desired.  In 2D games, zooming is not a terribly common feature, but with a well implemented camera, it is not hard to accomplish.
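As a sketch, again assuming the Camera structure above (how the resulting scale gets applied to each image depends on your library):

    def scale_factors(camera, screen_width, screen_height):
        # Divide the screen dimensions by the camera dimensions,
        # axis by axis, to get the scale to apply to rendered images.
        return (screen_width / camera.width, screen_height / camera.height)

    # Example from the text: a 100 by 100 window with a 200 by 200
    # camera gives a scale of (0.5, 0.5), zooming out by half.
    camera = Camera(0, 0, 200, 200)
    print(scale_factors(camera, 100, 100))  # (0.5, 0.5)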

Cameras come with some additional advantages.  For example, if you perform collision detection of each game entity with the camera, you can avoid rendering things that are not on the screen, by only rendering entities that collide with the camera.  This can result in significant performance improvements, as rendering is generally far more expensive than a simple rectangle intersection test.
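The test itself is just a standard axis-aligned rectangle overlap check.  A sketch, assuming entities carry their own position and size:

    def is_visible(camera, x, y, width, height):
        # Axis-aligned rectangle overlap test between an entity's
        # bounding box and the camera's view rectangle.
        return (x < camera.x + camera.width and
                x + width > camera.x and
                y < camera.y + camera.height and
                y + height > camera.y)

    # Only entities that overlap the camera get rendered:
    # visible = [e for e in entities
    #            if is_visible(camera, e.x, e.y, e.width, e.height)]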

Another advantage of cameras is that the view is not bound to the map or any other game entity.  For a game where the player controls a single character, keeping the character in the middle of the screen is as easy as centering the camera on the character each time the character moves.  If you want the player to see some other part of the map for some reason, it is easy to move the camera to that location for as long as necessary.  In games where the player should be able to view any part of the map at will (real-time strategy, for example), the player can have complete control over the camera position.  Adding this flexibility to a game that uses the screen or character as the reference frame is significantly more work than just moving a camera around, and handling it with a camera is far more efficient.
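Centering the camera on a character reduces to two lines of arithmetic.  A sketch (center_on is a hypothetical name; it mutates the Camera from the earlier sketch):

    def center_on(camera, target_x, target_y):
        # Position the camera so the given world point sits in the
        # middle of the view.
        camera.x = target_x - camera.width / 2
        camera.y = target_y - camera.height / 2

For a free camera, as in a real-time strategy game, you would instead set camera.x and camera.y directly from the player's scrolling input.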

Cameras are the simplest and most efficient means of letting the player view different parts of a game world.  They are not immediately obvious to everyone though.  With a basic understanding of relativity, cameras are quite easy to implement, and they come with many benefits.
