Tuesday, July 27, 2004
By: Jason Doucette
Identical Minesweeper Boards
I recently visited The Authoritative Minesweeper page, which shows world records for the best Minesweeper times. I thought it was interesting that they noticed some of the best times submitted to their web site were achieved with exactly identical boards:
Identical Minesweeper boards. (Images from The Authoritative Minesweeper.)
A record holder, Uri Avni, explains his thoughts as to why this occurs:
"Computers use a pseudo random algorithm in order to generate randomness; pseudo random is not a pure 'random', and could explain why some boards seems to be more common than expected. Another fact to be noticed is that the first square we click can never be a mine. This hurts the assumption of pure randomness." -- Uri Avni
However, he has missed an important part of how random numbers are generated. (The fact that the first click is never a mine is just the program trying to be nice, so you do not immediately receive a 'game over' message on your first click. It merely moves the mine, if it exists at that location, to some other empty location. It does not 'disturb' the randomness used to generate the board. If this occurred in any of the above images, you would merely see one of the mines 'misplaced'. This subtlety can be ignored in the following discussion.) I shall explain.
How Computers Generate Random Numbers
Uri Avni is quite correct in that the 'random' numbers that computers generate are indeed pseudo. This basically means that they are not truly random. Computers are deterministic machines. This means, if given exactly the same input information, it will result in exactly the same output. Every time. Makes you wonder about those Video Lottery Terminals, doesn't it? Such an algorithm to generate these numbers is called a pseudo random number generator (PRNG).
How Does A Deterministic Machine Create Randomness?
Programmers create a function which takes a single number as an input, and spits out a single number as its output. Ideally, this new number, generated via a complex algorithm within the function, is difficult to for a human to predict, even if she knows what the original input number was. In other words, show a person the first number (used as input into the function), and the second number (which the function outputs), and the distinction between the two should be complex enough that the person should not be able to discern any patterns. If the algorithm is good, this person will not be able to predict the output of any given input (with exception, of course, to the ones she has already seen). Just as you have absolutely no idea what the next dice roll will be in a Monopoly game, you will not be able to predict what new number this function will create, even though it is deterministic.
Each time the computer wishes to generate another random number, it will use the last outputted number as its input, and it will generate a new one. Each random number generated is the input that generates the next. Herein lies the problem. Due to the deterministic nature of machines, the exact same sequence of numbers will be generated every time, if given the same initial condition (the initial input number, known as the seed). Take another glance at the similar Minesweeper boards above while thinking about this: Assuming truly random numbers, the probabilities of any two boards being the same is extremely small. The probability decreases immensely as the board size and mine count increases. However, if only the same seed needs to be the same to generate the exact same sequence of numbers that is used to position the mines, then the Minesweeper board could be thousands of times larger, with millions of times more mines, and you will still get the exact same board. Even if you have the best random number generator algorithm in the world, if you seed it with the same initial input, you get the same output. Every time. No matter how many numbers are generated; no matter how many mines are placed. (They call it a seed for a reason.)
Regarding Minesweeper, the problem really is not that the random number generator used in the program is not good enough. It is good enough. No human can possibly predict where the mines are going to be based solely on the location of the currently uncovered mines. The mines really are placed randomly enough that we perceive the pseudo randomness as true randomness. If the random number generator algorithm was poor, then you would see patterns in mine placement. It is likely that Minesweeper uses the standard random number generator that comes with a standard C / C++ compiler, which is considered very poor (in comparison to professional random number generators used for more important topics, such as Internet security), but, it is good enough. The real problem is the seed.
How Do We Create a Good Seed?
This is a problem, since computers are deterministic. If we could already create a truly random seed, we would not need the seed to help initialize a pseudo random number sequence.
All we really need is a different seed every time. One method could be to set the seed to equal the number of games played. While this allows you to play a different game every time, it means that everyone else who plays the game on their own computer will play the exact same boards as you. Watching someone play his 5th game on his own computer can be used to properly predict mine locations on your 5th game on your computer. Another method is to store the last random number generated as the seed for the next game, since each number generated is really a seed for the next number to be generated. But, this still leaves the problem of generating the very first seed for the first game played. It all boils down to creating one seed that is as random as possible.
The simplest seed generation is to use the current time. Most computers know the current time down to the millisecond, and if you use this information, it is going to be fairly random. The game likely seeds the generator using the current time of when the program starts. Although the hours and minutes may be easy to predict, but I am willing to bet you cannot tell me what the thousands of a second were when you started the program. But, if two people start the game at precisely the same time, then they will play the same game (note that 'same time' to a computer means times that are imperceivably different, which is limited by the computer's timer resolution, which may only be 1/1,000th of a second). If you consider that millions of people have this game on their computers, the likelihood of this happening is high. This problem can be resolved by adding more seemingly random information into the generation of the seed. Such as the last time the mouse was clicked, how long the OS has been running, how long the program itself has been running, how much hard drive space you have left, how many processes are running, etc. If you include all of this information and more, what is the likelihood of two players having all of this information exactly the same, and thus generating the same seed? Very small. In fact, if you calculate the odds yourself, you can continually add more of this 'random' information into the generation of the seed until the odds are well below your minimal requirements.
Seed Limitations
There is another problem lurking that many people are unaware of. If the seed that is generating these random numbers is only a 32-bit value, which is very likely for this simple Minesweeper game, then there are only about 4.3 billion possible seeds. The seed can only hold about 4.3 billion unique values. No matter how much randomness is used to generate the seed, even if you used a truly legitimate random source to generate it, there are still only 4.3 billion different ways to store the 32 bits of data that comprise the seed. This implies that there are only 4.3 billion possible boards to play*. After 4.3 billion games have been played, people must be playing boards others have already seen. Considering Microsoft sells over 100 million copies of Windows per year with Minesweeper installed on most, it is very likely that many players are playing identical games. The images at the start of this article are proof.
I have determined, with a program I created, that my compiler, Visual Studio .NET 2003 (VC7) Standard Edition, uses a 31-bit seed for its standard pseudo random number generator function, rand(). The function that seeds this function is srand(), which takes a 32-bit value. My program exhaustively tests every possible seed, and realizes that each seed produces identical results as one other seed. These two seeds always differ by 2,147,483,648 (2^31). Thus, srand() appears to ignore the high bit of the 32-bit input value. I believe this is the function that Minesweeper uses, therefore, I hypothesize that there are only 2,147,483,648* possible Minesweeper boards.
*For each difficulty level, as using the same seed for two different difficulty levels will produce two unique boards. Also, this does not take into account the possibility of a player clicking on a mine for her first click, which would cause the program to relocate that mine, thus producing a slightly different board.
About the Author: I am Jason Doucette of Xona Games, an award-winning indie game studio that I founded with my twin brother. We make intensified arcade-style retro games. Our business, our games, our technology, and we as competitive gamers have won prestigious awards and received worldwide press. Our business has won $190,000 in contests. Our games have ranked from #1 in Canada to #1 in Japan, have become #1 best sellers in multiple countries, have won game contests, and have held 3 of the top 5 rated spots in Japan of all Xbox LIVE indie games. Our game engines have been awarded for technical excellence. And we, the developers, have placed #1 in competitive gaming competitions -- relating to the games we make. Read about our story, our awards, our games, and view our blog.