Random Number Distributions
Some Non-uniform Random Numbers
- (click here for more)Non-uniform Distributions
Most computer languages provide a pseudo-random number generator which gives values that are uniformly distributed within a particular range. For example the Javascript Math.random() function returns a real value between 0 and 1, which is just as likely to be in the range 0->0.1 as it is to be in the range 0.1->0.2, or any other range of size 0.1 that is contained within the range 0->1
In fact a uniformly distributed random number between 0 and 1 has a probability p of lying between q and q+p, if q>=0 and q+p<=1
This kind of distribution matches, for example, the number shown on a roll of a die, or the number on a lottery ball, where every possible outcome is equally likely.
However for some events or measurements, different outcomes have different probabilities. For example, the sum of two six-sided dice is 2 one time in thirty-six, whereas it is 7 one time in six. If we record the number of times a coin lands 'heads' before the first 'tails' then we would see 0 in 50% of the tests, 1 in 25%, 2 in 12.5%, and so on, with no number being ruled out in theory, although the chance of seeing 100 heads in a row is rather small.
The first of these examples approximates a so-called 'normal' or Gaussian distribution - the approximation is closer the more dice are added. The second example approximates an 'exponential' distribution.
Generating non-uniform numbers
How can we generate random numbers that conform to a given distribution? Using the functions described on this site, or using physical means, we can create numbers with a uniform distibution, but how can these be converted to another distribution?
Exponential distribution
An exponential distribution is characterised by a value called the rate parameter, r, which defines how rapidly the probability of a value decreases as the value itself increases. The mathematical formula for the distribution is:
P(x) = r exp(-r x) [for x>=0, where exp means 'e to the power of']
Using this formula we can derive a method to convert a uniform random number U into an exponentially distributed number E
E = -ln(U)/r [where ln is the natural logarithm function]
Normal distribution
A normal distribution is characterised by a 'mean' value, m (around which the random numbers tend to cluster), and the standard deviation, d, which indicates how widely spread the numbers are from the mean. Sometimes the variance, s, is used - this is the square of the standard deviation. About 68% of values will be within one standard deviation from the mean, 95% within 2*d from the mean, and 99% within 3*d.
For a normal distribution with mean m=0 and variance s=1, the equation for the distribution is the bell-shaped curve given by:
P(x) = exp(-x*x/2)/sqrt(2*pi)
Distributions with different means or variances can be derived from this formula by appropriate scaling and offsetting.
An easy way to approximate this distribution is to generate 12 uniform random numbers between -0.5 and 0.5, and add them up. This works because the mean of each input number is 0, and its variance turns out to be 1/12, so adding 12 such numbers will approximate a normal distribution with the desired mean and variance. Clearly this method will not generate values higher than 6 or lower than -6, so it is not a true normal distribution - but these values would only be expected to occur once in 500 million evaluations.
An alternative is the 'Box-Muller method', which takes two independent uniform random numbers U1 and U2 between 0 and 1, and uses them to generate two independent normally distributed random numbers N1 and N2.
N1 = sqrt(-2*ln(U1)) * cos(2*pi*U2)
N2 = sqrt(-2*ln(U1)) * sin(2*pi*U2)
A related method is the 'Marsaglia polar method' in which U1 and U2 are uniform random numbers between -1 and +1, but generated repeatedly until the condition U1*U1+U2*U2<1 is satisfied. Then:
S = U1*U1+U2*U2
T = sqrt(-2*ln(S)/S)
N1 = T*U1
N2 = T*U2
Other distributions
For more general distributions, the 'ziggurat' method can be employed. In fact this method also works well for the normal and exponential distributions, but it is more complicated to implement than the methods described above.
The ziggurat method relies on dividing the distribution into a number of horizontal regions of equal area. One region, the base, contains the tail of the distribution which typically is infinitely long, but must necessarily have finite area. The remaining regions are rectangles stacked on top of the base, each with the same area but constructed so as to encompass the curve with a small region extending outside the curve.
A random region is then chosen, and sampled with a random horizontal value - if it is within the curve it is returned as the newly distributed number - otherwise a new value is generated and the procedure restarted.
A full explanation of this method is too complex for this page, but can be found in many places on the internet, such as Wikipedia.