Common errors in pseudo-random number generators

rand() % range or rand() & range

To generate an integer number in a range, some people use module operator, which is the worst idea to reduce the interval [0; RANDMAX] to [0; range-1]. The problem is that for some generators, lower bits are less random than higher bits.

Example

Example with RANDU algorithm:

x(0) = 42
x(1) = (65539 * x0) mod 2^31 = 2752638
x(2) = (65539 * x1) mod 2^31 = 16515450
x(3) = (65539 * x2) mod 2^31 = 74318958
x(4) = (65539 * x2) mod 2^31 = 297274698
...

And now, let’s try to generate a number in range 0..10 with x(i) % 10:

2, 8, 0, 8, 8, ...

Examples with probability

Let’s imagine that you have a perfect random source (hardware entropy) generating numbers in range [0; 255]. And you want to generate random numbers in range [0; 62]:

randint(): rand() % 63

So you get:

  • input range [0; 62] => [0; 62]
  • input range [63; 125] => [0; 62]
  • input range [126; 188] => [0; 62]
  • input range [189; 251] => [0; 62]
  • input range [252; 255] => [0; 3]

The problem is in the last range: the output range is smaller than the input range, and so the values in range [0; 3] are more frequent than [4; 62]:

  • output range [0; 3]: p(i) = 5 / 256 = 1.95%
  • output range [4; 62]: p(i) = 4 / 256 = 1.56%

This can be even worse if the output range is closer to the input range. Eg. input=[0; 15], output=[0; 14]:

  • output value 0: p(i) = 2 / 16 = 12.5%
  • output range [1; 14]: p(i) = 1 / 16 = 6.3%

The number 0 is two times more frequence than the others.

Solution using float

Use floating point number:

int randint(int min, int max)
{
    double range  = (double)max - (double)min + 1.0;
    return min + (int)( (double)rand() * range / (RAND_MAX + 1.0) );
}

The result is converted to integer and C language truncated digits. so 0.9 is converted to 0. That’s why it uses a wider ranges:

  • max-min+1 and not not max-min
  • RAND_MAX+1.0 and not RAND_MAX

Warning

This solution doesn’t work with 64 bits integer because double has smaller precision (52 bits) and so you will loose the least significant bits.

Solution using integers

Hasard implements a solution using only integers working on 32 and 64 bits CPU. See lib/randint.c.

Non initialized generator

If a generator is not initialized, it’s easy to guess its internal state and to guess previous and next generated numbers.

Example

Example with non initialized generator:

#include <stdio.h>

int main()
{
    int i;
    for (i=0; i<4; i++) printf("%u\n", rand());
    return 0;
}

First program run:

1804289383
846930886
1681692777
1714636915

Second program run:

1804289383
846930886
1681692777
1714636915

Solution

Set the seed (eg. srand() function) with good entropy source.

Generator reseed at each tick

Some buggy generators are reseed (especially using time() and getpid()) at each tick which is bad! If many numbers numbers are generated during the same seconds, they will be all equal!

Example: old versions of PHP and ClamAV have this bug.

Poor seed

There are two distincts problems about the seed:

  • some seed values leads the smaller period than the maximum period
  • to guarantee the confidentiality, the attacker should not be able to guess the PRNG internal state

Small period

Some generators have smaller period if the seed is not correctly choosen. In case of Park-Miller generator, the seed have to be coprime with the modulus (ensure that gcd(seed, modulus)=1)

Example with RANDU:

x(n+1) = (x(n) * 65539) % 2147483648

With the worst seed, 1073741824 (modulus / 2), the period is only one:

x(0) = 1073741824
x(1) = 1073741824
x(2) = 1073741824
...

Another bad seed, 536870912 (modulus / 4), the period is two:

x(0) = 536870912
x(1) = 1610612736
x(2) = 536870912
x(3) = 1610612736
...

Confidentiality and PRNG state

In simple PRNG, it’s easy to recover the PRNG state: it’s very close to the random tick. But even if the attacker has no access to the random numbers, he can guess next numbers if he knows the initial seed.

Example: if the seed is only based time() + getpid(), the attacker can guess this value using bruteforce. Most computers share the same time() value (or close values) and getpid() value is in [2; 32767] in most cases.

So to make attacker job harder, use a True Random Number Generator like an Hardware Random Number Generator.

Integer overflow

In most langagues, integers have physical limits like [-2^31; 2^31-1] or [0; 2^64-1]. If an algorithm uses the wrong types, the result can be biased in favor of some numbers.

Example with integer overflow:

#define RAND_MAX 2147483648
uint32_t min, max, tick, number;
number = (double)tick * (max - min + 1) / (RAND_MAX + 1.0);

This code works correctly for most values, but if you try with min=0 and max=UINT32_MAX, the result doesn’t depend on tick anymore, the result is always zero! It’s because max-min+1 = UINT32_MAX+1 doesn’t fit in uint32_t type.

To fix this code, use the right types. Fixed example:

number = (double)tick * ((double)max - min + 1) / (RAND_MAX + 1.0);