bzt wrote:

Actually, that's the purpose of a random generator... It has to be a black box without any recognizable algorithm.

Only in use. In design, the random generator has to preclude the sequence of numbers being predicted. It can only do that when its design and its inner workings are known. For the PRNG in Linux, you can know exactly what goes into the entropy pool, and how it is mixed in, and see that there is no predetermined sequence of numbers being generated. For the RDRAND instruction, there are no such assurances. Only Intel's word that they are definitely not trying to undermine your security, honest!

bzt wrote:

But I'd like to point out this has nothing to do with cryptographically sound random, that requires real-life event entropy.

That would be the section on high-entropy generators I'm preparing. The idea is that if the entropy pool has some entropy, and some message has entropy, then SHA-1(entropy pool + message) has the same amount of entropy as both combined, while compressing down to a single SHA-1 block size. You can also use SHA-2 or SHA-3, or even hashing algorithms not called SHA. In any case, distribution of entropy and compression are properties of hash algorithms.

You then have to jealously guard the entropy pool, so nobody gets a look at it. When asked to generate random numbers based on the pool, you take the pool contents, calculate a hash based on them, and return the hash. However, that would lead to a repeating pattern in the hash block size, so before you return it, you must also mix the output back into the entropy pool again. That is basically the entire design. In production, the algorithm to mix the entropy into the pool does not have to be secure, but the point of the article should be to start with simplicity and move on to speed afterwards.

The idea of cryptographic security of the RNG is also about the entropy pool: The state of the random generator must not be knowable for anyone observing the outputs before or after another process has used it for some secure purpose. That is also a failing of the low-entropy generators, that you can identify their internal state by observing the outputs. For an RNG that is shared across the entire system, this is even more important. If the two of us were sharing a computer, it would not help you to generate a random key if right before that, I figured out the state of the RNG and know exactly what random numbers you are going to get.

Most of my knowledge about this comes from a talk I forgot the title of, but the subtitle was "How I learned to stop worrying and love the urandom", which was held at some Chaos Communication Conference if memory serves, and it made so many things so much clearer for me. For instance why the blocking behavior of /dev/random on Linux is nonsense, except maybe at boot when not many random events have happened yet. But gauging the entropy of arbitrary events is really hard in general, so guessing how many bits you may have added now is an inexact science at best. Hard disk command timing is random due to the behavior of air molecules on the surface of the disk, but an SSD has the exact same interface and no randomness at all. And similar for all other sources.7

And then there's adversarial entropy, which I was looking for the paper DJB wrote about it (or was it just a blog post? I forget). Essentially, it is possible an adversarial entropy device would try to get the state of the entropy pool and then deliver "random" numbers to the system such that the entropy pool is forced to become less random. Which sounds far fetched, but really, all you need is a rootkit and a local network device willing to time packets just right. And suddenly the attack becomes scarily plausible and hard to defend against. And the rootkit doesn't have to write into kernel space, only read, and Meltdown just recently happened... oh boy. And if it isn't Meltdown, it's Rowhammer or something like that.

So yeah, I have some writing to do.