How long would it take a monkey to write 'Hamlet'?

The so-called infinite monkey theorem holds that a monkey with a typewriter, pressing keys at random, would eventually write any work of literature: Hamlet, Don Quixote, or even a bestseller of its own creation. Although the theorem has little practical application (it is, at the very least, difficult to find an immortal monkey willing to type forever), it lets us explore very interesting concepts such as randomness, behavior at infinity, and computation based on the generation of pseudorandom numbers.
This is a direct consequence of the second Borel–Cantelli lemma. In the form that matters here, the lemma states that if a series of attempts are independent of one another and each has the same probability of success greater than zero, then, with probability 1, success will occur infinitely many times. In the case of the infinite monkey theorem, the probability that a monkey pressing keys at random types a given text in a single attempt is very low, but not zero. Since the attempts are repeated indefinitely and are independent of each other, the lemma guarantees that, with probability 1, the monkey will eventually type the desired text, and will in fact do so infinitely many times.
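A quick way to see the lemma at work is to simulate it on a toy scale. The sketch below rests on assumptions of our own, not the article's: a hypothetical three-key typewriter ("abc") and a two-letter target "ab", so that each independent attempt succeeds with probability (1/3)^2 = 1/9. Because every attempt has the same nonzero chance, the number of successes keeps growing as attempts are added instead of stopping at some point. The function name `count_successes` is illustrative.

```python
import random

# Hypothetical toy setup (not from the article): a 3-key typewriter and a
# 2-letter target, so each attempt succeeds with probability (1/3)**2 = 1/9.
ALPHABET = "abc"
TARGET = "ab"


def count_successes(num_attempts: int, seed: int = 0) -> list[int]:
    """Run independent attempts of len(TARGET) random keystrokes each and
    return the running number of successes after every attempt."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = 0
    running = []
    for _ in range(num_attempts):
        attempt = "".join(rng.choice(ALPHABET) for _ in range(len(TARGET)))
        if attempt == TARGET:
            successes += 1
        running.append(successes)
    return running


if __name__ == "__main__":
    running = count_successes(90_000)
    # The success count keeps increasing, roughly as num_attempts / 9.
    for n in (900, 9_000, 90_000):
        print(n, running[n - 1])
```

The exact counts depend on the seed, but they hover around one success per nine attempts and never level off, which is the behavior the lemma describes.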
The theorem rests on several assumptions. The first is that the monkey must type at random. Colloquially, we understand a random phenomenon as one whose outcome cannot be determined with certainty before it occurs, even if the initial conditions are known. Examples include the roll of a die or the draw of the Spanish Christmas Lottery. In the monkey's case, it is assumed that, with each keystroke, every letter of the alphabet has the same probability of being typed, regardless of the text already written.
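This first assumption is easy to express in code. The following sketch assumes a 27-key keyboard holding the letters of the Spanish alphabet (ñ included) and uses Python's standard `random` module; each keystroke is drawn uniformly and independently of everything typed before. The helper name `type_randomly` is our own.

```python
import random
from collections import Counter

# Assumed keyboard: the 27 letters of the Spanish alphabet (ñ included).
ALPHABET = "abcdefghijklmnñopqrstuvwxyz"


def type_randomly(num_keys: int, seed: int = 0) -> str:
    """Return num_keys keystrokes, each drawn uniformly from ALPHABET and
    independently of the text already written."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(num_keys))


if __name__ == "__main__":
    text = type_randomly(270_000, seed=42)
    counts = Counter(text)
    # With uniform keystrokes, each letter should appear about 10,000 times.
    print(min(counts.values()), max(counts.values()))
```

Counting letters in a long run is a simple sanity check: every letter shows up with roughly the same frequency, and nothing in the code lets earlier keystrokes influence later ones.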
This condition allows us to calculate the probability of the monkey typing any given sequence. For example, the probability of typing "hola" (Spanish for "hello") by randomly pressing four keys on a Spanish keyboard, considering only the 27 letters of the alphabet, is (1/27)^4 = 1/531,441, approximately 0.0000019. Such a small value for such a short sequence already shows how difficult the task is.
Here comes the second assumption of the theorem: there is an infinite amount of time available, and therefore an infinite number of attempts. After n attempts, assumed for simplicity to be separate, non-overlapping blocks of four keystrokes, the probability that the sequence "hola" has not appeared is (1 - 0.0000019)^n. Although 1 - 0.0000019 is very close to 1, multiplying it by itself n times gives a value as close to zero as we like if n is large enough. Therefore, by typing long enough, the monkey will write "hola" with a probability as close to 1 as we wish.
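The arithmetic in the last two paragraphs can be checked with a few lines of Python. The sketch assumes, as the text does, 27 equally likely keys and a four-letter word typed in separate, non-overlapping attempts; the function names are illustrative. Solving (1 - p)^n ≤ 1 - q for n gives the number of attempts after which the word has appeared at least once with probability q or more.

```python
import math


def single_attempt_probability(word_length: int, num_keys: int = 27) -> float:
    """Probability that one attempt of word_length uniform keystrokes
    produces a specific word: (1 / num_keys) ** word_length."""
    return (1 / num_keys) ** word_length


def attempts_needed(p: float, confidence: float) -> int:
    """Smallest n with 1 - (1 - p) ** n >= confidence, i.e. the number of
    independent attempts after which the word has appeared at least once
    with probability >= confidence."""
    # log1p(-p) computes log(1 - p) accurately when p is tiny.
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))


if __name__ == "__main__":
    p = single_attempt_probability(4)  # the four-letter word in the example
    print(f"p = {p:.2e}")              # about 1.9e-06, as in the text
    for q in (0.5, 0.99, 0.999999):
        print(q, attempts_needed(p, q))
```

Reaching 99% certainty for a four-letter word takes roughly 2.4 million attempts; pushing the confidence closer to 1 only adds a logarithmic factor, which is why "as high a probability as we like" is always attainable in principle.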
The same is true of any other sequence, even the one containing all the words of Hamlet in order, and this is what the infinite monkey theorem rests on. Now, can we roughly estimate how long it would take to produce Shakespeare's classic with high probability? A recent article calculated that, with near certainty, the entire current monkey population would not manage to type a text longer than a few words before the heat death of the universe.
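The same formula shows why such estimates are so discouraging. Here is a rough sketch under assumptions of our own, not those of the article mentioned above: 27 equally likely keys, separate attempts as long as the text itself, and a hypothetical monkey pressing one key per second without rest. The expected number of attempts for a text of L characters is 27^L, so the code works with base-10 logarithms to avoid overflowing floating-point numbers. The function name is illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # one Julian year, in seconds


def log10_expected_years(text_length: int, num_keys: int = 27,
                         keys_per_second: float = 1.0) -> float:
    """Base-10 logarithm of the expected time, in years, for one monkey to
    type a fixed text of text_length characters.

    Assumptions (ours, for illustration): every keystroke is uniform over
    num_keys keys, and the monkey works in separate attempts of text_length
    keystrokes, so the expected number of attempts is num_keys ** text_length.
    """
    log10_attempts = text_length * math.log10(num_keys)
    log10_keystrokes = log10_attempts + math.log10(text_length)
    log10_seconds = log10_keystrokes - math.log10(keys_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical typing speed: one key per second.
    for length in (4, 10, 20, 40):
        print(f"{length:>2} characters: about 10^{log10_expected_years(length):.1f} years")
```

Under these assumptions a four-letter word takes about 25 days on average, while a 20-character phrase already needs more than 10^22 years; Hamlet runs to tens of thousands of words.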
Another interesting experiment related to this theorem is a web page that lets the user enter any sequence and then simulates the random generation of text until that sequence appears. To produce the text, the page uses so-called pseudorandom number generators. Because they follow fixed rules, the calculations these programs perform are completely deterministic: if all the initial conditions are known, the generated numbers can be predicted. In other words, pseudorandom numbers are not random. However, as long as the generator's initial conditions remain unknown, the values produced by a good generator are, in practice, indistinguishable from truly random numbers. Various techniques exist for this purpose, such as generators based on modular arithmetic or those based on ciphers, among others.
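To make the idea of a deterministic generator concrete, here is a sketch of such an experiment driven by a linear congruential generator, one of the modular-arithmetic methods mentioned above. It is not the code used by the page described in the text, which we do not know. The constants a = 1664525, c = 1013904223 and m = 2^32 are the classic values popularized by Numerical Recipes; the 27-letter alphabet and the names `LCG` and `monkey_until` are our own assumptions. Running it twice with the same seed produces exactly the same "random" text.

```python
from typing import Optional


class LCG:
    """Minimal linear congruential generator: x_{k+1} = (a * x_k + c) mod m.
    Default constants are the classic Numerical Recipes choice; illustrative
    only, and not suitable for cryptography."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223,
                 m: int = 2**32) -> None:
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_index(self, n: int) -> int:
        """Advance one step and map the new state to an integer in [0, n).
        Scaling the whole state uses its high-order bits, which are far less
        regular than the low-order bits of a power-of-two-modulus LCG."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state * n // self.m


# Assumed 27-key keyboard: the letters of the Spanish alphabet, ñ included.
ALPHABET = "abcdefghijklmnñopqrstuvwxyz"


def monkey_until(target: str, seed: int,
                 max_keys: int = 10_000_000) -> Optional[int]:
    """Hypothetical helper: type pseudorandom letters until the text ends
    with `target`. Returns the number of keystrokes, or None if max_keys
    keystrokes pass without a match."""
    gen = LCG(seed)
    window = ""  # the last len(target) characters typed so far
    for keys in range(1, max_keys + 1):
        window = (window + ALPHABET[gen.next_index(len(ALPHABET))])[-len(target):]
        if window == target:
            return keys
    return None


if __name__ == "__main__":
    first = monkey_until("ola", seed=2024)
    second = monkey_until("ola", seed=2024)
    print(first, second)  # identical: the same seed gives the same "random" text
```

Unlike the separate-attempt calculation in the text, this version checks every position of the typed text, so attempts overlap; that is closer to what a simulator does and finds a short target somewhat sooner on average.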
Finally, in the age of large language models, could these replace the monkeys in our experiment? Could ChatGPT or DeepSeek spontaneously write Don Quixote if asked to keep writing for an infinite amount of time? The reasoning above does not apply, because these models generate text according to the probability of each word appearing in a given context; their output is not the result of a uniform, independent random process like the monkey's keystrokes. And since Don Quixote is among the texts they have been trained on, it might seem that the probability of their reproducing the entire work would be higher than in the monkey's case.
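By contrast with the monkey, a language model chooses each word according to probabilities that depend on what came before. The toy sketch below is a drastic simplification of our own, not how ChatGPT or DeepSeek actually work: a bigram model that only remembers which word followed which in a single training sentence, the opening line of Don Quixote ("En un lugar de la Mancha, de cuyo nombre no quiero acordarme"), lowercased and stripped of punctuation for simplicity. The names `train_bigrams` and `generate` are illustrative.

```python
import random
from collections import defaultdict

# Training text: opening line of Don Quixote, lowercased, punctuation removed.
TRAINING_TEXT = "en un lugar de la mancha de cuyo nombre no quiero acordarme"


def train_bigrams(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the text.
    Repeated followers appear repeatedly, so sampling from the list
    reproduces the observed frequencies."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(model: dict[str, list[str]], start: str, max_words: int,
             seed: int = 0) -> list[str]:
    """Sample a word sequence in which each word depends on the previous one.
    Stops at max_words or at a word with no recorded follower."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < max_words and output[-1] in model:
        output.append(rng.choice(model[output[-1]]))
    return output


if __name__ == "__main__":
    model = train_bigrams(TRAINING_TEXT)
    print(" ".join(generate(model, "en", 15, seed=1)))
```

Even this tiny model only ever produces word pairs it has seen in training, so its output looks nothing like a monkey's; yet, because "de" can be followed by either "la" or "cuyo", it can also wander away from the original sentence instead of copying it.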
However, several factors make this extremely unlikely. First, these models are trained mostly on modern texts rather than to replicate Golden Age Spanish texts faithfully, which makes it difficult for them to follow Cervantes's style accurately. Furthermore, they are designed to avoid copying large portions of their training texts verbatim, which further reduces the chances of reproducing complete works. This, combined with other limitations of these models, means that although a model might get closer than the monkeys to certain passages, the probability of it reproducing the whole work is tiny.
Pablo García Arce is a predoctoral researcher of the Spanish National Research Council (CSIC) at the Institute of Mathematical Sciences (ICMAT).
Coffee and Theorems is a section dedicated to mathematics and the environment in which it is created, coordinated by the Institute of Mathematical Sciences (ICMAT). In this section, researchers and members of the center describe the latest advances in this discipline, share common ground between mathematics and other social and cultural expressions, and remember those who shaped its development and knew how to transform coffee into theorems. The name evokes the definition of Hungarian mathematician Alfred Rényi: "A mathematician is a machine that transforms coffee into theorems."
Edited, translated, and coordinated by Ágata Timón García-Longoria. She is the coordinator of the Mathematical Culture Unit at the Institute of Mathematical Sciences (ICMAT).
EL PAÍS