Cryptography, Information Theory, and Error-Correction. Aiden A. Bruen
Читать онлайн книгу.Source: These percentages are based on the book “The Tragical History of Doctor Faustus”, by Christopher Marlowe, which is a book of approximately 100 000 characters that was chosen randomly from the Project Gutenberg.
Frequency analysis can be used for cryptanalysis. However, one needs a lot of craft in its use, along with any information that can be gathered about the contents of the message and the sender.
2.6 Breaking the Vigenère Cipher, Babbage–Kasiski
Now that the Vigenère cipher has been defined, we will show how to use the frequencies of letters in English, to break this cipher. We have two tasks.
Find the keyword length.
Find the keyword itself.
We have two methods to find the length of the keyword. The first method, the Babbage–Kasiski method, attempts to find repeated successive triples (or four‐tuples, or five‐tuples, etc.,) of letters in the cipher text. The second method treats the English language like a stationary or even ergodic source (see Chapter 11).
We will use two fundamental principles in carrying out our tasks.
“E” is the most frequent letter of the English language.
Informally, written English tends to “repeat itself.” This means that the frequencies of a passage starting in position 1 are similar to the frequencies of the passage starting in position when we slide the text along itself.
Once we obtain
The second principle has two important interpretations. For the Babbage–Kasiski method, this means that if we find a repeated letter (or sequence of letters) in the cipher text there is a good chance that it comes from a given letter (or sequence of letters) in the plain text that has been enciphered by a given letter (or letters) in the key. Thus, there is a reasonable expectation that the distances between such repeated sequences of letters equals the key length or a multiple of the key length.
The second principle has an important implication in terms of our second method, called the method of “coincidences,” as well. The basic idea is explained in an example below.
The Babbage–Kasiski method
To demonstrate this method for finding
EHMVL | VDWLP | WIWXW | PMMYD | PTKNF | RHWXS |
LTWLP | OSKNF | WDGNF | DEWLP | SOXWP | HIWLL |
EHMYD | LNGPT | EEUWE | QLLSX | TUP |
Our task is to search for repetitions in the above text. For a small cipher text, the brute‐force method is not too difficult. We focus on trigrams, and highlight some as follows:
After having found these, we compute the distances between the trigrams.
EHM | ‐ | 60 |
WLP | ‐ | 25,15,40 |
XWP | ‐ | 39 |
MYD | ‐ | 45 |
We note that with the exception of 39, 5 divides all of the distances. In fact, if we proceed with frequency analysis, it turns out that we can decipher this message with a key length of 5. The codeword is “ladel,” and the plain text is “Thor and the little mouse thought that they should douse the house with a thousand liters of lithium” (Who said that secret messages have to have a clear meaning!) Frequency analysis is used in our next example below.
It is purely by chance that we had the repeated trigram WXP – this repeated trigram was not the result of the same three letters being enciphered by the same part of the keyword. This highlights the fact that the above method is probabilistic.
The method of coincidences
We will now use the second principle of “coincidences,” to find the length of the keyword. The sequence of plain text letters in positions 1 to
VVHZK | UHRGF | HGKDK | ITKDW | EFHEV | SGMPH |
KIUWA | XGSQX | JQHRV | IUCCB | GACGF | SPGLH |
GFHHD | MHZGF | BSPSW | SDSXR | DFHEM | OEPGI |
QXKZW | LGHZI |