The goal of this notebook is to explore one aspect of the code-cracking work done at Bletchley Park ('BP') during World War II, in particular their work on the 'Lorenz' cipher. We will demonstrate the process of 'setting Chi wheels' as performed at BP with the aid of the purpose-built Colossus computer. There is a good overall guide to the workings of the Lorenz machine and cipher, and the attacks on it, in the Wikipedia 'Cryptanalysis of the Lorenz cipher' page, but to summarise:
- The ciphertext is the `xor` of the plaintext and a 5-bit keystream.
- The keystream is the `xor` of a Chi stream and an extended Psi stream.

To know the keystream, then, you need to know both:

- the wheel patterns (the arrangement of cams around each wheel); and
- the wheel settings (the starting position of each wheel for the message in question).
The wheel patterns persisted for many messages, but the 'settings' were different for each message.
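As a tiny illustration of this `xor` structure (a sketch with made-up 5-bit codes, not output from the real machine), note that applying the same keystream twice recovers the plaintext:

plain   = [5, 16, 9, 9, 3]      # made-up 5-bit plaintext codes
chi     = [12, 1, 30, 7, 21]    # made-up Chi-stream codes
ext_psi = [3, 3, 19, 19, 8]     # made-up extended-Psi codes
key     = [c ^ s for c, s in zip(chi, ext_psi)]    # keystream = Chi xor extended-Psi
cipher  = [p ^ k for p, k in zip(plain, key)]      # ciphertext = plaintext xor keystream
assert [z ^ k for z, k in zip(cipher, key)] == plain   # deciphering is the same operation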
When performing 'Chi setting' for some particular intercepted enciphered message, the cryptanalysts knew:

- the intercepted ciphertext; and
- the patterns of all the wheels;

but did not know:

- the settings of the wheels for this message (nor, of course, the plaintext).
This notebook will demonstrate the setting of two Chi wheels, given an 'intercepted message'.
In this notebook we encipher a message, and then, supposing it has been intercepted, work out the settings for two of the Chi wheels, using the techniques developed at BP and implemented using Colossus. We suppose that the Chi wheel patterns have already been cracked.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pylab as plt
import gzip
%load_ext cython
%matplotlib inline
We first work on creating the keystream, and then will apply it to some plaintext to produce the specimen ciphertext.
The following patterns are actual wheel patterns used, as taken from p.21 of 'General Report on Tunny with Emphasis on Statistical Methods (1945)' (GRT henceforth):
specimen_patterns_raw = [
'0+0+0 0+0++ 0+0+0 +0+00 ++000 ++++0 0+0+0 +0++0 ++0',
'00+0+ 0++0+ 0+++0 0++0+ 0+00+ +0000 +0+0+ ++00+ 0++0+ 0+',
'0+00+ ++000 +0+00 0++00 +0+0+ +0+0+ ++00+ 0+00+ 0+0++ +0+0+ +',
'+0++0 0++0+ 00+0+ 0+00+ +0000 +++00 +0+++ +0+0+ 0+0+0 +00+0 ++0',
'+0+00 +0+0+ 0+00+ 000++ 000+0 +++0+ +0+0+ 0+00+ ++00+ 0+++0 0+0++ 0+0+',
'++000 +0++0 ++++0 +++00 ++000 +0+0+ 000++ +0',
'+0+00 00+++ 00+++ +0+++ +00++ ++0++ ++0++ ++000 ++++0 +++0+ ++00+ +++0+ +',
'0++00 ++00+ +00++ 00++0 0++++ 00++0 0+00+ +00++ 0',
'000+0 +00+0 ++000 0+++0 0+0++ ++0++ 0',
'+++00 +++00 +++00 +0000 +++0+ 00+0',
'00+0+ +++00 +00++ 0000+ +++00 +',
'++000 +++0+ 00+++ 0+000 ++0']
specimen_patterns = [p.replace(' ', '').replace('+', '1')
for p in specimen_patterns_raw]
The Lorenz machine's official numbering of its wheels did not match that assigned by BP, so to work with the BP notation we must select the appropriate patterns to get the various Chi, Psi, and Mu wheels:
specimen_chi_chars = [specimen_patterns[i] for i in [7, 8, 9, 10, 11]]
specimen_psi_chars = [specimen_patterns[i] for i in [0, 1, 2, 3, 4]]
specimen_mu_chars = [specimen_patterns[i] for i in [6, 5]]
def mk_arrays(strs):
return [np.array([int(c) for c in s], dtype=np.uint8) for s in strs]
specimen_chi = mk_arrays(specimen_chi_chars)
specimen_psi = mk_arrays(specimen_psi_chars)
specimen_mu = mk_arrays(specimen_mu_chars)
_exp_chi_lengths = [41, 31, 29, 26, 23]
_exp_psi_lengths = [43, 47, 51, 53, 59]
_exp_mu_lengths = [61, 37]
assert np.all(list(map(len, specimen_chi)) == _exp_chi_lengths)
assert np.all(list(map(len, specimen_psi)) == _exp_psi_lengths)
assert np.all(list(map(len, specimen_mu)) == _exp_mu_lengths)
The BP team used the term 'impulse' for what we would now call a 'bit' or 'bitstream', and we adopt the 'impulse' term here. Also, the published descriptions of BP's work present 'Impulse 1' at the left of the group of five impulses, so we maintain this arrangement, which makes 'Impulse 1' the most significant bit when a character's five impulses are read as a binary number.
First some utility functions for working with wheels and collections of impulses:
def letter_from_impulses(impulses):
return (16 * impulses[0]
+ 8 * impulses[1]
+ 4 * impulses[2]
+ 2 * impulses[3]
+ 1 * impulses[4])
def impulse_stream_from_wheel(w, n):
"Return an array of size N formed by cycling through the elements of W."
n_repeats_needed = int(np.ceil(n / w.size))
possibly_overlength_stream = np.tile(w, n_repeats_needed)
return possibly_overlength_stream[:n]
def rot_wheel(w, n):
"""
Return the result of rotating the given pattern W by N positions.
The [0] element of the result is the [N] element of the input W.
"""
return np.concatenate([w[n:], w[:n]])
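A quick check of these helpers on a made-up three-cam wheel (the values here are purely illustrative):

toy_wheel = np.array([1, 0, 1], dtype=np.uint8)
print(impulse_stream_from_wheel(toy_wheel, 7))   # cycles the wheel: [1 0 1 1 0 1 1]
print(rot_wheel(toy_wheel, 1))                   # starts from element [1]: [0 1 1]
# Five single-element impulses 1,0,0,1,1 combine to the code 16 + 2 + 1 = 19.
print(letter_from_impulses([np.array([b], dtype=np.uint8) for b in (1, 0, 0, 1, 1)]))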
The Chi wheels step on after every plain-text letter has been enciphered, so it is straightforward to generate the Chi contribution to the keystream, starting from particular settings:
class Chi:
def __init__(self, chi, step_counts):
"STEP_COUNTS --- list of values used as the wheels' settings."
self.chi = [rot_wheel(w, n) for w, n in zip(chi, step_counts)]
def letters(self, n_sprockets):
impulses = [impulse_stream_from_wheel(w, n_sprockets) for w in self.chi]
return letter_from_impulses(impulses)
For example, with the above specimen Chi patterns, and settings of `[7, 10, 22, 12, 19]`, we can generate:
chi_wheels = Chi(specimen_chi, [7, 10, 22, 12, 19])
chi_wheels.letters(16)
The summary description of the Psi and extended Psi process above was conceptually useful, but in practice the (unextended) Psi stream is never created in the Lorenz machine. Instead, the Mu wheels control whether the Psi wheels move onwards. This behaviour means that it is slightly more involved to generate the extended Psi stream, as compared to the Chi stream:
class PsiAndMu:
def __init__(self, psi, mu, step_counts):
"""
STEP_COUNTS --- first five elements give Psi settings; remaining
two elements give the Mu settings.
"""
self.psi = [rot_wheel(w, n) for w, n in zip(psi, step_counts[:5])]
self.mu = [rot_wheel(w, n) for w, n in zip(mu, step_counts[5:])]
def letters(self, n_sprockets):
mu_0_stream = impulse_stream_from_wheel(self.mu[0], n_sprockets - 1)
unext_mu_1_stream = impulse_stream_from_wheel(self.mu[1], n_sprockets)
mu_1_idxs = np.concatenate([[0], np.cumsum(mu_0_stream.astype(np.int32))])
ext_mu_1_stream = unext_mu_1_stream[mu_1_idxs]
impulses = [impulse_stream_from_wheel(w, n_sprockets) for w in self.psi]
unext_psi_stream = letter_from_impulses(impulses)
psi_idxs = np.concatenate([[0], np.cumsum(ext_mu_1_stream[:-1].astype(np.int32))])
ext_psi_stream = unext_psi_stream[psi_idxs]
return ext_psi_stream
For example:
psi_mu_wheels = PsiAndMu(specimen_psi, specimen_mu, [3, 5, 7, 1, 3, 8, 10])
psi_mu_wheels.letters(16)
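The index-by-cumulative-sum trick used inside `letters` may be easier to see on made-up data: where the motor stream shows a dot, the underlying stream does not advance, so the extended stream repeats its previous character.

motor = np.array([1, 0, 0, 1, 1, 0, 1], dtype=np.uint8)            # made-up motor stream
base = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=np.uint8)  # made-up stream to extend
idxs = np.concatenate([[0], np.cumsum(motor.astype(np.int32))])
print(base[idxs])   # [10 20 20 20 30 40 40 50] -- repeats wherever the motor was a dot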
Using the specimen wheel patterns, suppose the wheel-settings for our message were:
wheel | setting | wheel | setting | wheel | setting
---|---|---|---|---|---
Chi 1 | 33 | Psi 1 | 19 | Mu 1 | 53
Chi 2 | 27 | Psi 2 | 25 | Mu 2 | 20
Chi 3 | 10 | Psi 3 | 33 | |
Chi 4 | 8 | Psi 4 | 49 | |
Chi 5 | 16 | Psi 5 | 22 | |
We can then generate a length of keystream. For the purposes of this demonstration, we will work with a long message, to ensure the statistical properties used below emerge clearly. The Colossus machines at BP could work with messages up to 10,000 characters long, or 20,000 in the case of some specially extended machines, so we choose 16,000 as a representative length for a 'long' (but not implausibly so) message.
specimen_length = 16000
specimen_chi_wheels = Chi(specimen_chi, [33, 27, 10, 8, 16])
specimen_keystream_chi = specimen_chi_wheels.letters(specimen_length)
specimen_psi_mu_wheels = PsiAndMu(specimen_psi, specimen_mu, [19, 25, 33, 49, 22, 53, 20])
specimen_keystream_extended_psi = specimen_psi_mu_wheels.letters(specimen_length)
specimen_keystream = specimen_keystream_chi ^ specimen_keystream_extended_psi
We will use an extract from Dickens's The Pickwick Papers as our specimen plain-text message, taking a chunk from the middle of the book.
The file as downloaded from Project Gutenberg needs to be converted to its 5-bit teleprinter representation. For details see the comments in the code below.
# Closest equivalents to each printable ASCII character. Non-printable
# characters are mostly invalid ('~'); exceptions are LF and CR.
teleprinter_from_ascii = (
b"~~~~~~~~~~ ~~ ~~~~~~~~~~~~~~~~~~ .'##%%'()#+,-./0123456789::(=)?"
b"@ABCDEFGHIJKLMNOPQRSTUVWXYZ(/)--'ABCDEFGHIJKLMNOPQRSTUVWXYZ(/)-~")
# As an intermediate representation, we will pretend that there's a
# 6-bit code where the most-significant bit indicates which 'shift'
# we're in (0=letter; 1=figure).
teleprinter_meaning_6bit = (
b"~T~O HNM~LRGIPCVEZDBSYFXAWJ~UQK~"
b"~5~9 #,.~)4@80:=3+~?'6%/-2~~71(~")
# Create look-up table from 7-bit ASCII to 6-bit teleprinter.
tp_6bit_lut = np.full(128, 255, dtype=np.uint8)
for i, x in enumerate(teleprinter_meaning_6bit):
tp_6bit_lut[x] = i
%%cython --cplus
from libcpp.vector cimport vector
import numpy as np
import cython
cimport numpy as np
np.import_array()
# To convert the made-up 6-bit code into the real 5-bit teleprinter code, we need
# to keep track of whether we're in 'letter' or 'figure' shift, and emit the shift
# codes accordingly to switch between them. An artifact of 'tp_6bit_lut' is that
# all ASCII spaces get turned into the figure-shift space 6-bit code, so we handle
# that case explicitly.
@cython.boundscheck(False)
def tp_5bit_from_6bit(xs_6bit):
cdef np.uint8_t TP_SPACE_IN_FIG_SHIFT = 36, TP_5BIT_SPACE = 4
cdef np.uint8_t TP_5BIT_SHIFT_MASK = 32, TP_5BIT_CHAR_MASK = 31
cdef np.uint8_t TP_LET_SHIFT = 31, TP_FIG_SHIFT = 27
cdef np.uint8_t ch, required_shift, active_shift = 0
cdef vector[np.uint8_t] xs_5bit
cdef np.uint8_t[::1] vw_xs_6b = xs_6bit
cdef np.uint32_t i
for i in range(vw_xs_6b.size):
ch = vw_xs_6b[i]
if ch == TP_SPACE_IN_FIG_SHIFT:
xs_5bit.push_back(TP_5BIT_SPACE)
else:
required_shift = ch & TP_5BIT_SHIFT_MASK
if required_shift != active_shift:
xs_5bit.push_back(TP_LET_SHIFT if required_shift == 0 else TP_FIG_SHIFT)
active_shift = required_shift
xs_5bit.push_back(ch & TP_5BIT_CHAR_MASK)
xs_5bit_arr = np.empty(xs_5bit.size(), dtype=np.uint8)
cdef np.uint8_t[::1] xs_5bit_vw = xs_5bit_arr
for i in range(xs_5bit_arr.size):
xs_5bit_vw[i] = xs_5bit.at(i)
return xs_5bit_arr
def teletype_representation(filename):
"""
Read a .gz file of utf-8 text with BOM, and return its closest 5-bit
teleprinter representation as a numpy 'uint8' vector.
"""
file_bytes = gzip.GzipFile(filename, 'rb').read()
utf_bom = b'\xef\xbb\xbf'
assert file_bytes[:len(utf_bom)] == utf_bom
data_ascii = np.frombuffer(file_bytes[len(utf_bom):], dtype=np.uint8)
data_tp = np.frombuffer(teleprinter_from_ascii, dtype=np.uint8)[data_ascii]
assert ord('~') not in data_tp
data_6bit = tp_6bit_lut[data_tp]
assert 255 not in data_6bit
data_5bit = tp_5bit_from_6bit(data_6bit)
assert np.all(data_5bit < 32)
return data_5bit
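Before converting the whole novel, here is a quick sanity check of the conversion chain on a made-up fragment (an illustrative cell added here, not part of the original pipeline). The digits should be preceded by a single figure-shift code (27), with the space handled by the special case above:

toy_ascii = np.frombuffer(b'HELLO 42.', dtype=np.uint8)
toy_tp = np.frombuffer(teleprinter_from_ascii, dtype=np.uint8)[toy_ascii]
toy_5bit = tp_5bit_from_6bit(tp_6bit_lut[toy_tp])
print(toy_5bit)
assert np.all(toy_5bit < 32)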
Recall that we have chosen 16000 as the length of our encrypted message, and generated a keystream of this length. We therefore extract a 16000-long specimen of plaintext from somewhere in the middle of the novel:
full_specimen_plaintext \
= teletype_representation('data/Dickens-Pickwick-Papers.txt.utf-8.gz')
specimen_start_idx = 1000000
specimen_end_idx = specimen_start_idx + specimen_length
specimen_plaintext = full_specimen_plaintext[specimen_start_idx:specimen_end_idx]
And finally we can get the specimen ciphertext:
specimen_ciphertext = specimen_plaintext ^ specimen_keystream
Suppose that this `specimen_ciphertext` was transmitted.
We now switch roles to consider the work done by the cryptanalysts, and suppose that they have intercepted the above ciphertext. Recall that we are going to suppose that the wheel patterns have already been worked out (from messages sent earlier this month, say), and that our job is to crack the 'wheel settings' for this newly-intercepted message.
The Lorenz cipher machine did a good job of masking the distribution of the letters in the source material, taken one at a time — i.e., the distribution of ciphertext letters was close to uniform. However, the BP team discovered a weakness in terms of the distribution of the 'delta' of adjacent letters of the message.
'Delta' means the 'difference' between adjacent letters, i.e., the result of subtracting (in a bitwise sense) a letter from the previous one in the stream. However, under bitwise operations, 'addition' and 'subtraction' are the same operation, `xor`.
In the following example, we print the resulting 'delta' aligned such that each of its elements is between the two elements of the original stream which `xor` to produce it.
def delta_stream(x):
return x[1:] ^ x[:-1]
example_x = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1])
print('x              :', example_x)
print('delta_stream(x) :', delta_stream(example_x))
We now point out two facts noted at BP:

- The properties of plain (Dickens) English mean that delta-plain is not quite uniformly distributed.
- The extended Psi stream often repeats characters, because the Psi wheels do not move on after every letter. When the extended Psi has the same character for two consecutive positions, delta-extended-Psi is the all-bits-zero character. Therefore delta-extended-Psi is the all-bits-zero character much more often than would be the case by chance. There is also further non-uniformity to its distribution.
Putting these two observations together, delta-plain `xor` delta-extended-Psi will have some small but significant non-uniformity. Another way of thinking of "delta-plain `xor` delta-extended-Psi" is as "delta-ciphertext `xor` delta-Chi". This latter combination is known as 'delta-de-Chi', as it is the delta with the Chi component removed ('de-Chi' as in 'decaffeinated'). This leads us to the insight made at BP and used in the breaking of the Chi-wheel settings:
If we guess the Chi settings, and find the resulting delta-de-Chi, we should know when we've guessed right, because the non-uniform statistics of the true delta-de-Chi should be apparent.
We now work through the details of quantifying these observations.
Based on traffic gathered by other means, the BP team had a large sample of the typical message plaintexts transmitted over any given link. Suppose that the cryptanalysts knew that messages were extracts from Dickens. We will use the complete text of Nicholas Nickleby as our sample text. (Using The Pickwick Papers would be cheating.)
training_plaintext \
= teletype_representation('data/Dickens-Nicholas-Nickleby.txt.utf-8.gz')
training_delta_plain = delta_stream(training_plaintext)
For looking at the results, it will be helpful to use the same names for the 5-bit codes that BP did, which are mostly their 'letter-shift' meanings, with `/` for the all-bits-zero code, and digits for the other special codes:
letter_names = list('/T3O9HNM4LRGIPCVEZDBSYFXAWJ5UQK8')
letter_names_df = pd.DataFrame({'letter_name': letter_names})
raw_freq_df = pd.DataFrame(pd.Series(training_delta_plain).value_counts(),
columns=['freq'])
delta_plain_distrn = pd.concat([letter_names_df, raw_freq_df], axis=1)
delta_plain_distrn.index.name = 'code'
delta_plain_distrn['prob'] = delta_plain_distrn['freq'] / delta_plain_distrn['freq'].sum()
delta_plain_distrn['freq_pct'] = 100.0 * delta_plain_distrn['prob']
delta_plain_distrn.sort_values('freq', ascending=False).head(10)
This is distinctly non-uniform; we can see this more clearly if we plot the frequencies:
plt.figure(figsize=(5.0, 12.0))
sns.set_context('notebook', font_scale=1.75)
ax = sns.stripplot(y='letter_name', x='freq_pct', color='k', size=8.0,
data=delta_plain_distrn.sort_values('freq_pct', ascending=False))
ax.set_xlabel('frequency (%)')
ax.set_ylabel('letter name')
ax.set_title('distribution of letters in teleprinter Dickens delta-plain',
size='small')
ax.vlines(100.0 / 32, -0.5, 31.5, 'g')
ax.annotate('uniform', (100.0 / 32, 27), (3, 0), textcoords='offset points', color='g');
For example, the `/` character (all bits zero) occurs more often than would be the case by chance. This arises from doubled letters in English text (for example the slightly contrived 'subbookkeeper') or indeed from any other repeated characters, say spaces.
Following the insight of the '1+2 break-in', we look for a pair of impulses whose sum in the delta-plain stream is distinctly non-uniform. The motivation is that it was computationally infeasible to brute-force a collection of five wheel settings — this would require examining 41 × 31 × 29 × 26 × 23 = 22,041,682 possibilities. However, it was feasible to brute-force just two wheel-settings together (say 31 × 29 = 899). Having said that, it was a radical step at the time to consider that a brute-force search through c.1000 possibilities, each one involving a computation over thousands of teleprinter characters, might be feasible.
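As a quick check of those counts against the wheel sizes themselves (a small added cell, not part of the original analysis):

from functools import reduce
import operator
# All five Chi wheels jointly: 41 x 31 x 29 x 26 x 23 possibilities.
print(reduce(operator.mul, [w.size for w in specimen_chi]))   # 22041682
# Just the two wheels with 31 and 29 cams:
print(specimen_chi[1].size * specimen_chi[2].size)            # 899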
The first job is to annotate the data-frame rows with the breakdown of the code into its five impulses. Recall that Impulse 1 is the $2^4$ bit and so on down to Impulse 5 being the $2^0$ bit. We create a data-frame holding the information on how each code breaks down into impulses:
all_codes = np.arange(32, dtype=np.uint32)
impulse_breakdown_df = pd.DataFrame.from_items([
('i{}'.format(i + 1), np.minimum(all_codes & (1 << (4 - i)), 1))
for i in range(5)])
impulse_breakdown_df.index.name = 'code'
impulse_cols = list(impulse_breakdown_df.columns)
and append it to our frequency data:
delta_plain_distrn = pd.concat([delta_plain_distrn, impulse_breakdown_df], axis=1)
delta_plain_distrn.sort_values('freq', ascending=False).head(10)
And now for each combination of two impulses, we find the distribution of 0s vs 1s in the sum of those impulses. Recall that 'dot' was the name given by BP to a zero in this context.
def impulse_pair_dot_probs(distrn):
freqs_2_impulses_rows = {
'{0}+{1}'.format(i1, i2):
distrn.query('({0} + {1}) % 2 == 0'.format(i1, i2))['freq'].sum()
for ii, i1 in enumerate(impulse_cols)
for i2 in impulse_cols[ii + 1:]}
freqs_2_impulses = pd.DataFrame(freqs_2_impulses_rows, index=['freq']).T
props_2_impulses = freqs_2_impulses / distrn['freq'].sum()
excesses = (props_2_impulses - 0.5).rename(columns={'freq': 'excess_prop'})
excesses.index.name = 'impulse_pair'
excesses.reset_index(inplace=True)
excesses['excess_pct'] = 100.0 * excesses['excess_prop']
return excesses
We are interested in which impulse-pairs have a sum whose distribution differs markedly from the uniform '50% zero, 50% one' we would expect by chance:
delta_plain_excesses = impulse_pair_dot_probs(delta_plain_distrn)
delta_plain_excesses
It is easier to see this graphically:
def plot_impulse_pair_sum_dot_excesses(excesses, name, lims=None):
plt.figure(figsize=(5.0, 6.0))
ax = sns.stripplot(y='impulse_pair', x='excess_pct', color='k', size=8.0,
data=excesses.sort_values('excess_pct'))
ax.set_ylabel('impulse pair sum')
ax.set_xlabel('excess proportion (%)')
ax.set_title('{} excess percent (wrt 50%) of impulse-pair-sum being "dot"'.format(name),
size='small')
ax.set_ylim(-0.25, 9.25)
if lims is not None:
ax.set_xlim(-lims, lims)
ax.vlines(0.0, -0.25, 9.25, 'g')
return ax
plot_impulse_pair_sum_dot_excesses(delta_plain_excesses, 'delta-plain');
The most marked differences from 50/50 are in impulse pairs (3)+(4) and (4)+(5), which have a deficit of 'dot' compared to 50%.
Because the Psi wheels do not move every character, the insight is that delta-extended-Psi is the 'all bits zero' character, `/`, much more often than would be expected by chance. Further, the other characters are not uniformly distributed either. We now compute this distribution in detail.
Recall that we are supposing the cryptanalysts know all of the wheel patterns, and are trying to 'set' the message by working out the settings. We can therefore work with the known patterns of the Psi and Mu wheels in the following.
One approach would be to generate a large sample of delta-extended-Psi and calculate statistics from it. However, we can look at this more theoretically.
The notation used in GRT is:
The proportion of crosses in the TM stream is called a. [p.17, 11C(d)]
The 'TM' stream is the 'Total Motor' stream — the sequence of dots or crosses which controls the advancement of the Psi wheels while creating extended-Psi.
The 'basic motor' is the stream generated by the 37-position Mu wheel, and the 'total motor' is derived from the 'basic motor' in combination with various choices of 'limitation'. We are considering the simpler case of no 'limitation', so TM stream = BM stream, and so a = proportion of crosses (recall 'cross' = '1') in mu(37).
assert specimen_mu[1].size == 37
mu_37_cross_proportion = specimen_mu[1].sum() / specimen_mu[1].size
print('Proportion of crosses in mu_37 = {:.2f}'.format(mu_37_cross_proportion))
The other component of the statistics of the delta-extended-Psi stream is referred to in GRT as follows:
The proportion of crosses in each Δψ [NB this is delta-Psi not delta-extended-Psi] is called b. [p.17, 11C(d) again]
We compute this for the five (known) Psi wheels:
def delta_wheel(x):
# This is different from delta_stream() --- wheel pattern is cyclic.
return x ^ rot_wheel(x, 1)
delta_psi_info = pd.DataFrame.from_items([
('wheel', ['psi_%d' % (i + 1) for i in range(len(specimen_psi))]),
('n_cams', [len(w) for w in specimen_psi]),
('delta_cross_proportion', [delta_wheel(w).sum() / w.size for w in specimen_psi])])
delta_psi_info
The GRT tells us that 'after March 1942', wheel patterns were nearly always chosen such that ab=1/2. In our case:
delta_psi_info['extended_cross_proportion'] \
= mu_37_cross_proportion * delta_psi_info.delta_cross_proportion
delta_psi_info
which is a good bit away from 'extended cross proportion = 1/2'. Maybe the specimen is an early one? The low value of ab will make the statistics more pronouncedly non-uniform, which will be helpful for this example.
The delta-extended-Psi stream is deterministic. However, over a long stream, the following approximation will be good. We act as if each character of delta-extended-Psi is generated according to:

- with probability 1 − a (where a is the `mu_37_cross_proportion` computed above), the Total Motor is 'dot', the Psi wheels do not move, and the character is the all-bits-zero `/`;
- with probability a, the Total Motor is 'cross', the Psi wheels do move, and each of the five impulses of the character is independently a 'cross' with probability b.

The approximation assumes that all five Psi wheels have the same delta-cross proportion, which is close to being true for our specimen wheels.
Putting this together, we see that the probability of delta-extended-Psi being some particular character depends only on the number of '1's ('crosses') in that character:
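Spelled out as a formula (a restatement of the rule above, and exactly what the function below computes), a character whose 5-bit code contains $k$ crosses has probability

$$p(k) = a\,b^{k}(1-b)^{5-k} + (1-a)\,[k=0],$$

where $[k=0]$ is 1 for the all-dots character `/` and 0 otherwise.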
def delta_psi_prime_distribution(a, b):
def prob(i):
n_crosses = sum(1 for ch in format(i, '05b') if ch == '1')
n_dots = 5 - n_crosses
# Probability of generating letter with code 'i' in the
# case of a Total Motor cross:
tm_x_p = (1.0 - b) ** n_dots * b ** n_crosses
# Probability of generating letter with code 'i' in the
# case of a Total Motor dot:
tm_o_p = (1.0 if i == 0 else 0.0)
# Combine according to probability of Total Motor cross:
p = (1 - a) * tm_o_p + a * tm_x_p
return p
raw_df = pd.DataFrame.from_items([('code', np.arange(32)),
('letter_name', letter_names),
('prob', list(map(prob, np.arange(32))))])
return raw_df.set_index('code')
The delta-cross-proportion values (b) are all fairly similar, so take an average value:
specimen_avg_b = delta_psi_info.delta_cross_proportion.mean()
print('Specimen psi wheels\' average b = {:.2f}'.format(specimen_avg_b))
and see what we get for the letter distribution of delta-extended-Psi:
specimen_dpsi_probs_raw = delta_psi_prime_distribution(mu_37_cross_proportion,
specimen_avg_b)
assert abs(specimen_dpsi_probs_raw['prob'].sum() - 1.0) < 1.0e-15
specimen_dpsi_probs = pd.concat([specimen_dpsi_probs_raw, impulse_breakdown_df],
axis=1)
specimen_dpsi_probs.sort_values('prob', ascending=False).head(8)
Almost half the time, delta-extended-Psi is the all-bits-zero value `/`; almost all of these cases occur when the Psi wheels do not move. If the Psi wheels do move, a cross is more likely than a dot in each delta-Psi impulse (b ≈ 0.68 > ½), so characters with more crosses in their representation (`8`, `5KXQV`) are more likely than those with fewer (`9T34E`, not shown in this 'head' of the table).
The distribution of characters in the delta-de-Chi stream is (to a good approximation over the long run) the convolution (under `xor`) of:

- the distribution of delta-plain characters; and
- the distribution of delta-extended-Psi characters.

This assumes independence of the keystream and the plaintext, which is reasonable.
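Written out (with $\oplus$ denoting the character-wise xor; the notation is ours), the probability of a delta-de-Chi character $c$ is

$$P_{\Delta D}(c) = \sum_{x=0}^{31} P_{\Delta\psi'}(x)\,P_{\Delta P}(c \oplus x),$$

which is the pairwise accumulation performed by the function below.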
def convolve_distributions(freqs0, freqs1):
records = [(c0, c1, f0, f1)
for c0, f0 in zip(freqs0.index.values, freqs0['prob'])
for c1, f1 in zip(freqs1.index.values, freqs1['prob'])]
df = pd.DataFrame.from_records(records, columns=['code0', 'code1', 'prob0', 'prob1'])
df['code_sum'] = np.array(df.code0) ^ np.array(df.code1)
df['freq'] = df.prob0 * df.prob1
adf = df.groupby('code_sum')[['freq']].sum()
adf['letter_name'] = letter_names
adf.index.name = 'code'
return pd.concat([adf[['letter_name', 'freq']], impulse_breakdown_df], axis=1)
As noted, delta-de-Chi is equivalent to delta-plain `xor` delta-extended-Psi, so:
delta_de_chi = convolve_distributions(specimen_dpsi_probs, delta_plain_distrn)
assert abs(delta_de_chi['freq'].sum() - 1.0) < 1.0e-15
delta_de_chi.sort_values('freq', ascending=False).head(8)
delta_de_chi_excesses = impulse_pair_dot_probs(delta_de_chi)
delta_de_chi_excesses
plot_impulse_pair_sum_dot_excesses(delta_de_chi_excesses, 'delta-de-Chi', lims=3);
The pattern here is, as expected, very similar in shape to the results for delta-plain, but weakened.
For the sake of this example, we will set the pair '(3)+(4)' even though it is not the strongest — (4)+(5) is slightly stronger. Chi-wheel 3 is bigger than Chi-wheel 5, so the number of combinations for (3)+(4) is larger, and so (looking ahead) it will be more gratifying when the analysis works.
BP's so-called '1+2 break-in' exploited the fact that, for most sorts of German message, the impulse-pair (1)+(2) gave the strongest deviation from uniformity in delta-de-Chi.
Recall we know the wheel patterns; what we don't know is the 'setting' (starting position) of each wheel.
We can now replicate what Colossus did, which is to brute-force all possible combinations of the Chi(3) and Chi(4) settings. We count the occurrences of 'dot' in the (3)+(4) impulse sum of the putative delta-de-Chi under each candidate pair of settings, and hope that the true settings will reveal themselves by an unusually low 'dot' count. The settings we use for the other Chi wheels are irrelevant to this count, so we use zero.
This procedure — running counts for all combinations of two wheels' settings — was referred to as a 'long run' on Colossus. A 'short run' studied all settings for a single wheel.
tape_length = specimen_ciphertext.size
records = []
for chi3 in range(specimen_chi[2].size):
for chi4 in range(specimen_chi[3].size):
candidate_chi_wheels = Chi(specimen_chi, [0, 0, chi3, chi4, 0])
candidate_chi_letters = candidate_chi_wheels.letters(tape_length)
candidate_de_chi = specimen_ciphertext ^ candidate_chi_letters
impulse_3 = np.minimum(1, candidate_de_chi & 4)
impulse_4 = np.minimum(1, candidate_de_chi & 2)
dot3p4 = np.sum(delta_stream(impulse_3 ^ impulse_4) == 0)
records.append((chi3, chi4, dot3p4))
long_run_df = pd.DataFrame.from_records(
records,
columns=['setting_chi3', 'setting_chi4', 'n_dots_3p4'])
print('Created {} records'.format(len(long_run_df)))
In use, Colossus was configured to print the count results of any promising-looking setting, where 'promising' meant 'at least 2σ away from the mean', with the mean and standard deviation taken under the null hypothesis that dot and cross are equally likely. In this case, we know there is a deficit of dots in (3)+(4), so we require the count to be less than the threshold μ − 2σ.
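Spelling out the arithmetic of the next cell: under the null hypothesis the dot-count over $N = 16000$ positions is Binomial($N$, ½), so $\mu = N/2 = 8000$ and $\sigma = \sqrt{N/4} \approx 63.2$, and the print threshold is $\lfloor \mu - 2\sigma \rfloor = 7873$. (Strictly the delta stream has $N - 1 = 15999$ elements, but the difference is negligible here.)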
binomial_mean = specimen_length * 0.5
binomial_std = np.sqrt(specimen_length * 0.5 * 0.5)
print_threshold = int(binomial_mean - 2.0 * binomial_std)
print('Records with a count below {} will be printed.'.format(print_threshold))
printed_df = long_run_df.query('n_dots_3p4 < {}'.format(print_threshold)).copy()
print('Number of records printed: {}'.format(len(printed_df)))
To study the 'printout', it is easier to look at a graph. We are looking for unusually low dot-counts, so 'promising' settings will be plotted far to the left of the graph:
printed_df['label'] = printed_df.apply(lambda r: '({0}, {1})'.format(*tuple(r)), axis=1)
plt.figure(figsize=(8.0, 14.0))
ax = sns.stripplot(y='label', x='n_dots_3p4', color='k', size=8.0, data=printed_df)
ax.set_xlabel('number of dots in delta-de-Chi (3)+(4)')
ax.set_ylabel('candidate (chi3, chi4) setting')
ax.set_title("'printer' output from long (chi3, chi4)-setting run",
size='small')
ax.set_xlim(7490, 8002)
ax.vlines([binomial_mean - n * binomial_std for n in range(9)],
-1, len(printed_df), 'g', lw=1.0);
for n in range(9):
ax.annotate('{}\u03c3'.format(n) if n else '\u03bc',
(binomial_mean - n * binomial_std, 9.75), (3, 0),
textcoords='offset points',
size='small', color='g');
The most promising setting (smallest number of dots) by a long way has Chi(3) set at 10 and Chi(4) set at 8 and gives a score around 7.5σ below the mean. These are the correct settings for these two wheels — the setting process was successful.
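To pull that best-scoring row out of the data-frame and express its deficit in units of σ (a small added cell; the original relied on reading the chart):

best_row = long_run_df.sort_values('n_dots_3p4').iloc[0]
print(best_row)
print('deficit below the mean: {:.1f} sigma'.format(
    (binomial_mean - best_row['n_dots_3p4']) / binomial_std))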
There are 29 records printed, which is more than we would expect a '<μ−2σ' threshold to give out of 754 — we would expect about:
print('Expected c.{:.0f} printed records under binomial distribution'
.format(len(long_run_df) * 0.025))
The explanation is that even under an incorrect Chi-wheel setting, the distribution is not truly uniform. We see, for example, that a setting of (5, 8) also gives a very strong deficit, although not as strong as the correct (10, 8). Looking at the delta-Chi(3) patterns at a setting of 10 and of 5 reveals why:
delta_chi_3 = delta_wheel(specimen_chi[2]) # [0] is Chi1, [1] is Chi2, etc.
print(''.join(map(str, rot_wheel(delta_chi_3, 10))))
print(''.join(map(str, rot_wheel(delta_chi_3, 5))))
n_agreements = np.sum(rot_wheel(delta_chi_3, 10) == rot_wheel(delta_chi_3, 5))
print('\nSettings agree at {} places out of {}'
.format(n_agreements, len(delta_chi_3)))
The wheel pattern is very similar to a shifted copy of itself. This was known as a 'slide' at BP and was a common feature of real wheel patterns.
From here, Colossus operators would perform other 'runs' to attempt to set the other Chi wheels. They would then get the 'de-Chi' of the intercept, and from there either hand or machine methods were used to set the Psi and Mu wheels and hence recover the original message.
This case was easy for a few reasons: the message was long (16,000 characters); the specimen wheel patterns give an unusually low value of ab, making the delta-de-Chi statistics strongly non-uniform; and we worked in the simpler case of no 'limitation'.
Nonetheless, it is satisfying to work through the details and see the results.
The underlying Jupyter notebook for this page, together with the required data files, is available as a zip file. To use them, you will need various Python packages to be set up. One straightforward way is to use the Anaconda Python distribution, and do:
conda create -n dickens_teleprinter 'python=3.4' numpy pandas cython numexpr seaborn jupyter
source activate dickens_teleprinter
jupyter notebook
You may need to 'trust' the notebook in order to run it.
The code in this notebook is available under the GNU General Public License version 3 (or, at your option, any later version). The text is available under the CC BY-SA 4.0 license. The data files included in the zip file are Project Gutenberg copies of the public-domain Dickens novels, and with regard to those two files, the following sentence applies: This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org.