The goal of this notebook is to explore one aspect of the code-cracking work done at Bletchley Park ('BP') during World War II, in particular their work on the 'Lorenz' cipher. We will demonstrate the process of 'setting *Chi* wheels' as performed at BP with the aid of the purpose-built Colossus computer. There is a good overall guide to the workings of the Lorenz machine and cipher, and the attacks on it, in the Wikipedia 'Cryptanalysis of the Lorenz cipher' page, but to summarise:

- Plaintext is expressed as a stream of 5-bit characters.
- Ciphertext is computed as the
`xor`

of the plaintext and a 5-bit keystream. - The keystream is composed as the
`xor`

of a*Chi*stream and an*extended Psi*stream. - Each of the five individual bitstreams of the
*Chi*is a cyclic repetition of a fixed-length*pattern*, generated by cams around the circumference of a turning wheel. The five*Chi*patterns had different lengths, with the shortest being 23 bits long and the longest being 41 bits long. After enciphering each character, all the*Chi*wheels step one position onwards. - The
*extended Psi*is generated from the*Psi*stream. The*Psi*stream is generated in the same way as the*Chi*stream, using five wheels of lengths 43 up to 59; the*extended Psi*stream is then generated from*Psi*by repeating some of the*Psi*characters, as controlled by two*Mu*(motor) wheels. Details are in the Wikipedia article, and implemented in the code below. (We do not consider the more advanced 'limitations' here.)

To know the keystream, then, you need to know both:

- the pattern of 1s (called 'cross' at BP) and 0s (called 'dot' at BP) round the edge of each wheel;
- the starting position ('setting') of each wheel.

The wheel patterns persisted for many messages, but the 'settings' were different for each message.

When performing '*Chi* setting' for some particular intercepted enciphered message, the cryptanalysts knew:

- the structure and workings of the cipher machine;
- the patterns around the edges of all the wheels,

but did not know:

- the settings (start positions) of the wheels.

This notebook will demonstrate the setting of two *Chi* wheels, given an 'intercepted message'.

*Chi* wheels, using the techniques developed at BP and implemented using Colossus. We suppose that the *Chi* wheel *patterns* have already been cracked.

In [1]:

```
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pylab as plt
import gzip
%load_ext cython
%matplotlib inline
```

**General Report on Tunny with Emphasis on Statistical Methods (1945)**' (*GRT* henceforth):

In [2]:

```
specimen_patterns_raw = [
'0+0+0 0+0++ 0+0+0 +0+00 ++000 ++++0 0+0+0 +0++0 ++0',
'00+0+ 0++0+ 0+++0 0++0+ 0+00+ +0000 +0+0+ ++00+ 0++0+ 0+',
'0+00+ ++000 +0+00 0++00 +0+0+ +0+0+ ++00+ 0+00+ 0+0++ +0+0+ +',
'+0++0 0++0+ 00+0+ 0+00+ +0000 +++00 +0+++ +0+0+ 0+0+0 +00+0 ++0',
'+0+00 +0+0+ 0+00+ 000++ 000+0 +++0+ +0+0+ 0+00+ ++00+ 0+++0 0+0++ 0+0+',
'++000 +0++0 ++++0 +++00 ++000 +0+0+ 000++ +0',
'+0+00 00+++ 00+++ +0+++ +00++ ++0++ ++0++ ++000 ++++0 +++0+ ++00+ +++0+ +',
'0++00 ++00+ +00++ 00++0 0++++ 00++0 0+00+ +00++ 0',
'000+0 +00+0 ++000 0+++0 0+0++ ++0++ 0',
'+++00 +++00 +++00 +0000 +++0+ 00+0',
'00+0+ +++00 +00++ 0000+ +++00 +',
'++000 +++0+ 00+++ 0+000 ++0']
specimen_patterns = [p.replace(' ', '').replace('+', '1')
for p in specimen_patterns_raw]
```

*Chi*, *Psi*, and *Mu* wheels:

In [3]:

```
specimen_chi_chars = [specimen_patterns[i] for i in [7, 8, 9, 10, 11]]
specimen_psi_chars = [specimen_patterns[i] for i in [0, 1, 2, 3, 4]]
specimen_mu_chars = [specimen_patterns[i] for i in [6, 5]]
def mk_arrays(strs):
return [np.array([int(c) for c in s], dtype=np.uint8) for s in strs]
specimen_chi = mk_arrays(specimen_chi_chars)
specimen_psi = mk_arrays(specimen_psi_chars)
specimen_mu = mk_arrays(specimen_mu_chars)
_exp_chi_lengths = [41, 31, 29, 26, 23]
_exp_psi_lengths = [43, 47, 51, 53, 59]
_exp_mu_lengths = [61, 37]
assert np.all(list(map(len, specimen_chi)) == _exp_chi_lengths)
assert np.all(list(map(len, specimen_psi)) == _exp_psi_lengths)
assert np.all(list(map(len, specimen_mu)) == _exp_mu_lengths)
```

The BP team used the term *impulse* for what we would now call a 'bit' or 'bitstream', and we adopt the 'impulse' term here. Also, the published descriptions of BP's work present 'Impulse 1' at the left of the group of five impulses, so we maintain this arrangement, which makes 'Impulse 1' be the most significant bit of a binary number.

First some utility functions for working with wheels and collections of impulses:

In [4]:

```
def letter_from_impulses(impulses):
return (16 * impulses[0]
+ 8 * impulses[1]
+ 4 * impulses[2]
+ 2 * impulses[3]
+ 1 * impulses[4])
def impulse_stream_from_wheel(w, n):
"Return an array of size N formed by cycling through the elements of W."
n_repeats_needed = int(np.ceil(n / w.size))
possibly_overlength_stream = np.tile(w, n_repeats_needed)
return possibly_overlength_stream[:n]
def rot_wheel(w, n):
"""
Return the result of rotating the given pattern W by N positions.
The [0] element of the result is the [N] element of the input W.
"""
return np.concatenate([w[n:], w[:n]])
```

*Chi* wheels step on after every plain-text letter has been enciphered, so it is straightforward to generate the *Chi* contribution to the keystream, starting from particular settings:

In [5]:

```
class Chi:
def __init__(self, chi, step_counts):
"STEP_COUNTS --- list of values used as the wheels' settings."
self.chi = [rot_wheel(w, n) for w, n in zip(chi, step_counts)]
def letters(self, n_sprockets):
impulses = [impulse_stream_from_wheel(w, n_sprockets) for w in self.chi]
return letter_from_impulses(impulses)
```

For example, with the above specimen *Chi* patterns, and settings of

- 7 for the first
*Chi*wheel; - 10 for the second;
- 22 for the third;
- 12 for the fourth; and
- 19 for the fifth,

we can generate:

In [6]:

```
chi_wheels = Chi(specimen_chi, [7, 10, 22, 12, 19])
chi_wheels.letters(16)
```

Out[6]:

*Psi* and *extended Psi* process above was conceptually useful, but in practice the (unextended) *Psi* stream is never created in the Lorenz machine. Instead, the *Mu* wheels control whether the *Psi* wheels move onwards. This behaviour means that it is slightly more involved to generate the *extended Psi* stream, as compared to the *Chi* stream:

In [7]:

```
class PsiAndMu:
def __init__(self, psi, mu, step_counts):
"""
STEP_COUNTS --- first five elements give Psi settings; remaining
two elements give the Mu settings.
"""
self.psi = [rot_wheel(w, n) for w, n in zip(psi, step_counts[:5])]
self.mu = [rot_wheel(w, n) for w, n in zip(mu, step_counts[5:])]
def letters(self, n_sprockets):
mu_0_stream = impulse_stream_from_wheel(self.mu[0], n_sprockets - 1)
unext_mu_1_stream = impulse_stream_from_wheel(self.mu[1], n_sprockets)
mu_1_idxs = np.concatenate([[0], np.cumsum(mu_0_stream.astype(np.int32))])
ext_mu_1_stream = unext_mu_1_stream[mu_1_idxs]
impulses = [impulse_stream_from_wheel(w, n_sprockets) for w in self.psi]
unext_psi_stream = letter_from_impulses(impulses)
psi_idxs = np.concatenate([[0], np.cumsum(ext_mu_1_stream[:-1].astype(np.int32))])
ext_psi_stream = unext_psi_stream[psi_idxs]
return ext_psi_stream
```

For example:

In [8]:

```
psi_mu_wheels = PsiAndMu(specimen_psi, specimen_mu, [3, 5, 7, 1, 3, 8, 10])
psi_mu_wheels.letters(16)
```

Out[8]:

Using the specimen wheel patterns, suppose the wheel-settings for our message were:

wheel | setting | wheel | setting | wheel | setting | ||
---|---|---|---|---|---|---|---|

Chi 1 | 33 | Psi 1 | 19 | Mu 1 | 53 | ||

Chi 2 | 27 | Psi 2 | 25 | Mu 2 | 20 | ||

Chi 3 | 10 | Psi 3 | 33 | ||||

Chi 4 | 8 | Psi 4 | 49 | ||||

Chi 5 | 16 | Psi 5 | 22 |

We can then generate a length of keystream. For the purposes of this demonstration, we will work with a long message, to ensure the statistical properties used below emerge clearly. The Colossus machines at BP could work with messages up to 10,000 characters long, or 20,000 in the case of some specially extended machines, so we choose 16,000 as a representative length for a 'long' (but not implausibly so) message.

In [9]:

```
specimen_length = 16000
specimen_chi_wheels = Chi(specimen_chi, [33, 27, 10, 8, 16])
specimen_keystream_chi = specimen_chi_wheels.letters(specimen_length)
specimen_psi_mu_wheels = PsiAndMu(specimen_psi, specimen_mu, [19, 25, 33, 49, 22, 53, 20])
specimen_keystream_extended_psi = specimen_psi_mu_wheels.letters(specimen_length)
specimen_keystream = specimen_keystream_chi ^ specimen_keystream_extended_psi
```

We will use an extract from Dickens's *The Pickwick Papers* for our specimen plain-text message, extracting a chunk from the middle of the book.

The file as downloaded from Project Gutenberg needs to be converted to its 5-bit teleprinter representation. For details see the comments in the code below.

In [10]:

```
# Closest equivalents to each printable ASCII character. Non-printable
# characters are mostly invalid ('~'); exceptions are LF and CR.
teleprinter_from_ascii = (
b"~~~~~~~~~~ ~~ ~~~~~~~~~~~~~~~~~~ .'##%%'()#+,-./0123456789::(=)?"
b"@ABCDEFGHIJKLMNOPQRSTUVWXYZ(/)--'ABCDEFGHIJKLMNOPQRSTUVWXYZ(/)-~")
# As an intermediate representation, we will pretend that there's a
# 6-bit code where the most-significant bit indicates which 'shift'
# we're in (0=letter; 1=figure).
teleprinter_meaning_6bit = (
b"~T~O HNM~LRGIPCVEZDBSYFXAWJ~UQK~"
b"~5~9 #,.~)4@80:=3+~?'6%/-2~~71(~")
# Create look-up table from 7-bit ASCII to 6-bit teleprinter.
tp_6bit_lut = np.full(128, 255, dtype=np.uint8)
for i, x in enumerate(teleprinter_meaning_6bit):
tp_6bit_lut[x] = i
```

In [11]:

```
%%cython --cplus
from libcpp.vector cimport vector
import numpy as np
import cython
cimport numpy as np
np.import_array()
# To convert the made-up 6-bit code into the real 5-bit teleprinter code, we need
# to keep track of whether we're in 'letter' or 'figure' shift, and emit the shift
# codes accordingly to switch between them. An artifact of 'tp_6bit_lut' is that
# all ASCII spaces get turned into the figure-shift space 6-bit code, so we handle
# that case explicitly.
@cython.boundscheck(False)
def tp_5bit_from_6bit(xs_6bit):
cdef np.uint8_t TP_SPACE_IN_FIG_SHIFT = 36, TP_5BIT_SPACE = 4
cdef np.uint8_t TP_5BIT_SHIFT_MASK = 32, TP_5BIT_CHAR_MASK = 31
cdef np.uint8_t TP_LET_SHIFT = 31, TP_FIG_SHIFT = 27
cdef np.uint8_t ch, required_shift, active_shift = 0
cdef vector[np.uint8_t] xs_5bit
cdef np.uint8_t[::1] vw_xs_6b = xs_6bit
cdef np.uint32_t i
for i in range(vw_xs_6b.size):
ch = vw_xs_6b[i]
if ch == TP_SPACE_IN_FIG_SHIFT:
xs_5bit.push_back(TP_5BIT_SPACE)
else:
required_shift = ch & TP_5BIT_SHIFT_MASK
if required_shift != active_shift:
xs_5bit.push_back(TP_LET_SHIFT if required_shift == 0 else TP_FIG_SHIFT)
active_shift = required_shift
xs_5bit.push_back(ch & TP_5BIT_CHAR_MASK)
xs_5bit_arr = np.empty(xs_5bit.size(), dtype=np.uint8)
cdef np.uint8_t[::1] xs_5bit_vw = xs_5bit_arr
for i in range(xs_5bit_arr.size):
xs_5bit_vw[i] = xs_5bit.at(i)
return xs_5bit_arr
```

In [12]:

```
def teletype_representation(filename):
"""
Read a .gz file of utf-8 text with BOM, and return its closest 5-bit
teleprinter representation as a numpy 'uint8' vector.
"""
file_bytes = gzip.GzipFile(filename, 'rb').read()
utf_bom = b'\xef\xbb\xbf'
assert file_bytes[:len(utf_bom)] == utf_bom
data_ascii = np.frombuffer(file_bytes[len(utf_bom):], dtype=np.uint8)
data_tp = np.frombuffer(teleprinter_from_ascii, dtype=np.uint8)[data_ascii]
assert ord('~') not in data_tp
data_6bit = tp_6bit_lut[data_tp]
assert 255 not in data_6bit
data_5bit = tp_5bit_from_6bit(data_6bit)
assert np.all(data_5bit < 32)
return data_5bit
```

In [13]:

```
full_specimen_plaintext \
= teletype_representation('data/Dickens-Pickwick-Papers.txt.utf-8.gz')
specimen_start_idx = 1000000
specimen_end_idx = specimen_start_idx + specimen_length
specimen_plaintext = full_specimen_plaintext[specimen_start_idx:specimen_end_idx]
```

And finally we can get the specimen ciphertext:

In [14]:

```
specimen_ciphertext = specimen_plaintext ^ specimen_keystream
```

Suppose that this `specimen_ciphertext`

was transmitted.

The Lorenz cipher machine did a good job of masking the distribution of the letters in the source material, taken one at a time — i.e., the distribution of ciphertext letters was close to uniform. However, the BP team discovered a weakness in terms of the distribution of the 'delta' of adjacent letters of the message.

'Delta' means the 'difference' between adjacent letters, i.e., the result of subtracting (in a bitwise sense) a letter from the previous one in the stream. However, under bitwise operations, 'addition' and 'subtraction' are the same operation, `xor`

.

In the following example, we print the resulting 'delta' aligned such that each of its elements is between the two elements of the original stream which `xor`

to produce it.

In [15]:

```
def delta_stream(x):
return x[1:] ^ x[:-1]
example_x = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1])
print('x :', example_x)
print('delta_stream(x): ', delta_stream(example_x))
```

We now point out two facts noted at BP:

The properties of plain (Dickens) English mean that delta-plain is not quite uniformly distributed.

The

*extended Psi*stream often repeats characters, because the*Psi*wheels do not move on after every letter. When the*extended Psi*has the same character for two consecutive positions, delta-*extended-Psi*is the all-bits-zero character. Therefore delta-*extended-Psi*is the all-bits-zero character much more often than would be the case by chance. There is also further non-uniformity to its distribution.

Putting these two observations together, delta-plain `xor`

delta-*extended-Psi* will have some small but significant non-uniformity. Another way of thinking of "delta-plain `xor`

delta-*extended-Psi*" is as "delta-ciphertext `xor`

delta-*Chi*". This latter combination is known as 'delta-de-*Chi*', as it is the delta with the *Chi* component removed ('de-*Chi*' as in 'decaffeinated'). This leads us to the insight made at BP and used in the breaking of the *Chi*-wheel settings:

If we guess the *Chi* settings, and find the resulting delta-de-*Chi*, we should know when we've guessed right, because the non-uniform statistics of the true delta-de-*Chi* should be apparent.

We now work through the details of quantifying these observations.

*Nicholas Nickleby* as our sample text. (Using *The Pickwick Papers* would be cheating.)

In [16]:

```
training_plaintext \
= teletype_representation('data/Dickens-Nicholas-Nickleby.txt.utf-8.gz')
```

In [17]:

```
training_delta_plain = delta_stream(training_plaintext)
```

`/`

' for the all-bits-zero code, and digits for the other special codes:

In [18]:

```
letter_names = list('/T3O9HNM4LRGIPCVEZDBSYFXAWJ5UQK8')
letter_names_df = pd.DataFrame({'letter_name': letter_names})
```

In [19]:

```
raw_freq_df = pd.DataFrame(pd.Series(training_delta_plain).value_counts(),
columns=['freq'])
delta_plain_distrn = pd.concat([letter_names_df, raw_freq_df], axis=1)
delta_plain_distrn.index.name = 'code'
delta_plain_distrn['prob'] = delta_plain_distrn['freq'] / delta_plain_distrn['freq'].sum()
delta_plain_distrn['freq_pct'] = 100.0 * delta_plain_distrn['prob']
delta_plain_distrn.sort_values('freq', ascending=False).head(10)
```

Out[19]:

This is distinctly non-uniform; we can see this more clearly if we plot the frequencies:

In [20]:

```
plt.figure(figsize=(5.0, 12.0))
sns.set_context('notebook', font_scale=1.75)
ax = sns.stripplot(y='letter_name', x='freq_pct', color='k', size=8.0,
data=delta_plain_distrn.sort_values('freq_pct', ascending=False))
ax.set_xlabel('frequency (%)')
ax.set_ylabel('letter name')
ax.set_title('distribution of letters in teleprinter Dickens delta-plain',
size='small')
ax.vlines(100.0 / 32, -0.5, 31.5, 'g')
ax.annotate('uniform', (100.0 / 32, 27), (3, 0), textcoords='offset points', color='g');
```

`/`

character (all bits zero) occurs more often than would be the case by chance. This arises from doubled letters in English text (for example the slightly contrived 'subbookkeeper') or indeed from any other repeated characters, say spaces.

Following the insight of the '1+2 break-in', we look for a pair of impulses whose sum in the delta-plain stream is distinctly non-uniform. The motivation is that it was computationally infeasible to brute-force a collection of five wheel settings — this would require examining 41 × 31 × 29 × 26 × 23 = 22,041,682 possibilities. However, it was feasible to brute-force just two wheel-settings together (say 31 × 29 = 899). Having said that, it was a radical step at the time to consider that a brute-force search through c.1000 possibilities, each one involving a computation over thousands of teleprinter characters, might be feasible.

The first job is to annotate the data-frame rows with the breakdown of the code into its five impulses. Recall that *Impulse 1* is the $2^4$ bit and so on down to *Impulse 5* being the $2^0$ bit. We create a data-frame holding the information on how each code breaks down into impulses:

In [21]:

```
all_codes = np.arange(32, dtype=np.uint32)
impulse_breakdown_df = pd.DataFrame.from_items([
('i{}'.format(i + 1), np.minimum(all_codes & (1 << (4 - i)), 1))
for i in range(5)])
impulse_breakdown_df.index.name = 'code'
impulse_cols = list(impulse_breakdown_df.columns)
```

and append it to our frequency data:

In [22]:

```
delta_plain_distrn = pd.concat([delta_plain_distrn, impulse_breakdown_df], axis=1)
delta_plain_distrn.sort_values('freq', ascending=False).head(10)
```

Out[22]:

In [23]:

```
def impulse_pair_dot_probs(distrn):
freqs_2_impulses_rows = {
'{0}+{1}'.format(i1, i2):
distrn.query('({0} + {1}) % 2 == 0'.format(i1, i2))['freq'].sum()
for ii, i1 in enumerate(impulse_cols)
for i2 in impulse_cols[ii + 1:]}
freqs_2_impulses = pd.DataFrame(freqs_2_impulses_rows, index=['freq']).T
props_2_impulses = freqs_2_impulses / distrn['freq'].sum()
excesses = (props_2_impulses - 0.5).rename(columns={'freq': 'excess_prop'})
excesses.index.name = 'impulse_pair'
excesses.reset_index(inplace=True)
excesses['excess_pct'] = 100.0 * excesses['excess_prop']
return excesses
```

In [24]:

```
delta_plain_excesses = impulse_pair_dot_probs(delta_plain_distrn)
delta_plain_excesses
```

Out[24]:

It is easier to see this graphically:

In [25]:

```
def plot_impulse_pair_sum_dot_excesses(excesses, name, lims=None):
plt.figure(figsize=(5.0, 6.0))
ax = sns.stripplot(y='impulse_pair', x='excess_pct', color='k', size=8.0,
data=excesses.sort_values('excess_pct'))
ax.set_ylabel('impulse pair sum')
ax.set_xlabel('excess proportion (%)')
ax.set_title('{} excess percent (wrt 50%) of impulse-pair-sum being "dot"'.format(name),
size='small')
ax.set_ylim(-0.25, 9.25)
if lims is not None:
ax.set_xlim(-lims, lims)
ax.vlines(0.0, -0.25, 9.25, 'g')
return ax
plot_impulse_pair_sum_dot_excesses(delta_plain_excesses, 'delta-plain');
```

*Psi* wheels do not move every character, the insight is that delta-*extended-Psi* is the 'all bits zero' character, `/`

, much more often than would be expected by chance. Further, the other characters are not uniformly distributed either. We now compute this distribution in detail.

*Psi* and *Mu* wheels in the following.

One approach would be to generate a large sample of delta-*extended-Psi* and calculate statistics from it. However, we can look at this more theoretically.

The notation used in *GRT* is:

The proportion of crosses in the TM stream is called

a. [p.17, 11C(d)]

The 'TM' stream is the 'Total Motor' stream — the sequence of dots or crosses which controls the advancement of the *Psi* wheels while creating *extended-Psi*.

The 'basic motor' is the stream generated by the 37-position *Mu* wheel, and the 'total motor' is derived from the 'basic motor' in combination with various choices of 'limitation'. We are considering the simpler case of no 'limitation', so TM stream = BM stream, and so *a* = proportion of crosses (recall 'cross' = '1') in mu(37).

In [26]:

```
assert specimen_mu[1].size == 37
mu_37_cross_proportion = specimen_mu[1].sum() / specimen_mu[1].size
print('Proportion of crosses in mu_37 = {:.2f}'.format(mu_37_cross_proportion))
```

The other component of the statistics of the delta-*extended-Psi* stream is referred to in *GRT* as follows:

The proportion of crosses in each Δψ [NB this is delta-

Psinot delta-extended-Psi] is calledb. [p.17, 11C(d) again]

We compute this for the five (known) *Psi* wheels:

In [27]:

```
def delta_wheel(x):
# This is different from delta_stream() --- wheel pattern is cyclic.
return x ^ rot_wheel(x, 1)
delta_psi_info = pd.DataFrame.from_items([
('wheel', ['psi_%d' % (i + 1) for i in range(len(specimen_psi))]),
('n_cams', [len(w) for w in specimen_psi]),
('delta_cross_proportion', [delta_wheel(w).sum() / w.size for w in specimen_psi])])
delta_psi_info
```

Out[27]:

*ab=1/2*. In our case:

In [28]:

```
delta_psi_info['extended_cross_proportion'] \
= mu_37_cross_proportion * delta_psi_info.delta_cross_proportion
delta_psi_info
```

Out[28]:

*ab* will make the statistics more pronouncedly non-uniform, which will be helpful for this example.

The delta-*extended-Psi* stream is deterministic. However, over a long stream, the following approximation will be good. We act as if a character of delta-*extended-Psi* is generated according to:

- Draw a random number
*M*from {0, 1}, where the outcome 1 happens with probability*a*= 'Total Motor cross proportion'. In our case (with no limitation), this is the same as`mu_37_cross_proportion`

. - If
*M*is 0, then the*Psi*wheels do not move, and so delta-*extended-Psi*is all-bits-zero. - If
*M*is 1, then generate five independent {0, 1} variables, each having a probability*b*of being 1, and concatenate them to make the character of delta-*extended-Psi*.

The approximation assumes that all five *Psi* wheels have the same delta-cross proportion, which is close to being true for our specimen wheels.

Putting this together, we see that the probability of delta-*extended-Psi* being some particular character depends only on the number of '1's ('crosses') in that character:

In [29]:

```
def delta_psi_prime_distribution(a, b):
def prob(i):
n_crosses = sum(1 for ch in format(i, '05b') if ch == '1')
n_dots = 5 - n_crosses
# Probability of generating letter with code 'i' in the
# case of a Total Motor cross:
tm_x_p = (1.0 - b) ** n_dots * b ** n_crosses
# Probability of generating letter with code 'i' in the
# case of a Total Motor dot:
tm_o_p = (1.0 if i == 0 else 0.0)
# Combine according to probability of Total Motor cross:
p = (1 - a) * tm_o_p + a * tm_x_p
return p
raw_df = pd.DataFrame.from_items([('code', np.arange(32)),
('letter_name', letter_names),
('prob', list(map(prob, np.arange(32))))])
return raw_df.set_index('code')
```

The *delta-cross-proportion* values (*b*) are all fairly similar, so take an average value:

In [30]:

```
specimen_avg_b = delta_psi_info.delta_cross_proportion.mean()
print('Specimen psi wheels\' average b = {:.2f}'.format(specimen_avg_b))
```

and see what we get for the letter distribution of delta-*extended-Psi*:

In [31]:

```
specimen_dpsi_probs_raw = delta_psi_prime_distribution(mu_37_cross_proportion,
specimen_avg_b)
assert abs(specimen_dpsi_probs_raw['prob'].sum() - 1.0) < 1.0e-15
specimen_dpsi_probs = pd.concat([specimen_dpsi_probs_raw, impulse_breakdown_df],
axis=1)
specimen_dpsi_probs.sort_values('prob', ascending=False).head(8)
```

Out[31]:

*extended-Psi* is the all-bits-zero value '`/`

'; almost all of these cases occur when the *Psi* wheels do not move. If the *Psi* wheels do move, a cross is more likely than a dot in each delta-*Psi* impulse (*b* â‰ˆ 0.68 > ½), so characters with more crosses in their representation (`8`

, `5KXQV`

) are more likely than those with fewer (`9T34E`

, not shown in this 'head' of the table).

The distribution of characters in the delta-de-*Chi* stream is (to a good approximation over the long run) the convolution of:

- the distribution of characters in delta-plain with
- the (approximate) distribution of characters in the delta-
*extended-Psi*stream.

This assumes independence of the keystream and the plaintext, which is reasonable.

In [32]:

```
def convolve_distributions(freqs0, freqs1):
records = [(c0, c1, f0, f1)
for c0, f0 in zip(freqs0.index.values, freqs0['prob'])
for c1, f1 in zip(freqs1.index.values, freqs1['prob'])]
df = pd.DataFrame.from_records(records, columns=['code0', 'code1', 'prob0', 'prob1'])
df['code_sum'] = np.array(df.code0) ^ np.array(df.code1)
df['freq'] = df.prob0 * df.prob1
adf = df.groupby('code_sum')[['freq']].sum()
adf['letter_name'] = letter_names
adf.index.name = 'code'
return pd.concat([adf[['letter_name', 'freq']], impulse_breakdown_df], axis=1)
```

As noted, delta-de-*Chi* is equivalent to delta-plain `xor`

delta-extended-*Psi*, so:

In [33]:

```
delta_de_chi = convolve_distributions(specimen_dpsi_probs, delta_plain_distrn)
assert abs(delta_de_chi['freq'].sum() - 1.0) < 1.0e-15
delta_de_chi.sort_values('freq', ascending=False).head(8)
```

Out[33]:

In [34]:

```
delta_de_chi_excesses = impulse_pair_dot_probs(delta_de_chi)
delta_de_chi_excesses
```

Out[34]:

In [35]:

```
plot_impulse_pair_sum_dot_excesses(delta_de_chi_excesses, 'delta-de-Chi', lims=3);
```

The pattern here is, as expected, very similar in shape to the results for delta-plain, but weakened.

For the sake of this example, we will set the pair '(3)+(4)' even though it is not the strongest — (4)+(5) is slightly stronger. *Chi*-wheel 3 is bigger than *Chi*-wheel 5, so the number of combinations for (3)+(4) is larger, and so (looking ahead) it will be more gratifying when the analysis works.

BP's so-called '1+2 break-in' exploited the fact that, for most sorts of German message, the impulse-pair (1)+(2) gave the strongest deviation from uniformity in delta-de-*Chi*.

Recall we know the wheel patterns; what we don't know is the 'setting' (starting position) of each wheel.

We can now replicate what Colossus did, which is to brute-force the combinations of all possible values of the joint *Chi*(3) and *Chi*(4) settings. We count the occurrences of 'dot' in the putative delta-de-*Chi* under all candidate settings for *Chi*(3) and *Chi*(4), and hope that the true setting will reveal itself by an unusually low 'dot' count. The settings we use for the other *Chi* wheels are irrelevant, so we use zero.

This procedure — running counts for all combinations of two wheels' settings — was referred to as a 'long run' on Colossus. A 'short run' studied all settings for a single wheel.

In [36]:

```
tape_length = specimen_ciphertext.size
records = []
for chi3 in range(specimen_chi[2].size):
for chi4 in range(specimen_chi[3].size):
candidate_chi_wheels = Chi(specimen_chi, [0, 0, chi3, chi4, 0])
candidate_chi_letters = candidate_chi_wheels.letters(tape_length)
candidate_de_chi = specimen_ciphertext ^ candidate_chi_letters
impulse_3 = np.minimum(1, candidate_de_chi & 4)
impulse_4 = np.minimum(1, candidate_de_chi & 2)
dot3p4 = np.sum(delta_stream(impulse_3 ^ impulse_4) == 0)
records.append((chi3, chi4, dot3p4))
long_run_df = pd.DataFrame.from_records(
records,
columns=['setting_chi3', 'setting_chi4', 'n_dots_3p4'])
print('Created {} records'.format(len(long_run_df)))
```

*deficit* of dots in (3)+(4) so we require the count to be less than the threshold μ−2σ.

In [37]:

```
binomial_mean = specimen_length * 0.5
binomial_std = np.sqrt(specimen_length * 0.5 * 0.5)
print_threshold = int(binomial_mean - 2.0 * binomial_std)
print('Records with a count below {} will be printed.'.format(print_threshold))
printed_df = long_run_df.query('n_dots_3p4 < {}'.format(print_threshold)).copy()
print('Number of records printed: {}'.format(len(printed_df)))
```

In [38]:

```
printed_df['label'] = printed_df.apply(lambda r: '({0}, {1})'.format(*tuple(r)), axis=1)
plt.figure(figsize=(8.0, 14.0))
ax = sns.stripplot(y='label', x='n_dots_3p4', color='k', size=8.0, data=printed_df)
ax.set_xlabel('number of dots in delta-de-Chi (3)+(4)')
ax.set_ylabel('candidate (chi3, chi4) setting')
ax.set_title("'printer' output from long (chi3, chi4)-setting run",
size='small')
ax.set_xlim(7490, 8002)
ax.vlines([binomial_mean - n * binomial_std for n in range(9)],
-1, len(printed_df), 'g', lw=1.0);
for n in range(9):
ax.annotate('{}\u03c3'.format(n) if n else '\u03bc',
(binomial_mean - n * binomial_std, 9.75), (3, 0),
textcoords='offset points',
size='small', color='g');
```

*Chi*(3) set at 10 and *Chi*(4) set at 8 and gives a score around 7.5σ below the mean. These are the correct settings for these two wheels — the setting process was successful.

In [39]:

```
print('Expected c.{:.0f} printed records under binomial distribution'
.format(len(long_run_df) * 0.025))
```

*Chi*-wheel setting, the distribution is not truly uniform. We see, for example, that a setting of (5, 8) also gives a very strong deficit, although not as strong as the correct (10, 8). Looking at the delta-*Chi(3)* patterns at a setting of 10 and of 5 reveals why:

In [40]:

```
delta_chi_3 = delta_wheel(specimen_chi[2]) # [0] is Chi1, [1] is Chi2, etc.
print(''.join(map(str, rot_wheel(delta_chi_3, 10))))
print(''.join(map(str, rot_wheel(delta_chi_3, 5))))
n_agreements = np.sum(rot_wheel(delta_chi_3, 10) == rot_wheel(delta_chi_3, 5))
print('\nSettings agree at {} places out of {}'
.format(n_agreements, len(delta_chi_3)))
```

*Chi* wheels. They would then get the 'de-*Chi*' of the intercept, and from there either hand or machine methods were used to set the *Psi* and *Mu* wheels and hence recover the original message.

This case was easy for a few reasons.

- We had a very large body of training text.
- We had a very large ciphertext, 'intercepted' without error.
- The wheel patterns were unusually amenable to this attack, with their highly non-uniform delta-
*extended-Psi*statistics.

Nonetheless, it is satisfying to work through the details and see the results.

The underlying Jupyter notebook for this page, together with the required data files, are available as a zip file. To use them, you will need various Python packages to be set up. One straightforward way is to use the Anaconda Python distribution, and do:

```
conda create -n dickens_teleprinter 'python=3.4' numpy pandas cython numexpr seaborn jupyter
source activate dickens_teleprinter
jupyter notebook
```

You may need to 'trust' the notebook in order to run it.

This code in this notebook is available under the GNU General Public License version 3 (or, at your option, any later version). The text is available under the CC BY-SA 4.0 license. The data files included in the zip file are Project Gutenberg copies of the public-domain Dickens novels, and with regard to those two files, the following sentence applies: *This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org.*