Replicating Colossus's Chi-wheel setting

Ben North, April 2016

The goal of this notebook is to explore one aspect of the code-cracking work done at Bletchley Park ('BP') during World War II, in particular their work on the 'Lorenz' cipher. We will demonstrate the process of 'setting Chi wheels' as performed at BP with the aid of the purpose-built Colossus computer. There is a good overall guide to the workings of the Lorenz machine and cipher, and the attacks on it, in the Wikipedia 'Cryptanalysis of the Lorenz cipher' page, but to summarise:

  • Plaintext is expressed as a stream of 5-bit characters.
  • Ciphertext is computed as the xor of the plaintext and a 5-bit keystream.
  • The keystream is composed as the xor of a Chi stream and an extended Psi stream.
  • Each of the five individual bitstreams of the Chi is a cyclic repetition of a fixed-length pattern, generated by cams around the circumference of a turning wheel. The five Chi patterns had different lengths, with the shortest being 23 bits long and the longest being 41 bits long. After enciphering each character, all the Chi wheels step one position onwards.
  • The extended Psi is generated from the Psi stream. The Psi stream is generated in the same way as the Chi stream, using five wheels of lengths 43 up to 59; the extended Psi stream is then generated from Psi by repeating some of the Psi characters, as controlled by two Mu (motor) wheels. Details are in the Wikipedia article, and implemented in the code below. (We do not consider the more advanced 'limitations' here.)

To know the keystream, then, you need to know both:

  • the pattern of 1s (called 'cross' at BP) and 0s (called 'dot' at BP) round the edge of each wheel;
  • the starting position ('setting') of each wheel.

The wheel patterns persisted for many messages, but the 'settings' were different for each message.

When performing 'Chi setting' for some particular intercepted enciphered message, the cryptanalysts knew:

  • the structure and workings of the cipher machine;
  • the patterns around the edges of all the wheels,

but did not know:

  • the settings (start positions) of the wheels.

This notebook will demonstrate the setting of two Chi wheels, given an 'intercepted message'.

Aim of this notebook

In this notebook we encipher a message, and then, supposing it has been intercepted, work out the settings for two of the Chi wheels, using the techniques developed at BP and implemented using Colossus. We suppose that the Chi wheel patterns have already been cracked.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pylab as plt
import gzip
%load_ext cython
%matplotlib inline

Generate specimen ciphertext

We first work on creating the keystream, and then will apply it to some plaintext to produce the specimen ciphertext.

Specimen wheels patterns

The following patterns are actual wheel patterns used, as taken from p.21 of 'General Report on Tunny with Emphasis on Statistical Methods (1945)' (GRT henceforth):

In [2]:
specimen_patterns_raw = [
    '0+0+0 0+0++ 0+0+0 +0+00 ++000 ++++0 0+0+0 +0++0 ++0',
    '00+0+ 0++0+ 0+++0 0++0+ 0+00+ +0000 +0+0+ ++00+ 0++0+ 0+',
    '0+00+ ++000 +0+00 0++00 +0+0+ +0+0+ ++00+ 0+00+ 0+0++ +0+0+ +',
    '+0++0 0++0+ 00+0+ 0+00+ +0000 +++00 +0+++ +0+0+ 0+0+0 +00+0 ++0',
    '+0+00 +0+0+ 0+00+ 000++ 000+0 +++0+ +0+0+ 0+00+ ++00+ 0+++0 0+0++ 0+0+',
    '++000 +0++0 ++++0 +++00 ++000 +0+0+ 000++ +0',
    '+0+00 00+++ 00+++ +0+++ +00++ ++0++ ++0++ ++000 ++++0 +++0+ ++00+ +++0+ +',
    '0++00 ++00+ +00++ 00++0 0++++ 00++0 0+00+ +00++ 0',
    '000+0 +00+0 ++000 0+++0 0+0++ ++0++ 0',
    '+++00 +++00 +++00 +0000 +++0+ 00+0',
    '00+0+ +++00 +00++ 0000+ +++00 +',
    '++000 +++0+ 00+++ 0+000 ++0']

specimen_patterns = [p.replace(' ', '').replace('+', '1')
                     for p in specimen_patterns_raw]

The Lorenz machine's official numbering of its wheels did not match that assigned by BP, so to work with the BP notation we must select the appropriate patterns to get the various Chi, Psi, and Mu wheels:

In [3]:
specimen_chi_chars = [specimen_patterns[i] for i in [7, 8, 9, 10, 11]]
specimen_psi_chars = [specimen_patterns[i] for i in [0, 1, 2, 3, 4]]
specimen_mu_chars = [specimen_patterns[i] for i in [6, 5]]

def mk_arrays(strs):
    return [np.array([int(c) for c in s], dtype=np.uint8) for s in strs]

specimen_chi = mk_arrays(specimen_chi_chars)
specimen_psi = mk_arrays(specimen_psi_chars)
specimen_mu = mk_arrays(specimen_mu_chars)

_exp_chi_lengths = [41, 31, 29, 26, 23]
_exp_psi_lengths = [43, 47, 51, 53, 59]
_exp_mu_lengths = [61, 37]

assert np.all(list(map(len, specimen_chi)) == _exp_chi_lengths)
assert np.all(list(map(len, specimen_psi)) == _exp_psi_lengths)
assert np.all(list(map(len, specimen_mu)) == _exp_mu_lengths)

Simulate behaviour of cipher machine

The BP team used the term impulse for what we would now call a 'bit' or 'bitstream', and we adopt the 'impulse' term here. Also, the published descriptions of BP's work present 'Impulse 1' at the left of the group of five impulses, so we maintain this arrangement, which makes 'Impulse 1' be the most significant bit of a binary number.

First some utility functions for working with wheels and collections of impulses:

In [4]:
def letter_from_impulses(impulses):
    return (16 * impulses[0]
            + 8 * impulses[1]
            + 4 * impulses[2]
            + 2 * impulses[3]
            + 1 * impulses[4])

def impulse_stream_from_wheel(w, n):
    "Return an array of size N formed by cycling through the elements of W."
    n_repeats_needed = int(np.ceil(n / w.size))
    possibly_overlength_stream = np.tile(w, n_repeats_needed)
    return possibly_overlength_stream[:n]

def rot_wheel(w, n):
    """
    Return the result of rotating the given pattern W by N positions.
    The [0] element of the result is the [N] element of the input W.
    """
    return np.concatenate([w[n:], w[:n]])

Chi wheels

The Chi wheels step on after every plain-text letter has been enciphered, so it is straightforward to generate the Chi contribution to the keystream, starting from particular settings:

In [5]:
class Chi:
    def __init__(self, chi, step_counts):
        "STEP_COUNTS --- list of values used as the wheels' settings."
        self.chi = [rot_wheel(w, n) for w, n in zip(chi, step_counts)]

    def letters(self, n_sprockets):
        impulses = [impulse_stream_from_wheel(w, n_sprockets) for w in self.chi]
        return letter_from_impulses(impulses)

For example, with the above specimen Chi patterns, and settings of

  • 7 for the first Chi wheel;
  • 10 for the second;
  • 22 for the third;
  • 12 for the fourth; and
  • 19 for the fifth,

we can generate:

In [6]:
chi_wheels = Chi(specimen_chi, [7, 10, 22, 12, 19])
chi_wheels.letters(16)
Out[6]:
array([12, 11, 23, 16,  1,  5, 24, 30, 14,  7, 19, 25,  4, 15, 28, 24], dtype=uint8)

Psi and Mu wheels

The summary description of the Psi and extended Psi process above was conceptually useful, but in practice the (unextended) Psi stream is never created in the Lorenz machine. Instead, the Mu wheels control whether the Psi wheels move onwards. This behaviour means that it is slightly more involved to generate the extended Psi stream, as compared to the Chi stream:

In [7]:
class PsiAndMu:
    def __init__(self, psi, mu, step_counts):
        """
        STEP_COUNTS --- first five elements give Psi settings; remaining
        two elements give the Mu settings.
        """
        self.psi = [rot_wheel(w, n) for w, n in zip(psi, step_counts[:5])]
        self.mu = [rot_wheel(w, n) for w, n in zip(mu, step_counts[5:])]

    def letters(self, n_sprockets):
        mu_0_stream = impulse_stream_from_wheel(self.mu[0], n_sprockets - 1)
        unext_mu_1_stream = impulse_stream_from_wheel(self.mu[1], n_sprockets)
        mu_1_idxs = np.concatenate([[0], np.cumsum(mu_0_stream.astype(np.int32))])
        ext_mu_1_stream = unext_mu_1_stream[mu_1_idxs]

        impulses = [impulse_stream_from_wheel(w, n_sprockets) for w in self.psi]
        unext_psi_stream = letter_from_impulses(impulses)
        psi_idxs = np.concatenate([[0], np.cumsum(ext_mu_1_stream[:-1].astype(np.int32))])
        ext_psi_stream = unext_psi_stream[psi_idxs]
        
        return ext_psi_stream

For example:

In [8]:
psi_mu_wheels = PsiAndMu(specimen_psi, specimen_mu, [3, 5, 7, 1, 3, 8, 10])
psi_mu_wheels.letters(16)
Out[8]:
array([16, 10, 11, 20,  9, 22, 27, 27,  8, 27,  4, 20, 20, 20, 11, 24], dtype=uint8)

Final specimen key-stream

Using the specimen wheel patterns, suppose the wheel-settings for our message were:

wheel setting wheel setting wheel setting
Chi 1 33 Psi 1 19 Mu 1 53
Chi 2 27 Psi 2 25 Mu 2 20
Chi 3 10 Psi 3 33
Chi 4 8 Psi 4 49
Chi 5 16 Psi 5 22

We can then generate a length of keystream. For the purposes of this demonstration, we will work with a long message, to ensure the statistical properties used below emerge clearly. The Colossus machines at BP could work with messages up to 10,000 characters long, or 20,000 in the case of some specially extended machines, so we choose 16,000 as a representative length for a 'long' (but not implausibly so) message.

In [9]:
specimen_length = 16000

specimen_chi_wheels = Chi(specimen_chi, [33, 27, 10, 8, 16])
specimen_keystream_chi = specimen_chi_wheels.letters(specimen_length)

specimen_psi_mu_wheels = PsiAndMu(specimen_psi, specimen_mu, [19, 25, 33, 49, 22, 53, 20])
specimen_keystream_extended_psi = specimen_psi_mu_wheels.letters(specimen_length)

specimen_keystream = specimen_keystream_chi ^ specimen_keystream_extended_psi

Specimen plaintext

We will use an extract from Dickens's The Pickwick Papers for our specimen plain-text message, extracting a chunk from the middle of the book.

The file as downloaded from Project Gutenberg needs to be converted to its 5-bit teleprinter representation. For details see the comments in the code below.

Code

In [10]:
# Closest equivalents to each printable ASCII character.  Non-printable
# characters are mostly invalid ('~'); exceptions are LF and CR.
teleprinter_from_ascii = (
    b"~~~~~~~~~~ ~~ ~~~~~~~~~~~~~~~~~~ .'##%%'()#+,-./0123456789::(=)?"
    b"@ABCDEFGHIJKLMNOPQRSTUVWXYZ(/)--'ABCDEFGHIJKLMNOPQRSTUVWXYZ(/)-~")

# As an intermediate representation, we will pretend that there's a
# 6-bit code where the most-significant bit indicates which 'shift'
# we're in (0=letter; 1=figure).
teleprinter_meaning_6bit = (
    b"~T~O HNM~LRGIPCVEZDBSYFXAWJ~UQK~"
    b"~5~9 #,.~)4@80:=3+~?'6%/-2~~71(~")

# Create look-up table from 7-bit ASCII to 6-bit teleprinter.
tp_6bit_lut = np.full(128, 255, dtype=np.uint8)
for i, x in enumerate(teleprinter_meaning_6bit):
    tp_6bit_lut[x] = i
In [11]:
%%cython --cplus
from libcpp.vector cimport vector
import numpy as np
import cython
cimport numpy as np

np.import_array()

# To convert the made-up 6-bit code into the real 5-bit teleprinter code, we need
# to keep track of whether we're in 'letter' or 'figure' shift, and emit the shift
# codes accordingly to switch between them.  An artifact of 'tp_6bit_lut' is that
# all ASCII spaces get turned into the figure-shift space 6-bit code, so we handle
# that case explicitly.

@cython.boundscheck(False)
def tp_5bit_from_6bit(xs_6bit):
    cdef np.uint8_t TP_SPACE_IN_FIG_SHIFT = 36, TP_5BIT_SPACE = 4
    cdef np.uint8_t TP_5BIT_SHIFT_MASK = 32, TP_5BIT_CHAR_MASK = 31
    cdef np.uint8_t TP_LET_SHIFT = 31, TP_FIG_SHIFT = 27
    cdef np.uint8_t ch, required_shift, active_shift = 0
    cdef vector[np.uint8_t] xs_5bit
    cdef np.uint8_t[::1] vw_xs_6b = xs_6bit
    cdef np.uint32_t i
    for i in range(vw_xs_6b.size):
        ch = vw_xs_6b[i]
        if ch == TP_SPACE_IN_FIG_SHIFT:
            xs_5bit.push_back(TP_5BIT_SPACE)
        else:
            required_shift = ch & TP_5BIT_SHIFT_MASK
            if required_shift != active_shift:
                xs_5bit.push_back(TP_LET_SHIFT if required_shift == 0 else TP_FIG_SHIFT)
                active_shift = required_shift
            xs_5bit.push_back(ch & TP_5BIT_CHAR_MASK)
    
    xs_5bit_arr = np.empty(xs_5bit.size(), dtype=np.uint8)
    cdef np.uint8_t[::1] xs_5bit_vw = xs_5bit_arr
    for i in range(xs_5bit_arr.size):
        xs_5bit_vw[i] = xs_5bit.at(i)
    
    return xs_5bit_arr
In [12]:
def teletype_representation(filename):
    """
    Read a .gz file of utf-8 text with BOM, and return its closest 5-bit
    teleprinter representation as a numpy 'uint8' vector.
    """
    file_bytes = gzip.GzipFile(filename, 'rb').read()
    utf_bom = b'\xef\xbb\xbf'
    assert file_bytes[:len(utf_bom)] == utf_bom
    data_ascii = np.frombuffer(file_bytes[len(utf_bom):], dtype=np.uint8)

    data_tp = np.frombuffer(teleprinter_from_ascii, dtype=np.uint8)[data_ascii]
    assert ord('~') not in data_tp

    data_6bit = tp_6bit_lut[data_tp]
    assert 255 not in data_6bit
    
    data_5bit = tp_5bit_from_6bit(data_6bit)
    assert np.all(data_5bit < 32)
    
    return data_5bit

Extract specimen plaintext

Recall that we have chosen 16000 as the length of our encrypted message, and generated a keystream of this length. We therefore extract a 16000-long specimen of plaintext from somewhere in the middle of the novel:

In [13]:
full_specimen_plaintext \
    = teletype_representation('data/Dickens-Pickwick-Papers.txt.utf-8.gz')

specimen_start_idx = 1000000
specimen_end_idx = specimen_start_idx + specimen_length
specimen_plaintext = full_specimen_plaintext[specimen_start_idx:specimen_end_idx]

Specimen 'transmitted ciphertext'

And finally we can get the specimen ciphertext:

In [14]:
specimen_ciphertext = specimen_plaintext ^ specimen_keystream

Suppose that this specimen_ciphertext was transmitted.


Cryptanalysis

We now switch roles to consider the work done by the cryptanalysts, and suppose that they have intercepted the above ciphertext. Recall that we are going to suppose that the wheel patterns have already been worked out (from messages sent earlier this month, say), and that our job is to crack the 'wheel settings' for this newly-intercepted message.

Summary of impulse-pair delta-de-Chi attack

The Lorenz cipher machine did a good job of masking the distribution of the letters in the source material, taken one at a time — i.e., the distribution of ciphertext letters was close to uniform. However, the BP team discovered a weakness in terms of the distribution of the 'delta' of adjacent letters of the message.

'Delta' means the 'difference' between adjacent letters, i.e., the result of subtracting (in a bitwise sense) a letter from the previous one in the stream. However, under bitwise operations, 'addition' and 'subtraction' are the same operation, xor.

In the following example, we print the resulting 'delta' aligned such that each of its elements is between the two elements of the original stream which xor to produce it.

In [15]:
def delta_stream(x):
    return x[1:] ^ x[:-1]

example_x = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1])
print('x              :', example_x)
print('delta_stream(x): ', delta_stream(example_x))
x              : [0 1 1 0 1 0 0 1 1]
delta_stream(x):  [1 0 1 1 1 0 1 0]

We now point out two facts noted at BP:

  • The properties of plain (Dickens) English mean that delta-plain is not quite uniformly distributed.

  • The extended Psi stream often repeats characters, because the Psi wheels do not move on after every letter. When the extended Psi has the same character for two consecutive positions, delta-extended-Psi is the all-bits-zero character. Therefore delta-extended-Psi is the all-bits-zero character much more often than would be the case by chance. There is also further non-uniformity to its distribution.

Putting these two observations together, delta-plain xor delta-extended-Psi will have some small but significant non-uniformity. Another way of thinking of "delta-plain xor delta-extended-Psi" is as "delta-ciphertext xor delta-Chi". This latter combination is known as 'delta-de-Chi', as it is the delta with the Chi component removed ('de-Chi' as in 'decaffeinated'). This leads us to the insight made at BP and used in the breaking of the Chi-wheel settings:

If we guess the Chi settings, and find the resulting delta-de-Chi, we should know when we've guessed right, because the non-uniform statistics of the true delta-de-Chi should be apparent.

We now work through the details of quantifying these observations.

Plain-text statistics

Based on traffic gathered by other means, the BP team had a large sample of the typical message plaintexts transmitted over any given link. Suppose that the cryptanalysts knew that messages were extracts from Dickens. We will use the complete text of Nicholas Nickleby as our sample text. (Using The Pickwick Papers would be cheating.)

In [16]:
training_plaintext \
    = teletype_representation('data/Dickens-Nicholas-Nickleby.txt.utf-8.gz')

Compute training 'delta plain'

In [17]:
training_delta_plain = delta_stream(training_plaintext)

Human-readable names of characters

For looking at the results, it will be helpful to use the same names for the 5-bit codes that BP did, which are mostly their 'letter-shift' meanings, with '/' for the all-bits-zero code, and digits for the other special codes:

In [18]:
letter_names = list('/T3O9HNM4LRGIPCVEZDBSYFXAWJ5UQK8')
letter_names_df = pd.DataFrame({'letter_name': letter_names})

Distribution of characters in the delta-plain

In [19]:
raw_freq_df = pd.DataFrame(pd.Series(training_delta_plain).value_counts(),
                           columns=['freq'])

delta_plain_distrn = pd.concat([letter_names_df, raw_freq_df], axis=1)
delta_plain_distrn.index.name = 'code'

delta_plain_distrn['prob'] = delta_plain_distrn['freq'] / delta_plain_distrn['freq'].sum()
delta_plain_distrn['freq_pct'] = 100.0 * delta_plain_distrn['prob']

delta_plain_distrn.sort_values('freq', ascending=False).head(10)
Out[19]:
letter_name freq prob freq_pct
code
2 3 110613 0.054208 5.420821
5 H 104843 0.051381 5.138051
29 Q 93981 0.046057 4.605735
22 F 91684 0.044932 4.493166
21 Y 84644 0.041482 4.148156
0 / 83315 0.040830 4.083026
20 S 82840 0.040597 4.059747
26 J 82353 0.040359 4.035881
4 9 78675 0.038556 3.855633
16 E 76845 0.037659 3.765950

This is distinctly non-uniform; we can see this more clearly if we plot the frequencies:

In [20]:
plt.figure(figsize=(5.0, 12.0))
sns.set_context('notebook', font_scale=1.75)
ax = sns.stripplot(y='letter_name', x='freq_pct', color='k', size=8.0,
                   data=delta_plain_distrn.sort_values('freq_pct', ascending=False))
ax.set_xlabel('frequency (%)')
ax.set_ylabel('letter name')
ax.set_title('distribution of letters in teleprinter Dickens delta-plain',
             size='small')
ax.vlines(100.0 / 32, -0.5, 31.5, 'g')
ax.annotate('uniform', (100.0 / 32, 27), (3, 0), textcoords='offset points', color='g');

For example, the / character (all bits zero) occurs more often than would be the case by chance. This arises from doubled letters in English text (for example the slightly contrived 'subbookkeeper') or indeed from any other repeated characters, say spaces.

Distribution of impulse-pair-sums in delta-plain

Following the insight of the '1+2 break-in', we look for a pair of impulses whose sum in the delta-plain stream is distinctly non-uniform. The motivation is that it was computationally infeasible to brute-force a collection of five wheel settings — this would require examining 41 × 31 × 29 × 26 × 23 = 22,041,682 possibilities. However, it was feasible to brute-force just two wheel-settings together (say 31 × 29 = 899). Having said that, it was a radical step at the time to consider that a brute-force search through c.1000 possibilities, each one involving a computation over thousands of teleprinter characters, might be feasible.

The first job is to annotate the data-frame rows with the breakdown of the code into its five impulses. Recall that Impulse 1 is the $2^4$ bit and so on down to Impulse 5 being the $2^0$ bit. We create a data-frame holding the information on how each code breaks down into impulses:

In [21]:
all_codes = np.arange(32, dtype=np.uint32)
impulse_breakdown_df = pd.DataFrame.from_items([
        ('i{}'.format(i + 1), np.minimum(all_codes & (1 << (4 - i)), 1))
        for i in range(5)])
impulse_breakdown_df.index.name = 'code'

impulse_cols = list(impulse_breakdown_df.columns)

and append it to our frequency data:

In [22]:
delta_plain_distrn = pd.concat([delta_plain_distrn, impulse_breakdown_df], axis=1)
delta_plain_distrn.sort_values('freq', ascending=False).head(10)
Out[22]:
letter_name freq prob freq_pct i1 i2 i3 i4 i5
code
2 3 110613 0.054208 5.420821 0 0 0 1 0
5 H 104843 0.051381 5.138051 0 0 1 0 1
29 Q 93981 0.046057 4.605735 1 1 1 0 1
22 F 91684 0.044932 4.493166 1 0 1 1 0
21 Y 84644 0.041482 4.148156 1 0 1 0 1
0 / 83315 0.040830 4.083026 0 0 0 0 0
20 S 82840 0.040597 4.059747 1 0 1 0 0
26 J 82353 0.040359 4.035881 1 1 0 1 0
4 9 78675 0.038556 3.855633 0 0 1 0 0
16 E 76845 0.037659 3.765950 1 0 0 0 0

And now for each combination of two impulses, we find the distribution of 0s vs 1s in the sum of those impulses. Recall that 'dot' was the name given by BP to a zero in this context.

In [23]:
def impulse_pair_dot_probs(distrn):
    freqs_2_impulses_rows = {
        '{0}+{1}'.format(i1, i2):
            distrn.query('({0} + {1}) % 2 == 0'.format(i1, i2))['freq'].sum()
        for ii, i1 in enumerate(impulse_cols)
        for i2 in impulse_cols[ii + 1:]}

    freqs_2_impulses = pd.DataFrame(freqs_2_impulses_rows, index=['freq']).T
    props_2_impulses = freqs_2_impulses / distrn['freq'].sum()

    excesses = (props_2_impulses - 0.5).rename(columns={'freq': 'excess_prop'})
    excesses.index.name = 'impulse_pair'
    excesses.reset_index(inplace=True)
    excesses['excess_pct'] = 100.0 * excesses['excess_prop']
    
    return excesses

We are interested in which impulse-pairs have a sum whose distribution differs markedly from the uniform '50% zero, 50% one' we would expect by chance:

In [24]:
delta_plain_excesses = impulse_pair_dot_probs(delta_plain_distrn)
delta_plain_excesses
Out[24]:
impulse_pair excess_prop excess_pct
0 i1+i2 0.035822 3.582247
1 i1+i3 0.044908 4.490789
2 i1+i4 -0.021702 -2.170206
3 i1+i5 -0.022063 -2.206324
4 i2+i3 -0.018293 -1.829263
5 i2+i4 0.045498 4.549843
6 i2+i5 0.037543 3.754311
7 i3+i4 -0.051568 -5.156845
8 i3+i5 0.048589 4.858882
9 i4+i5 -0.053021 -5.302102

It is easier to see this graphically:

In [25]:
def plot_impulse_pair_sum_dot_excesses(excesses, name, lims=None):
    plt.figure(figsize=(5.0, 6.0))
    ax = sns.stripplot(y='impulse_pair', x='excess_pct', color='k', size=8.0,
                       data=excesses.sort_values('excess_pct'))
    ax.set_ylabel('impulse pair sum')
    ax.set_xlabel('excess proportion (%)')
    ax.set_title('{} excess percent (wrt 50%) of impulse-pair-sum being "dot"'.format(name),
                 size='small')
    ax.set_ylim(-0.25, 9.25)
    if lims is not None:
        ax.set_xlim(-lims, lims)
    ax.vlines(0.0, -0.25, 9.25, 'g')
    return ax

plot_impulse_pair_sum_dot_excesses(delta_plain_excesses, 'delta-plain');

Ths most marked differences from 50/50 are in impulse pairs (3)+(4) and (4)+(5), which have a deficit of 'dot' compared to 50%.

Statistics of delta-extended-Psi

Because the Psi wheels do not move every character, the insight is that delta-extended-Psi is the 'all bits zero' character, /, much more often than would be expected by chance. Further, the other characters are not uniformly distributed either. We now compute this distribution in detail.

Specimen delta-extended-Psi

Recall that we are supposing the cryptanalsyts know all of the wheel patterns, and are trying to 'set' the message by working out the settings. We therefore can work with the known patterns of the Psi and Mu wheels in the following.

'Total motor' cross proportion

One approach would be to generate a large sample of delta-extended-Psi and calculate statistics from it. However, we can look at this more theoretically.

The notation used in GRT is:

The proportion of crosses in the TM stream is called a. [p.17, 11C(d)]

The 'TM' stream is the 'Total Motor' stream — the sequence of dots or crosses which controls the advancement of the Psi wheels while creating extended-Psi.

The 'basic motor' is the stream generated by the 37-position Mu wheel, and the 'total motor' is derived from the 'basic motor' in combination with various choices of 'limitation'. We are considering the simpler case of no 'limitation', so TM stream = BM stream, and so a = proportion of crosses (recall 'cross' = '1') in mu(37).

In [26]:
assert specimen_mu[1].size == 37
mu_37_cross_proportion = specimen_mu[1].sum() / specimen_mu[1].size
print('Proportion of crosses in mu_37 = {:.2f}'.format(mu_37_cross_proportion))
Proportion of crosses in mu_37 = 0.54

Proportion of 'cross' in delta-Psi

The other component of the statistics of the delta-extended-Psi stream is referred to in GRT as follows:

The proportion of crosses in each Δψ [NB this is delta-Psi not delta-extended-Psi] is called b. [p.17, 11C(d) again]

We compute this for the five (known) Psi wheels:

In [27]:
def delta_wheel(x):
    # This is different from delta_stream() --- wheel pattern is cyclic.
    return x ^ rot_wheel(x, 1)

delta_psi_info = pd.DataFrame.from_items([
        ('wheel', ['psi_%d' % (i + 1) for i in range(len(specimen_psi))]),
        ('n_cams', [len(w) for w in specimen_psi]),
        ('delta_cross_proportion', [delta_wheel(w).sum() / w.size for w in specimen_psi])])

delta_psi_info
Out[27]:
wheel n_cams delta_cross_proportion
0 psi_1 43 0.697674
1 psi_2 47 0.680851
2 psi_3 51 0.666667
3 psi_4 53 0.679245
4 psi_5 59 0.677966

The GRT tells us that 'after March 1942', wheel patterns were nearly always chosen such that ab=1/2. In our case:

In [28]:
delta_psi_info['extended_cross_proportion'] \
    = mu_37_cross_proportion * delta_psi_info.delta_cross_proportion

delta_psi_info
Out[28]:
wheel n_cams delta_cross_proportion extended_cross_proportion
0 psi_1 43 0.697674 0.377121
1 psi_2 47 0.680851 0.368028
2 psi_3 51 0.666667 0.360360
3 psi_4 53 0.679245 0.367160
4 psi_5 59 0.677966 0.366468

which is a good bit away from 'extended cross proportion = 1/2'. Maybe the specimen is an early one? The low value of ab will make the statistics more pronouncedly non-uniform, which will be helpful for this example.

Approximate distribution of delta-extended-Psi

The delta-extended-Psi stream is deterministic. However, over a long stream, the following approximation will be good. We act as if a character of delta-extended-Psi is generated according to:

  • Draw a random number M from {0, 1}, where the outcome 1 happens with probability a = 'Total Motor cross proportion'. In our case (with no limitation), this is the same as mu_37_cross_proportion.
  • If M is 0, then the Psi wheels do not move, and so delta-extended-Psi is all-bits-zero.
  • If M is 1, then generate five independent {0, 1} variables, each having a probability b of being 1, and concatenate them to make the character of delta-extended-Psi.

The approximation assumes that all five Psi wheels have the same delta-cross proportion, which is close to being true for our specimen wheels.

Putting this together, we see that the probability of delta-extended-Psi being some particular character depends only on the number of '1's ('crosses') in that character:

In [29]:
def delta_psi_prime_distribution(a, b):
    def prob(i):
        n_crosses = sum(1 for ch in format(i, '05b') if ch == '1')
        n_dots = 5 - n_crosses
        # Probability of generating letter with code 'i' in the
        # case of a Total Motor cross:
        tm_x_p = (1.0 - b) ** n_dots * b ** n_crosses
        # Probability of generating letter with code 'i' in the
        # case of a Total Motor dot:
        tm_o_p = (1.0 if i == 0 else 0.0)
        # Combine according to probability of Total Motor cross:
        p = (1 - a) * tm_o_p + a * tm_x_p
        return p
    
    raw_df = pd.DataFrame.from_items([('code', np.arange(32)),
                                      ('letter_name', letter_names),
                                      ('prob', list(map(prob, np.arange(32))))])
    return raw_df.set_index('code')

The delta-cross-proportion values (b) are all fairly similar, so take an average value:

In [30]:
specimen_avg_b = delta_psi_info.delta_cross_proportion.mean()
print('Specimen psi wheels\' average b = {:.2f}'.format(specimen_avg_b))
Specimen psi wheels' average b = 0.68

and see what we get for the letter distribution of delta-extended-Psi:

In [31]:
specimen_dpsi_probs_raw = delta_psi_prime_distribution(mu_37_cross_proportion,
                                                       specimen_avg_b)

assert abs(specimen_dpsi_probs_raw['prob'].sum() - 1.0) < 1.0e-15

specimen_dpsi_probs = pd.concat([specimen_dpsi_probs_raw, impulse_breakdown_df],
                                axis=1)

specimen_dpsi_probs.sort_values('prob', ascending=False).head(8)
Out[31]:
letter_name prob i1 i2 i3 i4 i5
code
0 / 0.461260 0 0 0 0 0
31 8 0.078869 1 1 1 1 1
27 5 0.037033 1 1 0 1 1
30 K 0.037033 1 1 1 1 0
23 X 0.037033 1 0 1 1 1
29 Q 0.037033 1 1 1 0 1
15 V 0.037033 0 1 1 1 1
14 C 0.017389 0 1 1 1 0

Almost half the time, delta-extended-Psi is the all-bits-zero value '/'; almost all of these cases occur when the Psi wheels do not move. If the Psi wheels do move, a cross is more likely than a dot in each delta-Psi impulse (b ≈ 0.68 > ½), so characters with more crosses in their representation (8, 5KXQV) are more likely than those with fewer (9T34E, not shown in this 'head' of the table).

Statistics of delta-de-Chi

The distribution of characters in the delta-de-Chi stream is (to a good approximation over the long run) the convolution of:

  • the distribution of characters in delta-plain with
  • the (approximate) distribution of characters in the delta-extended-Psi stream.

This assumes independence of the keystream and the plaintext, which is reasonable.

Convolution of two distributions

In [32]:
def convolve_distributions(freqs0, freqs1):
    records = [(c0, c1, f0, f1)
               for c0, f0 in zip(freqs0.index.values, freqs0['prob'])
               for c1, f1 in zip(freqs1.index.values, freqs1['prob'])]
    df = pd.DataFrame.from_records(records, columns=['code0', 'code1', 'prob0', 'prob1'])
    df['code_sum'] = np.array(df.code0) ^ np.array(df.code1)
    df['freq'] = df.prob0 * df.prob1
    adf = df.groupby('code_sum')[['freq']].sum()
    adf['letter_name'] = letter_names
    adf.index.name = 'code'
    return pd.concat([adf[['letter_name', 'freq']], impulse_breakdown_df], axis=1)

Distribution of delta-de-Chi

As noted, delta-de-Chi is equivalent to delta-plain xor delta-extended-Psi, so:

In [33]:
delta_de_chi = convolve_distributions(specimen_dpsi_probs, delta_plain_distrn)

assert abs(delta_de_chi['freq'].sum() - 1.0) < 1.0e-15

delta_de_chi.sort_values('freq', ascending=False).head(8)
Out[33]:
letter_name freq i1 i2 i3 i4 i5
code
2 3 0.042637 0 0 0 1 0
5 H 0.040422 0 0 1 0 1
29 Q 0.039149 1 1 1 0 1
26 J 0.037088 1 1 0 1 0
22 F 0.036666 1 0 1 1 0
21 Y 0.035802 1 0 1 0 1
0 / 0.034852 0 0 0 0 0
20 S 0.034520 1 0 1 0 0

Distribution of impulse-pair sums of delta-de-Chi

In [34]:
delta_de_chi_excesses = impulse_pair_dot_probs(delta_de_chi)
delta_de_chi_excesses
Out[34]:
impulse_pair excess_prop excess_pct
0 i1+i2 0.018982 1.898190
1 i1+i3 0.023796 2.379616
2 i1+i4 -0.011500 -1.149966
3 i1+i5 -0.011691 -1.169105
4 i2+i3 -0.009693 -0.969305
5 i2+i4 0.024109 2.410908
6 i2+i5 0.019894 1.989365
7 i3+i4 -0.027326 -2.732551
8 i3+i5 0.025747 2.574664
9 i4+i5 -0.028095 -2.809521
In [35]:
plot_impulse_pair_sum_dot_excesses(delta_de_chi_excesses, 'delta-de-Chi', lims=3);

The pattern here is, as expected, very similar in shape to the results for delta-plain, but weakened.

For the sake of this example, we will set the pair '(3)+(4)' even though it is not the strongest — (4)+(5) is slightly stronger. Chi-wheel 3 is bigger than Chi-wheel 5, so the number of combinations for (3)+(4) is larger, and so (looking ahead) it will be more gratifying when the analysis works.

BP's so-called '1+2 break-in' exploited the fact that, for most sorts of German message, the impulse-pair (1)+(2) gave the strongest deviation from uniformity in delta-de-Chi.

Recall we know the wheel patterns; what we don't know is the 'setting' (starting position) of each wheel.

Attempt to set Chi(3) and Chi(4)

We can now replicate what Colossus did, which is to brute-force the combinations of all possible values of the joint Chi(3) and Chi(4) settings. We count the occurrences of 'dot' in the putative delta-de-Chi under all candidate settings for Chi(3) and Chi(4), and hope that the true setting will reveal itself by an unusually low 'dot' count. The settings we use for the other Chi wheels are irrelevant, so we use zero.

This procedure — running counts for all combinations of two wheels' settings — was referred to as a 'long run' on Colossus. A 'short run' studied all settings for a single wheel.

In [36]:
tape_length = specimen_ciphertext.size

records = []
for chi3 in range(specimen_chi[2].size):
    for chi4 in range(specimen_chi[3].size):
        candidate_chi_wheels = Chi(specimen_chi, [0, 0, chi3, chi4, 0])
        candidate_chi_letters = candidate_chi_wheels.letters(tape_length)
        candidate_de_chi = specimen_ciphertext ^ candidate_chi_letters

        impulse_3 = np.minimum(1, candidate_de_chi & 4)
        impulse_4 = np.minimum(1, candidate_de_chi & 2)
        dot3p4 = np.sum(delta_stream(impulse_3 ^ impulse_4) == 0)

        records.append((chi3, chi4, dot3p4))

long_run_df = pd.DataFrame.from_records(
                  records,
                  columns=['setting_chi3', 'setting_chi4', 'n_dots_3p4'])

print('Created {} records'.format(len(long_run_df)))
Created 754 records

Printer output

In use, Colossus was configured to print the count results of any promising-looking setting, where 'promising' meant 'at least 2σ away from the mean', and the mean and standard deviation are taken under the null hypothesis that dot/cross are equally likely. In this case, we know that there is a deficit of dots in (3)+(4) so we require the count to be less than the threshold μ−2σ.

In [37]:
binomial_mean = specimen_length * 0.5
binomial_std = np.sqrt(specimen_length * 0.5 * 0.5)

print_threshold = int(binomial_mean - 2.0 * binomial_std)
print('Records with a count below {} will be printed.'.format(print_threshold))

printed_df = long_run_df.query('n_dots_3p4 < {}'.format(print_threshold)).copy()
print('Number of records printed: {}'.format(len(printed_df)))
Records with a count below 7873 will be printed.
Number of records printed: 29

To study the 'printout', it is easier to look at a graph. We are looking for unusually low dot-counts, so 'promising' settings will be plotted far to the left of the graph:

In [38]:
printed_df['label'] = printed_df.apply(lambda r: '({0}, {1})'.format(*tuple(r)), axis=1)

plt.figure(figsize=(8.0, 14.0))
ax = sns.stripplot(y='label', x='n_dots_3p4', color='k', size=8.0, data=printed_df)
ax.set_xlabel('number of dots in delta-de-Chi (3)+(4)')
ax.set_ylabel('candidate (chi3, chi4) setting')
ax.set_title("'printer' output from long (chi3, chi4)-setting run",
             size='small')
ax.set_xlim(7490, 8002)
ax.vlines([binomial_mean - n * binomial_std for n in range(9)],
          -1, len(printed_df), 'g', lw=1.0);
for n in range(9):
    ax.annotate('{}\u03c3'.format(n) if n else '\u03bc',
                (binomial_mean - n * binomial_std, 9.75), (3, 0),
                textcoords='offset points',
                size='small', color='g');

Wheel-setting successful!

The most promising setting (smallest number of dots) by a long way has Chi(3) set at 10 and Chi(4) set at 8 and gives a score around 7.5σ below the mean. These are the correct settings for these two wheels — the setting process was successful.

Other promising settings

There are 29 records printed, which is more than we would expect a '<μ−2σ' threshold to give out of 754 — we would expect about:

In [39]:
print('Expected c.{:.0f} printed records under binomial distribution'
      .format(len(long_run_df) * 0.025))
Expected c.19 printed records under binomial distribution

The explanation is that even under an incorrect Chi-wheel setting, the distribution is not truly uniform. We see, for example, that a setting of (5, 8) also gives a very strong deficit, although not as strong as the correct (10, 8). Looking at the delta-Chi(3) patterns at a setting of 10 and of 5 reveals why:

In [40]:
delta_chi_3 = delta_wheel(specimen_chi[2])  # [0] is Chi1, [1] is Chi2, etc.
print(''.join(map(str, rot_wheel(delta_chi_3, 10))))
print(''.join(map(str, rot_wheel(delta_chi_3, 5))))

n_agreements = np.sum(rot_wheel(delta_chi_3, 10) == rot_wheel(delta_chi_3, 5))
print('\nSettings agree at {} places out of {}'
      .format(n_agreements, len(delta_chi_3)))
00101100010011101110010100101
00101001011000100111011100101

Settings agree at 21 places out of 29

The wheel pattern is very similar to a shifted copy of itself. This was known as a 'slide' at BP and was a common feature of real wheel patterns.

Analysis from here

From here, Colossus operators would perform other 'runs' to attempt to set the other Chi wheels. They would then get the 'de-Chi' of the intercept, and from there either hand or machine methods were used to set the Psi and Mu wheels and hence recover the original message.

Conclusions

This case was easy for a few reasons.

  • We had a very large body of training text.
  • We had a very large ciphertext, 'intercepted' without error.
  • The wheel patterns were unusually amenable to this attack, with their highly non-uniform delta-extended-Psi statistics.

Nonetheless, it is satisfying to work through the details and see the results.

How to run this notebook

The underlying Jupyter notebook for this page, together with the required data files, are available as a zip file. To use them, you will need various Python packages to be set up. One straightforward way is to use the Anaconda Python distribution, and do:

conda create -n dickens_teleprinter 'python=3.4' numpy pandas cython numexpr seaborn jupyter
source activate dickens_teleprinter
jupyter notebook

You may need to 'trust' the notebook in order to run it.

This code in this notebook is available under the GNU General Public License version 3 (or, at your option, any later version). The text is available under the CC BY-SA 4.0 license. The data files included in the zip file are Project Gutenberg copies of the public-domain Dickens novels, and with regard to those two files, the following sentence applies: This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org.