import numpy as np
import matplotlib.pyplot as plt
Our next project deals with trying to break an encryption to discover the meaning of a secret message. Let's look at how the encryption process works.
Each character on a computer keyboard is assigned an ASCII code, which is an integer in the range 0-127. The ASCII code of a character can be obtained using the ord() function:
for c in "This is MTH 337":
print("'{}' -> {}".format(c, ord(c)))
'T' -> 84 'h' -> 104 'i' -> 105 's' -> 115 ' ' -> 32 'i' -> 105 's' -> 115 ' ' -> 32 'M' -> 77 'T' -> 84 'H' -> 72 ' ' -> 32 '3' -> 51 '3' -> 51 '7' -> 55
Conversely, the function chr() converts ASCII codes into characters:
char_list = []
for n in [104, 101, 108, 108, 111]:
char_list.append(chr(n))
txt = ''.join(char_list)
print(txt)
hello
It will be helpful to be able to convert easily between strings of characters and lists of their corresponding ASCII codes.
Exercise: Write functions str_to_ascii and ascii_to_str that will convert between strings and lists of ASCII codes. We can use the ord() and chr() functions to convert any particular character or ASCII code.
In order to securely send a confidential message one usually needs to encrypt it in some way to conceal its content. Here we consider the following encryption scheme:
For example, if the message is 'Top secret!' and the secret key is 'buffalo' then the encrypted message is: [54, 100, 86, 6, 84, 81, 82, 84, 90, 90, 7]. Let's develop some code that will allow us to perform this encryption ourselves.
message = 'Top secret!'
key = 'buffalo'
First, let's convert both to lists of ASCII codes using the str_to_ascii function.
for c in message:
print("'{}' -> {}".format(c, ord(c)))
'T' -> 84 'o' -> 111 'p' -> 112 ' ' -> 32 's' -> 115 'e' -> 101 'c' -> 99 'r' -> 114 'e' -> 101 't' -> 116 '!' -> 33
for c in key:
print("'{}' -> {}".format(c, ord(c)))
'b' -> 98 'u' -> 117 'f' -> 102 'f' -> 102 'a' -> 97 'l' -> 108 'o' -> 111
(84 + 98) % 128
54
Problem: Our message has more characters than our key, so we need to duplicate our key enough times to match the length of the message.
One idea: use a while loop to keep duplicating key_ascii until it matches or exceeds the length of message_asii.
print(len(key))
print(len(message))
7 11
Another idea: Use integer division to count how many times we need to duplicate the key_ascii list to match or exceed the length of message_ascii.
Another idea: use modular arithmetic on the index of key_ascii, dividing by the length of key_ascii to keep looping through key_ascii until we enough entries.
Exercise: Write a function get_padded_key_ascii that takes in arguments key_ascii and length and returns a padded version of key_ascii of length length, obtained by repeating key_ascii as many times as necessary.
Exercise: Write a function encrypt(message_ascii, key_ascii) that return the encrypted version of message_ascii using the secret key key_ascii (based on the code above).
In order to decrypt the message we work backwards: for each number $c_i$, we compute the reminder from the division of $c_i - k_i$ by $128$. This number is equal to $m_i$, so converting it into a character we get the $i$-th letter of the message.
Exercise: Write a function decrypt(encrypted_ascii, key_ascii) that returns the decrypted message.
Let's read in a file containing an encrypted message to play around with.
with open('jeevajac.txt', 'r') as f:
text = f.read()
Looking into the text variable, we see that we have a string representing a sequence of integers, each separated by a space.
text[:100]
'119 38 98 69 13 87 87 16 84 94 98 84 88 81 5 99 87 82 97 18 81 92 15 103 81 88 13 70 96 66 95 92 18 '
We need to convert this string into a list of integers which we can then decrypt. We can use .split to separate this into a list of strings, each of which represents one integer.
text.split()[:10]
['119', '38', '98', '69', '13', '87', '87', '16', '84', '94']
In order to decrypt this message, we need to convert this list of strings (each of which represents an integer) into a list of integers.
encrypted_message = []
for int_str in text.split():
encrypted_message.append(int(int_str))
encrypted_message[:10]
[119, 38, 98, 69, 13, 87, 87, 16, 84, 94]
Once we have a working decrypt function, we can start trying to decrypt this message using different possible secret keys chosen from the dictionary.txt file (linked on the project page).
The goal of the project will be to use Python to judge how "good" a possible decrypted message is. We can then have Python try all ~60,000 possible secret keys to decrypt the message, then take the secret key that gives the "best" decrypted message.
Suggestion: Create a dictionary whose keys are the words used to decrypt the message and whose values represent how "good" the corresponding decrypted message is. We can then take the "maximum" of this dictionary (according to the values, see Week 11 notebook) to find the word that gave the best decryption.
One idea is to look at the number of actual words that appear in a decrypted message. We can use the contents of dictionary.txt to help with this.
Reminder: We can check whether an object is part of a list using the in operator. For example, a in my_list will return True if my_list contains a and False otherwise.
my_list = ['Hello', 'Goodbye', 'Good morning']
print('Hello' in my_list)
print('bye' in my_list)
True False
To help with efficiency, we can make use of another iterable data type in Python, namely sets. Sets are unordered collections of objects (like dictionary keys). We can similarly use the in operator to test whether a given element is in a set. How does the timing of this check compare to a similar check for lists? Consider the following experiment.
We first define a list of integers 0, 1, 2, ..., 999999, then create set version of the same data.
N = 10**6
my_list = [i for i in range(N)]
my_set = set(my_list)
Let's see how long it takes to check if 999999 is in the list versus in the set. Since this is a relatively quick operation, let's time how long it takes to make each check 100 times.
import time
n = N-1 # Number to test for membership
num_checks = 100 # Number of times to test for membership
t0 = time.time() # Store the starting time
for check in range(num_checks):
n in my_list # Check if n is a member of my_list
t1 = time.time() # Store the stopping time
print('The list membership check took {} seconds.'.format(t1 - t0))
t0 = time.time() # Store the starting time
for check in range(num_checks):
n in my_set # Check if n is a member of my_set
t1 = time.time() # Store the stopping time
print('The set membership check took {} seconds.'.format(t1 - t0))
The list membership check took 2.1360480785369873 seconds. The set membership check took 0.00017833709716796875 seconds.
This speed difference can be explained through the following analogy:
Suppose someone (e.g. someone making a Python list) has placed a sequence of $1,000,000$ books in a particular order. We are then asked whether Frankenstein is one of the books in this sequence. We would have no choice but to look through the entire sequence of books until we've either found Frankenstein or reached the end.
Suppose instead that a librarian (e.g. Python when a set is defined) places the books into appropriate shelves to keep them organized. If we are asked whether Frankenstein is one for the books in this collection, we go to the shelf for books that start with "F" and check only that shelf.