short history of ciphers (cont’d)
Data Security
Short History of Ciphers (cont’d)
• Dark Ages-> in Europe really dark, while Arab scholars invented cryptanalysis
• Al-Kindi (9th century)-> Arab polymath and author of 290 books-> A Manuscript on Deciphering Cryptographic Messages
• Roger Bacon (13th century)-> first European book about cryptography-> Epistle on the Secret Works of Art and the Nullity of Magic
• Renaissance in the West (14th - 16th century)-> Europeans are back!-> cryptography is becoming popular again (routine diplomatic tool)-> and with it cryptanalysis-> suddenly, monoalphabetic ciphers aren’t all that secure anymore!
Language Characteristics
• every letter of a language has a certain characteristic-> letter frequency-> contact with other letters-> position within words
• in English, E is by far the most common letter, followed by T, A, O
• other letters are fairly rare, such as X, J, Q, Z
• then look at digrams (TH, HE, AN) and trigrams (THE, AND)
• May also be useful to consider sequences of two or three consecutive letters called digrams and trigrams, respectively.
• e.g., common digrams (in decreasing order): TH, HE, IN, ER, AN, RE, ED, ON, ES, ST, EN, AT, TO, NT, HA, ND, OU, EA, NG, AS, OR, …
• e.g. common trigrams (in decreasing order): THE, ING, AND, HER, ERE, ENT, THA, NTH, WAS, …
• Have tables of frequencies for letters, digrams, trigrams, contact data
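Such tables are built by counting. A minimal Python sketch of counting letters, digrams, and trigrams, assuming (for simplicity) that spaces and punctuation are dropped first, so n-grams may span word boundaries; published tables may instead count within words only. `ngram_counts` is a hypothetical helper name.

```python
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    """Count n-grams (n=1 letters, n=2 digrams, n=3 trigrams) in the letters of text.

    Assumption: non-letters are removed first, so n-grams can cross word boundaries.
    """
    letters = "".join(c for c in text.upper() if c.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

sample = "the theory of the cipher"
print(ngram_counts(sample, 2).most_common(3))
print(ngram_counts(sample, 3)["THE"])
```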
Language Characteristics (cont’d)
Letter  Probability      Letter  Probability
A       .082             N       .067
B       .015             O       .075
C       .028             P       .019
D       .043             Q       .001
E       .127             R       .060
F       .022             S       .063
G       .020             T       .091
H       .061             U       .028
I       .070             V       .010
J       .002             W       .023
K       .008             X       .001
L       .040             Y       .020
M       .024             Z       .001
Use in Cryptanalysis
Key Concept-> monoalphabetic substitution ciphers do not change relative letter frequencies-> calculate letter frequencies for ciphertext-> compare counts/plots against known values
For Caesar cipher-> look for common peaks/troughs-> peaks at: A-E-I triple, RST triple-> troughs at: JK, XYZ
For general monoalphabetic cipher-> must identify each letter
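For the Caesar case, "compare counts against known values" can be automated. The sketch below is one common way to do that, not necessarily the procedure the slide has in mind: it scores each of the 26 possible shifts with a chi-squared statistic against the letter probabilities in the table above and keeps the best one. Function names are hypothetical.

```python
from collections import Counter

# English letter probabilities A..Z, taken from the table above
ENGLISH = [.082, .015, .028, .043, .127, .022, .020, .061, .070, .002, .008, .040, .024,
           .067, .075, .019, .001, .060, .063, .091, .028, .010, .023, .001, .020, .001]

def shift(text: str, k: int) -> str:
    """Caesar-shift uppercase letters A..Z by k positions (k may be negative)."""
    return "".join(chr((ord(c) - 65 + k) % 26 + 65) for c in text)

def chi_squared(text: str) -> float:
    """Chi-squared distance between observed letter counts and expected English counts."""
    n = len(text)
    counts = Counter(text)
    return sum((counts.get(chr(65 + i), 0) - n * p) ** 2 / (n * p)
               for i, p in enumerate(ENGLISH))

def break_caesar(ciphertext: str) -> int:
    """Return the key k (0..25) whose decryption looks most like English."""
    letters = "".join(c for c in ciphertext.upper() if c.isalpha())
    return min(range(26), key=lambda k: chi_squared(shift(letters, -k)))
```

Short ciphertexts can fool any statistical score, so with only a few dozen letters it is worth inspecting the two or three best-scoring shifts by eye.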
Example Cryptanalysis
• given ciphertext: UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZVUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSXEPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ
• count relative letter frequencies (see text)
• guess P & Z are e and t
• guess ZW is th and hence ZWP is the
• proceeding with trial and error finally get:
it was disclosed yesterday that several informal but direct contacts have been made with political representatives of the vietcong in moscow
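The first steps of this attack can be reproduced in a few lines. A minimal Python sketch (hypothetical helper names) that counts the ciphertext's relative letter frequencies and then applies partial guesses, printing unknown letters as '-' so words such as "the" start to appear:

```python
from collections import Counter

CIPHERTEXT = ("UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZVUEPHZHMDZSHZOW"
              "SFPAPPDTSVPQUZWYMXUZUHSXEPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ")

def relative_frequencies(text: str):
    """(letter, percent) pairs, most frequent first."""
    counts = Counter(text)
    return [(ch, round(100 * n / len(text), 2)) for ch, n in counts.most_common()]

def apply_guesses(text: str, guesses: dict) -> str:
    """Substitute guessed plaintext letters; show not-yet-guessed letters as '-'."""
    return "".join(guesses.get(c, "-") for c in text)

print(relative_frequencies(CIPHERTEXT)[:2])  # P (16 of 120) and Z (14 of 120) lead
guesses = {"P": "e", "Z": "t", "W": "h"}     # the guesses from the slide
print(apply_guesses(CIPHERTEXT, guesses))
```

Each new guess is just another dictionary entry, which mirrors the trial-and-error loop described above.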
Improvement of Substitution Ciphers
Dilute letter frequency-> represent plaintext letters by several cipher symbols-> cipher symbols all have equal frequency-> homophonic substitution ciphers
Use multiple cipher alphabets-> will explain this in just a minute-> polyalphabetic substitution ciphers
Encrypt multiple plaintext letters -> will talk about this in just two minutes-> polygram substitution ciphers
Relative Frequency of Occurrence of Letters
[Figure 3.6 Relative Frequency of Occurrence of Letters: normalized relative frequency (0 to 1.0) versus frequency-ranked letters (1 to 26, decreasing frequency), with curves for Plaintext, Playfair, Vigenère, and Random polyalphabetic]
Short History of Ciphers (cont’d)
• February 8, 1587-> Mary Queen of Scots beheaded (on the orders of Queen Elizabeth I)-> Why? Her nomenclator (cipher/code) was not secure enough!-> that’s what you get when you deal with double agents and counter-intelligence agencies (see Babington, Gifford, Walsingham)
• Blaise de Vigenère-> published Traicté des Chiffres (“A Treatise on Secret Writing”) in 1586-> Mary Queen of Scots should have read this -> greatest cipher for its time, neglected for two centuries-> later (in the 1800s) called “le chiffre indéchiffrable”
Polyalphabetic Ciphers
• use multiple cipher alphabets
• called polyalphabetic substitution ciphers
• makes cryptanalysis harder with more alphabets to guess and flatter frequency distribution
• use a key to select which alphabet is used for each letter of the message
• use each alphabet in turn
• repeat from start after end of key is reached
Vigenère Cipher
• simplest polyalphabetic substitution cipher
• effectively multiple Caesar ciphers
• key is multiple letters long K = k1 k2 ... kd
• ith letter specifies ith alphabet to be used
• use each alphabet in turn
• repeat from start after d letters in message
• decryption simply works in reverse
Example
• write the plaintext out
• write the keyword repeated above it
• use each key letter as a Caesar cipher key
• encrypt the corresponding plaintext letter
• e.g., using keyword deceptive:
key:        deceptivedeceptivedeceptive
plaintext:  wearediscoveredsaveyourself
ciphertext: ZICVTWQNGRZGVTWAVZHCQYGLMGJ
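The steps above amount to adding the key letter's position (a=0, …, z=25) to each plaintext letter modulo 26, and subtracting it to decrypt. A minimal Python sketch with hypothetical function names, assuming non-letters are simply dropped and following the slide's convention of lowercase plaintext and uppercase ciphertext:

```python
def _shift_letters(text: str, key: str, sign: int) -> str:
    """Shift each letter of text by the matching (repeating) key letter, times sign."""
    letters = [c for c in text.lower() if c.isalpha()]
    shifts = [ord(k) - 97 for k in key.lower()]  # a=0, b=1, ..., z=25
    return "".join(chr((ord(c) - 97 + sign * shifts[i % len(shifts)]) % 26 + 97)
                   for i, c in enumerate(letters))

def vigenere_encrypt(plaintext: str, key: str) -> str:
    return _shift_letters(plaintext, key, +1).upper()

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    return _shift_letters(ciphertext, key, -1)

print(vigenere_encrypt("wearediscoveredsaveyourself", "deceptive"))
```

Decryption is the same loop with the sign flipped, which is what "decryption simply works in reverse" means in practice.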
Security of Vigenère Ciphers
• have multiple ciphertext letters for each plaintext letter
• hence letter frequencies are obscured-> but not totally lost!-> do you see where we’re heading towards...?!!
• Number of possible keywords of length m = 26^m
– much larger than the 26 keys of a Caesar (shift) cipher, even for small m
– an alphabetic character of a plaintext can be mapped to one of m possible alphabetic characters (assuming that the keyword contains m distinct characters)
– in general, cryptanalysis is much more difficult for polyalphabetic than for monoalphabetic cryptosystems
• start with letter frequencies
– see whether they look monoalphabetic or not
– if not, then need to determine the number of alphabets, since then each one can be attacked separately
Short History of Ciphers (cont’d)
• Thomas Jefferson’s Wheel Cipher (around 1800)-> 36 stacked disks, each showing the alphabet in random order-> one row spells the plaintext, any other row is used as ciphertext-> reinvented by the French (~1890) and the US government (~1914, M-94)
• Charles Babbage-> Difference Engine No. 1 and No. 2 in 1820-40s-> cracked the Vigenère cipher some time around 1854-> never publicized (did British Intelligence keep him from doing so?)
• Friedrich Wilhelm Kasiski-> Die Geheimschriften und die Dechiffrierkunst (1863)-> method to break Vigenère cipher-> became known as the “Kasiski test”
Kasiski Test
• method developed by Babbage / Kasiski (1863)
• repetitions in ciphertext give clues to period
• so find same plaintext an exact period apart, which results in the same ciphertext (of course, could also be random fluke)
• see repeated “VTW” in previous example
• suggests key size of 3 or 9
• then attack each monoalphabetic cipher individually using same techniques as before
• The Zimmermann Telegram in WWI
Kasiski Test (More Formal)
• Idea: any two identical plaintext strings will be encrypted to the same ciphertext if they are km positions apart, where m is the keyword length and k is a positive integer.
• Two steps of attack:
– find the keyword length
– conduct a statistical attack
• An example of attack:
– look for trigrams that are identical
– compute the distances between them, d1, d2, …
– let m’ be a divisor of gcd(d1, d2, …)
– write the ciphertext in a rectangular array with m’ columns; then a statistical attack can be used on each column
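The attack steps above can be sketched in Python as follows (hypothetical helper names). Distances are measured from the first occurrence of each repeated trigram; as the previous slide warns, a chance repetition can spoil the gcd, so a real attack would look at the common divisors of most distances rather than trust a single gcd.

```python
from functools import reduce
from math import gcd

def _letters(text: str) -> str:
    return "".join(c for c in text.upper() if c.isalpha())

def repeated_trigram_distances(ciphertext: str) -> list:
    """Distances from each trigram's first occurrence to its later repeats."""
    text = _letters(ciphertext)
    first_seen, distances = {}, []
    for i in range(len(text) - 2):
        tri = text[i:i + 3]
        if tri in first_seen:
            distances.append(i - first_seen[tri])
        else:
            first_seen[tri] = i
    return distances

def kasiski_period(ciphertext: str):
    """gcd of all repeat distances (candidate key length), or None if no repeats."""
    d = repeated_trigram_distances(ciphertext)
    return reduce(gcd, d) if d else None

def columns(ciphertext: str, m: int) -> list:
    """Split the ciphertext into m columns; each column is a single Caesar cipher."""
    text = _letters(ciphertext)
    return [text[i::m] for i in range(m)]
```

On the earlier example ciphertext ZICVTWQNGRZGVTWAVZHCQYGLMGJ, the only repeated trigram is VTW, nine positions apart, so the candidates are 9 and its divisor 3, matching the Kasiski slide.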
Low-Frequency Analysis
• works for English language plaintext
• after identifying the key length-> divide ciphertext into individual segments
• each segment is written as a column, and that column is shifted by one letter 25 times (giving 26 columns per segment, including the unshifted one)
• one of the columns (per segment) contains plaintext letters-> however: cannot identify words because each segment contains only part of the original plaintext
• now determine the five letters with lowest frequency-> in plaintext these are “j”, “k”, “q”, “x”, “z”, with a combined frequency of only about 1–2% (the table above gives .013)
• whichever column contains these letters with the appropriate total frequency should be the plaintext -> get one shift value per segment and from that the keyword
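A simplified Python sketch of this idea, with hypothetical function names: for each segment it tries all 26 shifts and keeps the one whose decryption contains the fewest j, k, q, x, z. Taking the minimum is a simplification of "the appropriate total frequency"; with short segments several shifts may tie, and ties are resolved toward the smallest shift here.

```python
LOW_LETTERS = set("jkqxz")  # rarest English letters per the table above

def low_letter_count(column: str, s: int) -> int:
    """How many letters of column become j, k, q, x or z after undoing shift s."""
    return sum(chr((ord(c) - 97 - s) % 26 + 97) in LOW_LETTERS for c in column)

def best_shift(column: str) -> int:
    """Shift (0..25) whose decryption has the fewest rare letters."""
    return min(range(26), key=lambda s: low_letter_count(column, s))

def recover_keyword(ciphertext: str, m: int) -> str:
    """One shift per segment (column) gives one keyword letter."""
    text = "".join(c for c in ciphertext.lower() if c.isalpha())
    return "".join(chr(best_shift(text[i::m]) + 97) for i in range(m))
```

Once the key length m is known (for example from the Kasiski test), `recover_keyword` returns one candidate keyword letter per segment, which is exactly the "one shift value per segment" step on the slide.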
Decipher this Vigenère Cipher
kbxzoeqecalrvbwlvvczthrzpnxumkgvjtfvtwudsinlhzmdrtniitcchlxygztpwrpqcmmekbthqfwvivjjrirljftbwlqyqetciipwilzvtgduifhbwlqzuqcoeskbtkxygztmsigbwlvvochafvcnxumkgvjtfvtwuprycjxaiuywgshjcvnmmekbtuyddmgkmmkltkfpkvuprzvgxzejpmpyxfpwiomeiihtebgacvsufahvxygiklvrimevtlniipseqnpspkjmeseegbhprkjmjummgzhlgrpjtzezfbdiiqgzdmvfobwpwzvndspfyaioekvptwsgwtpamfpwualvypdsilpqklvjgqhhpjqhtysrplioekcvnwifrttfslointivvngvqkkutaskkuthvvomglppvptwvffcrawfhislvrpotkmdcoxuekkwcalvtmxzekjmdycnjqrowkcbtzxycbxmimgzpucfpmspwtqdtywnjiialvwvxciiumxzjftickayaqipwygztpxnktaprjvicappfqhhtggighrudmgltccktkfpuwblxykvvlzvpudyiskhpyvvcvsprvzxapgrdttalvtmxzeeqbwlvnjqrowkcbtzxycbiomjjihhpigisflrrxtuiucvnalzpoioekjiewieuppwtvpapuckjqcnxycbxulrrxtumeikpbwvuadtikjqcnicumivlrrxtugrwatzwfomiomeimazikqppwtvpicfxykvvalrvqcoegrmcprxeijzijkbhlpwvwwhtggvpnezpkpbwvuqizichbdoegrmchkrkvxahfgacaecnvtjijuigpppjiewieepgvrfnwvpgrntnalfwow
Autokey Cipher
• ideally want a key as long as the message
• Vigenère proposed the autokey cipher
• use keyword once, then follow with plaintext
• knowing keyword can recover the first few letters
• use these in turn on the rest of the message
• but still have frequency characteristics to attack
• both the key and plaintext share the same frequency distribution of letters, so a statistical technique can be applied
• e.g., given key deceptive
key:        deceptivewearediscoveredsav
plaintext:  wearediscoveredsaveyourself
ciphertext: ZICVTWQNGKZEIIGASXSTSLVVWLA
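A minimal Python sketch of the autokey cipher (hypothetical function names), assuming letters-only input: encryption uses the keyword followed by the plaintext as the key stream, and decryption rebuilds the key stream letter by letter from the plaintext it has just recovered, exactly as the bullets describe.

```python
def autokey_encrypt(plaintext: str, keyword: str) -> str:
    p = [c for c in plaintext.lower() if c.isalpha()]
    key = (list(keyword.lower()) + p)[:len(p)]  # keyword once, then the plaintext itself
    return "".join(chr((ord(a) + ord(b) - 2 * 97) % 26 + 65) for a, b in zip(p, key))

def autokey_decrypt(ciphertext: str, keyword: str) -> str:
    key = list(keyword.lower())
    out = []
    for i, c in enumerate(ciphertext.lower()):
        p = chr((ord(c) - ord(key[i])) % 26 + 97)
        out.append(p)
        key.append(p)  # each recovered plaintext letter extends the key
    return "".join(out)

print(autokey_encrypt("wearediscoveredsaveyourself", "deceptive"))
```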
Vernam Cipher
• proposed by Gilbert Vernam (1918)
• consider binary data (bits)
• use XOR (exclusive-or) operation with a very long key (key is repeated if necessary)
$c_i = p_i \oplus k_i$ and $p_i = c_i \oplus k_i$
where $p_i$ is the ith binary digit of the plaintext, $k_i$ is the ith binary digit of the key, $c_i$ is the ith binary digit of the ciphertext, and $\oplus$ is the XOR operation
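The two equations above translate directly into a single XOR function. A minimal Python sketch (the name `vernam` is hypothetical) that works on bytes, i.e. eight binary digits at a time, and repeats the key when it is shorter than the data, as the slide allows:

```python
from itertools import cycle

def vernam(data: bytes, key: bytes) -> bytes:
    """c_i = p_i XOR k_i; the same call decrypts, since (p XOR k) XOR k = p."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))  # key repeats if shorter

ciphertext = vernam(b"HELLO", b"\x0f\xa0")
print(ciphertext.hex(), vernam(ciphertext, b"\x0f\xa0"))
```

Repeating the key is what makes this weaker than the one-time pad: a short repeating key behaves like a Vigenère cipher over bits.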
One-Time Pad
• if a truly random key as long as the message is used, the cipher will be secure
• called a One-Time pad
• is unbreakable since ciphertext bears no statistical relationship to the plaintext
• for any plaintext & any ciphertext there exists a key, which maps one to the other
• can only use the key once though
• have problem of secure distribution of key
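A minimal one-time pad sketch in Python, assuming byte-string messages; the function names are hypothetical. `secrets.token_bytes` draws from the operating system's cryptographically secure generator, a practical stand-in here; strictly, the definition above calls for a truly random key. The key is exactly as long as the message and must be used only once.

```python
import secrets

def otp_encrypt(message: bytes):
    """Return (ciphertext, key); the key is fresh, random, and as long as the message."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return ciphertext, key

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    if len(key) != len(ciphertext):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(c ^ k for c, k in zip(ciphertext, key))

ct, key = otp_encrypt(b"attack at dawn")
print(otp_decrypt(ct, key))
```

The key returned here still has to reach the receiver over a secure channel, which is the distribution problem named in the last bullet.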
Unbreakable cipher
• For the same ciphertext, two keys can generate two plausible plaintexts
• Which plaintext is correct?
• Unbreakable
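Why both plaintexts are equally plausible can be checked directly: for any candidate plaintext of the right length, XORing it with the ciphertext yields a key that "decrypts" to it. A small Python sketch; the messages and the `xor` helper are made up for illustration.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

real_plaintext = b"attack at dawn"
real_key = secrets.token_bytes(len(real_plaintext))
ciphertext = xor(real_plaintext, real_key)

# An attacker (or a coerced sender) can produce a key for any other 14-byte message
decoy_plaintext = b"retreat at six"
decoy_key = xor(ciphertext, decoy_plaintext)
print(xor(ciphertext, decoy_key))  # b'retreat at six'
```

Without the real key, nothing in the ciphertext favors one of the two keys over the other.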
Weakness of One-time Pad
• Malleable: provides secrecy but not authentication.
• Keys must NOT be reused:
– cannot withstand a known-plaintext attack
– depending on known information about the plaintexts, Eve can make use of
$C_1 \oplus C_2 = (M_1 \oplus K) \oplus (M_2 \oplus K) = M_1 \oplus M_2$
to figure out both messages
• In practice
– generate a large number of random bits,
– exchange the key material securely between the users before sending a one-time enciphered message,
– keep both copies of the key material for each message securely until they are used, and
– securely dispose of the key material after use, thereby ensuring the key material is never reused.
• It requires perfectly random numbers as the key
• Generating random bits
– radioactive decay
– noisy diode
– flipping coins
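The key-reuse equation can be demonstrated in a few lines. A Python sketch with made-up messages: when one key encrypts two messages, XORing the two ciphertexts cancels the key, and knowing one plaintext immediately reveals the other.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"meet at the dock"
m2 = b"send more troops"
key = secrets.token_bytes(len(m1))  # the mistake: one key, two messages

c1, c2 = xor(m1, key), xor(m2, key)
leak = xor(c1, c2)                  # C1 XOR C2 = M1 XOR M2, the key is gone

recovered_m2 = xor(leak, m1)        # known-plaintext attack on the other message
print(recovered_m2)
```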
Random numbers needed
• If the key material is generated by a deterministic program, then it is not actually random and should never be used in a one-time pad cipher.
– If so used, the method becomes a stream cipher; these usually employ a short key that is used to generate a long pseudorandom stream, which is then combined with the message using the same kind of mechanism as in one-time pads (typically XOR). Stream ciphers can be secure in practice, but they cannot be absolutely secure in the same provable sense as the one-time pad.
Stream ciphers
• Stream ciphers
– the most famous: Vernam cipher
– invented by Vernam (AT&T, 1917)
– process the message bit by bit (as a stream)
– different from the one-time pad (although some treat them as the same)
– simply add (XOR) the bits of the message to random key bits
– for decryption, generate the same key stream and XOR it with the ciphertext
• Examples
– a well-known stream cipher is RC4
– others include A5/1, A5/2, Chameleon, FISH, Helix, ISAAC, Panama, Pike, SEAL, SOBER, SOBER-128, and WAKE
• Usage
– stream ciphers are used in applications where plaintext comes in quantities of unknowable length, for example, a secure wireless connection
Pros and Cons
• Drawbacks
– need as many key bits as message bits, which is difficult in practice (i.e., the key must be distributed on a mag tape or CD-ROM)
• Strength
– unconditionally secure, provided the key is truly random
Key Generation
• Why not generate the keystream from a smaller (base) key?
– use some pseudo-random function to do this
– although this looks very attractive, it proves to be very difficult in practice to find a good pseudo-random function that is cryptographically strong
• This is still an area of much research
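To make "generate the keystream from a smaller key" concrete, here is a toy Python sketch that expands a short seed by hashing it together with a counter (SHA-256 from `hashlib`) and XORs the result with the data. This construction and its function names are hypothetical and chosen only for illustration; it is not a vetted cipher. Real systems use standardized stream ciphers (for example ChaCha20) with a fresh nonce per message, because reusing a seed here has exactly the same problem as reusing a one-time pad key.

```python
import hashlib

def keystream(seed: bytes, nbytes: int) -> bytes:
    """Toy keystream: SHA-256(seed || counter) blocks, concatenated and truncated."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

def stream_xor(data: bytes, seed: bytes) -> bytes:
    """Encrypt or decrypt: XOR data with the keystream derived from seed."""
    return bytes(d ^ k for d, k in zip(data, keystream(seed, len(data))))

ct = stream_xor(b"a message of any length", b"short base key")
print(stream_xor(ct, b"short base key"))
```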
Short History of Ciphers (cont’d)
• Ciphers are really becoming something “cool” (19th century)-> used to convey secret (love) messages in newspapers-> Edgar Allan Poe, Jules Verne, Sir Arthur Conan Doyle-> Beale Papers containing directions to treasure buried in VA
• Charles Wheatstone (1854)-> invents Playfair Cipher-> named after his friend Baron Playfair-> polygram substitution cipher
• Lester S. Hill (1929)-> general polygram cipher, not only two letters but m-> matrix cipher
Polygram Substitution Cipher
• not even the large number of keys in a monoalphabetic cipher provides security
• one approach to improving security was to encrypt multiple letters (polygraphic cipher)
• the Playfair Cipher is an example of a polygram cipher
• invented by Charles Wheatstone in 1854, but named after his friend Baron Playfair
Playfair Key Matrix
• a 5 x 5 matrix of letters based on a keyword
• I and J are considered the same letter
• fill in letters of keyword (without duplicates)
• fill rest of matrix with other letters
• e.g., using the keyword MONARCHY:
M O N A R
C H Y B D
E F G I/J K
L P Q S T
U V W X Z
• another example, the matrix for the keyword SIMPLE:
s i/j m p l
e a b c d
f g h k n
o q r t u
v w x y z
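The construction of the key matrix can be sketched in Python (the name `playfair_matrix` is hypothetical): write the keyword's letters without duplicates, then the rest of the alphabet, treating J as I, and cut the 25 letters into five rows.

```python
ALPHABET_NO_J = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25 letters; J shares a cell with I

def playfair_matrix(keyword: str) -> list:
    """Return the 5 x 5 key matrix as five row strings."""
    seen = []
    for ch in keyword.upper() + ALPHABET_NO_J:
        ch = "I" if ch == "J" else ch
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return ["".join(seen[i:i + 5]) for i in range(0, 25, 5)]

for row in playfair_matrix("MONARCHY"):
    print(" ".join(row))
```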
Encryption and Decryption
Encrypt two plaintext letters at a time (examples use the MONARCHY matrix):
1. if a pair is a repeated letter, insert a filler like 'X', e.g., "balloon" is split into "ba lx lo on"
2. if both letters fall in the same row, replace each with the letter to its right (wrapping back to the start from the end), e.g., "ar" encrypts as "RM"
3. if both letters fall in the same column, replace each with the letter below it (again wrapping to the top from the bottom), e.g., "mu" encrypts as "CM"
4. otherwise, each letter is replaced by the one that lies in its own row and in the column of the other letter, e.g., "hs" --> "BP", "ea" --> "IM" or "JM" (as desired)
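The four rules can be sketched in Python with the MONARCHY matrix hard-coded so the block stands alone. Function names are hypothetical, and two details are assumptions not stated on the slide: J is folded into I, and an odd final letter is padded with the filler X.

```python
# MONARCHY key matrix from the previous slide (I and J share a cell)
MATRIX = ["MONAR", "CHYBD", "EFGIK", "LPQST", "UVWXZ"]
POS = {ch: (r, c) for r, row in enumerate(MATRIX) for c, ch in enumerate(row)}

def digraphs(plaintext: str, filler: str = "X") -> list:
    """Rule 1: split into pairs, inserting the filler between repeated letters."""
    letters = ["I" if c == "J" else c for c in plaintext.upper() if c.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else filler  # pad an odd last letter
        if a == b:
            pairs.append(a + filler)
            i += 1
        else:
            pairs.append(a + b)
            i += 2
    return pairs

def encrypt_pair(pair: str) -> str:
    (r1, c1), (r2, c2) = POS[pair[0]], POS[pair[1]]
    if r1 == r2:  # rule 2: same row, take the letter to the right (wrapping)
        return MATRIX[r1][(c1 + 1) % 5] + MATRIX[r2][(c2 + 1) % 5]
    if c1 == c2:  # rule 3: same column, take the letter below (wrapping)
        return MATRIX[(r1 + 1) % 5][c1] + MATRIX[(r2 + 1) % 5][c2]
    return MATRIX[r1][c2] + MATRIX[r2][c1]  # rule 4: own row, partner's column

def playfair_encrypt(plaintext: str) -> str:
    return "".join(encrypt_pair(p) for p in digraphs(plaintext))

print(digraphs("balloon"), playfair_encrypt("balloon"))
```

Decryption would apply the same lookups with "right" replaced by "left" and "below" by "above"; rule 4 is its own inverse.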
Security of Playfair Cipher
• security much improved over monoalphabetic
• since we have 26 x 26 = 676 digrams
• would need a 676 entry frequency table to analyze (vs. 26 for monoalphabetic cipher)
• therefore, need correspondingly more ciphertext
• was widely used for many years (e.g., US & British military in WWI)
• it can be broken, given a few hundred letters
• since it still has much of the plaintext structure
• difficult to break using frequency analysis alone – but it still reveals frequency information