The Voynich Manuscript: A Mystery


  • the Voynich Manuscript

    Kevin Knight, Information Sciences Institute

    University of Southern California

    Sources for this talk:

    Mary D'Imperio, The Voynich Manuscript, An Elegant Enigma (1978)

    Kennedy & Churchill, The Voynich Manuscript (2006)

    Prescott Currier, Some Important New Statistical Findings (1976)

    Rene Zandbergen, Currier A and B: Two Different Languages? (1997)

    Rene Zandbergen, http://www.voynich.nu/

    http://www.voynich.ms/forum/

    experiments at USC/ISI

    MIT / September 2009


  • Some People Involved with the Voynich Manuscript

    Wilfrid Michael Voynich, book dealer
    Ethel Boole, daughter of George Boole
    Roger Bacon, first scientist
    William Newbold, polymath, PhD UPenn
    Rudolf II, Holy Roman Emperor
    Athanasius Kircher, German Jesuit super-scholar
    William Friedman, WWII cryptanalyst
    Hans P. Kraus, book dealer

  • Outline

    Voynich Manuscript ("VMS", for short)
    What is it?
    Where did it come from?
    What does it mean?

  • What is it?

    Medieval illustrated manuscript
    Approx. 235 pages on vellum material
    Color drawings of plants, nymphs, stars, etc.
    Approx. 38,000 words written in an unknown script
    Undeciphered!!! Meaning is unknown
    Currently owned by Yale University

  • 38,000 words of text

  • Apparent Sections of VMS

    Section name          # of word tokens
    Herbal                11,938
    Astrological           2,594
    Biological             6,915
    Cosmological             679
    Pharmacological        5,111
    Pure Text (Stars)     10,682

  • The Pictures: Herbal

    Many pictures look like grafting.

    Sunflower? Would date VMS as post-1492.

  • The Pictures: Astrological

  • The Pictures: Astrological

    What is this?

    Datable clothing?

  • The Pictures: Biological

    Small nudes in baths

    Interconnecting tubes of liquids

  • The Pictures: Pharmacological

    medicine jar?

  • History of Voynich Manuscript

    1864  Ethel Boole born in England
    1865  WV born in Lithuania
    1885  WV imprisoned, Polish nationalist
    1890  WV & EB meet, marry in 1902
    1898  WV publishes first book list
    1912  WV acquires VMS in ancient castle
    1914  WV moves to USA, opens bookshop
    1919  WV sends photostatic copies of VMS
    1919  Copying reveals de Tepenecz signature
    1919  WV writes to Bohemian State Archives
    1921  WV presents VMS + Marci letter mentioning Bacon, $160k price
    1921  Newbold & WV announce decipherment
    1930  WV dies. VMS placed in vault, $100k
    1931  VMS appraised at $19,400
    1960  Ethel dies, VMS to secretary Ann Nill. Castle revealed as Villa Mondragone
    1961  NY dealer Hans Kraus buys for $24,500
    1969  Kraus donates VMS to Yale
    1972  Brumbaugh finds WV letters in BSA
    200x  Zandbergen finds 1639 Baresch letter in newly online Kircher archive

    William Newbold, polymath, PhD UPenn

    Wilfrid Michael Voynich, book dealer

  • One-Page Letter Tucked Into VMS

    Reverend and Distinguished Sir; Father in Christ:

    This book bequeathed to me by an intimate friend, I destined for you, my very dear Athanasius [Kircher], as soon as it came into my possession, for I was convinced that it could be read by no one except yourself. The former owner of this book once asked your opinion by letter ... Accept now this token ... Dr Raphael, tutor in the Bohemian language to Ferdinand III, then King of Bohemia, told me the said book had belonged to the Emperor Rudolf and that he presented the bearer who brought him the book 600 ducats. He believed the author was Roger Bacon, the Englishman. On this point I suspend judgment ... At the command of your reverence,

    Joannes Marcus Marci of Cronland
    Prague, 19 August, 1665(6?)

    Kircher, super-scholar, recipient of this letter
    ???, owned VMS before Marci
    Emperor Rudolf, paid 600 ducats for VMS
    Roger Bacon (1214-94), first scientist: "I'm Not Francis Bacon"

  • History of Voynich Manuscript

    1576-1612  Rudolf II purchases VMS
    1608-1622  J. de Tepenecz signs VMS in Bohemian court
    1630s      George Baresch (GB) owns VMS, sends letter to Kircher
    1639       GB writes Kircher again
    16xx       Marci inherits VMS from GB
    1665       Marci sends VMS to Kircher with letter
    1665-80    Kircher owns VMS
    1680       Kircher dies


    Barschius owns VMS between J. de Tepenecz and Marci


  • Newbold Decipherment

    Marci letter → Bacon → Cabala → letter doubling cipher

    Create 22² = 484 Latin letter pairs (AA ... XX); these letter pairs are the cipher alphabet

    Assign each plaintext Latin letter to a set of cipher-alphabet letter pairs (B → AQ, RT, ...)

    This gives the encipherer some freedom, while the recipient can still decipher by using the table

    Cleverly encipher plaintext in such a way as to construct a cover message that looks like Latin, to fool readers

  • Newbold System

    Example: a n n → DO MI NU → DOMINU

    Too hard to assemble good cover text! So, make cipher letter-pairs overlap:

    a n n → AD DB BR → ADBR

    Also difficult, possibly too easy to decipher. So, employ anagramming:

    a n n → OM DO MI → DO OM MI → DOMI

    Now we can construct a plausible-looking cover text in Latin for our secret message (also in Latin). An ingenious system, to be sure!!

  • Newbold Decipherment

    Hmm, by the method, both plaintext and ciphertext should be in Latin letters

    But the VMS doesn't have Latin letters

  • William Newbold, polymath, PhD UPenn

    4OPCC89   apparent ciphertext
    DOMI      real ciphertext

    (artist's rendition)

  • Let's Decipher with Newbold!

    PCC89       apparent ciphertext
    DOMI        real ciphertext
    DO OM MI    after doubling
    OM DO MI    after non-deterministic anagramming
    a n n       after lookup in 22² table
    o n n       after non-deterministic mapping from 11 Latin letters to full 22

    Of course the 22² table isn't given, so we have to build it up through cryptanalysis. Wow, this is a lot of work!
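
    The doubling step and the size of the pair table can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide: the function name `overlapping_pairs` is hypothetical, and the count of orderings just hints at why anagramming makes the method non-deterministic.

```python
from math import factorial

LATIN_ALPHABET_SIZE = 22  # alphabet size used on the slide


def overlapping_pairs(text: str) -> list[str]:
    """Split text into overlapping letter pairs, e.g. DOMI -> DO, OM, MI."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


pairs = overlapping_pairs("DOMI")
table_size = LATIN_ALPHABET_SIZE ** 2   # number of possible letter pairs
orderings = factorial(len(pairs))       # ways to reorder the pairs when anagramming

print(pairs, table_size, orderings)
```

    Even three pairs can be reordered six ways; for the 55-letter sets mentioned in the talk the number of orderings is astronomically large.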

  • Newbold Decipherment

    1300 real ciphertext letters in first 3 lines

    Decipherment of those first lines: "I, Roger Bacon, have written this" (in Latin)

    Anagramming sets of 55 letters is sometimes required.

    Slow but steady progress: Andromeda galaxy, ovaries & ova ... so Bacon must have had a microscope & telescope, hundreds of years before they were invented!

  • The Text

    Approx. 38,000 words, unknown script
    Writing style similar to 15th century Florentine humanist hand
    Between 23 and 40 distinct characters
    No corrections, likely to have been copied
    Writing was done after illustrations

  • Transcription

    BSC8AE OPCC9 4OE FCC89 4OFCC9 4OP9 SCBS9 4OBSC9 EFAM OPAE29

    2ZC9 4OFC89 4OFAM Z89 4OFCC9 SC89 4OFCC9 4OFCC9 ESC89 EOP9

    8ZC9 4OPCCC9 8ARSC89 4OFC9 4OP9


    last paragraph, f103r

  • Another medieval manuscript, just for calibration

  • Introduction to Astrology and Its Use in Weather Prediction, Medicine, and Agriculture, in English. Manuscript on Paper. 1490.

  • Alphabet: Currier/D'Imperio Transcription

    C S Z
    P F B V
    Q X W Y
    J A E R O I D
    6 7 8 9 4 2
    G H 1
    T U 0
    N M 3
    K L 5

  • Alphabet: Currier/D'Imperio Transcription

    J A E R O I D
    G H 1    Maybe this is really IR IIR IIIR
    T U 0
    C S Z
    P F B V
    Q X W Y
    6 7 8 9 4 2

    There are several transcription schemes to choose from.

  • Alphabet: Currier/D'Imperio Transcription

    C S Z

    Variations of Z, or separate characters?

    S S S S S S

  • Alphabet: Currier/D'Imperio Transcription

    C S Z
    P F B V
    Q X W Y

    Are these ligatures? Is Q just a fancy way of writing SP?

    If you didn't know English, how would you know if [glyph] was the same as [glyph]?

    Suppose [glyph] never occurred. Would that be evidence?
    Suppose [glyph] did occur, with the same contexts as [glyph] (e.g., *shing)?
    Suppose [glyph] did occur, but never in the same context as [glyph]?

    Another common motif:

    fi    f i

    SOORSOE9S9

  • Letter Frequencies

    count  letter     count  letter     count  letter
    25468  O           2886  2            148  U
    20227  C           1752  N             96  6
    17655  9           1413  B             74  Y
    14281  A           1046  J             52  K
    12973  8            950  Q             31  G
    11008  S            908  X             17  L
    10471  E            591  T             14  H
    10026  F            524  *              2  1
     6716  R            431  V              1  5
     5994  P            316  I              1  0
     5423  4            217  W
     4501  Z            157  D
     4076  M            156  3

    Total: 63k character tokens
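
    A count like the one in this table can be reproduced with a few lines of Python. The sketch below is illustrative only: it runs on the first line of the f103r paragraph quoted earlier rather than on the full transcription, and `letter_counts` is a hypothetical helper name.

```python
from collections import Counter

# Sample: first line of the last paragraph of f103r, in Currier/D'Imperio transcription.
sample = "BSC8AE OPCC9 4OE FCC89 4OFCC9 4OP9 SCBS9 4OBSC9 EFAM OPAE29"


def letter_counts(text: str) -> Counter:
    """Count every non-space character as one letter token."""
    return Counter(ch for ch in text if not ch.isspace())


counts = letter_counts(sample)
total = sum(counts.values())

# Print letters from most to least frequent, like the slide.
for letter, count in counts.most_common():
    print(f"{count:5d} {letter}")
```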

  • Most Frequent Words

    count  word        count  word        count  word
    863    8AM         212    OFAM        140    OPCC9
    537    OE          211    8AN         138    OFAE
    501    SC89        191    4OFAE       130    ZO
    469    AM          186    ZOE         129    OFAR
    426    ZC89        177    OFCC9       119    ESC89
    396    SOE         174    SCC9        118    OFC89
    363    OR          172    SCOE        etc
    350    AR          155    S9
    344    SC9         155    OPC89
    318    8AR         154    OPAM
    308    4OFCC9      152    4OFAR
    305    4OFCC89     151    9
    283    ZC9         151    4OE
    279    4OFAN       150    S89
    272    4OFC89      147    4OF9
    270    89          144    ZCC9
    262    4OFAM       144    OFAN
    260    AE          144    2AM
    253    8AE         143    OPAE
    243    2           141    OPAR
    219    SOR         140    SX9

    Totals: 8116 word types, 38k word tokens
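
    The distinction between word types and word tokens can be made concrete with a small Python sketch. It uses only the three transcribed lines from f103r shown earlier, so the numbers are for that sample, not for the whole manuscript.

```python
from collections import Counter

# The three transcribed lines of the last paragraph of f103r.
paragraph = """
BSC8AE OPCC9 4OE FCC89 4OFCC9 4OP9 SCBS9 4OBSC9 EFAM OPAE29
2ZC9 4OFC89 4OFAM Z89 4OFCC9 SC89 4OFCC9 4OFCC9 ESC89 EOP9
8ZC9 4OPCCC9 8ARSC89 4OFC9 4OP9
"""

tokens = paragraph.split()       # every occurrence of a word
word_counts = Counter(tokens)    # one entry per distinct word (type)

num_tokens = len(tokens)
num_types = len(word_counts)
top_word, top_count = word_counts.most_common(1)[0]

print(num_tokens, num_types, top_word, top_count)
```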

  • Word Length Distributions

    Length   Voynich    English
    1        0.02       0.03
    2        0.10       0.15
    3        0.22       0.16
    4        0.23       0.15
    5        0.21       0.11
    6        0.12       0.09
    7        0.05       0.11
    8        0.01       0.08
    9        0.003      0.05
    10       0.001      0.03
    11       0.0001     0.01
    12       0.00007    0.006
    13       0.00002    0.002
    35       0.00002

    Counts on word types
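
    A length distribution over word types can be computed as below. This is a minimal sketch on a handful of the most frequent words from the talk, so the resulting fractions are not the ones in the table; `length_distribution` is a hypothetical helper.

```python
from collections import Counter


def length_distribution(words):
    """Fraction of distinct word types at each length."""
    types = set(words)
    lengths = Counter(len(w) for w in types)
    return {length: n / len(types) for length, n in sorted(lengths.items())}


# Toy input: eight frequent Voynich words in Currier/D'Imperio transcription.
words = ["8AM", "OE", "SC89", "AM", "ZC89", "SOE", "OR", "AR"]
dist = length_distribution(words)
print(dist)
```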

  • Features of the Text

    115 (out of 8116) word types appear doubled at least once
      4OFCC89 4OFCC89

    8 words appear tripled
      4OFC89 4OFC89 4OFC89
      SOE SOE SOE
      ZCOE ZCOE ZCOE
      OFAM OFAM OFAM
      OE OE OE
      9PAM 9PAM 9PAM
      8AM 8AM 8AM
      4OFCC89 4OFCC89 4OFCC89

    However, very few repeated word bigrams and word trigrams!

    No word trigram appears more than 5 times.
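
    Finding doubled and tripled words is a simple sliding-window scan. The sketch below is an assumption about how such a check could be done, not the procedure used for the talk; `repeated_runs` is a hypothetical name and the token lists are toy inputs.

```python
def repeated_runs(tokens, run_length):
    """Return the set of words that occur run_length times in a row somewhere."""
    found = set()
    for i in range(len(tokens) - run_length + 1):
        window = tokens[i:i + run_length]
        if all(word == window[0] for word in window):
            found.add(window[0])
    return found


line = "2ZC9 4OFC89 4OFAM Z89 4OFCC9 SC89 4OFCC9 4OFCC9 ESC89 EOP9".split()
doubled = repeated_runs(line, 2)
tripled = repeated_runs(line, 3)
tripled_toy = repeated_runs("8AM SOE SOE SOE OR".split(), 3)
print(doubled, tripled, tripled_toy)
```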

  • Some Theories About the Text

    Cryptogram
    Phonetic writing system
    Philosophical language
    Outsider art
    Glossolalia
    Hoax

  • Cryptogram

    Newbold (1921)
    Manly (1931), critique of Newbold
    Feely (1945), abbreviated Latin
    Strong (1945), polyalphabetic cipher, no details (might fall into hands of enemies of USA!)
    Brumbaugh (1972), numerological box
    Several attempts in the 1990s

  • William Friedman

    Most famous American cryptographer of World War II
      broke key ciphers, including Japanese Purple code, led proto-NSA

    VMS Study Group (1944-46)
      developed transcription alphabet
      group disbanded after the war

    2nd VMS Study Group (1962) at RCA

    Included his VMS theory in paper on another topic
      paper shortened due to space constraints
      VMS theory included in a footnote, as an anagram, to establish invention date

    Theory: VMS written in a synthetic philosophical language

  • Writing in Tongues

    Glossolalia ("speaking in tongues")
      Christian New Testament, Pentecost
      People spoke tongues foreign to themselves

    Writing in tongues?
      Medium Helene Smith, investigated by Theodore Flournoy (1896)
      Under a trance, Smith was able to converse with Martians
      She learned their language and could speak and write it
      Looked like a genuine language
      Grammar closer to French than you might expect

    suggested in Kennedy & Churchill, 2005

    Smith's Martian

  • Hoax

    Previous hoaxes:
      Hitler diaries
      Vinland map

    Voynich Manuscript: How? Why? Who?

  • How?

    Gordon Rugg (Scientific American, 2004)
    Proposed Cardan grille, an Elizabethan espionage tool
    If applied with randomness injected, claimed to generate VMS-like text

  • Why?

    KPMG Forensics 2006 Survey of Fraud in Australia and New Zealand

    Most popular motives for fraud:
      greed/lifestyle (54%)
      gambling (22%)
      personal financial pressure (5%)
      other (5%)
      not specified (3.5%)
      opportunity (0.4%)
      substance abuse (0.4%)

  • Who?

    member of Society of Friends of Russian Freedom
    said to have faked passports
    Needed $ (who doesn't?)
    tricky: said to have traded newer, better books for monks' old dirty ones
    spoke 18 languages

    Marci letter very convenient: faked to add a Roger Bacon connection?
      BUT: Baresch letter later found in Kircher archive also mentions Bacon
      BUT: What if Voynich had seen that letter?

    de Tepenecz signature suspiciously found during overexposure
      BUT: same signature in other docs
      BUT: what if Voynich knew that?

    suggested in Kennedy & Churchill, 2005

  • Experiments

    Can computers help us make sense of VMS?
    Is VMS a kind of letter substitution cipher?
      Originally in Latin? English? Ukrainian? Ukrainian written without vowels?
    Are there patterns of any sort?

  • Substitution Cipher

    ingcmpnqsnwf cv fpn owoktvcv
    hu ihgzsnwfv rqcffnw cw owgcnwf
    kowazoanv ...

  • Substitution Cipher

    Successive guesses, each written above the ciphertext:

    n = e
    fpn = the
    cv = of? Then owoktvcv would end in "fof". Back up.
    cv = is? Then owoktvcv ends in "sis".

  • Substitution Cipher

    Cryptodict

    abacdefb  ACADEMIC
    abacdefb  DEDICATE
    abacdefb  MEMBRANE
    abacdefc  ELECTRIC
    abacdefc  TUTELAGE
    abacdefd  ANARCHIC
    abacdefd  EVERYDAY
    abacdefe  ANALYSES
    abacdefe  ANALYSIS
    abacdeff  EYEGLASS

  • Substitution Cipher

    decipherment is the analysis
    ingcmpnqsnwf cv fpn owoktvcv

    of documents written in ancient
    hu ihgzsnwfv rqcffnw cw owgcnwf

    languages ...
    kowazoanv ...
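
    The Cryptodict keys are letter-repetition patterns: the first distinct letter of a word becomes a, the second b, and so on. A minimal Python sketch of that keying (the function name `pattern` is hypothetical, and it assumes at most 26 distinct letters per word):

```python
def pattern(word: str) -> str:
    """Map a word to its repetition pattern, e.g. ANALYSIS -> abacdefe."""
    mapping = {}
    out = []
    for ch in word:
        if ch not in mapping:
            # Assign the next unused pattern letter to a newly seen character.
            mapping[ch] = chr(ord("a") + len(mapping))
        out.append(mapping[ch])
    return "".join(out)


cipher_word = "owoktvcv"
candidates = ["ACADEMIC", "ELECTRIC", "ANALYSES", "ANALYSIS", "EYEGLASS"]
matches = [w for w in candidates if pattern(w) == pattern(cipher_word)]
print(pattern(cipher_word), matches)
```

    Only words sharing the cipher word's pattern remain as candidates, which is how the lookup narrows owoktvcv down to ANALYSES or ANALYSIS.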

  • Generative Models

    Spanish letter trigram model. Train on Spanish web text. Parameters fixed.

    q u o _ v a d e _ b r e r t e _

    Probabilistic model that substitutes VMS letters for Latin letters. Initially uniform.

    a → {all Voynich letters}
    b → {all Voynich letters}
    c → {all Voynich letters}
    ...
    z → {all Voynich letters}
    _ → _

    V A S 9 2 _ 9 F A E _ A R _ A P A M _

    EM algorithm: argmax P(VMS) = argmax Σ_latin P(latin) · P(VMS | latin)

    EM method demonstrated on many decipherment tasks in [Knight et al 2006].

    Easy experiments in Carmel finite-state package:
    % carmel --train-cascade corpus latin.wfsa subst.wfst

    Returns trained devices & Viterbi decipherment.

  • Substitution Cipher

    Input:
      cevzren cnegr qry vatravbfb uvqnytb qba dhvwbgr qr yn znapun
    Best decipherment assuming plaintext is Spanish:
      primera parte del ingenioso hidalgo don quijote de la mancha

    Input:
      VAS92 9FAE AR APAM ZOE ZOR9 QOR92 9 FOR ZOE89
    Best decipherment assuming plaintext is Spanish:
      decos acho es imen des dena denal y des denta
    If plaintext is assumed to be Latin:
      quiss squm is onum pomquss hates s qum hatis
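
    The Spanish test input on this slide can be checked to be a ROT13 encoding of the recovered plaintext, using Python's standard `rot_13` codec. The sketch only verifies the test case; it says nothing about how the decipherment program itself works.

```python
import codecs

plaintext = "primera parte del ingenioso hidalgo don quijote de la mancha"

# ROT13 shifts each letter 13 places; applying it twice restores the text.
ciphertext = codecs.encode(plaintext, "rot_13")
round_trip = codecs.decode(ciphertext, "rot_13")

print(ciphertext)
```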

  • Hypothesize Other Source Languages

    Pre-collect language models for 80 languages
    Decipher against each
    See which decoding run yields highest probability

  • United Nations Declaration of Human Rights

    No one shall be arbitrarily deprived of his property
    Niemand se eiendom sal arbitrr afgeneem word nie
    Asnjeri nuk duhet t privohet arbitrarisht nga pasuria e tij
    Janiw khitisa utaps oraqeps inaki aparkaspati
    Arrazoirik gabe ez zaio inori bere jabegoa kenduko
    Den ebet ne vo tennet e berc'hentiezh diganta diouzh c'hoant H
    Ning no ser privat arbitrriament de la seva propietat
    Di a so prupiit n ni p essa privu nimu di modu tirannicu
    Nitko ne smije samovoljno biti lien svoje imovine
    Nikdo nesm bt svvoln zbaven svho majetku
    Ingen m vilkrligt berves sin ejendom
    Niemand mag willekeurig van zijn eigendom worden beroofd
    Nul ne peut tre arbitrairement priv de sa proprit
    Nimmen mei samar fan syn eigendom berve wurde
    Ningun ser privado arbitrariamente da sa propiedade
    Niemand darf willkrlich seines Eigentums beraubt werden
    Avavgui ndojepe'a va'eri oimehicha reinte imbe teva
    Ba wanda za a kwace wa dukiyarsa ba tare da cikakken dalili ba
    Senkit sem lehet tulajdontl nknyesen megfosztani
    Engan m eftir getta svipta eign sinni
    Tak seorang pun boleh dirampas hartanya dengan semena-mena
    Necuno essera private arbitrarimente de su proprietate
    N fidir a mhaoin a bhaint go forlmhach de dhuine ar bith
    Al neniu estu arbitre forprenita lia proprieto
    Kelleltki ei tohi tema vara meelevaldselt ra vtta
    Eingin skal hissini vera fyri ongartku
    Me kua ni dua e kovei vua na nona iyau
    Keltn lkn mielivaltaisesti riistettk hnen omaisuuttaan

    300+ words in many of the world's languages, UTF-8 encoding

  • Unknown Source Language

    Input:
      cevzren cnegr qry vatravbfb uvqnytb qba dhvwbgr qr yn znapun
    Best guess of plaintext language: Spanish
    Best decipherment:
      primera parte del ingenioso hidalgo don quijote de la mancha

    Input:
      VAS92 9FAE AR APAM ZOE ZOR9 QOR92 9 FOR ZOE89
    Best guess of plaintext language: Romanian
    Best decipherment: nonsense

  • Consonantal Writing

    Input:
      ceze ceg qy atafuqyt qa dwg q y zapu
    Best guess of plaintext language: Spanish
    Best decipherment:
      prmr prt dl ngnshdlg dn qvt d l mnch

    Input:
      VAS92 9FAE AR APAM ZOE ZOR9 QOR92 9 FOR ZOE89
    Best decipherment: more nonsense
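
    A consonantal test input of this kind can be generated by deleting vowels and then applying a letter substitution. The sketch below assumes the vowels are a, e, i, o, u and uses ROT13 as the substitution (an assumption that happens to match the test input letter for letter); `strip_vowels` is a hypothetical helper. Unlike the slide, it keeps the space between "ngns" and "hdlg".

```python
import codecs

VOWELS = set("aeiou")  # assumption: plain vowels only, no accented forms


def strip_vowels(text: str) -> str:
    """Drop vowels from each word, keeping word boundaries."""
    words = ("".join(ch for ch in w if ch not in VOWELS) for w in text.split())
    return " ".join(w for w in words if w)


plaintext = "primera parte del ingenioso hidalgo don quijote de la mancha"
consonants = strip_vowels(plaintext)
ciphertext = codecs.encode(consonants, "rot_13")

print(consonants)
print(ciphertext)
```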

  • Generative Models

    Okay, that didn't work

    Let's devise looser generative models, to mine for patterns.

  • Generative Models

    Trigram model over {a, b, _}

    a → {all Voynich letters}
    b → {all Voynich letters}
    _ → _

    a a _ b a b _ a b a a _

    V A S 9 2 _ 9 F A E _ A R _ A P A M _

    Initially uniform. What parameter settings result in highest P(corpus)? EM algorithm.

  • Generative Models

    Trigram model over {a, b, _}

    a → {all English letters}
    b → {all English letters}
    _ → _

    a a _ b a b _ a b a a _

    i n _ t h e _ t o w n _ w h e r e _ i _ w a s

    Initially uniform. What parameter settings result in highest P(corpus)? EM algorithm.

    Sample tagging with learned model:

    a b _ b b a _ b a b b _ b b a b a _ a _
    i n _ t h e _ t o w n _ w h e r e _ i _

  • Generative Models

    Trigram model over {a, b, _}

    a → {all Voynich letters}
    b → {all Voynich letters}
    _ → _

    Sample tagging with learned model:

    ? ? ? ? ? _ ? ? ? ? _ ? ? _ ? ? ? ? _ ? ? ? _ ? ? ? ? _
    V A S 9 2 _ 9 F A E _ A R _ A P A M _ Z O E _ Z O R 9 _

  • Generative Models

    Trigram model over {a, b, _}

    Sample tagging with learned model:

    b b b b a _ a b b a _ b a _ b b b a _ b b a _ b b b a _
    V A S 9 2 _ 9 F A E _ A R _ A P A M _ Z O E _ Z O R 9 _

  • Generative Models

    [Two bar charts of P(a) per letter on a 0 to 1 scale, one for English and one for Voynich; the slide is labeled P(letter | tag) and P(tag | letter).]

    English, letters in chart order: B D J K M N P Q V W X L R C F G T H S Y U E O A I

    Voynich, letters in chart order: 0 1 4 S W Y X Q C A F P B I O 8 V * 2 H E G T R K U 6 J D 9 3 N M 5 L

  • Generative Models

    Bigram model over {a, b}

    a → {all Voynich words!}
    b → {all Voynich words!}

    a a b a b a b a a

    VAS92 9FAE AR APAM ZOE ZOR9 QRC2 9 ...

    What parameter settings result in highest P(corpus)? EM algorithm.

    Do words with similar contexts have similar spellings?! That would be very interesting.

  • Generative Models

    Bigram model over {a, b}

    Sample tagging with learned model:

    a     a    a  a    a   a
    VAS92 9FAE AR APAM ZOE ZOR9

    a    a a   a     a
    QRC2 9 FOR ZOE89 2OR9

    WAIT, WHAT?

  • Generative Models

    [Two charts: number of Voynich words tagged as a, and number tagged as b, on each page (pages 1 to 221, counts 0 to 600). Sections along the page axis: Herbal, Astro, Bio, Pharma, Stars.]

    Known since Capt. Currier's analysis (1976): two "languages" (in the formal sense). Several handwriting styles, supposedly with a similar breakdown.

  • Captain Currier's Two Languages

    Help! I'm tired.

    Pages w/ Herbal drawings

  • Zandbergen Dot Plot

    For every pair of pages, how similar are they to each other?

    Rene Zandbergen (1997)

    [Dot plot: pages on one axis, the same pages on the other. Sections along each axis: Herbal, Astro, Bio, Pharma, Stars.]
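
    A page-by-page similarity matrix of this kind can be sketched as follows. The similarity measure here, cosine similarity between word-count vectors, is an assumption chosen for illustration and not necessarily the measure behind Zandbergen's plot; the three "pages" are toy data built from frequent Voynich words.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy pages (hypothetical, not real manuscript pages).
pages = ["8AM OE SC89 8AM", "8AM SC89 OE OE", "4OFCC9 4OFAM"]
vectors = [Counter(p.split()) for p in pages]

# matrix[i][j] is the similarity of page i to page j; a dot plot shades each cell.
matrix = [[cosine(u, v) for v in vectors] for u in vectors]
for row in matrix:
    print(" ".join(f"{x:.2f}" for x in row))
```

    Blocks of high similarity along the diagonal would correspond to runs of pages that share vocabulary.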

  • Focus Further Experiments on Voynich-B (Bio & Stars)

    Consistent vocabulary

    Still plenty of words

    Let's try models that divide words into classes

    10 classes
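
    One way to divide words into classes can be sketched as follows. The slides do not say which model was used, so this is a swapped-in technique, not the method from the talk: a hard-clustering class bigram model, where each word type belongs to exactly one class, fitted by a greedy exchange search. The function names and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(tokens, word_class):
    """Log-likelihood of tokens[1:] under a class bigram model with
    maximum-likelihood estimates:
    P(w_i | w_{i-1}) = P(class(w_i) | class(w_{i-1})) * P(w_i | class(w_i))."""
    classes = [word_class[w] for w in tokens]
    transitions = Counter(zip(classes, classes[1:]))
    left_totals = Counter(classes[:-1])
    emissions = Counter(tokens[1:])
    class_totals = Counter(classes[1:])
    ll = 0.0
    for (c1, _c2), n in transitions.items():
        ll += n * math.log(n / left_totals[c1])
    for w, n in emissions.items():
        ll += n * math.log(n / class_totals[word_class[w]])
    return ll


def exchange_clustering(tokens, num_classes, max_sweeps=10):
    """Greedy exchange: move one word type at a time to the class that
    most improves the log-likelihood. Returns (word_class, history),
    where history[0] is the initial score and history[-1] the final one."""
    vocab = [w for w, _ in Counter(tokens).most_common()]
    # Assumed initialization: spread word types over classes by frequency rank.
    word_class = {w: i % num_classes for i, w in enumerate(vocab)}
    best = log_likelihood(tokens, word_class)
    history = [best]
    for _ in range(max_sweeps):
        moved = False
        for w in vocab:
            original = word_class[w]
            best_class = original
            for c in range(num_classes):
                if c == original:
                    continue
                word_class[w] = c
                score = log_likelihood(tokens, word_class)
                if score > best + 1e-12:
                    best, best_class = score, c
            word_class[w] = best_class
            moved = moved or best_class != original
        history.append(best)
        if not moved:
            break
    return word_class, history


# Hypothetical toy corpus (English, for illustration only).
toy_tokens = "the cat sees the dog the dog sees a cat a dog sees the cat".split()
toy_classes, toy_history = exchange_clustering(toy_tokens, num_classes=3)
```

    This sketch recomputes the full likelihood for every candidate move, which is fine for a toy corpus but too slow for thousands of word types; a soft model trained with EM (for example, a hidden Markov model) is another common choice.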

  • 10 Classes of words: English

    etc etc etc

    etc etc etc

  • 10-class tagging of Voynich-B

    [Figure: the ten classes, labeled a through j.]

  • Class-Tag Sequences

    Tagging of first VMS page:

    f g d h f g i d b j c c b e e a h f g e e a b e e a h f g d b j j c c b e a h f g j j j c c c h f g b j j c c b j j c b j c c b e a h f g b j c b j c c b j c b i d i d c b j c c c c c c c c c b e a i d b j c c b j c c b j c c b j c c c c c h f g d b j j j j c c h f g b j j c b e a b i d i d h f g d i d i d i d h f g d b j j j c b j c c c c b j c c c b e a h f h f h f g b j c b e e e a h f g b j e a i d i d b j c b j c b j c h f g b j j c c c c c c b j j c b j c b e a h f g d i d i d b j c b j j j j c b j j c c c b j c b j c b j c c c c b j c b j c c c c c c i d b j c c c c b j c c c b j c c c c c c b j c h f g e a h f g i d i d b j j c b j c b j c b j c b e a b j c c c c c b j c c c c c c c c c i d b j c c c c b j c b j c c b i d i d i d b j j c b j c c c i d i d i d h f g b j c c c c c c c c c c c c c c c c b e a h f g h f g e a i

    14-grams found in 10-class tagging:

    Count  14-gram
    25     c c c c c c c c c c c c c c
    9      i d i d i d i d b e a h f g
    7      i d i d i d i d i d i d i d
    7      i d i d h f g e e a h f g e
    7      e a h f g e a h f g e a i d
    6      j c c c c c c c c c c c c c
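
    Counts like these come from sliding a window of length n over the tag sequence. A minimal sketch, using the first 22 tags of the first-page tagging above as input; the function name count_ngrams is hypothetical, and trigrams are used only because the excerpt is too short for 14-grams.

```python
from collections import Counter


def count_ngrams(tags, n):
    """Count every contiguous run of n tags in the sequence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# First 22 tags of the first-page tagging shown on this slide.
tags = "f g d h f g i d b j c c b e e a h f g e e a".split()
trigram_counts = count_ngrams(tags, 3)

# The most frequent trigrams, in the same count-then-sequence layout.
for ngram, count in trigram_counts.most_common(3):
    print(count, " ".join(ngram))
```

    Calling count_ngrams(all_tags, 14) on the full Voynich-B tag sequence would produce a table of the kind shown on the slide.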

  • 10 Classes of words: Voynich-B

    Tags per page.

    [Figure: ten panels, one per class (a through j), each plotting the number of words with that tag on each page. x-axis: pages 1 to 46.]

  • 10 Classes of words: Voynich-B

    Tags per page.

    Bio words vs. Stars words

    [Figure: ten panels, one per class (a through j), each plotting the number of words with that tag on each page. x-axis: pages 1 to 46.]

  • Conclusion

    Voynich Manuscript

    What it is: pretty clear

    Where it came from: less clear

    What it means: totally unclear

    Lots of room for empirical, unsupervised computer techniques:

    Character analysis (e.g., ligatures)

    Determining relations between words and pictures

    Identification of topics

    More cipher types

  • thank you
