
This article was downloaded on: 25 October 2010
Access details: Free Access
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Cryptologia
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t725304178

The Voynich Manuscript: Evidence of the Hoax Hypothesis
Andreas Schinner

To cite this Article: Schinner, Andreas (2007) 'The Voynich Manuscript: Evidence of the Hoax Hypothesis', Cryptologia, 31:2, 95–107
To link to this Article: DOI: 10.1080/01611190601133539
URL: http://dx.doi.org/10.1080/01611190601133539


The Voynich Manuscript: Evidence of the Hoax Hypothesis

ANDREAS SCHINNER

Abstract. In this article, I analyze the Voynich manuscript, using random walk mapping and token/syllable repetition statistics. The results significantly tighten the boundaries for possible interpretations; they suggest that the text has been generated by a stochastic process rather than by encoding or encryption of language. In particular, the so-called Chinese theory now appears less convincing.

Keywords: hoax hypothesis, statistical analysis, stochastic process, Voynich manuscript

Introduction

The Voynich manuscript (the VMS) is a handwritten codex of about 250 pages, ink on vellum, appearing on stylistic grounds to date from around 1500. It contains illustrations of mostly unidentifiable plants, astronomical or astrological diagrams, and “naked nymphs” bathing in strange arrangements of pools or tubs connected by complex systems of pipes. The most striking feature, however, is the text, written in an elegant, unique script that has so far defied any commonly accepted translation. Information about the VMS, its possible history, and attempts at explanation can be found in various places [4, 7, 15, 14]. Only a brief summary will be given here.

Interpretations of the VMS can roughly be divided into three classes:

• Cipher text hypothesis. The VMS contains natural language text (given the origin of the manuscript, most probably Latin or German) that has been encrypted.

• Plain text hypothesis. The VMS text is plain text in a natural, not yet identified language that either did not possess an alphabet of its own at the beginning of the 16th century, or whose writing system appeared too complex to a medieval scholar. The word length statistics make East Asian languages, in particular Chinese, the most promising candidates (Chinese theory). Alternatively, the script could have been invented together with an artificial language.

• Hoax hypothesis. The VMS contains no meaningful text at all. In this context, the word “hoax” should be associated with a broad spectrum of possibilities, ranging from intentional forgery for monetary gain to the work of an idiot savant, interpreted by medieval scholars as “revelation of arcane lore.”

These three classes are not completely distinct. For example, the VMS could contain a message hidden steganographically in a set of otherwise meaningless strings. This theory is especially difficult to prove or disprove; the best argument against it known so far is a psychological one: the basic principle of steganography is to hide the mere existence of a message, and the worst place to hide a genuine secret is an apparently mysterious book.

Address correspondence to Dr. Andreas Schinner, Institut für Experimentalphysik, Abteilung für Atom- und Oberflächenphysik, Johannes Kepler Universität, Altenberger Straße 69, 4040 Linz, Austria. E-mail: [email protected]

Cryptologia, 31:95–107, 2007. Copyright © Taylor & Francis Group, LLC. ISSN: 0161-1194 print. DOI: 10.1080/01611190601133539

Downloaded At: 14:22 25 October 2010

It is one of the most striking features of the VMS that even modern computer-aided analysis has so far been unable to rule out any one of these interpretations definitively. Instead, arguments for and against all three viewpoints can be given: since statistical properties characteristic of natural languages are also present in the VMS text, the encryption method used (if any) should not be too complex; additionally, around 1500, cryptology was still in its infancy. Despite these facts, all attempts at decipherment by modern cryptanalysts have failed. On the other hand, the text shows several “exotic” linguistic features, such as frequent word repetitions or preferred positions for certain letters within a line; this appears to be incompatible with the plain text hypothesis, even in the “artificial language” version.

Consequently, the hoax hypothesis has its attractions. However, the VMS text obviously is not composed of simple random strings, and it shows rich language-like structure. It seemed unlikely that a medieval hoaxer (or even an early 20th century forger) could create such a convincing “facsimile language” within reasonable time. The work by Gordon Rugg [11] has shown that this need not be true: an algorithm feasible even with medieval technology (the “table-and-grille” method) makes it possible for a single person to generate a text as long and complex as the VMS within approximately three months. This, however, is just a possibility and far from a proof of the hoax hypothesis. Furthermore, the “table-and-grille” method as investigated so far does not explain all of the statistical text properties of the VMS. The three competing explanation classes are thus still of roughly equal relevance.

In this article, statistical investigations of the VMS are presented that provide additional restrictions on possible solutions. Mapping the text to a random walk uncovers characteristic long-range correlations not present in normal human writings; they fit a stochastic process with memory effects better than a sequence of tokens chosen according to linguistic rules. Furthermore, the distribution of gaps between two similar or selected tokens, respectively, also differs qualitatively from normal texts; its mathematical properties indicate the presence of very unusual “random effects.” Possible implications of these results for the interpretation of the VMS are discussed in the conclusions section.

Throughout this article, the following usual conventions are used: the term token denotes any string of characters separated by spaces or by the start or end of a line; a word is a type of token regardless of its frequency in the text. For characters or tokens from the VMS script the European Voynich Alphabet (EVA) is used [15]; the letters (or sequences of letters) are written in italics and put in angle brackets (for example, the notorious most frequent VMS token is transcribed as ⟨daiin⟩). Finally, the analysis presented in this article is based on the various text samples listed in Table 1.

Random Walk Model

According to Kokol, Podgorelec, Zorman, Kokol, and Njivar [8], long-range power law correlations are present in a wide variety of information encoding systems, ranging from human writings (natural languages and computer programs) to DNA sequences. To some extent, they characterize the information content and complexity of communication.

A useful method to study correlations in character strings is based on mapping the symbol sequence to a stochastic process that, especially in linguistic literature, frequently is called Brownian walk. This terminology is somewhat misleading, since Brownian motion can be described as the scaling limit of a so-called random walk: in the theory of stochastic processes [2] it is characterized by independent steps that all have the same probability distribution, i.e., are uncorrelated. On the other hand, in statistical physics, for example, the expression random walk with memory is sometimes used to describe a situation in which the stochastic process generating the steps is of Markovian or even non-Markovian type. In the following, random walk should be understood in this generalized sense.

As a first step it is necessary to encode the characters of the texts under investigation as bit sequences. It has been shown that the actual definition of this code table has negligible influence on the quantities of interest, as long as all (or at least almost all) possible bit patterns are used [12]. Since the VMS contains no punctuation marks, they are removed from the other texts too; upper case characters are converted to lower case. Thus the remaining character set consists of the letters ‘a’–‘z’, the German umlauts ‘ä’, ‘ö’, ‘ü’, and the German s–z ligature ‘ß’; empty spaces are ignored. These 30 characters can be represented by a 5-bit code.
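The encoding step can be sketched in a few lines of Python. This is only an illustration: the article does not give its code table, so the ordering of `ALPHABET` and the names `CODE` and `text_to_bits` are assumptions made here.

```python
import string

# Assumed code table: the article only requires that (almost) all 5-bit
# patterns are used, so this alphabetical ordering is an arbitrary choice.
ALPHABET = string.ascii_lowercase + "äöüß"      # 30 characters
CODE = {ch: format(i, "05b") for i, ch in enumerate(ALPHABET)}

def text_to_bits(text):
    """Lower-case the text, drop every character outside the 30-letter
    alphabet (spaces, punctuation, digits) and concatenate 5-bit codes."""
    bits = []
    for ch in text.lower():
        code = CODE.get(ch)
        if code is not None:
            bits.extend(int(b) for b in code)
    return bits

print(text_to_bits("Ab c!"))   # 15 bits: the codes of 'a', 'b', 'c'
```

With 30 characters, two of the 32 possible 5-bit patterns stay unused, which is within the "almost all patterns" condition quoted above.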

The bits of the resulting binary string then define the steps ±1 of a random walk. Let

$$\Delta y(l, l_0) = y(l + l_0) - y(l_0) \tag{1}$$

be the walk displacement between step numbers $l_0$ and $l + l_0$. Then

$$F(l)^2 = \langle \Delta y^2 \rangle - \langle \Delta y \rangle^2 \tag{2}$$

describes the variance of the mean displacement. The angle brackets denote averaging over all $l_0$. For pure (uncorrelated) random walks of infinite length, where the steps are Bernoulli trials with probability p, one easily obtains:

$$F(l)^2 = 4p(1 - p)\,l \tag{3}$$

In general, F(l) will behave asymptotically as $F(l) \propto l^{\alpha}$, where an exponent $\alpha \neq 0.5$ indicates the presence of long-range correlations.
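A minimal Python sketch of Eqs. (1) and (2) may help to make the averaging explicit. It assumes that a 1-bit is an up-step (+1) and a 0-bit a down-step (−1), and that the average runs over every admissible start position $l_0$; the function name `fluctuation` is hypothetical.

```python
def fluctuation(bits, l):
    """F(l) of Eq. (2): root of the displacement variance over all
    windows of l steps, with bits mapped to steps +1 / -1."""
    # Walk positions y(0), y(1), ..., y(N).
    y = [0]
    for b in bits:
        y.append(y[-1] + (1 if b else -1))
    # Displacement of Eq. (1) for every admissible start l0.
    disp = [y[l0 + l] - y[l0] for l0 in range(len(y) - l)]
    mean = sum(disp) / len(disp)
    mean_sq = sum(d * d for d in disp) / len(disp)
    # Guard against a tiny negative value caused by rounding.
    return max(0.0, mean_sq - mean * mean) ** 0.5

alternating = [1, 0] * 50
print(fluctuation(alternating, 1), fluctuation(alternating, 2))
```

For a strictly alternating bit string every window of two steps returns to its starting point, so F(2) vanishes while F(1) equals one; this is a convenient sanity check. An exponent estimate would then come from the slope of log F(l) against log l, which is not implemented here.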

Table 1. Text sources used in this article

Text                   Text part   Language    Number of tokens   Number of words
Voynich manuscript¹    All²        Unknown     36,000             7000
Vulgate Bible          5%³         Latin       25,000             6000
Luther Bible           5%³         German      35,000             4000
Alice in Wonderland    All         English     26,000             3000
Chinese Bible          Genesis     Mandarin⁴   34,000             2000

¹ Majority vote version of interlinear EVA transcription 1.6e6 [15]. ² Or particular sections of it; see Table 2. ³ Percentages are counted from top of document. ⁴ In pin-yin romanization with all tones removed.


Particular care has to be taken when evaluating Eq. (2) for a walk of finite length N to avoid finite size effects: as $l \to N - 1$ the sample size available for calculating the averages (i.e., the number of possible $l_0$ values) tends to 1; consequently, $F(l) \to 0$. In the calculations presented here l is limited to a maximal value of N/10.

The resulting F(l) on applying this method to the VMS and other texts is shown in Figure 1. Previous investigations by Kokol et al. [8] of various human writings have demonstrated that for natural language texts (almost independent of the language used) the asymptotic exponent $\alpha$ of F(l) does not notably differ from 0.5, while for computer program source codes significant deviations are observed. As far as the normal language samples are considered, the present results confirm this.

Most interestingly, the VMS text shows completely different behavior: a crossover point exists where the “random process” $\alpha = 0.5$ turns into an asymptotic exponent $\alpha \approx 0.85$, indicating the presence of “memory effects” in the underlying stochastic process. The principal structure of F(l) remains the same also for single sections of the VMS, as presented in Table 2: the asymptotic exponents for parts of the VMS are somewhat lower (between 0.7 and 0.8) than for the whole text; the difference is mainly due to the relatively high sensitivity of $\alpha$ to reduction of the walk length. Two facts are especially noteworthy: (i) the crossover point $l_{co} \approx 360$ (= 72 characters × 5 bits) of the whole text fits well to the average line length; (ii) this value approximately also holds for sections that are associated with Currier’s language A [3], while for sections written in language B $l_{co}$ is significantly higher (by approximately a factor of 3).

Figure 1. Root mean square fluctuation of the random walk displacement for the VMS and normal language texts. Inset: VMS curve (full line) with low and high l asymptotic behavior, respectively (dashed lines).

It appears that in the VMS significant correlations exist between tokens separated by more than an average text line, while within a line the text behaves randomly (like ordinary human writings). To inspect this more closely, the step (or bit) autocorrelation function

$$C(l) = \langle n(l + l_0)\, n(l_0) \rangle - \langle n(l_0) \rangle^2 \tag{4}$$

and its corresponding cumulative distribution function

$$C_c(l) = \frac{1}{l} \sum_{k=1}^{l} C(k) \tag{5}$$

are useful quantities. Here n(k) denotes the value (0 or 1) of the bit at position k in the binary string generating the random walk. As demonstrated in Figure 2, within approximately l < 400 positive correlations build up in the VMS that are an order of magnitude stronger than in ordinary text. These correlations decay after some thousand steps. Such positive correlations are typical for a stochastic process in which the probability of a particular random event is increased by previous occurrences of this event.
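Eqs. (4) and (5) translate directly into code. The sketch below is an illustration under one stated assumption: both averages in Eq. (4) are taken over the same set of start positions $l_0$, a detail the article leaves open. The function names are hypothetical.

```python
def autocorrelation(bits, l):
    """C(l) of Eq. (4): <n(l + l0) n(l0)> - <n(l0)>^2, averaged over l0."""
    m = len(bits) - l                      # number of admissible l0 values
    prod = sum(bits[l0 + l] * bits[l0] for l0 in range(m)) / m
    mean = sum(bits[:m]) / m
    return prod - mean * mean

def cumulative_autocorrelation(bits, l):
    """C_c(l) of Eq. (5): arithmetic mean of C(k) for k = 1 .. l."""
    return sum(autocorrelation(bits, k) for k in range(1, l + 1)) / l

alternating = [1, 0] * 10
print(autocorrelation(alternating, 1), autocorrelation(alternating, 2))
```

An alternating string gives a negative value at lag 1 and a positive value at lag 2, which shows how the sign of C(l) tracks whether equal bits tend to recur at that spacing. The direct double loop is quadratic in cost; for walks of the lengths listed in Table 2 an FFT-based estimate would be the usual substitute.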

Table 2. Random walk asymptotic displacement variance

VMS section      Folios      Walk length   a¹      α¹      l_co²   Script³
All              1r–116v     954456        0.131   0.846   356     A+B
Herbal           1r–66v      272896        0.243   0.768   196     A
Astrological     67r–73v     74721         0.396   0.659   339     ?
Biological       75r–84v     172096        0.161   0.762   1065    B
Pharmaceutical   87r–102v    99176         0.314   0.706   277     A
Recipes          103r–116v   282536        0.182   0.738   1285    B

¹ $F(l) \to a\,l^{\alpha}$ for $l \gg 1$; see Eq. (2) and text. ² Crossover l-value: $l_{co}^{0.5} = a\,l_{co}^{\alpha}$. ³ Currier language [3] that is dominant in this section.
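The crossover condition in footnote 2 of Table 2 can be solved in closed form, $l_{co} = a^{-1/(\alpha - 0.5)}$. The short check below recomputes $l_{co}$ from the tabulated a and α; the function name `crossover` and the dictionary `TABLE_2` are introduced here for illustration only.

```python
def crossover(a, alpha):
    """Solve l**0.5 == a * l**alpha for l (footnote 2 of Table 2)."""
    return a ** (-1.0 / (alpha - 0.5))

# (a, alpha, published l_co), copied from Table 2
TABLE_2 = {
    "All":            (0.131, 0.846, 356),
    "Herbal":         (0.243, 0.768, 196),
    "Astrological":   (0.396, 0.659, 339),
    "Biological":     (0.161, 0.762, 1065),
    "Pharmaceutical": (0.314, 0.706, 277),
    "Recipes":        (0.182, 0.738, 1285),
}

for section, (a, alpha, published) in TABLE_2.items():
    print(f"{section:15s} computed {crossover(a, alpha):7.1f}  published {published}")
```

The recomputed values agree with the published column to within the rounding of a and α, which confirms the reading of the two fit columns as prefactor and exponent.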

Figure 2. Cumulative step autocorrelation function $C_c(l)$, cf. Eq. (5) (smoothed by 100-point adjacent averaging); full line: VMS, dashed line: Vulgate Bible. Inset: autocorrelation function C(l), cf. Eq. (4), for l between 1000 and 1030; full line: VMS, gray shaded area: Vulgate Bible.

A classical model for such a system, often applied to cascade processes like particle induced electron emission [1], is the so-called Pólya process. It is based on the Pólya urn scheme, where on drawing a ball of a particular color from an urn a specific number of balls of the same color are put into the urn, increasing the probability of drawing this color again [5] (“spurious contagion”). In the scaling limit of large step numbers l the resulting distribution is the so-called Pólya distribution, also known as negative binomial distribution

$$P_n = \binom{-1/b}{n} (-b\mu)^n (1 + b\mu)^{-1/b - n} \tag{6}$$

In the present context $P_n$ is the probability that in a walk of length $l \to \infty$ the number of up-steps is equal to n. Mean and variance of $P_n$ are given by

$$\langle n \rangle = \mu \tag{7}$$

$$\sigma^2 = \mu(1 + b\mu) \tag{8}$$

The parameter b describes the “cascading strength” of the process: for b = 0 the random steps are uncorrelated and Eq. (6) turns into a Poisson distribution, while for b = 1 the so-called Yule-Furry process (also known as simple birth process) is recovered [2].
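As a numerical cross-check of Eqs. (6) to (8), the sketch below evaluates the distribution in the equivalent gamma-function form of the negative binomial (valid for b > 0) and sums mean and variance directly. The function name `polya_pmf` and the test values μ = 5, b = 0.2 are arbitrary choices made here, not values from the article.

```python
import math

def polya_pmf(n, mu, b):
    """Eq. (6) in gamma-function form (requires b > 0):
    P_n = Gamma(n + r) / (n! Gamma(r)) * q**n * (1 - q)**r
    with r = 1/b and q = b*mu / (1 + b*mu)."""
    r = 1.0 / b
    q = b * mu / (1.0 + b * mu)
    log_p = (math.lgamma(n + r) - math.lgamma(n + 1) - math.lgamma(r)
             + n * math.log(q) + r * math.log(1.0 - q))
    return math.exp(log_p)

mu, b = 5.0, 0.2                       # illustrative values only
probs = [polya_pmf(n, mu, b) for n in range(400)]
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
print(sum(probs), mean, var, mu * (1 + b * mu))
```

The identity behind the rewriting is $\binom{-r}{n}(-1)^n = \binom{n + r - 1}{n}$; for b = 1 the expression collapses to a geometric distribution, the one-step law of the simple birth process mentioned above.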

Since $\mu \propto l$, it follows from Eq. (8) that an underlying Pólya process results in the asymptotic behavior $F(l) \propto \sqrt{b}\, l^{1}$ of the random walk model. In order to reproduce the observed $\alpha \approx 0.85$ from Figure 1, an l-dependence of b is necessary. Strictly speaking, the underlying process then is no longer a pure Pólya process, since with non-constant b Eq. (6) no longer satisfies the Kolmogorov equations exactly. Due to the rather weak variation $b \propto l^{-0.3}$, however, it still remains a useful approximation. The actual representation of the random walk in form of the VMS text can be used to estimate the true distribution $P_n(l)$. Unfortunately, in particular for large l (which represents the interesting case) the sample size is too small to identify the distribution with compelling evidence (mainly because b is small). The data, however, do not contradict the hypothesis Eq. (6).
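The exponent bookkeeping behind this argument can be written out explicitly. The lines below are a sketch that assumes the walk variance $F(l)^2$ is proportional to the variance of Eq. (8) and that $b\mu \gg 1$ in the asymptotic regime; neither assumption is stated in this form in the article.

```latex
\begin{align*}
F(l)^2 \;\propto\; \sigma^2 &= \mu(1 + b\mu) \;\approx\; b\,\mu^2 \;\propto\; b\,l^2
   && (\mu \propto l,\ b\mu \gg 1) \\
F(l) &\propto \sqrt{b}\; l \\
b \propto l^{-0.3} \;&\Rightarrow\; F(l) \propto l^{-0.15}\, l = l^{0.85}
\end{align*}
```

A constant b would therefore give the exponent 1, and the observed value of about 0.85 fixes the weak decay of b with l quoted in the text.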

The unusual shape of F(l) for the VMS has major impact on possible interpretations. In particular, the Chinese hypothesis appears not to be compatible with it. The impression that a non-Markovian stochastic process, where the step probability depends on the long-term history, may play a key role in the interpretation of the VMS will be reinforced in the following sections.

Similar Tokens Repetition Distance Distribution

In a previous work by G. Landini [9], the repetition distance distributions of the most frequent tokens in the VMS (⟨daiin⟩), “Alice in Wonderland” (the), and the Vulgate Bible (et), respectively, were investigated, i.e., the probability distribution of the number of other tokens between two occurrences of the particular one (“iso-word gap”). The result did not show a characteristic difference between the VMS and the normal texts, apart from the well-known enigmatic VMS feature that common words, in particular ⟨daiin⟩, quite frequently appear in sequences and consequently have non-vanishing probability for zero repetition distance.

As will be demonstrated in this section, it is more instructive to investigate the repetition distance of two similar rather than exactly matching tokens. From the many well-known string distance metrics, the more straightforward Levenshtein distance [6] will be used here. More sophisticated methods of calculating string distances tend to be optimized for human writings, which appears problematic in the VMS context of unknown language and meaning (if any). The Levenshtein distance of two character strings is an integer ranging from 0 (exact match) to the maximum of the two string lengths (no similarity), denoting the number of elementary edit operations necessary to make both strings equal. Mapping this number to the interval [0, 100] yields a “percentage of dissimilarity” for two tokens.
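A compact Python sketch of the distance and the percentage mapping follows. The normalisation by the longer of the two token lengths is an assumption (the article only states that the distance is mapped to [0, 100]), and the function names are illustrative.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions turning s into t (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def dissimilarity(s, t):
    """Levenshtein distance scaled to [0, 100] by the longer length
    (assumed normalisation; not spelled out in the article)."""
    longest = max(len(s), len(t))
    return 100.0 * levenshtein(s, t) / longest if longest else 0.0

print(levenshtein("daiin", "dain"), dissimilarity("daiin", "dain"))
```

Under this normalisation two five-letter tokens that differ in one character are 20% dissimilar, so they would count as “similar” for any threshold of 20% or more.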

In Figure 3, the similar token repetition distance distribution $P_n$ for the VMS compared with normal texts is presented. Here n denotes the number of other tokens between two similar ones, i.e., n = 0 corresponds to the situation of two alike tokens in immediate vicinity. Two words are considered “similar” if their dissimilarity as defined above is less than or equal to 30%; it turns out that the precise value (±10%) of this threshold changes $P_n$ only quantitatively, not qualitatively. The most striking feature is the almost “mathematically perfect” smooth shape of the VMS curve for n → 0, while the other text sample data display the expected “irregular” behavior and tend to zero (or at least small values). As noted previously, this simply expresses the effect that writers normally try to avoid word repetitions. It is especially noteworthy that even the Chinese text lies closer to the European languages than the VMS, although the higher tendency of common-word repetition sequences in Asian languages is a frequent argument in favor of the Chinese theory. The remaining text samples listed in Table 1 have been omitted in Figure 3 just to avoid confusion by too many markers; their behavior is comparable to that of the Vulgate Bible.
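One possible way to tabulate such a distribution is sketched below. Several details are assumptions, since the article does not specify them: the gap is measured from each token to the next similar token, the search stops after `max_gap` tokens, and frequencies are normalised over the gaps actually found. The similarity test is passed in as a function, so a 30% Levenshtein threshold or plain equality can be used alike.

```python
from collections import Counter

def gap_distribution(tokens, similar, max_gap=50):
    """Relative frequency P_n of finding n other tokens between a token
    and the next token `similar` to it (n = 0 means adjacent)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        # n may not run past the end of the text or beyond max_gap.
        for n in range(min(max_gap + 1, len(tokens) - i - 1)):
            if similar(tok, tokens[i + 1 + n]):
                counts[n] += 1
                break
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())} if total else {}

tokens = "a b a a c b".split()
print(gap_distribution(tokens, lambda s, t: s == t))
```

In the toy sequence the three recurrences have gaps 1, 3 and 0, so each of those distances receives one third of the weight. Applied to a real transcription, the predicate would compare the percentage dissimilarity of two tokens with the chosen threshold.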

Figure 3. Similar tokens repetition distance distribution (maximal dissimilarity = 30%) of the VMS, compared with Vulgate Bible and the pin-yin text. Inset: VMS result and fit using Eq. (12) (a = 3.5618, b = 0.1534, q = 0.9885).

Let us consider an infinite random text consisting of N words occurring with probabilities $\lambda_k$, k = 1, ..., N. The chance for a particular word k to reappear for the next time exactly after n other tokens follows a geometric distribution $\lambda_k (1 - \lambda_k)^n$. The total token repetition distance distribution is then given by

$$P_n = \sum_{k=1}^{N} \lambda_k^2 (1 - \lambda_k)^n \tag{9}$$

The geometric distribution has its maximum at n = 0 and decreases monotonically, a behavior also true for the VMS data in Figure 3.

The fact that normal texts as well as the VMS obey Zipf’s first law [10] suggests the approximation $\lambda_k \propto 1/k$. As a rough estimate for small n, the discrete index k may be replaced by a continuous variable $\kappa$, turning the sum Eq. (9) into an integral. Setting $\lambda_\kappa \approx c/\kappa$ with an upper cutoff $\kappa_m$ to ensure convergence of the $\lambda_\kappa$-norm, and under the reasonable assumption $c \ll 1$, Eq. (9) then yields

$$P_n \approx c\,\frac{1 - (1 - c)^{n+1}}{n + 1} \tag{10}$$
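The step from Eq. (9) to Eq. (10) is a one-line substitution. The derivation below is a sketch: the lower integration limit 1 is an assumption taken from the sum starting at k = 1, and the last line uses that $c/\kappa_m$ is negligible when $c \ll 1$.

```latex
\begin{align*}
P_n &\approx \int_{1}^{\kappa_m} \left(\frac{c}{\kappa}\right)^{2}
      \left(1 - \frac{c}{\kappa}\right)^{n} d\kappa
      && \text{Eq. (9) with } \lambda_\kappa \approx c/\kappa \\
    &= c \int_{c/\kappa_m}^{c} (1 - u)^{n}\, du
      && u = c/\kappa,\quad d\kappa = -\frac{c}{u^{2}}\, du \\
    &= c\, \frac{(1 - c/\kappa_m)^{n+1} - (1 - c)^{n+1}}{n + 1} \\
    &\approx c\, \frac{1 - (1 - c)^{n+1}}{n + 1}
      && c/\kappa_m \to 0
\end{align*}
```

Normalising $\lambda_\kappa$ over the same range gives $c \ln \kappa_m = 1$, so $c/\kappa_m = c\,e^{-1/c}$ is indeed tiny for small c, and the approximation in the last line holds as long as n stays moderate.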

For large n, the sum Eq. (9) may be estimated by the maximal summand, as long as the $\lambda_k$ cover the range of values down to zero sufficiently densely. The maximum $\lambda_0$ of the function $f(\lambda) = \lambda^2 (1 - \lambda)^n$ is given by $\lambda_0 = 2/(n + 2)$, and Eq. (9) can be approximated by

$$P_n \approx \frac{4}{e^2} \left( \frac{1}{n + 2} \right)^2 \tag{11}$$
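For completeness, the position and height of the maximal summand can be checked by elementary calculus; the short derivation below only uses the function $f(\lambda)$ defined in the text and the standard limit $(1 - 2/m)^{m} \to e^{-2}$.

```latex
\begin{align*}
f'(\lambda) &= \lambda (1 - \lambda)^{n-1} \bigl[\, 2(1 - \lambda) - n\lambda \,\bigr] = 0
   \;\Rightarrow\; \lambda_0 = \frac{2}{n + 2}, \\
f(\lambda_0) &= \frac{4}{(n + 2)^{2}} \left(1 - \frac{2}{n + 2}\right)^{n}
   \;\xrightarrow{\; n \gg 1 \;}\; \frac{4}{e^{2}}\, \frac{1}{(n + 2)^{2}} .
\end{align*}
```

The large-n behavior is thus a $1/n^2$ power law, in contrast to the roughly $1/(n+1)$ decay of Eq. (10) at small n.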

The n dependence of Eqs. (10) and (11) suggests

$$P_n \cong a\,\frac{1 - q^{n+1}}{1 + n + b n^2} \tag{12}$$

with parameters a, b, and q as interpolating fit formula. As can be seen in Figure 3, it excellently represents the VMS data with “reasonable” parameter $c = (1 - q) = \lambda_1$: it equals the order of magnitude of the relative frequency of the VMS token ⟨daiin⟩. The other parameters a and b reflect the mixture of the two asymptotic limits. On a scale large enough all texts are somehow “random” and produce the observed $1/n^2$ tail in $P_n$. The small-n behavior of the VMS is the most remarkable effect: it appears to indicate the presence of some kind of “random selection process” during the text generation, as already noted in the previous section.

It should be emphasized again that the VMS text obviously is not a simple convolution of independent random strings; at least the underlying stochastic process must be fairly complex, involving history-dependent variation of the step probabilities, building up correlations. This is also instructively demonstrated by comparing $P_n$ of a text with its token scrambled version (i.e., where the token positions have been transposed randomly). As can be seen in Figure 4, token scrambling modifies the VMS result only quantitatively (which confirms an already present “degree of randomness” in the original text), whereas the Vulgate Bible curve is transformed in shape towards the VMS data; in contrast, $P_0$ for the VMS is decreased significantly. This effect appears compatible with the assumption of a “key stochastic process” with spurious contagion of, e.g., Pólya type involved in the VMS text generation method.


Selected Tokens Repetition Distance Distribution

In the previous section the probability for n other tokens separating two arbitrarily selected but similar ones (with respect to Levenshtein string metric distance) has been investigated. Although the unusual behavior of the VMS text in contrast to normal human writings is clearly visible, the statistical details are somewhat obscured by the nature of the problem: the geometric distribution characteristic of random sequences is “expanded” to power-law behavior by the summation Eq. (9). Furthermore, the concept of “similarity” as well as finite sample size effects add extra “random noise.”

In this section the problem will be modified slightly: what is the probability that two tokens sharing a particular property are separated by n tokens that do not possess this property? Such a property may be the occurrence of a particular letter within a token, or a special word structure. This type of question appears especially promising since it is a well-known fact that VMS words possess a rich variety of characteristic structural details (crust-mantle-core decomposition [14]).

The symbol ⟨q⟩ in the VMS appears almost always in word-initial position. It has been speculated that it might be a prefix with meaning “and,” rather than part of the remaining token (much like the Latin suffix “que”). In Figure 5, the repetition distance distribution of tokens beginning with ⟨q⟩ is plotted, compared with that of the token “und” (the German word for “and”) in the Luther Bible. Again, the VMS result yields a surprisingly simple and smooth curve, qualitatively different from that associated with the normal text. A more detailed analysis of the data shows that $P_n$ can be excellently fitted by a mixture of two geometric distributions:

$$P_n = a\,p_1 (1 - p_1)^n + (1 - a)\,p_2 (1 - p_2)^n \tag{13}$$

Figure 4. Similar tokens repetition distance distribution (maximal dissimilarity = 30%) of the VMS, compared with Vulgate Bible and token scrambled versions of both texts. The lines just connect the markers to guide the eye.

A mixture of two probability distributions indicates the presence of two independent subpopulations in the statistical data. Eq. (13) is, for example, produced by the following random process: use two dice with “success” probabilities $p_1$ and $p_2$, respectively. Throw a die until “success” (“failure” means not to add the ⟨q⟩ prefix to a token in the sequence); then continue with either die 1 or 2, depending on a random decision with probability a.
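The two-dice scheme is easy to simulate. The sketch below is an illustration with made-up parameters (a = 0.5, p1 = 0.3, p2 = 0.1, not fit values from the article); it assumes that die 1 is chosen with probability a after each success, matching the weights in Eq. (13). Function names and the seed are arbitrary.

```python
import random
from collections import Counter

def simulate_gaps(a, p1, p2, n_gaps, seed=0):
    """After each success pick die 1 with probability a (else die 2),
    then count the failures until the next success."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_gaps):
        p = p1 if rng.random() < a else p2
        n = 0
        while rng.random() >= p:       # failure: token without the prefix
            n += 1
        gaps.append(n)
    return gaps

def mixture_pmf(n, a, p1, p2):
    """Right-hand side of Eq. (13)."""
    return a * p1 * (1 - p1) ** n + (1 - a) * p2 * (1 - p2) ** n

gaps = simulate_gaps(0.5, 0.3, 0.1, 100_000)
freq = Counter(gaps)
for n in (0, 1, 5, 20):
    print(n, freq[n] / len(gaps), mixture_pmf(n, 0.5, 0.3, 0.1))
```

The simulated relative frequencies should track the mixture formula closely; at small n the steeper term with $p_1$ dominates, while the tail is governed by the flatter term with $p_2$.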

However, this should only be seen as an “example algorithm”; the mechanisms behind the text generation process must be somewhat more complex, as has been demonstrated in the previous sections. In this context it is especially noteworthy that Eq. (13) is also compatible with (i.e., is a good approximation to) the situation of a stochastic process with varying step probability, being gradually decreased from $p_1$ to $p_2$ on “failure” events, and reset to $p_1$ on “success.” This provides another link to spurious contagion processes like the Pólya scheme discussed previously.

The ⟨q⟩ prefix is just a single aspect of the fairly complex VMS word grammar. However, the behavior expressed by Eq. (13) is found throughout a wide variety of token selection conditions; a few examples are listed in Table 3. Most interestingly, the crossover point between the two geometric distributions (i.e., the real value n for which both terms of Eq. (13) contribute equally) is in most cases close to the average number of tokens per line. For a token scrambled version of the VMS, however, Eq. (13) is reduced to a single geometric distribution, as expected from the previous analysis.
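Setting the two terms of Eq. (13) equal and solving for n gives the crossover point in closed form, $x_C = \ln[(1 - a)p_2 / (a p_1)] / \ln[(1 - p_1)/(1 - p_2)]$. The check below recomputes it from the fit parameters published in Table 3; the function name `crossover_gap`, the list `TABLE_3` and the ASCII row labels are introduced here for illustration.

```python
import math

def crossover_gap(a, p1, p2):
    """Real n at which a*p1*(1-p1)**n equals (1-a)*p2*(1-p2)**n."""
    return math.log((1 - a) * p2 / (a * p1)) / math.log((1 - p1) / (1 - p2))

# (selection condition, a, p1, p2, published x_C), copied from Table 3
TABLE_3 = [
    ("begins with q",  0.50275, 0.28531, 0.10482,  4.5),
    ("contains cCh",   0.58879, 0.12189, 0.03196, 17.4),
    ("contains che",   0.90501, 0.16838, 0.03489, 25.7),
    ("contains she",   0.74403, 0.12395, 0.02811, 24.6),
    ("ends with aiin", 0.93027, 0.12892, 0.02001, 37.8),
]

for label, a, p1, p2, published in TABLE_3:
    print(f"{label:15s} computed {crossover_gap(a, p1, p2):5.1f}  published {published}")
```

The recomputed crossover points match the published column after rounding to one decimal, so the tabulated parameters and the crossover formula are mutually consistent.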

Figure 5. Repetition distance distribution of VMS tokens beginning with EVA ⟨q⟩ (full squares), and the token “und” in the Luther Bible (open circles), respectively. Full line: fit of the geometric distribution mixture Eq. (13) with parameters a = 0.50275, $p_1$ = 0.28531, $p_2$ = 0.10482.

On the other hand, for normal texts two possible results have been found so far: if the selection criterion is weak and linguistically (almost) irrelevant, then the result will be a single geometric distribution (straight random result); an example is the selection of all tokens in an English text that contain the letter “e.” If, however, the condition is correlated with semantic (sub-)structures or at least nontrivial token parts, the result more or less resembles the Luther Bible curve in Figure 5. As in the previous sections, the behavior of the Chinese text does not differ significantly.

Conclusions

Concerning the VMS enigma, investigations that emphasize the peculiar structural properties of the VMS text in contrast to normal language are of special interest. All methods of analysis used in the present article fall into this category.

Interpreting normal texts as bit sequences yields deviations of little significance from a true (uncorrelated) random walk. For the VMS, this only holds on a small scale of approximately the average line length; beyond that, positive correlations build up: the presence/absence of a symbol appears to increase/decrease the tendency towards another occurrence. The Pólya urn scheme is an example of such behavior; it does not, however, reproduce the VMS data exactly and should be seen as a first approximation only.

Nevertheless, this result has important implications for the possible solutions of the VMS riddle. Encryption tends to destroy correlations in a text rather than build them up. The method, however, could be a more complex variant of a word game, like the children’s secret language Opish (where the syllable “op” is added before each vowel); in this case the effective information content of the VMS would at least be rather low. The result appears incompatible with the plain text hypothesis. Even in an artificial language, correlations tend to be contextual, i.e., on the small scale of a few sentences.

Thus, the hoax hypothesis may provide the most convincing explanation base for the data. A variant of the “table-and-grille” method still is a promising candidate, if the table is filled with syllables selected under involvement of some “lottery algorithm” producing the observed statistical effects. The source for the positive correlations might as well be (or partly be) a psychological one: the creator of the table could unconsciously have written them into it while trying to “equally distribute” the syllables (the human mind is extremely poor at generating random numbers). An additional problem, however, arises upon reusing a table with different grilles: the variance Eq. (2) is very sensitive to correlations created by overlapping slot patterns, leading to significant structures in F(l) for large l. To avoid this behavior, which is not observed for the VMS text, only about 4 to 6 (of the 27 possible) 3 × 3 grilles can be used with a particular 39 × 40 table. It is unlikely that the creator of the VMS has excluded the “forbidden” grille layouts by mere luck, but perhaps out of aesthetic (symmetry) considerations? On the other hand, the “table-and-grille” scheme need not necessarily contain the (whole) truth about the VMS generation process, even if the hoax hypothesis finally might turn out to be correct.

Table 3. Some examples for the parameter fits of Eq. (13)

Selection condition       a         p1        p2        x_C²
Token begins with ⟨q⟩     0.50275   0.28531   0.10482   4.5
Token contains ⟨cCh⟩¹     0.58879   0.12189   0.03196   17.4
Token contains ⟨che⟩      0.90501   0.16838   0.03489   25.7
Token contains ⟨she⟩      0.74403   0.12395   0.02811   24.6
Token ends with ⟨aiin⟩    0.93027   0.12892   0.02001   37.8

¹ C stands for a gallows character (⟨f⟩, ⟨k⟩, ⟨p⟩, ⟨t⟩). ² Crossover point: $x_C = \ln\left[\frac{(1 - a)\,p_2}{a\,p_1}\right] \Big/ \ln\left[\frac{1 - p_1}{1 - p_2}\right]$.

The token repetition statistics also emphasizes the strangeness of the VMS ‘‘lan-guage.’’ Again the results differ significantly from comparative text samples, indicat-ing that the VMS ‘‘language’’ is more closely related to a stochastic process thanhuman communication. Of particular interest is the mixture of two geometric distribu-tions Eq. (13) that almost perfectly describes the gap distribution of tokens with, forexample, a particular prefix. Such ‘‘exact statistical properties’’ of complex systemsare either trivial (as in the case of purely random aspects) or express an underlyingprinciple. Since Eq. (13) contains a crossover between two terms it most probably isnot trivial (pure randomness would have yielded a single geometric distribution).

Another “exact property” of the VMS is already well known: the word length distribution follows a binomial distribution almost exactly. This fact has been a strong argument in favor of the Chinese theory [13], since East Asian languages, in particular Chinese, also show this feature. The present investigations, however, make the Chinese theory appear much less promising; instead, the mathematically exact shape of the VMS word length distribution may be seen as additional evidence of an underlying stochastic process (a binomial distribution describes the sum of independent, identically distributed binary random summands).
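The parenthetical remark can be illustrated with a small simulation: summing n independent binary choices, each made with probability p, yields binomially distributed totals. The values n = 9 and p = 0.5 below are arbitrary illustration values, not fitted VMS parameters, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def simulate_sums(n, p, samples, seed=0):
    """Empirical distribution of sums of n independent 0/1 summands."""
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.random() < p for _ in range(n)) for _ in range(samples)
    )
    return {k: counts.get(k, 0) / samples for k in range(n + 1)}

n, p = 9, 0.5  # arbitrary illustration values, not VMS fits
empirical = simulate_sums(n, p, samples=20000)
for k in range(n + 1):
    # columns: k, simulated frequency, exact binomial probability
    print(k, round(empirical[k], 3), round(binomial_pmf(k, n, p), 3))
```

A purely stochastic generator of this kind reproduces a binomial shape without any linguistic content, which is the sense in which the word length distribution can count as evidence of an underlying stochastic process.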

It must be emphasized that the present study is not a proof of the hoax hypothesis, nor can it definitely rule out either of the two other main theory classes. It does, however, give some hints about the most promising direction for future investigations. In the text so far I have tried to avoid stating my personal opinion about the VMS where it goes beyond the presentation of facts and the inevitable basic interpretation of statistics (I am aware of how easily statistics can be misinterpreted under the influence of prejudice). From my viewpoint, the VMS is a cleverly set psychological trap, still active after five centuries, reflecting the analysts’ expectations and hopes like a mirror without containing meaningful information itself. It was created using “algorithmic” methods, implicitly or explicitly involving some degree of randomness.

A frequent argument against the hoax hypothesis is that, even with something like the “table-and-grille” method, the effort for a hoaxer would have been disproportionately high: to defraud Emperor Rudolf II of Bohemia (the possible first buyer of the VMS), a much simpler concept should have been sufficient. As always with psychological arguments, there is the intrinsic danger of projecting one’s own value system. Perhaps the VMS is the once-in-a-lifetime masterpiece of a habitual forger, or simply a special kind of artwork created with no immoral motivation: around 1980 the Italian architect and industrial designer Luigi Serafini wrote and illustrated his famous Codex Seraphinianus (most probably inspired by the VMS), which looks like the visual encyclopedia of an “extraterrestrial world” and is written in an incomprehensible “language” with a strange curvilinear script. Obviously there is some artistic or even philosophical attraction in the creation of a phantasmagoric book that has no inherent meaning and can therefore take on any.

Acknowledgment

The author wishes to thank M. A. Labi for stimulating discussions and proofreading the manuscript.


About the Author

Dr. Andreas Schinner is a theoretical physicist, performing freelance research at the Johannes Kepler University in Linz, Austria. His main area of scientific interest is theoretical solid state physics, particularly particle beam interactions with matter. He is also working as a self-employed software developer.

References

1. Benka, O., A. Schinner, and T. Fink. 1995. “Distribution of the Number of Emitted Electrons for MeV H⁺-, and He²⁺-ion Impacts on Metals,” Phys. Rev. A, 51(3):2281–2284.

2. Cox, D. R. and H. D. Miller. 1965. The Theory of Stochastic Processes. London: Methuen & Co Ltd.

3. Currier, P. H. 1976. Some important new statistical findings. Proceedings of a Seminar held on 30 November 1976 in Washington DC. Edited by M. E. D’Imperio. Privately printed pamphlet, 30 November 1976. ftp://ftp.funet.fi/pub/doc/religion/occult/necro-nornicon/voynich/currier.paper. Last date accessed by me/web document update: 20 Feb 2007.

4. D’Imperio, M. E. 1978. The Voynich Manuscript–An Elegant Enigma. Laguna Hills, CA: Aegean Park Press.

5. Feller, W. 1957. An Introduction to Probability Theory and its Applications. Vol. 1, New York: Wiley.

6. Gilleland, M. 2002. Levenshtein Distance in Three Flavors. http://ww.merriampark.-com/Id.htm last accessed 20 Feb 2007.

7. Kennedy, G. and R. Churchill. 2005. The Voynich Manuscript: The Unsolved Riddle of an Extraordinary Book Which has Defied Interpretation for Centuries. London: Orion Publishing Group Ltd.

8. Kokol, P., V. Podgorelec, M. Zorman, T. Kokol, and T. Njivar. 1999. “Computer and Natural Language Texts–A Comparison Based on Long-Range Correlations,” Journal of the American Society for Information Science, 50:1295–1301.

9. Landini, G. 2000. Zipf’s laws in the Voynich Manuscript, http-document, currently no longer available on the Internet.

10. Landini, G. 2001. “Evidence of Linguistic Structure in the Voynich Manuscript Using Spectral Analysis,” Cryptologia, 25(4):275–295.

11. Rugg, G. 2004. “An Elegant Hoax? A Possible Solution to the Voynich Manuscript,” Cryptologia, 28(1):31–46.

12. Schenkel, A., J. Zhang, and Y. Zhang. 1993. “Long Range Correlations in Human Writings,” Fractals, 1(1):47–55.

13. Stolfi, J. 2002. Chinese theory Redux: Comparing the VMS and East Asian word length distributions. http://www.ic.unicamp.br/~stolfi/voynich/02-01-18-chinese-redux/ last accessed 20 Feb 2007.

14. Stolfi, J. 2003. Voynich manuscript Stuff. http://www.ic.unicamp.br/~stolfi/voynich/ last accessed 20 Feb 2007.

15. Zandbergen, R. 2003. The Voynich manuscript. http://www.voynich.nu/ last accessed 20 Feb 2007.
