
  • NON-TRADITIONAL AUTHORSHIP ATTRIBUTION STUDIES:

    IGNIS FATUUS OR ROSETTA STONE?

    JOSEPH RUDMAN

    Some words, such as 'Phrenology' or 'Stylometry' insinuate their own assumptions. In fact, nobody has ever proved that minds can be measured by bumps, or style by numbers.

    - Eric Sams1

    In our view the protagonists of stylistic analysis in forensic applications have not only failed to demonstrate such a link [between style and authorship] but have not even attempted to do so.

    - R.N. Totty2

    INTRODUCTION

    Non-traditional authorship attribution studies pose an enigma. Do the more than 700 published non-traditional studies constitute a Rosetta Stone allowing us to name virtually every anonymous author, as Andrew Morton and many others would have us believe?3 Or are these studies an ignis fatuus with just enough legitimate, successful techniques and results to lure unsuspecting practitioners into a quagmire full of half truths and flawed techniques? Do these studies show non-traditional authorship attribution to be simply 'aspiration' and not a science, as Furbank and Owens claim?4

    This paper moves through a short discussion of the 'what', 'why', and 'who' of non-traditional authorship studies to a more detailed look at the hypothetical and theoretical underpinnings of these studies - their assumptions, presumptions, and verifiable constructs. Some other problems and potential solutions are then discussed.

    1. 'Edmond Ironside and Stylometry', Notes and Queries, Dec. 1994, pp.469-472 (469).

    2. R.N. Totty, et al., 'Forensic Linguistics: The Determination of Authorship from Habits of Style', Journal of the Forensic Science Society 27, 1987, pp.13-28 (18).

    3. Michael Farringdon, 'The Critics Answered', in Jill M. Farringdon, et al., Analysing for Authorship (Cardiff: University of Wales Press, 1996), pp.239-261.

    4. P.N. Furbank & W.R. Owens, 'Dangerous Relations', The Scriblerian and the Kit-Cats 33(2), 1991, pp.242-244 (242).

    BSANZ Bulletin vol.24 no.3, 2000, 163-176

    Copyright of Full Text rests with the original copyright owner and, except as permitted under the Copyright Act 1968, copying this copyright material is prohibited without the permission of the owner or its exclusive licensee or agent or by way of a licence from Copyright Agency Limited. For information about such licences contact Copyright Agency Limited on (02) 93947600 (ph) or (02) 93947601 (fax)

  • 164 Bibliographical Society of Australia and New Zealand Bulletin

    WHAT IS NON-TRADITIONAL AUTHORSHIP ATTRIBUTION?

    Non-traditional authorship attribution studies are those attribution studies that make use of the computer, statistics, and stylistics to determine who wrote a disputed text. But there is much more to be said about the discipline if we are to understand its strengths and weaknesses.

    TYPES OF STUDIES

    Try to balance in your own mind the question whether the latter [text] does not deal in longer words than the former [text]. It has always run in my head that a little expenditure of money would settle questions of authorship this way ... Some of these days spurious writings will be detected by this test. Mind, I told you so

    - Sophia De Morgan5

    The beginning of non-traditional authorship studies is usually attributed to Augustus De Morgan in 1851 - whether or not he was the first to think along these lines is moot. Almost all of the earlier practitioners do credit him for their inspiration.

    Every authorship attribution problem is different. Experimental designs cannot be used as templates from study to study. There is no formulaic algorithm into which you can plug an authorship problem.

    There are four main categories of non-traditional authorship attribution studies:

    1. Anonymous work - no idea of potential author

    It is an almost impossible task to solve this kind of problem. A study would have to be designed that would compare the anonymous work to every potential author of the period. However, with the rapid advances taking place in all of the disciplines that make up non-traditional authorship studies - especially in the computer hardware area - studies in this category have the potential to become feasible in the not too distant future.

    2. Anonymous work - two, three, or some other workable number of potential authors

    The most successful work in non-traditional authorship studies has been in this second category, where the problems and detection techniques are close to manageable - e.g. Mosteller and Wallace's Federalist Papers study6 to determine which of the twelve disputed Federalist Papers were written by Hamilton and which were written by Madison, Foster's Primary Colors study7 to determine who, from a list of

    5. Memoir of Augustus de Morgan (London: Longmans, Green & Co., 1882), pp.215-216.

    6. Frederick Mosteller & David L. Wallace, Applied Bayesian and Classical Inference: The Case of the 'Federalist Papers' (New York: Springer-Verlag, 1984).

    7. Donald W. Foster, 'Primary Culprit: An Analysis of a Novel of Politics', New York, 26 Feb. 1996, pp.50-57.

    thirty-five candidates,8 wrote Primary Colors, and Holmes's Cassandra study9 to determine who, from a list of seven candidates, wrote the Cassandra letter that criticized Prime Minister Tony Blair.

    It is much easier to determine if a disputed text was written by either author 'A' or author 'B' than it is to determine if a given text was written by author 'A' - with the latter, potentially there are hundreds of other candidates. In the former, the practitioner only has to determine which candidate's style most closely resembles the questioned work. This test of homogeneity is only valid if, in fact, the 'real' author is one of the candidates - something that often is not the case.

    3. Anonymous work - a collaboration

    It may be next to impossible to accurately separate out all types of collaborative writings. Stillinger has a good discussion of this problem.10 Dobranski also has a good discussion of what he calls 'co-laboring' or working together. He shows how 'Milton depended on amanuenses, acquaintances, printers, distributors, and retailers - often in dramatic ways.'11 There are many studies that attempt to separate out multiple authors of a text, with varying degrees of success.12

    This category includes various kinds of collaborations, each with its own concomitant problems - for example there are:

    True collaborations

    When two or more writers get together and suggest ideas, words, or whole sentences to each other, e.g. a group of writers generating a screen play. This could entail the collaborators alternating scenes or acts under an agreed-upon master plan. An example of true collaborations is The Student Bo

    8. Foster actually quit testing after fifteen candidates because he felt he had the answer in Joe Klein.

    9. Robert Matthews, 'Unmasking Anonymous', Daily Telegraph (London), 3 Dec. 1996, p.7.

    10. Jack Stillinger, 'Multiple Authorship and the Question of Authority', Text: Transactions of the Society for Textual Scholarship 5, 1991, pp.283-293.

    11. Stephen B. Dobranski, Milton, Authorship, and the Book Trade (Cambridge: Cambridge University Press, 1999), p.9.

    12. Estelle Irizarry, 'The Two Authors of Columbus' Diary', Computers and the Humanities 27(2), 1993, pp.85-92. Another example is M.W.A. Smith, 'A Procedure to Determine Authorship using Pairs of Consecutive Words: More Evidence for Wilkins's Participation in Pericles', Computers and the Humanities 23(2), 1989, pp.113-127.

    13. Jane Harvard, The Student Body: A Novel (New York: Villard, 1998).

    Editors as collaborators

    How do 'Style Books' enter the picture? How many words, phrases, and punctuation marks are the result of editorial intervention? How many entire paragraphs, even pages, were deleted? How valid are studies done on Greek 'sentences'? Good has a discussion of the fact that ancient Greek had no punctuation but that it is usually clear to scholars what constitutes a sentence.14 Daniel Defoe seldom punctuated his works, and almost all of them were pointed by editors or even typesetters.

    Informal commentators as collaborators

    An example is friends reading a manuscript and suggesting changes. These are the type of changes that Percy Bysshe Shelley made to Mary Shelley's Frankenstein.15

    4. Anonymous work - did author 'A' write the questioned work?

    This is the category where I have spent most of my working life, trying to determine which of the approximately 970 anonymous works attributed to Daniel Defoe he did write. In addition to showing that Defoe's style is consistent, the practitioner must show not only that the anonymous work is consistent with Defoe but also that no other writer of the period has a similar style.16

    Prior to any non-traditional authorship study, a rigorous traditional authorship study must be carried out - using all of the bibliographical internal and external evidence - the kind of study that many of us in literature learned to do (or at least about) in graduate school. The traditional scholars' 'considered opinions' in disputed authorship studies should be used to determine the 'prior odds' input to any Bayesian statistical studies.
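The role of the 'prior odds' can be illustrated with the odds form of Bayes' rule, in which the posterior odds equal the prior odds multiplied by a likelihood ratio. What follows is only a minimal sketch and not a procedure taken from any study cited here; the function names and the numbers are hypothetical and chosen purely for illustration.

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    if prior_odds < 0 or likelihood_ratio < 0:
        raise ValueError("odds and likelihood ratios must be non-negative")
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


# Hypothetical inputs: traditional scholarship puts the odds at 3 to 1 in
# favour of author A, and the stylistic evidence is assumed to be 4 times
# as likely under author A as under author B.
odds = posterior_odds(prior_odds=3.0, likelihood_ratio=4.0)
probability = odds_to_probability(odds)
```

The point of the sketch is that the stylistic evidence only rescales whatever the traditional study has already established; a careless prior carries straight through to the final figure.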

    WHY DO AUTHORSHIP STUDIES?

    One of the first tasks of scholarship is the assembly of its materials, the careful undoing of the effects of time, the examination as to authorship, authenticity, and date

    - Rene Wellek & Austin Warren17

    An authorship question can have a much different import depending on the circumstances of the query. Whether or not a new poem was written by Shakespeare does not have the same immediacy or importance as whether or not a person on trial for kidnapping and murder wrote a lengthy ransom note. The 'scholarship' mentioned by Wellek and Warren applies to all of the following

    14. I.J. Good, 'Discussion of the Paper by Mr. Morton', in 'The Authorship of Greek Prose', by A.Q. Morton, Journal of the Royal Statistical Society. B, 1965, pp.169-233 (p.225).

    15. Charles E. Robinson, Mary Wollstonecraft Shelley: The Frankenstein Notebooks (New York: Garland, 1996).

    16. The working plan for the Defoe study is much more complicated than this. An explication of the complete experimental plan of the Defoe work is forthcoming.

    17. Theory of Literature, 3rd ed. (New York: Harcourt, Brace & World, 1956), p.57.

    disciplines - but I hardly need to convince the readers of this journal of the importance of proving authorship and fixing each writer's canon. The following overview of various disciplines gives an idea of the why:

    Literature

    Most authorship studies fall into this catchall category, from works on the classics to contemporary works. There is Brinegar's work on Mark Twain18 and O'Donnell's work on Stephen Crane,19 to name but two. A correct canon is a sine qua non for all literature studies that depend on authorship authenticity.

    The most famous and most worked-on problem of authorship in literature is the Shakespeare canon. One of the earliest non-traditional Shakespeare studies dates to 1887 with Mendenhall's 'The Characteristic Curves of Composition'.20 The Shakespeare attribution studies, with the concomitant quarrels over methods and results, continue unabated. One of the latest series of articles has Elliott and Valenza versus Foster.21

    History

    There is a range of reasons why historians would want to determine authorship - from just 'neatening' things up to more crucial questions of fact and fiction. It is not as important to know if the twelve disputed Federalist Papers were written by Hamilton or Madison22 as it is to know if the History of the Pirates was a factual historical account by Captain Johnson or a sham written by Daniel Defoe.23

    Philosophy

    An example of this is Reynolds's authorship work on disputed tracts attributed to Hobbes.24 Another example is Kenny's work on Aristotle.25

    18. C.S. Brinegar, 'Mark Twain and the Quintus Curtius Snodgrass Letters: A Statistical Study', American Statistical Association Journal 58, 1963, pp.85-96.

    19. Bernard O'Donnell, 'Stephen Crane's The O'Ruddy: A Problem in Authorship Discrimination', in The Computer and Literary Style, ed. Jacob Leed (Kent, Ohio: Kent State University Press, 1966), pp.107-115.

    20. T.C. Mendenhall, 'The Characteristic Curves of Composition', Science 9 (214), 1887, pp.237-249.

    21. Ward E.Y. Elliott & Robert J. Valenza, 'The Professor Doth Protest Too Much, Methinks: Problems with the Foster 'Response'', Computers and the Humanities 32, 1999, pp.425-490.

    22. Mosteller & Wallace.

    23. Capt. Charles Johnson, A General History of the Robberies and Murders of the Most Notorious Pyrates (London: Ch. Rivington, 1724). See also John Robert Moore, 'The Authorship of A General History of the Pirates', in John Robert Moore, Defoe in the Pillory and Other Studies (Bloomington, Ind.: Indiana University Press, 1939), pp.129-188.

    24. Noel B. Reynolds, 'Statistical Wordprinting', in Thomas Hobbes: Three Discourses, ed. Noel B. Reynolds & Arlene W. Saxonhouse (Chicago: University of Chicago Press, 1995), pp.157-162.

    Economics26

    There is O'Brien and Darnell's work, Authorship Puzzles in the History of Economics.27

    Politics

    The above-mentioned Primary Colors and Cassandra papers are two good examples of present-day political intrigue where non-traditional authorship studies played a key role. Another, non-contemporary example is in my work on Daniel Defoe - he was a spy and counterspy. Defoe would write a tract for the Tories, then counter the arguments in another tract for the Whigs, and then write a third tract on the same subject giving his own views. And all three would be under different pseudonyms.

    Religion

    Now we begin to see 'real' consequences of authorship determination - men have been and still are willing to kill and be killed over religion.

    The Book of Genesis: The study by Radday and Shore on the authorship of Genesis is seated in the Jewish religion - is it a collaboration, a rewrite, or was 'Moses' the author?28

    The Book of Mormon: There is Holmes's work as an 'outsider', titled Authorship Attribution and the Book of Mormon. There also are numerous 'insider' studies - is it a 'translation' of the tablets delivered by Moroni, or is it Joseph Smith who wrote the Book of Mormon?29

    The Pauline Epistles: One of the earliest non-traditional studies of the Pauline question is Mascol's 'Curves of Pauline and Pseudo-Pauline Style', published in 1888.30 One of the later works is Neumann's 1990 dissertation, The Authenticity of the Pauline Epistle

    Law

    This is another area where the consequences of authorship determination can literally mean freedom or incarceration - life or death. The use of non-traditional authorship studies in the law, and especially in court proceedings, gave birth to the term 'Forensic Stylistics', which if nothing else does give a scientific aura to the field. Gerald McMenamin's book Forensic Stylistics gives a good introduction to this discipline.32 Great Britain's judicial system and some others modeled on that system allow non-traditional authorship studies. This acceptance may well be premature - Morton, the most frequent and well-known expert witness, was seemingly debunked on live television.33 The United States courts, as a general rule, do not allow non-traditional authorship studies into the courtroom except in very limited instances. The judge in the Patty Hearst kidnap trial did not allow Dr Singer's testimony on stylistic comparison of the bank holdup note and Miss Hearst's writing samples. The gist of the reasoning is that non-traditional authorship studies do not constitute a legitimate science.34

    THE 'WHO' OF NON-TRADITIONAL ATTRIBUTION

    When you look at the education and professional background of the practitioners of non-traditional authorship studies, you see a range unprecedented in any other 'scientific' discipline. Holmes, Mosteller, Wallace, Forsyth, and Tweedie are among the many statisticians. Morton leads a contingent from religion - ministers, priests, and theologians. There are historians and economists. There are professors of literature - led by Burrows and Foster. There are computer scientists, operations researchers - and the list goes on.

    Each of these practitioners brings a different background and range of expertise. But who should the practitioners be? What kind of background do they need? My contention is that every practitioner must have at least a strong working knowledge of:

    The field of the disputed text
    The history of the age that produced the disputed text
    Bibliographic techniques
    Statistics
    Stylistics
    Corpus Linguistics
    Computer Science.

    32. Gerald R. McMenamin, Forensic Stylistics (Amsterdam: Elsevier, 1993), reprinted from Forensic Science International 58, 1993.

    33. Robert Matthews, 'Harsh Words for Verbal Fingerprints', Sunday Telegraph (London), 4 July 1993.

    34. United States v. Hearst, Federal Supplement (412) (St. Paul: West Publishing, 1976).

    HYPOTHESES

    The primary hypothesis behind non-traditional authorship studies is that every author has a verifiably unique style. This hypothesis has never been tested, let alone proven. The lack of a proven theory after more than thirty years and 700 studies is one of the chief reasons that non-traditional authorship attribution studies are not accepted - in the main - by either the literary or the scientific community. I will discuss a little later in this paper some of the other reasons why non-traditional authorship attribution studies are considered pseudoscientific claptrap, or at best looked at with scepticism.

    The scope of any experiment to test the hypothesis that every author has a verifiably unique style is so vast and complex that there are good reasons why a comprehensive study has not been undertaken. Some of these reasons are:

    Degree of difficulty

    The number of 'writers' over the years that would have to be included in the test is staggering. Do you include the literature of the oral tradition? Do you include the work by amanuenses? What do you do about the changing concept of author and imitation?

    Computers

    The ability to handle the billions of words of text is only now in the realm of possibility.

    Machine-readable text

    With the advances in optical character readers and the ability to collect machine-readable text over the net, this element too is only now entering the realm of possibility.

    The panoply of peripheral disciplines

    Other disciplines are slowly gaining the maturity needed to aid non-traditional authorship studies - for example, computational linguistics, stylistics, corpus linguistics, statistics, and computer science.

    SUB-HYPOTHESES

    There are many sub-hypotheses and sub-theories behind the main one. Some of these are:

    Style is quantifiable

    No one contests that there are elements of style that are quantifiable. There are hundreds of studies that quantify various style-markers, such as word length distributions, type/token ratios, and function word frequencies.
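To make the three style-markers just named concrete, here is a minimal sketch of how they might be counted. It is not the method of any study cited in this paper: the tokenizer (runs of ASCII letters), the six-word function-word list, and the per-1,000-token rate are all simplifying assumptions made for illustration.

```python
import re
from collections import Counter

# Deliberately tiny, illustrative function-word list (an assumption).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in")


def tokenize(text: str) -> list[str]:
    """Crude tokenizer (an assumption): lower-case runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def style_markers(text: str) -> dict:
    """Compute three simple style-markers for one text sample."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    n = len(tokens)
    length_counts = Counter(len(t) for t in tokens)
    token_counts = Counter(tokens)
    return {
        "tokens": n,
        # Proportion of tokens having each word length.
        "word_length_distribution": {
            length: length_counts[length] / n for length in sorted(length_counts)
        },
        # Distinct word forms (types) divided by running words (tokens).
        "type_token_ratio": len(token_counts) / n,
        # Occurrences of each function word per 1,000 tokens.
        "function_word_rates": {
            w: 1000 * token_counts[w] / n for w in FUNCTION_WORDS
        },
    }


sample = "The cat sat on the mat and the dog sat by the door."
markers = style_markers(sample)
```

A sample this short is useless for attribution; it only shows the mechanics. Note also that the type/token ratio depends on sample length, so it is only comparable across samples of the same size.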

    That style is quantifiable is now a given - a fact already established. This quantifiability is what sets the working definition of style not only for this paper but for most non-traditional attribution studies. These practitioners are not concerned,

    in the main, with those elements of style that some consider subjective and interpretative - elements such as mood or tone.35

    Style changes over time

    There have been studies that attempt to establish the chronology of an author's body of work. For example, we have Brainerd's work on Shakespeare,36 Brandwood's work on Plato,37 and Burwick's work on Carlyle.38

    There have been no studies that show the statement 'style changes over time' to be universally true. There have been no studies to show which, if any, of the style markers change and at what chronological rate they change.

    Style is different for different genres

    There have been no large-scale experiments to test this hypothesis. It is intuitive that at least a significant number of an author's style-markers are different for prose and poetry. However, is there a difference between a novel and a biography? An essay and a short story?

    Style is as differentiating as fingerprints, iris scans, and DNA

    The claim that there are experimental techniques in non-traditional authorship attribution studies that are comparable to the medical forensic techniques of fingerprint matching, iris matching, or even DNA matching is an 'assumption' that is at best tenuous and, more likely, untenable. These assumptions differ as to the attainable degree of certainty in any findings. And, obviously, there have been no large-scale studies to show that any of these assumptions are true.

    It also is interesting to look back at some of the earlier non-traditional authorship studies that invoked the 'paternity blood test' analogy - i.e. you cannot prove a suspect is the father, only that he cannot possibly be the father. There are many practitioners today who still claim that this 'proving the negative' is the only valid quest. These practitioners have not yet been proven wrong.

    An author's style is consistent

    To move this sub-hypothesis through theory to proof is paramount - a sine qua non. Even given the constraints of time and genre, this has never been empirically

    35. The statement that there are elements of style that cannot be quantified is moot. I believe that all elements of style can be isolated in their essence and therefore be quantified. The study would be on how an author creates such stylistic effects.

    36. Barron Brainerd, 'The Chronology of Shakespeare's Plays: A Statistical Study', Computers and the Humanities 14, 1980, pp.221-230.

    37. L. Brandwood, The Chronology of Plato's Dialogues (Cambridge: Cambridge University Press, 1990).

    38. Frederick L. Burwick, 'Stylistic Continuity and Change in the Prose Work of Thomas Carlyle', in Statistics and Style, ed. Lubomir Dolezel & Richard W. Bailey (New York: American Elsevier, 1969), pp.178-196.

    shown. I cannot stress enough the importance that this proof has to non-traditional authorship studies.

    PROBLEMS

    Many of the general and specific problems of non-traditional authorship attribution studies are mentioned and explicated in the literature. Examples of a more detailed treatment of some of these problems can be found in two articles I published in 1998.39

    In addition to the problem of a lack of a proven hypothesis that was treated earlier, some of the other major problems are:

    No agreement on experimental setup and techniques

    One of the major indications of real problems in a field is when there is no consensus on results, no consensus as to accepted or correct techniques. There is no consensus on even the most basic of concepts, such as 'What is a word?' and 'What is a sentence?'40 For almost every published paper on non-traditional attribution there is a counter paper pointing out fatal flaws.

    No agreement on style-markers

    Studies have run the gamut - almost every marker that can be quantified has taken its turn as the marker du jour. Is the number of style-markers infinite? Is style an open-ended system? The answer is no - the number of style markers does not approach infinity. Style is a closed system, at least when working with a specific author and the controls.
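The earlier point that there is no consensus on 'What is a word?' can be made concrete with a small sketch. The three tokenizers below are hypothetical conventions invented for this illustration, not rules drawn from the cited literature; they simply show that the same sentence yields different tokens, and hence different counts for every style-marker built on tokens, depending on the convention chosen.

```python
import re


def whitespace_tokens(text: str) -> list[str]:
    """Convention 1: a word is anything between spaces (punctuation stays attached)."""
    return text.split()


def letter_run_tokens(text: str) -> list[str]:
    """Convention 2: a word is an unbroken run of letters."""
    return re.findall(r"[A-Za-z]+", text)


def compound_tokens(text: str) -> list[str]:
    """Convention 3: letter runs, keeping internal hyphens and apostrophes."""
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)


sample = "The so-called 'top-down' approach isn't new."

counts = {
    "whitespace": len(whitespace_tokens(sample)),
    "letter_runs": len(letter_run_tokens(sample)),
    "compounds": len(compound_tokens(sample)),
}
```

Two of the conventions agree on the count here yet still disagree on the tokens themselves ("'top-down'" against "top-down"), which is why a study needs to state its convention and apply it consistently to every text, including the controls.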

    The question of 'unconscious' versus 'conscious' style-markers must be addressed. If Georges Perec can write an entire novel without the letter 'e'41 and then turn around and write his next work using no other vowel except 'e',42 can we

    39. 'The State of Authorship Attribution Studies: Some Problems and Solutions', Computers and the Humanities 31, 1998, pp.351-365; 'Non-traditional Authorship Attribution Studies in the Historia Augusta: Some Caveats', Literary and Linguistic Computing 13(3), 1998, pp.151-157.

    40. Gregory Grefenstette & Pasi Tapanainen, 'What is a Word, What is a Sentence? Problems of Tokenization', in Proceedings of the 3rd International Conference of Computational Lexicography (COMPLEX'94) (Budapest: Research Institute for Linguistics, Hungarian Academy of Science, 1994), pp.79-87.

    41. Georges Perec, La Disparition (Paris: Denoël, 1969).

    42. Georges Perec, Les Revenentes (Paris: Julliard, 1972).

    say that any quantifiable style-marker is invariably unconscious?43 Can style be imitated to such an extent as to fool the non-traditional practitioner?44

    The practitioner should use as many style-markers as possible. The quantifiable elements of style should be identified with as much rigor as is used in the Human Genome Project. These style-markers should be used as the elements in a DNA autoradiogram.

    Studies governed by expediency

    The pressures to present papers and publish articles within tight time-frames are a convenient excuse to take shortcuts that prove lethal to many authorship studies.

    A lack of competent and complete research

    These two items show up in the majority of published non-traditional attribution studies. When the authors are aware of the deficiencies, they claim time or money constraints. Many practitioners assume too many unproven facts. And, unfortunately, many are not even aware of what they have done wrong or left undone.

    Poor statistical techniques

    There have been close to one hundred different statistical tests used in non-traditional authorship studies. Each of these statistical tests needs its own theoretical underpinnings. Michael Farringdon discusses the criticism that 'QSUM has no theoretical basis.' He claims that an underlying theory is not necessary because QSUM 'works.' This argument would be worth looking into if, in fact, QSUM did work and was accepted by those working in authorship studies.45

    Sloppy primary data

    Textual data must meet the standards of the unbroken chain of custody - for example, if we do not have a copy of a text of the Shakespearean drama until sixteen or so years after it was first written, how can we know what was changed by the many actors, directors, pirates, and publishers? A valid study cannot be carried out with a corrupted text.

    Lack of expertise in allied fields

    As stated earlier, the practitioner must be firmly grounded in the allied fields. This is seldom the case.

    43. Perec is not the only author to write lipograms. Another famous lipogram novel is Ernest Vincent Wright, Gadsby (Los Angeles: Wetzel, 1939). It is a story of over 50,000 words without the letter 'e'.

    44. Estelle Irizarry, 'Exploring Conscious Imitation of Style with Ready-made Software', Computers and the Humanities 23(3), 1989, pp.227-233.

    45. Farringdon, p.241.

    Treatment of errors

    When you conduct an experiment, you must contend with all of the errors - systematic, random, and illegitimate. I have never seen a study that adequately treats all of these.

    A REASONED ANSWER TO THE ENIGMA

    There are two strategies to making progress towards finding the correct underlying theory, 1. the so-called 'top-down' approach where one postulates a complete theory of everything ... 2. the empirically based 'bottom-up' approach where one uses experimental data to make smaller, incremental steps.

    - Ira Z. Rothstein46

    When I discussed this paper via e-mail with John Burrows, he sent me a little message - 'And dare one hope that the Rosetta Stone is in the ascendant?'47 Although it might not seem like it from the emphasis in this paper, I too hope and believe in the ascendancy - and I also believe that there is a light in the distance and it is not swamp gas.

    I try to point out and study the many problems in the non-traditional authorship studies so that they can be discussed and corrected. An experimental attempt to move the main hypothesis through theory to proof is an example of the top-down approach. There are many 'bottom-up' studies that are a part of the movement of the hypothesis through theory to proof. We can look at each 'bottom-up' study as a piece of the picture puzzle. The Federalist Papers study is a good example of an important piece. Keep in mind that almost all non-traditional authorship studies have many valid and important aspects.

    There are reasoned responses to the list of problems I ran through. Solutions to many of these are found in the literature. Simply stating the problem goes a long way towards solving the problem. I would like to point out one solution that doesn't appear in the literature - an experimental setup of a hybrid of the top-down and bottom-up approaches to the hypothesis that every author has a verifiably unique style.

    1. Within a time period (e.g. +/- 5 years), language (native), and genre, randomly select a sample (n1) of all possible writers. For example:

    1705 +/- 5 years (1700 through 1710)
    British English
    Political pamphlets
    Randomly select a sample of 30% of all possible writers.

    46. 'The Search for a Theory of Everything', Interactions (Dept. of Physics, Carnegie Mellon), 1998, p.4.

    47. John Burrows, 'Re: US Promotion Assessments'. e-mail to the author, 28 May 1999.

    These constraints minimize the need to show that a writer's style changes over time, over genre, or with a different language.

    The size (percent) of the sample (n1) correlates to the amount of confidence that can be placed in the final answer to the question 'Do all authors have a verifiably unique style?' The higher the percentage of authors, the higher the probability of a correct answer. Also, the greater the mean distance between the selected authors' styles, the higher the confidence level in the final answer.

    This is not an easy task. Every stage of the experimental setup is fraught with the probability of systematic errors. These must be identified and folded into the final error. For example, how can you be sure that you have identified all of the potential authors? What about the anonymous works? What about the dubitanda?

    2. Randomly select (n2) passages of (n3) running words from each selected author. For example, select 10 random passages of 1,000 running words for an author sample of 10,000 tokens. How can we be sure that (n2) is truly representative? How do we know (n3) is large enough? Again, the larger the (n2) and (n3) samples, the greater the confidence level in the final answer.

    3. Subject each author's text to stylistic analysis. This should be done using as many style-markers as possible. By using every available style marker, the practitioner eliminates the charge of 'cherry-picking'. Also, some of the stylistic traits will be a part of the zeitgeist and therefore very difficult to differentiate - the same as the problems that arise with DNA identification of 'close-knit' tribal groups. Bear in mind that the statistics behind the adjudication of each style-marker are most likely different. As a prudent preliminary test, perform the analysis on a small number of authors (e.g. 10). Only continue with the larger study if all of the preliminary samples of styles are unique.

    4. Controls:
    (a) (n4) of other writers from the same pool that (n1) used
    (b) (n5) other selections of (n6) running words from these writers
    (c) Analyze as above.
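The sampling in steps 1, 2, and 4(a) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed random seed, the placeholder writer names, and the choice to draw non-overlapping passages aligned on multiples of (n3) are all hypothetical and are not part of the experimental plan described above.

```python
import random


def sample_authors(authors, fraction, rng):
    """Step 1: simple random sample of a given fraction of the candidate writers."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    pool = sorted(set(authors))
    if not pool:
        raise ValueError("no candidate writers supplied")
    k = max(1, round(fraction * len(pool)))
    return rng.sample(pool, k)


def sample_passages(tokens, n2, n3, rng):
    """Step 2: draw n2 non-overlapping passages of n3 running words each.

    Assumption: the text is cut into consecutive blocks of n3 tokens and
    n2 of those blocks are chosen at random.
    """
    if n2 <= 0 or n3 <= 0:
        raise ValueError("n2 and n3 must be positive")
    blocks = len(tokens) // n3
    if n2 > blocks:
        raise ValueError("text too short for the requested passages")
    chosen = sorted(rng.sample(range(blocks), n2))
    return [tokens[b * n3:(b + 1) * n3] for b in chosen]


rng = random.Random(1705)  # fixed seed so the draw can be repeated and audited

# Placeholder pool of 20 writers; 30% of them gives n1 = 6.
writers = [f"writer_{i:02d}" for i in range(20)]
selected = sample_authors(writers, 0.30, rng)

# Step 4(a): controls come from the same pool but exclude the selected writers.
remaining = [w for w in writers if w not in selected]
control_writers = rng.sample(remaining, 3)  # n4 = 3 is a hypothetical value

# Placeholder text of 25,000 tokens; 10 passages of 1,000 running words.
text = [f"w{i}" for i in range(25_000)]
passages = sample_passages(text, n2=10, n3=1000, rng=rng)
```

Recording the seed and the pool alongside the results is one way to let a later reader check that the sample was not chosen to suit the conclusion.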

    Obviously, the determination of each (n) is not a simple task. As we have seen above, even the determination of what is a word or token must be done carefully and consistently. This type of study should be done for every non-traditional authorship attribution study as part of the control. Now, it is important to realize that if this type of control is carried out for every authorship study and if it is consistently shown that every author has a unique style, QED, the hypothesis is proven!
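As a rough picture of what 'every sampled author has a unique style' might mean operationally, the sketch below flags pairs of authors whose style-marker profiles are too close to tell apart. Everything in it is an assumption made for illustration: the Euclidean distance, the tolerance, the three-number profiles, and the writer names are hypothetical, and a real study would need a separately justified statistic for each style-marker, as noted in step 3.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Straight-line distance between two equally long marker vectors."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of markers")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def indistinguishable_pairs(profiles, tolerance):
    """Return the author pairs whose profiles lie within `tolerance` of each other."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if euclidean(profiles[a], profiles[b]) <= tolerance
    ]


# Hypothetical profiles: type/token ratio, rate of 'the' per 1,000 tokens,
# and mean word length. None of these numbers comes from real texts.
profiles = {
    "writer_a": [0.52, 61.0, 4.1],
    "writer_b": [0.47, 48.0, 4.6],
    "writer_c": [0.52, 61.5, 4.1],
}

close = indistinguishable_pairs(profiles, tolerance=1.0)
all_unique = not close
```

In this made-up example two writers fall within the tolerance, so the preliminary test would fail and the larger study would not go ahead. The outcome depends entirely on the tolerance chosen, which is exactly the kind of decision that has to be justified and reported.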

    CONCLUSION

    Where are non-traditional authorship attribution studies headed? Some of the most important things that should be done and undone are:

    Every practitioner should be firmly grounded in all aspects of non-traditional authorship attribution
    A consensus on experimental techniques should be achieved
    A consensus on which style-markers are relevant should be achieved
    A group of gatekeepers should be formed.

    What is being done? Many serious practitioners in all parts of the world, working in many languages, are confronting the problems of non-traditional authorship studies and working toward solutions. For example, a group of sixteen non-traditional authorship attribution practitioners met in July 1999 at the ACH/ALLC 99 conference at the University of Virginia. The discussion centred on the problems of non-traditional authorship studies and what steps should be taken to advance the field. 'Assignments' were given to individuals to complete before the Glasgow ACH/ALLC 2000 conference - assignments such as 'Works in progress', essential bibliographies, and chapters to be included in a 'how to' manual. This series of meetings is scheduled to continue at the annual meeting of the ACH and ALLC.

    Non-traditional authorship studies can see a glimmer of light at the end of the tunnel, and this light is not the headlamp of an on-coming locomotive.