Word Occurrence Statistics and Biblical Exegesis A Paper Presented to Dr. Shawn Madden in partial fulfillment of the requirements for GEN7530: M.A. Research Project Donald Seth Brown Southeastern Baptist Theological Seminary November 18, 2013


DESCRIPTION

Word occurrence statistics are often used to make or support exegetical conclusions about biblical texts with little or no articulation of the corpus linguistic methodology that undergirds such data. This paper explores the use of computer generated statistics in biblical exegesis to distill a few helpful guidelines for the use of such data in the future of biblical studies.

TRANSCRIPT

Page 1: Word Occurrence Statistics and Exegesis

Word Occurrence Statistics and Biblical Exegesis

A Paper

Presented to

Dr. Shawn Madden

in partial fulfillment of the requirements for

GEN7530: M.A. Research Project

Donald Seth Brown

Southeastern Baptist Theological Seminary

November 18, 2013


CONTENTS

Introduction

The State of Computer Aided Research in Biblical Studies

Word Occurrence Statistics: Described and Evaluated

Traditional Exegesis: Described and Evaluated

Joy/Rejoice in Philippians: A Case Study

Conclusion


ILLUSTRATIONS

Tables

Divisions of Exegesis and the Impact of Digital Technology

Quantitative Data for Joy/Rejoice in Philippians

Figures

Word Counts for Each Discourse of Philippians

WOS for Each Discourse of Philippians

Word Counts for the Body Head

WOS for the Body Head


Introduction

A. T. Robertson wrote in his small exposition of Paul’s epistle to the church in Philippi,

“Joy is the key note of Philippians.”[1] The great New Testament (NT) scholar did not offer any

justification for his claim. He stated it as a matter of fact. Many years later another respected NT

scholar, Moisés Silva, also discussed “joy” in Philippians. Silva, however, presents the term

with attendant quantitative data. He remarks on how noticeable the “frequency of ‘joy’

terminology” is. “Fourteen times,” he says, the terms χαρά and χαίρω occur in the book of

Philippians (3.5 times per chapter). Yet, Silva goes on to write, “Most significant of all are the

ten occurrences of φρονέω.” There is an interesting method to Silva’s reasoning. It seems that

word occurrence statistics (WOS) impact the way he estimates the importance of certain words

and phrases in the text, namely, χαράς, χαίρω, and φρονέω.[2] Unfortunately, though, the precise

method for evaluating and incorporating WOS is never discussed.

It is likely that many sermon-hearers have come across similar undisclosed methods

regarding WOS. The author of the present work has personally heard preachers opine about the

importance of a word or concept because it occurs x number of times in the text. And, on other

occasions, a given word or phrase is important because it only occurs once or twice in the text.

Well, which is it? Again, the method for evaluating and incorporating WOS is never discussed.

How does an exegete properly use statistical information to determine (or help to determine)

important features of a biblical text?

[1] A. T. Robertson, Paul’s Joy in Christ, rev. and ed. W. C. Strickland (Nashville, TN: Broadman Press, 1917), 30.

[2] Moisés Silva, Philippians, in Baker Exegetical Commentary on the New Testament Series, 2nd ed., eds. Robert Yarbrough and Robert H. Stein (Grand Rapids, MI: Baker, 2005), 10.

The purpose of this paper is to explore the relationship of WOS to exegesis. Are there any

principles that could help exegetes understand how quantitative data illuminates the text? Matt

O’Donnell has similar concerns when he writes, “The use of statistics has a long history in the

study of Hellenistic Greek, though often it has involved little more than simple frequency counts

of particular features.” He goes on to argue that such practices are inadequate and in need of a

radical overhaul. The purview of O’Donnell’s work includes arguments for and clarifications of

the large field of corpus linguistics in the study of the New Testament.[3] Though our concerns are

the same, the scope of this paper is much narrower than O’Donnell’s. The present work considers

the nature and role of particular tools within the field of corpus linguistics in light of their use in

exegesis. The tools in question are WOS. To that end, the nature of quantitative data, the

traditional methods of biblical exegesis, and the relationship between the two are discussed. In

other words, helpful principles for using digital tools are pursued by answering two major

questions: What information does computer generated WOS convey? And, where do these WOS

fit within the exegetical process?

After a brief introduction to the state of digital tools in biblical exegesis, the nature of

both WOS and traditional exegesis is considered and evaluated. Then, these tools are put to use

in a case study on Paul’s epistle to the Philippians. Finally, conclusions are drawn regarding the

use of WOS in biblical exegesis.

[3] The term “corpus linguistics” is defined in the next section. Matthew Brook O’Donnell, Corpus Linguistics and the Greek of the New Testament, New Testament Monographs, no. 6, ed. Stanley E. Porter (Sheffield: Sheffield Phoenix Press, 2005), 1.

The State of Computer Aided Research in Biblical Studies

It is a patent observation that computers have made their way into the study of almost

everything. Moreover, the speed and accuracy of computers have irrevocably changed qualitative

standards in many fields. So, it seems natural for computers to have a similar effect on biblical

studies. There is no longer any reason to wonder exactly how often a given word occurs or where else a

word might appear in biblical literature. The broader question must still be asked, however: “How

are digital tools impacting biblical studies?” Before beginning, I want to offer a broad

overview of the tools being used in biblical studies so that the reader will have a framework

through which to view the specific tools that are discussed below, namely, WOS.

Consider the big picture of biblical exegesis. What has to take place beforehand in

order for a person to study the Bible? There seem to be four major divisions of study that make

up exegesis.[4] They are archaeology, papyrology, philology, and history.[5]

The present work considers WOS in their setting of the third division of exegesis, the

philology division. However, each of the three remaining divisions of exegesis has its own

attendant digital tools that are becoming more and more prevalent. A similar course of study

needs to be undertaken for each of these digital tools so that biblical scholars can incorporate

them properly into the exegetical process. The danger for all exegetes is the unexamined use of

these tools with little or no understanding as to what types of information they produce and,

[4] Some might argue that one or both of the first divisions are technically pre-exegetical. The point is taken; however, for the sake of grasping the big picture of the exegetical process, they are included in the divisions that make up exegesis.

[5] The divisions listed are virtually commensurate with the standard divisions of exegesis offered by Ralph Martin. Ralph Martin, “Approaches to New Testament Exegesis,” in New Testament Interpretation: Essays on Principles and Methods, ed. I. Howard Marshall (Grand Rapids, MI: William B. Eerdmans, 1977), 222–23.

more importantly, what the resultant information means. See table 1 for some examples of digital

tools and their place in exegesis.

Surely, there are more digital tools available than those listed here. Who knows what

new tools will come about in the future that biblical scholars will need to approach with a

discerning eye? Theoretically, there is a virtually limitless amount of data to consider, but much

of it is useless if the exegete does not know how to use it.

Word Occurrence Statistics: Described and Evaluated

The data under consideration in this paper is broadly associated with the field of corpus

linguistics (a sub-discipline of linguistics and sub-sub-discipline of philology). So, it is beneficial

for the reader to have a general idea about the nature of corpus linguistics and be familiar with

some of the concepts and tools. Linguistics is “explorative of how individuals use different types

of language to mean different things in a variety of social relationships and situations.”[6] In other

Table 1. Divisions of Exegesis and the Impact of Digital Technology

Archaeology: Archaeologists are analyzing and engaging dig sites with high quality digital cameras and 3D scanners. Furthermore, once the manuscripts are uncovered, they are being digitized, transcribed, and annotated. This work advances both the curation and preservation of the source documents. Many are now viewable online for free.

Papyrology: The digitization of manuscripts is allowing text critics to subsequently edit, collate, and make text critical decisions with the help of computers.

Philology: Machine readable texts are being marked with grammatical and syntactic data so that filtering and analysis can be automated. Text mining has produced vast amounts of statistical data, like WOS, that can supplement traditional exegetical tools. Online tools like the Perseus Project have propelled corpus linguistics and irrevocably changed lexicography.

History: The propagation of online resources is growing rapidly. More and more universities are offering courses and information online that can increase one’s understanding of the ancient world. Furthermore, many scholars are building new, digitally native formats for their work (e.g., interactive ebooks, interactive timelines, etc.). Mediums like blogs are being recognized more and more as valid avenues for scholarly work. Also, geographic information systems (GIS) are taking advantage of the rapidly expanding power and resources of computers.

[6] O’Donnell, Corpus Linguistics, 17.

words, linguistics is the study of language in use. For modern linguistics, this involves the use of

both qualitative and quantitative methods. Qualitative methods employ tools like comparative

language studies and traditional exegesis. Quantitative methods, however, employ (primarily)

digital tools to generate statistical data that is used to identify latent characteristics of the

language in question. O’Donnell comments,

quantitative analysis of language involves a great deal more than simply counting linguistic features (e.g., phonemes, morphemes, parts of speech, clause types, etc.). It also involves measuring features, i.e., ordering them in relation to their features in paradigmatic and syntagmatic systems, noting the degree to which a feature is present or absent in a variety of contexts, and using statistical measures in an evaluative and predictive manner.[7]

Suffice it to say, it is a highly involved and technical enterprise that requires a level of familiarity

and understanding before one can engage linguistics rightly. This fact alone should make

exegetes careful about using the tools of linguistics without at least a basic understanding of the

field.

Generally, corpus linguistics makes use of quantitative methods and can be defined as:

the computer assisted study of naturally occurring language.[8] So, as the definition implies,

corpus linguistics is concerned primarily with the quantitative data produced by computer

programs, although qualitative analyses precede and follow the quantitative analysis.

Furthermore, the data is generated from a corpus (body) of language that is not invented (i.e.,

naturally occurring).[9] After all, the concern for linguists is the language as is, not as they

[7] Ibid., 23.

[8] O’Donnell identifies six key characteristics of corpus linguistics that attend its definition. They are: the use of a representative corpus; the use of linguistic annotation; the interpretation of frequency; the discovery of linguistic variables; the identification of patterns; and quantitative analysis and statistical methods. Ibid., 26–30.

[9] This distinction is necessary because in the history of linguistic study some scholars attempted to analyze occurring language that was artificially constructed. This manner of descriptivist study is patently dubious. Ibid., 20.

manufacture it to be. The representativeness of the corpus is important too. Just as sociologists

concern themselves with the representativeness of their groups, so also corpus linguists care that

the body of literature under consideration represents the language they are studying.

Furthermore, detailed characteristics of the form and function of the corpus impact the study as

well.

Register, for instance, includes many of those detailed characteristics that must be

considered when defining the corpus as well as interpreting computer generated statistics. The

register of a text is its particular situation. Those who are familiar with biblical studies might

think of register as a specific combination of genre and purpose (or form and function; the Sitz

im Leben).[10]

Another concern is the format of the generated data. The bare facts must be packaged so

as to answer the questions being asked; WOS need context. Raw occurrence statistics (i.e.,

simple word counts) make up the foundation of the data. Yet, raw occurrences do not show the

relationship of words to one another. The numbers do not convey much exegetical information

until they are formatted into ratios or percentages wherein they are compared to the whole

corpus. Formatting allows linguists to observe patterns and trends among word usages and thus

draw conclusions about the nature of the language in use.
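The difference between a raw count and a formatted statistic can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `relative_frequencies` and both toy word lists are hypothetical, invented for this example, and are not drawn from any biblical text or from the tools discussed in this paper. The sketch shows one raw count (two occurrences) yielding very different proportions once it is compared to the size of the whole text.

```python
from collections import Counter


def relative_frequencies(tokens, targets):
    """Return (raw count, share of all tokens) for each target word."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: (counts[word], counts[word] / total) for word in targets}


# Hypothetical toy texts (assumptions for illustration only).
short_text = "rejoice in hope rejoice in truth".split()
long_text = ("rejoice rejoice " + "and the word went out to all the land " * 2).split()

short_stats = relative_frequencies(short_text, ["rejoice"])
long_stats = relative_frequencies(long_text, ["rejoice"])

# The raw count is identical in both texts; only the ratio distinguishes them.
print(short_stats["rejoice"])
print(long_stats["rejoice"])
```

In both lists the word occurs twice, yet it makes up a third of the short list and only a tenth of the long one. The ratio, not the count, is what allows the two texts to be compared.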

[10] As it happens, the concept of register is highly useful when employing predictive methods of interpreting statistics. For the modern American, imagine how accurately one could predict the vocabulary and number of occurrences of certain words that someone might use in a conversational exchange at a fast-food restaurant. The exchange has a register and that register is defined by the formulaic statements and responses that are common to that situation. A similar type of analysis is often employed by biblical scholars when discussing the epistolary formula in the New Testament epistles. One of the most popular instances of this discussion is Paul’s idiosyncratic use of χάρις (charis) instead of χαίρειν (chairein). The historical readers of a first century epistle would expect the characteristic greeting, χαίρειν; yet Paul catches their attention with a similar but distinctly Christian word, χάρις. Analyzing this use in the context of predictive analysis sheds hermeneutical light on the discourse.

One may already have picked up on the fact that word statistics with no relation to the

whole do not convey much exegetical information. Raw occurrences describe the text but they

do not reveal the meaning of the text. Similarly, when a selection of words has no relation to one

another, very little information is communicated. So, it is important for corpus linguistics to keep

the relationships between words in view as studies are carried out. O’Donnell makes this point

clear when he writes, “qualitative and quantitative approaches should be seen as

complementary.” In fact, “quantitative analysis often requires a prior qualitative analysis to

provide the framework and categories that can be counted, measured, ordered and statistically

analyzed.”[11] This statement manifests as the process of evaluating the scope and register of a

corpus (qualitative methods) before quantitative analysis can take place. Furthermore, the queries

of a given corpus assume certain qualitative questions that are waiting to be answered by the

quantitative data.

Now that the general outline of corpus linguistics is drawn, one can begin to evaluate the

method and its tools. Biber notes,

The [great] contribution of the corpus-based approach is that it often produces surprising findings that run directly counter to our prior intuitions. . . . one of the main research goals of this approach is to empirically identify the linguistic patterns that are extremely frequent or rare in discourse from a particular variety.[12]

Judging by Biber’s remark alone, one can see that a unique strength of WOS (the tools of

corpus linguistics that undergird quantitative analysis) is their ability to show patterns that might

otherwise be overlooked. For instance, a word or phrase might be used so much that an exegete

[11] Ibid., 24.

[12] Douglas Biber, “Corpus-Based and Corpus-Driven Analyses of Language Variation and Use,” in The Oxford Handbook of Linguistic Analysis, eds. Bernd Heine and Heiko Narrog (Oxford: Oxford University Press, 2010), 163–64.

begins to ignore, assume, or overlook its importance in a body of literature. Likewise, rare

occurrences of a word or phrase might allow the reader to ignore, assume, or overlook its

importance in a body of literature. Further still, a word or phrase might occur an unremarkable

number of times in the text of an author compared to other words in the same text but differ

radically in number from another text by the same author. WOS shine a light on the prominent

features of a text (or language) so as to call attention to their importance or non-importance. The

process is similar to viewing the Grand Canyon from an observation deck then viewing a cross-

section diagram of the canyon. Certain features become clear that might be overlooked from a

standard viewpoint.

Likely, the most prominent feature that an interpreter of WOS might notice is high

recurrence. Words that occur many times in a given text seem to have some importance. After

all, there is something intuitive that leads most exegetes to assume that if an author is frequently

mentioning a certain word or phrase then the author is attributing some level of importance to it.

Another prominent feature that an interpreter of WOS might encounter is the lexical

bundle. Lexical bundles are groups of words that, by definition, occur frequently. Moreover,

they are not idiomatic; that is, they are not words that combine into one semantic unit. However,

the strings of words are syntactically related (or else they would be nonsensical). Lexical bundles

also have strong grammatical correlates (i.e., they are associated with certain verb forms and

clauses). Biber summarizes, “Although they are neither idiomatic nor structurally complete,

lexical bundles are important building blocks in discourse. . . . [they] provide interpretive frames

for the developing discourse.”[13] The benefit of the lexical bundles is that they offer more context

[13] Ibid., 170–72.

for WOS. One might wonder why a given word occurs frequently. The given word may

correspond with a bundle of words. So, then, it should come as no surprise when one finds high

recurrence of that word in WOS patterns.[14]
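A rough way to see how recurrent word strings surface in frequency data is to count contiguous word sequences (n-grams) and keep those that repeat. The sketch below is a simplification offered under stated assumptions and is not Biber's procedure: the function name `recurrent_ngrams`, the cutoff of two occurrences, and the toy English sentence are all hypothetical choices made for illustration.

```python
from collections import Counter


def recurrent_ngrams(tokens, n=3, min_count=2):
    """Return every contiguous n-word sequence that occurs at least min_count times."""
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(grams)
    return {gram: count for gram, count in counts.items() if count >= min_count}


# Hypothetical toy sentence (assumption for illustration only).
tokens = "i do not know what you mean and i do not know why".split()
bundles = recurrent_ngrams(tokens, n=3, min_count=2)

for gram, count in bundles.items():
    print(" ".join(gram), count)
```

In this toy sentence the high counts for "do," "not," and "know" are explained by the repeated strings in which they sit, which is the kind of added context the paragraph above attributes to lexical bundles.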

Hapax legomena are another feature often noticed in WOS and frequently referred

to in biblical studies. The term is a transliteration of the Greek words ἅπαξ λεγόμενον, which mean

“something said once.” These are words or phrases that have no recurrence since they occur only

once. As with frequently occurring words, intuition might lead the exegete to ascribe a level of

importance (or non-importance) to hapax legomena simply because of their rarity.

However, it is important to contextualize hapax legomena by clearly defining the corpus from

which they derive (i.e., Gospel literature, Pauline literature, the NT, Hellenistic literature, etc.).
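The dependence of hapax legomena on the chosen corpus can be shown with a small Python sketch. It is a minimal illustration under stated assumptions: the function name `hapax_legomena` and the two toy word lists are hypothetical and stand in for a single letter and a wider body of literature. They are not real biblical data.

```python
from collections import Counter


def hapax_legomena(tokens):
    """Return the set of words that occur exactly once in the given token list."""
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count == 1}


# Hypothetical toy corpora (assumptions for illustration only).
letter = "grace and peace and joy".split()
wider_corpus = letter + "joy and gladness".split()

print(sorted(hapax_legomena(letter)))
print(sorted(hapax_legomena(wider_corpus)))
```

The word "joy" is a hapax legomenon within the toy letter but not within the wider toy corpus, so any claim that a word occurs only once is meaningful only when the boundaries of the corpus are stated.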

Another practical benefit of WOS is the fact that they are computer generated.

Historically, scholars have gone through texts and manually counted word occurrences.[15] Yet, the

risk of human error and limitations of time made this work exceedingly tedious and rarely

undertaken. The advent of computers, however, allowed scholars to perform such actions in

vastly shorter amounts of time with fewer errors.

So, to summarize the benefits of WOS, they (1) show prominent patterns in texts by

highlighting either common or rare occurrences of words and phrases, (2) allow for more

advanced pattern recognition in the form of lexical bundles, and (3) by virtue of being computer generated,

allow for greater efficiency and accuracy in quantitative analysis. Now, in light of the great

benefits of WOS, it is helpful to see the limitations as well.

[14] For this reason, lexical bundles are valuable in predictive analysis.

[15] A. T. Robertson offers an example of this in his larger Grammar.

Despite their unique benefits, WOS are surprisingly limited in their ability to convey

exegetical information. Many language scholars suppose that a high recurrence or single

occurrence of a word or grammatical construction denotes emphasis or importance by the author.

That supposition, however, is simply not warranted. What, precisely, does recurrence convey? If

word recurrence were the sole bearer of exegetical information, then definite articles and

conjunctions would top the list every time.[16] Likewise, the importance of specially chosen and

placed hapax legomena would be neglected. So, one can see that meta-data in the form of

recurrence cannot convey nearly as much information as the data itself. Recurrence may support

a hypothesis derived from discourse analysis but the frequency statistic itself does not make the

argument. Biber notes, “frequency is not a decisive factor in identifying ‘patterns.’”[17] By

“patterns,” Biber refers to linguistic categories that give meaning to certain recurrent

grammatical characteristics (e.g., high recurrence and hapax legomena). In other words, the bare

statistics of word occurrences are not probative regarding the meaning of a text. One might still

ask, “Why not?”

WOS extract words from their context. The statistics are necessarily limited to single

words or small groups of words. It is a patent fact that words with no significant relation to other

words do not convey much exegetical information. For example, imagine taking a list of one

hundred alphabetically ordered words from a dictionary and then reading them out loud. What

meaning is communicated? Very little, if anything, is communicated by such a list of isolated

words. Now, consider an author crafting a short story of one hundred words that carefully and

[16] Warren Trenchard lists the top two highest occurring words in the Greek New Testament as the definite article and καί. Warren C. Trenchard, Complete Vocabulary Guide to the Greek New Testament, rev. ed. (Grand Rapids, MI: Zondervan, 1998), 128.

[17] Biber, “Corpus-Based and Corpus-Driven Analyses of Language Variation and Use,” 178.

meaningfully relates certain words to one another via syntax and discourse structure. It seems

obvious that the probative evidence for understanding the author’s meaning resides in the

relationship of words to one another (discourse), not in the frequency or rarity of any given word

or words.
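The point that occurrence statistics detach words from their relationships can be demonstrated with a short Python sketch. It is offered as an illustration under stated assumptions: the helper name `word_counts` and the two toy sentences are hypothetical and were invented for this example.

```python
from collections import Counter


def word_counts(text):
    """Count each word in a whitespace-separated string, ignoring word order."""
    return Counter(text.split())


# Hypothetical toy sentences built from exactly the same words.
first = "the servant sent the master"
second = "the master sent the servant"

same_counts = word_counts(first) == word_counts(second)  # the statistics agree
same_text = first == second                              # the discourse does not

print(same_counts, same_text)
```

The two sentences yield identical word counts even though they say opposite things about who sent whom. Whatever distinguishes their meaning lies in the arrangement of the words, which the counts do not record.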

Biber goes on to nuance the discussion by arguing that WOS have variegated levels of

importance depending on the nature of the given statistic. The difference in occurrences from

zero to one carries the most importance. Furthermore, the difference between one occurrence and

two is similarly important because it establishes recurrence. After that, recurrence is quite limited

in the amount of exegetical information it conveys.[18]

Now that WOS are described and evaluated, we can move on to do the same for

traditional methods of exegesis, i.e., methods that do not make use of WOS in their fullest

expression.

Traditional Exegesis: Described and Evaluated

A brief survey of the traditional methods of biblical exegesis will establish a framework

through which to view WOS (and any new tools that may be invented in the future). To start,

though, it is helpful to revisit the four divisions of exegesis. Each division is listed and discussed

in turn as it relates to traditional exegesis.

The beginning division, which is unfortunately taken for granted by a great many

exegetes, is the uncovering of the biblical manuscripts. Archaeology is to thank for the several

[18] Ibid.

thousand New Testament manuscripts that undergird the edited texts on which biblical exegesis

relies.[19]

The second division of exegesis is papyrology. Once the manuscripts are found and

curated, the texts are then transcribed and edited so as to produce the most original reading of the

text. This work requires a high level of technical and linguistic skill and has historically been

undertaken by a very small number of scholars.

The third division of exegesis is the attempt to understand the language of the text ––

philology. Philologists use all manner of commentaries, lexicons, concordances, and grammars

to produce via translation a readable volume for their native language. To fully understand the

meaning of the text, though, one must move up from lexical, grammatical, and syntactical study

into discourse analysis so as to comprehend the text in its genre and form. In other words, it is

not enough to understand sentences and paragraphs without an apprehension of the form and

function of those sentences and paragraphs within larger units.

The fourth division of exegesis is historical study. Generally, this requires the

employment of history books covering the Greco-Roman social, political, rhetorical and

religious world. However, most exegetes will also agree that a knowledge of the geography in

which the biblical events took place can help in understanding the text. Geographical study can

(but does not often) require the act of physically visiting the sites for one to fully understand the

geographical background of a text.

[19] I. Howard Marshall and Ralph Martin both pay homage to the work of textual critics but neglect the finders of the manuscripts (i.e., archaeologists) as key elements in the interpretive process. It should also be noted that curators of historical libraries (e.g., the Vatican Library, etc.), those which are known and unknown, are also to thank for the preservation of ancient biblical manuscripts. I. Howard Marshall, “Introduction,” in New Testament Interpretation: Essays on Principles and Methods, ed. I. Howard Marshall (Grand Rapids: Eerdmans, 1977), 11. Ralph P. Martin, “Approaches to New Testament Exegesis,” in New Testament Interpretation: Essays on Principles and Methods, ed. I. Howard Marshall (Grand Rapids: Eerdmans, 1977), 222.

As with the above section on WOS, we will focus on philology because it is the

discipline in which the bulk of traditional exegesis takes place. Philology is a term that refers to

the study of literature. Broadly, it encompasses all manner of comparative literature studies,

linguistics, and exegesis. In many circumstances it is discrete from other disciplines like

archaeology and history. It has been and always will be a foundational pillar in liberal arts, humanities,

and biblical exegesis. Philology’s tools are commentaries, lexicons, concordances, and

grammars. Likely, a lengthy exposition of the uses of each of these tools is unnecessary for most

readers. So, it seems helpful to move right into an evaluation of the discipline.

There seem to be at least two primary benefits of traditional philology. First, the method

necessitates the direct engagement of the text at hand. Close, careful reading of a text so as to

produce deep understanding is the central tenet of philology and its strongest benefit. That is to

say, the method is not dependent on derivative information (meta-data) as with tools like WOS.

Traditional philology engages texts as texts, not as information about texts. In a manner of

speaking, traditional philology observes the Grand Canyon, not a diagram of the canyon.

The second benefit of traditional philology is its focused approach. The philological tools

are created for in-depth qualitative analysis of literature (e.g., lexicons). Even in sub-disciplines

like comparative literature where multiple texts are brought into comparison, the focus of such

studies is still narrowed down to certain aspects of the literature like plot devices, archetypes, etc.

The strength of focus makes traditional philology weak in the area of broad, comprehensive

studies of a large corpus. This characteristic serves as a fitting segue into the limitations of

traditional philology.


The primary limitation to traditional philology today is the lack of quantitative data that

can cover a large corpus of material. Now, to be fair, this type of data was largely unknown and

inaccessible before the advent of computer technology. Still, the limitation remains and it needs

to be pointed out now that computer technology is available and widely accessible. That is not to

say that quantitative data regarding language and literature never existed before computers. In

fact, A. T. Robertson, in his larger Grammar, offers plenty of quantitative data. However, even

such a pioneering work was quite limited in its scope and accuracy. Robertson frequently

characterizes the uses of words with labels like “common,” “limited use,” “vanishing quantity,” etc.

rather than citing accurate numerical data.[20]

So, given the benefits and limitations of both WOS and traditional philology, it seems one

can now begin to hold them together and see how they relate. WOS show their limitations in

communicating the relationships of words while traditional philology excels in in-depth analysis

of discourse. Furthermore, traditional philology shows its weakness in dealing with quantitative

data while WOS are specially generated to do just that. It seems, on the face of it, that these two

methods for studying literature complement each other well. Now, the remaining work is to put

both methods to the test and see if the evidence bears out the complementary nature of these

methods. Furthermore, such a case study will allow for the establishment of principles regarding

the use of WOS in exegesis.

[20] A. T. Robertson, A Grammar of the Greek New Testament in the Light of Historical Research, 7th repr. (1934, repr., Nashville, TN: Broadman Press, 2010), 61.

Joy/Rejoice in Philippians

So, I would like to give at least one illustration of these tools in action. That way, the

reader can see the benefits and limitations of both WOS and traditional exegesis on display.

The book of Philippians is an appropriate document for testing the tools under consideration. It is

especially useful for contrasting the digital and traditional tools previously discussed because the

results of both methods can be found in current research on Philippians. The manual tools of

traditional philology contribute their results in works like David Alan Black’s discourse analysis

of Philippians.[21]

Also, a short browse through the titles of books and commentaries written about

Philippians will show a common theme: joy. This anecdotal evidence is corroborated by

respected authors who make substantial claims like “[Joy] is the keynote. The word for ‘joy’ in

its verbal and noun forms appears sixteen times in the letter, proportionately more often than in

any of Paul’s other letters.”  The evidence Loh and Nida use to establish the theme of the book 22

consists of WOS. So, one can see the results of digital tools (the word occurrence statistic, “sixteen

times”) attributing a level of importance to joy and thus affecting the estimation of the epistle’s

theme. Both manual and digital tools are on display in the material on Philippians.  23

Let us begin by looking at a few examples of traditional philology at work on the use of

joy and rejoice in Philippians. Marvin Vincent says the letter “flows on to the end in a steady

21. David Alan Black, “The Discourse Structure of Philippians: A Study in Textlinguistics,” Novum Testamentum, 37, no. 1 (January 1995): 16–49.

22. I-Jin Loh and Eugene A. Nida, A Handbook on Paul’s Letter to the Philippians (New York, NY: United Bible Societies, 1977), 1.

23. It is possible to achieve statistical information without the use of digital tools. In fact, it has been done for centuries. However, today it is overwhelmingly produced with digital tools. So, for the sake of the present work, statistical information will be considered the result of digital research methods. It would not affect the conclusions of this work if statistical information about Philippians was gathered manually.

stream of thankful joy.”  He goes on to write, “Joy is a frequent theme in this letter” and he 24

quotes John Bengel as saying, “The sum of the epistle is, ‘I rejoice, do ye rejoice.’”  Again, 25

Vincent comments upon χαίρετε in 4:4 and 4:10 that “rejoice” is the “keynote of the epistle.”  26

Gordon Fee notes the “general paucity of Paul’s more specialized theological vocabulary” then

goes on to highlight that “the singular most frequent word group in the letter is ‘joy,’” thus

claiming joy as the highlight of the book by contrast.  Interestingly, F. W. Beare, who does not 27

even believe the epistle is one unit (rather, three), claims joy is a “glad note” that “sounds its

music all through the epistle.”  O’Brien, on the other hand, argues strongly for the integrity of 28

the epistle and notes the “recurrent exhortation to rejoice” but does not posit it as a purpose in the

epistle in his introduction.  Instead, his conclusions follow other commentators in calling joy “a 29

keynote of the epistle” and a “motif.”  30

Now, notice how some other commentators introduce WOS to support their claims. When

O’Brien comments upon 2:18, he remarks that the “verb ‘to rejoice’ and its cognates turn up

sixteen times in Philippians,” thus, including some quantitative data in his premises. John Paul

Heil has the most nuanced discussion of the various occurrences of joy language. He notes the

24. Marvin R. Vincent, The Epistles to the Philippians and to Philemon, International Critical Commentary Series, eds. Samuel Driver, Alfred Plummer, and Charles Briggs (1897, repr., Edinburgh, Scotland: T&T Clark, 1976), xxxiv.

25. Ibid., 23.

26. Ibid., 133, 141.

27. Gordon D. Fee, Paul’s Letter to the Philippians, The New International Commentary on the New Testament, eds. Ned Stonehouse, F. F. Bruce, and Gordon Fee (Grand Rapids, MI: Wm. B. Eerdmans, 1995), 20.

28. F. W. Beare, The Epistle to the Philippians, Black’s New Testament Commentaries Series, ed. Henry Chadwick (London, England: A. & C. Black, 1959), 72.

29. Peter T. O’Brien, The Epistle to the Philippians, The New International Greek Testament Commentary, eds. I. Howard Marshall and W. Ward Gasque (Grand Rapids, MI: William B. Eerdmans, 1991), 38, 485.

30. Ibid., 485, 349.

sixteen times that word forms of χαρά and χαίρω appear but also adds that other words from the

root χαρ- (e.g., χάρις and χαρίζομαι) should be considered along with synonyms (e.g., εὐψυχῶ)

and antonyms (e.g., λύπην) when determining the theme.  However, Heil does not articulate the 31

relation of the WOS to his process for determining theme; the reader is left to assume there is

some relation. Heil’s inconsistency serves as an apt transition to a direct consideration of the

quantitative data about the use of joy/rejoice in Philippians.

The data listed in table 2 is the type that is commonly given (supposedly) in support of

making joy the theme of Philippians. As is obvious from the above survey of the traditional

philological literature on the subject, many authors cite quantitative data yet no one explains

their method for obtaining the information nor do they explain how it relates to their conclusions

about the theme. It seems to always be assumed that the reader perceives the connection (if in

fact there is any). Furthermore, the reader may have already noticed that Silva claims that joy

appears fourteen times; yet nearly everyone else, including this author, counts sixteen. Did

someone miscount? How, exactly, does one go back to check the numbers and be sure the same

methods are being used? Herein lies the first obstacle.

Table 2 Quantitative Data for Joy/Rejoice in Philippians

χαίρω: Phil. 1:18 (2x); 2:17 (2x), 18 (2x), 28; 3:1; 4:4 (2x), 10 (11 occurrences; 2.75x per chapter)

χαρά: Phil. 1:4, 25; 2:2, 29; 4:1 (5 occurrences; 1.25x per chapter)

Total: 16 occurrences; 4x per chapter
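
As a quick check on the arithmetic in table 2, the per-chapter figures can be recomputed from the verse references. The sketch below is only illustrative: the variable and function names are invented here, the four-chapter divisor follows the table, and Phil. 4:4 is entered with two occurrences (the verb appears twice in that verse), which is what the stated total of eleven requires.

```python
# Occurrences per verse, transcribed from table 2 (names are illustrative).
# Assumption: Phil. 4:4 is counted twice, since the verb occurs twice there.
CHAIRO = {"1:18": 2, "2:17": 2, "2:18": 2, "2:28": 1, "3:1": 1, "4:4": 2, "4:10": 1}
CHARA = {"1:4": 1, "1:25": 1, "2:2": 1, "2:29": 1, "4:1": 1}
CHAPTERS = 4  # Philippians has four chapters


def per_chapter(counts: dict[str, int], chapters: int = CHAPTERS) -> tuple[int, float]:
    """Return the total number of occurrences and the average per chapter."""
    total = sum(counts.values())
    return total, total / chapters


chairo_total, chairo_rate = per_chapter(CHAIRO)
chara_total, chara_rate = per_chapter(CHARA)
combined_total = chairo_total + chara_total
combined_rate = combined_total / CHAPTERS

print(chairo_total, chairo_rate)
print(chara_total, chara_rate)
print(combined_total, combined_rate)
```

Even this small exercise shows that the per-chapter figure depends on a decision (how to count a verse with two occurrences) that the commentaries never state.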

31. John Paul Heil, Philippians: Let Us Rejoice in Being Conformed to Christ, Early Christianity and Its Literature, no. 3 (Atlanta, GA: Society of Biblical Literature, 2010), 1–3.

If an exegete were to use a digital tool to retrieve the same information provided by the

commentators, how would they go about it? A quick search or two using a site like La Parola

immediately reveals the difficulty. What does the exegete search for: word form, lemma, root, or

semantic domain? Each of these disparate queries will return wildly varying results. Searching

for the word form (i.e., the exact case, number, gender, mood, tense, etc.) generates two to three

hits at the most. A search for the lemma (headword) generates roughly the information provided

in table 2; still, one must know ahead of time to search for χαρά, χαίρω, and συγχαίρω then

add the results together. A search for the root form generates all the word forms, including all the

occurrences of χάρις, et al. Finally, searching for the semantic domain returns all the occurrences

of the words listed above plus their synonyms and any semantically related words. Not one

commentator explains this to their readers nor do they explain their own method for obtaining

the quantitative data.
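
The effect of the query type can be sketched with a few lines of code. The token list below is invented for illustration (it is not the text of Philippians, and the transliterations, field names, and function name are assumptions of this sketch); it only shows how the same small corpus yields different counts depending on whether one matches the word form, the lemma, or the root. A semantic-domain query is left out because it would require an external lexicon.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str   # the inflected word as it appears in the text
    lemma: str  # the dictionary headword
    root: str   # the shared root


# Hypothetical, hand-tagged sample in rough transliteration (not real corpus data).
TOKENS = [
    Token("chairo", "chairo", "char"),
    Token("chairete", "chairo", "char"),
    Token("chairete", "chairo", "char"),
    Token("sunchairo", "sunchairo", "char"),
    Token("charan", "chara", "char"),
    Token("charas", "chara", "char"),
    Token("charis", "charis", "char"),
    Token("echaristhe", "charizomai", "char"),
    Token("lupen", "lupe", "lup"),
]


def count_by(tokens: list[Token], field: str, value: str) -> int:
    """Count the tokens whose given field equals the given value."""
    return sum(1 for token in tokens if getattr(token, field) == value)


# Word-form query: only one exact inflected form is matched.
form_hits = count_by(TOKENS, "form", "chairete")

# Lemma query: the headwords must be known in advance and their results summed.
lemma_hits = sum(count_by(TOKENS, "lemma", lemma)
                 for lemma in ("chairo", "sunchairo", "chara"))

# Root query: every word built on the root is matched, including charis.
root_hits = count_by(TOKENS, "root", "char")

print(form_hits, lemma_hits, root_hits)
```

On this sample the three queries return three different numbers, so a bare figure such as “sixteen” cannot be checked unless the query type is stated alongside it.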

Next, after one retrieves this information, he or she must decide what exegetical

information it provides. What is the supposed connection between the quantitative data and the

meaning of the text? Once again many scholars uncritically associate recurrence and importance

without ever giving the reason why.  Moreover, the unexamined evidence is then used to posit the 32

theme of the epistle.

Before any principles are drawn regarding the use of WOS in exegesis, it behooves the

author to offer a better way to present WOS. Combining the manual and digital methods gleans

32. There are even difficult nuances when determining the theme of a text. Jeffrey Reed discusses this issue at length. He proposes three levels of prominence: background, theme, and focus. After all, one cannot discuss prominence and thematic elements if there is no conception of non-prominence or what exactly constitutes something that is not a theme. Jeffrey T. Reed, “Identifying Theme in the New Testament: Insights from Discourse Analysis,” in Discourse Analysis and Other Topics in Biblical Greek, eds. Stanley Porter and D. A. Carson, Journal for the Study of the New Testament Supplement Series, no. 113 (Sheffield, England: Sheffield Academic Press, 1995), 75–101.

the best of both the qualitative and quantitative information. David Alan Black helps to guide the

use of statistics as he approaches Philippians with the tools of discourse analysis. He writes,

. . . just because Paul uses joy language to a greater degree in Philippians than he does elsewhere in his writings, one is not necessarily justified in making joy the theme of the letter. It is rarely legitimate simply to make a word count and draw conclusions from it, since concepts involve far more elaborate structures than individual words. It is at this crucial point that textlinguistics performs a valuable service. By inquiring after the "whole" meanings of the text rather than just the meanings of its parts, textlinguistics offers a major interpretive key for our understanding of the letter.  33

At present, Black’s study of discourse analysis in Philippians will serve as the basis for

this inquiry.  Figures 1 through 4 below offer some of the quantitative data listed in 34

table 2, but in this case the WOS are put within the context of discourse units. This way, one can

not only see the quantitative data, but one can evaluate where and how the occurrences are taking

place. The supporting quantitative data is displayed and its format reveals something of the

reason why the occurrences of joy/rejoice are important, i.e., they occur at critical junctures in

the text.

33. Generally, the terms text linguistics and discourse analysis are synonymous. David Alan Black, “The Discourse Structure of Philippians: A Study in Textlinguistics,” Novum Testamentum, 37, no. 1 (January 1995): 16.

34. The following tables and figures were created using a combination of quantitative data generated with the tools offered by www.laparola.net and qualitative analysis offered by David Alan Black. Richard Wilson, La Parola, http://www.laparola.net/greco (accessed November 8, 2013); David Alan Black, “The Discourse Structure of Philippians: A Study in Textlinguistics,” 16–49; David Alan Black, Linguistics for Students of New Testament Greek: A Survey of Basic Concepts and Applications, 2nd ed. (Grand Rapids, MI: Baker Books, 1988), 170–98.

Figure 1 Word Counts for Each Discourse of Philippians

[Bar chart not reproduced. Horizontal axis: Number of Words in Section (0–800). Bars: Epistle Opening (1:1–2); Body Intro (1:3–11); Body Head (1:12–2:30); Body Subpart (3:1–4:9); Body Close (4:10–20); Epistle Closing (4:21–23).]

Figure 2 WOS for Each Discourse of Philippians

[Bar chart not reproduced. Horizontal axis: Occurrences of Joy/Rejoice (0–10). Bars: Epistle Opening (1:1–2); Body Intro (1:3–11); Body Head (1:12–2:30); Body Subpart (3:1–4:9); Body Close (4:10–20); Epistle Closing (4:21–23).]

Figure 3 Word Counts for the Body Head (1:12–2:30)

[Bar chart not reproduced. Horizontal axis: Number of Words in Section (0–260). Bars: Subsection One (1:12–26); Subsection Two:A (1:27–30); Subsection Two:B (2:1–18); Subsection Three:A (2:19–24); Subsection Three:B (2:25–30).]

Figure 4 WOS for the Body Head (1:12–2:30)

[Bar chart not reproduced. Horizontal axis: Occurrences of Joy/Rejoice (0–5). Bars: Subsection One (1:12–26); Subsection Two:A (1:27–30); Subsection Two:B (2:1–18); Subsection Three:A (2:19–24); Subsection Three:B (2:25–30).]

Dividing the epistle into its functional discourse units allows the exegete to see the WOS in

their context.  WOS can now be considered not only by their numbers but also by their 35

placement in the text. For instance, a word that occurs in either the epistle’s opening or closing

may not be as functionally important as a word that appears in the thesis of the body. Figures 1

through 4 show that the highest recurrence of joy/rejoice is in the body head of the epistle. This

placement suggests that joy/rejoice is in a plausible position to be very important in the epistle.

One factor that militates against this conclusion is that the highest recurrence of the words

appears in the discourse unit with the highest overall word count; therefore, joy/rejoice is

statistically more likely to appear in that section.
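
One way to account for section length is to express occurrences as a rate rather than a raw count. The following is a minimal sketch, not the method used to produce the figures: the function name is invented, and the two units and their numbers are placeholders chosen only to show how a longer unit can have the higher raw count yet the lower rate.

```python
def rate_per_100_words(occurrences: int, section_words: int) -> float:
    """Return occurrences per 100 words of a section."""
    if section_words <= 0:
        raise ValueError("section_words must be positive")
    return 100 * occurrences / section_words


# Placeholder values (occurrences, total words); these are NOT the
# counts behind figures 1 through 4.
units = {
    "Unit A": (2, 100),
    "Unit B": (8, 800),
}

rates = {name: rate_per_100_words(hits, words)
         for name, (hits, words) in units.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} per 100 words")
```

With these placeholder values the longer unit has four times as many occurrences but only half the rate, which is the kind of distinction a bare word count conceals.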

Ultimately, though, it seems the commentators were right to point out the importance of

joy/rejoice even though they failed to use the statistics correctly.  They glean the best of 36

traditional philology, namely, direct engagement with the text and focused lexical and

grammatical analysis. However, their use of WOS fails to glean the benefits of the tool (and they

do not serve their readers with their methodological mystery). There is no critical mention of the

corpus besides the fact that occurrences in their work, by definition, are limited to Philippians. No

one discusses the reasons for limiting or not limiting the WOS to the epistle. There is also no

35. The discourse units used in figures 1–4 are based on David Black’s work in: David Alan Black, “The Discourse Structure of Philippians,” 16–49; David Alan Black, Linguistics, 170–98.

36. It is this type of unsatisfactory use of the tools that led to a neglect of quantitative analysis in the history discipline. Notice the similarity to biblical studies as Wendy Plotkin writes, “Content analysis never achieved significant status as a method within the historical community. It may be that traditional historians believed that content analysts sacrificed too much of the complexity of a text in classifying its contents. They also may have attributed too much significance to small or skewed samples of texts. It is also likely that the effort of classifying texts and entering the data into the earliest computers was not justified by the results they found. Although their methods may have fallen short, there is great validity in the aims of content analysts. Their work complements that of traditional historians in an attempt to confirm the hypotheses included in more traditional studies.” Wendy Plotkin, “Electronic Texts in the Historical Profession,” in Computing in the Social Sciences and Humanities, ed. Orville Burton (Chicago, IL: University of Illinois Press, 2002), 97.

mention of the WOS’s relation to register (i.e., genre, form, etc.). They present the WOS as bare

word counts, with the exception of Silva who notes the occurrences per chapter.  Nevertheless, 37

he does not mention corpus or register. Lastly, they fail to put the WOS into the context of their

use. In other words, do the words occur in the greeting, body, closing, etc.? Are there any

syntactical or grammatical patterns? These are just a few of the issues that must be considered

when using WOS. Let us move on now to establish some principles for using WOS in exegesis.

Conclusion

It seems obvious that WOS can be misused and misleading; yet, it is also the case that

they have many latent benefits. In conclusion, I would like to suggest several principles that will

aid exegetes in both understanding WOS and using them properly in the exegetical process.

Firstly, understand the importance of the queried corpus. Is the corpus representative of

the language in question? The representativeness of a corpus is defined by its inclusion of the

“full range of variability in a population.”  In the case of studying very small corpora (like NT 38

epistles), is the corpus even large enough to generate significant data? After all, any selection

from the NT is minuscule compared to other “small corpora” (i.e., “anything under five million

words”).  Before a query can be run on a particular word, a good corpus must be established. 39

There are external criteria that must be decided: the mode of the text (spoken or written, etc.), the

type of text (genre), the domain (formal or common), the language, the location, and the date.  40

37. Silva, BECNT, 10.

38. Almut Koester, “Building Small Specialised Corpora,” in The Routledge Handbook of Corpus Linguistics, eds. Anne O’Keefe and Michael McCarthy (New York, NY: Routledge, 2010), 68.

39. Ibid., 67.

40. John Sinclair, “Corpus and Text: Basic Principles,” Developing Linguistic Corpora: A Guide to Good Practice, ed. Martin Wynne, AHDS Literature, Language and Linguistics, 2004, http://www.users.ox.ac.uk/~martinw/dlc/index.htm (accessed October 14, 2013).

All this and more help the exegete to build a corpus that is representative of the language to be

studied. While not impossible, this is difficult to accomplish with single books of the Bible;

it is more feasible with larger corpora such as the Gospels, the Pauline epistles, or the entire NT. A

sub-consideration of the corpus is the register. Armed with a good understanding of register, the

exegete can often learn to expect the language that will occur in the text. This can help pre-empt

and predict certain patterns that arise in WOS so the exegete does not mistakenly take an

occurrence or set of occurrences as unique when they actually fit the form of the given text.  41

Secondly, understand the importance of the query itself. As illustrated earlier, searching

for word form, lemma, root, or semantic domain will significantly affect the results that are

generated. This is particularly the case with highly inflected languages like ancient Greek.

Thirdly, understand the importance of the format. WOS can be formatted so as to only

show isolated data about particular words and thus convey little to no exegetical information. As

stated previously, WOS that consist of bare word counts are almost useless. The word

occurrences must be placed in relation to the words around them via ratios, percentages, lexical

bundles, or compared to other similar corpora (for example, comparing word counts in

Philippians to word counts in Ephesians).
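
To illustrate what placing a word “in relation to the words around it” might look like, the sketch below counts three-word bundles that contain a target form. It is an assumption-laden illustration rather than a description of any particular tool: the function names are invented, and the tokens are a rough hand transliteration of the opening clause of Phil. 3:1 and of Phil. 4:4 rather than output from a tagged corpus.

```python
from collections import Counter

# Rough transliterations entered by hand (the scheme is an assumption of this sketch).
PHIL_3_1A = ["to", "loipon", "adelphoi", "mou", "chairete", "en", "kurio"]
PHIL_4_4 = ["chairete", "en", "kurio", "pantote", "palin", "ero", "chairete"]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bundles_with(target: str, passages: list[list[str]], n: int = 3) -> Counter:
    """Count the n-word bundles that contain the target form."""
    counts: Counter = Counter()
    # Each passage is processed separately so no bundle spans a passage boundary.
    for tokens in passages:
        counts.update(gram for gram in ngrams(tokens, n) if target in gram)
    return counts


bundle_counts = bundles_with("chairete", [PHIL_3_1A, PHIL_4_4])

for bundle, count in bundle_counts.most_common():
    print(" ".join(bundle), count)
```

In this small sample the bundle “chairete en kurio” is the only one that recurs, which says more about how the verb is used than the bare count of the verb does.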

Fourthly, understand the meaning of WOS. Word occurrence statistics function in two

primary ways: (1) as descriptors of defined texts and (2) as indicators of specific usages.  In 42

other words, WOS can be used to describe the nature of a defined text (corpus) by highlighting

41. As stated previously, discussions of predictive analysis arise most often over Paul’s epistolary greetings. The words, word forms, and syntax occur in his greetings as they do because they fit a certain epistolary form that can be expected. In fact, when Paul deviates from the given form, it draws attention.

42. These divisions correspond to the standard way of classifying statistics in: Ray L. Carpenter and Ellen Storey Vasu, Statistical Methods for Librarians (Chicago, IL: American Library Association, 1978), 1.

particular characteristics, namely, the number of times a word occurs within the given corpus.

Alternatively, WOS can be used to indicate (as a table of contents) specific usages of a particular

word.  Either descriptively or indicatively, WOS are not probative. They cannot convey 43

meaning on their own; they can only corroborate methods that include sufficient contextual

evidence to obtain meaning. WOS are best used in conjunction with lexical, grammatical, and

discourse analytical methods. Dealing with a related issue (reading vis-à-vis hypermedia),

Christian Vandendorpe writes, “A reader is in essence someone who devotes a certain amount of

time to perceiving, comprehending, and interpreting signs organized in the form of a message.”  44

Now, statistics certainly represent signs in an organized format; however, it is an altogether

different organization from the one originally intended by the author, in which meaning is found.

Vandendorpe goes on to say, “The richer the cognitive context, the stronger the possibilities for

the production of meaning; but if context is lacking, these possibilities tend toward zero.”  An 45

apt analogy might be that of a detective and an analyst. A detective follows clues in order to

catch a car thief. An analyst detects patterns in car theft statistics.

Lastly, communicate the essential methodological details to the reader. It is a standard

expectation in any scientific analysis that the employed methods are articulated so that readers

and future analysts can re-create the data and test the conclusions. It seems reasonable to expect

the same from biblical scholars who employ quantitative methods to study a given text. To return

to the opening illustration, A. T. Robertson had a firm grasp of the theme of Philippians without

43. This method of using statistics is most common in lexicography. In fact, this tool is a major contributor to the value of The Perseus Project.

44. Christian Vandendorpe, From Papyrus to Hypertext: Toward the Universal Digital Library, trans. Phyllis Aronoff and Howard Scott (Chicago, IL: University of Illinois Press, 2009), 109.

45. Ibid., 111.

the use of digital tools and Silva’s use of WOS supported the hypothesis. However, it seems that

Robertson, Silva, and all biblical scholars could help their readers (and future exegetes) by

articulating their methods and tools.
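
As one possible shape for such methodological reporting, the details of a query could be recorded in a small structured note alongside the statistic. The sketch below is a suggestion only: the class and field names are invented here, the lemmas are given in rough transliteration, and the values are those reported in this paper (a lemma search of Philippians on La Parola yielding sixteen occurrences).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueryRecord:
    """Hypothetical record of how a word occurrence statistic was obtained."""
    corpus: str        # the text or collection that was searched
    query_type: str    # word form, lemma, root, or semantic domain
    terms: tuple       # the search terms that were combined
    tool: str          # the tool used to run the query
    accessed: str      # date of access
    hits: int          # total occurrences returned


record = QueryRecord(
    corpus="Philippians",
    query_type="lemma",
    terms=("chara", "chairo", "sunchairo"),
    tool="La Parola, http://www.laparola.net/greco",
    accessed="2013-11-08",
    hits=16,
)

# Serialise the record so it can be printed in a footnote or an appendix.
report = json.dumps(asdict(record), indent=2)
print(report)
```

A note of this kind takes only a few lines, yet it gives a later reader everything needed to rerun the query and compare results.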

Bibliography

Beare, F. W. The Epistle to the Philippians. In Black’s New Testament Commentaries Series, edited by Henry Chadwick. London, England: A. & C. Black, 1959.

Biber, Douglas. “Corpus-Based and Corpus-Driven Analyses of Language Variation and Use.” In The Oxford Handbook of Linguistic Analysis, edited by Bernd Heine and Heiko Narrog. Oxford, England: Oxford University Press, 2010. 159–91.

Black, David Alan. “The Discourse Structure of Philippians: A Study in Textlinguistics.” Novum Testamentum 37, no. 1 (January 1995): 16–49.

–––. Linguistics for Students of New Testament Greek. 2nd ed. Grand Rapids, MI: Baker Books, 1988. 170–196.

–––, Katharine Barnwell and Stephen Levinsohn, eds. Linguistics and New Testament Interpretation: Essays on Discourse Analysis. Nashville, TN: Broadman Press, 1992.

Carpenter, Ray L. and Ellen Storey Vasu. Statistical Methods for Librarians. Chicago, IL: American Library Association, 1978.

Danker, Frederick W. A Century of Greco-Roman Philology. Atlanta, GA: Scholars Press, 1988.

Fee, Gordon D. Paul’s Letter to the Philippians. In The New International Commentary on the New Testament Series, edited by Ned Stonehouse, F. F. Bruce, and Gordon Fee. Grand Rapids, MI: Wm. B. Eerdmans, 1995.

Heil, John Paul. Philippians: Let Us Rejoice in Being Conformed to Christ, Early Christianity and Its Literature, no. 3. Atlanta, GA: Society of Biblical Literature, 2010.

Koester, Almut. “Building Small Specialized Corpora.” In The Routledge Handbook of Corpus Linguistics, edited by Anne O’Keefe and Michael McCarthy. New York, NY: Routledge, 2010. 66–79.

Loh, I-Jin and Eugene A. Nida. A Handbook on Paul’s Letter to the Philippians. New York, NY: United Bible Societies, 1977.

O’Donnell, Matthew Brook. Corpus Linguistics and the Greek of the New Testament. New Testament Monographs. Number 6, edited by Stanley E. Porter. Sheffield, England: Sheffield Phoenix Press, 2005.

Plotkin, Wendy. “Electronic Texts in the Historical Profession.” In Computing in the Social Sciences and Humanities, edited by Orville Burton, 87–123. Chicago, IL: University of Illinois Press, 2002.

Reed, Jeffrey T. “Identifying Theme in the New Testament: Insights from Discourse Analysis.” In Discourse Analysis and Other Topics in Biblical Greek, edited by Stanley Porter and D. A. Carson, 75–101. Sheffield, England: Sheffield Academic Press, 1995.

Robertson, A. T. A Grammar of the Greek New Testament in the Light of Historical Research. 1934. Reprint, Nashville, TN: Broadman Press, 2010.

–––. Paul’s Joy in Christ. Revised and edited by W. C. Strickland. Nashville, TN: Broadman Press, 1917.

Sinclair, John. “Corpus and Text: Basic Principles.” Developing Linguistic Corpora: A Guide to Good Practice. Edited by Martin Wynne. AHDS Literature, Language and Linguistics. Accessed October 14, 2013. http://www.users.ox.ac.uk/~martinw/dlc/index.htm

Trenchard, Warren C. Complete Vocabulary Guide to the Greek New Testament. Revised and edited. Grand Rapids, MI: Zondervan, 1998.

Vandendorpe, Christian. From Papyrus to Hypertext: Toward the Universal Digital Library, translated by Phyllis Aronoff and Howard Scott. Chicago, IL: University of Illinois Press, 2009.

Vincent, Marvin R. The Epistles to the Philippians and to Philemon. In The International Critical Commentary Series, edited by Samuel Driver, Alfred Plummer, and Charles Briggs. 1897. Reprint. Edinburgh, Scotland: T&T Clark, 1976.

Wilson, Richard. La Parola. http://www.laparola.net/greco (accessed November 8, 2013).