Enabling the Production of High-Quality English Glosses of Every Word in the Hebrew Bible
Drayton Benner
President, Miklal Software Solutions
PhD Candidate, Northwest Semitic Philology, University of Chicago
[email protected]

Post on 17-Aug-2015

Hebrew-English Print Interlinear

Structure of Talk

• Requirements for the Enabler
• Tour of the Enabler
• Gloss sample pericope
• Producing algorithmic glosses
• Results and Conclusions

Requirements

Essentials
• Quality Glosses
  – Literal, yet contextual
  – ESV-friendly
• Efficient User
  – Accurate
  – Consistent
  – Quick

Non-essentials
• Aesthetics
• Customizability
• Portability

Satisfying the Requirements

Strategies
• Show the user lots of relevant data compactly
• Allow the user to dig deeper quickly
• Help the user check for consistency
• Provide quality algorithmic glosses

Interlinear Text Editor

Enabler Tour

Interlinear Text Editor (2)

Lexeme Information Table

Detailed Lexeme Information Table

Algorithmic Glossing: Data Sources

• WordNet
• CMU Pronouncing Dictionary
• Miscellaneous lists
  – E.g. irregular plural nouns in English

Algorithmic Glossing: Proper Nouns

• Case 1: consistent past user glosses
  – Easy: follow established user convention
• Case 2: inconsistent past user glosses
  – List possible glosses (ESV, Lexham, past user gloss)
  – Score possible glosses
    • Score ESV and Lexham glosses
    • Score past user glosses
  – Pick the possibility with the highest score
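
The two cases can be sketched roughly as follows. This is only an illustration: the talk does not give the actual scoring scheme, so the weights, the function name choose_proper_noun_gloss, and the tie-breaking rule are all assumptions.

```python
from collections import Counter

# Hypothetical weights; the real scoring scheme is not described in the talk.
SOURCE_WEIGHTS = {"ESV": 2.0, "Lexham": 1.5}
PAST_GLOSS_WEIGHT = 1.0  # assumed weight per past occurrence of a user gloss


def choose_proper_noun_gloss(past_user_glosses, esv_gloss=None, lexham_gloss=None):
    """Pick a gloss for a proper noun from past user glosses and reference sources."""
    past_counts = Counter(past_user_glosses)

    # Case 1: the user has always glossed this lexeme the same way.
    if len(past_counts) == 1:
        return next(iter(past_counts))

    # Case 2: list the candidate glosses and score each one.
    scores = Counter()
    if esv_gloss:
        scores[esv_gloss] += SOURCE_WEIGHTS["ESV"]
    if lexham_gloss:
        scores[lexham_gloss] += SOURCE_WEIGHTS["Lexham"]
    for gloss, count in past_counts.items():
        scores[gloss] += PAST_GLOSS_WEIGHT * count

    if not scores:
        return None  # no candidates at all
    # Highest score wins; ties are broken alphabetically so the result is stable.
    return min(scores, key=lambda gloss: (-scores[gloss], gloss))
```

With these assumed weights, a lexeme glossed inconsistently in the past is resolved in favor of the candidate backed by the most evidence across the user's history and the two reference sources.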

Algorithmic Glossing: Common Nouns

• Possible challenges
  – Shorten glosses with natural language processing
  – Modify lexical form:
    • Make plural (esp. irregular plurals)
    • Indicate Hebrew construct relationship
      – User convention: “of.”
    • Indicate presence of pronominal suffix
      – User convention: (add “+ [object pronoun]”).
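
A minimal sketch of the "modify lexical form" step, under stated assumptions: the irregular-plural table is a tiny hypothetical stand-in for the miscellaneous lists mentioned earlier, the regular-plural rules are simplified, the last word of the gloss is assumed to be the head noun, and the exact output format of the "of" and "+ [object pronoun]" conventions is a guess.

```python
# Hypothetical stand-in for a list of irregular English plurals.
IRREGULAR_PLURALS = {
    "man": "men",
    "woman": "women",
    "child": "children",
    "ox": "oxen",
    "foot": "feet",
    "sheep": "sheep",
}


def pluralize(noun):
    """Return an English plural, checking the irregular list first."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def gloss_common_noun(base, plural=False, construct=False, suffix_pronoun=None):
    """Adapt a lexical-form gloss to the morphology of one Hebrew word."""
    words = base.split()
    if plural and words:
        # Assumption: the last word of the gloss is the head noun.
        words[-1] = pluralize(words[-1])
    gloss = " ".join(words)
    if construct:
        gloss += " of"  # assumed rendering of the construct convention
    if suffix_pronoun:
        gloss += f" + {suffix_pronoun}"  # assumed rendering of the suffix convention
    return gloss
```

For instance, a plural construct form of a word glossed "king" would come out as "kings of", and a singular form with a third-person masculine suffix as "king + him".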

Algorithmic Glossing: Verbs

• Additional challenges
  – Divide by both root and stem, i.e. less usable data
  – Represent subject and pronominal suffixes
  – Pick an English tense
    • User was consistent with:
      – Infinitive constructs
      – Infinitive absolutes
      – Participles (“-ing” forms)
    • But finite verbal forms are much more challenging

Algorithmic Glossing: Verbs (cont.)

• English tense for finite verbal forms:
  • Identify tense of main verb in English verb phrases
    – E.g. “we will have gone”
  • Recognize non-verbal elements translating verbs
    – E.g. in the gloss “we will do quickly”
  • List possible tenses (ESV, Lexham, past user glosses)
  • Score and pick the best
  • Reconstruct English verb phrase with tense
    – E.g. “you will have jumped”
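
The tense-identification and reconstruction steps might look something like the sketch below. It is a simplification and not the Enabler's actual method: the auxiliary table covers only a few tenses, anything after the main verb is treated as a non-verbal element, and the names analyze_verb_phrase and build_verb_phrase are hypothetical. Distinguishing simple past from simple present would need a verb-form lexicon, which is omitted here.

```python
SUBJECTS = {"i", "you", "he", "she", "it", "we", "they"}
THIRD_SINGULAR = {"he", "she", "it"}

# Auxiliary sequences mapped to a tense label (a deliberately small, assumed set).
AUX_TO_TENSE = {
    ("will", "have"): "future perfect",
    ("will",): "future",
    ("have",): "present perfect",
    ("has",): "present perfect",
    ("had",): "past perfect",
}


def analyze_verb_phrase(phrase):
    """Split an English gloss into subject, tense, main verb, and extra words."""
    words = phrase.lower().split()
    subject = words.pop(0) if words and words[0] in SUBJECTS else None

    # Try the longest auxiliary sequence first.
    aux = ()
    for n in (2, 1):
        candidate = tuple(words[:n])
        if candidate in AUX_TO_TENSE:
            aux = candidate
            break

    rest = words[len(aux):]
    return {
        "subject": subject,
        "tense": AUX_TO_TENSE.get(aux, "simple"),
        "main_verb": rest[0] if rest else None,
        "extras": rest[1:],  # non-verbal elements such as adverbs
    }


def build_verb_phrase(subject, tense, base, past_participle):
    """Reconstruct an English verb phrase for the chosen tense."""
    if tense == "future perfect":
        return f"{subject} will have {past_participle}"
    if tense == "future":
        return f"{subject} will {base}"
    if tense == "present perfect":
        aux = "has" if subject in THIRD_SINGULAR else "have"
        return f"{subject} {aux} {past_participle}"
    if tense == "past perfect":
        return f"{subject} had {past_participle}"
    raise ValueError(f"unsupported tense: {tense}")
```

Analyzing “we will do quickly” with this sketch yields the future tense with “do” as the main verb and “quickly” set aside as a non-verbal element, while building the future perfect of “jump” for the subject “you” gives “you will have jumped”.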

Enabler Results

• Quality:
  – Better glosses
  – Greater consistency
    • 500+ previous glosses changed since using the Enabler
• Speed:
  – 58% faster than without the Enabler
    • Despite increased difficulty of material
      – Poetry/prophecy instead of prose

[Figure: bar chart, "User glosses matching data sources in Job 25-Ezekiel 48 (using Enabler)". Axis scale: 0 to 1. Categories: Adjective, Particle, Pronoun, Verb, Common noun, Proper/gentilic noun, Total. Series: Lexham, ESV.]

[Figure: bar chart, "User glosses matching data sources in Job 25-Ezekiel 48 (using Enabler)". Axis scale: 0 to 1. Categories: Adjective, Particle, Pronoun, Verb, Common noun, Proper/gentilic noun, Total. Series: Lexham, ESV, Algorithmic.]
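
Match rates of the kind plotted in these charts could be computed along the following lines. This is a sketch under assumptions: the record layout is hypothetical, and a "match" is taken to mean exact string equality between the user's gloss and the source's gloss, which may not be how the talk's figures were produced.

```python
from collections import defaultdict


def match_rates(records, sources=("Lexham", "ESV", "Algorithmic")):
    """Fraction of user glosses matching each source, by part of speech.

    Each record is a dict with the keys 'pos' and 'user', plus one key per
    source holding that source's gloss (hypothetical layout).
    """
    totals = defaultdict(int)
    matches = defaultdict(lambda: defaultdict(int))

    for rec in records:
        # Count every word under its part of speech and under "Total".
        for group in (rec["pos"], "Total"):
            totals[group] += 1
            for source in sources:
                if rec.get(source) == rec["user"]:
                    matches[group][source] += 1

    return {
        group: {source: matches[group][source] / totals[group] for source in sources}
        for group in totals
    }
```

Each inner dictionary corresponds to one cluster of bars: one value between 0 and 1 per data source for a given part of speech, plus a "Total" cluster over all words.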

Things to Do Differently Next Time

• Faster load time for a chapter
• Stanford Natural Language Processing Tools
• More use of Hebrew context in algorithms

Conclusions

• An enjoyable project because:
  – Required Hebrew and Aramaic
  – Required developing complex algorithms to solve difficult tasks
  – Helped provide greater access to Hebrew/Aramaic sources of the Old Testament