Data Analysis in the Hebrew Bible


Post on 16-Nov-2014






Joint work with Martijn Naaijer (VU University). With the Hebrew Bible encoded in the Linguistic Annotation Framework (LAF-ISO), and with a new LAF processing tool, we demonstrate how to do practical data analysis. The tool, LAF-Fabric, integrates with the IPython notebook approach. Our example here is a lexeme cooccurrence analysis of the books of the Bible. For now, the road from data to visualization matters more than the exact visualization.


  • 1. DATA ANALYSIS IN THE HEBREW BIBLE. CLIN 2014-01-17. Dirk Roorda (DANS/TLA), Martijn Naaijer and Gino Kalkman (VU ETCBC)
  • 2. RESEARCH @just started
  • 3. EXEGESIS: preaching the word of God; the devil is in the details; meanings of specific words
  • 4. DISTANT READING: scan large quantities of text; find patterns; signals in the noise; study other aspects than meaning: text transmission, linguistic variation, literary form
  • 5. VARIATION IN BIBLICAL HEBREW: Timespan of Hebrew Bible writing: ~1000 years. Assumption: we can divide the books into 2 groups: EBH (Early Biblical Hebrew) and LBH (Late Biblical Hebrew)
  • 6. "PROOF": Select some features that differ for EBH and LBH. Risk of circularity. We need data analysis that is comprehensive (not eclectic) and critical (not everything is a signal)
  • 7. SYNTACTIC VARIATION: syntactic features; drivers of change; phrase, clause, text; diachrony; variation; large units; chapters, books; geography, demography
  • 8. THE HEBREW BIBLE AS DATA
  • 9. THE HEBREW BIBLE IN LAF: LAF ISO 24612:2012; SHEBANQ (github); 2.27 GB; 1.5 M nodes; 1.5 M edges; 40 M features; 400 K words; 13 M XML ids
  • 10. PROCESSING LAF: it is XML, but not document-like (not like TEI) and not database-like (not nice for XQuery); it is graph-like
  • 11. PROCESSING LAF: eXist (>30 min loading time, simple queries >60 min); indexes needed, but which ones; tried POIO (>60 min loading time, needs >20 GB RAM); straightforward object-oriented in Python; scripting language overhead
  • 12. LAF-FABRIC: LAF-Fabric; also Python; loads in a few seconds; uses C-like arrays; executes in a few seconds on a laptop; can run in a Terminal, as an IPython notebook
  • 13. gender notebook
  • 14. COOCCURRENCES: 1 Common Nouns; 2 Proper Nouns. Nodes are books. Edges are cooccurrences of lexemes (1 or 2)
  • 15. WEIGHTED EDGES: S(lex): number of books containing lex; C(b1, b2): intersection of lexemes of b1 and b2; L(b1, b2): union of lexemes of b1 and b2
  • 16-22. cooccurrences notebook
  • 23. Common Nouns no weight
  • 24. Common Nouns with weight
  • 25. Proper Nouns no weight
  • 26. Proper Nouns with weight
  • 27. DATA-DRIVEN THEOLOGY. dirk.roorda@dans.knaw.nl. Thank You
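
The definitions on slides 14 and 15 (nodes are books; S(lex), C(b1, b2), L(b1, b2)) can be illustrated with a small sketch in plain Python. The transcript does not preserve the actual weight formula, so the two formulas below are assumptions chosen for illustration: the unweighted edge is taken as |C| / |L|, and the weighted edge lets each lexeme count for 1 / S(lex), so that lexemes found in many books contribute less. The function names and the toy lexemes are hypothetical, and the sketch does not use LAF-Fabric.

```python
from itertools import combinations


def lexeme_book_counts(books):
    """S(lex): the number of books that contain each lexeme."""
    counts = {}
    for lexemes in books.values():
        for lex in lexemes:
            counts[lex] = counts.get(lex, 0) + 1
    return counts


def cooccurrence_edges(books, weighted=True):
    """Return {(b1, b2): weight} for every pair of books sharing a lexeme.

    ASSUMPTION: the slides define S, C and L but the formula itself is not
    in the transcript. Here weight = mass(C) / mass(L), where mass is the
    set size (unweighted) or the sum of 1 / S(lex) (weighted).
    """
    s = lexeme_book_counts(books)

    def mass(lexemes):
        if weighted:
            return sum(1.0 / s[lex] for lex in lexemes)
        return float(len(lexemes))

    edges = {}
    for b1, b2 in combinations(sorted(books), 2):
        common = books[b1] & books[b2]  # C(b1, b2)
        union = books[b1] | books[b2]   # L(b1, b2)
        if common:
            edges[(b1, b2)] = mass(common) / mass(union)
    return edges


# Toy data: the lexeme names are made up, not taken from the Hebrew Bible data.
books = {
    "Genesis": {"lex_a", "lex_b", "lex_c"},
    "Exodus": {"lex_a", "lex_b", "lex_d"},
    "Esther": {"lex_a", "lex_e"},
}

plain = cooccurrence_edges(books, weighted=False)
heavy = cooccurrence_edges(books, weighted=True)
for pair in sorted(plain):
    print(pair, round(plain[pair], 3), round(heavy[pair], 3))
```

Under these assumed formulas, a lexeme shared by every book (here `lex_a`) adds little to a weighted edge, which is one plausible reason to compare the "no weight" and "with weight" graphs.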