eTRACES-subproject
Text re-use in Literature
Christian Kötteritzsch / Gerhard Lauer / Annette Geßner
Team eTRACES Göttingen
Christian Kötteritzsch: GCDH & Uni Leipzig, ASV
Gerhard Lauer: GCDH & Uni Göttingen, German studies
Annette Geßner: GCDH, Classics
Central question
Analysis of text re-use in German literature
- to understand better how literature makes use of other texts
- to understand better specific re-use of given texts in a large corpus of literature
- to understand better specific types of intertextuality
- to facilitate the identification of (indirect) quotations for editorial purposes
Corpus
zeno.org-corpus (http://www.textgrid.de/en/digitale-bibliothek.html)
includes
fictional texts from Luther to Kafka
Preprocessing of the XML files
through a toolchain to extract and format XML-based corpora
thanks to Frederik Baumgardt (3 person-months)
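The extraction step can be pictured with a minimal sketch. It assumes TEI-style XML with paragraphs in <p> elements inside <body>; the sample document, the namespace handling and the function name extract_paragraphs are illustrative assumptions, not the project's actual toolchain.

```python
import xml.etree.ElementTree as ET

# Assumed TEI namespace; the real corpus schema may differ.
TEI = "http://www.tei-c.org/ns/1.0"

# Invented sample document, for illustration only.
SAMPLE = f"""<TEI xmlns="{TEI}">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Erster   Absatz.</p>
    <p>Zweiter <hi>Absatz</hi>.</p>
  </body></text>
</TEI>"""


def extract_paragraphs(xml_string):
    """Return the plain text of every <p> in <body>, whitespace-normalized."""
    root = ET.fromstring(xml_string)
    body = root.find(f".//{{{TEI}}}body")
    if body is None:
        return []
    paragraphs = []
    for p in body.iter(f"{{{TEI}}}p"):
        # itertext() also collects text nested in inline markup such as <hi>.
        text = "".join(p.itertext())
        paragraphs.append(" ".join(text.split()))
    return paragraphs


paragraphs = extract_paragraphs(SAMPLE)
print(paragraphs)
```

Header metadata is skipped because only <body> is searched; a real pipeline would likely need extra rules for verse lines, stage directions and notes.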
A first idea:
A long term analysis of the emerging autonomous aesthetic in German literature,
especially novels
Text Re-use
- text mining depends on genre and text re-use styles
- to look for text re-use only within a German corpus would miss the many foreign quotations
- but: looking for a simpler starting point,
one book in thousands
Text Re-use
Objectives
Test case: re-use of the Bible in German literature
- find biblical quotations and allusions
- offer a web-based text re-use tool
- online working environment to create a digital edition
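As a rough illustration of the first objective, candidate quotations can be found by comparing word n-grams ("shingles") of Bible verses with a literary passage. This is a generic overlap sketch, not the Tracer algorithm; the function names, the trigram size, the threshold and the invented sample passage are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, n=3):
    """Return the set of all word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_candidates(verses, passage, n=3, min_shared=1):
    """Rank verses by the number of n-grams they share with the passage."""
    passage_shingles = shingles(tokenize(passage), n)
    hits = []
    for ref, verse in verses.items():
        shared = shingles(tokenize(verse), n) & passage_shingles
        if len(shared) >= min_shared:
            hits.append((ref, len(shared)))
    return sorted(hits, key=lambda hit: -hit[1])


# Two sample verses in Luther-style wording, for illustration.
verses = {
    "Psalm 23:1": "Der HERR ist mein Hirte; mir wird nichts mangeln.",
    "Genesis 1:3": "Und Gott sprach: Es werde Licht! Und es ward Licht.",
}
# Invented literary passage (hypothetical, not taken from the corpus).
passage = "Er murmelte: der Herr ist mein Hirte, mir wird nichts mangeln."

candidates = find_candidates(verses, passage)
print(candidates)
```

Exact n-gram overlap only catches near-literal quotations; allusions and paraphrases would need fuzzier matching, for example on lemmas or with allowed gaps.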
Re-use Style
Identify types of biblical re-use by hand
Design a table of quotation styles
Categorize types of "Re-use Style"
Schiller's "Die Räuber" (77 entries)
Fontane's "Effi Briest" (11 entries)
...
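One possible way to hold such a hand-made table in code is sketched below. The record fields, the style labels ("direct quotation", "allusion", "paraphrase") and all sample rows are hypothetical placeholders; the project's actual categories are not listed here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuseEntry:
    """One manually identified instance of biblical re-use (assumed fields)."""
    work: str
    passage: str
    bible_ref: str
    style: str  # hypothetical category label


def style_table(entries):
    """Count entries per work and per re-use style."""
    table = {}
    for entry in entries:
        table.setdefault(entry.work, Counter())[entry.style] += 1
    return table


# Placeholder rows only; these are not real annotations.
entries = [
    ReuseEntry("Work A", "placeholder passage 1", "Psalm 23:1", "direct quotation"),
    ReuseEntry("Work A", "placeholder passage 2", "Genesis 1:3", "allusion"),
    ReuseEntry("Work A", "placeholder passage 3", "Psalm 23:1", "direct quotation"),
    ReuseEntry("Work B", "placeholder passage 4", "Genesis 1:3", "paraphrase"),
]

table = style_table(entries)
for work, counts in table.items():
    print(work, dict(counts))
```

Keeping each entry as a flat record makes it easy to later compare the manual categories with automatically detected candidates.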
Natural language processing
+ Analysis: Tracer-software (ASV, Leipzig)
+ Server: Virtual machine (Gesellschaft für Wissenschaftliche Datenverarbeitung Göttingen [GWDG])
+ Frontend: Google Web Toolkit Framework
Front-End (Mock-Ups)
Possible extensions
+ more texts (and Bibles)
+ more own texts
+ more features
? more crowd editing or more personal edition
? more distant reading: text statistics
--> Any suggestions?
Next milestones
May and June 2012:
- collect different Bible versions
(Zürcher, Allioli, Keppler etc., revisions)
- integrate into the text re-use tool
- clarify server issues with GWDG
- determine re-use style by analysing more genres, historical and other specific re-use styles
Next milestones
Summer 2012:
- first run on folder 'Romane' of zeno.org-corpus
- evaluation and rerun
Until end of 2012:
- develop a statistic sub-tool
- version 1.0 of front end online
Next milestones
Beginning of 2013:
- internal evaluation and optimization
April to July 2013:
- teach a seminar on text re-use (and let students evaluate tool)
- invite editors for test cases
Next milestones
Until end of 2013:
- optimization and bug fixing, finish the tool
- enlarge the corpus of literature and of Bibles
- carry out statistical research cases
- community workshop with editors and text analysts
Until end of project in 2014:
- write final report