DATeCH 2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)


Presentation of the paper Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST) by Maximilian Hadersbeck, Alois Pichler, Florian Fink and Øyvind Liland Gjesdal at DATeCH 2014.

TRANSCRIPT

Page 1: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Presentation DaTECH2014

Florian Fink

Centrum für Informations- und Sprachverarbeitung (CIS), LMU

May 19, 2014

Florian Fink Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST) 1/20

Page 2: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Ludwig Wittgenstein

The Austrian philosopher Ludwig Wittgenstein (1889-1951) left behind 20,000 pages of philosophical manuscripts and typescripts, the so-called Wittgenstein Nachlass.

• In 2000, the Wittgenstein Archives at the University of Bergen (WAB) published this Nachlass as an electronic edition, the Bergen Electronic Edition (BEE).

• In 2009, WAB additionally made 5,000 pages freely available on the web.

• In 2010, a cooperation on the Nachlass started between Dr. Alois Pichler and the Center for Information and Language Processing.

Page 3: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Wittgenstein in Co-Text

The Wittgenstein in Co-Text project is a cooperation between the Wittgenstein Archive Bergen and the Center for Information and Language Processing, Ludwig-Maximilians University of Munich. Its goal is to develop new tools that help researchers from different fields to explore and research the works of Ludwig Wittgenstein.

• Dr. Maximilian Hadersbeck (LMU München)

• Dr. Alois Pichler (Wittgenstein Archive Bergen)

• Øyvind Liland Gjesdal (Wittgenstein Archive Bergen)

• Florian Fink (LMU München)

Page 4: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Wittgenstein Advanced Search Tools (WAST)

To provide new possibilities for researchers, the Wittgenstein Advanced Search Tools (WAST) offer local-grammar-based search capabilities on the Nachlass.

• A simple interface, even for complex queries

• Lemmatized as well as simple full-text search

• Integration of semantic and syntactic information

• Integration of part-of-speech information

• Presentation of the search results within the original documents

Page 5: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

The front end of WAST I

The web-based front end of WAST provides access to the text and displays the results of queries. The results are shown in their normalized form: the sentences that contain the query, together with a small snippet of the original document.

Page 6: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

The front end of WAST II

Each sentence of the result set contains various links to external Wittgenstein resources.

• The corresponding sentence in Wittgenstein Source at the University of Bergen

• Pundit.

• A highlighted sentence within the original facsimile.

Page 7: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Highlighting of the facsimile

The highlighter shows the results of queries directly in scans of the original facsimile page. This lets researchers see their search results in a wider context.

Page 8: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Expanding of the search results

The normalized view can also be expanded to study the context of the results further. The researcher can then examine the text and see the changes and alternatives added by Ludwig Wittgenstein.

Page 9: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Searching the Nachlass

The tool that searches the Nachlass of Ludwig Wittgenstein is called Wittfind.

It is used to search the text for words and phrases within sentences, since Ludwig Wittgenstein sees sentences as central to the meaning of words (Tractatus logico-philosophicus [22, 3.3]):

Nur der Satz hat Sinn; nur im Zusammenhang des Satzes hat ein Name Bedeutung.¹

¹ Only propositions have sense; only in the nexus of a proposition does a name have meaning.

Page 10: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Wittfind

To find phrases, Wittfind internally executes search graphs on the text. These search graphs support concatenations, alternatives and repeated sequences of tokens in queries.

You can specify queries that

• search for a token A or B

• search for a token A followed by a token B

• search for a token A followed by zero or more occurrences of a token B
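The three query types above can be sketched with a small recursive matcher over token lists. This is only an illustration: the nested-tuple pattern format and the names `tok`, `alt`, `seq`, `star`, `ends` and `find` are assumptions made for this sketch, not Wittfind's actual query syntax or implementation.

```python
# Hypothetical pattern constructors (not Wittfind's real query syntax).
def tok(word):
    return ("tok", word)          # a single token

def alt(p, q):
    return ("alt", p, q)          # A or B

def seq(p, q):
    return ("seq", p, q)          # A followed by B

def star(p):
    return ("star", p)            # zero or more occurrences of p

def ends(pattern, tokens, start):
    """Return every position at which `pattern` can end if it starts at `start`."""
    kind = pattern[0]
    if kind == "tok":
        hit = start < len(tokens) and tokens[start] == pattern[1]
        return {start + 1} if hit else set()
    if kind == "alt":
        return ends(pattern[1], tokens, start) | ends(pattern[2], tokens, start)
    if kind == "seq":
        return {end
                for middle in ends(pattern[1], tokens, start)
                for end in ends(pattern[2], tokens, middle)}
    if kind == "star":
        reached, frontier = {start}, {start}
        while frontier:  # repeat the inner pattern until no new position appears
            step = {end for pos in frontier
                    for end in ends(pattern[1], tokens, pos)}
            frontier = step - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown pattern kind: {kind}")

def find(pattern, tokens):
    """Return (start, end) of the longest non-empty match at each start position."""
    matches = []
    for start in range(len(tokens)):
        non_empty = [end for end in ends(pattern, tokens, start) if end > start]
        if non_empty:
            matches.append((start, max(non_empty)))
    return matches

tokens = "der Satz hat Sinn Sinn".split()
print(find(seq(tok("hat"), star(tok("Sinn"))), tokens))
```

The matcher works on whole tokens, so a pattern such as `tok("Sinn")` never matches inside a longer word.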

Page 11: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Search graphs

The phrase search of Wittfind is comparable to algorithms that search for regular expressions in text. Regular expressions are transformed into nondeterministic finite automata (NFA), which are then used in the search phase.

Whereas the automata of regular expressions match on single characters, the automata of the search graphs match on the tokens of the text.
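The difference between character-level and token-level automata can be illustrated with one generic simulator that is fed either the characters of a string or a list of tokens. This is a minimal sketch without epsilon transitions; the transition-table layout and the names `run_nfa`, `is_`, `char_nfa` and `token_nfa` are assumptions for illustration, not Wittfind's internal data structures.

```python
def run_nfa(transitions, start, accepting, symbols):
    """Simulate an NFA by tracking the set of currently active states.

    `transitions` maps a state to a list of (predicate, next_state) pairs;
    a predicate decides whether the current input symbol may be consumed.
    """
    active = {start}
    for symbol in symbols:
        active = {next_state
                  for state in active
                  for predicate, next_state in transitions.get(state, [])
                  if predicate(symbol)}
        if not active:
            return False          # no state survived this symbol
    return bool(active & accepting)

def is_(expected):
    """Predicate that accepts exactly one symbol."""
    return lambda symbol: symbol == expected

# Character level: the regular expression "ab*", one character per step.
char_nfa = {0: [(is_("a"), 1)], 1: [(is_("b"), 1)]}

# Token level: "der" followed by zero or more "Satz", one token per step.
token_nfa = {0: [(is_("der"), 1)], 1: [(is_("Satz"), 1)]}

print(run_nfa(char_nfa, 0, {1}, "abbb"))                    # symbols are characters
print(run_nfa(token_nfa, 0, {1}, ["der", "Satz", "Satz"]))  # symbols are tokens
```

Only the input alphabet changes between the two calls; the simulation itself is identical.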

Page 12: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Pattern matching I

Wittfind can draw on different matching policies and resources in its search phase. The token-matching syntax builds on the syntax of Unitex and extends it where appropriate.

• simple string matching

• matching on part-of-speech tags

• matching on regular expressions

• matching on special token classes (words, numbers, etc.)
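The four matching policies listed above can each be modeled as a predicate on a token. The sketch below is illustrative only: the `Token` type, the tag names and the helper functions are hypothetical and do not reproduce the Unitex-based syntax that Wittfind actually accepts.

```python
import re
from typing import Callable, NamedTuple, Optional

class Token(NamedTuple):
    text: str
    pos: Optional[str] = None     # part-of-speech tag, if the text is tagged

Matcher = Callable[[Token], bool]

def string(expected: str) -> Matcher:
    """Simple string matching."""
    return lambda token: token.text == expected

def pos_tag(tag: str) -> Matcher:
    """Matching on a part-of-speech tag (tag names are assumed here)."""
    return lambda token: token.pos == tag

def regex(pattern: str) -> Matcher:
    """Matching the whole token against a regular expression."""
    compiled = re.compile(pattern)
    return lambda token: compiled.fullmatch(token.text) is not None

def word_class() -> Matcher:
    """Special token class: alphabetic words."""
    return lambda token: token.text.isalpha()

def number_class() -> Matcher:
    """Special token class: numbers written in digits."""
    return lambda token: token.text.isdigit()

sentence = [Token("Satz", "N"), Token("3", "NUM"), Token("denken", "V")]
for matcher in (string("Satz"), pos_tag("V"), regex(r".*en"), number_class()):
    print([token.text for token in sentence if matcher(token)])
```

Because every policy has the same shape, such predicates can label the edges of a token-level automaton interchangeably.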

Page 13: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Pattern matching II

Apart from simple string-based matching of tokens, Wittfind uses a background dictionary for additional morphological, syntactic and semantic information.

• lemmatized matching (“dachte” matches queries for its lemma “denken”)

• inverse lemmatized matching (“gedacht” matches “dachte” because they share the common lemma “denken”)

• matching on morphological and syntactic forms (verbs, adjectives, etc.)

• matching based on the semantics of the token as provided in the dictionary
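Lemmatized and inverse lemmatized matching can be sketched with a toy full-form dictionary that maps surface forms to lemmas. The dictionary entries and the function names below are assumptions for illustration; the real background dictionary is far larger and also carries morphological and semantic information.

```python
# Toy full-form dictionary: surface form -> lemma (illustrative entries only).
FULL_FORMS = {
    "denken": "denken",
    "dachte": "denken",
    "gedacht": "denken",
    "Satz": "Satz",
    "Sätze": "Satz",
}

def lemma_of(form):
    """Look up the lemma of a surface form; None if the form is unknown."""
    return FULL_FORMS.get(form)

def matches_lemma(token, query_lemma):
    """Lemmatized matching: the token's lemma equals the queried lemma."""
    return lemma_of(token) == query_lemma

def matches_inverse_lemma(token, query_form):
    """Inverse lemmatized matching: token and query form share a lemma."""
    lemma = lemma_of(query_form)
    return lemma is not None and lemma_of(token) == lemma

print(matches_lemma("dachte", "denken"))           # "dachte" has lemma "denken"
print(matches_inverse_lemma("gedacht", "dachte"))  # both share "denken"
```

Unknown forms never match in this sketch, which avoids two out-of-dictionary words being treated as sharing a lemma.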

Page 14: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Queries

While it is possible to draw search graphs directly and execute them², Wittfind also accepts flat queries.

These queries are transformed into search graphs automatically. The algorithm used for this is similar to the algorithms that create NFAs from regular expressions.
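One well-known way to create an NFA from a regular expression is Thompson's construction, and the sketch below applies it to whole tokens. The flat query syntax used here (whitespace-separated tokens, `|` for alternatives, a postfix `*`, and parentheses) is an assumption for this example and is not Wittfind's real query language; the sketch also does no error handling for malformed or empty queries.

```python
import re
from itertools import count

EPS = None  # label of an epsilon edge, which consumes no token

def compile_query(query):
    """Thompson-style construction: flat query -> (start, accept, edges)."""
    parts = re.findall(r"[()|*]|[^\s()|*]+", query)
    ids = count()
    edges = []                    # (source, label, target) triples
    pos = 0

    def new_pair():
        return next(ids), next(ids)

    def parse_alt():              # alt := seq ('|' seq)*
        nonlocal pos
        frag = parse_seq()
        while pos < len(parts) and parts[pos] == "|":
            pos += 1
            other = parse_seq()
            s, a = new_pair()
            edges.extend([(s, EPS, frag[0]), (s, EPS, other[0]),
                          (frag[1], EPS, a), (other[1], EPS, a)])
            frag = (s, a)
        return frag

    def parse_seq():              # seq := item+
        frag = parse_item()
        while pos < len(parts) and parts[pos] not in ("|", ")"):
            following = parse_item()
            edges.append((frag[1], EPS, following[0]))
            frag = (frag[0], following[1])
        return frag

    def parse_item():             # item := (TOKEN | '(' alt ')') '*'?
        nonlocal pos
        if parts[pos] == "(":
            pos += 1
            frag = parse_alt()
            pos += 1              # skip the closing ")"
        else:
            s, a = new_pair()
            edges.append((s, parts[pos], a))
            pos += 1
            frag = (s, a)
        if pos < len(parts) and parts[pos] == "*":
            pos += 1
            s, a = new_pair()
            edges.extend([(s, EPS, frag[0]), (s, EPS, a),
                          (frag[1], EPS, frag[0]), (frag[1], EPS, a)])
            frag = (s, a)
        return frag

    start, accept = parse_alt()
    return start, accept, edges

def closure(states, edges):
    """All states reachable from `states` through epsilon edges only."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, tokens):
    """True if the whole token sequence is matched by the search graph."""
    start, accept, edges = nfa
    active = closure({start}, edges)
    for token in tokens:
        moved = {dst for src, label, dst in edges
                 if src in active and label == token}
        active = closure(moved, edges)
    return accept in active

nfa = compile_query("der (Satz | Name) hat*")
print(accepts(nfa, ["der", "Name", "hat", "hat"]))
```

Each operator adds a constant number of states and epsilon edges, so the graph grows linearly with the length of the query.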

² There is an experimental online graph editor that allows you to do exactly this.

Page 15: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Wittfind client-server

Wittfind uses additional lexical resources in the form of a background dictionary. To avoid loading these resources again for each query, Wittfind is split into a server part and a client part.

• The server loads all resources and texts once and optimizes their usage.

• The client applies a preprocessing step to the query and sends it to the server for execution.

• The client applies the matches returned by the server to the text and displays the results of the query.
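The division of labor described above can be sketched in a single process, with a plain method call standing in for the network connection. Everything in this example is hypothetical: the class names, the lemma-only query, the lowercasing preprocessing step and the bracket highlighting are assumptions chosen to keep the sketch short, not the actual Wittfind protocol.

```python
class Server:
    """Loads the texts and the dictionary once, then answers many queries."""

    def __init__(self, sentences, full_forms):
        self.sentences = [s.split() for s in sentences]  # tokenized once
        self.full_forms = full_forms                     # loaded once

    def execute(self, lemma):
        """Return (sentence index, token index) of every token with this lemma."""
        return [(i, j)
                for i, sentence in enumerate(self.sentences)
                for j, token in enumerate(sentence)
                if self.full_forms.get(token.lower()) == lemma]

class Client:
    """Preprocesses queries, asks the server, and renders the matches."""

    def __init__(self, server, sentences):
        self.server = server          # stands in for a network connection
        self.sentences = sentences

    def search(self, raw_query):
        lemma = raw_query.strip().lower()        # preprocessing step
        hits = self.server.execute(lemma)        # "send" the query
        results = []
        for i in sorted({i for i, _ in hits}):   # apply matches to the text
            marked = {j for k, j in hits if k == i}
            words = self.sentences[i].split()
            results.append(" ".join(f"[{w}]" if j in marked else w
                                    for j, w in enumerate(words)))
        return results

texts = ["Ich dachte an den Satz", "Der Name hat Bedeutung", "Er hat es gedacht"]
forms = {"denken": "denken", "dachte": "denken", "gedacht": "denken"}
client = Client(Server(texts, forms), texts)
print(client.search(" Denken "))
```

The server object is created once and reused for every call to `search`, which is the point of the split: the expensive loading step is not repeated per query.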

Page 16: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Expanding WAST to other texts

WAST as presented here evolved around the work of Ludwig Wittgenstein, and many of its features come directly from the ideas and needs of Wittgenstein researchers.

However, most of the tools are modular and able to work on any text. Wittfind, the core tool for local grammar search, can easily be used to search any compatible collection of texts. Since it returns the sentences of the original document without modifying them, any tool that works on the original text should work on the results as well.

To use all features, one needs:

• A TEI P5-compatible text (optionally with part-of-speech tags)

• An extensive full form dictionary
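As a rough idea of what such an input could look like, the sketch below reads sentences and tagged words from a tiny TEI fragment using Python's standard library. The sample markup is an assumption: it uses the TEI elements `<s>` and `<w>`, but the `pos` attribute name, the tag values and the overall layout are chosen for illustration and may differ from the encoding WAST expects.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample; real Nachlass transcriptions are encoded far more richly.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <s n="1"><w pos="ART">Der</w> <w pos="N">Satz</w> <w pos="V">hat</w> <w pos="N">Sinn</w></s>
    <s n="2"><w>Ein</w> <w>Name</w> <w>hat</w> <w>Bedeutung</w></s>
  </p></body></text>
</TEI>"""

def read_sentences(xml_text):
    """Return one list of (word, pos) pairs per <s> element; pos may be None."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iterfind(".//tei:s", TEI_NS):
        words = [(w.text, w.get("pos")) for w in s.iterfind("tei:w", TEI_NS)]
        sentences.append(words)
    return sentences

for sentence in read_sentences(SAMPLE):
    print(sentence)
```

The second sentence carries no tags, which mirrors the "optionally with part-of-speech tags" requirement: the reader still returns the words, with `None` in place of the tag.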

Page 17: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Further work on WAST

The development of WAST has always been supported by students of the CIS, some of whom centered their bachelor’s or master’s theses on different aspects of the tool set. Other projects are finished or nearing completion:

• Wittfind webservice

• Facsimile Highlighter (M. Lindinger)

• Facsimile Reader based on the Highlighter

• Large help page with a collection of example search queries (A. Krey)

• Semantic search offering simple usage of the different semantic classes in WAST, including specialized classes to explore Ludwig Wittgenstein’s theory of color (A. Krey)

• Online graph editor (Y. Kalasouskaya)

Page 18: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Online graph editor

The online graph editor is a tool that allows people to draw search graphs and execute them directly on the Nachlass.

Page 19: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Urls

• Wittgenstein Archive Bergen http://wab.uib.no/

• Wittgenstein Source http://www.wittgensteinsource.org

• Pundit http://feed.thepund.it/

• WAST http://wittfind.cis.lmu.de

• CIS http://www.cis.lmu.de

• Wittgenstein in Co-Text http://www.cis.uni-muenchen.de/forschung/ehumanities/research-group-co/index.html

• Graph search presentation: http://www.cis.uni-muenchen.de/kurse/max/scholarship/finkwf.pdf

• Wittgenstein scholarship: http://wastwiki.cis.uni-muenchen.de/wiki/Scholarship

Page 20: Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein Advanced Search Tools (WAST)

Thanks for your attention!
