From keyword searching to discourse mining
![Page 1: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/1.jpg)
From keyword searching to discourse mining
Pim Huijnen, Juliette Lonij
DH2016, Kraków 15 July 2016
![Page 2: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/2.jpg)
From: The oasis, 13 April 1912, p.9. Chronicling America: Historic American Newspapers. Lib. of Congress.
![Page 3: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/3.jpg)
![Page 4: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/4.jpg)
Tangherlini, T. R. and Leonard, P. (2013). Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research. Poetics, 41: 725-749.
Van den Hoven, M., Van den Bosch, A. and Zervanou, K. (2010). Beyond Reported History: Strikes That Never Happened. Proceedings of the First International AMICUS Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts, Vienna: 20-28.
Wiedemann, G. and Niekler, A. (2014). Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries. Terminology and Knowledge Engineering, Berlin, June 2014: https://hal.archives-ouvertes.fr/hal-01005879.
![Page 5: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/5.jpg)
Goals of the researcher-in-residence project
Using extensive, context-specific word lists (‘dictionaries’) to overcome the contingency of single keywords
Developing a script to extract dictionaries from literature based on topic modeling
Experimenting with tools to visualise the results of dictionary searching in kranten.delpher.nl
![Page 6: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/6.jpg)
KB researcher-in-residence project
Flexibility (evaluation based on human expertise)
Transparency (avoiding black-boxing)
Practicality (available to the wider public)
![Page 7: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/7.jpg)
Script to extract dictionaries
[Diagram labels: A, B, Topic modeling, TF-IDF]
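To make the TF-IDF label on this slide concrete, here is a minimal sketch of TF-IDF keyword ranking using only the Python standard library. It is not the project's script: the function names, the split into "target" and "background" documents, and the tokenisation rule are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_keywords(target_docs, background_docs, top_n=10):
    """Rank words from target_docs by TF-IDF (hypothetical helper).

    Term frequency is counted over the target documents only; document
    frequency is counted over target and background documents together.
    """
    all_docs = [set(tokenize(d)) for d in target_docs + background_docs]
    n_docs = len(all_docs)
    doc_freq = Counter(word for doc in all_docs for word in doc)

    term_freq = Counter()
    for doc in target_docs:
        term_freq.update(tokenize(doc))

    scores = {
        word: count * math.log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

Words that occur in every document get a score of zero, so frequent function words sink to the bottom of the ranking while words typical of the target documents rise to the top.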
![Page 8: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/8.jpg)
Script to extract dictionaries
[Diagram labels: BC]
![Page 9: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/9.jpg)
Visualising results of dictionary searches in Delpher
Use an OR-query to search the KB’s newspaper corpus
Visualise results on the basis of Solr’s relevance score (minimum number of words)
(arbeid* OR bedrij* OR beheer OR controle* OR factor* OR functie* OR kost* OR leiding* OR loon* OR maatregel* OR management OR methode* OR model* OR norm* OR organisatie* OR plannen OR prijs OR productie OR rationeel OR rendement OR reorganisatie OR statistiek OR taylor OR tijd OR werkbesparing OR werkverdeeling)
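The query above can be assembled from a word list, and the "minimum number of words" idea can be approximated locally. The sketch below is illustrative only: the helper names are hypothetical, and counting distinct matching terms in an article's text is a stand-in for, not a reproduction of, Solr's relevance scoring.

```python
import re


def build_or_query(terms):
    """Join dictionary terms into one parenthesised OR-query string."""
    return "(" + " OR ".join(terms) + ")"


def term_to_regex(term):
    """Turn a term with an optional trailing * into a word-level regex."""
    if term.endswith("*"):
        # Trailing wildcard: match any word starting with the prefix.
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def matching_terms(text, terms):
    """Return the dictionary terms that occur at least once in the text."""
    return [t for t in terms if term_to_regex(t).search(text)]


def passes_threshold(text, terms, min_terms=3):
    """Keep an article only if enough distinct dictionary terms match."""
    return len(matching_terms(text, terms)) >= min_terms
```

For example, `build_or_query(["arbeid*", "loon*", "taylor"])` yields `(arbeid* OR loon* OR taylor)`, and `matching_terms` shows which combination of dictionary words a given article actually contains.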
![Page 10: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/10.jpg)
kbresearch.nl/dictionary
![Page 11: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/11.jpg)
Challenges
Running an OR-query of 25 or more words (preferably many more) on a dataset of 90,000,000+ documents
Accounting for particularities of the corpus:
* number of newspaper titles per year
* changes in newspaper titles over the years
* changes in article length over the years
Getting an idea of the exact combination of words in the visualised results
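One common way to account for a corpus that grows and shrinks over time is to report hits relative to the number of articles per year instead of raw counts. The sketch below assumes two hypothetical inputs, yearly hit counts and yearly article totals; the slides do not say whether the project normalised its results this way.

```python
def relative_frequency(hits_per_year, articles_per_year):
    """Divide yearly hit counts by the yearly number of articles.

    Both arguments are dicts keyed by year (hypothetical input format).
    Years without a positive article total are skipped.
    """
    result = {}
    for year, hits in hits_per_year.items():
        total = articles_per_year.get(year, 0)
        if total > 0:
            result[year] = hits / total
    return result
```

With 50 hits in each of two years but four times as many articles in the second, the relative frequency in the second year is a quarter of that in the first, a difference the raw counts would hide.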
![Page 12: From keyword searching to discourse mining](https://reader031.vdocuments.mx/reader031/viewer/2022021922/58edbc3f1a28abab488b45ed/html5/thumbnails/12.jpg)
Thank you!
https://github.com/jlonij/keyword_generator
http://blog.kbresearch.nl/
http://www.pimhuijnen.com