![Page 1: When Humanities Scale - DIGHUMLAB · When Humanities Scale On the Emergence of Analytics in Culture Research Kristo er L Nielbo ... (cultural heritage data)-reliance on qualitative](https://reader035.vdocuments.mx/reader035/viewer/2022070710/5ec699383f83e745073e8850/html5/thumbnails/1.jpg)
When Humanities Scale
On the Emergence of Analytics in Culture Research
Kristoffer L Nielbo
knielbo.github.io/
March 16, 2018
SENSE OF URGENCY
“we are seeing a new wave of DH-related (and data science) investments across Scandinavia”
ANALYTICS FALLACY
“showing that an algorithm achieved state of the art results” instead of “rigorous investigation of why we think a given method gives relevant results over a data set”
SHORT MEMORY
“historyless triumphalism that originates in the newness of the field and is maintained by the digitization, data and eScience hype”
PROGRAM
0.00 HUMANITIES & DATA a few data points are not enough
0.20 CULTURE ANALYTICS an emerging field
0.30 APPLICATIONS* humanities data and computing
0.55 SUMMARY ...
* interrupted by short digressions
HUMANITIES & DATA
– domain knowledge in history, language, literature &c combined with microscopic and (predominantly) qualitative analysis of human cultural manifestations
antithesis to data-intensive research
– research that relies solely on very few data points, a “myopic” perspective and human computation
– the data deluge is transforming knowledge discovery and understanding in every domain of human inquiry
– knowledge discovery depends critically on advanced computing capabilities
a large part of these data are unstructured and fundamentally cultural
– to get additional value from these data, faculties of humanities must become computationally and data literate
– the number of research publications alone makes computational literacy a necessity for the humanities scholar
– publications related to the Gospel of Mark (KJV) > 50K, for a text of ∼ 16,500 words in 16 chapters on 11 pages
– plus a massive increase in digitized cultural heritage databases (libraries, archives, museums)
Big Data or (just) data
– Depending on definition, most humanities data are not Big Data
– they are however “big enough for us”
“Instead of focusing on a ‘big data revolution,’ perhaps it is time we were focused on an ‘all data revolution,’ where we recognize that the critical change in the world has been innovative analytics, using data from all traditional and new sources, and providing a deeper, clearer understanding of our world.”
(Lazer, Kennedy, King & Vespignani 2014)
Archaeology|3D modeling
– humanistic domain experts (archaeologists) who use a research technique (excavation)
– digital technologies have increased the scale and changed the research area
– archaeology and interaction studies currently use Big Data/HPC when the computational needs are present
– scale alone does not necessarily change methods or perspective
– reduce ++data points to a few by relying on our myopic perspective for analysis
– we essentially lack a culture of analytics in the humanities
CULTURE ANALYTICS
In the humanities a culture of analytics is an analytics of culture
We have to study “the dynamics of culturally informed interactions between people, and the cultural expressive forms that result from these interaction ... at scales hitherto unimaginable”
So we need to develop an “intellectually and ethically sound approach to the study of cultures across time and across space, leveraging the enormous gains made in the past decade in computation and machine readable cultural archives, from libraries and museum collections to the born digital cultural expressions of billions of people on the internet”¹
¹ From the Culture Analytics White Papers’ Introduction
– Culture Analytics seeks to understand cultural phenomena as inherently multi-scale and multi-resolution
– preference for micro to macro-movement (“scale from one object”)
CULTURE ANALYTICS
In comparison to analytics proper
- descriptive not predictive
- neither side of the interdisciplinary divide is conceptualized as service
- preference for micro-scale analysis
- predominantly unstructured data
- low-resource varieties/historical perspective (cultural heritage data)
- reliance on qualitative assessment (e.g., hyper-parameters and validation procedures)
... to similar trends (e.g., culturomics, cliodynamics)
- multi-scale/multi-resolution
- data-intensive ethos (scalability matters)
APPLICATIONS
Culture Analytics|Fractal Properties of Lexical Complexity
Philosophy|Latent Semantic Variables
– philosophers and sinologists have been debating the existence of mind-body dualism in classical Chinese philosophy
– with domain experts, unsupervised learning was used to identify a multi-level dualistic semantic space
– one model (LDA) was further utilized to predict class of origin for controversial text slices
History|Predictive Causality & Slow Decay
– historians and media researchers theorize about the causal dependencies between public discourse and advertisement
– time series analysis of keyword frequencies (from seed lists) indicated that for some categories ‘ads shape society’, while other categories merely ‘reflect’
– advertisements show a faster decay (on-off intermittent behavior) than public discourse (long-range dependencies)
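The slide does not say which time series method was used. As a rough illustration of what "predictive" dependence between two keyword series can look like, here is a minimal sketch that compares lagged correlations in both directions. It is a simple stand-in, not a formal causality test, and the series, the names `ads` and `discourse`, and the one-step lag are all made up for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


# Hypothetical yearly keyword frequencies (made-up numbers, not from the study).
ads = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9]
# In this toy case the discourse series repeats the ads series one step later.
discourse = [0] + ads[:-1]

ads_lead = lagged_correlation(ads, discourse, lag=1)        # ads one step ahead
discourse_lead = lagged_correlation(discourse, ads, lag=1)  # discourse one step ahead
```

In this toy setup the ads series predicts the next discourse value better than the reverse, which is the asymmetry a 'shape' versus 'reflect' reading would look for. A real analysis would also need to control for each series' own history.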
digression #1.1
Computational Literacy|Programming & Stats
– every knowledge intensive organization has to break the learning curve, but certain sectors are more challenged
– we out-sourced the task to an international non-profit organization w. years of experience in scientific computing
– promote a common language and import best practice from software development
– unix shell, python and version control
digression #1.2
Computational Literacy|Programming & Stats
GUI → CLI
- novice-friendly visual approach to computer interaction w. a fast learning curve ERROR
- expert-friendly text-based approach to computer interaction w. ++freedom VALID
- CONFLICT break the learning curve through training-intensive, non-intuitive, and specialized tools
- locally, we try to solve this conflict with a mix of science and guerrilla warfare by establishing small, semi-autonomous eScience units that intervene in humanities research
Medieval History|Novelty Detection
– historians debate historical transitions
– Saxo’s Gesta Danorum (c. 1200 AD), history of the Danish royal dynasty
– transition between book 8 or 9?
– transition point or gradual?
– traditional word-level representation is ambivalent
– latent semantic model was trained over sentence windows
– change detection and recurrence plot used to identify phase transition focused in book 9
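The slide names change detection without giving the algorithm. One of the simplest versions of the idea is sketched below: pick the split that leaves the two most homogeneous segments, measured by squared error. This is an illustrative least-squares change point and not necessarily the method used on Gesta Danorum. The `signal` values are invented, standing in for some per-window score derived from a latent semantic model.

```python
def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(series, min_size=2):
    """Return the index that splits the series into the two most homogeneous segments."""
    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates, key=lambda k: sse(series[:k]) + sse(series[k:]))


# Hypothetical per-window scores (made-up values, not from the text itself).
signal = [0.1, 0.2, 0.1, 0.2, 0.1, 0.9, 1.0, 0.9, 1.0, 0.9]
change_at = best_split(signal)  # index of the first window after the shift
```

A single best split assumes one abrupt transition. Whether the change is a point or gradual, as the slide asks, would need more than this, for example comparing the fit against a smooth trend.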
Media Studies|Novelty Detection
– change point detection in topicality space applies to “a change in the media tone”
– train model on 200 years of newspapers in a comparative study between DK and NL
– collaboration between historians, media studies and information science with a predictive scope
digression #2
Copyright & Privacy|Data Access and Mobility
Challenges to computationally empowering humanities:
- technical competencies
- interdisciplinary respect and understanding
- epistemology differences
- data access and mobility
Data silos (the true punishment for the fall of man) often originate in “cultural differences”, not technical or legislative issues
copyright is a bigger challenge than data protection laws
Literary History|Lexical Density
Literary scholars and creativity researchers argue for the “tortured artist”
– “writers’ creative state is inversely related to their emotional state”
– “writers’ creative state depends on their emotional state”
– look for dependencies in lexical density and sentiment scores for highly prolific writers to identify state incongruences
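The slides do not define lexical density. A minimal sketch, assuming it is approximated by the type-token ratio (distinct word forms divided by total word tokens), computed for a whole text and for fixed-size windows. The function names and the sample sentence are hypothetical, and other definitions, such as the share of content words, would need a part-of-speech tagger.

```python
import re


def tokenize(text):
    """Lower-case the text and keep runs of letters as word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lexical_density(text):
    """Type-token ratio: distinct word forms divided by total word tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def windowed_density(text, size):
    """Type-token ratio for consecutive, non-overlapping windows of `size` tokens."""
    tokens = tokenize(text)
    return [
        len(set(tokens[i:i + size])) / size
        for i in range(0, len(tokens) - size + 1, size)
    ]


sample = "the cat saw the dog and the dog saw the cat"
density = lexical_density(sample)        # 5 distinct forms over 11 tokens
per_window = windowed_density(sample, 4)
```

Fixed window sizes matter here because the type-token ratio drops as a text gets longer, so whole works of different length are not directly comparable.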
digression #3
Historical Languages|Low-resource Varieties
– text analytics depends critically on existing tools and data (ex. sentiment dictionaries)
– orthographic variation in historical data represents a challenge, because NLP and TM resources “suffer from presentism”
– projects often try to adapt the tool (ex. modify dictionary to historical data set)
– this solution scales badly due to lack of standardization
For Scandinavian languages we use spelling correction (rule-based and probabilistic) to normalize (or modernize) historical data increasing recall considerably
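To make the normalization idea concrete, here is a minimal sketch that applies a few rewrite rules and then falls back to the closest entry in a modern word list. The rules in `RULES`, the tiny `MODERN_VOCAB` and the `cutoff` value are hypothetical, chosen only for illustration. The similarity lookup uses `difflib` from the Python standard library and merely stands in for the probabilistic step mentioned above; it is not the project's actual pipeline.

```python
import difflib

# Hypothetical rewrite rules and modern word list, for illustration only.
RULES = [("aa", "å"), ("ee", "e"), ("ii", "i")]
MODERN_VOCAB = ["gå", "hus", "sten", "skrive", "året"]


def apply_rules(word):
    """Rule-based step: apply each spelling rewrite in order."""
    for old, new in RULES:
        word = word.replace(old, new)
    return word


def normalize(word, cutoff=0.7):
    """Apply the rules, then fall back to the closest modern form if one is similar enough."""
    candidate = apply_rules(word.lower())
    if candidate in MODERN_VOCAB:
        return candidate
    matches = difflib.get_close_matches(candidate, MODERN_VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else candidate


modernized = [normalize(w) for w in ["gaa", "Steen", "skrifve", "xyz"]]
```

Normalizing the data rather than adapting each tool means existing modern resources, such as sentiment dictionaries, can be reused unchanged on the modernized text.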
Literary Studies|Sentiment Analysis
[Figure: Madame Bovary, sentiment over time. Panel (a): original sentiment series with t = L/400 and t = L/10 (axes: Time, Sentiment). Panel (b): filtered series at t = L/10 and t = 3L/8 (axes: Time, Sentiment). Third panel: log2 F(w) against log2 w, annotated Hs = 0.57 and Hl = 0.74.]
– dictionary-based sentiment analysis can reconstruct narrative/plot vectors that reflect human reading
– basic insights from structural linguistics and narratology can be captured by this approach
– a particular scaling-range, 0.6 < H ≤ 0.8, seems to indicate literary optimality
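A minimal sketch of the dictionary-based step, assuming each sentence is scored by summing word scores and the resulting series is smoothed to expose a plot-like arc. The toy `LEXICON`, the example sentences and the moving-average filter are assumptions made for illustration; the slide does not specify its dictionary or filter, and the scaling analysis behind the H values is not covered here.

```python
# Hypothetical toy sentiment dictionary; real dictionaries are far larger.
LEXICON = {"joy": 2, "love": 3, "hope": 1, "grief": -2, "death": -3, "fear": -1}


def sentence_score(sentence):
    """Sum the dictionary scores of the words in one sentence."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    return sum(LEXICON.get(w, 0) for w in words)


def moving_average(series, window):
    """Smooth a series with a centered moving average, truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


story = [
    "She felt love and hope.",
    "Joy filled the house.",
    "Then came fear.",
    "Grief and death followed.",
]
raw = [sentence_score(s) for s in story]   # one score per sentence
arc = moving_average(raw, window=3)        # smoothed narrative arc
```

The window length plays the role of the time scale t in the figure: short windows keep sentence-level fluctuation, long windows leave only the overall arc.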
Literary History|Sequence Alignment
– there is a new biographical trend in literary history
– using lexical density and sequence alignment, we can compare creative trajectories of authors
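The slide does not say which alignment method is meant. One common way to compare trajectories of different length is dynamic time warping, sketched below as an assumption rather than as the method actually used. The three trajectories are invented lexical density values.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical lexical density trajectories (made-up values).
author_a = [0.4, 0.5, 0.7, 0.6]
author_b = [0.4, 0.4, 0.5, 0.7, 0.7, 0.6]  # same shape, stretched in time
author_c = [0.7, 0.6, 0.4, 0.3]            # different shape

same_shape = dtw_distance(author_a, author_b)
different_shape = dtw_distance(author_a, author_c)
```

Because the warping absorbs differences in pace, two authors with the same creative curve over careers of different length come out as close, while a differently shaped curve does not.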
Anthropology|Language Modeling
– anthropologists discuss why rituals appear rigid, while they seem to maintain behavioral variability
– manual annotation of ritual dance applied to ethnographic video archives from multiple generations
– very few behavioral units are transmitted between generations (compulsory),allowing for both flexibility and rigidity
SUMMARY
Summary
All knowledge-intensive organizations are experiencing the data deluge
- demands new forms of expertise and (strange) bed-fellows
- unique situation where compute and data can empower humanities domain experts and change our scale and perspective
- humanities are part of the solution
BUT,
– scaling (Big Data) alone is not enough (e.g., archaeology, interaction studies)
– we need a culture of analytics
CULTURE ANALYTICS
– cultural behavior and products at scale
– descriptive, transdisciplinary, historical, qualitative
– challenged by lack of training, data access, low-resource varieties
THANK YOU
knielbo.github.io
& credits to
Max R. Echardt and Katrine F. Baunvig, datakube, University of Southern Denmark, DK
Jianbo Gao and Bin Liu, Institute of Complexity Science and Big Data, Guangxi University, CHN
Culture Analytics @ Institute of Pure and Applied Mathematics, UCLA, US