On the Books: Jim Crow and Algorithms of Resistance
![Page 1: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/1.jpg)
Amanda Henley, Head of Digital Research Services
on behalf of the On the Books Project Team
On the Books: Jim Crow and Algorithms of Resistance
![Page 2: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/2.jpg)
Student Workers: Montana Eck, Julia Long, Ashley Mullikin, Siri Nallaparaju, Tim Oyeleke, and Jenna Patton
Project Team
Neil Byers, Graduate Assistant - Documentation and Content Developer
Lorin Bruckner, Data Visualization Services Librarian - Text Analysis and Visualization Expert
Sarah Carrier, North Carolina Research and Instructional Librarian - Special Collections Expert
Rucha Dalwadi, Research Assistant - Documentation and Content Developer
James Dick, Graduate Assistant (& Attorney) - Law review and QA/QC
María R. Estorino, AUL for Special Collections & Director of the Wilson Library - Executive Sponsor and Liaison to the Library Leadership Team
Grant Glass, Graduate Assistant - Text Analysis workflow
Amanda Henley, Head of Digital Research Services - PI and Project Lead
Hannah Jacobs, Graduate Assistant - Content Developer
Matt Jansen, Data Analyst - Text Analysis Expert and Statistician
Steve Segedy, Applications Analyst - Web developer
William Sturkey, Faculty Member of History - Disciplinary Scholar
Kimber Thomas, African American Studies Scholar
Nathan Kelber, Ithaka - Collaborator (former PI and Project Lead)
Funding
![Page 3: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/3.jpg)
About On the Books
Project to make North Carolina legal history accessible as a text corpus.
100+ years of North Carolina public, private, and local session laws
Project Goals:
- Create a corpus of NC Session Laws from 1865/66 to 1967
- Identify discoverable NC segregation statutes during the Jim Crow era using text analysis
![Page 4: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/4.jpg)
Motivated by a reference question:
Where do I find a list of NC Jim Crow laws?
![Page 5: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/5.jpg)
![Page 6: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/6.jpg)
Workflow & Processes
For creating Collection as Data
• Compile Volume List
• Download Images from Internet Archive
• Preprocess Images
  • Identify location of marginalia and paratextual information
  • Rotate as needed
  • Crop image to main text body
  • Add color-matched borders
  • Adjust images to optimize OCR
• OCR over 80,000 Images
Marginalia and paratextual information were removed.
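The crop and border steps listed above can be illustrated with a minimal sketch. It is not the project's actual preprocessing code: it assumes a grayscale page held as a plain list of pixel rows, a crop box that has already been located, and the median value of the cropped region as a simple stand-in for a "color-matched" border. The function name `crop_and_pad` is hypothetical.

```python
def crop_and_pad(pixels, box, border):
    """Crop a grayscale image (list of rows) to box, then add a border.

    box is (left, top, right, bottom), with right and bottom exclusive.
    Assumption: the border is filled with the median value of the cropped
    region, a simple stand-in for a color-matched border.
    """
    left, top, right, bottom = box
    cropped = [row[left:right] for row in pixels[top:bottom]]
    values = sorted(v for row in cropped for v in row)
    fill = values[len(values) // 2]
    width = (right - left) + 2 * border
    padded = [[fill] * width for _ in range(border)]
    for row in cropped:
        padded.append([fill] * border + row + [fill] * border)
    padded.extend([fill] * width for _ in range(border))
    return padded


# Toy page: the outer columns and last row stand in for marginalia,
# and the 3x3 block in the middle stands in for the main text body.
page = [
    [0, 200, 200, 200, 0],
    [0, 200, 50, 200, 0],
    [0, 200, 200, 200, 0],
    [0, 0, 0, 0, 0],
]
result = crop_and_pad(page, box=(1, 0, 4, 3), border=1)
```

In practice an imaging library would do this work on real scans; the nested-list version only shows the order of operations.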
![Page 7: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/7.jpg)
Unit of analysis is individual laws
Used pattern matching to split laws
Extensive post-split cleanup
Results:
• 53,218 chapters
• 297,000 sections
Parse and Annotate Laws
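Splitting OCR text into laws by pattern matching might look roughly like the sketch below. The heading patterns (`CHAPTER 12`, `Section 1.`, `Sec. 2.`) and the function names are assumptions for illustration; the headings in the actual volumes vary, which is presumably why extensive post-split cleanup was needed.

```python
import re

# Hypothetical heading patterns; real session laws vary by volume and year.
CHAPTER_RE = re.compile(r"^CHAPTER[ \t]+(\d+)\.?[ \t]*$", re.MULTILINE)
SECTION_RE = re.compile(r"^(?:SECTION|SEC\.)\s+(\d+)\.", re.MULTILINE | re.IGNORECASE)


def split_on(pattern, text):
    """Return (number, body) pairs, one per heading matched by pattern."""
    matches = list(pattern.finditer(text))
    parts = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts.append((int(m.group(1)), text[m.end():end].strip()))
    return parts


def parse_volume(text):
    """Split text into chapters, then each chapter into sections.

    Text before the first section heading of a chapter (such as the
    act's title) is dropped in this simplified version.
    """
    laws = []
    for chapter_no, chapter_text in split_on(CHAPTER_RE, text):
        for section_no, section_text in split_on(SECTION_RE, chapter_text):
            laws.append(
                {"chapter": chapter_no, "section": section_no, "text": section_text}
            )
    return laws


sample = """CHAPTER 1
An act concerning roads.
Section 1. The county shall maintain roads.
Sec. 2. This act shall be in force from its ratification.
CHAPTER 2
An act concerning schools.
Section 1. The board shall meet annually.
"""

laws = parse_volume(sample)
```

Each resulting record keeps its chapter and section number, so a section can be analyzed on its own while still being traceable to its chapter.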
![Page 8: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/8.jpg)
Text Analysis
Can we determine which laws are Jim Crow?
![Page 9: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/9.jpg)
Requires a training set to teach the algorithm what is/is not a Jim Crow law.
Laws in the training set identified by experts:
• Pauli Murray
• Richard Paschal
• William Sturkey
• Kimber Thomas
Supervised Classification
![Page 10: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/10.jpg)
• To identify the best model, 80% of the training set was used to train models, while 20% was used to assess precision.
• XGBoost model selected for highest precision.
• Incorporated the type of law (public, private) and the year.
• Output was probability of law being Jim Crow.
• A conservative cutoff of 90% probability was selected for labeling a law as Jim Crow.
Analysis
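The evaluation steps on this slide (an 80/20 holdout split, precision as the selection metric, and a 90% probability cutoff) can be sketched in plain Python. This is an illustration only: no XGBoost model is trained here, the probabilities and labels are made up, the helper names are hypothetical, and treating the cutoff as "greater than or equal to 0.9" is an assumption.

```python
import random


def holdout_split(examples, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into train and test portions."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def apply_cutoff(probabilities, cutoff=0.9):
    """Label a law Jim Crow only when its probability meets the cutoff.

    Assumption: the comparison is inclusive (>=).
    """
    return [p >= cutoff for p in probabilities]


def precision(labels, predictions):
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = [y for y, p in zip(labels, predictions) if p]
    if not predicted_pos:
        return 0.0
    return sum(predicted_pos) / len(predicted_pos)


# Ten placeholder (text, label) pairs; 1 marks a Jim Crow law.
examples = [(f"law {i}", i % 2) for i in range(10)]
train, test = holdout_split(examples)

# Made-up model outputs standing in for a trained classifier's probabilities.
probs = [0.95, 0.91, 0.60, 0.92, 0.10]
labels = [1, 1, 1, 0, 0]
preds = apply_cutoff(probs)
score = precision(labels, preds)
```

A high cutoff trades recall for precision: the third law in the toy data is a true positive but falls below 0.9 and is not flagged, which is what "conservative" implies.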
![Page 11: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/11.jpg)
Identified 905 Jim Crow Laws
141 identified by experts
411 identified by the model only
353 identified by the model and confirmed by an expert
![Page 12: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/12.jpg)
Version 2 is Forthcoming
• Improved corpus - more accurately split chapters and sections
• Improved text analysis - more advanced workflow
• Identified additional Jim Crow laws
• Training set
![Page 13: On the Books: Jim Crow and Algorithms of Resistance](https://reader034.vdocuments.mx/reader034/viewer/2022042421/625fd91b574f6b675807fd5d/html5/thumbnails/13.jpg)
onthebooks.lib.unc.edu