Mapping Implicit Processes: Extracting Social Networks from Digital Corpora

M. H. Beals, Sheffield Hallam University

@mhbeals



Overview

Understanding Scissors-and-Paste Journalism in Georgian Britain

Computer-Aided Identification of Reprints and Memes

Understanding Dissemination Pathways

Manual Construction of Social Networks

Computer-Aided Ordering of Dissemination Pathways

Future Plans


Scissors-and-Paste Journalism in Georgian Britain

Proliferation of Colonial and Provincial Presses

Spread of Journeyman Printers

Reduction of Stamp Duty

New Profit Models

Entertaining and Literary Content

Adverts to Attract Readers to Sell to Advertisers

Manual Dissemination of News

Limited Number of “Specials”

Postal Exchange, Subscriptions, Correspondence

No Telegraph until the 1840s, and Not Used for Miscellany


Computer-Aided Identification of Reprints & Memes

Promise

Large-Scale Digitisation Efforts

Keyword Searching

n-gram Matching (WCopyfind)

Edition Tracking (Juxta)

Viral Texts Project (Cordell, Dillon, and Smith)

Large-Scale Corpus of Nineteenth-Century Newspapers

Extensive, Automatic Repair of OCR Errors

Identification of Highly Reprinted Materials (Memes)

Discussion and Exploration of Meme Traits and Patterns

Perils

Discrete Digital Corpora

(Paywalls)

Offline Penumbra (Curation)

Lost Nodes (Incomplete Data)

OCR Variability (50-80%)
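The n-gram matching listed under Promise can be illustrated with a short Python sketch. It is not WCopyfind's own algorithm, and the function names, the lower-casing and the choice of n are assumptions for illustration: each text is reduced to its set of overlapping n-word sequences, and two texts are scored by the share of sequences they have in common (a Jaccard score). Because a single OCR misreading only spoils the n-grams that contain it, a reprint with scattered errors still scores well above unrelated text.

```python
import re


def word_ngrams(text, n=5):
    """Set of n-word sequences in text, lower-cased with punctuation removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(text_a, text_b, n=5):
    """Jaccard score: shared n-grams divided by all distinct n-grams (0.0 to 1.0)."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not (a or b):
        return 0.0  # both texts are shorter than n words
    return len(a & b) / len(a | b)
```

With n=3, an eleven-word sentence and a copy containing one misread word (`scalpcd` for `scalped`) share 6 of their 12 distinct trigrams, so `ngram_overlap` returns 0.5. A larger n is stricter and correspondingly more sensitive to OCR noise.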


Computer-Aided Identification of Reprints & Memes

# concordanceset.py
# For every distinct n-word phrase in combine.txt, record which numbered
# text files (first_id.txt to last_id.txt) contain that phrase, and write
# the results to ngramcompiled.csv, one row per phrase.
import csv
import re


def replace_words(text, word_dic):
    """Replace each key of word_dic found in text with its value, e.g. to
    normalise common OCR misreadings. word_dic must not be empty.
    (Defined for pre-processing; not called below.)"""
    rc = re.compile('|'.join(map(re.escape, word_dic)))

    def translate(match):
        return word_dic[match.group(0)]

    return rc.sub(translate, text)


def get_ngrams(wordlist, n):
    """Return every run of n consecutive words in wordlist."""
    return [wordlist[i:i + n] for i in range(len(wordlist) - (n - 1))]


first_id = int(input('What is the first id number? '))
last_id = int(input('What is the last id number? '))
n = int(input('How many words should be in a phrase? '))

# Split the combined source text into words and build its phrases.
with open('combine.txt', 'r') as listopen:
    splitlist = listopen.read().split()

# Sorted, de-duplicated phrases (a set replaces the original
# sort-and-delete loop).
phrases = sorted({' '.join(gram) for gram in get_ngrams(splitlist, n)})

# Read each numbered file once instead of once per phrase.
texts = {}
for number in range(first_id, last_id + 1):
    with open(str(number) + '.txt', 'r') as fin:
        texts[number] = fin.read()

# Write one row per phrase: the phrase, then the ids of every file
# containing it. csv.writer quotes phrases that contain commas.
with open('ngramcompiled.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    for phrase in phrases:
        print(phrase)
        matches = [str(number) for number, text in texts.items()
                   if phrase in text]
        writer.writerow([phrase] + matches)


Understanding Dissemination Pathways

Meme Identification

Courtesy of Viral Texts Project, http://www.viraltexts.org/


Understanding Dissemination Pathways

Chronological Spread

Courtesy of Viral Texts Project, https://www.youtube.com/watch?v=YwDlyt7jhMs


Understanding Dissemination Pathways

Genealogical Model
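A genealogical model treats each reprint as descending from exactly one earlier printing. The Python below is a hedged illustration rather than the method behind this slide: it assumes every printing already has a publication date and that a pairwise similarity score has been computed by some other means, passed in as a hypothetical dictionary keyed by pairs of printing ids. Each printing is attached to the most similar printing that appeared before it, and `lineage` then walks the resulting parent links back to the earliest known copy.

```python
def assign_parents(dates, similarity):
    """dates: printing id -> publication date (any comparable value).
    similarity: frozenset({id1, id2}) -> score, higher meaning more alike.
    Returns printing id -> id of its most similar earlier printing
    (None for the earliest printing, which becomes the root)."""
    order = sorted(dates, key=dates.get)
    parents = {}
    for i, printing in enumerate(order):
        earlier = order[:i]
        if not earlier:
            parents[printing] = None
            continue
        parents[printing] = max(
            earlier, key=lambda q: similarity.get(frozenset((printing, q)), 0.0))
    return parents


def lineage(printing, parents):
    """Follow parent links back to the root; return the chain oldest-first."""
    chain = []
    while printing is not None:
        chain.append(printing)
        printing = parents[printing]
    return chain[::-1]
```

Choosing the single best earlier match keeps the result a tree, which is what makes the pathway readable as a genealogy; the cost is that every printing is forced to have a parent, however weak the resemblance.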


Manual Construction of Social Networks

The Glasgow Advertiser, 7 October 1793, p. 5

Knoxville, May 11.

IT is shocking to describe the bloody scenes that have lately taken place in this district. The Indians have killed and scalped a great number of persons, among whom is Colonel Isaac Bledose, who was massacred within 150 yards of his own house.

On the 27th instant a body of Indians attacked Greenfield station: they killed John Jervis, and a negro fellow, belonging to Mrs. Tarker. By the bravery of three young men, viz. William Neely, William Wilson, and William Hall, the station was preserved; they killed two Indians, wounded several others, and put them to flight. It is to be remembered, that Neely and Hall had each lost a father and two brothers, and Wilson a brother, by the savages. Men are now in pursuit of the Indians.

Full Discussion of Dissemination Pathway Available at: http://prezi.com/in4_bqvgmanr/


Manual Construction of Social Networks

Derived from Glasgow News Archive, British Library 19th Century Newspapers, NewspaperArchive.com, Readex Early American Newspapers, Newspapers.com, and the University of Kentucky


Computer-Aided Ordering of Dissemination Pathways

Binary Computer Model

Arbitrary Tolerance Levels

Reference to Additional Tables

Bypassing Missing Nodes

Flexibility

Difficult to Recreate Human Instinct…

…But is That a Bad Thing?
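One way to read the binary model above is as a yes/no copying test with an adjustable threshold. In this sketch, which assumes Python's standard-library `difflib.SequenceMatcher` as the similarity measure and an arbitrary default tolerance of 0.8 (neither comes from the slides), a reprint is linked to the most recent earlier printing that passes the test. Earlier printings that fail are skipped, so if the immediate source was never digitised the link falls back to the nearest surviving copy that does pass.

```python
from difflib import SequenceMatcher


def is_copy(text_a, text_b, tolerance=0.8):
    """Binary test: True if the texts are at least `tolerance` alike (0-1)."""
    return SequenceMatcher(None, text_a, text_b).ratio() >= tolerance


def binary_parent(target, earlier, tolerance=0.8):
    """earlier: list of (id, text) pairs printed before target, oldest first.
    Check the most recent printing first and keep stepping back past any
    that fail the test, bypassing gaps in the corpus.
    Returns the id of the first printing that passes, or None."""
    for printing_id, text in reversed(earlier):
        if is_copy(target, text, tolerance):
            return printing_id
    return None
```

Raising or lowering `tolerance` is the only control, which is both the model's flexibility and its arbitrariness. For long articles, `SequenceMatcher.quick_ratio()` gives a cheaper upper bound that can screen out obvious non-matches before the full `ratio()` is computed.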


Computer-Aided Ordering of Dissemination Pathways

Phylogenetic Model

Image Courtesy of Fred Hsu (Wikipedia:User:Fredhsu on en.wikipedia) CC-BY-SA-3.0 via Wikimedia Commons
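Phylogenetic methods build a branching tree from a matrix of pairwise distances. As a stand-in only (the slides do not name an algorithm), the Python below implements UPGMA, one of the simplest such methods, on hypothetical distances between printings, for instance 1 minus a text-similarity score. It repeatedly merges the two closest clusters and sets the merged cluster's distance to every other cluster to the size-weighted average of its members' distances. Branch lengths are left out to keep it short, and the tree is returned in Newick notation.

```python
from itertools import combinations


def upgma(names, dist):
    """Rooted tree by UPGMA (average-linkage clustering).
    names: leaf labels (at least one).
    dist: {(a, b): distance} or {frozenset({a, b}): distance} for every pair.
    Returns a Newick-style string without branch lengths."""
    d = {frozenset(pair): value for pair, value in dist.items()}
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Size-weighted average distance from the new cluster to each other one.
        for c in clusters:
            d[frozenset((merged, c))] = (
                d[frozenset((a, c))] * size_a + d[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))
```

With distances in which A and B are nearly identical and C and D are close to each other, the result is `((A,B),(C,D))`. UPGMA assumes texts drift at a steady rate, which reprinting need not obey; neighbour-joining is a common alternative that relaxes that assumption.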


Future Plans

Computer Program

OCR Clean-up Processes

Division into Likely Meme Groupings

Variety of Relatedness Scores

Textual Integrity

Prefixes and Suffixes

Chronological Separation

Chronological-Geographical Feasibility

Well-Worn Path Modifier

Modeling of Relatedness Factors

Directional Social Network Database

Raw Data to Inform Additional Research

Manual Corrections

Direct Attributions

Parsing Compilations

Initial Discovery of Well-Worn Paths

Inclusion of Offline Materials

www.mhbeals.com/cnd
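The relatedness factors planned above could be combined in many ways, and the slides give no formula. As a hedged sketch only, the Python below folds them into a single score for one candidate source-to-reprint link. Treating "Prefixes and Suffixes" as shared opening or closing lines is an assumption, and the weights, the 50-miles-per-day travel limit, the 30-day decay and the per-link path bonus are invented placeholders, not values from the project.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible source -> reprint link (all fields are hypothetical inputs)."""
    textual_integrity: float  # 0-1 similarity of the two texts
    shared_affixes: bool      # assumed: same opening/closing lines, e.g. an attribution
    days_apart: int           # reprint date minus source date
    miles_apart: float        # distance between the two places of publication
    prior_links: int          # earlier confirmed copies along the same route


def relatedness(c, weights=(0.6, 0.2, 0.2), max_miles_per_day=50.0,
                decay_days=30.0, path_bonus=0.05):
    """Fold the factors into one 0-1 score; 0.0 if the link is impossible.
    Every weight and constant here is a placeholder, not a project value."""
    # Chronological separation: a source cannot postdate its reprint.
    if c.days_apart < 0:
        return 0.0
    # Chronological-geographical feasibility: could a copy travel this far in time?
    if c.miles_apart > c.days_apart * max_miles_per_day:
        return 0.0
    w_text, w_affix, w_time = weights
    time_score = 1.0 / (1.0 + c.days_apart / decay_days)  # shrinks as the gap grows
    score = (w_text * c.textual_integrity
             + w_affix * (1.0 if c.shared_affixes else 0.0)
             + w_time * time_score)
    # Well-worn path modifier: a small bonus per earlier link, capped at 1.0.
    return min(1.0, score + path_bonus * c.prior_links)
```

Feasibility and chronology act as hard filters here, while the remaining factors are blended; modelling which factors should be filters and which should be weights is exactly the open question the "Modeling of Relatedness Factors" item points to.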


About Me: www.mhbeals.com

