
Modeling Activist Discourse on State Repression Through Text Analytics

Evan Brown and Hailey Reeves

Overview of Project

● Analyzed trends in activist discourse on state repression in recent decades, using a data set of 94 documents related to state repression.

● We define state repression as the actual or threatened use of penalization against a person or an organization committing nonviolent dissent within state jurisdiction.

● Perception has a direct connection to why and how people engage with causes they deem important. Social movements heavily influence societal ethics and law.

● Our analysis will determine underlying themes that influence activist engagement as well as common perceptions held by activists within this sample.

Method

Processed documents using OCR software

Created frequency tables of all terms; removed stop words

Used frequency tables to develop networks using NetworkX

Input text files into R and performed LSA using R packages “tm” and “lsa”

Used high degree terms from network, found most similar LSA terms

Created high degree LSA network

Created additional networks with input from Reynolds-Stenson
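The frequency-table step above can be sketched in a few lines of Python. This is only an illustration: the stop-word list, the tokenization rule, and the sample sentence are assumptions made for the example, not the ones used in the project.

```python
import re
from collections import Counter

# Illustrative stop-word list; the list actually used in the project is not given.
STOP_WORDS = {"a", "an", "and", "by", "in", "of", "the", "to", "were"}

def term_frequencies(text):
    """Count lowercase alphabetic tokens, skipping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)

# Made-up sample sentence standing in for one OCR-processed document.
sample = "The police arrested people in the street, and the police detained people."
freq = term_frequencies(sample)
print(freq.most_common(2))
```

A table like `freq` for each document is the kind of input from which term networks could then be built.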

Text Mining and Latent Semantic Analysis

● Text Mining - general term for extracting useful information from text.

● Latent Semantic Analysis (LSA) - Manipulating a term-document matrix using SVD and a low-rank approximation of the resultant matrices in order to extract semantic information and to better see similarities between terms and documents.

● Singular Value Decomposition (SVD) - The process of splitting a matrix M into the product of three matrices, according to the equation $M = U \Sigma V^*$, where the middle matrix $\Sigma$ is diagonal, containing only M’s singular values as entries.

● Query Matching (Cosine method) - For a certain document vector $a_j$ and a query $q$, the angle $\theta$ between them can be found with: $\cos\theta = \frac{a_j^T q}{\|a_j\|_2 \, \|q\|_2}$
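A minimal NumPy sketch of the SVD and cosine query matching defined above. The term-document matrix and the query vector are made-up toy values, and the project itself performed LSA in R, so this is an illustration of the definitions, not the authors' code.

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents (made-up counts).
M = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])

# SVD: M = U @ Sigma @ Vh, where Sigma is diagonal and holds the singular values.
U, s, Vh = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)
reconstructed = U @ Sigma @ Vh

def cosine(a, q):
    """Cosine of the angle between a document vector a and a query vector q."""
    return float(a @ q / (np.linalg.norm(a) * np.linalg.norm(q)))

# Query that contains only the first term.
q = np.array([1.0, 0.0, 0.0, 0.0])
scores = [cosine(M[:, j], q) for j in range(M.shape[1])]
print(scores)
```

Documents whose cosine score is closest to 1 point in nearly the same direction as the query and are treated as the best matches.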

Generating the LSA Space

● Create term-document matrix from original/OCR texts

● Perform SVD to split the matrix into U, Σ, V*, where Σ is diagonal

● Determine the optimal number of dimensions, k, to truncate

○ A “least-squares” problem

○ Determined using “dimcalc_share”

○ This method calculates all singular values and finds the first position where their sum is greater than or equal to a certain percentage of the sum of all singular values (for this case, 50% is chosen as the default measure).

○ For our LSA space, k=12

● Truncate each matrix, and multiply them together to return to term-document format but with semantic information better highlighted
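The dimension choice and truncation described above can be sketched in NumPy. The function names `dimcalc_share_sketch` and `truncate_svd` are hypothetical, and the sketch follows the verbal description given here; the R package's `dimcalc_share` may treat edge cases such as exact ties differently.

```python
import numpy as np

def dimcalc_share_sketch(singular_values, share=0.5):
    """First position where the running sum of singular values reaches
    the given share of their total (assumes 0 < share < 1)."""
    s = np.asarray(singular_values, dtype=float)
    cumulative = np.cumsum(s) / s.sum()
    return int(np.argmax(cumulative >= share)) + 1

def truncate_svd(M, k):
    """Rank-k approximation: keep the k largest singular values and multiply back."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

# Made-up singular values: running shares are 0.4, 0.7, 0.9, 1.0, so k = 2.
k = dimcalc_share_sketch([4.0, 3.0, 2.0, 1.0])

# Made-up term-document matrix, truncated to rank 1.
M = np.array([[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 3.0, 1.0]])
M_1 = truncate_svd(M, 1)
print(k, M_1.shape)
```

The truncated matrix has the same shape as the original term-document matrix, which is why term and document similarities can be recomputed on it directly.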

LSA Conclusions/Results

● Resultant term-document matrix is assumed to better incorporate semantic similarities between terms and documents. Our results show this is the case.

● This can be seen in the example below, as well as in the document similarity matrices.

Document Similarity Matrix (Pre-LSA)

Document Similarity Matrix (Post-LSA)

Network Analysis, Informal Overview

● Assortativity: The measure of how correlated nodes are by degree.

● Centrality: A way to indicate hubs.

● Density: The ratio of the edges in the network over all possible edges between any pair of vertices.

● Clustering Coefficient: A percentage that measures how well a group of nodes are cliqued.

● Weight: Strength of links/edges.
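Each of the measures listed above has a NetworkX counterpart. The sketch below computes them on a small made-up weighted graph; the terms and weights are illustrative only and are not taken from the project's networks.

```python
import networkx as nx

# Small made-up term network; edge weights stand for link strength.
G = nx.Graph()
G.add_weighted_edges_from([
    ("police", "people", 3.0),
    ("police", "court", 1.0),
    ("police", "law", 2.0),
    ("court", "law", 1.0),
    ("people", "street", 1.0),
])

density = nx.density(G)                                  # edges / possible edges
assortativity = nx.degree_assortativity_coefficient(G)   # degree correlation across edges
degree_c = nx.degree_centrality(G)                       # degree / (n - 1)
closeness_c = nx.closeness_centrality(G)
betweenness_c = nx.betweenness_centrality(G)
avg_clustering = nx.average_clustering(G)                # mean local clustering coefficient
transitivity = nx.transitivity(G)                        # 3 * triangles / connected triples

print(density, transitivity, max(degree_c, key=degree_c.get))
```

In this toy graph "police" is the hub: it has the highest degree and betweenness centrality.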

Closeness (top), Betweenness (bottom)

Assortativity

Frequency Network

● Used degree centrality (nodes with highest number of incident edges) to determine most significant terms

● Terms included: action, court, federal, government, information, law, people, police, prison, security, and support

High Degree LSA Network

Number of nodes: 235
Number of edges: 252
Average degree: 2.1447
Graph density: 0.009165302782
Average clustering: 0.03177123113293326
Transitivity: 0.01265822785
Assortativity: -0.916834
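As a quick consistency check on the figures above, and assuming the network is an undirected simple graph (the source does not state this), the average degree and density follow directly from the node count N = 235 and edge count E = 252:

```latex
\[
\bar{d} = \frac{2E}{N} = \frac{2 \cdot 252}{235} \approx 2.1447,
\qquad
\rho = \frac{2E}{N(N-1)} = \frac{2 \cdot 252}{235 \cdot 234} = \frac{504}{54990} \approx 0.0091653
\]
```

Both values agree with the reported average degree and graph density.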

Centrality (>0.10): Betweenness Centrality, Closeness Centrality

< 0.15 “Provide”, “security”, “success”, “system”, “war”

< 0.13 “Arrested”, “deadly”, “force”

< 0.23 “court”, “federal”, “law”, “national”, “serve”

< 0.15 “FBI”, “asserted”, “conduct”, “convictions”, “crime”, “criminal”, “custody”, “detained”, “detentions”, “disobedience”, “espionage”, “imminent”, “intent”, “law”, “preventive”, “rights”

< 0.30 “government” < 0.17 “Court”

Potential theme: Prosecution, security, civil disobedience

High Degree Network - Line Graph

Interpretation: Nodes with high degree centrality (>0.07) revolve around security and police:

('data', 'security'): 0.07171314741035856
('information', 'police'): 0.13147410358565736
('measures', 'security'): 0.07171314741035856
('messages', 'security'): 0.07171314741035856
('personal', 'police'): 0.07171314741035856
('police', 'public'): 0.07171314741035856
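In a line graph, each node is an edge of the original network, and two nodes are linked when the original edges share an endpoint. The sketch below shows how such pair-valued degree centralities can be produced with NetworkX on a small made-up graph; the edges are illustrative and not the project's data. As a side note, 0.0717... equals 18/251 and 0.1315... equals 33/251, which would be consistent with a line graph of 252 nodes, though the source does not say which network the line graph was built from.

```python
import networkx as nx

# Made-up term network; each edge becomes one node of the line graph.
G = nx.Graph([
    ("information", "police"),
    ("police", "public"),
    ("personal", "police"),
    ("data", "security"),
    ("police", "security"),
])

L = nx.line_graph(G)

# Sort each pair so lookups do not depend on the edge orientation NetworkX picks.
centrality = {tuple(sorted(edge)): c for edge, c in nx.degree_centrality(L).items()}

for pair, value in sorted(centrality.items()):
    print(pair, value)
```

Pairs that include a hub term such as "police" score highest, because the hub's edges all share that endpoint.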

“People-Police” LSA Network

● Created because nodes “people” and “police” had highest degree rank in original network

● Network is the union of the two terms and their respective latent semantic info

● Shared vertices are few: only “versus” and “street”, depicting an explicit conflict
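One way such a union could be assembled in NetworkX is sketched below, assuming each term's network is a star linking the term to its most similar LSA terms. The helper `term_star` is hypothetical, and apart from “versus” and “street” the neighbour terms are made up for illustration.

```python
import networkx as nx

def term_star(term, similar_terms):
    """Hypothetical helper: link a term to each of its most similar LSA terms."""
    g = nx.Graph()
    g.add_edges_from((term, other) for other in similar_terms)
    return g

# "community" and "arrested" are placeholder neighbours, not results from the project.
people_net = term_star("people", ["versus", "street", "community"])
police_net = term_star("police", ["versus", "street", "arrested"])

# Union of the two networks; nodes present in both are merged.
union_net = nx.compose(people_net, police_net)
shared = sorted(nx.common_neighbors(union_net, "people", "police"))
print(shared)
```

The shared vertices are the terms adjacent to both “people” and “police” in the combined network.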

Conclusion

● The most connected and central terms illustrate a narrative of conflict that parallels Reynolds-Stenson’s findings in her interviews.

● This is supported by how consequential terms such as “arrested”, “convictions”, “court”, “force”, “law”, and “rights” had high centrality measures relative to the rest of the network (>0.1).

● Trends across these texts could indicate trends in activist perceptions in the original sample.

● Further research with a larger data set and a more generalized demographic may allow for a more generalized conclusion and actual hypothesis testing, but this work provides potential hypotheses to test.

Acknowledgements

Sincere thanks are given to Heidi Reynolds-Stenson for providing our data and offering her conclusions, as well as to William Lippitt and Dr. Ildar Gabitov for facilitating this project.

