language variation suite - interactive toolkit for quantitative analysis
TRANSCRIPT
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Optimizing Language Variation Analysis:Language Variation Suite
Olga Scrivner, Manuel Dıaz-Campos and Rafael Orozco
[email protected] [email protected] [email protected]
Indiana University and Louisiana State University
NWAV45, 20161 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Goal
Provide researchers with a variety of quantitative methods toadvance language variation studies.
PositionSentencep < 0.001
1
ind, pre post
Heavinessp = 0.003
2
≤ 1 > 1
Periodp < 0.001
3
≤ 1 > 1
Node 4 (n = 81)
VO
OV
00.20.40.60.81
Node 5 (n = 119)
VO
OV
00.20.40.60.81
Node 6 (n = 181)
VO
OV
00.20.40.60.81
Periodp < 0.001
7
≤ 2 > 2
Node 8 (n = 221)
VO
OV
00.20.40.60.81
Focusp < 0.001
9
cf nf
Node 10 (n = 66)
VO
OV
00.20.40.60.81
Main_Verb_Structurep < 0.001
11
ACIOther, Restructuring
Node 12 (n = 43)
VO
OV
00.20.40.60.81
Node 13 (n = 265)
VO
OV
00.20.40.60.81
2 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Objectives
1 Introduce a novel sociolinguistic toolkit
2 Develop practical quantitative analytical skills
3 Understand and interpret advanced statistical models
3 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
What is LVS?
Language Variation Suite
It is a Shiny web application designed for data analysis insociolinguistic research.
It can be used for:
Processing spreadsheet data
Reporting in tables and graphs
Analyzing means, regression, conditional trees and muchmore
4 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Background
LVS is built in R using Shiny package:
1 R - a free programming language for statistical computingand graphics
2 Shiny App - a web application framework for R
Computational power of R + Web interactivity
5 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Background
http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/
6 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Data Preparation
Important things to consider before data entry:
File format:
Comma separated value (CSV) facilitates faster processing
Excel format will slow processing
Column names should not contain spaces
Permitted: non-accented characters, numbers, underscore,hyphen, and period
One column must contain your dependent variable
The rest of the columns contain independent variables
7 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Workspace
Browser
Chrome, Firefox, Safari - recommendable
Explorer may cause instability issues
Accessibility
PC, Mac, Linux
Data files can be uploaded from any location on yourcomputer
Smart Phone
Data files must be on a cloud platform connected to yourphone account (e.g. dropbox)
8 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Terminology Review
a. Categorical - non-numerical data with two values
yes - no; deletion - retention; perfective - imperfective
b. Continuous - numerical data
duration, age, chronological period
c. Multinomial - non-numerical data with three or morevalues
deletion - aspiration - retention
d. Ordinal - scale: currently not supported
9 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Workshop Files
https://languagevariationsuite.wordpress.com/
1 categoricaldata.csv: categorical dependent - Labov NewYork 1966 study
2 continuousdata.csv: continuous dependent - Intervocalic/d/ in Caracas corpus (Dıaz-Campos et al.)
3 LVS web site:
https://languagevariationsuite.shinyapps.io/Pages/
10 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Workshop Files
https://languagevariationsuite.wordpress.com/
1 categoricaldata.csv: categorical dependent - Labov NewYork 1966 study
2 continuousdata.csv: continuous dependent - Intervocalic/d/ in Caracas corpus (Dıaz-Campos et al.)
3 LVS web site:
https://languagevariationsuite.shinyapps.io/Pages/
10 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Language Variation Suite - Structure
1 Demo
Brief introduction
2 Data
Upload file, data summary, adjust data, cross tabulation
3 Visual Analysis
Plotting - histograms, frequencies, cluster plots
4 RBRUL
New version by Daniel Johnson!
5 Inferential statistics
Modeling, regression, conditional trees, random forest,model comparison
11 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Language Variation Suite - Structure
1 Demo
Brief introduction
2 Data
Upload file, data summary, adjust data, cross tabulation
3 Visual Analysis
Plotting - histograms, frequencies, cluster plots
4 RBRUL
New version by Daniel Johnson!
5 Inferential statistics
Modeling, regression, conditional trees, random forest,model comparison
11 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Language Variation Suite - Data
1 Upload CSV file
2 Upload Excel file
12 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Excel Format
1 Slow processing
2 Requires the name of your excel sheet
13 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Save Excel as CSV Format
To optimize speed - Save as CSV prior upload
14 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Server
Since LVS is hosted on a server, Shiny idle time-out settingsmay stop the application when it is left inactive (it will greyout).
Solution: Click reload and re-upload your csv data
15 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Upload File
Upload categoricaldata.csv
16 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Table
Table displays our dataset and allows for sorting columns indescending/ascending order.
17 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Summary
Summary provides a quantitative summary for each variable,e.g. frequency count, mean, median.
18 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Data Structure
19 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Data Structure
1 Total number of observations
2 Number of variables
3 Variable types
Factor - categorical values
Num - numeric values (0.95, 1.05)
Int - integer values (1, 2, 3)20 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Questions?
21 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Language Variation Suite - Structure
1 Demo
Brief introduction
2 Data
Upload file, data summary, adjust data, cross tabulation
3 Visual Analysis
Plotting, cluster classification
4 RBRUL
New version by Daniel Johnson!
5 Inferential statistics
Modeling, regression, conditional trees, random forest,model comparison
22 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Visual Analytics
Visual Analytics: “The science of analytical reasoningfacilitated by visual interactive interfaces”
(Thomas et al. 2005)
23 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
One Variable Plot
24 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
One Variable Plot
25 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Two Variables Plot
26 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Two Variables Plot
27 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Three Variables Plot
28 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Three Variables Plot
29 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Cluster Plot
Classification of data into sub-groups is based onpairwise similarities
Groups are clustered in the form of a tree-likedendrogram
Independent variable must have at least THREE values(e.g. store - Saks, Kleins, Macy’s)
30 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Cluster Plot
31 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Cluster Plot
Saks (upper middle-class store), Macy’s (middle-class store), Kleins
(working-class)
32 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Inferential Statistics
33 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Language Variation Suite - Structure
1 Demo
Brief introduction
2 Data
Upload file, data summary, adjust data, cross tabulation
3 Visual Analysis
Plotting, cluster classification
4 RBRUL
New version by Daniel Johnson!
5 Inferential statistics
Modeling, regression, conditional trees, random forest,model comparison
34 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Modeling
35 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Modeling
35 / 93
We are interested in RETENTION= Application
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Regression Types
Model
a.) Fixed effects
b.) Mixed effects - individual speaker/token variation (withingroup)
Type of Dependent Variable
a.) Binary/categorical (only two values)
b.) Continuous (numeric)
c.) Multinomial - categorical with more than two values
36 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Regression
37 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Model Output
38 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Interpretation
1 Estimate: reported in log-odds: negative or positive effectcloser to zero - lesser effect
2 P - significance (p < 0.05)
39 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Interpretation
Lexical item Fourth has a negative effect on retention and issignificant
Normal style has a slightly negative effect on retention but itscoefficient is not significant
Macy’s and Saks have a positive and significant effect onretention. Saks (upper middle class store) is more significantthan Macy’s (middle class store)
http://www.free-online-calculator-use.com/scientific-notation-converter.html40 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Interpretation
Lexical item Fourth has a negative effect on retention and issignificant
Normal style has a slightly negative effect on retention but itscoefficient is not significant
Macy’s and Saks have a positive and significant effect onretention. Saks (upper middle class store) is more significantthan Macy’s (middle class store)
http://www.free-online-calculator-use.com/scientific-notation-converter.html40 / 93
exponential notation:
1.48e-8
0.0000000148
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Questions?
41 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
New Tools of Linguistic Analysis (Baayen 2008,Tagliamonte 2014, Gries 2015)
Conditional inference trees and Random Forests
“Proves to be more stable than stepwise variable selectionapproaches available for logistic regression” (Strobl2009:325)
Can handle skewed data that often violate theassumptions of regression approaches
42 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Conditional Tree
Conditional tree: a simple non-parametric regression analysis,commonly used in social and psychological studies
Linear regression: all information is combined linearly
Conditional tree regression: visual splitting to captureinteraction between variables
Recursive splitting (tree branches)43 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Conditional Tree - Tagliamonte and Baayen 2012
1 The distribution of was/were is split into two groups byindividuals.
2 The variant were occurs significantly more frequently with thefirst group.
44 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Conditional Tree - Tagliamonte and Baayen (2012)
1 Polarity is relevant to the second group of individuals.
2 The variant were occurs significantly more often with negativepolarity
45 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Conditional Tree - Tagliamonte and Baayen (2012)
1 Affirmative Polarity is conditioned by Age.
2 The variant was is produced significantly more often byIndividuals of 46 and younger.
46 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Conditional Tree
47 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Conditional Tree
1 Store is the most significant factor for R-use
Kleins (working class store) - more R-deletion
2 R-use in Macy’s and Saks is conditioned by lexical item:
Floor shows more R-retention than Fourth
3 Style is not significant48 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Random Forest
1 Variable importance for predictors
2 Robust technique with small n large p data
3 All predictors considered jointly (allows for inclusion ofcorrelated factors)
49 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Random Forest
50 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Random Forest
1 Store is the most important predictor
2 Lexical Item is the second predictor
3 Style is irrelevant: close to zero and red dotted line (cut-offvalue).
51 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Let’s Have a Short Break
52 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Preparing Data
1 Download continuousdata.csv
2 Upload this file on LVS
53 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Table
54 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Summary
55 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Summary
56 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Changing Class from Integer to Factor
57 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Change Class
58 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Adjusted Dataset
59 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Summary - New Dataset
60 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Continuous Variable - Histogram
61 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Density - Histogram
Density: a non-parametric model of the distribution of points basedon a smooth density estimate
http://scikit-learn.org/stable/modules/density.html
62 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Frequency Plot
63 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Frequency Plot - Word Cloud
64 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Frequency Plot
65 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Questions?
66 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Fixed and Mixed Effects Models
67 / 93
I’m ready for Mixed Models!
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Fixed and Mixed Models
Fixed Effects Model : All predictors are treated independently.
Underlying assumption - no group-internalvariation between speakers or tokens
Mixed Effects Model : Allows for evaluation of individual- andgroup-level variation
68 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Fixed and Mixed Models: Errors
Fixed Regression Model - ignoring individual variations(speakers or words) may lead to Type I Error:“a chance effect is mistaken for a real differencebetween the populations”
Mixed Regression Model - prone to Type II Error:“if speaker variation is at a high level, we cannotdiscern small population effects without a largenumber of speakers” (Johnson 2009, 2015)
69 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Mixed Effect Regression
Mixed Model = fixed effects + random effects
Fixed-effects factor - “repeatable and a small number of levels”
Random-effects factor - “a non-repeatable random sample
from a larger population” (Wieling 2012)
walk, sleep, study, finish, eat, etc
aspectual verb, stative verb
speaker1, speaker3, speaker3, etc
male, female
70 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Mixed Effect Regression
Mixed Model = fixed effects + random effects
Fixed-effects factor - “repeatable and a small number of levels”
Random-effects factor - “a non-repeatable random sample
from a larger population” (Wieling 2012)
walk, sleep, study, finish, eat, etc
aspectual verb, stative verb
speaker1, speaker3, speaker3, etc
male, female
70 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Mixed Effect Regression
Mixed Model = fixed effects + random effects
Fixed-effects factor - “repeatable and a small number of levels”
Random-effects factor - “a non-repeatable random sample
from a larger population” (Wieling 2012)
walk, sleep, study, finish, eat, etc
aspectual verb, stative verb
speaker1, speaker3, speaker3, etc
male, female
70 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Mixed Effect Modeling
71 / 93
NULL when the dependent variable is continuous
Fixed Effects - independent variables
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Mixed Effect Modeling
72 / 93
Mixed Effects - group-internal variation
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Regression Results
73 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Interpretation - Random Effects
1 Standard Deviation: a measure of the variability for eachrandom effect (speakers and tokens)
2 Residual: random variation that is not due to speakers ortokens (residual error)
74 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Interpretation - Fixed Effects
1 Estimate/coefficient: reported in log-odds (negative orpositive)
2 P-value: tells you if the level is significant
75 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Language Variation Suite - Structure
1 Demo
Brief introduction
2 Data
Upload file, data summary, adjust data, cross tabulation
3 Visual Analysis
Plotting, cluster classification
4 RBRUL
New version by Daniel Johnson!
5 Inferential statistics
Modeling, regression, varbrul analysis, conditional trees,random forest
76 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
RBRUL 3.0 Beta
77 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Upload File
78 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Model Selection
79 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Model Selection
80 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Model Selection
81 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Output
82 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Application Values
83 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Questions?
84 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Appendix 1: Cross-Tabulation
Cross-tabulation examines the relationship between twovariables (their interaction).
85 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Cross-Tabulation: One Dependent and OneIndependent Variables
86 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Cross-Tabulation Output
Raw frequency / Proportion by column / Proportion across row
87 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Appendix 2: Data Modification
88 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Adjust Data
Retain: Select data subset
Exclude: Exclude variables from a factor group
Recode: Combine and rename variables
Change class: Numeric → factor; factor → numeric
Transform: Apply log transformation to a specific column
ADJUSTED DATASET:
Run - to apply all above changes
Reset - to reset to the original dataset
89 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Exclude: Emphatic Style
90 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Adjusted Dataset
91 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
Adjusting Dataset
To revert to the original data, select RESET:
92 / 93
Introduction
DataPreparation
LanguageVariationSuite
Working withData
VisualAnalytics
InferentialAnalysis
DataModification
Mixed Effects
RBRUL
Appendix
Cross Tabulation
DataModification
References
References I
[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge:Cambridge University Press
[2] Bentivoglio, Paola and Mercedes Sedano. 1993. Investigacion sociolinguıstica: sus metodos aplicados auna experiencia venezolana. Boletın de Linguıstica 8. 3-35
[3] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber RandiReppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: CambridgeUniversity Press
[4] Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for AppliedLinguistics
[5] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif
[6] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif
[7] http://www.martijnwieling.nl/R/sheets.pdf
93 / 93