language variation suite - interactive toolkit for quantitative analysis

99
Introduction Data Preparation Language Variation Suite Working with Data Visual Analytics Inferential Analysis Data Modification Mixed Effects RBRUL Appendix Cross Tabulation Data Modification References Optimizing Language Variation Analysis: Language Variation Suite Olga Scrivner, Manuel D´ ıaz-Campos and Rafael Orozco [email protected] [email protected] [email protected] Indiana University and Louisiana State University NWAV45, 2016 1 / 93

Upload: olga-scrivner

Post on 10-Apr-2017

151 views

Category:

Data & Analytics


2 download

TRANSCRIPT

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Optimizing Language Variation Analysis:Language Variation Suite

Olga Scrivner, Manuel Dıaz-Campos and Rafael Orozco

[email protected] [email protected] [email protected]

Indiana University and Louisiana State University

NWAV45, 20161 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Goal

Provide researchers with a variety of quantitative methods toadvance language variation studies.

PositionSentencep < 0.001

1

ind, pre post

Heavinessp = 0.003

2

≤ 1 > 1

Periodp < 0.001

3

≤ 1 > 1

Node 4 (n = 81)

VO

OV

00.20.40.60.81

Node 5 (n = 119)

VO

OV

00.20.40.60.81

Node 6 (n = 181)

VO

OV

00.20.40.60.81

Periodp < 0.001

7

≤ 2 > 2

Node 8 (n = 221)

VO

OV

00.20.40.60.81

Focusp < 0.001

9

cf nf

Node 10 (n = 66)

VO

OV

00.20.40.60.81

Main_Verb_Structurep < 0.001

11

ACIOther, Restructuring

Node 12 (n = 43)

VO

OV

00.20.40.60.81

Node 13 (n = 265)

VO

OV

00.20.40.60.81

2 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Objectives

1 Introduce a novel sociolinguistic toolkit

2 Develop practical quantitative analytical skills

3 Understand and interpret advanced statistical models

3 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

What is LVS?

Language Variation Suite

It is a Shiny web application designed for data analysis insociolinguistic research.

It can be used for:

Processing spreadsheet data

Reporting in tables and graphs

Analyzing means, regression, conditional trees and muchmore

4 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Background

LVS is built in R using Shiny package:

1 R - a free programming language for statistical computingand graphics

2 Shiny App - a web application framework for R

Computational power of R + Web interactivity

5 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Background

http://littleactuary.github.io/blog/Web-application-framework-with-Shiny/

6 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Data Preparation

Important things to consider before data entry:

File format:

Comma separated value (CSV) facilitates faster processing

Excel format will slow processing

Column names should not contain spaces

Permitted: non-accented characters, numbers, underscore,hyphen, and period

One column must contain your dependent variable

The rest of the columns contain independent variables

7 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Workspace

Browser

Chrome, Firefox, Safari - recommendable

Explorer may cause instability issues

Accessibility

PC, Mac, Linux

Data files can be uploaded from any location on yourcomputer

Smart Phone

Data files must be on a cloud platform connected to yourphone account (e.g. dropbox)

8 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Terminology Review

a. Categorical - non-numerical data with two values

yes - no; deletion - retention; perfective - imperfective

b. Continuous - numerical data

duration, age, chronological period

c. Multinomial - non-numerical data with three or morevalues

deletion - aspiration - retention

d. Ordinal - scale: currently not supported

9 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Workshop Files

https://languagevariationsuite.wordpress.com/

1 categoricaldata.csv: categorical dependent - Labov NewYork 1966 study

2 continuousdata.csv: continuous dependent - Intervocalic/d/ in Caracas corpus (Dıaz-Campos et al.)

3 LVS web site:

https://languagevariationsuite.shinyapps.io/Pages/

10 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Workshop Files

https://languagevariationsuite.wordpress.com/

1 categoricaldata.csv: categorical dependent - Labov NewYork 1966 study

2 continuousdata.csv: continuous dependent - Intervocalic/d/ in Caracas corpus (Dıaz-Campos et al.)

3 LVS web site:

https://languagevariationsuite.shinyapps.io/Pages/

10 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Language Variation Suite - Structure

1 Demo

Brief introduction

2 Data

Upload file, data summary, adjust data, cross tabulation

3 Visual Analysis

Plotting - histograms, frequencies, cluster plots

4 RBRUL

New version by Daniel Johnson!

5 Inferential statistics

Modeling, regression, conditional trees, random forest,model comparison

11 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Language Variation Suite - Structure

1 Demo

Brief introduction

2 Data

Upload file, data summary, adjust data, cross tabulation

3 Visual Analysis

Plotting - histograms, frequencies, cluster plots

4 RBRUL

New version by Daniel Johnson!

5 Inferential statistics

Modeling, regression, conditional trees, random forest,model comparison

11 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Language Variation Suite - Data

1 Upload CSV file

2 Upload Excel file

12 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Excel Format

1 Slow processing

2 Requires the name of your excel sheet

13 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Save Excel as CSV Format

To optimize speed - Save as CSV prior upload

14 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Server

Since LVS is hosted on a server, Shiny idle time-out settingsmay stop the application when it is left inactive (it will greyout).

Solution: Click reload and re-upload your csv data

15 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Upload File

Upload categoricaldata.csv

16 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Table

Table displays our dataset and allows for sorting columns indescending/ascending order.

17 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Summary

Summary provides a quantitative summary for each variable,e.g. frequency count, mean, median.

18 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Data Structure

19 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Data Structure

1 Total number of observations

2 Number of variables

3 Variable types

Factor - categorical values

Num - numeric values (0.95, 1.05)

Int - integer values (1, 2, 3)20 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Questions?

21 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Language Variation Suite - Structure

1 Demo

Brief introduction

2 Data

Upload file, data summary, adjust data, cross tabulation

3 Visual Analysis

Plotting, cluster classification

4 RBRUL

New version by Daniel Johnson!

5 Inferential statistics

Modeling, regression, conditional trees, random forest,model comparison

22 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Visual Analytics

Visual Analytics: “The science of analytical reasoningfacilitated by visual interactive interfaces”

(Thomas et al. 2005)

23 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

One Variable Plot

24 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

One Variable Plot

25 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Two Variables Plot

26 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Two Variables Plot

27 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Three Variables Plot

28 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Three Variables Plot

29 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Cluster Plot

Classification of data into sub-groups is based onpairwise similarities

Groups are clustered in the form of a tree-likedendrogram

Independent variable must have at least THREE values(e.g. store - Saks, Kleins, Macy’s)

30 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Cluster Plot

31 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Cluster Plot

Saks (upper middle-class store), Macy’s (middle-class store), Kleins

(working-class)

32 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Inferential Statistics

33 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Language Variation Suite - Structure

1 Demo

Brief introduction

2 Data

Upload file, data summary, adjust data, cross tabulation

3 Visual Analysis

Plotting, cluster classification

4 RBRUL

New version by Daniel Johnson!

5 Inferential statistics

Modeling, regression, conditional trees, random forest,model comparison

34 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Modeling

35 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Modeling

35 / 93

We are interested in RETENTION= Application

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Regression Types

Model

a.) Fixed effects

b.) Mixed effects - individual speaker/token variation (withingroup)

Type of Dependent Variable

a.) Binary/categorical (only two values)

b.) Continuous (numeric)

c.) Multinomial - categorical with more than two values

36 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Regression

37 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Model Output

38 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Interpretation

1 Estimate: reported in log-odds: negative or positive effectcloser to zero - lesser effect

2 P - significance (p < 0.05)

39 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Interpretation

Lexical item Fourth has a negative effect on retention and issignificant

Normal style has a slightly negative effect on retention but itscoefficient is not significant

Macy’s and Saks have a positive and significant effect onretention. Saks (upper middle class store) is more significantthan Macy’s (middle class store)

http://www.free-online-calculator-use.com/scientific-notation-converter.html40 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Interpretation

Lexical item Fourth has a negative effect on retention and issignificant

Normal style has a slightly negative effect on retention but itscoefficient is not significant

Macy’s and Saks have a positive and significant effect onretention. Saks (upper middle class store) is more significantthan Macy’s (middle class store)

http://www.free-online-calculator-use.com/scientific-notation-converter.html40 / 93

exponential notation:

1.48e-8

0.0000000148

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Questions?

41 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

New Tools of Linguistic Analysis (Baayen 2008,Tagliamonte 2014, Gries 2015)

Conditional inference trees and Random Forests

“Proves to be more stable than stepwise variable selectionapproaches available for logistic regression” (Strobl2009:325)

Can handle skewed data that often violate theassumptions of regression approaches

42 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Conditional Tree

Conditional tree: a simple non-parametric regression analysis,commonly used in social and psychological studies

Linear regression: all information is combined linearly

Conditional tree regression: visual splitting to captureinteraction between variables

Recursive splitting (tree branches)43 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Conditional Tree - Tagliamonte and Baayen 2012

1 The distribution of was/were is split into two groups byindividuals.

2 The variant were occurs significantly more frequently with thefirst group.

44 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Conditional Tree - Tagliamonte and Baayen (2012)

1 Polarity is relevant to the second group of individuals.

2 The variant were occurs significantly more often with negativepolarity

45 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Conditional Tree - Tagliamonte and Baayen (2012)

1 Affirmative Polarity is conditioned by Age.

2 The variant was is produced significantly more often byIndividuals of 46 and younger.

46 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Conditional Tree

47 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Conditional Tree

1 Store is the most significant factor for R-use

Kleins (working class store) - more R-deletion

2 R-use in Macy’s and Saks is conditioned by lexical item:

Floor shows more R-retention than Fourth

3 Style is not significant48 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Random Forest

1 Variable importance for predictors

2 Robust technique with small n large p data

3 All predictors considered jointly (allows for inclusion ofcorrelated factors)

49 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Random Forest

50 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Random Forest

1 Store is the most important predictor

2 Lexical Item is the second predictor

3 Style is irrelevant: close to zero and red dotted line (cut-offvalue).

51 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Let’s Have a Short Break

52 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Preparing Data

1 Download continuousdata.csv

2 Upload this file on LVS

53 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Table

54 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Summary

55 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Summary

56 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Changing Class from Integer to Factor

57 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Change Class

58 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Adjusted Dataset

59 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Summary - New Dataset

60 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Continuous Variable - Histogram

61 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Density - Histogram

Density: a non-parametric model of the distribution of points basedon a smooth density estimate

http://scikit-learn.org/stable/modules/density.html

62 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Frequency Plot

63 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Frequency Plot - Word Cloud

64 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Frequency Plot

65 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Questions?

66 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Fixed and Mixed Effects Models

67 / 93

I’m ready for Mixed Models!

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Fixed and Mixed Models

Fixed Effects Model : All predictors are treated independently.

Underlying assumption - no group-internalvariation between speakers or tokens

Mixed Effects Model : Allows for evaluation of individual- andgroup-level variation

68 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Fixed and Mixed Models: Errors

Fixed Regression Model - ignoring individual variations(speakers or words) may lead to Type I Error:“a chance effect is mistaken for a real differencebetween the populations”

Mixed Regression Model - prone to Type II Error:“if speaker variation is at a high level, we cannotdiscern small population effects without a largenumber of speakers” (Johnson 2009, 2015)

69 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Mixed Effect Regression

Mixed Model = fixed effects + random effects

Fixed-effects factor - “repeatable and a small number of levels”

Random-effects factor - “a non-repeatable random sample

from a larger population” (Wieling 2012)

walk, sleep, study, finish, eat, etc

aspectual verb, stative verb

speaker1, speaker3, speaker3, etc

male, female

70 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Mixed Effect Regression

Mixed Model = fixed effects + random effects

Fixed-effects factor - “repeatable and a small number of levels”

Random-effects factor - “a non-repeatable random sample

from a larger population” (Wieling 2012)

walk, sleep, study, finish, eat, etc

aspectual verb, stative verb

speaker1, speaker3, speaker3, etc

male, female

70 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Mixed Effect Regression

Mixed Model = fixed effects + random effects

Fixed-effects factor - “repeatable and a small number of levels”

Random-effects factor - “a non-repeatable random sample

from a larger population” (Wieling 2012)

walk, sleep, study, finish, eat, etc

aspectual verb, stative verb

speaker1, speaker3, speaker3, etc

male, female

70 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Mixed Effect Modeling

71 / 93

NULL when the dependent variable is continuous

Fixed Effects - independent variables

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Mixed Effect Modeling

72 / 93

Mixed Effects - group-internal variation

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Regression Results

73 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Interpretation - Random Effects

1 Standard Deviation: a measure of the variability for eachrandom effect (speakers and tokens)

2 Residual: random variation that is not due to speakers ortokens (residual error)

74 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Interpretation - Fixed Effects

1 Estimate/coefficient: reported in log-odds (negative orpositive)

2 P-value: tells you if the level is significant

75 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Language Variation Suite - Structure

1 Demo

Brief introduction

2 Data

Upload file, data summary, adjust data, cross tabulation

3 Visual Analysis

Plotting, cluster classification

4 RBRUL

New version by Daniel Johnson!

5 Inferential statistics

Modeling, regression, varbrul analysis, conditional trees,random forest

76 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

RBRUL 3.0 Beta

77 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Upload File

78 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Model Selection

79 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Model Selection

80 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Model Selection

81 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Output

82 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Application Values

83 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Questions?

84 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Appendix 1: Cross-Tabulation

Cross-tabulation examines the relationship between twovariables (their interaction).

85 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Cross-Tabulation: One Dependent and OneIndependent Variables

86 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Cross-Tabulation Output

Raw frequency / Proportion by column / Proportion across row

87 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Appendix 2: Data Modification

88 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Adjust Data

Retain: Select data subset

Exclude: Exclude variables from a factor group

Recode: Combine and rename variables

Change class: Numeric → factor; factor → numeric

Transform: Apply log transformation to a specific column

ADJUSTED DATASET:

Run - to apply all above changes

Reset - to reset to the original dataset

89 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Exclude: Emphatic Style

90 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Adjusted Dataset

91 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

Adjusting Dataset

To revert to the original data, select RESET:

92 / 93

Introduction

DataPreparation

LanguageVariationSuite

Working withData

VisualAnalytics

InferentialAnalysis

DataModification

Mixed Effects

RBRUL

Appendix

Cross Tabulation

DataModification

References

References I

[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge:Cambridge University Press

[2] Bentivoglio, Paola and Mercedes Sedano. 1993. Investigacion sociolinguıstica: sus metodos aplicados auna experiencia venezolana. Boletın de Linguıstica 8. 3-35

[3] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber RandiReppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: CambridgeUniversity Press

[4] Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for AppliedLinguistics

[5] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif

[6] https://daniellestolt.files.wordpress.com/2013/01/are-you-ready1.gif

[7] http://www.martijnwieling.nl/R/sheets.pdf

93 / 93