Incorporating Dialectal Variability for Socially Equitable Language Identification
David Jurgens, Yulia Tsvetkov, and Dan Jurafsky
“This paper describes […] how even the most simple of these methods
using data obtained from the World Wide Web achieve accuracy
approaching 100% on a test suite comprised of ten European languages”
McNamee, P., “Language identification: a solved problem suitable for undergraduate instruction” Journal of Computing Sciences in Colleges 20(3) 2005.
Whose language are we identifying?
Global platforms attract global diversity in a language
Languages shown: English, French, Spanish, Arabic
Speaker counts labeled on the slide: 125M, 90M, 79M, 60M, 251M
Current language detection methods perform significantly worse in less-developed countries
[Figure: estimated LID accuracy for English tweets, plotted against the Human Development Index (education, life expectancy, income) of the text’s origin country. The plot is annotated “More Dialect” and “Less Dialect” (Labov, 1964; Ash, 2002), and a brace on the plot is labeled 23%.]
Practical Motivation: Epidemic Detection
Keyword Filter (“flu”, “sick”) → Language Detection → NLP (Which symptoms?)
non-English?
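The pipeline on this slide (keyword filter, then language detection, then NLP) can be sketched in a few lines to show where tweets get lost. This is only an illustration under stated assumptions, not the authors' system: `toy_detect_language` is a hypothetical stand-in for a real language identifier that knows just a handful of standard-English words, and the example tweets are invented.

```python
KEYWORDS = {"flu", "sick"}

# Hypothetical stand-in for a language identifier: it only "knows" a few
# standard-English words, so dialectal spellings look foreign to it.
STANDARD_ENGLISH = {"i", "am", "is", "the", "have", "with", "so", "today"}


def keyword_filter(tweets):
    """Keep tweets that mention any health keyword."""
    return [t for t in tweets if KEYWORDS & set(t.lower().split())]


def toy_detect_language(tweet):
    """Return 'en' if at least half of the tokens are known standard English."""
    tokens = tweet.lower().split()
    known = sum(tok in STANDARD_ENGLISH for tok in tokens)
    return "en" if known * 2 >= len(tokens) else "unknown"


def pipeline(tweets):
    """Keyword filter -> language detection -> tweets handed on to NLP."""
    candidates = keyword_filter(tweets)
    return [t for t in candidates if toy_detect_language(t) == "en"]


tweets = [
    "i am so sick today",        # standard English
    "dis flu don tire me abeg",  # invented, dialect-flavored example
    "nice weather today",        # no health keyword
]
reaching_nlp = pipeline(tweets)
```

Two tweets pass the keyword filter, but only the first reaches the NLP stage: the second is discarded as non-English even though it mentions “flu”.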
Failing to recognize a language silences its
speakers’ voices
Current language detection methods perform significantly worse in less-developed countries
[Figure: estimated accuracy for English tweets, plotted against the Human Development Index of the text’s origin country, annotated “More Dialect” and “Less Dialect” (Labov, 1964; Ash, 2002).]
Our goal is to make language ID performance equal for all languages across all dialects
This is a universal NLP issue!
Key Problems: Current methods struggle in the global setting because
Data: No corpora capture global variation in lexicon and dialect
Model: Current models make simplistic assumptions about how multilinguals communicate
Our approach
NLP methodologies capable of handling linguistic variation
Better social representation through network-based sampling
Our Data Solution: Improve linguistic representation through network-based sampling
Bootstrap dialectal corpora using existing classifiers to find monolingual individuals
Sample from the geolocated Twitter social network to include text from people at all locations
[Diagram: one user’s tweets labeled eng, eng, eng, eng, eng, and fra by an existing classifier; the outlying fra label is then marked eng.]
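The bootstrapping idea on this slide can be sketched as a per-user majority relabeling. This is a minimal sketch under assumptions: the function name, the user ids, and the `min_share` threshold of 0.8 are illustrative choices, not values given in the talk.

```python
from collections import Counter


def relabel_monolingual_users(user_labels, min_share=0.8):
    """Relabel tweets of users whose labels are dominated by one language.

    user_labels maps a user id to the per-tweet labels produced by an
    existing classifier. min_share is an assumed threshold, not a value
    taken from the talk.
    """
    corrected = {}
    for user, labels in user_labels.items():
        if not labels:
            continue
        language, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_share:
            # Treat the user as monolingual: every tweet gets the dominant
            # label, including tweets the classifier labeled differently.
            corrected[user] = [language] * len(labels)
    return corrected


user_labels = {
    "user_a": ["eng", "eng", "eng", "eng", "eng", "fra"],  # as on the slide
    "user_b": ["eng", "fra", "spa", "fra"],                # mixed: skipped
}
corrected = relabel_monolingual_users(user_labels)
```

Here `user_a` is treated as monolingual, so the tweet the classifier called `fra` is kept as English training text; `user_b` has no dominant language and contributes nothing.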
Build strategically diverse corpora and synthesize code-switched examples
Diversity dimensions: Topical, Social, Geographic, Multilingual
Our model solution: treat language identification as a character-based sequence-to-sequence task.
Input: Je vais commander à emporter. I’m too lazy to cook. (read as characters: J e _ … o k .)
Encoder: encodes the whole sentence using its characters
Decoder: decodes each word’s language from the sentence encoding
Output: Fra Fra Fra Fra Fra . Eng Eng Eng Eng Eng .
Each block in the diagram represents a multi-layer recurrent neural network
Jaech et al. 2016; Samih et al. 2016
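To make the input and output of this formulation concrete, the sketch below prepares the slide's example as a character sequence (source) and a per-token label sequence (target). It covers data formatting only; the recurrent encoder and decoder are not implemented, and the helper names and the tokenization rule are assumptions made for illustration.

```python
import re


def to_char_sequence(text, space_symbol="_"):
    """Split text into characters, marking spaces with a visible symbol."""
    return [space_symbol if ch == " " else ch for ch in text]


def tokenize(text):
    """Split into words (keeping apostrophes inside words) and punctuation."""
    return re.findall(r"\w+(?:[’']\w+)*|[^\w\s]", text)


def target_labels(segments):
    """Build one label per token; punctuation is copied through unchanged."""
    labels = []
    for text, lang in segments:
        for token in tokenize(text):
            labels.append(lang if token[0].isalnum() else token)
    return labels


segments = [
    ("Je vais commander à emporter.", "Fra"),
    ("I’m too lazy to cook.", "Eng"),
]
source = to_char_sequence(" ".join(text for text, _ in segments))
target = target_labels(segments)
```

The target has one entry per word or punctuation mark, matching the decoder output shown on the slide, while the source is far longer because it is read character by character.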
Equilid vs off-the-shelf
[Figure: three bar charts of Macro F1 on a 0 to 100 axis. Panels and systems as extracted: “70 Languages on Twitter” (langid.py, CLD2, Our Method); “Geo-diverse Tweets” (langid.py, CLD2, Our Method); “Multilingual Tweets” (Polyglot, CLD2, Our Method). Bar values did not survive extraction.]
Lui et al. 2013, 2014
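The charts report Macro F1, which averages per-language F1 so that rare languages count as much as common ones. A minimal sketch of the standard definition follows; the label set (the union of gold and predicted labels) and the toy data are assumptions, and the function returns a value between 0 and 1 rather than the 0 to 100 scale used on the slides.

```python
def macro_f1(gold, predicted):
    """Average the per-language F1 scores, weighting each language equally."""
    languages = set(gold) | set(predicted)
    scores = []
    for lang in languages:
        tp = sum(g == lang and p == lang for g, p in zip(gold, predicted))
        fp = sum(g != lang and p == lang for g, p in zip(gold, predicted))
        fn = sum(g == lang and p != lang for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy labels: four English tweets and two French tweets.
gold = ["eng", "eng", "eng", "eng", "fra", "fra"]
pred = ["eng", "eng", "eng", "eng", "eng", "fra"]
score = macro_f1(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

In this toy case a single error on the smaller language pulls Macro F1 below plain accuracy, which is why the metric suits comparisons across many languages of very different sizes.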
Equilid even outperforms systems specifically tuned for each dataset
[Figure: two bar charts of Macro F1 comparing Jaech et al. (2016) with Our Method. Panel “70 Languages on Twitter”: values 92 and 91.2. Panel “TweetLID”: values 79.6 and 78.7.]
Case Study: Do our solutions provide socially-equitable language identification for health-related queries?
Data: 1M Tweets with any of 385 English terms from established lexicons for influenza, psychological well-being, and social health (Lamb et al., 2013; Smith et al., 2016; Preotiuc-Pietro et al., 2015; Park et al., 2016)
Task: does the language identification system recognize every tweet as English?
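The case-study task reduces to a simple measurement: since every tweet is assumed to be English, the share labeled English in each origin country serves as an accuracy estimate. The sketch below shows that computation; the record format, the country codes, and the labels are invented for illustration and do not come from the study data.

```python
from collections import defaultdict


def english_recognition_rate(records):
    """Share of tweets labeled English, per origin country.

    Every record is assumed to be an English tweet, so the share doubles
    as an accuracy estimate for that country.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for country, predicted_language in records:
        totals[country] += 1
        hits[country] += predicted_language == "eng"
    return {country: hits[country] / totals[country] for country in totals}


# Invented records: (origin country, label from some language identifier).
records = [
    ("A", "eng"), ("A", "eng"), ("A", "eng"), ("A", "eng"),
    ("B", "eng"), ("B", "und"), ("B", "eng"), ("B", "fra"),
]
rates = english_recognition_rate(records)
```

Plotting these per-country rates against each country's Human Development Index gives the kind of comparison the talk uses to judge whether a system is socially equitable.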
Equilid raises the bar for socially-equitable language identification
[Figure: estimated accuracy for English tweets, plotted against the Human Development Index of the text’s origin country.]
Social Equality doesn’t stop at Language Identification
Methodologies capable of handling language as it is used
Better social representation in our data
David Jurgens, Yulia Tsvetkov, and Dan Jurafsky
Be equitable! https://github.com/davidjurgens/equilid