Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, pages 114–120, Minneapolis, Minnesota, June 6, 2019. ©2019 Association for Computational Linguistics


Mental Health Surveillance over Social Media with Digital Cohorts

Silvio Amir, Mark Dredze and John W. Ayers†

Center for Language & Speech Processing, Johns Hopkins University, Baltimore, MD
†Division of Infectious Diseases & Global Public Health, University of California, La Jolla, CA

[email protected], [email protected], [email protected]

Abstract

The ability to track mental health conditions via social media has opened the door to large-scale, automated mental health surveillance. However, inferring accurate population-level trends requires representative samples of the underlying population, which can be challenging given the biases inherent in social media data. While previous work has adjusted samples based on demographic estimates, the populations were selected based on specific outcomes, e.g. specific mental health conditions. We depart from these methods by conducting analyses over demographically representative digital cohorts of social media users. To validate this approach, we constructed a cohort of US-based Twitter users to measure the prevalence of depression and PTSD, and to investigate how these illnesses manifest across demographic subpopulations. The analysis demonstrates that cohort-based studies can help control for sampling biases, contextualize outcomes, and provide deeper insights into the data.

1 Introduction

The ability of social media analysis to support computational epidemiology and improve public health practices is well established (Culotta, 2010; Paul and Dredze, 2011; Salathe et al., 2012; Paul and Dredze, 2017). The field has seen particular success around the diagnosis, quantification and tracking of mental illnesses (Hao et al., 2013; Schwartz et al., 2014; Coppersmith et al., 2014a, 2015a,c; Amir et al., 2017). These methods have utilized social media (Coppersmith et al., 2014b; Kumar et al., 2015; De Choudhury et al., 2016), as well as other online data sources (Ayers et al., 2017, 2013, 2012; Arora et al., 2016), to obtain population-level estimates and trends around mental health topics.

Accurately estimating population-level trends requires obtaining representative samples of the general population. However, social media has many well-known biases; e.g., young adults tend to be over-represented (demographic bias). Yet, most social media analyses ignore these issues, either by assuming that all the data is equally relevant, or by selecting data for a specific outcome: for example, studying depression from users who talk about depression instead of first selecting a population and then measuring outcomes. Outcome-based data selection can also introduce biases, such as over-representing individuals vocal about the topic of interest (self-selection bias). Consequently, trends or insights gleaned from these analyses might not generalize to the broader population.

Fortunately, these problems are well understood in traditional health studies, and well-established techniques from polling and survey-based research are routinely used to correct for these biases. For example, medical studies frequently utilize a cohort-based approach in which a group is pre-selected to study disease causes or to identify connections between risk factors and health outcomes (Prentice, 1986). We can replicate these widely accepted approaches by conducting analyses over digital cohorts of social media users, characterized with respect to key demographic attributes. In this work, we propose to use such a social media based cohort for the purposes of mental health surveillance. We developed a digital cohort by sampling a large number of Twitter users at random (not based on outcomes), and then using demographic inference techniques to infer key demographics for each user, namely age, gender, location and race/ethnicity. Then, we used the cohort to measure relative rates of both depression and PTSD, using supervised classifiers for each mental health condition. The inferred demographic information allowed us to observe clear differences in how these illnesses manifest in the population. Moreover, the analysis demonstrates how social media based cohort studies can help to control for sampling biases and contextualize the outcomes.

2 Methodology

We now briefly describe our approach for cohort-based studies over social media; a more detailed description of the proposed methodology will appear in a forthcoming publication. Most work on social media analysis estimates trends by aggregating document-level signals inferred from arbitrary (and biased) data samples selected to match a predefined outcome. While some recent work has begun incorporating demographic information to contextualize analyses (Mandel et al., 2012; Mitchell et al., 2013; Huang et al., 2017, 2019) and to improve the representativeness of the data (Coppersmith et al., 2015b; Dos Reis and Culotta, 2015), these studies still select on specific outcomes.

We depart from these works by constructing a demographically representative digital cohort of social media users prior to the analyses, and then conducting cohort-based studies over this pre-selected population. While a significant undertaking in most medical studies, the vast quantities of available social media data make assembling social media cohorts feasible. Such cohorts can be used to support longitudinal and cross-sectional studies, allowing experts to contextualize the outcomes, produce externally valid trends from inherently biased samples, and extrapolate those trends to a broader population. Similar strategies have been utilized in online surveys, which can have comparable validity to other survey modalities simply by controlling for basic demographic features such as location, age, ethnicity and gender (Duffy et al., 2005).

2.1 Building Digital Cohorts

Our cohort construction process entails two key steps: first, randomly selecting a large sample of Twitter users; and second, annotating those users with key demographic attributes. While such attributes are not provided by the API, automated methods can be used to infer such traits from data (Cesare et al., 2017). Following this approach, we developed a demographic inference pipeline to automatically infer age, gender, race/ethnicity and location for each cohort candidate.
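As a concrete (if toy) picture of this two-step construction, the sketch below samples candidate IDs and attaches inferred attributes before filtering to US accounts. The inference stub and field names are placeholders, not the authors' trained models:

```python
import random
from dataclasses import dataclass

@dataclass
class CohortMember:
    user_id: int
    age: str
    gender: str
    location: str
    ethnicity: str

def infer_demographics(user_id):
    """Stand-in for the demographic inference pipeline of Section 2.1;
    a real system would run trained classifiers over the user's data."""
    rng = random.Random(user_id)  # deterministic per user, for illustration
    return CohortMember(
        user_id=user_id,
        age=rng.choice(["Teenager", "20s", "30s", "40s", "50s"]),
        gender=rng.choice(["F", "M"]),
        location=rng.choice(["US", "non-US"]),
        ethnicity=rng.choice(["White", "Asian", "Black", "Hispanic", "Other"]),
    )

def build_cohort(candidate_ids):
    """Annotate a random user sample and keep only US-based accounts."""
    members = [infer_demographics(uid) for uid in candidate_ids]
    return [m for m in members if m.location == "US"]

cohort = build_cohort(range(1000))
```

The point of the sketch is the ordering: demographic annotation happens before any outcome is measured, which is what distinguishes the cohort design from outcome-based selection.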

Age Identifying age based on the content of a user can be challenging, and exact age often cannot be determined based on language use alone. Therefore, we use discrete categories that provide a more accurate estimate of age: Teenager (below 19), 20s, 30s, 40s, and 50s (50 years or older).
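The bracketing can be written directly. Note that the text does not say which bracket a 19-year-old falls into, so the cutoff below is an assumption:

```python
def age_bracket(age: int) -> str:
    """Map an exact age onto the coarse brackets used for the cohort.
    The paper leaves the 19-year-old boundary unspecified; this sketch
    groups 19 with the 20s bracket (an assumption, not the authors' rule)."""
    if age < 19:
        return "Teenager"
    if age >= 50:
        return "50s"
    if age < 30:
        return "20s"
    if age < 40:
        return "30s"
    return "40s"
```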

Gender The gender was inferred using Demographer, a supervised model that predicts the (binary) gender of Twitter users with features based on the name field of the user profile (Knowles et al., 2016).
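Name-field features of this kind can be illustrated with padded character n-grams over the profile name, which a linear classifier would then score. The feature set here is illustrative; Demographer's actual features may differ:

```python
from collections import Counter

def name_ngrams(name: str, n: int = 3) -> Counter:
    """Character n-grams of the profile name field, padded at the ends
    so word-initial and word-final patterns become distinct features."""
    padded = f"^{name.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
```

For example, `name_ngrams("Ana")` yields the trigrams `^an`, `ana`, and `na$`; suffix features like `a$` (for n=2) are the kind of signal that makes name-based gender prediction work at all.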

Race/Ethnicity The standard formulation of race and ethnicity is not well understood by the general public, so categorizing social media users along these two axes may not be reasonable. Therefore, we use a single measure of multicultural expression that includes five categories: White (W), Asian (A), Black (B), Hispanic (H), and Other.

Location The location was inferred using Carmen, an open-source library for geolocating tweets that uses a series of rules to look up location strings in a location knowledge base (Dredze et al., 2013). We use the inferred location to select users that live in the United States.
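Rule-based resolution of this kind can be approximated as normalization plus a knowledge-base lookup. The table and rules here are a toy illustration, not Carmen's actual data or API:

```python
# Toy knowledge base mapping normalized location strings to
# (city, state, country) tuples; Carmen's real KB is far larger.
LOCATION_KB = {
    "baltimore, md": ("Baltimore", "MD", "US"),
    "la jolla, ca": ("La Jolla", "CA", "US"),
    "london": ("London", None, "GB"),
}

def resolve_location(raw: str):
    """Normalize the free-text location field and look it up."""
    key = raw.strip().lower()
    return LOCATION_KB.get(key)

def is_us_user(profile_location: str) -> bool:
    """The filtering step used to restrict the cohort to US accounts."""
    hit = resolve_location(profile_location)
    return hit is not None and hit[2] == "US"
```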

The age and race/ethnicity attributes were inferred with custom supervised classifiers based on Amir et al. (2017)'s user-level model. The classifiers were trained and evaluated on a dataset of 5K annotated users, attaining performances of 0.28 and 0.41 Average F1, respectively. See the supplemental notes for additional details on these experiments.[1]
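One plausible reading of "Average F1" is the F1 score averaged over classes (macro-F1); the paper does not spell out the averaging, so treat this as an assumed reference implementation:

```python
def macro_f1(gold, pred):
    """F1 averaged over the classes present in the gold labels,
    computed from parallel label lists."""
    labels = set(gold)
    f1s = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Macro averaging weights rare classes equally with frequent ones, which is why scores like 0.28 can occur even when overall accuracy looks reasonable on a skewed label distribution.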

2.2 Mental Health Classifiers

We build on prior work on supervised models for mental health inference over social media data. We focus on two mental health conditions, depression and PTSD, and develop classifiers with the self-reported datasets created for CLPsych 2015 (Mitchell et al., 2015; Coppersmith et al., 2015b). These labeled datasets derive from users who have publicly disclosed a diagnosis of depression (327 users) or PTSD (246 users) on Twitter, with an equal number of randomly selected, demographically matched (with respect to age and gender) users as controls. For each user, the associated metadata and posting history were also collected, up to the 3,000 most recent tweets, per limitations of the Twitter API.

[1] https://samiroid.github.io/assets/demos.pdf

Figure 1: Demographics of the digital cohort.
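Demographically matched controls of the kind described for these datasets could be drawn per (age, gender) stratum, roughly as follows. Field names are hypothetical and the CLPsych organizers' actual matching procedure may differ:

```python
import random

def matched_controls(cases, pool, seed=0):
    """Pick one control per case from the same (age, gender) stratum,
    without replacement; a simplified sketch of demographic matching."""
    rng = random.Random(seed)
    by_stratum = {}
    for user in pool:
        by_stratum.setdefault((user["age"], user["gender"]), []).append(user)
    controls = []
    for case in cases:
        bucket = by_stratum.get((case["age"], case["gender"]), [])
        if bucket:  # skip cases with no remaining match in the pool
            controls.append(bucket.pop(rng.randrange(len(bucket))))
    return controls
```

Matching on age and gender prevents the classifier from simply learning demographic differences between the diagnosed group and the general population.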

The participants of the task proposed a host of methods ranging from rule-based systems to various supervised models (Pedersen, 2015; Preotiuc-Pietro et al., 2015; Coppersmith et al., 2015b). More recently, the neural user-level classifier proposed by Amir et al. (2017) showed not only good performance on this task, but also the ability to capture implicit similarities between users affected by the same diseases, thus opening the door to more interpretable analyses.[2] Hence, we adopt their model for this analysis.

3 Analysis

We constructed a cohort for our analysis by randomly selecting a sample of Twitter users and processing it with the aforementioned demographic inference pipeline. After discarding accounts from users located outside the United States, we obtained a cohort of 48K Twitter users with the demographic composition shown in Figure 1. Some demographic groups are over-represented (e.g. young adults) while others are grossly under-represented (e.g. teenagers), which illustrates the need for methodologies that can take these disparities into account.

We then processed the cohort through the mental-health classifiers to estimate the prevalence of depression and PTSD, and examine how these illnesses manifest across the population. The analysis revealed that 30.2% of the cohort members are likely to suffer from depression, 30.8% from PTSD, and 20% from both. We observe a significant overlap between people affected by depression and PTSD, which is not surprising given that the comorbidity of these disorders is well known, with approximately half of people with PTSD also having a diagnosis of major depressive disorder (Flory and Yehuda, 2015).

[2] A similar finding to Benton et al. (2017).
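Prevalence and overlap figures of this kind reduce to counting over per-user classifier outputs, as in this toy illustration over a ten-user cohort (the real study applies the trained classifiers to 48K users):

```python
def prevalence(flags):
    """Fraction of users flagged positive for a condition."""
    return sum(flags) / len(flags)

# Toy per-user predictions (1 = classifier flags the condition).
depression = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
ptsd       = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
both       = [d and p for d, p in zip(depression, ptsd)]

rates = (prevalence(depression), prevalence(ptsd), prevalence(both))
# rates is (0.4, 0.3, 0.2): the same shape of overlap as the
# paper's 30.2% / 30.8% / 20% figures.
```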

How do these conditions affect different parts of the population? To answer this question, we looked at the affected users and measured how the demographics of individual subpopulations differ from those of the cohort as a whole. Figures 2 and 3 show the estimates for depression, PTSD and both, controlled for the cohort demographics. We observe large generational differences: PTSD seems to be more prevalent among older people, whereas depression affects predominantly younger people. We also observe that in all cases women are more susceptible than men, and Blacks and Hispanics are more likely to be affected than Whites. This may reflect a bias in the underlying data used to construct the classifiers, or a difference in how social media is used by different demographic groups. For example, models trained with a majority of data from White users may be oversensitive to specific dialects used by other communities.
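Comparing a subpopulation's demographics against the cohort's amounts to a ratio of group shares; a minimal sketch (field names are hypothetical):

```python
from collections import Counter

def group_share(users, key):
    """Fraction of users in each demographic group for a given attribute."""
    counts = Counter(u[key] for u in users)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def relative_rate(affected, cohort, key):
    """Share of each group among affected users divided by its share in
    the full cohort; values above 1 mean the group is over-represented
    among the affected."""
    base = group_share(cohort, key)
    sub = group_share(affected, key)
    return {g: sub.get(g, 0.0) / base[g] for g in base}
```

Dividing by the cohort baseline is what "controlled for the cohort demographics" means operationally: it separates a genuinely elevated rate from a group simply being large in the sample.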

3.1 Discussion

Comparing our estimates with the current statistics provided by the NIH (a prevalence of 6.7% for depression[3] and 3.6% for PTSD[4]), we can see that ours are much higher. It should be noted, however, that the NIH reports refer to episodes of Major Depression, whereas our classifiers may also be sensitive to milder depression that may never be diagnosed as such. Moreover, these estimates are not directly comparable, since the NIH statistics are outdated (the estimates are from 2003 and 2015 for PTSD and depression, respectively) and our cohort was not adjusted to match the demographics of the US population. Nevertheless, it is worth noting that the relative prevalence rates per demographic group that we obtained correlate with the NIH reports. For example, we observe similar distributions in terms of age and gender. However, we found that Blacks and Hispanics are more likely to be affected by mental illnesses, whereas the NIH reports a higher prevalence among Whites.

[3] https://www.nimh.nih.gov/health/statistics/major-depression.shtml
[4] https://www.nimh.nih.gov/health/statistics/post-traumatic-stress-disorder-ptsd.shtml

Figure 2: The prevalence of two mental health conditions in the cohort: (a) Depression; (b) PTSD.
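The adjustment noted as not applied here (matching the cohort to US demographics) is standard post-stratification: reweight each demographic cell by its population share over its cohort share. A sketch with made-up shares, not the study's numbers:

```python
def poststratification_weights(cohort_share, population_share):
    """Weight per demographic cell: population share / cohort share."""
    return {g: population_share[g] / cohort_share[g] for g in cohort_share}

def weighted_prevalence(cell_prevalence, cohort_share, population_share):
    """Cohort prevalence re-aggregated under population weights."""
    w = poststratification_weights(cohort_share, population_share)
    return sum(cell_prevalence[g] * cohort_share[g] * w[g] for g in cohort_share)
```

With an over-sampled young group (80% of the cohort, 50% of the population) and higher prevalence among the young, the weighted estimate is pulled down toward the older group's rate, which is the direction such an adjustment would move the figures reported above.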

One possible reason for these disparities is that racial minorities are more likely to come from communities with lower education rates and socioeconomic status (SES), and to be in a position where they lack proper health coverage and mental-health care. Reports from the NIH and other US governmental agencies show that 46.3% of Whites suffering from a mental illness received some form of treatment, but this was the case for only 29.8% of Blacks and 27.3% of Hispanics.[5] There may also be a bias in reporting within different racial and ethnic groups, as prevalence estimates can be biased by access to mental health care and social stigma. Recent studies show that factors such as discrimination and perceived inequality have a stronger influence on mental health than previously supposed, even when controlling for SES (Budhwani et al., 2015). Others have found that acute and chronic discrimination causes racial disparities in health to be even more pronounced at the upper ends of the socioeconomic spectrum. One reason is that for Whites, improvements in SES result in improved health and significantly less exposure to discrimination, whereas for Blacks and Hispanics upward mobility significantly increases the likelihood of discrimination and unfair treatment, as they move into predominantly White neighborhoods and work environments (Colen et al., 2017).

[5] https://www.integration.samhsa.gov/MHServicesUseAmongAdults.pdf

Figure 3: Depression and PTSD.

While an in-depth analysis of this issue is beyond the scope of this work, these results suggest that it deserves further investigation. A follow-up study on the role of discrimination in mental health could be conducted by adding a model to identify users who reported instances of discrimination, and comparing the prevalence of mental illness in this group with that of a control group.

4 Conclusions

We have presented the first cohort-based study of mental health trends on Twitter. Instead of conducting the analysis over arbitrary data samples selected to match a given outcome, we first developed a digital cohort of social media users characterized with respect to key demographic traits. We used this cohort to measure relative rates of depression and PTSD, and examine how these illnesses affect different demographic strata. The ability to disaggregate the estimates per demographic group allowed us to observe clear differences in how these illnesses manifest across different parts of the population, something that would not be possible with typical social media analysis methodologies. This brings social media analysis closer to widely accepted practices in surveillance-based research.

Information about how different subpopulations perceive or are affected by certain health issues could also improve public health policies and inform intervention campaigns targeted at different demographics. Moreover, the fact that some of our estimates correlate with statistics obtained through traditional methodologies suggests that this might be a promising approach to complement current epidemiology practices. Indeed, it opens the door to more responsive and deliberate public health interventions, and allows experts to track the progress or the effects of targeted interventions in near real-time.

4.1 Privacy and Ethical Considerations

The majority of social media analysis approaches extract signals from individual posts and thus do not need to record any personal information. However, as we move towards user-level analyses, we are collecting and storing complete records of social media users' communications. Even though this information is publicly available, people might not be consciously aware of the implications of sharing all their data, and they certainly have not given explicit consent for their data to be analyzed in aggregate. This is even more pertinent for analyses involving sensitive information (e.g. health-related issues). As demonstrated by recent incidents involving companies inadvertently sharing or failing to protect users' personal data, there is a serious danger of abuse and exploitation for systems that collect and store large amounts of personal data.

Even though this is in large part an ethical question, there are technical solutions that can partially address this issue. One is to use anonymization techniques to obfuscate any details that would allow third parties (even analysts) to identify the individuals involved in the study. Another is to store only abstract representations, which can still be updated and consumed by predictive models, and discard the actual content. In regards to consent, there are initiatives to support voluntary data donation for research purposes, e.g. the Our Data Helps program.[6]
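One concrete form of the first mitigation is a keyed pseudonym in place of the account handle, so records can be linked across updates without storing the real identifier. This is a sketch only; a real deployment also needs key management, key rotation, and scrubbing of identifying content:

```python
import hashlib
import hmac

# Placeholder key for illustration; in practice this must be stored
# separately from the dataset and rotated per the study's data policy.
SECRET_KEY = b"rotate-and-store-separately"

def pseudonymize(user_id: str) -> str:
    """Keyed hash (HMAC-SHA256) of the account identifier: stable for
    linking a user's records, but not reversible without the key."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

An unkeyed hash would not suffice here: public handles are enumerable, so anyone could rebuild the mapping by hashing all known handles; the secret key is what blocks that attack.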

References

Silvio Amir, Glen Coppersmith, Paula Carvalho, Mario J. Silva, and Byron C. Wallace. 2017. Quantifying mental health from social media with neural user embeddings. In Proceedings of the 2nd Machine Learning for Healthcare Conference, volume 68 of Proceedings of Machine Learning Research, pages 306–321, Boston, Massachusetts. PMLR.

Vishal S Arora, David Stuckler, and Martin McKee. 2016. Tracking search engine queries for suicide in the United Kingdom, 2004–2013. Public Health, 137:147–153.

John W Ayers, Benjamin M Althouse, Jon-Patrick Allem, Matthew A Childers, Waleed Zafar, Carl Latkin, Kurt M Ribisl, and John S Brownstein. 2012. Novel surveillance of psychological distress during the great recession. Journal of Affective Disorders, 142(1-3):323–330.

John W Ayers, Benjamin M Althouse, Jon-Patrick Allem, J Niels Rosenquist, and Daniel E Ford. 2013. Seasonality in seeking mental health information on Google. American Journal of Preventive Medicine, 44(5):520–525.

John W Ayers, Benjamin M Althouse, Eric C Leas, Mark Dredze, and Jon-Patrick Allem. 2017. Internet searches for suicide following the release of 13 Reasons Why. JAMA Internal Medicine, 177(10):1527–1529.

Adrian Benton, Margaret Mitchell, and Dirk Hovy. 2017. Multitask learning for mental health conditions with limited social media data. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 152–162.

Henna Budhwani, Kristine Ria Hearld, and Daniel Chavez-Yenter. 2015. Depression in racial and ethnic minorities: the impact of nativity and discrimination. Journal of Racial and Ethnic Health Disparities, 2(1):34–42.

Nina Cesare, Christan Grant, and Elaine O Nsoesie. 2017. Detection of user demographics on social media: A review of methods and recommendations for best practices. arXiv preprint arXiv:1702.01807.

[6] https://ourdatahelps.org/


Cynthia G Colen, David M Ramey, Elizabeth C Cooksey, and David R Williams. 2017. Racial disparities in health among nonpoor African Americans and Hispanics: the role of acute and chronic discrimination. Social Science & Medicine.

Glen Coppersmith, Mark Dredze, and Craig Harman. 2014a. Quantifying mental health signals in Twitter. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 51–60. Association for Computational Linguistics.

Glen Coppersmith, Mark Dredze, Craig Harman, and Kristy Hollingshead. 2015a. From ADHD to SAD: Analyzing the language of mental health on Twitter through self-reported diagnoses. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 1–10. Association for Computational Linguistics.

Glen Coppersmith, Mark Dredze, Craig Harman, Kristy Hollingshead, and Margaret Mitchell. 2015b. CLPsych 2015 shared task: Depression and PTSD on Twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 31–39. Association for Computational Linguistics.

Glen A Coppersmith, Mark Dredze, Craig Harman, and Kristy Hollingshead. 2015c. From ADHD to SAD: Analyzing the language of mental health on Twitter through self-reported diagnoses. In NAACL Workshop on Computational Linguistics and Clinical Psychology, pages 1–10.

Glen A Coppersmith, Craig Harman, and Mark Dredze. 2014b. Measuring post traumatic stress disorder in Twitter. In International Conference on Weblogs and Social Media (ICWSM), pages 579–582.

Aron Culotta. 2010. Towards detecting influenza epidemics by analyzing Twitter messages. In Proceedings of the First Workshop on Social Media Analytics, SOMA '10, pages 115–122, New York, NY, USA. ACM.

Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, and Mrinal Kumar. 2016. Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI '16, pages 2098–2110, New York, NY, USA. ACM.

Virgile Landeiro Dos Reis and Aron Culotta. 2015. Using matched samples to estimate the effects of exercise on mental health from Twitter. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, pages 182–188. AAAI Press.

Mark Dredze, Michael J Paul, Shane Bergsma, and Hieu Tran. 2013. Carmen: A Twitter geolocation system with applications to public health. In AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI).

Bobby Duffy, Kate Smith, George Terhanian, and John Bremer. 2005. Comparing data from online and face-to-face surveys. International Journal of Market Research, 47(6):615.

Janine D Flory and Rachel Yehuda. 2015. Comorbidity between post-traumatic stress disorder and major depressive disorder: alternative explanations and treatment considerations. Dialogues in Clinical Neuroscience, 17(2):141.

Bibo Hao, Lin Li, Ang Li, and Tingshao Zhu. 2013. Predicting mental health status on social media. In Cross-Cultural Design. Cultural Differences in Everyday Life. CCD 2013, Lecture Notes in Computer Science, pages 101–110. Springer, Berlin, Heidelberg.

Xiaolei Huang, Michael Smith, Michael Paul, Dmytro Ryzhkov, Sandra Quinn, David Broniatowski, and Mark Dredze. 2017. Examining patterns of influenza vaccination in social media. In AAAI Workshops.

Xiaolei Huang, Michael C Smith, Amelia M Jamison, David A Broniatowski, Mark Dredze, Sandra Crouse Quinn, Justin Cai, and Michael J Paul. 2019. Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013–2017. BMJ Open, 9(1):e024018.

Rebecca Knowles, Josh Carroll, and Mark Dredze. 2016. Demographer: Extremely simple name demographics. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 108–113.

Mrinal Kumar, Mark Dredze, Glen Coppersmith, and Munmun De Choudhury. 2015. Detecting changes in suicide content manifested in social media following celebrity suicides. In Proceedings of the 26th ACM Conference on Hypertext and Hypermedia. ACM.

Benjamin Mandel, Aron Culotta, John Boulahanis, Danielle Stark, Bonnie Lewis, and Jeremy Rodrigue. 2012. A demographic analysis of online sentiment during Hurricane Irene. In Proceedings of the 2nd Workshop on Language in Social Media, pages 27–36. Association for Computational Linguistics.

Lewis Mitchell, Kameron Decker Harris, Morgan R Frank, Peter Sheridan Dodds, and Christopher M Danforth. 2013. The geography of happiness: connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLOS ONE, 8(5).

Margaret Mitchell, Glen Coppersmith, and Kristy Hollingshead, editors. 2015. Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. North American Association for Computational Linguistics, Denver, Colorado, USA.

Michael J Paul and Mark Dredze. 2011. You are what you tweet: Analyzing Twitter for public health. In Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, ICWSM11. Association for the Advancement of Artificial Intelligence.

Michael J Paul and Mark Dredze. 2017. Social Monitoring for Public Health. Morgan & Claypool Publishers.

Ted Pedersen. 2015. Screening Twitter users for depression and PTSD with lexical decision lists. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 46–53.

Ross L Prentice. 1986. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika, 73(1):1–11.

Daniel Preotiuc-Pietro, Maarten Sap, H Andrew Schwartz, and LH Ungar. 2015. Mental illness detection at the World Well-Being Project for the CLPsych 2015 shared task. NAACL HLT 2015, page 40.

Marcel Salathe, Linus Bengtsson, Todd J Bodnar, Devon D Brewer, John S Brownstein, Caroline Buckee, Ellsworth M Campbell, Ciro Cattuto, Shashank Khandelwal, Patricia L Mabry, et al. 2012. Digital epidemiology. PLoS Computational Biology, 8(7):e1002616.

H Andrew Schwartz, Johannes Eichstaedt, Margaret L Kern, Gregory Park, Maarten Sap, David Stillwell, Michal Kosinski, and Lyle Ungar. 2014. Towards assessing changes in degree of depression through Facebook. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 118–125. Association for Computational Linguistics.