Post on 13-Apr-2017
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental
Disorder Detection on Social Media
Chun-Hao Chang, Elvis Saravia, and Yi-Shin Chen
Institute of Information Systems and Applications
National Tsing Hua University
Hsinchu, Taiwan 30013, R.O.C.
Email: { ccha97u, ellfae, yishin}@gmail.com
Introduction
➔ One in three people report symptoms that meet the criteria for at least one mental disorder at some point in their lives.
➔ 16% of people in the US suffer from some form of mental disorder, the leading cause of disability worldwide.
➔ Problem: The majority of cases remain undetected. Diagnosis is difficult.
➔ Solution: Social networks provide a venue for mental disorder research.
Source: Wikipedia
Background
Bipolar Disorder:
- Unstable and impulsive emotions
- Cycling between mania and depression
Borderline Personality Disorder:
- Unstable and impulsive emotions
- Impaired social interactions
Motivation
➔ Open access to patient data from social websites.
➔ Build a real-time mental health assessment tool to assist in diagnosis.
Related Work
➔ Predicting Depression via Social Media - Microsoft (M. De Choudhury, M. Gamon, S. Counts, E. Horvitz, ICWSM 2013)
  1. Collected data using a crowdsourcing platform, Amazon Mechanical Turk.
  2. Purchased Twitter data.
  3. Predicted depression before diagnosis.
➔ Quantifying Mental Health Signals in Twitter - Johns Hopkins University (G. Coppersmith, M. Dredze, C. Harman, 2014)
  1. Automatically collected patients by keyword matching (e.g., “I was diagnosed with X”).
  2. Predicted 4 different kinds of mental disorders.
Limitation: Data not easily accessible or reproducible.
Challenges
➔ How to identify online patients?
➔ How to efficiently collect patient data?
➔ How to avoid selection bias: is the predictive model detecting patients with mental illnesses or just people who talk about them?
Objectives
➔ To build predictive models for the purpose of mental disorder detection.
➔ To extract features which alleviate the selection bias problem.
➔ To standardize features for mental disorder detection.
Data Collection
➔ Subconscious crowdsourcing: a reliable and efficient mechanism for gathering patient data. Community is the key element.
[Figure: Therapist and Patients]
Preprocessing
➔ Only Twitter accounts with more than 100 posts were kept
➔ Accounts whose posts were more than 50% hyperlinks were also removed
Purpose: Getting rid of spam accounts.
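
The two filters above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes that "more than 50% hyperlinks" means more than half of an account's posts contain a URL, and the names `link_share`, `keep_account`, and `filter_accounts` are hypothetical.

```python
import re

# Assumption: a post counts as a "hyperlink post" if it contains an http(s) URL.
URL_PATTERN = re.compile(r"https?://\S+")

MIN_POSTS = 100        # accounts must have more than this many posts
MAX_LINK_SHARE = 0.5   # accounts above this share of link posts are dropped


def link_share(posts):
    """Fraction of posts that contain at least one hyperlink."""
    if not posts:
        return 0.0
    with_links = sum(1 for text in posts if URL_PATTERN.search(text))
    return with_links / len(posts)


def keep_account(posts):
    """Apply both filters to one account's list of post texts."""
    return len(posts) > MIN_POSTS and link_share(posts) <= MAX_LINK_SHARE


def filter_accounts(accounts):
    """accounts: dict mapping a user name to a list of post texts."""
    return {user: posts for user, posts in accounts.items() if keep_account(posts)}
```

Calling `filter_accounts` on a dict of user timelines returns only the accounts that pass both checks; the thresholds are module constants so they can be adjusted without touching the logic.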
Feature Extraction
➔ Overall, we are interested in linguistic and behavioural features.
➔ Information that reveals a user’s personality and behavior: emotion transition, social interactions, age, gender, etc.
➔ TF-IDF, LIWC, and Pattern of Life Features
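
As a rough illustration of the TF-IDF features over unigrams and bigrams, here is a minimal standard-library sketch. The slides do not say which tokenizer or weighting variant was used, so whitespace tokenization, `tf = count / length`, and `idf = log(N / df)` are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(text):
    """Lowercased whitespace tokens plus adjacent-token bigrams."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / number of terms in the document,
    idf = log(N / document frequency).
    """
    term_lists = [ngrams(doc) for doc in documents]
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))  # count each term once per document
    n_docs = len(documents)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Here each "document" would be the concatenated tweets of one user. Terms that appear in every document get weight zero under this variant, which is why distinctive terms rise to the top.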
Features
➔ TF-IDF Model:
  ◆ Unigrams and bigrams
➔ LIWC (Linguistic Inquiry and Word Count):
  ◆ Thoughts, feelings, personality, and motivation
➔ Pattern of Life:
  ◆ Emotional scores, age, and gender
  ◆ Polarity features (negative ratio, positive ratio, positive combo, negative combo, and flips ratio)
  ◆ Social features (tweeting frequency, mention ratio, frequent mentions, and unique mentions)
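
The slides name the polarity and social features but do not define them, so the sketch below is one plausible reading rather than the authors' definitions. It assumes a "combo" is the longest streak of same-polarity tweets, a "flip" is an adjacent positive/negative switch, and a "frequent mention" is a user mentioned at least `frequent_threshold` times (a hypothetical parameter). Per-tweet polarity labels are taken as given.

```python
import re
from collections import Counter

MENTION_PATTERN = re.compile(r"@(\w+)")


def longest_run(labels, target):
    """Length of the longest streak of consecutive `target` labels."""
    best = current = 0
    for label in labels:
        current = current + 1 if label == target else 0
        best = max(best, current)
    return best


def polarity_features(labels):
    """labels: chronological per-tweet polarity, each +1, -1, or 0 (neutral)."""
    n = len(labels)
    # A flip is an adjacent pair with opposite signs (assumed definition).
    flips = sum(1 for a, b in zip(labels, labels[1:]) if a * b == -1)
    return {
        "positive_ratio": labels.count(1) / n if n else 0.0,
        "negative_ratio": labels.count(-1) / n if n else 0.0,
        "positive_combo": longest_run(labels, 1),
        "negative_combo": longest_run(labels, -1),
        "flips_ratio": flips / (n - 1) if n > 1 else 0.0,
    }


def social_features(tweets, frequent_threshold=3):
    """tweets: list of tweet texts for one user."""
    mentions = Counter()
    tweets_with_mention = 0
    for text in tweets:
        names = [name.lower() for name in MENTION_PATTERN.findall(text)]
        if names:
            tweets_with_mention += 1
        mentions.update(names)
    n = len(tweets)
    return {
        "mention_ratio": tweets_with_mention / n if n else 0.0,
        "unique_mentions": len(mentions),
        "frequent_mentions": sum(
            1 for count in mentions.values() if count >= frequent_threshold
        ),
    }
```

Tweeting frequency is left out because it needs timestamps; with them it would simply be the number of tweets divided by the length of the observation window.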
Experiments: Data
Group              Users   Tweets   Average Tweets per User
Random Samples       548   796957   1454.3
Bipolar Patients     278   347774   1250.99
BPD Patients         203   225774   1112.19
Bipolar Experts       11    14056   1611.67
BPD Experts            9    19696   1790.55
Experiments: Evaluation
➔ Three predictive models (Random Forest) for each mental disorder
  ◆ Pattern of Life Model
  ◆ TF-IDF Model
  ◆ LIWC Model
➔ Three experiments
  ◆ 10-Fold Cross Validation Test
  ◆ Selection Bias Test
  ◆ Limited Data Test
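
The 10-fold protocol itself can be sketched without any machine learning library. The code below shows only the fold logic: `fit_majority` is a placeholder learner standing in for the Random Forest models, and accuracy is an assumed metric, since the slides do not state which score is reported. All function names are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and deal them into k nearly equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, k=10, seed=0):
    """Return the accuracy on each held-out fold.

    fit(train_features, train_labels) must return a predict(feature) callable.
    """
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(1 for i in held_out if predict(features[i]) == labels[i])
        scores.append(correct / len(held_out))
    return scores


def fit_majority(train_features, train_labels):
    """Placeholder learner: always predicts the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda feature: majority
```

To reproduce the setup on the slides, `fit` would wrap a Random Forest trainer, and `cross_validate` would be run once per feature set (Pattern of Life, TF-IDF, LIWC) and per disorder.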
10-Fold Cross Validation
Pattern of Life 0.90
LIWC 0.91
TF-IDF 0.96
Pattern of Life 0.91
LIWC 0.90
TF-IDF 0.96
Selection Bias Test
Is the model detecting users suffering from a mental disorder or just users talking about it?
Top TF-IDF terms
Bipolar          BPD
mentalhealth     dbt
meds             feeling
blog             borderline
therapy          helps
anxiety          self harm
thoughts         psychiatrist
feel better      cpn
electroboyusa    disorder
health           bpdchat
bipolarblogger   depression
Conclusion
➔ We proposed an efficient and accessible mechanism for collecting patient data.
➔ We improved the Pattern of Life Model to produce better predictions.
➔ We addressed the selection bias problem, which previous work had not addressed.
Future work: Support more mental illnesses