On two existing approaches to statistical analysis of social media data Martina Patone Li-Chun Zhang 27th October 2018


On two existing approaches to statistical analysis of social media data

Martina Patone, Li-Chun Zhang

27th October 2018


The two existing approaches

Approach 1. Approach 2.

1. The case of the SMI (Daas et al. 2015): Can the SMI be used to replace the CCI?

2. Inferring users’ residency via tweets (Swier et al. 2015): Can geolocated tweets be used to extract the residential address of a user?


Representation

Connection only between $s_P$ and $s^* = b(a(s_P)) \cap U$

1. One-phase: $s_P \subseteq P$ the observed set, $U$ Dutch households;

2. Two-phase: $s^* \subseteq U$ the observed set, $U$ UK residents.
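The composition $s^* = b(a(s_P)) \cap U$ can be sketched with plain set operations. This is only an illustration: the dictionaries standing in for the maps $a$ (post to account) and $b$ (account to person), and all identifiers in the toy data, are hypothetical and not taken from either study.

```python
def represent(posts, post_to_account, account_to_person, population):
    """Return s* = b(a(s_P)) ∩ U for an observed set of posts s_P.

    post_to_account plays the role of a, account_to_person the role of b;
    both are hypothetical lookup tables used only for illustration.
    """
    # a(s_P): the accounts behind the observed posts
    accounts = {post_to_account[j] for j in posts if j in post_to_account}
    # b(a(s_P)): the persons behind those accounts
    persons = {account_to_person[k] for k in accounts if k in account_to_person}
    # keep only persons who belong to the target population U
    return persons & population


# Toy data (entirely made up)
s_P = {"p1", "p2", "p3", "p4"}
a = {"p1": "acct_A", "p2": "acct_A", "p3": "acct_B", "p4": "acct_C"}
b = {"acct_A": "person_1", "acct_B": "person_2", "acct_C": "person_9"}
U = {"person_1", "person_2", "person_3"}

s_star = represent(s_P, a, b, U)
print(sorted(s_star))  # ['person_1', 'person_2']
```

In the toy data, person_9 posts but lies outside $U$, and person_3 is in $U$ but never posts, so neither ends up in $s^*$.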


Measurement

1. One-phase: $z_j$, $j \in s_P$, the observed sentiment for post $j$;

2. Two-phase: $y_i^* = \tau(z_j, j \in P_i)$ the observed address of the anchor point for person $i \in s^*$.
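To make the two-phase measurement $y_i^* = \tau(z_j, j \in P_i)$ concrete, the sketch below uses a deliberately simple stand-in for $\tau$: each geotagged post is snapped to a grid cell and the most frequent cell is taken as the anchor point. The grid rule, the cell size and the coordinates are assumptions made for illustration; this is not claimed to be the procedure of Swier et al. (2015).

```python
from collections import Counter


def tau(points, cell=0.01):
    """Hypothetical tau: centre of the most frequently visited grid cell.

    points is a list of (lat, lon) pairs, i.e. the z_j for j in P_i.
    Returns None when the person has no geotagged posts.
    """
    if not points:
        return None
    # Snap every post to an integer grid cell of width `cell` degrees
    cells = Counter((round(lat / cell), round(lon / cell)) for lat, lon in points)
    (ci, cj), _count = cells.most_common(1)[0]
    # Convert the modal cell index back to coordinates
    return (round(ci * cell, 6), round(cj * cell, 6))


# Toy posts for two persons (coordinates are made up)
posts_by_person = {
    "person_1": [(51.501, -0.142), (51.502, -0.141), (51.499, -0.139), (53.480, -2.240)],
    "person_2": [(55.953, -3.188)],
}

# The pseudo-survey measurements y*_i, one per person in s*
y_star = {i: tau(P_i) for i, P_i in posts_by_person.items()}
print(y_star["person_1"])
```

Whatever rule is used for $\tau$, the result $y_i^*$ is a derived value and need not equal the true residential address $y_i$.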


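One-phase: formal interpretation of Daas et al. (2015)

One-phase: use the observed data $(z_j, s_{P,t})$ to aim at the same parameter $\theta = \theta(y_U)$

$\theta = \xi(z_{s_{P,t}}) = \theta(y_U)$

$\mathrm{SMI}_t = \xi(z_{s_{P,t}})$

$\mathrm{SMI}_t = \xi_t + d_t$

$E(d_t) = 0$, $V(d_t) = \eta_t^2$ $(\approx 0)$

$\mathrm{CCI}_t = \theta(y_{s_t})$

$\mathrm{CCI}_t = \theta_t + e_t$

$e_t \sim \mathcal{N}(0, \sigma_t^2)$

Can the SMI replace the CCI? $\theta_t = \xi_t$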
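Combining the two measurement models shows what the observed difference between the two indices can tell us about $\theta_t - \xi_t$. The derivation below assumes, in addition to what the slide states, that $d_t$ and $e_t$ are uncorrelated.

```latex
\[
\mathrm{CCI}_t - \mathrm{SMI}_t
  = (\theta_t + e_t) - (\xi_t + d_t)
  = (\theta_t - \xi_t) + (e_t - d_t),
\]
\[
E(\mathrm{CCI}_t - \mathrm{SMI}_t) = \theta_t - \xi_t,
\qquad
V(\mathrm{CCI}_t - \mathrm{SMI}_t) = \sigma_t^2 + \eta_t^2 \approx \sigma_t^2 .
\]
```

Since $\eta_t^2 \approx 0$, the uncertainty of the difference comes almost entirely from the survey error $e_t$.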


One-phase: statistical validation

Test: $H_0: \theta_t - \xi_t = \mu$ vs. $H_1: \theta_t - \xi_t \neq \mu$;

Under $H_0$: $X_t = \mathrm{CCI}_t - \mathrm{SMI}_t = \mu + e_t$ with $e_t \sim \mathcal{N}(0, \sigma_t^2)$;

p-value exceeds 0.05 for cv > 0.367
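One way such a test could be carried out is sketched below. The slide does not say how $\mu$ is handled, so the sketch assumes that $\mu$ is unknown, that the $\sigma_t$ are known, and that the $X_t$ are independent; $\mu$ is then estimated by the inverse-variance weighted mean and $Q = \sum_t (X_t - \hat\mu)^2 / \sigma_t^2$ is referred to a $\chi^2$ distribution with $T - 1$ degrees of freedom. This is a standard homogeneity test chosen for illustration, not necessarily the procedure behind the reported result, and all numbers are made up.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability P(chi2_df > q) for a positive integer df."""
    if q <= 0:
        return 1.0
    x = q / 2.0
    # Regularised upper incomplete gamma Q(a, x), started at a = 1/2 or a = 1
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-x)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(x))
    # Recurrence: Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1)
    while a < df / 2.0:
        tail += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def constant_gap_test(cci, smi, sigma):
    """Test H0: theta_t - xi_t = mu for all t, with mu unknown (assumption)."""
    x = [c - s for c, s in zip(cci, smi)]          # X_t = CCI_t - SMI_t
    w = [1.0 / s ** 2 for s in sigma]              # inverse-variance weights
    mu_hat = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    q = sum(wi * (xi - mu_hat) ** 2 for wi, xi in zip(w, x))
    return mu_hat, q, chi2_sf(q, len(x) - 1)


# Toy series (made up); sigma_t set to 1 for every period
cci = [-10.0, -6.0, -3.0, 1.0]
smi = [-15.5, -10.0, -8.5, -4.0]
mu_hat, q, p_value = constant_gap_test(cci, smi, [1.0] * 4)
print(mu_hat, q, p_value)
```

Because $Q$ scales with $1/\sigma_t^2$, larger survey standard errors shrink $Q$ and raise the p-value: the noisier the CCI, the harder it is to reject $H_0$ with a test of this kind.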


Two-phase: formal interpretation of Swier et al. (2015)

Two-phase: transform the social media dataset $(z_j, s_P)$ into a pseudo-survey dataset $(y_i^*, s^*)$

Representation: $s_P \xrightarrow{a} s_A \xrightarrow{b} s^* \subset U$

Measurement: $z_j \longrightarrow y_i^* \ (\neq y_i)$, $j \in P_i$

Statistical analysis is performed on $(y_i^*, s^*)$


Two-phase: data quality

1st phase: social media dataset (Representation, Measurement)

2nd phase: pseudo-survey dataset (Representation, Measurement)


One-phase and Two-phase: discussion

One-phase $(z_j, s_P)$ compared with Two-phase $(y_i^*, s^*)$

$(z_j, P)$
   One-phase: non-probability sample; $s_P \subset P$, unknown selection
   Two-phase: not of interest

$(y_i^*, s^*)$
   One-phase: need weighting
   Two-phase: feasible

$(y_i, s^*)$
   One-phase: need weighting; measurement consideration
   Two-phase: feasible; measurement consideration

$(y_i^*, U)$
   One-phase: test parameters; need survey data
   Two-phase: non-probability sample; $s^* \subset U$, unknown selection

$(y_i, U)$
   One-phase: test parameters; need survey data
   Two-phase: non-probability sample; $s^* \subset U$, unknown selection
   measurement consideration