On two existing approaches to statistical analysis of social media data
Martina Patone, Li-Chun Zhang
27th October 2018
The two existing approaches
Approach 1. Approach 2.
1. The case of the SMI (Daas et al. 2015): Can the SMI be used to replace the CCI?
2. Inferring users' residency via tweets (Swier et al. 2015): Can geolocalised tweets be used to extract the residential address of a user?
Representation
Connection only between $s_P$ and $s^* = b(a(s_P)) \cap U$
1. One-phase: $s_P \subseteq P$ is the observed set, with $U$ the Dutch households;
2. Two-phase: $s^* \subseteq U$ is the observed set, with $U$ the UK residents.
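The composition $s^* = b(a(s_P)) \cap U$ can be illustrated with a small sketch. It assumes that $a$ maps each post to the account that produced it and $b$ maps each account to the person behind it; the slides do not spell out these maps, and every identifier below is hypothetical toy data.

```python
def induced_sample(posts, post_to_account, account_to_person, population):
    """Return s* = b(a(s_P)) ∩ U for a set of observed posts s_P.

    post_to_account   -- assumed map a: post id -> account id
    account_to_person -- assumed map b: account id -> person id
    population        -- target population U (set of person ids)
    """
    # a(s_P): accounts reached from the observed posts
    accounts = {post_to_account[j] for j in posts if j in post_to_account}
    # b(a(s_P)): persons reached from those accounts (some accounts have no person)
    persons = {account_to_person[k] for k in accounts if k in account_to_person}
    # Keep only persons who belong to the target population U
    return persons & population


# Hypothetical example: four posts, three accounts, one account without a person
s_P = {"p1", "p2", "p3", "p4"}
a = {"p1": "a1", "p2": "a1", "p3": "a2", "p4": "a3"}
b = {"a1": "i1", "a2": "i2"}   # a3 is, say, an organisation account
U = {"i1", "i3"}               # i2 is outside the target population

print(induced_sample(s_P, a, b, U))  # {'i1'}
```

In this toy case four observed posts lead to a single person in $U$, which shows why the link between $s_P$ and $s^*$ is not one-to-one.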
Measurement
1. One-phase: $z_j$, $j \in s_P$, the observed sentiment for post $j$;
2. Two-phase: $y_i^* = \tau(z_j,\, j \in P_i)$, the observed address of the anchor point for person $i \in s^*$.
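The two-phase measurement $y_i^* = \tau(z_j,\, j \in P_i)$ aggregates post-level values into one value per person. The sketch below uses the most frequent location among a person's posts as a stand-in for $\tau$; this choice is an assumption for illustration and is not claimed to be the anchor-point rule of Swier et al. (2015). The post records are hypothetical.

```python
from collections import Counter, defaultdict


def tau_modal(values):
    """Assumed stand-in for tau: the most frequent value among a person's posts.

    Ties are resolved in favour of the value seen first.
    """
    return Counter(values).most_common(1)[0][0]


def pseudo_survey(posts):
    """Turn post-level records (post id, person id, z_j) into {person id: y*_i}."""
    by_person = defaultdict(list)
    for _post_id, person_id, z in posts:
        by_person[person_id].append(z)   # collect z_j for j in P_i
    return {i: tau_modal(zs) for i, zs in by_person.items()}


# Hypothetical geolocated posts: z_j is a coarse location label
posts = [
    ("p1", "i1", "area_A"),
    ("p2", "i1", "area_A"),
    ("p3", "i1", "area_B"),
    ("p4", "i2", "area_C"),
]

print(pseudo_survey(posts))  # {'i1': 'area_A', 'i2': 'area_C'}
```

Whatever rule is used for $\tau$, the derived $y_i^*$ need not equal the true address $y_i$, which is the measurement issue the two-phase approach has to address.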
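One-phase: formal interpretation of Daas et al. (2015)
One-phase: use the observed data $(z_j, s_{P,t})$ to aim at the same parameter $\theta = \theta(y_U)$
$\theta = \xi(z_{s_{P,t}}) = \theta(y_U)$
SMI: $\mathrm{SMI}_t = \xi(z_{s_{P,t}})$
$\mathrm{SMI}_t = \xi_t + d_t$
$E(d_t) = 0$, $V(d_t) = \eta_t^2 \ (\approx 0)$
CCI: $\mathrm{CCI}_t = \theta(y_{s_t})$
$\mathrm{CCI}_t = \theta_t + e_t$
$e_t \sim N(0, \sigma_t^2)$
Can the SMI replace the CCI? $\theta_t = \xi_t$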
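Subtracting the two measurement equations shows what the question $\theta_t = \xi_t$ means for the observed series. The step below is a sketch that additionally assumes $d_t$ and $e_t$ are uncorrelated, which the slides do not state.

```latex
\begin{align*}
\mathrm{CCI}_t - \mathrm{SMI}_t
  &= (\theta_t + e_t) - (\xi_t + d_t)
   = (\theta_t - \xi_t) + (e_t - d_t), \\
E(\mathrm{CCI}_t - \mathrm{SMI}_t) &= \theta_t - \xi_t, \\
V(\mathrm{CCI}_t - \mathrm{SMI}_t) &= \sigma_t^2 + \eta_t^2 \approx \sigma_t^2 .
\end{align*}
```

With $\eta_t^2 \approx 0$, the observed difference behaves like $\theta_t - \xi_t$ plus the survey error $e_t$ alone.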
One-phase: statistical validation
Test: $H_0: \theta_t - \xi_t = \mu$ vs. $H_1: \theta_t - \xi_t \neq \mu$;
Under $H_0$: $X_t = \mathrm{CCI}_t - \mathrm{SMI}_t = \mu + e_t$ with $e_t \sim N(0, \sigma_t^2)$;
p-value exceeds 0.05 for cv > 0.367
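A minimal sketch of such a test for a single period, assuming $\sigma_t$ is known and $\mu$ is fixed in advance. The slides do not give the exact test statistic, so the two-sided z-test and all numbers below are illustrative assumptions.

```python
import math


def z_test_difference(cci, smi, sigma, mu=0.0):
    """Two-sided z-test of H0: theta_t - xi_t = mu for one period t.

    Under H0, X_t = CCI_t - SMI_t ~ N(mu, sigma^2), so
    z = (X_t - mu) / sigma is standard normal.
    Returns (z, p_value).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x = cci - smi
    z = (x - mu) / sigma
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical values: the same observed difference, two survey standard errors
z_small, p_small = z_test_difference(cci=4.0, smi=0.0, sigma=1.0)
z_large, p_large = z_test_difference(cci=4.0, smi=0.0, sigma=4.0)
print(p_small < 0.05, p_large > 0.05)  # True True
```

For a fixed observed difference the p-value grows with $\sigma_t$, so failing to reject $H_0$ is easier when the survey error is large, and non-rejection alone is weak evidence that the SMI can replace the CCI.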
Two-phase: formal interpretation of Swier et al. (2015)
Two-phase: transform the social media dataset $(z_j, s_P)$ into a pseudo-survey dataset $(y_i^*, s^*)$
$s_P \xrightarrow{a} s_A \xrightarrow{b} s^* \subset U$ (representation)
$z_j \longrightarrow y_i^* \ (\neq y_i)$, $j \in P_i$ (measurement)
Statistical analysis is performed on $(y_i^*, s^*)$
Two-phase: data quality
1st phase: social media dataset (representation, measurement)
2nd phase: pseudo-survey dataset (representation, measurement)
One-phase and Two-phase: discussion
One-phase $(z_j, s_P)$ versus two-phase $(y_i^*, s^*)$, by target:
$(z_j, P)$: one-phase: non-probability sample, $s_P \subset P$ with unknown selection; two-phase: not of interest.
$(y_i^*, s^*)$: one-phase: need weighting; two-phase: feasible.
$(y_i, s^*)$: one-phase: need weighting, measurement consideration; two-phase: feasible, measurement consideration.
$(y_i^*, U)$: one-phase: test parameters, need survey data; two-phase: non-probability sample, $s^* \subset U$ with unknown selection.
$(y_i, U)$: one-phase: test parameters, need survey data; two-phase: non-probability sample, $s^* \subset U$ with unknown selection; measurement consideration.