An interactive method for inferring demographic attributes
Valentina Beretta, Daniele Maccagnola, Timothy Cribbin and Enza Messina
University of Milano-Bicocca, Brunel University
Gathering User Data in Social Media
• Social media offer social scientists an unprecedented opportunity to collect data about people and their characteristics
2 TweetClass - Hypertext 2015 - Cyprus
Traditional survey methods
• More reliable
• More expensive
• Much slower
VS
Social Media Analytics
• Less reliable
• Huge amount of data
• Users share their views for free
Several characteristics make social media (SM) based research attractive:
• Collecting large datasets is fast and relatively cheap
• SM users tend to comment in a responsive, ad hoc manner (their opinions are therefore more timely than in “designed” research)
• The perceived anonymity of SM leads to more “honest” responses
Demographic Data in Social Media
• However, a key barrier in SM data collection is the absence of explicit or reliable demographic attribute data (specifically age and gender)
• Without ready demographic data, researchers make subjective judgements by examining qualitative characteristics of the users (their post content or visual profile)
• This is very time consuming!
• Automatic methods can be used, but they are not always reliable
TweetClass
A semi-automatic framework that combines automatic classification with a user interface for manually refining ambiguous cases
Outline
• Dataset Collection
• Insights on TweetClass
  • Automatic classification
  • Manual refinement
• Cognitive Walkthrough and Summative Evaluation
  • First trial (performed by experts)
  • Results and improvement
  • Second trial (performed by general users)
  • Quantitative and qualitative results
Dataset Collection
• Considered classes:
  • Gender: Male / Female
  • Age: less than 30 years old / more than 30 years old
• Dataset collected from Twitter
• User labeling:
  • Gender: manually determined
  • Age: automatic search + manual refinement
Automatic Classification
• Machine learning methods are not the main focus of this work
• Gender classification was based on the 40N database [Michael 2007]
  • Three possible classes: F (female), M (male) and U (unknown)
  • Unknown users are refined afterwards
• For age classification we tried several classifiers and chose the best one based on accuracy
  • Two models trained on separate datasets (male only and female only) perform better than a single model
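The three-class gender step can be illustrated with a minimal sketch. It assumes a lookup keyed on the first token of the profile's display name; the `NAME_GENDER` table, the function names and the tokenisation are hypothetical stand-ins and do not reproduce the 40N database or the authors' implementation.

```python
# Hypothetical stand-in for a first-name gender database (NOT the real 40N data).
NAME_GENDER = {
    "maria": "F",
    "anna": "F",
    "john": "M",
    "marco": "M",
}


def classify_gender(display_name: str) -> str:
    """Return 'F', 'M' or 'U' (unknown) from the first token of a display name."""
    tokens = display_name.strip().lower().split()
    if not tokens:
        return "U"
    # Names missing from the table fall back to the 'unknown' class.
    return NAME_GENDER.get(tokens[0], "U")


def split_for_refinement(users):
    """Split (screen_name, display_name) pairs into labelled and unknown users.

    Unknown users are the ones that would be passed to manual refinement.
    """
    labelled, unknown = {}, []
    for screen_name, display_name in users:
        label = classify_gender(display_name)
        if label == "U":
            unknown.append(screen_name)
        else:
            labelled[screen_name] = label
    return labelled, unknown
```

In this sketch, only the users returned in `unknown` would reach the manual gender refinement step.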
Refinement Phase
• Gender refinement is performed manually on users classified as “unknown”
• The end-user is shown several pieces of information about the user to classify:
  • The user name, screen name and description
  • The user photo and background image
  • A subset of the tweets posted by the user
• The end-user can then select the most appropriate class
Refinement Phase (cont.)
• For age refinement, we introduce a confidence level that indicates how “confident” the automatic classifier is in its prediction
• The confidence level varies between 0.5 (no confidence at all) and 1 (complete confidence)
• End-users refine only the age class of users whose confidence level is below a certain threshold
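A minimal sketch of how such a confidence level could drive the selection of users to refine. It assumes the binary age classifier outputs a probability for the “more than 30” class and takes the confidence as the larger of the two class probabilities, which gives the 0.5 to 1 range described above. The function names and the 0.7 threshold are illustrative assumptions; the slides do not state the actual formula or threshold.

```python
def confidence_level(p_over_30: float) -> float:
    """Confidence of a binary prediction: the larger of the two class probabilities.

    Assumption: p_over_30 is the classifier's probability for the 'more than 30' class.
    """
    if not 0.0 <= p_over_30 <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return max(p_over_30, 1.0 - p_over_30)


def select_for_refinement(predictions, threshold=0.7):
    """Return screen names whose confidence is below the threshold, least confident first.

    predictions maps screen_name -> probability of the 'more than 30' class.
    The default threshold is a hypothetical value chosen for illustration.
    """
    low_confidence = [
        name for name, p in predictions.items() if confidence_level(p) < threshold
    ]
    return sorted(low_confidence, key=lambda name: confidence_level(predictions[name]))
```

Raising the threshold sends more users to manual refinement; lowering it leaves more of them with the automatic label.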
Cognitive Walkthrough
• To understand whether the interface we designed is intuitive and easy to use, we performed a formal evaluation using a method called cognitive walkthrough
• The aim is to determine how easily naïve users can employ the UI to achieve their objectives at each step of the task
• Special attention is paid to how well the interface supports “exploratory learning”, i.e. first-time use without formal training
Cognitive Walkthrough
• The usability analyst studied how the end-user progressed through the steps of TweetClass, and asked the following questions:
  • Will the user try to achieve the right effect?
  • Will the user notice that the correct action is available?
  • Will the user associate the correct action with the effect to be achieved?
  • If the correct action is performed, will the user see that progress is being made?
Cognitive Walkthrough (trial)
• We recruited two domain experts
• Participants were given a 10-minute presentation of the tool, only to introduce its aim and basic conceptual steps
• We used a “thinking aloud” method to encourage participants to voice their comments and doubts
• Participants were then asked to answer questions regarding the effectiveness, efficiency, information understanding and ease of use of the tool
Cognitive Walkthrough (results)
The cognitive walkthrough highlighted several problems:
1. Both participants suggested including a continuous update on the age and gender composition of the current set of Twitter users, to help decide how many instances to refine
2. During the two refinement phases, the experts' attention was captured mainly by the images and less by the textual information
3. They suggested various improvements to the displayed messages and buttons
![Page 20: An interactive method for inferring demographic attributes](https://reader031.vdocuments.mx/reader031/viewer/2022030123/58a472c51a28aba34c8b5737/html5/thumbnails/20.jpg)
Interface Prototype
Summative Evaluation
• Finally, we conducted a summative evaluation of the second interface prototype
• We recruited 22 participants (15 males and 7 females): 12 PhD students, 7 researchers and 3 master's students
• We collected data on completion time, inter-rater agreement and success rate
Summative Evaluation (results)
• Assigning age takes twice as long as assigning gender
• Inter-rater agreement (measured using Fleiss' kappa [Fleiss, 1981]) was higher for gender than for age (77.34% vs 70.45%)
• The accuracy of the refined instances was generally high: 92% for gender classification and 91% for age classification
• Participants also answered questions about the tool's ease of use, ease of learning and information understanding; overall satisfaction was very high
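For reference, Fleiss' kappa can be computed from a table of per-item category counts. The sketch below is a generic implementation of the standard formula, not the authors' evaluation script. The input layout (one row per Twitter user, one column per class, each cell the number of participants choosing that class) is an assumption, and it returns a coefficient rather than the percentage form used in the results above.

```python
def fleiss_kappa(ratings):
    """Compute Fleiss' kappa from per-item category counts.

    ratings: list of rows, one per rated item; each row holds the number of
    raters who assigned the item to each category. Every row must sum to the
    same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over items.
    p_i = [
        (sum(count * count for count in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        # Every rating fell in a single category; kappa is undefined, and this
        # sketch reports full agreement by convention.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)
```

For example, `fleiss_kappa([[3, 0], [0, 3]])` describes three raters who agree fully on two items, one per class.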
Conclusions and Future Work
• We introduced TweetClass, a proof-of-concept tool to support social scientists in identifying demographic attributes of Twitter users
• As collecting this data is generally difficult, TweetClass can help increase the quality and/or size of Twitter user samples
• Future work will include:
  • incorporation of other automatic techniques in the tool
  • identification of other demographic attributes
  • expansion to larger datasets
E-mail: [email protected] Website: mind.disco.unimib.it