Some Challenging Problems in Mining Social Media Arizona State University Data Mining and Machine Learning Lab April 22, 2014 1
Some Challenging Problems in Mining Social Media
Huan Liu
Joint work with Ali Abbasi, Shamanth Kumar, Fred Morstatter, Reza Zafarani, Jiliang Tang
Social Media Mining by Cambridge University Press
http://dmml.asu.edu/smm/
Traditional Media and Data
• Broadcast Media: One-to-Many
• Communication Media: One-to-One
• Traditional Data
Social Media: Many-to-Many
• Everyone can be a media outlet or producer
• Disappearing communication barrier
• Distinct characteristics
  – User generated content: Massive, dynamic, extensive, instant, and noisy
  – Rich user interactions: Linked data
  – Collaborative environment, and wisdom of the crowd
  – Many small groups (the long tail phenomenon)
  – Attention is expensive
Unique Features of Social Media
• Novel phenomena observed from people’s interactions in social media
• Unprecedented opportunities for interdisciplinary and collaborative research
  – How to use social media to study human behavior?
    • It’s rich, noisy, free-form, and definitely BIG
  – With so much data, how can we make sense of it?
    • Putting “bricks” into a useful (meaningful) “edifice”
    • Developing new methods/tools for social media mining
Some Challenges in Mining Social Media
• Evaluation Dilemma
  – How to evaluate without conventional test data?
• Sampling Bias
  – Often we get a small sample of (still big) data. How can we ensure that the data leads to credible findings?
• Noise-Removal Fallacy
  – How do we remove noise without losing too much?
• Studying Distrust in Social Media
  – Is distrust simply the negation of trust? Where to find distrust information with “one-way” relations?
Evaluation Dilemma
• Evaluation is important in data mining
  – Traditional test data is often not available in social media mining
• Can we evaluate our findings without ground truth?
• A case study of Migration in Social Media
  – Users are a primary source of revenue
  – New social media sites need to attract users
  – Existing sites need to retain their users
  – Competition for precious attention
Migration in Social Media
• What is migration?
  – Migration can be described as the movement of users away from one location toward another, either due to necessity, or attraction to the new environment.
• Two types of migration
  – Site migration
  – Attention migration
[Figure: Site 1, Site 2, and Site 3 shown before and after time t, once for each type of migration]
Obtaining User Migration Patterns
• Goal: Identifying trends of attention migration of users across the two phases of the collected data.
• Process
Patterns from Observation
Facing an Evaluation Dilemma
• It is important to know whether these are meaningful patterns
  – If yes, we investigate further how to use the patterns for prevention or promotion
  – If not, why not? And what can we do?
• We would like to evaluate migration patterns, but without ground truth
• How?
  – User study or Amazon Mechanical Turk (AMT)?
Evaluating Patterns’ Validity
• One way is to verify if these patterns are fortuitous
• Null Hypothesis: Migration in social media is a random process
  – Generating another similar dataset for comparison
    • Potential migrating population includes overlapping users from Phase 1 and Phase 2
    • Shuffled datasets are generated by picking random active users from the potential migrating population
    • The number of random users selected for each dataset is the same as the real migrating population
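The shuffling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the user IDs, the pool size, and the function name `make_shuffled_datasets` are hypothetical, and uniform sampling without replacement is assumed because the slide does not say how the random users are picked.

```python
import random


def make_shuffled_datasets(potential_migrants, n_migrants, n_datasets, seed=0):
    """Draw n_datasets random user sets, each as large as the real migrating population."""
    rng = random.Random(seed)
    # Sort the pool so that a given seed always reproduces the same draws.
    pool = sorted(potential_migrants)
    return [set(rng.sample(pool, n_migrants)) for _ in range(n_datasets)]


# Hypothetical data: users who are active in both Phase 1 and Phase 2.
potential = {f"user{i}" for i in range(1000)}
# Hypothetical set of users observed to migrate.
observed_migrants = {f"user{i}" for i in range(50)}

shuffled = make_shuffled_datasets(potential, len(observed_migrants), n_datasets=10)
```

Each shuffled set plays the role of a "migrating population" under the null hypothesis, so the same analysis can be run on it and on the observed set.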
A Significance Test
[Figure: the observed migration dataset and the shuffled dataset are each passed through logistic regression to obtain coefficients of user attributes; the two sets of coefficients are then compared with a chi-square statistic in a significance test]
Evaluation Results
• Significant differences observed in StumbleUpon, Twitter, and YouTube
• Patterns from other sites are not statistically significant. Potential cause:
  – Insufficient data?
Summary
• Mitigating or promoting migration by targeting high net-worth individuals
  – Identifying users with high value to the network, e.g., high network activity, user activity, and external exposure
• Social media migration is first studied in this work
• Alternative evaluation approaches can help address the evaluation dilemma
S. Kumar, R. Zafarani, and H. Liu. Understanding User Migration Patterns in Social Media. AAAI’2010.
Some Challenges in Mining Social Media
• Evaluation Dilemma
• Sampling Bias
• Noise-Removal Fallacy
• Studying Distrust in Social Media
Sampling Bias in Social Media Data
• Twitter provides two main outlets for researchers to access tweets in real time:
  – Streaming API (~1% of all public tweets, free)
  – Firehose (100% of all public tweets, costly)
• Streaming API data is often used by researchers to validate hypotheses.
• How well does the sampled Streaming API data measure the true activity on Twitter?
Facets of Twitter Data
• Compare the data along different facets
• Selected facets commonly used in social media mining:
  – Top Hashtags
  – Topic Extraction
  – Network Measures
  – Geographic Distributions
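As a rough illustration of a facet comparison, the sketch below extracts the top-k hashtags from two tweet collections and measures how much the two lists overlap. The toy tweets and helper names are hypothetical, and Jaccard overlap is used here only as a simple stand-in; the slides do not state which measure was applied to this facet.

```python
from collections import Counter


def top_hashtags(tweets, k):
    """Return the k most frequent hashtags (lower-cased) in a list of tweet texts."""
    counts = Counter(
        token.lower()
        for text in tweets
        for token in text.split()
        if token.startswith("#") and len(token) > 1
    )
    return [tag for tag, _ in counts.most_common(k)]


def jaccard(a, b):
    """Jaccard overlap of two collections; 1.0 means identical sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy data standing in for the two sources.
firehose = ["#a #b", "#a #c", "#b", "#a"]
streaming = ["#a #c", "#c"]

overlap = jaccard(top_hashtags(firehose, 2), top_hashtags(streaming, 2))
```

A low overlap for small k would suggest that the sample misrepresents which hashtags are most popular, even if it covers the long tail reasonably well.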
Preliminary Results
Top Hashtags
• No clear correlation between Streaming and Firehose data.
Topic Extraction
• Topics are close to those found in the Firehose.
Network Measures
• Found ~50% of the top tweeters by different centrality measures.
• Graph-level measures give similar results between the two datasets.
Geographic Distributions
• Streaming data gets >90% of the geotagged tweets.
• Consequently, the distribution of tweets by continent is very similar.
How are These Results?
• The accuracy of Streaming API data can vary with the analysis to be performed
• These results are based on a single Streaming API sample
• Are these findings significant, or just an artifact of random sampling?
• How do we verify whether our results indicate sampling bias?
Histogram of JS Distances in Topic Comparison
• This is just one streaming dataset against Firehose
• Are we confident about this set of results?
• Can we leverage another streaming dataset?
• Unfortunately, we cannot rewind as we have only one streaming dataset
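For reference, a Jensen-Shannon comparison of two topic distributions can be sketched as follows. The function names are hypothetical, base-2 logarithms are an assumption (they bound the divergence to [0, 1]), and the slides do not say whether the divergence itself or its square root was plotted, so both are shown.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 with base-2 logs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_distance(p, q):
    """Square root of the JS divergence, which is a proper metric."""
    return math.sqrt(js_divergence(p, q))
```

Identical distributions give 0 and distributions with disjoint support give 1, so a histogram of these values over matched topic pairs shows how closely one dataset's topics track the other's.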
Verification
• Created 100 of our own “Streaming API” results by sampling the Firehose data.
[Figure: Generating Random Samples. Bar chart of the number of tweets (k), on an axis from 0 to 60, for Firehose, Streaming, Random 1, Random 2, …, Random 100]
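A minimal sketch of this verification step, assuming uniform sampling without replacement at the size of the real Streaming sample (the slide does not spell out the sampling scheme). The tweet IDs, sizes, and function names below are hypothetical.

```python
import random


def random_samples(firehose, sample_size, n_samples=100, seed=0):
    """Draw n_samples uniform random samples of sample_size tweets from the Firehose."""
    rng = random.Random(seed)
    return [rng.sample(firehose, sample_size) for _ in range(n_samples)]


def fraction_at_least(random_stats, observed_stat):
    """Share of random samples whose statistic is at least as large as the observed one."""
    return sum(s >= observed_stat for s in random_stats) / len(random_stats)


# Hypothetical stand-ins: tweet IDs for the Firehose, and the Streaming sample size.
firehose_ids = list(range(10_000))
streaming_size = 100

samples = random_samples(firehose_ids, streaming_size)
```

Computing the same facet statistic (for example, a distance to the Firehose) on every random sample gives a reference distribution; `fraction_at_least` then indicates how unusual the real Streaming value is relative to purely random sampling.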
Comparison with Random Samples
Summary
• Streaming API data could be biased in some facets
• Our results were obtained with the help of Firehose
• Without Firehose data, it’s challenging to figure out which facets might have bias, and how to compensate for them in search of credible mining results
F. Morstatter, J. Pfeffer, H. Liu, and K. Carley. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API and Data from Twitter’s Firehose. ICWSM, 2013.
F. Morstatter, J. Pfeffer, and H. Liu. When is it Biased? Assessing the Representativeness of Twitter's Streaming API. WWW Web Science, 2014.
Some Challenges in Mining Social Media
• Evaluation Dilemma
• Sampling Bias
• Noise-Removal Fallacy
• Studying Distrust in Social Media
Noise Removal Fallacy
• We often hear that “99% of Twitter data is useless.”
  – “Had eggs, sunny-side-up, this morning”
  – Can we remove noise as we usually do in DM?
• What is left after noise removal?
  – Twitter data can be rendered useless after conventional noise removal
• As we are certain there is noise in data, how can we remove it?
Social Media Data
• Massive and high-dimensional social media data poses unique challenges to data mining tasks
  – Scalability
  – Curse of dimensionality
• Social media data is inherently linked
  – A key difference between social media data and attribute-value data
Feature Selection of Social Data
• Feature selection has been widely used to prepare large-scale, high-dimensional data for effective data mining
• Traditional feature selection algorithms deal with only “flat” data (attribute-value data).
  – Independent and Identically Distributed (i.i.d.)
• We need to take advantage of linked data for feature selection
Representation for Social Media Data
[Figure: users u_1 to u_4, posts p_1 to p_8 with feature columns up to f_m and class labels up to c_k, and 0/1 relation matrices; the user-post relations are highlighted]
Representation for Social Media Data
[Figure: the same users, posts, features, and class labels as before; the user-user relations are highlighted]
Representation for Social Media Data
[Figure: the same users, posts, features, and class labels as before; user-post and user-user relations together are labeled as the social context]
Problem Statement
• Given labeled data X with its label indicator matrix Y, the dataset F, and its social context, consisting of user-user following relationships S and user-post relationships P,
• Select the k most relevant features from the m features of dataset F using its social context S and P
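To make the inputs concrete, here is a small sketch of how the social context could be held as 0/1 matrices. Everything in it is hypothetical: the assignment of posts to users, the follow edges, and the conventions that P[i][j] = 1 means user i wrote post j and S[i][j] = 1 means user i follows user j are assumptions, since the original figure did not survive.

```python
users = ["u1", "u2", "u3", "u4"]
posts = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]

# Hypothetical authorship and follow edges (follower, followee).
authored = {"u1": ["p1", "p2"], "u2": ["p3", "p4", "p5"], "u3": ["p6", "p7"], "u4": ["p8"]}
follows = [("u1", "u4"), ("u2", "u4"), ("u3", "u1")]


def build_P(users, posts, authored):
    """User-post matrix: P[i][j] = 1 if users[i] wrote posts[j]."""
    col = {p: j for j, p in enumerate(posts)}
    P = [[0] * len(posts) for _ in users]
    for i, u in enumerate(users):
        for p in authored.get(u, []):
            P[i][col[p]] = 1
    return P


def build_S(users, follows):
    """User-user matrix: S[i][j] = 1 if users[i] follows users[j]."""
    idx = {u: i for i, u in enumerate(users)}
    S = [[0] * len(users) for _ in users]
    for follower, followee in follows:
        S[idx[follower]][idx[followee]] = 1
    return S


P = build_P(users, posts, authored)
S = build_S(users, follows)
```

With these two matrices alongside the feature matrix and labels, the task is to rank the m features and keep the k most relevant ones.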
How to Use Link Information
• The new question is how to proceed with additional information for feature selection
• Two basic technical problems
  – Relation extraction: What are distinctive relations that can be extracted from linked data
  – Mathematical representation: How to use these relations in feature selection formulation
• Do we have theories to guide us?
Relation Extraction
[Figure: users u_1 to u_4 and their posts p_1 to p_8, with the four extracted relations marked]
1. CoPost
2. CoFollowing
3. CoFollowed
4. Following
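One plausible reading of the four relations, sketched as post-pair extraction. The definitions used here are assumptions, because the slide only names the relations: CoPost pairs posts by the same user, Following pairs a follower's posts with the followee's posts, CoFollowing pairs posts of two users who follow a common user, and CoFollowed pairs posts of two users who are followed by a common user. The toy data is hypothetical.

```python
from itertools import combinations, product


def _post_pairs(authored, user_pairs):
    """All (post_a, post_b) pairs where post_a is by the first user and post_b by the second."""
    return {(a, b) for u, v in user_pairs
            for a, b in product(authored.get(u, []), authored.get(v, []))}


def _pairs_within(groups):
    """All unordered pairs of users that fall in the same group."""
    return {pair for members in groups.values()
            for pair in combinations(sorted(members), 2)}


def co_post(authored):
    return {pair for ps in authored.values() for pair in combinations(sorted(ps), 2)}


def following(authored, follows):
    return _post_pairs(authored, follows)


def co_following(authored, follows):
    followers_of = {}
    for follower, followee in follows:
        followers_of.setdefault(followee, set()).add(follower)
    return _post_pairs(authored, _pairs_within(followers_of))


def co_followed(authored, follows):
    followees_of = {}
    for follower, followee in follows:
        followees_of.setdefault(follower, set()).add(followee)
    return _post_pairs(authored, _pairs_within(followees_of))


# Hypothetical toy data: (follower, followee) edges and authorship.
authored = {"u1": ["p1", "p2"], "u2": ["p3"], "u3": ["p4"], "u4": ["p5"]}
follows = [("u1", "u4"), ("u2", "u4"), ("u3", "u1"), ("u3", "u4")]
```

Each function returns the set of post pairs that a relation links, which is the form in which link information can enter a feature selection objective.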
Relations, Social Theories, Hypotheses
• Social correlation theories suggest that the four relations may affect the relationships between posts
• Social correlation theories
  – Homophily: People with similar interests are more likely to be linked
  – Influence: People who are linked are more likely to have similar interests
• Thus, four relations lead to four hypotheses for verification
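Modeling CoFollowing Relation
• Two co-following users have similar topics of interests
Users' topic interests:
$$\hat{T}(u_k) = \frac{\sum_{f_i \in F_k} T(f_i)}{|F_k|} = \frac{\sum_{f_i \in F_k} W^\top f_i}{|F_k|}$$
$$\min_{W} \; \|X^\top W - Y\|_F^2 + \alpha \|W\|_{2,1} + \beta \sum_{u} \sum_{u_i, u_j \in N_u} \|\hat{T}(u_i) - \hat{T}(u_j)\|_2^2$$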
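The CoFollowing objective regularizes W with an ℓ2,1 norm, that is, the sum of the Euclidean norms of the rows of W. Because each row of W corresponds to one feature, this penalty pushes whole rows toward zero. The sketch below computes the norm and ranks features by row norm; the helper names and the toy W are hypothetical, and ranking by row norm is the usual convention for ℓ2,1-regularized feature selection rather than something the slide states.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per feature)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def l21_norm(W):
    """l2,1 norm: the sum of the row-wise Euclidean norms."""
    return sum(row_norms(W))


def select_top_k(W, k):
    """Indices of the k features whose rows in W have the largest norms."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)[:k]


# Hypothetical weights: 3 features (rows) by 2 classes (columns).
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
```

Here the second feature has an all-zero row, so it contributes nothing to any class and would be dropped first.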
Evaluation Results on Digg
Evaluation Results on Digg
Summary
• LinkedFS is evaluated under varied circumstances to understand how it works.
  – Link information can help feature selection for social media data.
• Unlabeled data is more common in social media, so unsupervised learning is more sensible, but also more challenging.
J. Tang and H. Liu. Unsupervised Feature Selection for Linked Social Media Data. The Eighteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.
J. Tang and H. Liu. Feature Selection with Linked Data in Social Media. SIAM International Conference on Data Mining, 2012.
Some Challenges in Mining Social Media
• Evaluation Dilemma
• Sampling Bias
• Noise-Removal Fallacy
• Studying Distrust in Social Media
Studying Distrust in Social Media
[Figure: outline of the tutorial “Trust in Social Computing” with the parts Introduction, Representing Trust, Measuring Trust, Applying Trust, Incorporating Distrust, and Summary]
WWW2014 Tutorial on Trust in Social Computing, Seoul, South Korea, 4/7/14
http://www.public.asu.edu/~jtang20/tTrust.htm
Distrust in Social Sciences
• Distrust can be as important as trust
• Both trust and distrust help a decision maker reduce the uncertainty and vulnerability associated with decision consequences
• Distrust may play a role as critical as, if not more critical than, trust in consumer decisions
Understandings of Distrust from Social Sciences
• Distrust is the negation of trust
  – Low trust is equivalent to high distrust
  – The absence of distrust means high trust
  – Not studying distrust matters little
• Distrust is a new dimension of trust
  – Trust and distrust are two separate concepts
  – Trust and distrust can co-exist
  – A study ignoring distrust would yield an incomplete estimate of the effect of trust
Distrust in Social Media
• Distrust is rarely studied in social media
• Challenge 1: Lack of computational understanding of distrust with social media data
  – Social media data is based on passive observations
  – Lack of some information social sciences use to study distrust
• Challenge 2: Distrust information is usually not publicly available
  – Trust is a desired property while distrust is an unwanted one for an online social community
Computational Understanding of Distrust
• Design computational tasks to help understand distrust with passively observed social media data
  – Task 1: Is distrust the negation of trust?
    • If distrust is the negation of trust, distrust should be predictable from only trust
  – Task 2: Can we predict trust better with distrust?
    • If distrust is a new dimension of trust, distrust should have added value on trust and can improve trust prediction
• The first step to understand distrust is to make distrust computable by incorporating distrust in trust models
![Page 46: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/46.jpg)
Distrust in Trust Representations
There are three major ways to incorporate distrust in trust representation
– Considering low trust as distrust
– Adding signs to trust values
– Adding a dimension in trust representations
[Figure: scale diagrams for the three representations, with axes labeled Trust and Distrust and tick values 0, 1, and -1.]
![Page 47: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/47.jpg)
An Illustration of Distrust in Trust Representations
• Considering low trust as distrust ─ Weighted unsigned network
• Extending to negative values ─ Weighted signed network
• Adding another dimension ─ Two-dimensional unsigned network
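[Figure: one example network drawn under each representation. The edge labels are 0.8, 1, 0, 1 (weighted unsigned); 0.8, 1, -1, 1 (weighted signed); and (0.8,0), (1,0), (0,1), (1,0) (two-dimensional unsigned).]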
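The three representations can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: the node names and the edge-to-node assignment are hypothetical, and only the edge values are taken from the slide's labels. It stores each edge as a (trust, distrust) pair and derives the other two forms from it.

```python
# Hypothetical example network. Each edge carries a (trust, distrust) pair,
# i.e. the two-dimensional unsigned representation. Node names are made up.
edges_2d = {
    ("a", "b"): (0.8, 0.0),
    ("b", "c"): (1.0, 0.0),
    ("a", "c"): (0.0, 1.0),  # a pure distrust edge
    ("c", "d"): (1.0, 0.0),
}


def to_signed(edges):
    """Weighted signed network: distrust becomes a negative weight."""
    return {edge: trust - distrust for edge, (trust, distrust) in edges.items()}


def to_low_trust(edges):
    """Weighted unsigned network: distrust is dropped and shows up as low trust."""
    return {edge: trust for edge, (trust, _distrust) in edges.items()}


signed = to_signed(edges_2d)
unsigned = to_low_trust(edges_2d)
print(signed[("a", "c")], unsigned[("a", "c")])
```

In this sketch the distrust edge maps to -1 in the signed form and to 0 in the unsigned form, so the unsigned form cannot tell distrust apart from a missing or zero-trust relation.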
![Page 48: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/48.jpg)
Task 1: Is Distrust the Negation of Trust?
• If distrust is the negation of trust, low trust is equivalent to distrust and distrust should be predictable from trust
• Given the transitivity of trust, we resort to trust prediction algorithms to compute trust scores for pairs of users in the same trust network
[Figure: IF Distrust ≡ Low Trust, THEN Predicting Distrust ≡ Predicting Low Trust.]
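The Task 1 setup can be sketched as follows. This is an illustration under stated assumptions, not the predictor used in the talk: the propagation scheme (a decayed sum of powers of the trust matrix), the `decay` value, and the function names are all hypothetical. The idea it shows is the one on the slide: compute trust scores for user pairs, then treat the lowest-scoring pairs as predicted distrust.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def propagate_trust(trust, steps=2, decay=0.5):
    """Score pairs by a decayed sum of trust-matrix powers (assumed scheme)."""
    n = len(trust)
    scores = [row[:] for row in trust]
    power = [row[:] for row in trust]
    weight = 1.0
    for _ in range(steps - 1):
        power = matmul(power, trust)
        weight *= decay
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


def lowest_trust_pairs(trust, k, steps=2):
    """Return the k unlinked pairs with the lowest propagated trust score."""
    n = len(trust)
    scores = propagate_trust(trust, steps)
    candidates = sorted(
        (scores[i][j], i, j)
        for i in range(n) for j in range(n)
        if i != j and trust[i][j] == 0
    )
    return [(i, j) for _, i, j in candidates[:k]]


# Toy chain 0 -> 1 -> 2 -> 3 of trust links.
trust = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
predicted_distrust = lowest_trust_pairs(trust, k=2)
print(predicted_distrust)
```

If distrust were the negation of trust, pairs selected this way should overlap heavily with the actual distrust relations.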
![Page 49: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/49.jpg)
Evaluation of Task 1
§ Using low trust to predict distrust performs consistently worse than random guessing
§ Task 1 fails to predict distrust from trust alone, so distrust is not the negation of trust
dTP: uses trust propagation to calculate trust scores for pairs of users
dMF: uses a matrix-factorization-based predictor to compute trust scores for pairs of users
dTP-MF: the combination of dTP and dMF using OR
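One plausible reading of "combination using OR" is a set union: a pair is predicted as distrust if either predictor flags it. The sketch below assumes that reading. The function names and the toy pairs are hypothetical, and the precision measure is only one possible way to score the predictions, not necessarily the metric reported in the talk.

```python
def combine_or(predicted_a, predicted_b):
    """A pair is predicted as distrust if either predictor flags it."""
    return set(predicted_a) | set(predicted_b)


def precision(predicted, actual):
    """Fraction of predicted distrust pairs that are real distrust pairs."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(actual)) / len(predicted)


# Made-up predictions standing in for the outputs of dTP and dMF.
from_propagation = [(0, 3), (1, 0)]
from_factorization = [(1, 0), (2, 1)]
actual_distrust = {(2, 1)}

combined = combine_or(from_propagation, from_factorization)
print(sorted(combined), precision(combined, actual_distrust))
```

Comparing this precision with that of an equally sized random sample of pairs gives the kind of baseline comparison the evaluation describes.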
![Page 50: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/50.jpg)
Task 2: Can We Predict Trust Better with Distrust?
§ If distrust is not the negation of trust, distrust should provide additional information about users, and could have added value beyond trust
§ We ask whether using both trust and distrust information achieves better performance than using trust information alone
§ We can add distrust propagation to trust propagation to incorporate distrust
![Page 51: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/51.jpg)
Evaluation of Trust and Distrust Propagation
§ Incorporating distrust propagation into trust propagation can improve the performance of trust measurement
§ One-step distrust propagation usually outperforms multiple-step distrust propagation
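The difference between one-step and multiple-step distrust propagation can be sketched with matrices. The slides do not give the formulas, so the code below assumes one common formulation: with trust matrix T and distrust matrix D, one-step propagation lets trust travel several hops and applies distrust only at the final hop, while multiple-step propagation lets the signed matrix T - D travel at every hop. All names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def signed_belief(trust, distrust):
    """Element-wise T - D."""
    return [[t - d for t, d in zip(t_row, d_row)]
            for t_row, d_row in zip(trust, distrust)]


def one_step_distrust(trust, distrust, hops=1):
    """Trust propagates for `hops` hops; distrust is applied only at the end."""
    result = signed_belief(trust, distrust)
    for _ in range(hops):
        result = matmul(trust, result)
    return result


def multi_step_distrust(trust, distrust, hops=1):
    """The signed belief T - D itself propagates at every hop."""
    belief = signed_belief(trust, distrust)
    result = belief
    for _ in range(hops):
        result = matmul(belief, result)
    return result


zeros = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Case 1: user 0 trusts user 1, and user 1 distrusts user 2.
trust = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
distrust = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
friend_of_enemy = one_step_distrust(trust, distrust)[0][2]

# Case 2: user 0 distrusts user 1, and user 1 distrusts user 2.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
enemy_of_enemy_multi = multi_step_distrust(zeros, chain)[0][2]
enemy_of_enemy_one = one_step_distrust(zeros, chain)[0][2]
print(friend_of_enemy, enemy_of_enemy_multi, enemy_of_enemy_one)
```

In this sketch, multiple-step propagation turns two chained distrust links into inferred trust ("the enemy of my enemy is my friend"), whereas one-step propagation infers nothing for that pair. That is one intuition for why the two variants can behave differently, though the slides do not state the reason for the reported gap.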
![Page 52: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/52.jpg)
Concluding Remarks
• Evaluation Dilemma
• Sampling Bias in Social Media Data
• Noise Removal Fallacy
• Studying Distrust in Social Media
![Page 53: SomeChallengingProblemsin Mining%Social%Media%huanliu/dmml_presentation/2014/...Arizona%State%University% Some%Challenging%Problems%in%Mining%Social%Media% %Data%Mining%and%Machine%Learning%Lab!](https://reader036.vdocuments.mx/reader036/viewer/2022081615/5fd93efad3396458d31e2c7e/html5/thumbnails/53.jpg)
THANKS to …
• Organizers for this wonderful opportunity to share our research work
• Acknowledgments
– Grants from NSF, ONR, ARO
– DMML members and project leaders
– Collaborators
Ali Abbasi, Shamanth Kumar, Fred Morstatter, Reza Zafarani, Jiliang Tang