probabilistic models for discovering e-communities
Ding Zhou, Eren Manavoglu, Jia Li, C. Lee Giles, Hongyuan Zha (The Pennsylvania State University), WWW 2006.
![Page 1: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/1.jpg)
Probabilistic Models for Discovering E-Communities
Ding Zhou, Eren Manavoglu, Jia Li,
C. Lee Giles, Hongyuan Zha
The Pennsylvania State University
WWW 2006
![Page 2: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/2.jpg)
Outline
Introduction
Related Work
Community-User-Topic Models
Semantic Community Discovery
Experiments
Conclusion
![Page 4: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/4.jpg)
Social Network Analysis (SNA)
SNA is an established field in sociology
The goal of SNA
– Discovering interpersonal relationships from various information carriers, such as emails and the Web
The community graph structure
– How social actors gather into groups that are tightly connected internally and loosely connected to one another
– An important characteristic of all SNs
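To make "tightly connected within groups, loosely connected between groups" concrete, the sketch below compares edge density inside and across a given partition of a small graph. The toy graph, the partition, and the helper name `group_densities` are hypothetical; this only illustrates the property and is not a community-detection method from the slides.

```python
from itertools import combinations

def group_densities(nodes, edges, groups):
    """Return (intra-group density, inter-group density) for a node partition.

    Density = observed edges / possible node pairs of that kind.
    `groups` maps each node to a group label (hypothetical input format).
    """
    edge_set = {frozenset(e) for e in edges}
    intra_pairs = inter_pairs = intra_edges = inter_edges = 0
    for a, b in combinations(nodes, 2):
        linked = frozenset((a, b)) in edge_set
        if groups[a] == groups[b]:
            intra_pairs += 1
            intra_edges += linked
        else:
            inter_pairs += 1
            inter_edges += linked
    return intra_edges / intra_pairs, inter_edges / inter_pairs

# Toy social network: two triangles joined by a single bridge edge.
nodes = ["a", "b", "c", "x", "y", "z"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]
groups = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}

intra, inter = group_densities(nodes, edges, groups)
print(intra, inter)  # every within-group pair is linked; 1 of 9 cross-group pairs is
```

A partition that matches the community structure shows high intra-group density and low inter-group density; a random partition of the same graph would blur the two numbers together.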
![Page 5: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/5.jpg)
Discovering Community from Email Corpora
Typically, the SN is constructed by measuring the intensity of contact between email users
– An edge indicates that the communication frequency between two users is higher than a certain threshold
– This is problematic in some scenarios:
• A spammer in the email system sends out many messages, so frequency alone links the spammer to many users
• The lack of semantic interpretation: an edge does not say what the users communicate about
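The frequency-threshold construction, and why a spammer distorts it, can be illustrated with a small sketch. The message log, the user names, and the helper `build_sn` are hypothetical, and "higher than the threshold" is read as a strict inequality on the undirected message count.

```python
from collections import Counter

def build_sn(messages, threshold):
    """Link two users if they exchanged more than `threshold` messages.

    `messages` is a list of (sender, recipient) pairs; direction is ignored
    and self-addressed messages are skipped.
    """
    counts = Counter(frozenset(m) for m in messages if m[0] != m[1])
    return {pair for pair, n in counts.items() if n > threshold}

# Hypothetical log: alice and bob correspond regularly; "spam" mass-mails everyone.
messages = [("alice", "bob")] * 4 + [("bob", "alice")] * 3
for user in ["alice", "bob", "carol", "dave"]:
    messages += [("spam", user)] * 5

edges = build_sn(messages, threshold=2)
for pair in sorted(tuple(sorted(p)) for p in edges):
    print(pair)
```

Here the spammer becomes the best-connected node even though nobody writes back, and no edge carries any information about what the users discuss.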
![Page 6: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/6.jpg)
Proposed Method
The inner community property of SNs is examined by analyzing semantic information, such as emails
A generative Bayesian network is used to model the generation of communication in an SN
Similarity among social actors is modeled as a hidden layer in the proposed probabilistic model
![Page 8: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/8.jpg)
Related Work: Document Content Characterization
Several factors, either observable or latent, are modeled as variables in the generative Bayesian network
Topic-Word model
– Documents are considered as a mixture of topics
– Each topic corresponds to a multinomial distribution over words
– Latent Dirichlet Allocation (LDA) [D. Blei et al., 2003]
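The Topic-Word model reads naturally as a generative process. The sketch below follows the usual LDA story (a per-document topic mixture drawn from a Dirichlet, then a topic and a word for each token) using only the Python standard library. The vocabulary, the two topic-word distributions, and the function names are made up for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, topic_word, vocab, rng):
    """Generate one document under the LDA generative process.

    theta ~ Dirichlet(alpha) is the document's topic mixture; for each token
    a topic z is drawn from theta, then a word from topic z's multinomial.
    """
    theta = sample_dirichlet(alpha, rng)
    topics = range(len(theta))
    doc = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word[z])[0]
        doc.append((z, w))
    return doc

# Hypothetical 2-topic model over a 4-word vocabulary.
vocab = ["gas", "price", "meeting", "schedule"]
topic_word = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: energy trading words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: scheduling words
]
rng = random.Random(0)
doc = generate_document(10, alpha=[0.5, 0.5], topic_word=topic_word, vocab=vocab, rng=rng)
print(doc)
```

Inference runs this story backwards: given only the words, it recovers plausible topic assignments and distributions.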
![Page 9: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/9.jpg)
Related Work (2)
Author-Word model
– The author x is chosen randomly from a_d, the set of authors of document d [A. McCallum, 1999]
Author-Topic model
– Involves both the author and the topic
– Performs well for document content characterization [M. Steyvers et al., 2004]
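The Author-Topic model adds an author step in front of the topic step. A minimal sketch of its generative process follows, assuming each token picks an author uniformly from a_d, then a topic from that author's distribution, then a word from the topic. Dropping the topic step (drawing the word directly from an author-specific distribution) gives the Author-Word model. All names and parameters are hypothetical.

```python
import random

def generate_author_topic_doc(n_words, doc_authors, author_topic, topic_word, vocab, rng):
    """Author-Topic generative process (sketch).

    For each token: pick an author x uniformly from a_d (doc_authors),
    pick a topic z from that author's topic distribution, then a word from z.
    """
    tokens = []
    for _ in range(n_words):
        x = rng.choice(doc_authors)
        z = rng.choices(range(len(topic_word)), weights=author_topic[x])[0]
        w = rng.choices(vocab, weights=topic_word[z])[0]
        tokens.append((x, z, w))
    return tokens

vocab = ["gas", "price", "meeting", "schedule"]
topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
# Hypothetical authors: "trader" writes only about topic 0, "assistant" only about topic 1.
author_topic = {"trader": [1.0, 0.0], "assistant": [0.0, 1.0]}
rng = random.Random(1)
tokens = generate_author_topic_doc(8, ["trader", "assistant"], author_topic, topic_word, vocab, rng)
print(tokens)
```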
![Page 11: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/11.jpg)
Community-User-Topic Models (CUT)
Communication document
– A document that serves as a carrier of communication
Basic idea
– The issuing of a communication document reflects the activities of, and is also conditioned on, the community structure within an SN
– Treat the community as an extra latent variable in the Bayesian network, in addition to the author and topic variables
![Page 12: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/12.jpg)
CUT1: Modeling Community with Users (1)
Assume an SN community is no more than a group of users
– Similar to the assumption made in topology-based methods
– Treat each community as a multinomial distribution over users
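Since CUT1 treats each community as a multinomial distribution over users, one way to picture it is as a generative chain community → user → topic → word. The sketch below assumes the factorisation P(c, u, z, w) = P(c) P(u|c) P(z|u) P(w|z); that factorisation, the toy parameters, and the function name are assumptions for illustration, since the slide's own diagram is only in the image.

```python
import random

def sample_cut1_token(p_c, p_u_given_c, p_z_given_u, p_w_given_z, users, vocab, rng):
    """Draw one (c, u, z, w) tuple under the ASSUMED CUT1 factorisation
    P(c, u, z, w) = P(c) P(u|c) P(z|u) P(w|z)."""
    c = rng.choices(range(len(p_c)), weights=p_c)[0]                      # community
    u = rng.choices(users, weights=p_u_given_c[c])[0]                     # user from the community
    z = rng.choices(range(len(p_w_given_z)), weights=p_z_given_u[u])[0]   # topic from the user
    w = rng.choices(vocab, weights=p_w_given_z[z])[0]                     # word from the topic
    return c, u, z, w

# Hypothetical toy parameters: community 0 = {alice, bob}, community 1 = {carol}.
users = ["alice", "bob", "carol"]
vocab = ["gas", "price", "meeting", "schedule"]
p_c = [0.5, 0.5]
p_u_given_c = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
p_z_given_u = {"alice": [0.9, 0.1], "bob": [0.8, 0.2], "carol": [0.1, 0.9]}
p_w_given_z = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

rng = random.Random(7)
tokens = [sample_cut1_token(p_c, p_u_given_c, p_z_given_u, p_w_given_z, users, vocab, rng)
          for _ in range(1000)]
print(tokens[:5])
```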
![Page 13: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/13.jpg)
CUT1: Modeling Community with Users (2)
Compute the posterior probability P(c, u, z|w) from the joint probability P(c, u, z, w)
A possible side-effect of CUT1 is that it relaxes the community's impact on the generated topics
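The step from the joint to the posterior is Bayes' rule: normalise the joint over every (c, u, z) combination for the observed word. Written out under an assumed CUT1 factorisation (community → user → topic → word, consistent with treating each community as a distribution over users, but not copied from the slide's equations, which survive only in the image):

```latex
P(c, u, z \mid w) = \frac{P(c, u, z, w)}{\sum_{c'} \sum_{u'} \sum_{z'} P(c', u', z', w)},
\qquad
P(c, u, z, w) = P(c)\, P(u \mid c)\, P(z \mid u)\, P(w \mid z)
```

Under this factorisation z depends on c only through u, so the community influences topics only indirectly; that is one way to read the side-effect noted above.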
![Page 14: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/14.jpg)
CUT2: Modeling Community with Topics (1)
An SN community consists of a set of topics
CUT2 differs from CUT1 in strengthening the relation between community and topic
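If a community is a set of topics, the generative chain changes order: the community picks a topic, and the topic then generates both the user and the word. The sketch below assumes the factorisation P(c, u, z, w) = P(c) P(z|c) P(u|z) P(w|z); that factorisation, the toy parameters, and the function name are assumptions, not taken from the slide.

```python
import random

def sample_cut2_token(p_c, p_z_given_c, p_u_given_z, p_w_given_z, users, vocab, rng):
    """Draw one (c, u, z, w) tuple under the ASSUMED CUT2 factorisation
    P(c, u, z, w) = P(c) P(z|c) P(u|z) P(w|z)."""
    c = rng.choices(range(len(p_c)), weights=p_c)[0]                      # community
    z = rng.choices(range(len(p_w_given_z)), weights=p_z_given_c[c])[0]   # topic from the community
    u = rng.choices(users, weights=p_u_given_z[z])[0]                     # user from the topic
    w = rng.choices(vocab, weights=p_w_given_z[z])[0]                     # word from the topic
    return c, u, z, w

# Hypothetical toy parameters: each community talks about one topic;
# bob takes part in both topics.
users = ["alice", "bob", "carol"]
vocab = ["gas", "price", "meeting", "schedule"]
p_c = [0.5, 0.5]
p_z_given_c = [[1.0, 0.0], [0.0, 1.0]]
p_u_given_z = [[0.5, 0.5, 0.0], [0.0, 0.3, 0.7]]
p_w_given_z = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

rng = random.Random(3)
tokens = [sample_cut2_token(p_c, p_z_given_c, p_u_given_z, p_w_given_z, users, vocab, rng)
          for _ in range(1000)]
print(tokens[:5])
```

Here the user is generated from the topic, so community and user are connected only through topics, which is the looser community-user tie that the slides list as a possible side-effect of CUT2.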
![Page 15: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/15.jpg)
CUT2: Modeling Community with Topics (2)
Similarly, compute P(c, u, z|w) from the joint probability P(c, u, z, w)
A possible side-effect of CUT2 is that it might lead to loose ties between community and users
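With known parameters, "compute P(c, u, z|w) from P(c, u, z, w)" can be done by brute force: evaluate the joint for every (c, u, z) and normalise. The sketch below does this for an assumed CUT2 factorisation P(c) P(z|c) P(u|z) P(w|z) with made-up parameters. Exhaustive enumeration like this is only feasible for tiny models with fixed parameters; in practice the multinomials are unknown, which motivates the sampling approach that follows.

```python
from itertools import product

def posterior_by_enumeration(joint, communities, users, topics, w):
    """P(c, u, z | w) = P(c, u, z, w) / sum over (c', u', z') of P(c', u', z', w)."""
    scores = {(c, u, z): joint(c, u, z, w)
              for c, u, z in product(communities, users, topics)}
    total = sum(scores.values())
    return {key: s / total for key, s in scores.items()}

# Hypothetical parameters for an ASSUMED CUT2-style joint P(c) P(z|c) P(u|z) P(w|z).
p_c = {0: 0.5, 1: 0.5}
p_z_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_u_given_z = {0: {"alice": 0.6, "bob": 0.4}, 1: {"alice": 0.1, "bob": 0.9}}
p_w_given_z = {0: {"gas": 0.7, "meeting": 0.3}, 1: {"gas": 0.1, "meeting": 0.9}}

def cut2_joint(c, u, z, w):
    return p_c[c] * p_z_given_c[c][z] * p_u_given_z[z][u] * p_w_given_z[z][w]

post = posterior_by_enumeration(cut2_joint, [0, 1], ["alice", "bob"], [0, 1], "gas")
best = max(post, key=post.get)
print(best, round(post[best], 3))
```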
![Page 17: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/17.jpg)
Practical Algorithm: Gibbs Sampling
Gibbs sampling is an algorithm that approximates the joint distribution of multiple variables by drawing a sequence of samples, updating each variable in turn from its conditional distribution given the current values of the others
Gibbs sampling is a Markov chain Monte Carlo algorithm and usually applies when the conditional probability distribution of each variable can be evaluated
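A generic illustration of the idea, unrelated to CUT itself: for a standard bivariate normal with correlation rho, each full conditional is a known normal, so a Gibbs sampler just alternates exact draws from the two conditionals. The function name and settings are illustrative choices.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x | y ~ N(rho * y, 1 - rho**2) and
                       y | x ~ N(rho * x, 1 - rho**2).
    """
    sd = math.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # update x given the current y
        y = rng.gauss(rho * x, sd)  # update y given the new x
        if t >= burn_in:            # discard early, not-yet-mixed draws
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000, burn_in=500, rng=random.Random(42))
mean_x = sum(x for x, _ in samples) / len(samples)
# For zero-mean, unit-variance marginals, E[xy] equals the correlation rho.
corr = sum(x * y for x, y in samples) / len(samples)
print(round(mean_x, 3), round(corr, 3))
```

Consecutive samples are correlated, so the chain needs more draws than independent sampling would for the same accuracy; the burn-in discards draws taken before the chain has moved away from its arbitrary starting point.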
![Page 18: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/18.jpg)
Gibbs Sampling for CUT
![Page 19: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/19.jpg)
Estimation of the Conditional Probability
Estimating P(c_i, u_i, z_i|w_i) for CUT1 and CUT2
CUT1:
CUT2:
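The CUT1 and CUT2 formulas on this slide survive only in the image. As a hedged stand-in, here is what a collapsed Gibbs conditional for CUT1 typically looks like if one assumes the factorisation P(c) P(u|c) P(z|u) P(w|z), symmetric Dirichlet smoothing on each multinomial, and a uniform community prior. The count-array layout, the mapping of α/β/γ to topics/words/users, and the name `cut1_conditional` are assumptions, not the authors' code.

```python
from itertools import product

def cut1_conditional(w, doc_users, n_wz, n_zu, n_uc, alpha, beta, gamma):
    """Normalised P(c_i, u_i, z_i | w_i = w, all other assignments) for one token.

    ASSUMED smoothed-count form; the current token must already be removed
    from all counts:
      n_wz[w][z]: times word w is assigned to topic z          (smoothed by beta)
      n_zu[z][u]: times topic z is assigned to user u          (smoothed by alpha)
      n_uc[u][c]: times user u is assigned to community c      (smoothed by gamma)
    Candidate users are restricted to the document's users (doc_users).
    """
    V, T = len(n_wz), len(n_zu)
    U, C = len(n_uc), len(n_uc[0])
    scores = {}
    for c, u, z in product(range(C), doc_users, range(T)):
        p_w = (n_wz[w][z] + beta) / (sum(n_wz[v][z] for v in range(V)) + V * beta)
        p_z = (n_zu[z][u] + alpha) / (sum(n_zu[t][u] for t in range(T)) + T * alpha)
        p_u = (n_uc[u][c] + gamma) / (sum(n_uc[x][c] for x in range(U)) + U * gamma)
        scores[(c, u, z)] = p_w * p_z * p_u
    total = sum(scores.values())
    return {key: s / total for key, s in scores.items()}

# Toy counts: word 0 sits in topic 0, topic 0 belongs to user 0, user 0 to community 0.
n_wz = [[5, 0], [0, 5]]
n_zu = [[4, 0], [0, 4]]
n_uc = [[3, 0], [0, 3]]
post = cut1_conditional(0, [0, 1], n_wz, n_zu, n_uc, alpha=0.25, beta=0.01, gamma=0.1)
print(max(post, key=post.get))
```

A full Gibbs step would draw one (c, u, z) from this distribution (for example with `random.choices`) and add the token back into the three count tables.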
![Page 20: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/20.jpg)
EnF-Gibbs: Gibbs Sampling with Entropy Filtering
• Non-informative words are ignored after A iterations
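One way to read "non-informative": a word whose topic assignments are spread evenly over all topics has high entropy and says little about topics. The sketch below filters such words once, after sweep A, and keeps sampling only the rest. The entropy threshold, the one-time filter, the natural log, and the callback interface (`sweep`, `get_counts`) are all assumptions for illustration.

```python
import math

def entropy(dist):
    """Shannon entropy (natural log); zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def informative_words(n_wz, beta, max_entropy):
    """Indices of words whose smoothed topic distribution has entropy <= max_entropy.

    n_wz[w][z] counts how often word w is currently assigned to topic z.
    """
    keep = []
    for w, counts in enumerate(n_wz):
        smoothed = [n + beta for n in counts]
        total = sum(smoothed)
        if entropy([s / total for s in smoothed]) <= max_entropy:
            keep.append(w)
    return keep

def enf_gibbs(sweep, get_counts, n_words, n_iters, a, beta, max_entropy):
    """Run n_iters sweeps; after sweep number `a`, sample only informative words.

    `sweep(active)` is a hypothetical callback that resamples the tokens of the
    words in `active`; `get_counts()` returns the current word-topic counts.
    """
    active = list(range(n_words))
    for it in range(1, n_iters + 1):
        sweep(active)
        if it == a:
            active = informative_words(get_counts(), beta, max_entropy)
    return active

# Word 1 is spread evenly over 3 topics, so it is filtered out.
n_wz = [[12, 0, 0], [4, 4, 4], [0, 9, 1]]
print(informative_words(n_wz, beta=0.01, max_entropy=0.9 * math.log(3)))
```

Skipping filtered words shrinks every later sweep, which is where the efficiency gain of entropy filtering would come from.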
![Page 22: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/22.jpg)
Experiment Setup
Data: Enron email dataset
– Made public by the Federal Energy Regulatory Commission
The number of communities C is fixed at 6 and the number of topics T at 20
The smoothing hyper-parameters α, β and γ were set to 5/T, 0.01 and 0.1, respectively
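For reference, the stated settings written as a small configuration; with T = 20, α = 5/T works out to 0.25. The variable names and dict layout are illustrative only.

```python
# Experiment settings as stated on the slide; the layout is illustrative.
C = 6   # number of communities
T = 20  # number of topics
hyperparams = {
    "alpha": 5 / T,  # 5/T = 0.25 when T = 20
    "beta": 0.01,
    "gamma": 0.1,
}
print(C, T, hyperparams)
```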
![Page 23: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/23.jpg)
Experiment Result-1
Table 1: Topics discovered by CUT1
Table 2: Abbreviations
![Page 24: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/24.jpg)
Experiment Result-2
Fig: Communities/topics of an employee
![Page 25: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/25.jpg)
Experiment Result-3
Fig: A community discovered by CUT2
![Page 26: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/26.jpg)
Experiment Result-4
D..steffes = vice president of Enron in charge of government affairs
Cara.semperger = a senior analyst
Mike.grigsby = a marketing manager
Rick.buy = chief risk management officer
![Page 27: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/27.jpg)
Experiment Result-5
Similarity between two clustering results:
Fig: Community similarity comparisons
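The slide's similarity formula is not in the text. One widely used pair-counting measure for comparing two clusterings is the Rand index: the fraction of item pairs on which the two clusterings agree (both together or both apart). It is shown here only as a stand-in; the paper's measure may differ.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair agrees if both clusterings put it in the same cluster,
    or both put it in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same partition under different label names -> identical clusterings.
print(rand_index([0, 0, 1, 1], ["x", "x", "y", "y"]))  # 1.0
# Crossed partition: only 2 of the 6 pairs agree.
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because it compares pairs rather than label names, the measure can compare communities found by different methods without first matching their cluster IDs.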
![Page 28: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/28.jpg)
Experiment Result-6
Fig: Efficiency of EnF-Gibbs
![Page 30: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/30.jpg)
Conclusion and Future Work
Two versions of Community-User-Topic models are presented for community discovery in SNs
EnF-Gibbs sampling is introduced by extending Gibbs sampling with entropy filtering
Experiments show that the proposed method effectively tags communities with topic semantics
It would be interesting to explore the predictive performance of these models on new communications between social actors who are strangers to each other in SNs
![Page 31: Probabilistic Models for Discovering E-Communities](https://reader035.vdocuments.mx/reader035/viewer/2022062301/56814e41550346895dbbb0f8/html5/thumbnails/31.jpg)
Illustration of Dirichlet Distribution
Several images of the probability density of the Dirichlet distribution when K=3 for various parameter vectors α. Clockwise from top left: α=(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).
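The density plots can be complemented by sampling. A Dirichlet draw is obtained by normalising independent Gamma(α_k, 1) draws, and its mean is α_k / Σα; for the top-left panel, α = (6, 2, 2) gives a mean of (0.6, 0.2, 0.2). The helper names are illustrative.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex from Dirichlet(alpha) via normalised Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def empirical_mean(alpha, n, seed=0):
    """Average of n Dirichlet(alpha) draws, component by component."""
    rng = random.Random(seed)
    totals = [0.0] * len(alpha)
    for _ in range(n):
        for k, x in enumerate(sample_dirichlet(alpha, rng)):
            totals[k] += x
    return [t / n for t in totals]

alpha = (6, 2, 2)  # top-left panel of the figure
expected = [a / sum(alpha) for a in alpha]  # E[theta_k] = alpha_k / sum(alpha)
observed = empirical_mean(alpha, 20000)
print(expected, [round(m, 3) for m in observed])
```

Larger entries of α pull the density toward the corresponding corner of the simplex, and a larger total Σα concentrates it more tightly around the mean.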