overlapping communities - hpi.de · overlapping communities graph mining course winter semester...

Overlapping Communities Graph Mining course Winter Semester 2017 Davide Mottin Hasso Plattner Institute


Page 1: Overlapping Communities - hpi.de · Overlapping Communities Graph Mining course Winter Semester 2017 Davide Mottin ... ACM International Conference on Web Search and Data Mining (WSDM),

Overlapping Communities

Graph Mining course Winter Semester 2017

Davide Mottin, Hasso Plattner Institute


Acknowledgements

§ Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides

GRAPH MINING WS 2017 2


Lecture road

Introduction to graph clustering

Hierarchical approaches

Spectral clustering


Identifying Communities

Nodes: Football teams
Edges: Games played

Can we identify node groups? (communities, modules, clusters)


College Football Network

[Figure: the football network with teams grouped by football cups/conferences, e.g. Atlantic]

Nodes: Football teams
Edges: Games played


Protein-Protein Interactions

Can we identify functional modules?

Nodes: Proteins
Edges: Physical interactions


Protein-Protein Interactions

Functional modules

Nodes: Proteins
Edges: Physical interactions


Facebook Network

Can we identify social communities?

Nodes: Facebook users
Edges: Friendships


Facebook Network

Social communities: High school, Summer internship, Stanford (Squash), Stanford (Basketball)

Nodes: Facebook users
Edges: Friendships


Overlapping Communities

§ Non-overlapping vs. overlapping communities


Non-overlapping Communities

[Figure: a network (left) and its adjacency matrix (right); both axes of the matrix are nodes]


Communities as Tiles!

§ What is the structure of community overlaps? Edge density in the overlaps is higher!

Communities as “tiles”


Recap so far…

Communities in a network: this is what we want!


Plan of attack

§ 1) Given a model, we generate the network

§ 2) Given a network, find the “best” model

[Figure: a generative model for networks and a network on nodes A to H; step 1 goes from the model to the network, step 2 from the network back to the model]


Model of networks

§ Goal: Define a model that can generate networks
• The model will have a set of “parameters” that we will later want to estimate (and detect communities)

§ Q: Given a set of nodes, how do communities “generate” edges of the network?

[Figure: a generative model for networks producing a network on nodes A to H]


Community-Affiliation Graph

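§ Generative model $B(V, C, M, \{p_c\})$ for graphs:
• Nodes $V$, communities $C$, memberships $M$
• Each community $c$ has a single probability $p_c$
• Later we fit the model to networks to detect communities

[Figure: the model as an affiliation graph linking communities $C$ (with probabilities $p_A$, $p_B$) to nodes $V$ through memberships $M$, shown next to the network it generates]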


AGM: Generative Process

§ AGM generates the links: for each pair of nodes in community $A$, we connect them with prob. $p_A$
• The overall edge probability is:

$$P(u, v) = 1 - \prod_{c \in M_u \cap M_v} (1 - p_c)$$

• $M_u$ … set of communities node $u$ belongs to
• If $u, v$ share no communities: $P(u, v) = \varepsilon$
• Think of this as an “OR” function: if at least 1 community says “YES” we create an edge

[Figure: community affiliations (communities $C$ with $p_A$, $p_B$, memberships $M$, nodes $V$) and the network they generate]
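The generative process on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `edge_probability` and `sample_agm`, the dictionary layout of the memberships, and the default background probability `eps=0.0` are all chosen for the example.

```python
import itertools
import random

def edge_probability(u, v, memberships, p, eps=0.0):
    """P(u, v) = 1 - prod over shared communities c of (1 - p_c)."""
    shared = memberships[u] & memberships[v]
    if not shared:
        return eps  # no common community: background probability
    no_edge = 1.0
    for c in shared:
        no_edge *= 1.0 - p[c]  # every shared community must say "NO"
    return 1.0 - no_edge

def sample_agm(memberships, p, eps=0.0, seed=None):
    """Flip one biased coin per unordered node pair and return the edge set."""
    rng = random.Random(seed)
    edges = set()
    for u, v in itertools.combinations(sorted(memberships), 2):
        if rng.random() < edge_probability(u, v, memberships, p, eps):
            edges.add((u, v))
    return edges

# Hypothetical toy model: four nodes, two overlapping communities A and B.
memberships = {"a": {"A"}, "b": {"A", "B"}, "c": {"B"}, "d": {"A", "B"}}
p = {"A": 0.5, "B": 0.8}
print(edge_probability("b", "d", memberships, p))
print(sample_agm(memberships, p, seed=0))
```

Nodes `b` and `d` share both communities, so their edge probability is higher than for a pair that shares only one, which matches the "OR" reading of the formula.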


Recap: AGM networks

[Figure: the AGM model and the network it generates]


AGM: Flexibility

§ AGM can express a variety of community structures: non-overlapping, overlapping, nested


Detecting Communities

§ Detecting communities with AGM:

Given a graph $G(V, E)$, find the model:
1) Affiliation graph $M$
2) Number of communities $C$
3) Parameters $p_c$

[Figure: the model generates the network on nodes A to H; from the network we infer the model]


Maximum Likelihood Estimation

§ Maximum Likelihood Principle (MLE):
• Given: Data $X$
• Assumption: Data is generated by some model $f(\Theta)$
⁃ $f$ … model
⁃ $\Theta$ … model parameters
• Want to estimate $P_f(X \mid \Theta)$:
⁃ The probability that our model $f$ (with parameters $\Theta$) generated the data
• Now let’s find the most likely model that could have generated the data: $\arg\max_{\Theta} P_f(X \mid \Theta)$


Example: MLE

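§ Imagine we are given a set of coin flips
§ Task: Figure out the bias of a coin!
• Data: Sequence of coin flips: $X = [1, 0, 0, 0, 1, 0, 0, 1]$
• Model: $f(\Theta)$ = return 1 with prob. $\Theta$, else return 0
• What is $P_f(X \mid \Theta)$? Assuming coin flips are independent:
⁃ So, $P_f(X \mid \Theta) = P_f(1 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdots P_f(1 \mid \Theta)$
▪ What is $P_f(1 \mid \Theta)$? Simple, $P_f(1 \mid \Theta) = \Theta$
⁃ Then, $P_f(X \mid \Theta) = \Theta^3 (1 - \Theta)^5$
⁃ For example:
▪ $P_f(X \mid \Theta = 0.5) = 0.003906$
▪ $P_f(X \mid \Theta = 3/8) = 0.005029$
• What did we learn? Our data was most likely generated by a coin with bias $\Theta = 3/8$

[Figure: plot of $P_f(X \mid \Theta)$ against $\Theta$, with the maximum at $\Theta^* = 3/8$]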
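The coin example can be checked numerically. The sketch below evaluates the likelihood on a grid of candidate biases and picks the best one; the grid search and the names `likelihood`, `grid` and `theta_star` are assumptions made for illustration (the closed-form answer is simply the fraction of ones).

```python
def likelihood(flips, theta):
    """P_f(X | theta) for independent flips: theta per 1, (1 - theta) per 0."""
    result = 1.0
    for x in flips:
        result *= theta if x == 1 else 1.0 - theta
    return result

X = [1, 0, 0, 0, 1, 0, 0, 1]

# Candidate biases 0.000, 0.001, ..., 1.000 (the grid contains 3/8 = 0.375).
grid = [k / 1000 for k in range(1001)]
theta_star = max(grid, key=lambda t: likelihood(X, t))

print(likelihood(X, 0.5))    # likelihood under a fair coin
print(likelihood(X, 3 / 8))  # likelihood under the MLE bias
print(theta_star)
```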


MLE for Graphs

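§ How do we do MLE for graphs?
• Model generates a probabilistic adjacency matrix
• We then flip all the entries of the probabilistic matrix to obtain the binary adjacency matrix $A$

For every pair of nodes $u, v$, AGM gives the prob. $p_{uv}$ of them being linked:

$$\begin{pmatrix} 0 & 0.9 & 0.4 & 0.04 \\ 0.1 & 0 & 0.85 & 0.75 \\ 0.1 & 0.77 & 0 & 0.6 \\ 0.04 & 0.65 & 0.7 & 0 \end{pmatrix}$$

Flip biased coins to obtain $A$:

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$

§ The likelihood of AGM generating graph $G$:

$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$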
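A minimal sketch of the "flip biased coins" step, using the probabilistic matrix from this slide. The function name `flip_entries` and the fixed seed are assumptions for the example; each entry is flipped independently, as the slide describes, so the sampled matrix is not forced to be symmetric.

```python
import random

def flip_entries(P, seed=None):
    """Turn a matrix of edge probabilities into a binary adjacency matrix."""
    rng = random.Random(seed)
    # Entry (u, v) becomes 1 with probability P[u][v], else 0.
    return [[1 if rng.random() < p else 0 for p in row] for row in P]

P = [
    [0.0, 0.9, 0.4, 0.04],
    [0.1, 0.0, 0.85, 0.75],
    [0.1, 0.77, 0.0, 0.6],
    [0.04, 0.65, 0.7, 0.0],
]
A = flip_entries(P, seed=42)
for row in A:
    print(row)
```

Entries with probability 0 (the diagonal here) can never become 1, so the sample has no self-loops.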


Graphs: Likelihood P(G|Θ)

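Given graph $G(V, E)$ and $\Theta$, we calculate the likelihood that $\Theta$ generated $G$: $P(G \mid \Theta)$

Adjacency matrix of $G$:

$$\begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Edge probabilities given by $\Theta = B(V, C, M, \{p_c\})$ (two communities, A and B):

$$\begin{pmatrix} 0 & 0.9 & 0.9 & 0 \\ 0.9 & 0 & 0.9 & 0 \\ 0.9 & 0.9 & 0 & 0.9 \\ 0 & 0 & 0.9 & 0 \end{pmatrix}$$

$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$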
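The likelihood for the two matrices on this slide can be evaluated directly. The sketch below assumes the graph is undirected, so each unordered pair of nodes is counted once; the function name `graph_likelihood` is hypothetical.

```python
import itertools

def graph_likelihood(A, P):
    """P(G | Theta): product of P(u,v) over edges and 1 - P(u,v) over non-edges."""
    result = 1.0
    for u, v in itertools.combinations(range(len(A)), 2):
        result *= P[u][v] if A[u][v] == 1 else 1.0 - P[u][v]
    return result

A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = [
    [0.0, 0.9, 0.9, 0.0],
    [0.9, 0.0, 0.9, 0.0],
    [0.9, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
print(graph_likelihood(A, P))
```

Under that assumption the four edges each contribute 0.9 and the two non-edges contribute 1, so the likelihood is $0.9^4$.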


MLE for Graphs

§ Our goal: Find $\Theta = B(V, C, M, \{p_C\})$ (the AGM) such that:

$$\arg\max_{\Theta} P(G \mid \Theta)$$

§ How do we find $B(V, C, M, \{p_C\})$ that maximizes the likelihood?


MLE for AGM

§ Our goal is to find $B(V, C, M, \{p_C\})$ such that:

$$\arg\max_{B(V, C, M, \{p_C\})} \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$

§ Problem: Finding $B$ means finding the bipartite affiliation network.
• There is no nice way to do this.
• Fitting $B(V, C, M, \{p_C\})$ is too hard, let’s change the model (so it is easier to fit)!


From AGM to BigCLAM

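§ Relaxation: Memberships have strengths
• $F_{uA}$: The membership strength of node $u$ to community $A$ ($F_{uA} = 0$: no membership)
• Each community $A$ links nodes independently: $P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$

[Figure: nodes $u$ and $v$ attached to community $A$ by edges weighted with $F_{uA}$]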


Factor Matrix $F$

§ Community membership strength matrix $F$ (rows: nodes, columns: communities)
• $F_{uA}$ … strength of $u$’s membership to $A$
• $F_u$ … vector of community membership strengths of $u$

§ $P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$
• Probability of connection grows with the product of strengths
• Notice: If one node doesn’t belong to the community ($F_{uA} = 0$) then $P_A(u, v) = 0$

§ Prob. that at least one common community $C$ links the nodes:
• $P(u, v) = 1 - \prod_{C} (1 - P_C(u, v))$


From AGM to BigCLAM

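§ Community $A$ links nodes $u, v$ independently: $P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$

§ Then prob. at least one common $C$ links them (each factor $1 - P_C(u, v)$ equals $\exp(-F_{uC} \cdot F_{vC})$):

$$P(u, v) = 1 - \prod_{C} (1 - P_C(u, v)) = 1 - \exp\Big(-\sum_{C} F_{uC} \cdot F_{vC}\Big) = 1 - \exp(-F_u \cdot F_v^T)$$

§ Example $F$ matrix (node community membership strengths):

$F_u$: 0 1.2 0 0.2
$F_v$: 0.5 0 0 0.8
$F_w$: 0 1.8 1 0

Then: $F_u \cdot F_v^T = 0.16$
And: $P(u, v) = 1 - \exp(-0.16) \approx 0.15$
But: $P(u, w) \approx 0.88$ and $P(v, w) = 0$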
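The example numbers can be reproduced with a short script. This is a sketch using only the standard library; the function name `edge_prob` and the dictionary holding the three rows of $F$ are assumptions for illustration.

```python
import math

def edge_prob(Fu, Fv):
    """P(u, v) = 1 - exp(-F_u . F_v^T) for two membership-strength vectors."""
    dot = sum(a * b for a, b in zip(Fu, Fv))
    return 1.0 - math.exp(-dot)

# Rows of the example F matrix from the slide.
F = {
    "u": [0.0, 1.2, 0.0, 0.2],
    "v": [0.5, 0.0, 0.0, 0.8],
    "w": [0.0, 1.8, 1.0, 0.0],
}

print(edge_prob(F["u"], F["v"]))  # u and v share only the last community
print(edge_prob(F["u"], F["w"]))  # u and w are both strong in the second one
print(edge_prob(F["v"], F["w"]))  # v and w share no community
```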


BigCLAM: How to find F

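§ Task: Given a network $G(V, E)$, estimate $F$
• Find $F$ that maximizes the likelihood:

$$\arg\max_{F} \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$

⁃ where: $P(u, v) = 1 - \exp(-F_u \cdot F_v^T)$
⁃ Many times we take the logarithm of the likelihood, and call it the log-likelihood: $l(F) = \log P(G \mid F)$

§ Goal: Find $F$ that maximizes $l(F)$ (the second sum uses $\log(1 - P(u, v)) = -F_u \cdot F_v^T$):

$$l(F) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u \cdot F_v^T)\big) - \sum_{(u,v) \notin E} F_u \cdot F_v^T$$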
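A minimal sketch of the log-likelihood obtained by taking the logarithm of the likelihood above with $P(u, v) = 1 - \exp(-F_u \cdot F_v^T)$. It assumes an undirected graph whose edges are stored as a set of pairs `(u, v)` with `u < v`, and it loops over all node pairs, so it is only meant for tiny examples; the name `log_likelihood` is hypothetical.

```python
import itertools
import math

def log_likelihood(F, edges):
    """Sum log(1 - exp(-F_u.F_v)) over edges, minus F_u.F_v over non-edges."""
    total = 0.0
    for u, v in itertools.combinations(range(len(F)), 2):
        dot = sum(a * b for a, b in zip(F[u], F[v]))
        if (u, v) in edges:
            if dot == 0.0:
                return float("-inf")  # an observed edge with probability 0
            total += math.log(-math.expm1(-dot))  # log(1 - exp(-dot))
        else:
            total -= dot  # log(1 - P(u, v)) = -F_u.F_v
    return total

# Hypothetical toy input: three nodes, one community, a single edge (0, 1).
F = [[1.0], [1.0], [0.0]]
edges = {(0, 1)}
print(log_likelihood(F, edges))
```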


BigCLAM: V1.0

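§ Compute gradient of a single row $F_u$ of $F$:

$$\nabla l(F_u) = \sum_{v \in \mathcal{N}(u)} F_v \frac{\exp(-F_u \cdot F_v^T)}{1 - \exp(-F_u \cdot F_v^T)} - \sum_{v \notin \mathcal{N}(u)} F_v$$

• $\mathcal{N}(u)$ … set of outgoing neighbors of $u$

§ Coordinate gradient ascent:
• Iterate over the rows of $F$:
⁃ Compute gradient $\nabla l(F_u)$ of row $u$ (while keeping others fixed)
⁃ Update the row $F_u$: $F_u \leftarrow F_u + \eta \nabla l(F_u)$
⁃ Project $F_u$ back to a non-negative vector: If $F_{uC} < 0$: $F_{uC} = 0$

§ This is slow! Computing $\nabla l(F_u)$ takes time linear in the number of nodes!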
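One step of the coordinate gradient ascent described on this slide could look as follows. This is a sketch under assumptions, not the reference implementation: the gradient is the one obtained by differentiating the log-likelihood with respect to $F_u$, the names `row_gradient` and `update_row`, the neighbor dictionary, and the small floor on the dot product (to avoid division by zero) are all choices made for the example.

```python
import math

def row_gradient(F, neighbors, u):
    """Gradient of l(F) with respect to row F_u, looping over all other nodes."""
    k = len(F[u])
    grad = [0.0] * k
    for v in range(len(F)):
        if v == u:
            continue
        if v in neighbors[u]:
            dot = sum(a * b for a, b in zip(F[u], F[v]))
            dot = max(dot, 1e-12)         # numerical guard (assumption)
            weight = 1.0 / math.expm1(dot)  # exp(-dot) / (1 - exp(-dot))
            for c in range(k):
                grad[c] += weight * F[v][c]
        else:
            for c in range(k):
                grad[c] -= F[v][c]
    return grad

def update_row(F, neighbors, u, eta=0.01):
    """Gradient step on row u, then projection back to non-negative values."""
    grad = row_gradient(F, neighbors, u)
    F[u] = [max(0.0, f + eta * g) for f, g in zip(F[u], grad)]

# Hypothetical toy input: three nodes, one community, a single edge 0-1.
F = [[1.0], [1.0], [2.0]]
neighbors = {0: {1}, 1: {0}, 2: set()}
print(row_gradient(F, neighbors, 0))
update_row(F, neighbors, 0, eta=1.0)
print(F[0])
```

The loop over all `v` is what makes each row update linear in the number of nodes.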


BigCLAM: V2.0

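§ However, we notice:

$$\sum_{v \notin \mathcal{N}(u)} F_v = \sum_{v} F_v - F_u - \sum_{v \in \mathcal{N}(u)} F_v$$

• We cache $\sum_{v} F_v$
• So, computing $\sum_{v \notin \mathcal{N}(u)} F_v$ now takes time linear in the degree $|\mathcal{N}(u)|$ of $u$
⁃ In networks, the degree of a node is much smaller than the total number of nodes in the network, so this is a significant speedup!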
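The caching trick can be illustrated by computing the non-neighbor sum both ways and comparing. The function names and the toy data below are assumptions made for the example; in a full fitting loop the cached total would also have to be adjusted whenever a row $F_u$ is updated.

```python
def column_sums(F):
    """Cached quantity: the sum of all rows of F."""
    return [sum(col) for col in zip(*F)]

def non_neighbor_sum_naive(F, neighbors, u):
    """Sum F_v over all v that are neither u nor a neighbor of u (linear in |V|)."""
    total = [0.0] * len(F[u])
    for v in range(len(F)):
        if v != u and v not in neighbors[u]:
            total = [t + f for t, f in zip(total, F[v])]
    return total

def non_neighbor_sum_cached(F, neighbors, u, cached_total):
    """Same sum via the cache: total - F_u - neighbors' rows (linear in degree)."""
    result = [t - f for t, f in zip(cached_total, F[u])]
    for v in neighbors[u]:
        result = [r - f for r, f in zip(result, F[v])]
    return result

# Hypothetical toy input: four nodes, two communities.
F = [[1.0, 0.0], [0.5, 2.0], [0.0, 3.0], [4.0, 1.0]]
neighbors = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
cached = column_sums(F)
print(non_neighbor_sum_naive(F, neighbors, 0))
print(non_neighbor_sum_cached(F, neighbors, 0, cached))
```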


BigClam: Scalability

§ BigCLAM takes 5 minutes for 300k node nets
• Other methods take 10 days

§ Can process networks with 100M edges!


Extension: Beyond Clusters


Extension: Directed AGM

§ Extension: Make community membership edges directed!
• Outgoing membership: Node “sends” edges
• Incoming membership: Node “receives” edges


Example: Model and Network


Directed AGM

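§ Everything is almost the same, except now we have 2 matrices: $F$ and $H$
• $F$ … out-going community memberships
• $H$ … in-coming community memberships

§ Edge prob.: $P(u, v) = 1 - \exp(-F_u H_v^T)$

§ Network log-likelihood:

$$l(F, H) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u H_v^T)\big) - \sum_{(u,v) \notin E} F_u H_v^T$$

which we optimize the same way as before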
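The directed edge probability is asymmetric because it pairs the sender's outgoing strengths with the receiver's incoming strengths. A minimal sketch, where the function name `directed_edge_prob` and the toy vectors are assumptions for illustration:

```python
import math

def directed_edge_prob(F_u, H_v):
    """P(u -> v) = 1 - exp(-F_u . H_v^T): u's outgoing vs. v's incoming strengths."""
    dot = sum(a * b for a, b in zip(F_u, H_v))
    return -math.expm1(-dot)  # equals 1 - exp(-dot)

# Hypothetical toy nodes: u only sends in community 0, v only receives in it.
F = {"u": [1.0, 0.0], "v": [0.0, 0.0]}  # out-going memberships
H = {"u": [0.0, 0.0], "v": [2.0, 0.0]}  # in-coming memberships

print(directed_edge_prob(F["u"], H["v"]))  # u -> v
print(directed_edge_prob(F["v"], H["u"]))  # v -> u
```

Here the edge from `u` to `v` is likely while the reverse edge has probability zero, which an undirected membership matrix could not express.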


Predator-prey Communities


Questions?


References

§ Yang, J. and Leskovec, J. Community-affiliation graph model for overlapping network community detection. IEEE International Conference on Data Mining (ICDM), 2012.

§ Yang, J. and Leskovec, J. Overlapping community detection at scale: a nonnegative matrix factorization approach. ACM International Conference on Web Search and Data Mining (WSDM), 2013.

§ Yang, J., McAuley, J. and Leskovec, J. Detecting cohesive and 2-mode communities in directed and undirected networks. ACM International Conference on Web Search and Data Mining (WSDM), 2014.

§ Yang, J., McAuley, J. and Leskovec, J. Community detection in networks with node attributes. IEEE International Conference on Data Mining (ICDM), 2013.
