Overlapping Communities

Graph Mining course, Winter Semester 2016

Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute

§ Next week, January 24: second round of presentations
• Slides are due by January 23, 18.00

§ Project reports:
• Due by January 24, 18.00
• Only PDF files
• [email protected] [email protected]
• This is a strict deadline

§ We are looking for a student assistant for Graph Mining tasks
• Send applications with taken exams and previous experience to Davide
• Experiments for interactive graph exploration research prototypes
• Implementation of a unified graph mining framework
• Java or C++ skills required (Python might be too slow)

2GRAPH MINING WS 2016

Acknowledgements

§ Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides

Lecture road

Introduction to graph clustering

Hierarchical approaches

Spectral clustering

Identifying Communities

Nodes: Football teams
Edges: Games played

Can we identify node groups? (communities, modules, clusters)

NCAA Football Network

NCAA conferences

Nodes: Football teams
Edges: Games played

Protein-Protein Interactions

Can we identify functional modules?

Nodes: Proteins
Edges: Physical interactions

Protein-Protein Interactions

Functional modules

Nodes: Proteins
Edges: Physical interactions

Facebook Network

Can we identify social communities?

Nodes: Facebook users
Edges: Friendships

Facebook Network

Social communities: High school, Summer internship, Stanford (Squash), Stanford (Basketball)

Nodes: Facebook users
Edges: Friendships

Overlapping Communities

§ Non-overlapping vs. overlapping communities

Non-overlapping Communities

[Figure: a network and its adjacency matrix, with nodes on both axes]

Communities as Tiles!

§ What is the structure of community overlaps: edge density in the overlaps is higher!

Communities as “tiles”

Recap so far…

This is what we want! Communities in a network

Plan of attack

§ 1) Given a model, we generate the network
§ 2) Given a network, find the “best” model

[Figure: generative model for networks, linking a community model and a network on nodes A to H in both directions]

Model of networks

§ Goal: Define a model that can generate networks
• The model will have a set of “parameters” that we will later want to estimate (and detect communities)

§ Q: Given a set of nodes, how do communities “generate” edges of the network?

[Figure: generative model for networks, producing a network on nodes A to H]

Community-Affiliation Graph

§ Generative model $B(V, C, M, \{p_c\})$ for graphs:
• Nodes $V$, Communities $C$, Memberships $M$
• Each community $c$ has a single probability $p_c$
• Later we fit the model to networks to detect communities

[Figure: model (communities C linked to nodes V by memberships M, with probabilities p_A and p_B) and the network it generates]

AGM: Generative Process

§ AGM generates the links:
• For each pair of nodes in community $A$, we connect them with prob. $p_A$
• The overall edge probability is:
$$P(u, v) = 1 - \prod_{c \in M_u \cap M_v} (1 - p_c)$$
$M_u$ … set of communities node $u$ belongs to

If $u, v$ share no communities: $P(u, v) = \varepsilon$
Think of this as an “OR” function: If at least 1 community says “YES” we create an edge

[Figure: model (communities C, nodes V, memberships M, community affiliations with probabilities p_A and p_B) and the generated network]
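
The generative process above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy memberships, the probabilities in `p`, and the function names `agm_edge_prob` and `agm_generate` are all assumptions made for the example, and $\varepsilon$ is set to 0 for simplicity.

```python
from itertools import combinations
import random

def agm_edge_prob(memberships, p, u, v, eps=0.0):
    """P(u, v) = 1 - prod over shared communities c of (1 - p_c); eps if none are shared."""
    shared = memberships[u] & memberships[v]
    if not shared:
        return eps
    prob_no_edge = 1.0
    for c in shared:
        prob_no_edge *= 1.0 - p[c]  # community c fails to create the edge
    return 1.0 - prob_no_edge

def agm_generate(memberships, p, eps=0.0, seed=0):
    """Flip one biased coin per node pair and return the set of generated edges."""
    rng = random.Random(seed)
    edges = set()
    for u, v in combinations(sorted(memberships), 2):
        if rng.random() < agm_edge_prob(memberships, p, u, v, eps):
            edges.add((u, v))
    return edges

# Hypothetical toy model: two overlapping communities A and B.
memberships = {"u": {"A"}, "v": {"A", "B"}, "w": {"A", "B"}, "x": {"B"}}
p = {"A": 0.5, "B": 0.2}

p_uv = agm_edge_prob(memberships, p, "u", "v")  # share A only
p_vw = agm_edge_prob(memberships, p, "v", "w")  # share A and B
p_ux = agm_edge_prob(memberships, p, "u", "x")  # share nothing
edges = agm_generate(memberships, p)
```

Nodes v and w sit in the overlap of A and B, so their edge probability (1 - 0.5 · 0.8 = 0.6) is higher than that of any pair sharing a single community, which matches the "denser overlaps" observation from earlier in the lecture.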

Recap: AGM networks

[Figure: AGM model and the network it generates]

AGM: Flexibility

§ AGM can express a variety of community structures: non-overlapping, overlapping, nested

Detecting Communities

§ Detecting communities with AGM:

Given a graph $G(V, E)$, find the model:
1) Affiliation graph $M$
2) Number of communities $C$
3) Parameters $p_c$

[Figure: network on nodes A to H]

Maximum Likelihood Estimation

§ Maximum Likelihood Principle (MLE):
• Given: Data $X$
• Assumption: Data is generated by some model $f(\Theta)$
⁃ $f$ … model
⁃ $\Theta$ … model parameters
• Want to estimate $P_f(X \mid \Theta)$:
⁃ The probability that our model $f$ (with parameters $\Theta$) generated the data
• Now let’s find the most likely model that could have generated the data:
$$\arg\max_{\Theta} P_f(X \mid \Theta)$$

Example: MLE

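§ Imagine we are given a set of coin flips
§ Task: Figure out the bias of a coin!
• Data: Sequence of coin flips: $X = [1, 0, 0, 0, 1, 0, 0, 1]$
• Model: $f(\Theta)$ = return 1 with prob. $\Theta$, else return 0
• What is $P_f(X \mid \Theta)$? Assuming coin flips are independent:
⁃ So, $P_f(X \mid \Theta) = P_f(1 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdots P_f(1 \mid \Theta)$
▪ What is $P_f(1 \mid \Theta)$? Simple, $P_f(1 \mid \Theta) = \Theta$
⁃ Then, $P_f(X \mid \Theta) = \Theta^3 (1 - \Theta)^5$
⁃ For example:
▪ $P_f(X \mid \Theta = 0.5) = 0.003906$
▪ $P_f(X \mid \Theta = 3/8) = 0.005029$
• What did we learn? Our data was most likely generated by a coin with bias $\Theta = 3/8$

[Figure: plot of $P_f(X \mid \Theta)$ against $\Theta$, with the maximum at $\Theta^* = 3/8$]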
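
The coin example can be checked numerically. The sketch below evaluates the likelihood $\Theta^3 (1 - \Theta)^5$ and finds its maximizer by a simple grid search; the grid resolution and the names `coin_likelihood` and `theta_star` are choices made for this illustration only.

```python
def coin_likelihood(flips, theta):
    """P_f(X | theta) for independent flips: theta^(#ones) * (1 - theta)^(#zeros)."""
    ones = sum(flips)
    zeros = len(flips) - ones
    return theta ** ones * (1.0 - theta) ** zeros

X = [1, 0, 0, 0, 1, 0, 0, 1]  # data from the slide: three 1s, five 0s

# Grid search over theta in {0, 0.001, ..., 1}; an assumption of this sketch,
# the closed-form maximizer is simply (#ones / #flips).
grid = [i / 1000 for i in range(1001)]
theta_star = max(grid, key=lambda t: coin_likelihood(X, t))

lik_half = coin_likelihood(X, 0.5)        # likelihood of a fair coin
lik_best = coin_likelihood(X, 3 / 8)      # likelihood at the MLE
```

The grid maximizer coincides with the fraction of 1s in the data, 3/8, and the likelihood there is larger than for a fair coin.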

MLE for Graphs

§ How do we do MLE for graphs?
• The model generates a probabilistic adjacency matrix
• We then flip a biased coin for every entry of the probabilistic matrix to obtain the binary adjacency matrix $A$

For every pair of nodes $u, v$, AGM gives the prob. $p_{uv}$ of them being linked:

0     0.10  0.10  0.04
0.10  0     0.02  0.06
0.10  0.02  0     0.06
0.04  0.06  0.06  0

Flip biased coins to obtain $A$:

0 1 0 0
1 0 1 1
0 1 0 1
0 1 1 0

§ The likelihood of AGM generating graph $G$:
$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$
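
The coin-flipping step can be sketched as follows, using the probabilistic matrix from the slide. The function name `sample_adjacency` and the fixed seed are assumptions for this example; the sketch also assumes an undirected graph, so it flips one coin per unordered pair and mirrors the result.

```python
import random

# Probabilistic adjacency matrix from the slide (entry [u][v] is p_uv).
P = [
    [0.00, 0.10, 0.10, 0.04],
    [0.10, 0.00, 0.02, 0.06],
    [0.10, 0.02, 0.00, 0.06],
    [0.04, 0.06, 0.06, 0.00],
]

def sample_adjacency(P, seed=0):
    """Flip one biased coin per unordered node pair to get a binary adjacency matrix."""
    rng = random.Random(seed)
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < P[u][v]:
                A[u][v] = A[v][u] = 1  # undirected: keep the matrix symmetric
    return A

A = sample_adjacency(P)
```

Different seeds give different graphs; the likelihood formula measures how probable one particular outcome of these coin flips is.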

Graphs: Likelihood P(G|Θ)

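Given graph $G(V, E)$ and $\Theta$, we calculate the likelihood that $\Theta$ generated $G$: $P(G \mid \Theta)$

$G$ (adjacency matrix):

0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0

$\Theta = B(V, C, M, \{p_c\})$ (edge probabilities):

0    0.9  0.9  0
0.9  0    0.9  0
0.9  0.9  0    0.9
0    0    0.9  0

$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$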
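
As a worked instance of the likelihood formula, the sketch below evaluates $P(G \mid \Theta)$ for the two matrices on this slide. It assumes the graph is undirected, so every unordered pair contributes exactly one factor; the function name `graph_likelihood` is hypothetical.

```python
# Binary adjacency matrix G and edge-probability matrix from the slide.
G = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = [
    [0.0, 0.9, 0.9, 0.0],
    [0.9, 0.0, 0.9, 0.0],
    [0.9, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]

def graph_likelihood(G, P):
    """Product of P(u,v) over edges times (1 - P(u,v)) over non-edges, one factor per pair."""
    n = len(G)
    likelihood = 1.0
    for u in range(n):
        for v in range(u + 1, n):
            likelihood *= P[u][v] if G[u][v] else 1.0 - P[u][v]
    return likelihood

likelihood = graph_likelihood(G, P)
```

Here the four edges each contribute 0.9 and the two non-edges contribute 1 - 0 = 1, so the likelihood is 0.9⁴ = 0.6561.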

MLE for Graphs

§ Our goal: Find $\Theta = B(V, C, M, \{p_c\})$ such that:
$$\arg\max_{\Theta} P(G \mid \Theta)$$
§ How do we find $B(V, C, M, \{p_c\})$ that maximizes the likelihood?

MLE for AGM

§ Our goal is to find $B(V, C, M, \{p_c\})$ such that:
$$\arg\max_{B(V, C, M, \{p_c\})} \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$
§ Problem: Finding $B$ means finding the bipartite affiliation network.
• There is no nice way to do this.
• Fitting $B(V, C, M, \{p_c\})$ is too hard, let’s change the model (so it is easier to fit)!

From AGM to BigCLAM

§ Relaxation: Memberships have strengths
• $F_{uA}$: The membership strength of node $u$ to community $A$ ($F_{uA} = 0$: no membership)
• Each community $A$ links nodes independently: $P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$

[Figure: nodes $u$ and $v$ attached to a community, with membership strength $F_{uA}$]

Factor Matrix F

§ Community membership strength matrix $F$

[Figure: matrix $F$ with one row per node and one column per community]

$F_{uA}$ … strength of $u$’s membership to $A$
$F_u$ … vector of community membership strengths of $u$

§ $P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$
• Probability of connection grows with the product of strengths (approximately proportional when the product is small)
• Notice: If one node doesn’t belong to the community ($F_{uA} = 0$) then $P_A(u, v) = 0$
§ Prob. that at least one common community $C$ links the nodes:
• $P(u, v) = 1 - \prod_{C} (1 - P_C(u, v))$

From AGM to BigCLAM

§ Community $A$ links nodes $u, v$ independently: $P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$
§ Then prob. at least one common $C$ links them:
$$P(u, v) = 1 - \prod_{C} (1 - P_C(u, v)) = 1 - \exp\Big(-\sum_{C} F_{uC} \cdot F_{vC}\Big) = 1 - \exp(-F_u \cdot F_v^T)$$
§ Example $F$ matrix (node community membership strengths):

$F_u$:  0    1.2  0  0.2
$F_v$:  0.5  0    0  0.8
$F_w$:  0    1.8  1  0

Then: $F_u \cdot F_v^T = 0.16$
And: $P(u, v) = 1 - \exp(-0.16) \approx 0.15$
But: $P(u, w) \approx 0.88$
$P(v, w) = 0$
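
The example values can be reproduced directly from the membership vectors. This is a small sketch; the dictionary layout and the name `edge_prob` are choices made for the illustration.

```python
import math

# Membership strength vectors from the example F matrix.
F = {
    "u": [0.0, 1.2, 0.0, 0.2],
    "v": [0.5, 0.0, 0.0, 0.8],
    "w": [0.0, 1.8, 1.0, 0.0],
}

def edge_prob(F_u, F_v):
    """P(u, v) = 1 - exp(-F_u . F_v^T)."""
    dot = sum(a * b for a, b in zip(F_u, F_v))
    return 1.0 - math.exp(-dot)

p_uv = edge_prob(F["u"], F["v"])  # shared community 4 only: dot product 0.16
p_uw = edge_prob(F["u"], F["w"])  # shared community 2 only: dot product 2.16
p_vw = edge_prob(F["v"], F["w"])  # no shared community: dot product 0
```

Nodes v and w have no community in which both strengths are nonzero, so their dot product and hence their edge probability is exactly 0.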

BigCLAM: How to find F

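§ Task: Given a network $G(V, E)$, estimate $F$
• Find $F$ that maximizes the likelihood:
$$\arg\max_{F} \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$
⁃ where: $P(u, v) = 1 - \exp(-F_u \cdot F_v^T)$
⁃ Many times we take the logarithm of the likelihood, and call it log-likelihood: $l(F) = \log P(G \mid F)$
§ Goal: Find $F$ that maximizes $l(F)$:
$$l(F) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u F_v^T)\big) - \sum_{(u,v) \notin E} F_u F_v^T$$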
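
Taking the logarithm turns the product into a sum, and for a non-edge $\log(1 - P(u, v)) = -F_u F_v^T$. The sketch below computes this log-likelihood for a small example; the three-node graph, the edge set, and the name `log_likelihood` are assumptions for illustration, and the tiny clamp inside the logarithm is added only to avoid `log(0)` when an edge has a zero dot product.

```python
import math

def log_likelihood(F, edges):
    """l(F): sum of log(1 - exp(-F_u.F_v)) over edges minus sum of F_u.F_v over non-edges."""
    n = len(F)
    total = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            dot = sum(a * b for a, b in zip(F[u], F[v]))
            if (u, v) in edges:
                total += math.log(max(1.0 - math.exp(-dot), 1e-10))  # log P(u, v)
            else:
                total -= dot  # log(1 - P(u, v)) = -F_u.F_v
    return total

# Rows 0, 1, 2 reuse the example vectors F_u, F_v, F_w from the previous slide.
F = [
    [0.0, 1.2, 0.0, 0.2],
    [0.5, 0.0, 0.0, 0.8],
    [0.0, 1.8, 1.0, 0.0],
]
edges = {(0, 1), (0, 2)}  # hypothetical graph: u-v and u-w are linked, v-w is not
ll = log_likelihood(F, edges)
```

The non-edge between rows 1 and 2 contributes nothing here because their dot product is 0, so `ll` is just the sum of the two edge terms.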

BigCLAM: V1.0

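§ Compute gradient of a single row $F_u$ of $F$:
$$\nabla l(F_u) = \sum_{v \in \mathcal{N}(u)} F_v \frac{\exp(-F_u F_v^T)}{1 - \exp(-F_u F_v^T)} - \sum_{v \notin \mathcal{N}(u)} F_v$$
$\mathcal{N}(u)$ … set of outgoing neighbors of $u$
§ Coordinate gradient ascent:
• Iterate over the rows of $F$:
⁃ Compute gradient $\nabla l(F_u)$ of row $u$ (while keeping others fixed)
⁃ Update the row $F_u$: $F_u \leftarrow F_u + \eta \nabla l(F_u)$
⁃ Project $F_u$ back to a non-negative vector: If $F_{uC} < 0$: $F_{uC} = 0$
§ This is slow! Computing $\nabla l(F_u)$ takes time linear in the number of nodes!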
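
The update loop above can be sketched in plain Python. This is an illustrative sketch of the slow variant (each gradient loops over all nodes), not the BigCLAM reference code: the toy graph, the initial $F$, the step size `eta`, and all function names are assumptions, and the clamp only guards against division by zero.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_likelihood(F, neighbors):
    """l(F) over unordered pairs: log(1 - exp(-x)) for edges, -x for non-edges."""
    total = 0.0
    for u in range(len(F)):
        for v in range(u + 1, len(F)):
            x = dot(F[u], F[v])
            if v in neighbors[u]:
                total += math.log(max(1.0 - math.exp(-x), 1e-10))
            else:
                total -= x
    return total

def row_gradient(F, neighbors, u):
    """Gradient of l(F) with respect to row F_u; visits every other node."""
    grad = [0.0] * len(F[u])
    for v in range(len(F)):
        if v == u:
            continue
        if v in neighbors[u]:
            e = math.exp(-dot(F[u], F[v]))
            weight = e / max(1.0 - e, 1e-10)  # neighbor term
        else:
            weight = -1.0                     # non-neighbor term: -F_v
        for c in range(len(grad)):
            grad[c] += weight * F[v][c]
    return grad

def ascent_sweep(F, neighbors, eta=0.01):
    """One pass over the rows: gradient step, then projection onto F >= 0."""
    for u in range(len(F)):
        grad = row_gradient(F, neighbors, u)
        F[u] = [max(0.0, f + eta * g) for f, g in zip(F[u], grad)]

# Hypothetical toy graph with edges 0-1, 0-2, 1-2, 2-3 and two communities.
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
F_init = [[0.6, 0.2], [0.5, 0.3], [0.4, 0.4], [0.1, 0.7]]

F = [row[:] for row in F_init]
ll_before = log_likelihood(F, neighbors)
ascent_sweep(F, neighbors)
ll_after = log_likelihood(F, neighbors)
```

In practice the sweep would be repeated until the log-likelihood stops improving; the step size here is deliberately small so that a single sweep on this toy graph increases $l(F)$.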

BigCLAM: V2.0

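§ However, we notice:
$$\sum_{v \notin \mathcal{N}(u)} F_v = \sum_{v} F_v - F_u - \sum_{v \in \mathcal{N}(u)} F_v$$
• We cache $\sum_{v} F_v$
• So, computing $\sum_{v \notin \mathcal{N}(u)} F_v$ now takes linear time in the degree $|\mathcal{N}(u)|$ of $u$
⁃ In networks the degree of a node is much smaller than the total number of nodes in the network, so this is a significant speedup!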
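
The caching trick can be illustrated by computing the non-neighbor sum both ways and comparing. The toy graph, the matrix values, and the function names below are assumptions for this sketch.

```python
def non_neighbor_sum_slow(F, neighbors, u):
    """Sum of F_v over all v that are neither u nor a neighbor of u: O(number of nodes)."""
    total = [0.0] * len(F[u])
    for v in range(len(F)):
        if v != u and v not in neighbors[u]:
            for c in range(len(total)):
                total[c] += F[v][c]
    return total

def non_neighbor_sum_fast(F, neighbors, u, cached_total):
    """Same sum via the cached column totals: O(degree of u)."""
    result = [t - f for t, f in zip(cached_total, F[u])]  # remove F_u itself
    for v in neighbors[u]:
        for c in range(len(result)):
            result[c] -= F[v][c]                          # remove each neighbor
    return result

# Hypothetical toy graph and membership strengths.
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
F = [[0.6, 0.2], [0.5, 0.3], [0.4, 0.4], [0.1, 0.7], [0.9, 0.0]]

cached_total = [sum(row[c] for row in F) for c in range(len(F[0]))]
slow = [non_neighbor_sum_slow(F, neighbors, u) for u in range(len(F))]
fast = [non_neighbor_sum_fast(F, neighbors, u, cached_total) for u in range(len(F))]
```

One detail the sketch leaves out: whenever a row $F_u$ is updated during gradient ascent, the cached total has to be adjusted by the difference between the new and the old row, otherwise it goes stale.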

BigClam: Scalability

§ BigCLAM takes 5 minutes for 300k node nets
• Other methods take 10 days

§ Can process networks with 100M edges!

Extension: Beyond Clusters


Extension: Directed AGM

§ Extension: Make community membership edges directed!
• Outgoing membership: Node “sends” edges
• Incoming membership: Node “receives” edges

Example: Model and Network


Directed AGM

§ Everything is almost the same except now we have 2 matrices: $F$ and $H$
• $F$ … out-going community memberships
• $H$ … in-coming community memberships
§ Edge prob.: $P(u, v) = 1 - \exp(-F_u H_v^T)$
§ Network log-likelihood:
$$l(F, H) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u H_v^T)\big) - \sum_{(u,v) \notin E} F_u H_v^T$$
which we optimize the same way as before
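
With two matrices the edge probability is no longer symmetric. The minimal sketch below uses made-up membership values and a hypothetical function name `directed_edge_prob` to show a pair where u is likely to link to v but not the other way round.

```python
import math

def directed_edge_prob(F_u, H_v):
    """P(u -> v) = 1 - exp(-F_u . H_v^T): u's outgoing strengths times v's incoming ones."""
    return 1.0 - math.exp(-sum(f * h for f, h in zip(F_u, H_v)))

# Hypothetical memberships: u only "sends" in community 1, v only "receives" in it.
F = {"u": [1.0, 0.0], "v": [0.0, 0.0]}  # out-going memberships
H = {"u": [0.0, 0.0], "v": [1.0, 0.0]}  # in-coming memberships

p_uv = directed_edge_prob(F["u"], H["v"])  # u -> v
p_vu = directed_edge_prob(F["v"], H["u"])  # v -> u
```

Here the edge from u to v has probability 1 - exp(-1), while the reverse edge has probability 0.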

Predator-prey Communities


In the next episode …

Student presentation (Second group)

Anomaly detection

And much more…

Questions?


References

§ J. Yang, J. Leskovec. Community-Affiliation Graph Model for Overlapping Network Community Detection. IEEE International Conference on Data Mining (ICDM), 2012.
§ J. Yang, J. Leskovec. Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach. ACM International Conference on Web Search and Data Mining (WSDM), 2013.
§ J. Yang, J. McAuley, J. Leskovec. Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks. ACM International Conference on Web Search and Data Mining (WSDM), 2014.
§ J. Yang, J. McAuley, J. Leskovec. Community Detection in Networks with Node Attributes. IEEE International Conference on Data Mining (ICDM), 2013.