
Node similarity and classification

Graph Mining course Winter Semester 2017

Davide Mottin, Anton Tsitsulin
Hasso Plattner Institute


Acknowledgements

§ Some part of this lecture is taken from: http://web.eecs.umich.edu/~dkoutra/tut/icdm14.html

§ Other adapted content is from Social Network Data Analytics (Springer), Ed. Charu Aggarwal, March 2011

GRAPH MINING WS 2017 2


Friends network in an American high school:
1. Nodes are people
2. Edges are friendships
3. Colors = races

"Race, school integration, and friendship segregation in America," American Journal of Sociology 107, 679-716 (2001).

Which color will the node marked "?" have?


What about politics?

Are they Republicans or Democrats?

Source: http://adequatebird.com/2010/05/03/the-political-blogosphere-and-the-2004-u-s-election-divided-they-blog/


Give me your own example!


Classification with Network Data

§ Given a graph and a few nodes for which we know the "label" or "class", how can we predict user attributes or interests?

Can we predict the labels for the unmarked nodes?


Why node classification?

§ Is this a friend or an acquaintance?
§ Recommendation systems to suggest objects (music, movies, activities)
§ Automatically understand roles in a network (hubs, activators, influencing nodes, …)
§ Identify experts for question answering systems
§ Targeted advertising
§ Study of communities (key individuals, group starters, ...)
§ Study of diseases and cures
§ Identify unusual behaviors or behavioral changes
§ Finding similar nodes and outliers


Why is node classification useful?

§ Not all the nodes have labels (users are not willing to provide explanations)
§ Roles are not explicitly declared (who is more important in a company? Think about the exchanged emails ;))
§ Labels provided by the users can be misleading
§ Labels are sparse (some categories might be missing or incomplete)


Node classification problem

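§ Given:
• Graph $G = (V, E, W)$ with vertices $V$, edges $E$ and weight matrix $W$
• Labeled nodes $V_{lab} \subset V$, unlabeled nodes $V_{ul} = V \setminus V_{lab}$
• $\mathcal{Y}$, the set of $m$ possible labels (e.g., $\mathcal{Y} = \{\text{republican}, \text{democrat}\}$)
• $Y_{lab} = \{y_1, \dots, y_l\}$, the labels on the labeled nodes in $V_{lab}$

§ Problem: Infer the labels $Y_{ul}$ for all nodes in $V_{ul}$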

[Figure: example graph G with label set $\mathcal{Y} = \{1, 2\}$; the labeled nodes carry label 1 or 2, the unlabeled nodes are marked "?".]
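To make the problem statement concrete, the following is a minimal Python sketch of one toy instance. The node names, edge weights, and labels are invented for illustration, and `neighbor_vote` is a hypothetical baseline (a weighted vote among already-labeled neighbors), not a method prescribed by the lecture.

```python
from collections import defaultdict

# Toy instance; node names, weights and labels are made up for illustration.
edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0, ("d", "e"): 1.0}
known_labels = {"a": 1, "e": 2}      # labels on the labeled nodes

# Symmetric weighted adjacency built from the edge list (plays the role of W).
adj = defaultdict(dict)
for (u, v), w in edges.items():
    adj[u][v] = w
    adj[v][u] = w

nodes = set(adj)                     # V
labeled = set(known_labels)          # labeled nodes
unlabeled = nodes - labeled          # unlabeled nodes = V minus labeled nodes


def neighbor_vote(node):
    """Return the label with the largest total edge weight among the
    labeled neighbors of `node`, or None if no neighbor is labeled."""
    votes = defaultdict(float)
    for nbr, w in adj[node].items():
        if nbr in known_labels:
            votes[known_labels[nbr]] += w
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical baseline: one round of voting, no propagation.
predictions = {v: neighbor_vote(v) for v in unlabeled}
```

Node `c` has no labeled neighbor, so a single vote leaves it undecided; this is the gap that methods which spread labels along the links are meant to close.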


Node classification problem (2)

§ Can be generalized to multilabel and multiclass classification:
• With multiclass classification, assume that each labeled node has a probability distribution on the labels.

§ Can work on generalized graph structures
• hypergraphs, graphs with weighted, labeled, timestamped edges, multigraphs, probabilistic graphs and so on.


Influential factors on Networks

§ Individual behaviors are correlated in a network environment
• Homophily: being republican → friendship
• Influence: friendship → same shoes
• Confounding (external factor): born in Berlin → (individual) like electronic music, (connection) participate in a marathon


The importance of the graph structure

§ The graph structure encodes important information for node classification

§ So it is reasonable to think that labels propagate in the network following the links

§ Methods designed for points in a vector space perform poorly on a graph

Assumption: The label propagates on the network


Node features

§ Node features: measurable characteristics of the nodes that help discriminate one node from another or state its similarity with other nodes.

§ Examples of features:
• In/out degree of the node
• Number of l-labeled edges from that node
• Number of paths that go through the node
• Number of triangles
• Degree and number of within-egonet edges
• …
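A few of these features can be computed directly from an edge list. The sketch below is illustrative only: the toy edges and the helper names (`build_undirected`, `in_out_degree`, `triangles`, `within_egonet_edges`) are assumptions, and the graph is treated as simple (no self-loops or parallel edges).

```python
from collections import defaultdict


def build_undirected(edges):
    """Adjacency sets for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def in_out_degree(directed_edges, node):
    """(in-degree, out-degree) when each pair (u, v) is read as u -> v."""
    out_deg = sum(1 for u, _ in directed_edges if u == node)
    in_deg = sum(1 for _, v in directed_edges if v == node)
    return in_deg, out_deg


def triangles(adj, node):
    """Number of triangles through `node`: one per edge between two of its
    neighbors (each such edge is seen from both ends, hence // 2)."""
    nbrs = adj[node]
    return sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2


def within_egonet_edges(adj, node):
    """Edges inside the egonet: edges from the node to its neighbors plus
    edges among the neighbors themselves."""
    return len(adj[node]) + triangles(adj, node)


# Toy graph, invented for illustration: a triangle a-b-c with a pendant node d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_undirected(edges)
features = {
    v: (len(adj[v]), triangles(adj, v), within_egonet_edges(adj, v))
    for v in adj
}
```

Each node ends up with a small feature vector (degree, triangles, within-egonet edges) that similarity-based methods can compare.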


Node classification approaches

§ Similarity based
• Find nodes that share the same characteristics with other nodes

§ Iterative learning
• Learn a set of labels and propagate the information to similar nodes

§ Label propagation
• Labeled nodes propagate the information to the neighbors with some probability


Lecture road

Similarity based

Iterative classification

Label propagation


Real-world Applications


Movies recommendations


Search Engines (IR): Topical Sessions

[Figure: graph connecting queries ("popular music videos", "music", "yahoo") to URLs, with similar queries marked.]


Similarity based approaches

§ Equivalences in terms of structure
• Structural, Automorphic, and Regular

§ Role extraction methods:
• RolX

§ Recursive similarities
• Paths, Max-flow, SimRank


History: Equivalences

Two nodes u and v are structurally equivalent if they have the same relationships to all other nodes (Lorrain and White, 1971)

Two nodes u and v are automorphically equivalent if all the nodes can be relabeled to form an isomorphic graph with the labels of u and v interchanged (just change the node id) (Borgatti and Everett, 1992)

Two nodes u and v are regularly equivalent if they are equally related to equivalent others (Borgatti and Everett, 1992)
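Structural equivalence is the easiest of the three to check mechanically. The sketch below is one possible reading for an undirected, unlabeled graph: u and v are treated as structurally equivalent when they have the same neighbors once u and v themselves are set aside. The function name and the toy graph (students `s1`..`s3`, professors `p1`, `p2`) are assumptions made for illustration.

```python
def structurally_equivalent(adj, u, v):
    """True if u and v are connected to exactly the same other nodes.

    `adj` maps each node to the set of its neighbors (undirected graph).
    The pair {u, v} is removed from both neighborhoods so that an edge
    between u and v does not break the comparison.
    """
    return adj[u] - {u, v} == adj[v] - {u, v}


# Toy graph, invented for illustration: two students share one professor,
# a third student has a different professor.
adj = {
    "s1": {"p1"},
    "s2": {"p1"},
    "s3": {"p2"},
    "p1": {"s1", "s2"},
    "p2": {"s3"},
}

same_prof = structurally_equivalent(adj, "s1", "s2")   # same neighbor p1
diff_prof = structurally_equivalent(adj, "s1", "s3")   # p1 versus p2
```

In this toy graph `s1` and `s3` are not structurally equivalent, although both are students attached to a professor; capturing that weaker notion is what regular equivalence is for.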


Regular equivalence

Two nodes u and v are regularly equivalent if they are equally related to equivalent others (Borgatti and Everett, 1992)

§ Assumes a similarity between sets of nodes

[Figure: the students Billy and John, each connected to a professor (Prof. Einstein, Prof. Hilbert).]

Billy and John are similar because they are both connected to a professor. The same holds for Prof. Einstein and Prof. Hilbert.

Regular equivalence doesn't care about which particular nodes a node is connected to, only about which set/group it is connected to.

Borgatti, S.P. and Everett, M.G., 1992. Regular blockmodels of multiway, multimode matrices. Social Networks.


Relation among equivalences

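What is the relation among the three equivalences?

[Figure: Regular, Automorphic, and Structural equivalence shown together.]

Structural equivalence implies automorphic equivalence, which in turn implies regular equivalence.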


RolX: Role eXtraction algorithm

Henderson, K., Gallagher, B., Eliassi-Rad, T., Tong, H., Basu, S., Akoglu, L., Koutra, D., Faloutsos, C. and Li, L., 2012. RolX: structural role extraction & mining in large graphs. SIGKDD.

Input: Adjacency matrix ($n \times n$), n-dimensional space
→ Recursive Feature Extraction → Node × Feature matrix, d-dimensional space
→ Role Extraction, via non-negative matrix factorization (NMF)
Output: Node × Role matrix (r-dimensional space) and Role × Feature matrix


Recursive Feature eXtraction (ReFeX)

Henderson, K., Gallagher, B., Li, L., Akoglu, L., Eliassi-Rad, T., Tong, H. and Faloutsos, C., 2011. It's who you know: graph mining using recursive structural features. SIGKDD.

§ Transform the network connectivity into recursive structural features.

§ Technically, embeds the graph into an $|\mathcal{F}|$-dimensional space, where $\mathcal{F}$ is a set of features (degree, self-loops, avg edge weight, # of edges in egonet)


ReFeX: mining features

§ Local:
• Measures of the node degree

§ Egonet:
• The egonet (or ego-network) of a node is the node itself, the adjacent nodes, and the graph induced by those nodes
• Computed based on each node's ego network: # of within-egonet edges, # of edges entering & leaving the egonet

§ Recursive:
• Some aggregate (mean, sum, max, min, …) of another feature over a node's neighbors
• The aggregation can be computed over any real-valued feature, including other recursive features (this process might not stop if uncontrolled!)

Local and egonet features together are the neighborhood features; recursive features are the regional features.
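The three feature families can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ReFeX implementation: the graph is undirected and simple (so edges "entering" and "leaving" the egonet coincide), only sum and mean are used as aggregates, and the function names and toy graph are invented here.

```python
def neighborhood_features(adj):
    """Local + egonet features per node:
    [degree, # within-egonet edges, # edges leaving the egonet]."""
    feats = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        # Each internal edge is seen from both endpoints, hence // 2.
        within = sum(1 for a in ego for b in adj[a] if b in ego) // 2
        leaving = sum(1 for a in ego for b in adj[a] if b not in ego)
        feats[v] = [float(len(nbrs)), float(within), float(leaving)]
    return feats


def recursive_step(adj, feats):
    """Append the sum and the mean of every current feature over the
    neighbors of each node. One call triples the number of features."""
    out = {}
    for v, nbrs in adj.items():
        k = len(feats[v])
        sums = [sum(feats[u][i] for u in nbrs) for i in range(k)]
        means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
        out[v] = feats[v] + sums + means
    return out


# Toy graph, invented for illustration: triangle a-b-c with pendant node d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

base = neighborhood_features(adj)     # 3 features per node
step1 = recursive_step(adj, base)     # 9 features per node
step2 = recursive_step(adj, step1)    # 27 features per node
```

The feature count grows from 3 to 9 to 27 in two steps, which illustrates why the recursion needs a stopping rule or a pruning step.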


ReFeX (2)

§ The number of possible recursive features is infinite
§ ReFeX pruning
• Feature values are mapped to small integers via vertical logarithmic binning
• Log binning: discretize the features using non-uniform (logarithmic) bins: the p|V| nodes with the lowest feature value are assigned to bin 0, then a fraction p of the remaining |V| − p|V| nodes, i.e. p(|V| − p|V|) nodes, is assigned to bin 1, and so on.
• Logarithmic binning increases the chances two features …

§ Look for pairs of features whose values never disagree at any node by more than a threshold s, and connect them in a graph. For each connected component keep one feature.

§ A graph based approach (motivated by power law distribution)
§ Threshold automatically set
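The binning and pruning steps can be sketched as follows. This is an illustrative simplification: ties between equal feature values are not treated specially, the threshold `s` is passed in by hand rather than set automatically, and the names `log_bin` and `prune_features` as well as the toy feature values are assumptions.

```python
import math


def log_bin(values, p=0.5):
    """Vertical logarithmic binning.

    `values` maps node -> feature value. The fraction p of nodes with the
    lowest values goes to bin 0, the fraction p of the remaining nodes to
    bin 1, and so on. Returns node -> bin index.
    """
    order = sorted(values, key=values.get)
    bins, start, b = {}, 0, 0
    while start < len(order):
        size = max(1, math.ceil(p * (len(order) - start)))
        for node in order[start:start + size]:
            bins[node] = b
        start += size
        b += 1
    return bins


def prune_features(binned, s=0):
    """Keep one feature per connected component of the 'agreement graph'.

    `binned` maps feature name -> {node: bin}. Two features are connected
    if their bins never differ by more than s at any node.
    """
    names = list(binned)
    parent = {f: f for f in names}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for i, f in enumerate(names):
        for g in names[i + 1:]:
            if all(abs(binned[f][v] - binned[g][v]) <= s for v in binned[f]):
                parent[find(g)] = find(f)

    kept, seen = [], set()
    for f in names:
        root = find(f)
        if root not in seen:
            seen.add(root)
            kept.append(f)
    return kept


# Toy feature values on 8 nodes, invented for illustration.
deg = {i: float(i + 1) for i in range(8)}
deg_sum = {i: 2.0 * (i + 1) for i in range(8)}   # same ordering as deg
rev = {i: float(8 - i) for i in range(8)}        # opposite ordering

binned = {"deg": log_bin(deg), "deg_sum": log_bin(deg_sum), "rev": log_bin(rev)}
kept = prune_features(binned, s=0)
```

With p = 0.5 half of the nodes land in bin 0, a quarter in bin 1, and so on; `deg_sum` collapses onto `deg` because both features induce the same bins.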


Role extraction: Feature grouping

§ Find r overlapping clusters in the feature space
• Each node can have multiple roles at the same time

§ Generate a rank-r approximation of the node × feature matrix V
§ Use non-negative matrix factorization: $V \approx GF$

§ The G matrix assigns nodes to roles
§ The F matrix represents how the features explain the roles
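Once a non-negative G is available, reading roles off it is straightforward. The sketch below is one possible convention, not part of RolX itself: each row of G is normalized to sum to 1 to get mixed role memberships, and the largest entry is taken as the primary role. The matrix values and function names are invented for illustration.

```python
def role_memberships(G):
    """Normalize each row of the node x role matrix so it sums to 1."""
    out = []
    for row in G:
        total = sum(row)
        out.append([x / total if total > 0 else 0.0 for x in row])
    return out


def primary_roles(G):
    """Index of the largest entry in each row (the dominant role)."""
    return [max(range(len(row)), key=row.__getitem__) for row in G]


# Hypothetical node x role matrix for 2 nodes and r = 2 roles.
G = [[2.0, 0.0],
     [1.0, 3.0]]

memberships = role_memberships(G)
primary = primary_roles(G)
```

The second node belongs to both roles at once (one quarter and three quarters), which is the overlapping-cluster behavior described above.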


A (very brief) glimpse of matrix factorization

§ If V is a matrix, it is possible to find an approximation of this matrix by multiplying two (lower rank) matrices

§ In particular, we want to find two matrices G, F such that $V \approx GF$

Example: $\begin{pmatrix} 3 & 4 \\ 6 & 8 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \begin{pmatrix} 3 & 4 \end{pmatrix}$

§ However, the exact factorization is not always possible!
§ Idea: let us find $G$, $F$ with $G \ge 0$, $F \ge 0$ such that

$\operatorname{argmin}_{G,F} \; \| V - GF \|_F$

where $\| \cdot \|_F$ is the Frobenius norm
§ Intuitively: you want to minimize the difference between the single elements of the matrix V and the product GF
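The minimization can be approximated with a few lines of plain Python. The slides do not say which solver is used; the sketch below assumes the classic multiplicative update rules of Lee and Seung for the Frobenius objective, and the helper names, iteration count, and random initialization are illustrative choices.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, r, iters=200, seed=0, eps=1e-9):
    """Approximate V (n x f, non-negative) as G (n x r) times F (r x f)
    with multiplicative updates, which keep G and F non-negative."""
    rng = random.Random(seed)
    n, f = len(V), len(V[0])
    G = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    F = [[rng.random() + 0.1 for _ in range(f)] for _ in range(r)]
    for _ in range(iters):
        # F <- F * (G^T V) / (G^T G F), element-wise
        Gt = transpose(G)
        num, den = matmul(Gt, V), matmul(matmul(Gt, G), F)
        F = [[F[k][j] * num[k][j] / (den[k][j] + eps) for j in range(f)]
             for k in range(r)]
        # G <- G * (V F^T) / (G F F^T), element-wise
        Ft = transpose(F)
        num, den = matmul(V, Ft), matmul(G, matmul(F, Ft))
        G = [[G[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(n)]
    return G, F


def frobenius_error(V, G, F):
    GF = matmul(G, F)
    return sum((v - a) ** 2
               for rv, ra in zip(V, GF) for v, a in zip(rv, ra)) ** 0.5


# The rank-1 example from the slide.
V = [[3.0, 4.0], [6.0, 8.0]]
G, F = nmf(V, r=1)
err = frobenius_error(V, G, F)
```

For this rank-1 matrix the error drops to numerical noise; the recovered factors need not equal (1, 2) and (3, 4) exactly, because any positive rescaling of G and inverse rescaling of F gives the same product.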


Selecting the right number of roles

§ Roles summarize the behavior or, alternatively, they compress the feature matrix V (lower dimension description)

§ What is the best model? The best model is the one that has fewer errors and requires less space

§ Idea: use the Minimum Description Length (MDL) paradigm to select the number of roles that results in the best compression
• L: description length
• M: # of bits to describe the model
• E: cost of describing the errors in $V - GF$
• Find r such that it minimizes $L = M + E$

§ Minimize $M + E$
• Assuming any value requires b bits (the role values), then the number of bits for M is $M = b \, r \, (n + f)$. Why? (Think about the dimensions of the matrices.)
• What about E? E is the amount of errors. However, since $V - GF$ is not normally distributed, RolX uses the KL divergence

$E = \sum_{i,j} \left( V_{i,j} \log \frac{V_{i,j}}{(GF)_{i,j}} - V_{i,j} + (GF)_{i,j} \right)$
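The two cost terms translate directly into code. The sketch below is a minimal illustration of $L = M + E$ for one given factorization; the default of b = 8 bits per value, the `eps` guard against division by zero, and the function names are assumptions, and zero entries of V are taken to contribute no log term.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def model_cost(n, f, r, b):
    """M = b * r * (n + f): G has n*r values, F has r*f values,
    and each value is assumed to take b bits."""
    return b * r * (n + f)


def kl_error(V, GF, eps=1e-12):
    """E = sum_ij ( V_ij * log(V_ij / GF_ij) - V_ij + GF_ij )."""
    total = 0.0
    for row_v, row_a in zip(V, GF):
        for v, a in zip(row_v, row_a):
            if v > 0:
                total += v * math.log(v / max(a, eps))
            total += a - v
    return total


def description_length(V, G, F, b=8):
    """L = M + E for a factorization V ~ G F (b = 8 is an assumed default)."""
    n, f, r = len(V), len(V[0]), len(F)
    return model_cost(n, f, r, b) + kl_error(V, matmul(G, F))


# Exact rank-1 factorization: the error term vanishes, only M remains.
V = [[3.0, 4.0], [6.0, 8.0]]
G = [[1.0], [2.0]]
F = [[3.0, 4.0]]
L = description_length(V, G, F)
```

To pick the number of roles, one would compute a factorization for each candidate r and keep the r with the smallest description length; larger r lowers E but raises M linearly.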