graph mining and social networks

Upload: daniele-rossetti

Post on 05-Apr-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 Graph Mining and Social Networks

    1/23

    Graph Mining and Social Network

    AnalysisData Mining and Text Mining (UIC 583 @ Politecnico di Milano)

  • 7/31/2019 Graph Mining and Social Networks

    2/23

    Daniele Loiacono

    q Jiawei Han and Micheline Kamber, "DataMining: Concepts and Techniques", TheMorgan Kaufmann Series in DataManagement Systems (Second Edition)

    " Chapter 9

  • 7/31/2019 Graph Mining and Social Networks

    3/23

    Graph Mining

  • 7/31/2019 Graph Mining and Social Networks

    4/23

    Daniele Loiacono

    q Graphs are becoming increasingly important to model manyphenomena in a large class of domains (e.g., bioinformatics,computer vision, social analysis)

    q To deal with these needs, many data mining approacheshave been extended also to graphs and trees

    q Major approaches" Mining frequent subgraphs" Indexing" Similarity search" Classification" Clustering

  • 7/31/2019 Graph Mining and Social Networks

    5/23

    Daniele Loiacono

    q Given a labeled graph data setD = {G1,G2,,Gn}

    q We define as theq A subgraph in D is a subgraph with a support

    greater than

    q How to find frequent subgraph?" Apriori-based approach" Pattern-growth approach

  • 7/31/2019 Graph Mining and Social Networks

    6/23

    Daniele Loiacono

    q Apply a level-wise iterative algorithm1. Choose two subgraphs in S two similar subgraphs in a subgraph3. If the new subgraph is add to S4. Restart from 2. until all similar subgraphs have been

    considered. Otherwise restart from 1. and move to k+1.

    q What is subgraph size?" Number of vertex" Number of edges" Number of edge-disjoint paths

    q Two subgraphs of size-k are similar if they have the samesize-(k-1) subgraph

    q AprioriGraph has a big computational cost (due to themerging step)

  • 7/31/2019 Graph Mining and Social Networks

    7/23

  • 7/31/2019 Graph Mining and Social Networks

    8/23

    Daniele Loiacono

    q Closed subgraphs" G is iff there is no proper supergraph G with thesame support of G" Reduce the growth of subgraphs discovered" Is a more compact representation of knowledge

    q Unlabeled (or partially labeled) graphs" Introduce a special label " can match any label or only itself

    q Constrained subgraphs" Containment constraint (edges, vertex, subgraphs)" Geometric constraint" Value constraint

  • 7/31/2019 Graph Mining and Social Networks

    9/23

    Daniele Loiacono

    q Indexing is basilar for effective search and query processingq How to index graphs?q approach takes the as indexing unit

    " All the path up to maxLlength are indexed" Does not scale very well

    q approach takes and as indexing unit" A subgraph is frequent if it has a support greater than a

    threshold

    " A subgraph is discriminative if its support cannot be wellapproximated by the intersection of the graph sets thatcontain one of its subgraphs

  • 7/31/2019 Graph Mining and Social Networks

    10/23

    Daniele Loiacono

    q Mining of frequent subgraphs can be effectively used forclassification and clustering purposes

    q Classification" and subgraphs are used as toperform the classification task" A subgraph is discriminative if it is frequent only in one class ofgraphs and infrequent in the others

    " The threshold on frequency and discriminativeness should betuned to obtain the desired classification results

    q Clustering" The mined frequent subgraphs are used to define between graphs

    " Two graphs that should beconsidered and grouped in the same cluster

    " The threshold on frequency can be tuned to find the desirednumber of clusters

    q As the mining step affects heavily the final outcome, this is anintertwined process rather tan a two-steps process

  • 7/31/2019 Graph Mining and Social Networks

    11/23

    Social Network Analysis

  • 7/31/2019 Graph Mining and Social Networks

    12/23

  • 7/31/2019 Graph Mining and Social Networks

    13/23

    Daniele Loiacono

    q Are social networks random graphs?q

    NO!

  • 7/31/2019 Graph Mining and Social Networks

    14/23

    Daniele Loiacono

    Peter

    Jane

    SarahRalph

  • 7/31/2019 Graph Mining and Social Networks

    15/23

    Daniele Loiacono

    q Definitions" Nodes is the number of incident edges" Network is the max distance within 90% of

    the network

    q Properties"

    ""

    n: number of nodes

    e: number of eges1

  • 7/31/2019 Graph Mining and Social Networks

    16/23

    Daniele Loiacono

    q Several tasks can be identified in the analysis ofsocial networks

    q Link based object classification" Classification of objects on the basis of its attributes, its

    links and attributes of objects linked to it

    " E.g., predict topic of a paper on the basis of Keywords occurrence

    q Link type prediction" Prediction of link type on the basis of objects attributes" E.g., predict if a link between two Web pages is an

    advertising link or not

    q Predicting link existence" Predict the presence of a link between two objects

  • 7/31/2019 Graph Mining and Social Networks

    17/23

    Daniele Loiacono

    q Link cardinality estimation" Prediction of the number of links to an object" Prediction of the number of objects reachable from a

    specific object

    q Object reconciliation" Discover if two objects are the same on the basis of their

    attributes and links" E.g., predict if two websites are mirrors of each otherq Group detection

    " Clustering of objects on the basis both of their attributesand their links

    q Subgraph detection" Discover characteristic subgraphs within network

  • 7/31/2019 Graph Mining and Social Networks

    18/23

    Daniele Loiacono

    q Feature construction"Not only the objects attributes need to be considered butalso attributes of" and techniques must beapplied to reduce the size of search space

    q Collective classification and consolidation" Unlabeled data cannot be classified independently" New objects can be and need to be consideredto consolidate the current model

    q Link prediction" The prior probability of link between two objects may bevery low

    q Community mining from multirelational networks" Many approaches assume an

    while social networks usually representand

  • 7/31/2019 Graph Mining and Social Networks

    19/23

    Daniele Loiacono

    q Link Predictionq Viral Marketing

  • 7/31/2019 Graph Mining and Social Networks

    20/23

    Daniele Loiacono

    q What edges will be added to the network?q

    Given a snapshot of a network at time t, aimsto predict the edges that will be added before a given futuretime t

    q Link prediction is generally solved assigning to each pair ofnodes a weight

    q The higher the scorethe more likely that link will be added inthe near future

    q The score(X,Y)can be computed in several way" : the shortest he path between X and Y thehighest is their score" : the greater the number of neighborsX and Y have in common, the highest is their score

    " : weighted sum of paths thatconnects X and Y (shorter paths have usually largerweights)

  • 7/31/2019 Graph Mining and Social Networks

    21/23

    Daniele Loiacono

    q Several marketing approaches" Mass marketing is targeted on specific segment ofcustomers" Direct marketing is target on specific customers solely on

    the basis of their characteristics

    " tries to exploit the tomaximize the output of marketing actions

    q Each customer has a specific based on" The number of connections" Its role in the network (e.g., opinion leader, listener)" Role of its connections

    q Viral marketing aims to exploit the network value ofcustomers to predict their influence and to maximize theoutcome of marketing actions

  • 7/31/2019 Graph Mining and Social Networks

    22/23

    Daniele Loiacono

    q 500 randomly chosen customers are given a product (from 5000).

  • 7/31/2019 Graph Mining and Social Networks

    23/23

    Daniele Loiacono

    q The 500 most connected consumers are given a product.