[ieee 2010 international conference on advances in social networks analysis and mining (asonam 2010)...



Page 1: [IEEE 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2010) - Odense, Denmark (2010.08.9-2010.08.11)] 2010 International Conference on Advances

Mining Potential Partnership through Opportunity Discovery in Research Networks

Alessandro Cucchiarelli
Polytechnic University of Marche, Italy
email: [email protected]

Fulvio D’Antonio
Polytechnic University of Marche, Italy
email: [email protected]

Abstract—The paper introduces a formalisation of opportunities, as situations that can be exploited obtaining valuable outcomes, in the context of the social networks, and defines a methodology for discovering opportunities through the analysis of the relation among network actors. The proposed methodology is then applied to the research-oriented networks, whose members share paper coauthorship or potential research interests. Finally, its validity is tested by evaluating the research collaborations opportunities exploited in the context of two distinct research communities, modelled through the analysis of their publications over time.

I. INTRODUCTION

Social Networks [1] are models of communities whose actors are linked together by some kind of social relationship (e.g. friendship, marriage, business collaboration). The analysis of the relations’ web can lead to the definition of opportunities, as situations that can be exploited obtaining valuable outcomes. For example, in [2] “weak ties” (where the strength of a tie is considered to be the combination of the amount of time, the emotional intensity and other factors) are seen as a very important source of opportunities because they are the bridges that join otherwise isolated communities, while in [3] structural holes, separating non-redundant sources of information, are considered a relevant source of broker opportunities because the sides of the hole are more additive than overlapping.

Opportunity types strictly depend on the nature of social relationships existing in a specific network: examples include finding new friends, meeting a soul-mate or establishing new business collaboration. Discovering them can involve creativity, experience, knowledge of the domain, social capital [4] and intuition. Moreover, it is a highly subjective activity: different people will discover different opportunities based mostly on their previous experiences and beliefs [5]. In this paper we propose a general model for discovering proficient collaboration opportunities in a social network, and apply it to research-oriented networks, whose nodes represent research organizations/industries producing research papers/documents individually or by joint collaboration.

Research collaboration is traditionally established on the basis of compatible or complementary research themes, but it also heavily depends on human and social aspects such as mutual trust, previous successful collaboration or political choices. We shall only focus our attention on the former aspect: the assumption is that potentially successful collaboration is based on the similarity of research interests. This ensures a common background knowledge between potential partners that facilitates mutual understanding, thus speeding up the joint-research production.

The paper introduces the opportunity networks, modelled as a directed attributed multi-graph, an opportunity exploiter, a set of opportunity patterns and opportunity ranking functions.

II. OPPORTUNITY NETWORKS

An opportunity network is a tuple $(G, E, O, R)$ where $G$ is a network modelling the relationships among actors in the domain of interest, represented as a directed attributed multi-graph, $E$ is an Exploiter, i.e. an actor interested in discovering opportunities, $O = \{Opp_1, \ldots, Opp_k\}$ is a set of Opportunity patterns expressed as graph transformations, and $R = \{Ranking_1, \ldots, Ranking_h\}$ is a set of Ranking functions, where every $Ranking_i : Matches(Opp_i) \to \mathbb{R}^+$ is a mapping from the set of possible matches of the pattern $Opp_i$ in $G$ to non-negative real numbers. Informally speaking, graph transformations can be modelled as a pair of graphs $(L, R)$, called left hand side $L$ and right hand side $R$. Applying the transformation $p = (L, R)$ means finding a match of the graph $L$ in the graph $G$ and substituting $L$ by $R$, obtaining the transformed graph $G'$. Technically the matching process is called embedding and it is carried out by constructing a graph morphism from $L$ to $G$. It is not the aim of this paper to detail a description of graph transformations concepts (see [6] and [7]).

The concept of opportunity here is modelled to reflect a potential transformation of the graph by the exploiter; if a pattern match is found in the network, the exploiter can try to act in the real world to carry out the transformation. Transformations induced in the network may include (in order of increasing complexity) modifying node/attributes, adding new edges, creating cliques, deleting subgroups, etc. The exploiter is the actor searching for opportunities in order to obtain valuable outcomes from them. No special constraint is given about the nature of the exploiter: it may be an internal exploiter, i.e. a node of the graph or a sub-graph (a single enterprise or a consortium), or an external exploiter interested but not directly involved in the phenomena occurring in the network (e.g. the case of public administrations “observing” and stimulating the growth of industrial networks). For a detailed definition of Exploiters, Opportunities and Ranking functions see [8].

2010 International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4138-9/10 $26.00 © 2010 IEEE

DOI 10.1109/ASONAM.2010.71
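As an illustration of the definitions in this section, the following minimal Python sketch encodes one opportunity pattern as a graph transformation together with a ranking function. It is not the formalisation given in [8]: the class `Network`, the edge-type labels `"sim"` and `"coauth"`, and all function names are hypothetical, and edges are treated as symmetric for brevity even though the model uses a directed multi-graph.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Network:
    """Attributed multi-graph: edges[kind][(a, b)] = weight, one dict per edge type."""
    nodes: set
    edges: dict = field(default_factory=dict)

    def weight(self, kind, a, b):
        # Simplification (assumption): edges are looked up in both directions.
        typed = self.edges.get(kind, {})
        return typed.get((a, b), typed.get((b, a), 0.0))


def match_missing_collaboration(g):
    """Left-hand side L: two nodes joined by a "sim" edge but by no "coauth" edge."""
    return [(a, b) for a, b in combinations(sorted(g.nodes), 2)
            if g.weight("sim", a, b) > 0 and g.weight("coauth", a, b) == 0]


def apply_collaboration(g, match):
    """Right-hand side R: the same two nodes with a "coauth" edge added."""
    a, b = match
    g.edges.setdefault("coauth", {})[(a, b)] = 1.0


def rank_by_similarity(g, match):
    """Ranking function: maps a match to a non-negative real number."""
    return g.weight("sim", *match)


def discover(g, exploiter, matcher, ranking):
    """Matches involving an internal exploiter, best-ranked first."""
    matches = [m for m in matcher(g) if exploiter in m]
    return sorted(matches, key=lambda m: ranking(g, m), reverse=True)


# Toy network with invented weights.
g = Network(
    nodes={"A", "B", "C"},
    edges={
        "sim": {("A", "B"): 0.8, ("A", "C"): 0.3, ("B", "C"): 0.5},
        "coauth": {("A", "C"): 2},
    },
)
opportunities = discover(g, "A", match_missing_collaboration, rank_by_similarity)
```

In this toy network the only opportunity for exploiter `A` is the pair `("A", "B")`: the two nodes are similar but have never co-authored. Calling `apply_collaboration` on that match performs the transformation, after which the pattern no longer matches for `A`.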

III. OPPORTUNITIES IN RESEARCH-ORIENTED NETWORKS

Research-oriented networks are a typical application domain of the opportunity network model. In these networks, an opportunity is, for example, a potential collaboration between groups that share similar interests, but do not co-operate in the research activity.

We shall apply the opportunity discovery model to two real-world cases: the EU-funded INTEROP NoE, a research network on enterprise interoperability, and the research community behind the Medical Image Understanding and Analysis conference (MIUA). We shall model the networks by using:

• the partner publications on the INTEROP-Vlab web site (approximately 1500 papers);

• a collection of the MIUA conference proceedings over the three-year period 2006-2008 (about 150 papers).

The two data-sets address different domains (interoperability vs. medical imaging) and they differ significantly in size. The documents in the INTEROP and MIUA data-sets are annotated with vectors of domain-relevant terms automatically extracted using the TermExtractor tool [9], and then expanded using, respectively, the INTEROP ontology¹ and an ontology obtained by processing the MESH² (MEdical Subject Headings) vocabulary.
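The expansion step can be sketched as follows. This is only an illustration under stated assumptions: the ontology is reduced to a single-parent hierarchy stored in a hypothetical `parent_of` dictionary, each super-concept simply inherits the weight of the term that triggered it (the paper does not specify the weighting), and the toy concepts below are invented rather than taken from the INTEROP ontology or MeSH.

```python
def super_concepts(term, parent_of):
    """Collect the ancestors of `term` by following parent links upward."""
    ancestors = []
    seen = {term}
    current = parent_of.get(term)
    while current is not None and current not in seen:  # guard against cycles
        ancestors.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return ancestors


def expand(vector, parent_of):
    """Add to a {term: weight} vector the super-concepts of every term.

    Assumption: an ancestor accumulates the weights of the terms below it.
    """
    expanded = dict(vector)
    for term, weight in vector.items():
        for concept in super_concepts(term, parent_of):
            expanded[concept] = expanded.get(concept, 0.0) + weight
    return expanded


# Invented toy hierarchy: child -> parent.
parent_of = {
    "segmentation": "image analysis",
    "registration": "image analysis",
    "image analysis": "imaging",
}
expanded = expand({"segmentation": 2.0, "registration": 1.0}, parent_of)
```

Here the two specific terms both contribute to the shared ancestors, so `"image analysis"` and `"imaging"` each receive weight 3.0. This is the mechanism by which documents that use different specific terms can still overlap on more general concepts.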

In order to discover joint-research opportunities we create a graph with a single node type representing research units and two types of attributed edges: similarity links and co-authorship links. The creation of the similarity edges involves a number of complex steps:

• given a set $RU$ of Research Units and a corpus $D$ of Documents produced by these units, we annotate such documents with terminological (weighted) vectors of automatically extracted terms;

• if an ontology $O$ is available, describing the research domain in which $RU$ operate (e.g. computer science, medical, cultural heritage domain), the vectors are expanded by adding to each term $t_i$ its super-concepts in $O$, i.e. $Super_O(t_i)$. These vectors are supposed to describe in a compact form the content of the document (indeed the vector represents a sort of summary). Now we have, for each unit $r$ in $RU$, a set of vectors representing a summary of the documents produced by $r$. We call this set of documents $r(D)$;

• we compute for each $r \in RU$ the centroid $r_{centroid}$ (i.e. the mean vector) of $r(D)$;

• we create a network $G = (RU, E)$ where the nodes are the research units and the edges belong to $E = E_{sim} \cup E_{coauth}$ where:

– $E_{sim}$ are edges connecting each pair $a, b$ in $RU$ with weight $w_{sim}(a_{centroid}, b_{centroid}) \neq 0$. In our experiment we have used the cosine similarity [10] as the weight function but, obviously, other choices are possible. Such edges and the associated weights represent an estimation of the similarity of the research units based on the centroids of the documents produced;

– $E_{coauth}$ edges are weighted with the number of jointly produced documents of each pair $(a, b)$ of units in $RU$.

Fig. 1. The INTEROP similarity/coauthorship network

¹ http://interop-vlab.eu/km
² http://www.nlm.nih.gov/mesh/

Figure 1 shows a detail of the similarity/coauthorship network of INTEROP, obtained by deleting edges whose weight is below the mean of all edges’ weights. Curved dashed lines are used for similarity edges, bent lines for coauthorship edges.
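The construction of the network can be sketched in a few lines of Python. The sketch assumes that each document is already available as a pair (set of authoring research units, sparse term vector); the input layout and all function names are hypothetical, and only the use of centroids, cosine similarity, joint-document counts and mean-weight pruning comes from the description above.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Mean of sparse term vectors, each given as a {term: weight} dict."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_network(docs):
    """Return (similarity edges, coauthorship edges) as {(a, b): weight} dicts.

    docs: list of (set of research units, term vector) pairs.
    """
    by_unit, coauth = {}, {}
    for units, vec in docs:
        for unit in units:
            by_unit.setdefault(unit, []).append(vec)
        # Every pair of units on the same document gains one joint document.
        for a, b in combinations(sorted(units), 2):
            coauth[(a, b)] = coauth.get((a, b), 0) + 1
    centroids = {unit: centroid(vecs) for unit, vecs in by_unit.items()}
    sim = {}
    for a, b in combinations(sorted(centroids), 2):
        w = cosine(centroids[a], centroids[b])
        if w != 0:  # only non-zero similarities become edges
            sim[(a, b)] = w
    return sim, coauth


def prune_below_mean(edges):
    """Drop edges whose weight is below the mean weight, as done for Fig. 1."""
    if not edges:
        return {}
    mean = sum(edges.values()) / len(edges)
    return {pair: w for pair, w in edges.items() if w >= mean}


# Invented toy corpus: three units, three documents.
docs = [
    ({"A", "B"}, {"x": 1.0}),
    ({"A"}, {"y": 1.0}),
    ({"C"}, {"y": 2.0}),
]
sim, coauth = build_network(docs)
```

In the toy corpus, unit `A` is similar to both `B` and `C`, while `B` and `C` share no terms and therefore get no similarity edge. The pair `("A", "C")` has a similarity edge but no coauthorship edge, which is exactly the configuration treated as a collaboration opportunity.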

IV. EVALUATION

The opportunity network model applied to research networks presented in this paper relies on several steps and assumptions that are not easy to evaluate; in order to estimate the quality of the proposed model, we performed the following two experiments.

A. Experiment 1: evaluation of the automatic research interests extraction

To estimate the plausibility of the model in the INTEROP experiment, we compared it with an available “partial” ground truth, represented by the explicit selection of a set of concepts by the Vlab members; members were asked to access the Vlab platform and select a subset of concepts from the INTEROP ontology, in order to express their research interests. Therefore “objective” ground-truth vectors can then be derived from this information, and compared with the vectors extracted by our algorithm. The reliability of the experiment is partly limited by the fact that members accomplished the task of modelling their interests with variable dedication: some selected 10-12 concepts or more, others just one.

We have also carried out a similar experiment for the MIUA data-set. We selected 42 (out of 373) authors of MIUA (those who had produced at least 2 papers in the 2006-2008 period) and automatically extracted their research interests from their Web curricula, as described in [11]. In table I we present the values of mean precision and recall of the extracted vectors with respect to those manually selected by partners (in the case of INTEROP) or extracted from authors’ curricula (MIUA).

TABLE I
MEAN PRECISION AND RECALL OF EXTRACTED VS. DECLARED VECTORS

                    Mean Precision    Mean Recall
INTEROP                  0.50            0.49
INTEROP Expanded         0.17            0.76
MIUA                     0.03            0.39
MIUA Expanded            0.06            0.49

We can comment on the results in this table in various ways: first, we notice comparatively good recall values and poor precision values. This can be justified by the fact that the vectors extracted from the papers are generally richer and more specific than the vectors declared manually by partners/authors. If a researcher wants to summarise her/his research interests, she/he will try to use concepts of intermediate generality and will generally expose only a subset of them while, when writing papers, she/he will mainly use very specific ones. Therefore the low precision values do not necessarily indicate that most of the extracted values are “wrong”: they may be correct but too specific. On the other hand, good recall values indicate that most of the declared interests are covered by the extracted ones. In the table, we notice a worse performance on the MIUA data-set. It may be explained in various ways: i) in INTEROP, partners had to select their research interests using a fixed vocabulary (the INTEROP ontology) while in MIUA data there is no such constraint and this fact increases the terminological heterogeneity; ii) the online curricula we downloaded from the Web might not have been up-to-date and iii) the MIUA corpus (about 150 documents) is significantly smaller than the INTEROP one (about 1500 documents) and this fact influences the quality of extracted vectors.

In both cases (INTEROP and MIUA) the table shows that hierarchical expansion increases recall. For MIUA precision also improves (from 0.03 to 0.06), while for INTEROP it drops from 0.50 to 0.17.
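For reference, the measures reported in Table I can be computed as in the sketch below. It assumes a set-based reading in which each vector is reduced to the set of concepts it contains; the paper does not state whether term weights enter the comparison, so this is an assumption, and the function names and toy concept sets are hypothetical.

```python
def precision_recall(extracted, declared):
    """Set-based precision and recall of extracted concepts vs. declared ones."""
    extracted, declared = set(extracted), set(declared)
    hits = len(extracted & declared)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(declared) if declared else 0.0
    return precision, recall


def mean_precision_recall(pairs):
    """Average precision and recall over (extracted, declared) pairs."""
    scores = [precision_recall(e, d) for e, d in pairs]
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)


# Invented example: one member before and after hierarchical expansion.
declared = {"b", "c"}
plain = precision_recall({"a", "b"}, declared)
after_expansion = precision_recall({"a", "b", "c", "d"}, declared)
```

In this invented example, adding super-concepts covers a declared concept that was previously missed, so recall rises from 0.5 to 1.0. Precision does not improve because the expanded vector also gains concepts the member never declared. This is the same trade-off visible in the INTEROP rows of Table I.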

B. Experiment 2: evaluation of the predictive power of the model

The second experiment is aimed at evaluating the “predictive power” of the proposed model. Intuitively our model is able to predict collaborations if the opportunities discovered at a time point $t$ have been established at a subsequent time $t'$. We do not expect our model to be fully predictive. Our intent is to have a tool that can discover opportunities which might not be easily picked up by research units, thus acting as a real recommendation system. However, if the model exhibits a good behaviour in predicting collaborations, this is a partial verification of the fact that the opportunities proposed are, in some sense, “valid”.

We have partitioned the set of INTEROP documents into 4 subsets based on the year of publication: papers published before 2004, papers published in 2004, papers published in 2005 and papers published after 2005. We then computed the similarity/coauthorship networks for each subset: $G_{bef2004}$, $G_{2004}$, $G_{2005}$ and $G_{after2005}$. The coauthorship links that are in a network, but not in the temporally preceding networks, are the new collaborations that have been established. In tables II and III we show the percentage of collaborations (the opportunities) predicted, respectively, for the $G_{bef2004}$ and the $G_{2004}$ networks, that have been established by the end of the project.

TABLE II
PREDICTIVE POWER BASED ON OPPORTUNITIES IN $G_{bef2004}$

               Exploited Opportunities
2004                    20%
2005                    33%
after 2005              57%

TABLE III
PREDICTIVE POWER BASED ON OPPORTUNITIES IN $G_{2004}$

               Exploited Opportunities
2005                    54%
after 2005              75%

The model seems to have a good predictive ability. In particular, the collaborations predicted in the network $G_{2004}$ have a higher rate of exploitation than the ones in the network $G_{bef2004}$. This can be explained by the fact that $G_{bef2004}$ comes from papers published before the start of the project and so could fail to represent the up-to-date research interests of partners, while $G_{2004}$ contains the 2004 publications, which better define the research themes currently addressed by them.
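The exploitation rate used in this experiment can be sketched as follows. The sketch assumes that the percentages are cumulative, i.e. that an opportunity counts as exploited at a given period if the corresponding coauthorship link appears in that period or in any earlier one after the prediction; this reading is inferred from the growing values in the tables and is not stated explicitly. Function names and toy data are hypothetical.

```python
def exploited_fraction(opportunities, later_coauthorships):
    """Share of predicted pairs that show up as coauthorship links later on."""
    predicted = {frozenset(pair) for pair in opportunities}
    established = {frozenset(pair) for pair in later_coauthorships}
    if not predicted:
        return 0.0
    return len(predicted & established) / len(predicted)


def cumulative_exploitation(opportunities, snapshots):
    """Exploitation rate after each period.

    snapshots: chronologically ordered list of (label, new coauthorship pairs).
    """
    seen = set()
    rates = {}
    for label, pairs in snapshots:
        seen |= {frozenset(pair) for pair in pairs}  # unordered pairs
        rates[label] = exploited_fraction(opportunities, seen)
    return rates


# Invented toy data: four predicted pairs, two of which are later established.
predicted = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
snapshots = [
    ("2004", [("B", "A")]),
    ("2005", [("C", "D"), ("E", "F")]),
    ("after 2005", []),
]
rates = cumulative_exploitation(predicted, snapshots)
```

With the toy data one predicted pair is established in 2004 and a second one in 2005, giving rates of 0.25, 0.5 and 0.5 for the three periods. Pairs are compared as unordered sets, so `("B", "A")` counts as the predicted `("A", "B")`.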

REFERENCES

[1] S. Wasserman, K. Faust, and D. Iacobucci, Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press, 1994.

[2] M. Granovetter, “The Strength of Weak Ties,” The American Journal of Sociology, vol. 78, no. 6, pp. 1360–1380, 1973.

[3] R. Burt, “The network structure of social capital,” Research in Organizational Behavior, vol. 22, 2000.

[4] W. Baker, Achieving success through social capital. Tapping the hidden resources in your personal and business networks. San Francisco: Jossey-Bass, 2000.

[5] S. Shane, “Prior knowledge and the discovery of entrepreneurial opportunities,” Organization Science, vol. 11, no. 4, pp. 448–469, 2000.

[6] R. Heckel, “Graph transformation in a nutshell,” in Electronic Notes in Theoretical Computer Science. Elsevier, 2006, pp. 187–198.

[7] G. Rozenberg, Ed., Handbook of graph grammars and computing by graph transformation: volume I. foundations. River Edge, NJ, USA: World Scientific Publishing Co., Inc., 1997.

[8] A. Cucchiarelli and F. D’Antonio, “Opportunity discovery through network analysis,” in Enterprise Interoperability IV: Proc. of I-ESA’10, Interoperability for Enterprise Software and Applications, Coventry, UK, April 2010, pp. 323–330.

[9] F. Sclano and P. Velardi, “Termextractor: a web application to learn the shared terminology of emergent web communities,” in Proc. of I-ESA’07, Interoperability for Enterprise Software and Applications, Funchal, Portugal, 2007.

[10] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.

[11] P. Velardi, R. Navigli, A. Cucchiarelli, and F. D’Antonio, “A new content-based model for social network analysis,” in ICSC’08: Proc. of the 2008 IEEE International Conference on Semantic Computing. Washington, DC, USA: IEEE Computer Society, 2008, pp. 18–25.
