[ieee 2010 international conference on advances in social networks analysis and mining (asonam 2010)...
Mining Potential Partnership through Opportunity Discovery in Research Networks
Alessandro Cucchiarelli
Polytechnic University of Marche
Italy
email: [email protected]
Fulvio D’Antonio
Polytechnic University of Marche
Italy
email: [email protected]
Abstract: The paper introduces a formalisation of opportunities, as situations that can be exploited to obtain valuable outcomes, in the context of social networks, and defines a methodology for discovering opportunities through the analysis of the relations among network actors. The proposed methodology is then applied to research-oriented networks, whose members share paper coauthorship or potential research interests. Finally, its validity is tested by evaluating the research collaboration opportunities exploited in two distinct research communities, modelled through the analysis of their publications over time.
I. INTRODUCTION
Social Networks [1] are models of communities whose actors are linked together by some kind of social relationship (e.g. friendship, marriage, business collaboration). The analysis of the web of relations can lead to the definition of opportunities, as situations that can be exploited to obtain valuable outcomes. For example, in [2] “weak ties” (where the strength of a tie is considered to be the combination of the amount of time, the emotional intensity and other factors) are seen as a very important source of opportunities because they are the bridges that join otherwise isolated communities, while in [3] structural holes, separating non-redundant sources of information, are considered a relevant source of broker opportunities because the sides of the hole are more additive than overlapping.
Opportunity types strictly depend on the nature of the social relationships existing in a specific network: examples include finding new friends, meeting a soul-mate or establishing a new business collaboration. Discovering them can involve creativity, experience, knowledge of the domain, social capital [4] and intuition. Moreover, it is a highly subjective activity: different people will discover different opportunities based mostly on their previous experiences and beliefs [5]. In this paper we propose a general model for discovering fruitful collaboration opportunities in a social network, and apply it to research-oriented networks, whose nodes represent research organizations/industries producing research papers/documents individually or through joint collaboration.
Research collaboration is traditionally established on the basis of compatible or complementary research themes, but it also heavily depends on human and social aspects such as mutual trust, previous successful collaboration or political choices. We shall only focus our attention on the former aspect: the assumption is that potentially successful collaboration is based on the similarity of research interests. This ensures a common background knowledge between potential partners that facilitates mutual understanding, thus speeding up joint research production.
The paper introduces opportunity networks, each consisting of a network modelled as a directed attributed multi-graph, an opportunity exploiter, a set of opportunity patterns and a set of opportunity ranking functions.
II. OPPORTUNITY NETWORKS
An opportunity network is a tuple $(G, E, O, R)$ where $G$ is a network modelling the relationships among actors in the domain of interest, represented as a directed attributed multi-graph, $E$ is an Exploiter, i.e. an actor interested in discovering opportunities, $O = \{Opp_1, \ldots, Opp_k\}$ is a set of Opportunity patterns expressed as graph transformations, and $R = \{Ranking_1, \ldots, Ranking_h\}$ is a set of Ranking functions, where every $Ranking_i : Matches(Opp_i) \to \mathbb{R}^+$ is a mapping from the set of possible matches of the pattern $Opp_i$ in $G$ to non-negative real numbers. Informally speaking, a graph transformation can be modelled as a pair of graphs $(L, R)$, called the left-hand side $L$ and the right-hand side $R$ (within a transformation, $R$ denotes the right-hand graph and not the set of ranking functions). Applying the transformation $p = (L, R)$ means finding a match of the graph $L$ in the graph $G$ and substituting $L$ by $R$, obtaining the transformed graph $G'$. Technically, the matching process is called embedding and it is carried out by constructing a graph morphism from $L$ to $G$. It is not the aim of this paper to give a detailed description of graph transformation concepts (see [6] and [7]).
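As an illustration of the match-and-substitute step, the following minimal Python sketch applies a transformation $p = (L, R)$ to a small labelled graph. It is not the formalism of [6] and [7]: the graph is reduced to a set of (source, target, label) triples without attributes, matching is a brute-force search over injective node assignments, the right-hand side may only reuse variables of the left-hand side, and the names find_matches, instantiate and apply_rule, as well as the "knows" example, are hypothetical.

```python
from itertools import permutations

def find_matches(lhs, graph):
    """Yield each injective mapping of pattern variables to graph nodes
    under which every edge of the left-hand side L occurs in the graph."""
    variables = sorted({v for src, dst, _ in lhs for v in (src, dst)})
    nodes = sorted({n for src, dst, _ in graph for n in (src, dst)})
    for image in permutations(nodes, len(variables)):
        match = dict(zip(variables, image))
        if all((match[s], match[d], label) in graph for s, d, label in lhs):
            yield match

def instantiate(edges, match):
    """Rewrite pattern edges over variables into edges over graph nodes."""
    return {(match[s], match[d], label) for s, d, label in edges}

def apply_rule(lhs, rhs, graph, match):
    """Substitute the matched copy of L by R, giving the transformed graph G'."""
    return (graph - instantiate(lhs, match)) | instantiate(rhs, match)

# Hypothetical pattern: if x knows y and y knows z, add an edge from x to z.
L = [("x", "y", "knows"), ("y", "z", "knows")]
R = L + [("x", "z", "knows")]
G = {("a", "b", "knows"), ("b", "c", "knows")}

matches = list(find_matches(L, G))
G_prime = apply_rule(L, R, G, matches[0])
```

In the terms of the model, each element of matches is one candidate opportunity, and a ranking function would assign it a non-negative score before the exploiter decides whether to act on it.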
The concept of opportunity is modelled here to reflect a potential transformation of the graph by the exploiter; if a pattern match is found in the network, the exploiter can try to act in the real world to carry out the transformation. Transformations induced in the network may include (in order of increasing complexity) modifying node/attributes, adding new edges, creating cliques, deleting subgroups, etc. The exploiter is the actor searching for opportunities in order to obtain valuable outcomes from them. No special constraint is given about the nature of the exploiter: it may be an internal exploiter, i.e. a node of the graph or a sub-graph (a single enterprise or a consortium), or an external exploiter interested but not directly involved in the phenomena occurring in the network (e.g. the case of public administrations “observing” and stimulating the growth of industrial networks). For a detailed definition of Exploiters, Opportunities and Ranking functions see [8].
2010 International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4138-9/10 $26.00 © 2010 IEEE
DOI 10.1109/ASONAM.2010.71
III. OPPORTUNITIES IN RESEARCH-ORIENTED NETWORKS
Research-oriented networks are a typical application domain of the opportunity network model. In these networks, an opportunity is, for example, a potential collaboration between groups that share similar interests but do not co-operate in their research activity.
We shall apply the opportunity discovery model to two real world cases: the EU funded INTEROP NoE, a research network on enterprise interoperability, and the research community behind the Medical Image Understanding and Analysis conference (MIUA). We shall model the networks by using:
• the partner publications on the INTEROP-Vlab web site (approximately 1500 papers);
• a collection of the MIUA conference proceedings over the three-year period 2006-2008 (about 150 papers).
The two data-sets address different domains (interoperability vs. medical imaging) and they differ significantly in size. The documents in the INTEROP and MIUA data-sets are annotated with vectors of domain relevant terms automatically extracted using the TermExtractor tool [9], and then expanded using, respectively, the INTEROP ontology¹ and an ontology obtained by processing the MESH² (MEdical Subject Headings) vocabulary.
In order to discover joint-research opportunities we create a graph with a single node type representing research units and two types of attributed edges: similarity links and co-authorship links. The creation of the similarity edges involves a number of complex steps:
• given a set $RU$ of Research Units and a corpus $D$ of Documents produced by these units, we annotate such documents with terminological (weighted) vectors of automatically extracted terms;
• if an ontology $O$ is available, describing the research domain in which $RU$ operate (e.g. computer science, medical, cultural heritage domain), the vectors are expanded by adding to each term $t_i$ its super-concepts in $O$, i.e. $Super_O(t_i)$. These vectors are supposed to describe in a compact form the content of the document (indeed the vector represents a sort of summary). Now we have, for each unit $r$ in $RU$, a set of vectors representing a summary of the documents produced by $r$. We call this set $r(D)$;
• we compute for each $r \in RU$ the centroid $r_{centroid}$ (i.e. the mean vector) of $r(D)$;
• we create a network $G = (RU, E)$ where the nodes are the research units and the edges belong to $E = E_{sim} \cup E_{coauth}$ where:
  – $E_{sim}$ are edges connecting each pair $a, b$ in $RU$ with weight $w_{sim}(a_{centroid}, b_{centroid}) \neq 0$. In our experiment we have used the cosine similarity [10] as the weight function but, obviously, other choices are possible. Such edges and the associated weights represent an estimate of the similarity of the research units based on the centroids of the documents produced;
  – $E_{coauth}$ edges are weighted with the number of jointly produced documents of each pair $(a, b)$ of units in $RU$.
Fig. 1. The INTEROP similarity/coauthorship network
¹ http://interop-vlab.eu/km
² http://www.nlm.nih.gov/mesh/
Figure 1 shows a detail of the similarity/coauthorship network of INTEROP (obtained by deleting edges whose weight is below the mean of all edges’ weights). Curved dashed lines are used for similarity edges, bent lines for coauthorship edges.
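The construction of the similarity/coauthorship network can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the implementation used in the experiments: term vectors are plain dictionaries from term to weight, a super-concept inherits the weight of the term that introduces it (the weighting of expanded terms is not specified here), edges are stored as unordered unit pairs, the mean-weight filter is applied to one edge set at a time, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def expand(vector, super_concepts):
    """Add to each term its super-concepts (assumption: same weight as the term)."""
    expanded = dict(vector)
    for term, weight in vector.items():
        for concept in super_concepts.get(term, ()):
            expanded[concept] = expanded.get(concept, 0.0) + weight
    return expanded

def centroid(vectors):
    """Mean vector of a non-empty list of sparse term vectors."""
    total = defaultdict(float)
    for vector in vectors:
        for term, weight in vector.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_network(documents):
    """documents: iterable of (set of research units, term vector).
    Returns the weighted edge sets (E_sim, E_coauth) keyed by sorted unit pair."""
    vectors_by_unit = defaultdict(list)
    e_coauth = Counter()
    for units, vector in documents:
        for unit in units:
            vectors_by_unit[unit].append(vector)
        for pair in combinations(sorted(units), 2):
            e_coauth[pair] += 1  # one more jointly produced document
    centroids = {unit: centroid(vs) for unit, vs in vectors_by_unit.items()}
    e_sim = {}
    for a, b in combinations(sorted(centroids), 2):
        weight = cosine(centroids[a], centroids[b])
        if weight != 0:
            e_sim[(a, b)] = weight
    return e_sim, dict(e_coauth)

def prune_below_mean(edges):
    """Keep only the edges whose weight is at least the mean edge weight."""
    if not edges:
        return {}
    mean = sum(edges.values()) / len(edges)
    return {pair: weight for pair, weight in edges.items() if weight >= mean}
```

For example, build_network([({"A", "B"}, {"ontology": 1.0}), ({"C"}, {"image": 1.0})]) yields a single similarity edge and a single coauthorship edge, both between A and B, since the centroid of C shares no term with the others.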
IV. EVALUATION
The opportunity network model applied to research networks presented in this paper relies on several steps and assumptions that are not easy to evaluate; in order to estimate the quality of the proposed model, we performed the following two experiments.
A. Experiment 1: evaluation of the automatic research interests extraction
To estimate the plausibility of the model in the INTEROP experiment, we compared it with an available “partial” ground truth, represented by the explicit selection of a set of concepts by the Vlab members; members were asked to access the Vlab platform and select a subset of concepts from the INTEROP ontology, in order to express their research interests. “Objective” ground-truth vectors can therefore be derived from this information, and compared with the vectors extracted by our algorithm. The reliability of the experiment is partly limited by the fact that members accomplished the task of modelling their interests with variable dedication: some selected 10-12 concepts or more, others just one.
We have also carried out a similar experiment for the MIUA data-set. We selected 42 (out of 373) authors of MIUA (those who had produced at least 2 papers in the 2006-2008 period) and automatically extracted their research interests from their Web curricula, as described in [11]. In Table I we present the values of mean precision and recall of the extracted vectors with respect to those manually selected by partners (in the case of INTEROP) or extracted from authors’ curricula (MIUA).
TABLE I
MEAN PRECISION AND RECALL OF EXTRACTED VS. DECLARED VECTORS
Data-set            Mean Precision   Mean Recall
INTEROP             0.50             0.49
INTEROP Expanded    0.17             0.76
MIUA                0.03             0.39
MIUA Expanded       0.06             0.49
We can comment on the results in this table in various ways: first, we notice good recall values and bad precision values. This can be justified by the fact that the vectors extracted from the papers are generally richer and more specific than the vectors declared manually by partners/authors. If a researcher wants to summarise her/his research interests, she/he will try to use concepts of intermediate generality and will generally expose only a subset of them while, when writing papers, she/he will mainly use very specific ones. Therefore the bad precision values do not necessarily indicate that most of the extracted values are “wrong”: they may be correct but too specific. On the other hand, good recall values indicate that most of the declared interests are covered by the extracted ones. In the table, we notice a worse performance on the MIUA data-set. It may be explained in various ways: i) in INTEROP, partners had to select their research interests using a fixed vocabulary (the INTEROP ontology) while in the MIUA data there is no such constraint, which increases the terminological heterogeneity; ii) the online curricula we downloaded from the Web might not have been up-to-date; and iii) the MIUA corpus (about 150 documents) is significantly smaller than the INTEROP one (about 1500 documents), which influences the quality of the extracted vectors.
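In both cases (INTEROP and MIUA) the table shows that hierarchical expansion increases recall (from 0.49 to 0.76 for INTEROP and from 0.39 to 0.49 for MIUA). Precision rises slightly for MIUA (from 0.03 to 0.06) but drops for INTEROP (from 0.50 to 0.17).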
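The comparison between extracted and declared interests can be illustrated with a small Python sketch. It assumes that both vectors are reduced to sets of concepts and that precision and recall are computed per member and then averaged; the exact matching procedure used in the experiment is not detailed here, so the function names and the set-based reading are assumptions.

```python
def precision_recall(extracted, declared):
    """Set-based precision and recall of extracted vs. declared concepts."""
    extracted, declared = set(extracted), set(declared)
    hits = len(extracted & declared)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(declared) if declared else 0.0
    return precision, recall

def mean_precision_recall(pairs):
    """Average precision and recall over a non-empty list of
    (extracted, declared) pairs, one pair per member or author."""
    scores = [precision_recall(extracted, declared) for extracted, declared in pairs]
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r
```

Under this reading, a member who declares two concepts while four are extracted, one of them in common, scores a precision of 0.25 and a recall of 0.5. Adding super-concepts enlarges the extracted set, which can raise recall while lowering precision, as observed for INTEROP.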
B. Experiment 2: evaluation of the predictive power of the model
The second experiment is aimed at evaluating the “predictive power” of the proposed model. Intuitively, our model is able to predict collaborations if the opportunities discovered at a time point $t$ have been established at a subsequent time $t'$. We do not expect our model to be fully predictive. Our intent is to have a tool that can discover opportunities which might not be easily picked up by research units, thus acting as a real recommendation system. However, if the model exhibits a good behaviour in predicting collaborations, this is a partial verification of the fact that the opportunities proposed are, in some sense, “valid”.
We have partitioned the set of INTEROP documents into 4 subsets based on the year of publication: papers published before 2004, papers published in 2004, papers published in 2005 and papers published after 2005. We then computed the similarity/coauthorship networks for each subset: $G_{bef2004}$, $G_{2004}$, $G_{2005}$ and $G_{after2005}$. The coauthorship links that are in a network, but not in the networks that temporally precede it, are the new collaborations that have been established. In Tables II and III we show the percentage of collaborations (the opportunities) predicted, respectively, for the $G_{bef2004}$ and the $G_{2004}$ networks, that have been established by the end of the project.
TABLE II
PREDICTIVE POWER BASED ON OPPORTUNITIES IN $G_{bef2004}$
Period        Exploited Opportunities
2004          20%
2005          33%
after 2005    57%
TABLE III
PREDICTIVE POWER BASED ON OPPORTUNITIES IN $G_{2004}$
Period        Exploited Opportunities
2005          54%
after 2005    75%
The model seems to have a good predictive ability. In particular, the collaborations predicted in the network $G_{2004}$ have a higher rate of exploitation than the ones in the network $G_{bef2004}$. This can be explained by the fact that $G_{bef2004}$ comes from papers published before the start of the project and so could fail to represent the up-to-date research interests of partners, while $G_{2004}$ contains the 2004 publications, which help to better define the research themes currently addressed by them.
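The share of exploited opportunities can be computed along the following lines. The Python sketch below makes assumptions that go beyond the text: an opportunity at time $t$ is taken to be any pair of units linked by a similarity edge but by no coauthorship edge (the actual patterns and rankings are defined in [8]), the shares are read as cumulative over the later periods, and the function names and toy data are hypothetical.

```python
def opportunities(e_sim, e_coauth):
    """Pairs of units that are similar at time t but do not yet collaborate."""
    return {pair for pair in e_sim if pair not in e_coauth}

def cumulative_exploited(opps, later_coauth_networks):
    """For each later network, in chronological order, return the share of
    opportunities that have become coauthorship edges up to that point."""
    shares, established = [], set()
    for edges in later_coauth_networks:
        established |= set(edges)
        shares.append(len(opps & established) / len(opps) if opps else 0.0)
    return shares

# Toy data: three opportunities at time t, two of them exploited later.
e_sim_t = {("A", "B"): 0.9, ("A", "C"): 0.7, ("B", "C"): 0.6, ("C", "D"): 0.5}
e_coauth_t = {("A", "B"): 2}
later = [{("A", "C"): 1}, {("B", "C"): 1, ("A", "B"): 3}]

opps = opportunities(e_sim_t, e_coauth_t)
shares = cumulative_exploited(opps, later)
```

Here one of the three opportunities is exploited in the first later period and a second one in the following period, so the cumulative shares grow from one third to two thirds, mirroring the increasing percentages reported in the tables.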
REFERENCES
[1] S. Wasserman, K. Faust, and D. Iacobucci, Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press, 1994.
[2] M. Granovetter, “The Strength of Weak Ties,” The American Journal of Sociology, vol. 78, no. 6, pp. 1360–1380, 1973.
[3] R. Burt, “The network structure of social capital,” Research in Organizational Behavior, vol. 22, 2000.
[4] W. Baker, Achieving success through social capital. Tapping the hidden resources in your personal and business networks. San Francisco: Jossey-Bass, 2000.
[5] S. Shane, “Prior knowledge and the discovery of entrepreneurial opportunities,” Organization Science, vol. 11, no. 4, pp. 448–469, 2000.
[6] R. Heckel, “Graph transformation in a nutshell,” in Electronic Notes in Theoretical Computer Science. Elsevier, 2006, pp. 187–198.
[7] G. Rozenberg, Ed., Handbook of graph grammars and computing by graph transformation: volume I. foundations. River Edge, NJ, USA: World Scientific Publishing Co., Inc., 1997.
[8] A. Cucchiarelli and F. D’Antonio, “Opportunity discovery through network analysis,” in Enterprise Interoperability IV: Proc. of I-ESA’10, Interoperability for Enterprise Software and Applications, Coventry, UK, April 2010, pp. 323–330.
[9] F. Sclano and P. Velardi, “Termextractor: a web application to learn the shared terminology of emergent web communities,” in Proc. of I-ESA’07, Interoperability for Enterprise Software and Applications, Funchal, Portugal, 2007.
[10] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.
[11] P. Velardi, R. Navigli, A. Cucchiarelli, and F. D’Antonio, “A new content-based model for social network analysis,” in ICSC’08: Proc. of the 2008 IEEE International Conference on Semantic Computing. Washington, DC, USA: IEEE Computer Society, 2008, pp. 18–25.