IEEE 2008 9th International Conference for Young Computer Scientists (ICYCS), Hunan, China


Scientific Collaboration Network Evolution Model Based on Motif Emerging

Xiuchun Shi Archives Information Laboratory

Ludong University Yantai 264025, China

Longde Wu Information Technology Office

Highway Bureau of Developing Area Yantai 264000, China

Hongyong Yang* School of Computer Science and Technology

Ludong University Yantai 264025, China

Email: [email protected]

Abstract

Based on the evolution characteristics of author collaboration in scientific papers, a motif-emerging model of the scientific cooperation network is presented. Theoretical analysis and computer simulation show that the degree distribution of the network nodes follows a power law, so the network is scale-free. To check the validity of the network model, we collect the articles published in the Journal of Information Science and construct an author collaboration network. The results obtained from this demonstration network are consistent with those of the model.

Keywords: Motif emerging, scientific cooperation, demonstration, network model

1. Introduction

The study of complex networks has revealed the internal essence of many real-world phenomena. Erdos and Renyi presented the random network model (the ER model) [1] in 1960, and the ER random graph has since been used as a basic model for studying complex networks. In 1998, Watts and Strogatz published a paper on small-world networks in Nature [2], describing the feature that short paths exist between nodes even in a large network. Another seminal paper, on scale-free networks, was published by Barabasi and Albert in Science in 1999 [3]. The scale-free network (also called the BA model) is a network model inspired by the formation of the World Wide Web and is based on two basic ingredients: growth and preferential attachment. The degree distribution of the BA model has the form $p(k) \propto k^{-\gamma}$. Because a power-law distribution lacks a distinct characteristic scale, this class of networks is called scale-free. Many real complex networks have been shown to be small-world and scale-free networks, including transportation networks, phone call networks, the Internet, the WWW, actor collaboration networks, scientific coauthorship networks, citation networks, and so on [4,5,6].

These two original results provided new theory for understanding complex networks and set off an upsurge of complex network research in many scientific fields. Static statistical parameters have been analyzed for the Internet, the WWW, film and television actor collaboration networks, scientist collaboration networks, social relation networks, linguistic networks, and so on. Based on these network models, dynamical features have been studied for infectious disease spreading, percolation, network searching, and network navigation. In particular, many new results have recently been obtained on coauthorship networks of scientific articles. This research enriches the understanding of complex networks and of their new characteristics.

The collaboration network of scientific researchers describes the collaboration relations among researchers. In this network, each node denotes a researcher, and a link is created between two nodes if the two researchers have published an article together. In this paper, based on complex network theory, we establish an evolving network model to study the collaboration relations of scientific researchers.
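The construction just described (one node per researcher, one link per pair of coauthors) can be sketched in a few lines of Python. This is only an illustration: the function name `build_coauthor_network` and the toy author lists are hypothetical and are not taken from the paper's database.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_network(papers):
    """Return (nodes, links) for a list of papers, each a list of author names.

    nodes: set of all authors (single-author papers yield isolated nodes).
    links: dict mapping an unordered author pair to the number of joint papers.
    """
    nodes = set()
    links = defaultdict(int)
    for authors in papers:
        unique = sorted(set(authors))  # author order is not distinguished
        nodes.update(unique)
        # Every pair of coauthors of the same paper is linked.
        for a, b in combinations(unique, 2):
            links[(a, b)] += 1
    return nodes, dict(links)


# Hypothetical toy data for illustration only.
papers = [["Li", "Wang"], ["Li", "Wang", "Zhao"], ["Chen"]]
nodes, links = build_coauthor_network(papers)

# Degree of an author = number of distinct coauthors.
degree = {a: sum(a in pair for pair in links) for a in nodes}
```

Here "Chen" ends up as an isolated node of degree 0, while "Li" and "Wang" share a link that has been counted twice.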

The 9th International Conference for Young Computer Scientists

978-0-7695-3398-8/08 $25.00 © 2008 IEEE

DOI 10.1109/ICYCS.2008.31


2. Author Collaboration Network Evolution Model

Recently, we compiled statistics on the papers published in the Journal of Information Science from January 2001 to December 2006. A coauthorship database was created from the articles and their authors, and a scientific collaboration network was built from this database, where a node denotes an author and a link between two nodes denotes cooperation between two authors (they have published an article together). When analyzing the collaboration network, we found several different cases: many unattached authors are isolated points in the network; many attached authors collaborate repeatedly; and different articles have different sets of authors. Following this analysis, we present a new evolution model of the author scientific collaboration network:

Initially, suppose there are m0 articles and n0 authors in the network. At each time step t we add one article, and the author collaboration network evolves as follows:

1) With probability p1, one new author or several new collaborating authors are added;

2) With probability p2, one new author is added who collaborates with several existing authors;

3) With probability p3, no new author is added, and the authors of the article already exist in the network;

where p1+p2+p3=1.

Rule 1: In every case, the number of authors $i$ of an article is chosen with probability $q_i$, where

$$\sum_{i=1}^{K} q_i = 1.$$

The authors of the same article form a fully connected sub-network.

Rule 2: The link strength between two nodes is 1 when the nodes are linked for the first time, and it increases by 1 each time the two nodes are linked again.

Rule 3: An author is selected for collaboration by strength preference, with the selection probability

$$\Pi(i) = \frac{s_i + 1}{\sum_j (s_j + 1)} \tag{1}$$

where, in the numerator, $s_i$ is the strength of node $i$, i.e., the sum of the strengths of the links between node $i$ and its neighbors, and the denominator sums over all nodes in the network. Adding 1 to each strength ensures that an isolated node can also be selected.

3. Strength Distribution of Network Model

3.1 Strength distribution of network model

In this section, we calculate the strength distribution of the nodes in the new evolution model. Following the algorithm description, the strength $s_i$ of node $i$ changes as

$$\frac{ds_i}{dt} = p_1 \sum_{j=1}^{K} \bigl(q_{1j}\, j\, \Pi(i)\bigr) + p_2 \sum_{j=1}^{K} \bigl(q_{2j}\, j\, \Pi(i)\bigr) + p_3 \sum_{j=1}^{K} \bigl(q_{3j}\, j\, \Pi(i)\bigr). \tag{2}$$

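Suppose that the probability $q_j$ that an article has $j$ collaborating authors is the same in all three cases, i.e., $q_{1j} = q_{2j} = q_{3j} = q_j$ ($j = 1, 2, \ldots, K$). Then we can write

$$\sum_{j=1}^{K} j\,q_{1j} = \sum_{j=1}^{K} j\,q_{2j} = \sum_{j=1}^{K} j\,q_{3j} = m, \tag{3}$$

and Eq. (2) can be rewritten as

$$\frac{ds_i}{dt} = (p_1 + p_2 + p_3)\, m\, \Pi(i) = m\, \Pi(i). \tag{4}$$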

In the network, the number of authors at time $t$ is

$$N = n_0 + \Bigl(p_1 \sum_{j=1}^{K} j\,q_{1j} + p_2\Bigr) t = n_0 + (m p_1 + p_2)\, t, \tag{5}$$

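and the sum of the strengths is

$$\sum_{i=0}^{t} s_i = s_0 + \Bigl( p_1 \sum_{j=1}^{K} q_{1j}\,\frac{j(j-1)}{2} + p_2 \sum_{j=1}^{K} q_{2j}\,\frac{j(j+1)}{2} + p_3 \sum_{j=1}^{K} q_{3j}\,\frac{j(j-1)}{2} \Bigr) t.$$

Following Eq. (3), let

$$E := \sum_{j=1}^{K} q_{1j}\,\frac{j(j-1)}{2} = \sum_{j=1}^{K} q_{2j}\,\frac{j(j-1)}{2} = \sum_{j=1}^{K} q_{3j}\,\frac{j(j-1)}{2}. \tag{6}$$

Since $j(j+1)/2 = j(j-1)/2 + j$, the $p_2$ term contributes $p_2 (E + m)$, and with $p_1 + p_2 + p_3 = 1$ we obtain

$$\sum_{i=0}^{t} s_i = s_0 + (E + m p_2)\, t. \tag{7}$$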

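Substituting Eqs. (1), (5) and (7) into Eq. (4), we obtain

$$\frac{ds_i}{dt} = \frac{m\,(s_i + 1)}{N + s_0 + (E + m p_2)\, t}.$$

When time $t$ is large enough, we have

$$\frac{ds_i}{dt} \approx \frac{A_1 s_i}{t} + \frac{B_1}{t}, \tag{8}$$

where

$$A_1 = \frac{m}{(m p_1 + p_2) + (E + m p_2)}, \qquad B_1 = A_1. \tag{9}$$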

We can calculate the strength distribution using the method in [4,5]:

$$p(s) = \frac{1}{A_1}\Bigl(m + \frac{B_1}{A_1}\Bigr)^{1/A_1}\Bigl(s + \frac{B_1}{A_1}\Bigr)^{-(1+1/A_1)} \propto s^{-(1+1/A_1)}. \tag{10}$$
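For readers unfamiliar with the continuum method of [4,5], the step from Eq. (8) to Eq. (10) can be sketched as follows. The sketch assumes that a node entering at time $t_i$ starts with strength $m$ and that entry times are uniformly distributed over $[0, t]$. Both are assumptions made here for illustration (they are the usual ones in this approach) and are not stated explicitly in the derivation above.

```latex
\begin{aligned}
\frac{ds_i}{dt} &= \frac{A_1 s_i + B_1}{t}, \qquad s_i(t_i) = m
\;\Longrightarrow\;
s_i(t) = \Bigl(m + \tfrac{B_1}{A_1}\Bigr)\Bigl(\tfrac{t}{t_i}\Bigr)^{A_1} - \tfrac{B_1}{A_1},\\
P\bigl(s_i(t) < s\bigr) &= P\!\left(t_i > t\,\Bigl(\tfrac{m + B_1/A_1}{s + B_1/A_1}\Bigr)^{1/A_1}\right)
= 1 - \Bigl(\tfrac{m + B_1/A_1}{s + B_1/A_1}\Bigr)^{1/A_1},\\
p(s) &= \frac{\partial P\bigl(s_i(t) < s\bigr)}{\partial s}
= \frac{1}{A_1}\Bigl(m + \tfrac{B_1}{A_1}\Bigr)^{1/A_1}\Bigl(s + \tfrac{B_1}{A_1}\Bigr)^{-(1 + 1/A_1)}.
\end{aligned}
```

For large $s$ the last factor dominates, which gives the power law with exponent $-(1 + 1/A_1)$.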

3.2 Simulation

Suppose p1=0.4, p2=0.4, p3=0.2, and let the probabilities for the number of authors be q1=0.25, q2=0.4, q3=0.25, q4=0.1. We evolve a collaboration network following our algorithm and compute the strength of each node in the network. The log-log plot of the strength distribution is shown in Figure 1, where the data are averaged over 20 simulation runs. The figure shows that the strength distribution follows a power law, so the model network is scale-free.
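A minimal Python sketch of such a simulation is given below. It is one possible reading of the model in Section 2, not the authors' code, and it rests on several assumptions that the text leaves open: the network starts from `n0` isolated authors; in case 2 one new author joins `j` existing authors; existing authors are drawn without replacement with probability proportional to $s_i + 1$ (Eq. (1)); and the strength of a node is the sum of the strengths of its links. The names `pick_existing` and `simulate` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def pick_existing(strength, k, rng):
    """Choose k distinct nodes with probability proportional to s_i + 1 (Eq. 1)."""
    pool = list(range(len(strength)))
    chosen = []
    for _ in range(k):
        weights = [strength[i] + 1 for i in pool]
        node = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(node)
        chosen.append(node)
    return chosen


def simulate(steps, p, q, n0=5, seed=0):
    """Add `steps` articles and return the list of node strengths.

    p = (p1, p2, p3) are the case probabilities; q[j-1] is the probability
    that j authors are drawn for an article.
    """
    if n0 < len(q):
        raise ValueError("n0 must be at least the largest author count")
    rng = random.Random(seed)
    strength = [0] * n0          # assumption: start from n0 isolated authors
    link = Counter()             # link strength per unordered author pair
    sizes = list(range(1, len(q) + 1))
    for _ in range(steps):
        case = rng.choices((1, 2, 3), weights=p, k=1)[0]
        j = rng.choices(sizes, weights=q, k=1)[0]
        if case == 1:            # j new authors only
            authors = list(range(len(strength), len(strength) + j))
            strength.extend([0] * j)
        elif case == 2:          # one new author joins j existing authors
            authors = pick_existing(strength, j, rng)
            authors.append(len(strength))
            strength.append(0)
        else:                    # j existing authors, no new author
            authors = pick_existing(strength, j, rng)
        # Rule 1: coauthors are fully connected.
        # Rule 2: a repeated link gains one unit of strength.
        for a, b in combinations(sorted(authors), 2):
            link[(a, b)] += 1
            strength[a] += 1
            strength[b] += 1
    return strength


strengths = simulate(2000, p=(0.4, 0.4, 0.2), q=(0.25, 0.4, 0.25, 0.1))
histogram = Counter(strengths)   # strength -> number of nodes
```

Plotting `histogram` on log-log axes, averaged over several seeds, is the kind of curve that Figure 1 reports. A single run with 2000 steps is noisy in the tail, so averaging over runs (the text uses 20) is advisable.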

Figure 1 Strength distribution (log-log) plot of the model

4. Analysis on demonstration network

4.1 Degree distribution of the demonstration network

We compiled statistics on the papers published in the Journal of Information Science from January 2001 to December 2006. A coauthorship database was created from the articles and their authors; it contains 2400 articles and 2629 authors. A scientific collaboration network was built from the coauthorship database, where a node denotes an author and a link between two nodes denotes cooperation between two authors (they have published an article together). In this paper, we do not record the number of articles on which two authors collaborated, and we do not distinguish the order of authors in an article. The authors of an article are treated as fully connected. The cooperation network is clearly a kind of self-organized network. The information of the database is shown in Table 1.

Table 1 The information of the database of the Journal of Information Science

| Author number | 1 | 2 | 3 | ≥4 |
|---|---|---|---|---|
| Paper number | 1218 | 828 | 314 | 40 |
| Percent | 50.75% | 34.5% | 13.08% | 1.667% |

Then, we analyze the characteristics of this network by calculating the degree of each node in the coauthorship network, i.e., the number of coauthors of every author (degree 0 denotes an author with no coauthor, degree 1 an author with only one coauthor, and so on). The statistical results are shown in Table 2: many nodes have a small degree (1989 authors have degree less than 3) and few nodes have a large degree (only 13 authors have degree greater than 9).

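Table 2 Degree distribution of the nodes in the coauthorship network

| Degree | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 14 | 15 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Node number | 747 | 679 | 563 | 315 | 137 | 76 | 47 | 32 | 12 | 8 | 6 | 3 | 1 | 1 | 1 | 1 |
| Degree distribution | 0.284 | 0.258 | 0.214 | 0.119 | 0.052 | 0.029 | 0.018 | 0.012 | 0.005 | 0.003 | 0.002 | 0.001 | 0.0004 | … | … | … |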
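The figures quoted for Table 2 can be rechecked directly from the node counts. The short Python sketch below only copies the counts from the table; the variable names are chosen here for illustration.

```python
# Node counts per degree, copied from Table 2.
node_count = {0: 747, 1: 679, 2: 563, 3: 315, 4: 137, 5: 76, 6: 47, 7: 32,
              8: 12, 9: 8, 10: 6, 11: 3, 12: 1, 14: 1, 15: 1, 25: 1}

total_authors = sum(node_count.values())
distribution = {k: n / total_authors for k, n in node_count.items()}
low_degree = sum(n for k, n in node_count.items() if k < 3)
high_degree = sum(n for k, n in node_count.items() if k > 9)

print(total_authors, low_degree, high_degree)   # 2629 1989 13
for k, share in distribution.items():
    print(k, f"{share:.4f}")
```

The totals agree with the 2629 authors of the database and with the counts of low-degree and high-degree authors stated in the text.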

4.2 Comparing model simulation with demonstration data

First, we calculate the simulation parameters from the statistical results. Table 1 shows that there are 2400 papers and 2629 authors in the database. Based on the data in Table 1, there are about 2010 cooperations, i.e., there are 2010 links in the network and the total degree is 4020. Initially, suppose there are 10 articles and 15 authors with 50 links in the network model of Section 2. From the data in Table 1 we obtain the probabilities for the number of authors per article, q1=0.5075, q2=0.3450, q3=0.1308, q4=0.01667. From Eqs. (3) and (6), we have m=1.657 and E=0.838. Since the algorithm must be run 2390 times to generate the remaining papers of the 2400, we set t=2390 in the model. Applying Eqs. (5) and (7), we obtain p1=0.484, p2=0.368 and p3=0.148.
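The parameter calculation can be retraced with a short Python sketch. It assumes that Eq. (5) reads $N = n_0 + (m p_1 + p_2)t$ and Eq. (7) reads $\sum s_i = s_0 + (E + m p_2)t$, that the "≥4" class of Table 1 counts as exactly 4 authors, and that the 50 initial links correspond to an initial total strength $s_0 = 100$. These readings are assumptions made here, not statements from the text.

```python
# Paper counts by number of authors, copied from Table 1
# (assumption: the ">= 4" class is treated as exactly 4 authors).
papers = {1: 1218, 2: 828, 3: 314, 4: 40}
total_papers = sum(papers.values())

q = {j: n / total_papers for j, n in papers.items()}
m = sum(j * qj for j, qj in q.items())                  # Eq. (3)
E = sum(j * (j - 1) / 2 * qj for j, qj in q.items())    # Eq. (6)
links = E * total_papers                                # about 2010

# Inputs quoted in the text: 15 initial authors, 50 initial links, t = 2390.
n0, s0, t = 15, 2 * 50, 2390   # assumption: 50 links give total strength 100
N, S = 2629, 4020              # final author count and total degree

# Solve the assumed form of Eq. (7) for p2, then Eq. (5) for p1.
p2 = ((S - s0) / t - E) / m
p1 = ((N - n0) / t - p2) / m
p3 = 1 - p1 - p2
print(round(m, 3), round(E, 4), round(p1, 3), round(p2, 3), round(p3, 3))
```

The sketch reproduces m ≈ 1.657, E ≈ 0.8375 and about 2010 links. Under the assumed forms of Eqs. (5) and (7) the solution comes out near p1 ≈ 0.37, p2 ≈ 0.48 and p3 ≈ 0.15, which are the three values quoted above with the labels of p1 and p2 exchanged; whether this reflects a different labeling convention in the source cannot be determined from the text.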

Then, we simulate the network evolution with these model parameters. The degree distribution of the simulation (averaged over 20 runs) is shown in Figure 2 with “*”, and the degree distribution of the demonstration network (the data in Table 2) is plotted in the same figure with “□”. The two distributions agree well. The inset shows the degree distribution of the demonstration data in a log-log plot, together with a power function with exponent k=-2.6. The demonstration network is therefore a scale-free network whose degree distribution has exponent -2.6.

4.3 Comparing theoretic result with demonstration data

In this subsection, we compute the theoretical value of the power-law exponent of the network. By Eq. (10), the degree distribution of the network model has the exponent $\gamma = -(1 + 1/A_1)$. With the model parameters calculated from the demonstration data, p1=0.484, p2=0.368, p3=0.148, m=1.657, E=0.838, we obtain $\gamma = -(1 + 1/A_1) = -2.6499$. This theoretical value is consistent with the statistical result (k=-2.6) of the demonstration data.
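The exponent can be evaluated numerically with a few lines of Python. The sketch assumes that Eq. (9) reads $A_1 = m / ((m p_1 + p_2) + (E + m p_2))$ and returns the magnitude $1 + 1/A_1$ of the exponent; the function name `strength_exponent` is hypothetical.

```python
def strength_exponent(m, E, p1, p2):
    """Return 1 + 1/A1, the magnitude of the exponent, with A1 as in Eq. (9)."""
    a1 = m / ((m * p1 + p2) + (E + m * p2))
    return 1 + 1 / a1


m, E = 1.657, 0.838
# Assignment obtained when Eqs. (5) and (7) are solved in their assumed form.
gamma_a = strength_exponent(m, E, p1=0.368, p2=0.484)
# Assignment with the labels exactly as printed in the text.
gamma_b = strength_exponent(m, E, p1=0.484, p2=0.368)
print(round(gamma_a, 2), round(gamma_b, 2))
```

Under this reading, the first assignment gives a magnitude of about 2.65, in line with the value 2.6499 quoted above, while the second gives about 2.58. Both are close to the empirical exponent 2.6.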

Figure 2 Comparing model simulation with demonstration data

4.4 Unlink characteristic of demonstration network

By analyzing the data of the coauthorship network, we find that it is made up of many unattached sub-networks (Table 3). The same phenomenon has been reported in [9]: there are no links between these sub-networks, but every node within a sub-network is connected to the rest of that sub-network. The sub-networks of the demonstration data are collected in Table 3.

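Table 3 Statistics of the sub-networks

| Author number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 22 | 23 | 24 | 60 | 221 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of sub-networks | 747 | 294 | 103 | 48 | 16 | 9 | 6 | 7 | 5 | 3 | 1 | 3 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |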
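As a consistency check on Table 3, the following Python sketch sums the table. It uses only the counts printed there; the variable names are illustrative.

```python
# Number of sub-networks for each sub-network size, copied from Table 3.
subnet_count = {1: 747, 2: 294, 3: 103, 4: 48, 5: 16, 6: 9, 7: 6, 8: 7, 9: 5,
                10: 3, 11: 1, 12: 3, 13: 1, 14: 1, 15: 2, 16: 2, 22: 1,
                23: 1, 24: 1, 60: 1, 221: 1}

total_subnets = sum(subnet_count.values())
total_authors = sum(size * n for size, n in subnet_count.items())
large = sum(n for size, n in subnet_count.items() if size > 20)
largest = max(subnet_count)

print(total_subnets, total_authors, large, largest)   # 1253 2629 5 221
```

The sub-network sizes add up to the 2629 authors of the database, spread over 1253 sub-networks, five of which have more than 20 authors.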

Table 3 shows that more than 33% of the authors are unattached, more than 50% of the authors have fewer than 3 cooperating links, and 70% of the authors have fewer than 4. The scientific authors in informatics are thus rather separated. However, there are five sub-networks with more than 20 nodes, and there is one very large research group with 221 authors. Such large groups favor the emergence of new ideas and the spread of new knowledge.

[Figure 2 plot: degree distribution versus degree for the demonstration data and the simulation result; inset on log-log axes]

Figure 3 plots, on log-log axes, the number of sub-networks against the number of authors per sub-network. This plot shows a power-law distribution (with exponent r≈-2.65). The coauthorship network thus follows a power-law distribution even though it is a growing self-organized network. Therefore, this unconnected phenomenon is also self-organized, and this coauthorship network is a special class of scale-free network.
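A power-law exponent such as the one read from the log-log plot can be estimated by a straight-line fit in log-log coordinates. The Python sketch below applies an ordinary least-squares fit to the Table 3 data. How the power function in the figure was obtained is not stated, so this estimator is an assumption, and it need not reproduce r≈-2.65 exactly; the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = a + r * log10(x); returns (r, a)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    r = sxy / sxx
    return r, mean_y - r * mean_x


# Sub-network sizes and numbers of sub-networks, copied from Table 3.
sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
         22, 23, 24, 60, 221]
counts = [747, 294, 103, 48, 16, 9, 6, 7, 5, 3, 1, 3, 1, 1, 2, 2,
          1, 1, 1, 1, 1]

r, a = fit_power_law(sizes, counts)
print(f"estimated exponent r = {r:.2f}")
```

A plain fit of this kind weights the sparse tail (sizes that occur only once) as heavily as the well-populated small sizes, so binning or restricting the fit range will change the estimate.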

Figure 3 Degree and sub-network of demonstration network (log-log). Both axes are logarithmic, from 10^0 to 10^3; legend: degree of node, number of sub-networks, power function.

5. Conclusions

In this paper, based on the ways in which authors collaborate, an author collaboration network model is presented. Theoretical analysis and computer simulation show that the degree distribution of the network nodes follows a power law, so the network is scale-free. The simulation results are consistent with the theoretical results of the model, and the results for the demonstration network are also consistent with those of the model. Therefore, this model can be applied to study the evolution of author collaboration networks.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (under grant 60774016) and the Teaching Innovation Foundation of Ludong University (under grants 22320301 and Z0704).

References

[1] P. Erdos, A. Renyi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci., 1960, 5: 17–60.

[2] D. Watts, S. Strogatz, Collective dynamics of “small-world” networks, Nature, 1998, 393(6684): 440–442.

[3] A. Barabási, R. Albert, Emergence of scaling in random networks, Science, 1999, 286(5439): 509–512.

[4] R. Albert, A. Barabási, Statistical mechanics of complex networks, Rev. Modern Phys., 2002, 74(1): 47–97.

[5] M. Newman, The structure and function of complex networks, SIAM Review, 2003, 45(2): 167–256.

[6] S. Boccaletti, V. Latora, Y. Moreno, et al., Complex networks: Structure and dynamics, Physics Reports, 2006, 424(4-5): 175–308.

[7] M. Newman, Scientific collaboration networks. I. Network construction and fundamental results, Phys. Rev. E, 2001, 64(1): 016131.

[8] M. Newman, Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality, Phys. Rev. E, 2001, 64(1): 016132.

[9] F.-S. Wang, H.-Y. Yang, Analysis on Scientific Collaboration Network of Author in Journal of the China Society for Scientific and Technical Information, Journal of the China Society for Scientific and Technical Information, 2007, 26(5): 659–663.