Transcript
Page 1: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

Crawling Social Internetworking Systems

Francesco BuccafurriDIMET Dept.

Univ. of Reggio Calabria

Italy

e-mail: [email protected]

Gianluca LaxDIMET Dept.

Univ. of Reggio Calabria

Italy

e-mail: [email protected]

Antonino NoceraDIMET Dept.

Univ. of Reggio Calabria

Italy

e-mail: [email protected]

Domenico UrsinoDIMET Dept.

Univ. of Reggio Calabria

Italy

e-mail: [email protected]

Abstract—In new generation social networks, we expect thatthe paradigm of Social Internetworking Systems (SISs, for short)will be more and more important. In this new scenario, therole of Social Network Analysis is of course still crucial but thepreliminary step to do is designing a good way to crawl theunderlying graph. While this aspect has been deeply investigatedin the field of social networks, it is an open issue when movingtowards SISs. Indeed, we cannot expect that a crawling strategywhich is good for social networks, is still valid in a SocialInternetworking Scenario, due to its specific topological features.In this paper, we first confirm the above claim and, then, define anew crawling strategy specifically conceived for SISs. Finally, weshow that it fully overcomes the drawbacks of the state-of-the-artcrawling strategies.

Keywords: Social Networks, Crawling Strategies, Social

Internetworking Systems

I. INTRODUCTION

In the last years, social networks have become one of the

most popular communication media on the Internet [14]. The

resulting universe is a constellation of several social networks,

each forming a community with specific connotations, also

reflecting multiple aspects of people personal life. Despite this

inherent heterogeneity, the possible interaction among distinct

social networks is the basis of a new emergent internetworking

scenario enabling a lot of strategic applications, whose main

strength will be just the integration of possibly different

communities yet preserving their diversity and autonomy. This

concept is very recent and only a few commercial attempts to

implement Social Internetworking Systems (SISs, for short)

have been proposed [1], [2], [3], [4]. In this new scenario,

the role of Social Network Analysis [15], [21], [11], [25],

[8], [24], [23] is of course still crucial in studying the

evolution of structures, individuals, interactions, and so on,

and in extracting powerful knowledge from them. But the

necessary prerequisite is to have a good way to crawl the

underlying graph. In the past, several crawling strategies for

single social networks have been proposed. Among them the

most representative ones are Breadth First Search (BFS, for

short) [25], Random Walk (RW, for short) [20] and Metropolis-

Hastings Random Walk (MH, for short) [13]. They were

largely investigated for single social networks highlighting

their pros and cons [13], [17]. But, what happens when we

move towards Social Interneworking Scenarios? The question

opens in fact a new issue that, to the best our knowledge,

has not been investigated in the literature. Indeed, this issue

is far to be trivial, since we cannot presumably expect that a

crawling strategy which is good for social networks, is still

valid in a Social Internetworking Scenario, due to its specific

topological features.

This paper gives a contribution in this setting. In particular,

through a deep experimental analysis of the above existing

crawling strategies, conducted in a multi-social-network set-

ting, it reaches the conclusion that they are little adequate

to this new context, thus enforcing the need of designing

new crawling strategies specific for SISs. Starting from this

result, it gives its second important contribution, consisting

in the definition of a new crawling strategy, called Bridge-Driven Search (BDS, for short), which relies on a feature

strongly characterizing a SIS. Indeed BDS is centered on the

concept of bridge, which represents the structural element

that interconnects different social networks. Bridges are those

nodes of the graph corresponding to users who joined more

than one social network and explicitly declared their different

accounts. By an experimental analysis we show that BDS

fits the desired features, thus overcoming the drawbacks of

existing strategies.

The plan of this paper is as follows: Section II presents

related literature. The state-of-the-art crawling strategies BFS,

RW and MH are analyzed in Section III, showing why they are

not adequate for a SIS. In Section IV, our crawling strategy

is illustrated. Section V is devoted to the presentation of the

experiments carried out to evaluate BDS. Finally, Section VI

provides a look at our future research efforts.

II. RELATED WORK

With the increase of both the number and the dimension of

social networks the development of crawling strategies became

a very challenging issue. For instance, the problem of sampling

from large graphs is discussed in [19], [22]. Given a commu-

nication network, the approach of [12] aims at recognizing

its overall design as well as at identifying important nodes

and links in it. An investigation on the statistical properties

of sampled scale-free networks is proposed in [18]. In [7],

the authors compare the structures of Cyworld, MySpace and

Orkut. In particular, they analyze the degree distribution, the

clustering property, the degree correlation and the evolution

over time of Cyworld. Some methods to produce a small

realistic sample from a large real network are presented in

[16]. In [25], the social network graph crawling problem is

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4799-2/12 $26.00 © 2012 IEEE

DOI 10.1109/ASONAM.2012.87

505

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4799-2/12 $26.00 © 2012 IEEE

DOI 10.1109/ASONAM.2012.87

506

Page 2: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

investigated in such a way as to answer questions like: (i) how

fast crawlers into consideration discover nodes/links; (ii) how

different social networks and the number of protected users

affect crawlers; (iii) how major graph properties are studied.

A framework of parallel crawlers based on BFS and operating

on eBay is described in [10]. In [17], the impact of different

graph traversal techniques (namely, BFS, DFS, Forest Fire,

Snowball Sampling) on the computation of the average node

degree of a network is analyzed. Finally, an analysis of the

Facebook friendship graph is proposed in [13].

III. MOTIVATIONS

Several crawling strategies for single social networks have

been proposed in the literature. Among these strategies, two

very popular ones are BFS and RW. BFS implements the

classical Breadth First Search visit, whereas RW selects the

next node to be visited uniformly at random among the

neighbors of the current node. A more recent strategy is MH

[13]. At each iteration, it randomly selects a node w from the

neighbors of the current node v. Then, it randomly generates

a number p belonging to the real interval [0, 1]. If p ≤ Γ(v)Γ(w) ,

where Γ(v) (Γ(w), resp.) is the outdegree of v (w, resp.), then

it moves from v to w. Otherwise, it stays in v. Observe that

the higher the degree of a node, the higher the probability that

MH discards it.

In the past, these crawling strategies were deeply investi-

gated when applied on a single social network. This analysis

showed that none of them is always better than the other ones.

Indeed, each of them could be the optimal one for a specific set

of analyses. However, no investigation about the application

of these strategies in a SIS has been carried out. Thus, we

have no evidence that they are still valid in this new context.

In order to reason about this, let us start by considering a

structural peculiarity of a SIS. It is the existence of bridges,

which, we recall, are those nodes of the graph corresponding to

users who joined more than one social network and explicitly

declared their different accounts. We expect that these nodes

play a crucial role in the crawling of a SIS since they allow the

crossing of different social networks, thus discovering the SIS

intrinsic nature (in fact related to interconnections). Bridges

are not “standard” nodes, due to their role; as a consequence,

we cannot see a SIS just as a huge social network. Besides

these intuitive considerations about bridges, we can help our

reasoning also with two results obtained in [9], namely: Fact(i) the fraction of bridges in a social network is low, and Fact(ii) bridges have high degrees on average.

Now, the question is, what about the capability of existing

crawling strategies of finding bridges? The deep knowledge

about BFS, RW and MH provided by the literature allows the

following conjectures to be drawn:

1) BFS tends to explore a local neighborhood of the seed it

starts from. As a consequence, if bridges are not present

in this neighborhood or their number is low (and this

is highly probable due to Fact (i)), the crawled sample

fails in covering many social networks. Furthermore,

it is well known that BFS tends to favor power users

and, therefore, presents bias in some network parameters

(e.g., the average degree of the nodes of the crawled

portions are overestimated) [17].

2) Differently from BFS, RW does not consider only a

local neighborhood of the seed. In fact, it selects the

next node to be visited uniformly at random among

the neighbors of the current node. Again, due to Fact

(i), the probability that RW selects a bridge as the next

node is low. As a consequence, the crawled sample does

not cover many social networks and, if more than one

social network is represented in it, the coupling degree

of the crawled portions of social networks is low. Finally,

analogously to BFS, RW tends to favor power users

(and, consequently, to present bias in some network

parameters [17]); this feature only marginally influences

the capability of RW to find bridges since, in any case,

their number is low.

3) MH has been conceived to unfavor power users and,

more in general, nodes having high degrees which are,

instead, favored by BFS and RW. It performs very

well in a single social network [13] especially in the

estimation of the average degree of nodes. However, due

to Fact (ii), MH will penalize bridges. As a consequence,

the sample crawled by MH does not cover many social

networks present in the SIS.

In sum, from the above reasoning, we expect that both BFS,

RW and MH are substantially inadequate in the context of

SISs. As it will be described in Section V, this conclusion

is fully confirmed by a deep experimental campaign, which

clearly highlights the above drawbacks. Thus we need to

design a specific crawling strategy for SISs.

IV. PROPOSED CRAWLING STRATEGY

In the design of the proposed crawling strategy we start by

the analysis of some aspects limiting BFS, RW and MH in a

SIS, in order to overcome them. Recall that BFS performs a

breadth first search on a local neighborhood of a seed. Now,

the average distance between two nodes of a single social

network is generally less than the one between two nodes

of different social networks. Indeed, in order to pass from

a social network to another, it is necessary to cross a bridge,

and since bridges are few, it could be necessary to generate

a long path before reaching one of them. As a consequence,

the local neighborhood considered by BFS includes one or a

small number of social networks. To overcome this problem,

a depth first search, instead of a breadth first search, could be

performed. For this purpose, the way of proceeding of RW and

MH could be included in our crawling strategy. However, since

the number of bridges in a social network is low, the simple

choice to go in-depth blindly does not favor the crossing from

a social network to another. Even worse, since MH penalizes

the nodes with a high degree, it tends to unfavor bridges,

rather than favor them. Again, in the above reasoning, we have

exploited Facts (i) and (ii) introduced in the previous section.

A solution that overcomes the above problems could consist

in implementing a “non-blind” depth first search in such a

506507

Page 3: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

way as to favor bridges in the choice of the next node to

visit. This is the choice we have done, and the name we give

to our strategy, i.e., Bridge-Driven Search (BDS, for short),

clearly reflects this approach. However, in this way, it becomes

impossible to explore (at least partially) the neighborhood

of the current node because the visit proceeds in-depth very

quickly and, furthermore, as soon as a bridge is encountered,

there is a cross to another social network. The overall result

of this way of proceeding would be an extremely fragmented

crawled sample. In order to face this problem, given the current

node, our crawling strategy explores a fraction of its neighbors

before performing an in-depth search of the next node to visit.

In order to formalize our crawling strategy, we need to

introduce the following parameters:

• nf (node fraction). It represents the fraction of the non-

bridge neighbors of the current node which should be

visited. It ranges in the real interval (0,1]. For example,

when nf is equal to 1, our strategy behaves like BFS

since all neighbors of the current node are selected. This

parameter is used to tune the portion of the current node

neighborhood which has to be taken into account and,

hence, it balances the breadth and depth of the visit.

• bf (bridge fraction). It represents the fraction of the

bridge neighbors of the current node which should be

visited. Like nf , it ranges in the real interval (0,1].

Clearly, this parameter is greater than 0 in order to allow

the visit of at least one bridge (if any), thus resulting in

crossing to another social network.

• btf (bridge tuning factor). It is a real number belonging

to [0,1] which allows the filtering of the bridges to be

visited among the available ones, on the basis of their

degree. Its role will be explained in the following.

The pseudo-code of BDS is shown in Algorithm 1.

V. EXPERIMENTS

In this section, we present our experiment campaign con-

ceived to determine the performances of BDS and to compare

it with BFS, RW and MH when they operate in a SIS.

A. Techniques and Metrics

In order to perform our experiments, we implemented the

four crawling strategies considered in this paper. Since we

wanted to analyze the behavior of these strategies on a SIS,

we had to extract not only connections among the accounts of

different users in the same social network but also connections

among the accounts of the same user in different social

networks. In order to detect these connections, two standards

encoding human relationships, namely XFN [6] and FOAF

[5], are generally exploited. Interestingly enough, they allow

the identification of bridge nodes as those nodes linked by meedges; suitable instructions are present in XFN and FOAF to

define these edges and, ultimately, to derive bridges.

In our experiments, we considered a SIS consisting of

four social networks, namely Twitter, LiveJournal, YouTube

and Flickr. These networks are compliant with the XFN and

FOAF standards and have been largely analyzed in the past in

Algorithm 1 BDSNotation We denote by N(x) the function returning the number of the non-bridge

neighbors of the node x, and by B(x) the function returning the number of thebridges belonging to the neighborhood of x

Input s: the seed nodeOutput SeenNodes, VisitedNodes: a set of nodesConstant nit {The number of iterations}Constant nf {The node fraction parameter}Constant bf {The bridge fraction parameter}Constant btf {The bridge tuning factor parameter}Variable v, w: a nodeVariable p: a number in the real interval (0,1)Variable c: an integer numberVariable NodeQueue: a queue of nodesVariable BridgeSet: a set of bridge-nodes

1: SeenNodes:=∅, VisitedNodes:=∅, NodeQueue:=∅, BridgeSet:=∅2: insert s into NodeQueue3: for i := 1 to nit do4: poll the queue and extract a node v5: insert v into VisitedNodes6: insert all the nodes adjacent to v into SeenNodes7: if (B(v) ≥ 1) then8: clear NodeQueue9: c = 0

10: while c < �bf · B(v)� do11: let w be one of the bridge-nodes adjacent to v not in BridgeSet selected

uniformly at random12: generate uniformly at random a number p in the real interval (0,1)

13: if (p · btf ≤ N(v)N(w)

) then14: insert w into NodeQueue and BridgeSet15: c = c+ 116: end if17: end while18: else19: c = 020: while c < �nf · N(v)� do21: let w be one of the nodes adjacent to v selected uniformly at random22: generate uniformly at random a number p in the real interval (0,1)

23: if (p ≤ N(v)N(w)

) then24: insert w into NodeQueue25: c = c+ 126: end if27: end while28: end if29: end for

Social Network Analysis. For our experiments, we exploited

a server equipped with a 2 Quad-Core E5440 processor and

16 GB of RAM with the CentOS 6.0 Server operating system.

We performed the crawling tasks from November 5, 2011 to

January 22, 2012. Collected data can be found at the URL

http://www.ursino.unirc.it/bds.html.

A first needed step was to define reasonable metrics able

to evaluate the performances of crawlers operating on a SIS.

Even though this point could appear very critical and also

prone to unfair choices, it is immediate to realize that the

following chosen metrics are a good way to highlight some

desired features of a crawling strategy in a SIS:

1) Bridge Ratio (BR): this is a real number in the interval

[0,1] defined as the ratio of the number of the bridges

discovered to the number of all the nodes in the sample.

2) Crossings (CR): this is a non-negative integer and mea-

sures how many times the crawler switches from one

social network to another.

3) Covering (CV): this is a positive integer and measures

how many different social networks are visited by the

crawler.

4) Balancing (BL): this is a non-negative real number and

is defined as the standard deviation of the percentages

507508

Page 4: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

of nodes discovered for each social network w.r.t. the

overall number of nodes discovered in the sample.

Observe that Balancing ranges from 0 to a maximum

value (for instance, 50 in case of 4 social networks),

corresponding to the case in which all sampled nodes

belong to a social network.

5) Degree Bias (DB): this is a real number computed as

the root mean squared error, for each social network

of the SIS, of the average node degree estimated by

the crawler and the one estimated by MH, which is

considered the best one in estimating the node degree

for a social network in the literature [17], [13]. If the

crawled sample does not cover one or more social

networks, then these last ones are not considered in the

computation of Degree Bias.

It is worth pointing out that, as for the first three metrics,

the higher their value the higher the performance of the

crawling strategy. By contrast, as for the fourth and the fifth

metrics, the lower their values the higher the performance

of the crawling strategy. Observe that Covering is related

to the crawler capability of covering many social networks.

Balancing is related to the crawler capability of uniformly

sampling all the social networks. Furthermore, observe that,

even though one could intuitively think that a fair sampling

should sample different social networks proportionally to their

respective overall size, a similar behavior of the crawler could

result in incomplete samples in case of high variance of these

sizes. Indeed, it could happen that some small social networks

would be not represented in the sample or represented in a

insufficient way. Bridge Ratio and Crossing are related to the

coupling degree, while Degree Bias to the average degree.

Finally, we note that the defined metrics are not completely

independent from each other. For instance, if BR = 0, then

CR and CV are also 0. Analogously, the value of CRinfluences both CV and BL.

B. Analysis of BFS, RW, MH and BDS

In this section, we analyze the performances of BFS, RW,

MH and BDS, when applied on a SIS.

For the first three crawling strategies we randomly chose

4 seeds, each belonging to one of the social networks of our

SIS, and, for each crawling strategy, we run the corresponding

crawler starting from each of the seeds. The number of

iterations of each crawling run was 5,000. The overall number

of seen nodes considered for the analysis of MH (resp., RW,

BFS and BDS) was 135,163 (resp., 941,303, 726,743 and

360,238).

In Table I, we show the values of our metrics, along with

the values of the other parameters we consider particularly

significant (i.e., the average degree of the nodes of each

social network), obtained for the four crawling strategies into

consideration.

We start our analysis by considering the three classical

crawling strategies, namely MH, BFS and RW. BDS is ex-

amined later. As for the first three strategies, we can draw the

following conclusions:

MH

Twitter YouTube Flickr LiveJournal Overall

Bridge Ratio 0.0028 0.0038 0.0 0.0024 0.0025

Crossing 5 3 0 4 3

Covering 2 2 1 3 2

Balancing 32.5503 40.6103 50 31.1253 38.5715

Degree Bias 0

Twitter Avg. Deg. 37.9189 37.0789 - 38.2842 37.76067YouTube Avg. Deg. 9.6064 10.0379 - 10.4144 10.01957Flickr Avg. Deg. - - 104.3497 - 104.3497LiveJournal Avg. Deg. - - - 40.5604 40.5604

BFS

Bridge Ratio 0.0008 0.0054 0.0 0.0 0.0016

Crossing 7 43 0 0 12.5

Covering 2 3 1 1 1.75

Balancing 49.9350 42.0286 50 49.9866 47.9876

Degree Bias 103.9339

Twitter Avg. Deg. 21.92 - - - 21.92YouTube Avg. Deg. - 9.5106 - - 9.5106Flickr Avg. Deg. - - 311.6118 - 311.6118LiveJournal Avg. Deg. - - - 40.0136 40.0136

RW

Bridge Ratio 0.0 0.00067 0.0 0.0 0.00017

Crossing 0 4 0 0 1

Covering 1 3 1 1 1.5

Balancing 49.7469 40.1363 50 50 47.4708

Degree Bias 180.8260

Twitter Avg. Deg. 39.0 41.9489 - - 40.4745YouTube Avg. Deg. - 12.5081 - - 12.5081Flickr Avg. Deg. - 476.6446 455.3002 - 465.9724LiveJournal Avg. Deg. - - - 43.3333 43.3333

BDS

Bridge Ratio - - - - 0.0965

Crossing - - - - 286

Covering - - - - 4

Balancing - - - - 13.5103

Degree Bias - - - - 19.0145

Twitter Avg. Deg. - - - - 42.0383YouTube Avg. Deg. - - - - 11.9278Flickr Avg. Deg. - - - - 129.5833LiveJournal Avg. Deg. - - - - 68.6234

TABLE IPERFORMANCES OF THE CRAWLING STRATEGIES INTO EXAMINATION

1) The value of Bridge Ratio is very low for all these

crawling strategies. For MH there are 2.5 bridges for

each 1000 crawled nodes on average. BFS behaves

worse than MH and RW is the worst one.

2) The value of Crossing is generally low for all these

crawling strategies. For BFS an increase of the value

can be observed only when the seed belongs to YouTube.

RW shows again the worst value. This result is clearly

related to the low value of Bridge Ratio, since the

few discovered bridges do not allow the crawlers to

sufficiently cross different social networks.

3) The value of Covering is quite low for all these tech-

niques. On overage only two of the social networks of

the SIS are visited. Also this result is related to the low

values of Bridge Ratio and Crossing.

4) The value of Balancing is very high for all these

techniques, very close to the maximum one (i.e., 50).

This indicates that, as far as this metric is concerned,

they behaves very badly. Indeed, it happens that they

often stay substantially bounded in the social network

of the starting seed. BFS and RW are even worse than

MH.

5) As for the average degrees of nodes, it is well known

that MH is the crawling strategy that best estimates them

in a single social network [17], [13]. From the previous

reasoning on Balancing it is possible to deduce that each

run of MH visits one or at most two social networks. As

a consequence, we can assume that the average degrees

it provides are also those of reference for the social

networks of the SIS, provided that at least one run of

MH starting from each social network is performed.

508509

Page 5: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

On the basis of these reference values, we detect that

BFS presents a high value of Degree Bias. This fact is

well known in the literature for a scenario consisting

of a single social network, and thus we confirm this

conclusion also in the context of SISs. RW is even worse

than BFS.

In sum, we may conclude that the conjectures given in

Section III are fully confirmed by experiments. Now we have

to see how our crawling strategy behaves on the same SIS.In order to analyze BDS we first performed a large set of

experiments to identify the best values of nf , bf and btf .

Due to space reasons, we cannot report here the details of

these experiments. The interested reader can find them at the

URL http://www.ursino.unirc.it/bds.html. We

only say that a parameter configuration generally guaranteeing

satisfactory values for all the metrics into consideration is

nf = 0.10, bf = 0.25, btf = 0.25. However, there could

be some applications in which some metrics are much more

important than the other ones. BDS is highly flexible and,

in these cases, allows the choice of the configuration which

favors that metric. The values of our metrics obtained by BDS

with the chosen configuration are reported in the last part of

Table I. From the analysis of these values it clearly emerges

that, when operating on a SIS, BDS highly outperforms the

other approaches. The only exception is MH for Degree Bias

since, according to [13], [17], we have assumed that MH is the

best method to estimate the average node degree. However,

also for this metric, BDS obtains very satisfactory results.

As a final remark we highlight that, besides the capability

shown by BDS of crossing through different social networks,

overcoming the drawbacks of compared crawler strategies,

BDS presents a good behavior also from an intra-social-network point of view. This claim is supported from both the

results obtained for the Degree Bias, and the consideration that

our crawling algorithm, in absence of bridges, can be located

between BFS and MH, thus producing intra-social-network

results that reasonably cannot differ significantly from the ones

of these strategies.

VI. CONCLUSION AND FUTURE WORK

We think that SIS analysis is a very promising research

field and so we plan to perform further research efforts in the

future. In particular, one of the most challenging issue could

be the improvement of the BDS crawling strategy in such a

way that the values of nf , bf and btf dynamically change

during the crawling activity to adapt to the specificities of

the crawled SIS. Moreover, we plan to better investigate the

properties of BDS also as single-social-network crawler, also

to help the setting of the above parameters. Finally, we plan

to investigate the possible connections of our approach with

the information integration ones as well as to deal with the

privacy issue in crawling with SIS settings.

AcknowledgmentThis work was partially funded by the Italian Ministry of

Research through the PRIN Project EASE (Entity Aware

Search Engines).

REFERENCES

[1] FriendFeed. http://friendfeed.com/, 2012.[2] Gathera. http://www.gathera.com/, 2012.[3] Google Open Social. http://code.google.com/intl/it-IT/

apis/opensocial/, 2012.[4] Power.com. http://power.com, 2012.[5] The Friend of a Friend (FOAF) project.

http://www.foaf-project.org/, 2012.[6] XFN - XHTML Friends Network. http://gmpg.org/xfn, 2012.[7] Y.Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. Analysis of

topological characteristics of huge online social networking services. InProc. of the International Conference on World Wide Web (WWW’07),pages 835–844, Banff, Alberta, Canada, 2007. ACM.

[8] M. Berlingerio, M. Coscia, F. Giannotti, A. Monreale, and D. Pedreschi.Towards discovery of eras in social networks. In Proc. of the Workshopsof the International Conference on Data Engineering (ICDE 2010),pages 278–281, Long Beach, California, USA, 2010. IEEE.

[9] F. Buccafurri, V.D. Foti, G. Lax, A. Nocera, and D. Ursino. BridgeAnalysis in a Social Internetworking Scenario. Technical Report. Avail-able at the address http://www.ursino.unirc.it/bds.html,2012.

[10] D.H Chau, S. Pandit, S. Wang, and C. Faloutsos. Parallel crawling foronline social networks. In Proc. of the International Conference onWorld Wide Web (WWW’07), pages 1283–1284, Banff, Alberta, Canada,2007. ACM.

[11] X. Cheng, C. Dale, and J. Liu. Statistics and Social Network of YoutubeVideos. In Proc. of the International Workshop on Quality of Service(IWQoS 2008), pages 229–238, Enschede, The Netherlands, 2008. IEEE.

[12] A.C. Gilbert and K. Levchenko. Compressing network graphs. In Proc.of the International Workshop on Link Analysis and Group Detection(LinkKDD’04), Seattle, WA, USA, 2004. ACM.

[13] M. Gjoka, M. Kurant, C.T. Butts, and A. Markopoulou. Walking inFacebook: A case study of unbiased sampling of OSNs. In Proc.of the International Conference on Computer Communications (INFO-COM’10), pages 1–9, San Diego, CA, USA, 2010. IEEE.

[14] J. Kleinberg. The convergence of social and technological networks.Communications of the ACM, 51(11):66–72, 2008.

[15] B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps about Twitter. InProc. of the First Workshop on Online Social Networks, pages 19–24,Seattle, Washington, USA, 2008.

[16] V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, J.H. Cui, andA. Percus. Reducing Large Internet Topologies for Faster Simulations.In Proc. of the International Conference on Networking (Networking2005), pages 165–172, Waterloo, Ontario, Canada, 2005. Springer.

[17] M. Kurant, A. Markopoulou, and P. Thiran. On the bias of BFS (BreadthFirst Search). In Proc. of the International Teletraffic Congress (ITC 22),pages 1–8, Amsterdam, The Netherlands, 2010. IEEE.

[18] S.H. Lee, P.J. Kim, and H. Jeong. Statistical properties of samplednetworks. Physical Review E, 73(1):016102, 2006.

[19] J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proc. ofthe ACM SIGKDD International Conference on Knowledge Discoveryand Data Mining (KDD’06), pages 631–636, Philadelphia, PA, USA,2006. ACM.

[20] L. Lovasz. Random walks on graphs: A survey. Combinatorics, PaulErdos is Eighty, 2(1):1–46, 1993.

[21] A. Mislove, H.S. Koppula, K.P. Gummadi, F. Druschel, and B. Bhat-tacharjee. Growth of the Flickr Social Network. In Proc. of theInternational Workshop on Online Social Networks (WOSN08), pages25–30, Seattle, Washington, USA, 2008. ACM.

[22] D. Rafiei. Effectively visualizing large networks through sampling. InVisualization, IEEE 2005 (VIS’05), pages 375–382, Minneapolis, MN,2005. IEEE.

[23] A.H. Rasti, M. Torkjazi, R. Rejaie, and D. Stutzbach. EvaluatingSampling Techniques for Large Dynamic Graphs. Univ. Oregon, Tech.Rep. CIS-TR-08-01, 2008.

[24] D. Stutzback, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. Onunbiased sampling for unstructured peer-to-peer networks. In Proc. ofthe International Conference on Internet Measurements, pages 27–40,Rio De Janeiro, Brasil, 2006. ACM.

[25] S. Ye, J. Lang, and F. Wu. Crawling online social graphs. In Proc. of theInternational Asia-Pacific Web Conference (APWeb’10), pages 236–242,Busan, Korea, 2010. IEEE.

509510


Top Related