Page 1: Data and networks GIACS Conference Palermo 9-4-08

Data and networks
GIACS Conference, Palermo, 9-4-08

Page 2: Data and networks GIACS Conference Palermo 9-4-08

GIACS PALERMO 9-4-08

•Networks

Page 3: Data and networks GIACS Conference Palermo 9-4-08

Correlation-based Minimal Spanning Tree: 1071 stocks traded at the NYSE between 1987 and 1998. Different colours refer to different SIC sectors.

Correlation-based Minimal Spanning Tree: artificial market of 1071 stocks according to the one-factor model. Different colours refer to different SIC sectors.

•Networks as an instrument of Data Filtering

Topology of correlation based minimal spanning trees in real and model markets. G. Bonanno, G. Caldarelli, F. Lillo, R. Mantegna, Physical Review E 68, 046130 (2003).

Networks of equities in financial markets. G. Bonanno, GC, F. Lillo, S. Miccichè, N. Vandewalle, R. N. Mantegna, European Physical Journal B 38, 363-372 (2004).
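A minimal sketch of how a correlation-based minimal spanning tree can be built. The slide does not state the metric, so the distance d_ij = sqrt(2 (1 - rho_ij)) commonly used in this literature is assumed here; the function names and the toy return series are purely illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_mst(returns):
    """Kruskal's algorithm on the assumed distance sqrt(2 * (1 - rho)).

    `returns` maps a stock label to its list of returns.
    The result is a list of (label_i, label_j, distance) tree edges.
    """
    labels = sorted(returns)
    edges = []
    for a in range(len(labels)):
        for b in range(a + 1, len(labels)):
            rho = pearson(returns[labels[a]], returns[labels[b]])
            d = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
            edges.append((d, labels[a], labels[b]))
    edges.sort()

    parent = {s: s for s in labels}  # union-find forest

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    tree = []
    for d, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v, d))
    return tree

# Toy data (hypothetical): C is exactly anti-correlated with A.
toy = {
    "A": [0.01, -0.02, 0.03, 0.00, -0.01],
    "B": [0.02, -0.01, 0.02, 0.01, -0.02],
    "C": [-0.01, 0.02, -0.03, 0.00, 0.01],
    "D": [0.00, 0.01, -0.01, 0.02, 0.00],
}
mst = correlation_mst(toy)
```

The tree keeps only N - 1 of the N(N - 1)/2 correlations, which is the sense in which the network acts as a data filter.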

Page 4: Data and networks GIACS Conference Palermo 9-4-08

COSIN (official number IST-2001-33555) was a research project financed by the European Commission through the Fifth Framework Programme.

COSIN is part of the actions taken by Future and Emerging Technologies (FET) in the Information Society Technologies (IST) priority area of research (http://www.cordis.lu/IST/FET).

Documents at http://www.cosinproject.org

•The Cosin project

Page 5: Data and networks GIACS Conference Palermo 9-4-08

COSIN involves 7 different nodes in 5 countries

A. (Ph + CS) Roma, Italy
B. (Ph) Barcelona, Spain
C. (Ph) Lausanne, Switzerland
D. (Ph) ENS, Paris, France
E. (CS) Karlsruhe, Germany
F. (Ph) Upsud, Paris, France

Map legend: EU countries (2001), non-EU countries (2001), EU COSIN participants, non-EU COSIN participants.

•The Cosin project

Page 6: Data and networks GIACS Conference Palermo 9-4-08

G. Bonanno, G. Caldarelli, F.Colaiori, G. Di Battista, D. Donato, S. Leonardi, R. Mantegna, A. Marchetti-Spaccamela, M. Patrignani, L. Pietronero, V. Servedio

A. Arenas, M. Boguña, A. Díaz-Guilera, R. Ferrer i Cancho, M.A. Muñoz, M.A Serrano, R. Pastor-Satorras

G. Bianconi, A. Capocci, P. De Los Rios, T. Erlebach, T. Petermann, Y.-C. Zhang

A. Barrat, S. Battiston, P. Nadal, A. Vespignani, G. Weisbuch

U. Brandes, M. Gaertler, M. Kaufmann, D. Wagner

•Some of the Cosin people

Page 7: Data and networks GIACS Conference Palermo 9-4-08

1. To develop a unified set of Complex Systems theoretical methodologies for the characterization of Complex Networks.

2. To develop statistical models for network growth and evolution.

3. To collect data, mainly for the Internet and the World Wide Web.

4. To extend the analysis to social and economic networks.

5. To develop visualization tools for large-scale systems.

6. To disseminate results through publications, conferences and the project web site.

•The Cosin project

Page 8: Data and networks GIACS Conference Palermo 9-4-08

1. After three years of activity we have a common ground of methodologies and tools, at least between computer scientists and physicists (and some economists). Some more effort would be necessary to integrate social scientists.

2. We provided a class of models for network growth and evolution; moreover, we addressed the study of the statistical properties of weighted networks.

3. Data collection for the Internet and the World Wide Web proved much more difficult than expected. In the meantime, larger consortia have been funded specifically for this task. Thanks to external collaborations we still found the data to validate the models we produced.

•A Cosin summary

Page 9: Data and networks GIACS Conference Palermo 9-4-08

4. In economic and financial networks, COSIN people are on the front line of this very new field of research. This new approach attracted the interest of the community at the level of Nobel laureates. The impact on social science has been less successful. The impact on biology (botany, zoology) has been unexpected and very successful.

5. The standard visualization problem is to keep the whole graph structure and present it suitably. On this point some progress has been made; it is worth mentioning that several ideas are now under consideration for the visualization of "simplified graphs".

6. The project had a considerable impact on the scientific community in terms of citations, visibility, conferences, schools, books and data downloads from the site. Maybe some more work could be done for the general public.

Page 10: Data and networks GIACS Conference Palermo 9-4-08

The graph of scientific collaborations on scale-free networks in statistical physics

M.E.J. Newman, PRE 69, 026113 (2004)

Page 11: Data and networks GIACS Conference Palermo 9-4-08

• More than 150 refereed papers (some of them in Nature, PNAS, PRL, LNCS)
• Lectures and talks at various international conferences (for physics: STATPHYS, APS Meetings) and invited talks at various institutions
• Books

•Dissemination

Page 12: Data and networks GIACS Conference Palermo 9-4-08

The Sitges Conference published the proceedings of the most interesting talks in a special volume:
Statistical Mechanics of Complex Networks
Series: Lecture Notes in Physics, Vol. 625
Pastor-Satorras, Romualdo; Rubi, Miguel; Diaz-Guilera, Albert (Eds.)
2003, XII, 206 p., Hardcover
ISBN: 3-540-40372-8

The Rome Conference published its proceedings in a special issue of the European Physical Journal B.

Page 13: Data and networks GIACS Conference Palermo 9-4-08

•Web site

Page 14: Data and networks GIACS Conference Palermo 9-4-08

Trivially, access to data was crucial for the project.

In some cases we found very good datasets and could work on them:

1. Internet (AS topology)

2. Wikipedia

In the presence of poor or no data, we obtained (of course) only partial results:

1. Liquidity shocks

2. River networks

•What about data?

Page 15: Data and networks GIACS Conference Palermo 9-4-08

STATISTICAL PROPERTIES OF THE WIKIGRAPH

L.S. Buriol, A. Capocci, F. Colaiori, D. Donato, S. Leonardi, F. Rao, V. Servedio, GC

Centro “E. Fermi”

1. Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia. A. Capocci, F. Rao, GC, Europhysics Letters 81, 28008 (2008) (arXiv:0710.3058)

2. Preferential attachment in the growth of social networks: the Internet encyclopedia Wikipedia. A. Capocci, V.D.P. Servedio, F. Colaiori, L.S. Buriol, D. Donato, S. Leonardi, GC, Physical Review E 74, 036116 (2006).

Page 16: Data and networks GIACS Conference Palermo 9-4-08

•Wikipedia intro

Page 17: Data and networks GIACS Conference Palermo 9-4-08

Wikipedia in other languages
You may read and edit articles in many different languages:

Wikipedia encyclopedia languages with over 100,000 articles
Deutsch (German) · Français (French) · Italiano (Italian) · 日本語 (Japanese) · Nederlands (Dutch) · Polski (Polish) · Português (Portuguese) · Svenska (Swedish)

Wikipedia encyclopedia languages with over 10,000 articles
العربية (Arabic) · Български (Bulgarian) · Català (Catalan) · Česky (Czech) · Dansk (Danish) · Eesti (Estonian) · Español (Spanish) · Esperanto · Galego (Galician) · עברית (Hebrew) · Hrvatski (Croatian) · Ido · Bahasa Indonesia (Indonesian) · 한국어 (Korean) · Lietuvių (Lithuanian) · Magyar (Hungarian) · Bahasa Melayu (Malay) · Norsk bokmål (Norwegian) · Norsk nynorsk (Norwegian) · Română (Romanian) · Русский (Russian) · Slovenčina (Slovak) · Slovenščina (Slovenian) · Српски (Serbian) · Suomi (Finnish) · Türkçe (Turkish) · Українська (Ukrainian) · 中文 (Chinese)

Wikipedia encyclopedia languages with over 1,000 articles
Alemannisch (Alemannic) · Afrikaans · Aragonés (Aragonese) · Asturianu (Asturian) · Azərbaycan (Azerbaijani) · Bân-lâm-gú (Min Nan) · Беларуская (Belarusian) · Bosanski (Bosnian) · Brezhoneg (Breton) · Чăваш чěлхи (Chuvash) · Corsu (Corsican) · Cymraeg (Welsh) · Ελληνικά (Greek) · Euskara (Basque) · فارسی (Persian) · Føroyskt (Faroese) · Frysk (Western Frisian) · Gaeilge (Irish) · Gàidhlig (Scots Gaelic) · हिन्दी (Hindi) · Interlingua · Íslenska (Icelandic) · Basa Jawa (Javanese) · ქართული (Georgian) · ಕನ್ನಡ (Kannada) · Kurdî / كوردی (Kurdish) · Latina (Latin) · Latviešu (Latvian) · Lëtzebuergesch (Luxembourgish) · Limburgs (Limburgish) · Македонски (Macedonian) · मराठी (Marathi) · Napulitana (Neapolitan) · Occitan · Ирон (Ossetic) · Plattdüütsch (Low Saxon) · Scots · Sicilianu (Sicilian) · Simple English · Shqip (Albanian) · Sinugboanon (Cebuano) · Srpskohrvatski/Српскохрватски (Serbo–Croatian) · தமிழ் (Tamil) · Tagalog · ภาษาไทย (Thai) · Tatarça (Tatar) · తెలుగు (Telugu) · Tiếng Việt (Vietnamese) · Walon (Walloon)

Complete list · Multilingual coordination · Start a Wikipedia in another language

•Wikipedia intro

Page 18: Data and networks GIACS Conference Palermo 9-4-08

The datasets of each language are available as two self-extracting files for a MySQL database. The table cur contains the current on-line articles, whereas the table old contains all previous versions of each current article. Old versions of an article are identified by having the same title, not the same id. The dataset dumps are updated almost weekly, so the current graph is usually not more than a week old.

To generate a graph from the link structure of a dataset, each article is considered a node and each hyperlink between articles is a link in the graph. In the Wikipedia datasets, each webpage is a single article. An article might also contain external links that point to pages outside the dataset. Usually Wikipedia articles have no external links, or just a few of them. These links are not considered when generating the wikigraphs, since we want to restrict the graph to pages within the set being analyzed.
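The construction just described can be sketched as follows. This is only an illustration: it assumes the hyperlinks have already been extracted from the dump as (source title, target) pairs, and the function name and toy data are hypothetical rather than part of the actual database schema.

```python
def build_wikigraph(article_titles, links):
    """Return {title: set of linked titles}, keeping internal links only.

    article_titles: iterable of titles present in the dataset (the nodes).
    links: iterable of (source_title, target) pairs; a target that is not
           an article of the dataset (e.g. an external URL) is skipped.
    """
    graph = {title: set() for title in article_titles}
    for source, target in links:
        if source in graph and target in graph:
            graph[source].add(target)
    return graph

titles = ["Physics", "Network", "Graph theory"]
raw_links = [
    ("Physics", "Network"),
    ("Network", "Graph theory"),
    ("Network", "http://example.org/external"),  # external: ignored
    ("Graph theory", "Network"),
]
wikigraph = build_wikigraph(titles, raw_links)
```

Storing the successors in a set also collapses repeated links between the same two articles into a single edge, which is a design choice of this sketch.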

•Wikipedia intro

Page 19: Data and networks GIACS Conference Palermo 9-4-08

• sociological reasons: the encyclopedia collects pages written by a number of independent and heterogeneous individuals. Each of them autonomously decides about the content of the articles, with the only constraint of a prefixed layout. This autonomy is a common feature of content creation on the Web. The Wikipedia authors' community is formed by members whose only wish is to make available to the world concepts and topics that they consider meaningful. In some sense, tracing the evolution of the Wikipedia subsets should mirror the development of significant trends within each linguistic community.

• generation in time: Wikipedia provides time information associated with nodes. Moreover, it provides old information: the time of creation and of each modification for every page in the dataset.

• independence from external links: Wikipedia articles link mainly to articles in the same dataset.

• variety of graph sizes: one graph can be collected per language, and the graph sizes vary from a few hundred pages up to half a million pages.

•Wikipedia interests

Page 20: Data and networks GIACS Conference Palermo 9-4-08

Summarizing:

• We have available all the history of growth, so that we can study the evolution

• We have an example of a “social” network of huge size

• We can compare the systems produced by users of different languages, thereby measuring the effect of different cultures.

• We can study Wikipedia as a case study for the World Wide Web

WE RECOVER A PREFERENTIAL ATTACHMENT MECHANISM FROM THE DATA.

DIFFERENT LANGUAGES PRODUCE SIMILAR STRUCTURES

WE FIND A SYSTEM SIMILAR TO THE WWW EVEN IF THE MICROSCOPIC RULE OF GROWTH IS VERY DIFFERENT.

•Results

Page 21: Data and networks GIACS Conference Palermo 9-4-08

We generated six wikigraphs, wikiEN, wikiDE, wikiFR, wikiES, wikiIT and wikiPT, from the English, German, French, Spanish, Italian and Portuguese datasets, respectively. The graphs were obtained from an old dump of June 13, 2004. We are not using the current data because of disk space restrictions: the English dataset of June 2005 is more than 36 GB compressed, which is about 200 GB expanded.

The most visited page was the main page for wikiEN, wikiDE, wikiFR and wikiES, while for the wikiIT and wikiPT datasets no visits were associated with the pages.

•The Wiki graphs

Page 22: Data and networks GIACS Conference Palermo 9-4-08

• SCC (Strongly Connected Component) includes pages that are mutually reachable by traveling on the graph
• IN component is the region from which one can reach the SCC
• OUT component encompasses the pages reached from the SCC
• TENDRILS are pages reachable from the IN component and not pointing to the SCC or OUT region. TENDRILS also include those pages that point to the OUT region and do not belong to any of the other defined regions
• TUBES directly connect the IN and OUT regions
• DISCONNECTED regions are those isolated from the rest

The Bow-tie structure, found in the WWW (Broder et al., Comp. Net. 33, 309, 2000)
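A rough sketch of how the three main bow-tie regions can be extracted from a directed graph. It is not the procedure used for the Wikigraph: for brevity it finds the largest SCC by repeated forward and backward searches, and it lumps tendrils, tubes and disconnected pages into a single OTHER set. All names are illustrative.

```python
from collections import deque

def reachable(starts, adj):
    """Set of nodes reachable from `starts` by following `adj` (BFS)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(graph):
    """Split a directed graph {node: iterable of successors} into
    SCC, IN, OUT and OTHER (tendrils, tubes, disconnected) node sets."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    reverse = {u: set() for u in nodes}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)

    # Largest strongly connected component: for an unassigned node u, its
    # component is (reachable from u) intersected with (can reach u).
    unassigned, largest = set(nodes), set()
    while unassigned:
        u = next(iter(unassigned))
        comp = reachable([u], graph) & reachable([u], reverse)
        unassigned -= comp
        if len(comp) > len(largest):
            largest = comp

    out_part = reachable(largest, graph) - largest
    in_part = reachable(largest, reverse) - largest
    other = nodes - largest - in_part - out_part
    return {"SCC": largest, "IN": in_part, "OUT": out_part, "OTHER": other}

# Toy graph: b and c form the core, a feeds it, d is fed by it,
# e is a tendril pointing to OUT, f is disconnected.
toy_graph = {"a": ["b"], "b": ["c"], "c": ["b", "d"],
             "d": [], "e": ["d"], "f": []}
regions = bow_tie(toy_graph)
```

The component search above costs one pair of traversals per component, which is fine for small examples; for graphs of the size discussed here a linear-time algorithm such as Tarjan's would be the natural replacement.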

Page 23: Data and networks GIACS Conference Palermo 9-4-08

The percentage of the various components of the Wikigraph for the various languages.

The measure/size of the Wikigraph for the various languages.

•The Wikigraphs

Page 24: Data and networks GIACS Conference Palermo 9-4-08

In-degree (empty) and out-degree (filled) occurrence distributions for the Wikigraph in English (○) and Portuguese ().

The degree shows fat tails that can be approximated by a power-law function of the kind $P(k) \sim k^{-\gamma}$, where the exponent $\gamma$ is the same for both in-degree and out-degree.

In the case of the WWW, $2 \le \gamma_{in} \le 2.1$.
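One way to put a number on such an exponent is sketched below. It is not the fitting procedure of the talk: it uses the standard maximum-likelihood estimator in its continuous approximation, and it is checked on synthetic data rather than on the Wikigraph. The function names, the cutoff `k_min` and the sample size are assumptions of this sketch.

```python
import math
import random

def powerlaw_exponent(values, k_min):
    """Maximum-likelihood estimate of gamma in P(k) ~ k^(-gamma),
    using the continuous approximation and only values >= k_min."""
    tail = [k for k in values if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / k_min) for k in tail)

def sample_powerlaw(gamma, k_min, n, rng):
    """Draw n continuous power-law variates by inverse transform."""
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]

rng = random.Random(0)
synthetic = sample_powerlaw(2.1, 1.0, 20000, rng)
gamma_hat = powerlaw_exponent(synthetic, 1.0)
```

For real (integer) degree data the choice of `k_min` matters, since the power law is expected to hold only in the tail.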

•Power laws (what else? )

Page 25: Data and networks GIACS Conference Palermo 9-4-08

The average neighbours' in-degree, computed along incoming edges, as a function of the in-degree for the English (○) and Portuguese () Wikigraphs.

As regards the assortativity (as measured by the average degree of the neighbours of a vertex with degree k), there is no evidence of any assortative behaviour.
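The quantity plotted here can be computed as in the following sketch, which assumes the graph is given as a plain list of directed edges; the function name and the toy edge list are illustrative.

```python
from collections import defaultdict

def knn_in(edges):
    """Average in-degree of the in-neighbours, grouped by in-degree.

    edges: list of (source, target) pairs of a directed graph.
    Returns {k_in: mean, over vertices with in-degree k_in, of the
             average in-degree of the vertices pointing to them}.
    """
    in_deg = defaultdict(int)
    in_neighbours = defaultdict(list)
    for source, target in edges:
        in_deg[target] += 1
        in_neighbours[target].append(source)

    per_degree = defaultdict(list)
    for v, sources in in_neighbours.items():
        mean_nn = sum(in_deg[s] for s in sources) / len(sources)
        per_degree[in_deg[v]].append(mean_nn)
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_degree.items())}

toy_edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")]
knn = knn_in(toy_edges)
```

A curve that grows with the in-degree would signal assortative mixing and a decreasing one disassortative mixing; a flat curve corresponds to the absence of correlations reported on the slide.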

•Correlations

Page 26: Data and networks GIACS Conference Palermo 9-4-08

The PageRank distribution for wikiEN is a power-law function with γ = 2.1. Previous measurements on webgraphs exhibit the same behaviour for the PageRank distribution.

We list the number of visits of the top-ranked pages just to show that this value is not related to the PageRank values. We confirm that very little correlation was found between the link analysis characteristics and the actual number of visits.
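For reference, a compact power-iteration sketch of PageRank. The damping factor 0.85 and the uniform redistribution of rank from pages without outgoing links are conventional choices assumed here; the slide does not say which variant was used.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """PageRank of a directed graph {node: list of successors}."""
    nodes = sorted(set(graph) | {v for ts in graph.values() for v in ts})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, ())
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank

toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": []}
ranks = pagerank(toy_web)
```

Comparing such values with page-visit counts, for instance through a rank correlation, is the kind of check the slide refers to.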

•PageRank

Page 27: Data and networks GIACS Conference Palermo 9-4-08

Given the history of growth, one can verify the hypothesis of preferential attachment. This is done by means of the histogram $\Pi(k)$, which gives the number of vertices (whose degree is k) acquiring new connections at time t. This quantity is weighted by the factor $N(t)/n(k,t)$.

We find preferential attachment for in- and out-degree.

English (○) and Portuguese (). White = in-degree, filled = out-degree.
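The measurement can be sketched as below for the in-degree. It assumes that N(t) is the number of vertices present at time t and n(k,t) the number of them with in-degree k, and that a vertex exists from the first edge mentioning it; these readings, like the function name, are assumptions of the sketch rather than statements from the slide.

```python
from collections import defaultdict

def attachment_histogram(timed_edges):
    """Weighted histogram of the in-degree k of vertices receiving a link.

    timed_edges: (source, target) pairs in order of creation.
    Each event contributes N(t) / n(k, t), where N(t) is the current number
    of vertices and n(k, t) the number of them with in-degree k.
    """
    in_deg = {}                      # vertex -> current in-degree
    n_k = defaultdict(int)           # in-degree -> number of vertices
    hist = defaultdict(float)
    for source, target in timed_edges:
        for v in (source, target):
            if v not in in_deg:      # new vertex enters with in-degree 0
                in_deg[v] = 0
                n_k[0] += 1
        k = in_deg[target]
        hist[k] += len(in_deg) / n_k[k]
        n_k[k] -= 1                  # target moves from class k to k + 1
        n_k[k + 1] += 1
        in_deg[target] = k + 1
    return dict(hist)

history = [("a", "b"), ("c", "b"), ("a", "c")]
pi = attachment_histogram(history)
```

The weight compensates for the fact that low-degree vertices are far more numerous; under linear preferential attachment the resulting histogram should grow roughly linearly with k.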

•Preferential attachment

Page 28: Data and networks GIACS Conference Palermo 9-4-08

Other power laws related to the dynamics need to be explained. For example, the number of updates also follows a power law.

Each point represents the number of nodes (y axis) that were updated exactly x times.

•Updates’ statistics

Page 29: Data and networks GIACS Conference Palermo 9-4-08

We introduced an evolution rule, similar to other models of rewiring already considered*.

• At each time step, a vertex is added to the network. It is connected to the existing vertices by M oriented edges; the direction of each edge is drawn at random:

• with probability R1, the edge leaves the new vertex, pointing to an existing one chosen with probability proportional to its in–degree;

• with probability R2, the edge points to the new vertex, and the source vertex is chosen with probability proportional to its out–degree.

• Finally, with probability R3 = 1 − R1 − R2, the edge is added between existing vertices: the source vertex is chosen with probability proportional to its out–degree, while the destination vertex is chosen with probability proportional to its in–degree.

* See for example Krapivsky, Rodgers and Redner, PRL 86, 5401 (2001)
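The rule above can be simulated along the following lines. Two details are not specified on the slide and are assumptions of this sketch: the seed graph (two mutually linked vertices) and the use of "degree + 1" instead of the bare degree as selection weight, so that vertices with zero degree can still be picked.

```python
import random

def grow_wikigraph(steps, M, R1, R2, seed=0):
    """Simulate the three-move growth rule; returns a list of (source, target).

    Assumption (not on the slide): vertices are picked with probability
    proportional to degree + 1, so that zero-degree vertices can be chosen.
    """
    rng = random.Random(seed)
    edges = [(0, 1), (1, 0)]     # assumed seed graph: two linked vertices
    in_pool = [0, 1, 0, 1]       # vertex v appears in_degree(v) + 1 times
    out_pool = [0, 1, 0, 1]      # vertex v appears out_degree(v) + 1 times
    n = 2
    for _ in range(steps):
        new = n
        moves = []
        for _ in range(M):
            r = rng.random()
            if r < R1:                       # new -> existing (by in-degree)
                moves.append((new, rng.choice(in_pool)))
            elif r < R1 + R2:                # existing (by out-degree) -> new
                moves.append((rng.choice(out_pool), new))
            else:                            # existing -> existing
                moves.append((rng.choice(out_pool), rng.choice(in_pool)))
        # The new vertex joins the pools only after its M edges are drawn.
        in_pool.append(new)
        out_pool.append(new)
        for source, target in moves:
            edges.append((source, target))
            out_pool.append(source)
            in_pool.append(target)
        n += 1
    return edges

model_edges = grow_wikigraph(steps=200, M=5, R1=0.026, R2=0.091)
```

Sampling uniformly from a pool in which each vertex appears once per unit of weight is a simple way to obtain degree-proportional choices without recomputing probabilities at every step.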

•Wikipedia growth model

Page 30: Data and networks GIACS Conference Palermo 9-4-08

Actually:
1) This network is oriented.
2) The preferential attachment in Wikipedia has a somewhat different nature. Here, most of the time, the edges are added between existing vertices, differently from the BA model. For instance, in the English version of Wikipedia a largely dominant fraction (0.883) of new edges is created between two existing pages, while a smaller fraction of edges points to or leaves a newly added vertex (0.026 and 0.091 respectively).

From these data it seems that a model in the spirit of BA could reproduce most of the features of the system.

•Wikipedia growth model

Page 31: Data and networks GIACS Conference Palermo 9-4-08

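The model can be solved analytically:

$P(k_{in}) \sim k_{in}^{-\gamma_{in}}$, with $\gamma_{in} = 1 + 1/(1 - R_2)$

$P(k_{out}) \sim k_{out}^{-\gamma_{out}}$, with $\gamma_{out} = 1 + 1/(1 - R_1)$

For the model we can use the empirical values R1 = 0.026, R2 = 0.091, R3 = 0.883, already measured for the English version of the Wikigraph, to obtain $\gamma_{in}$ and $\gamma_{out}$.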
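A heuristic mean-field argument shows where exponents of this kind come from. It is a sketch under the assumption that the growth rule is exactly the one stated for the model (M edges per step, moves chosen with probabilities R1, R2, R3), and it leads to $\gamma_{in} = 1 + 1/(1 - R_2)$ and $\gamma_{out} = 1 + 1/(1 - R_1)$; the numerical values follow from the empirical R1 and R2 quoted for the English Wikigraph.

```latex
% Mean-field sketch for the in-degree (the out-degree case is symmetric).
% After t steps the network has M t edges, so the total in-degree is M t.
% Only the R_1 and R_3 moves end on an existing vertex, chosen \propto k_in:
\frac{d k_{in}}{d t} = M (R_1 + R_3) \frac{k_{in}}{M t} = (1 - R_2) \frac{k_{in}}{t}
\;\Rightarrow\; k_{in}(t) \propto \left( \frac{t}{t_0} \right)^{1 - R_2}
\;\Rightarrow\; \gamma_{in} = 1 + \frac{1}{1 - R_2}

% With R_1 = 0.026 and R_2 = 0.091:
\gamma_{in} = 1 + \frac{1}{0.909} \approx 2.10, \qquad
\gamma_{out} = 1 + \frac{1}{0.974} \approx 2.03
```

Both values fall close to the range quoted for the WWW in-degree exponent.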

•Wikipedia growth model

Page 32: Data and networks GIACS Conference Palermo 9-4-08

The model can be solved analytically:

$K_{nn}^{in}(k_{in}) \sim M N^{1-R_1} R_1 R_2 / R_3$ ($R_3 \neq 0$)

$K_{nn}^{in}(k_{in}) \sim M R_1 R_2 \ln(N)$ ($R_3 = 0$)

In both cases it is constant, that is, independent of $k_{in}$.

The value of the constant also depends upon the initial conditions. The two lines refer to two realizations of the model, where in one case the first 0.5% of the vertices have been removed.

•Wikipedia growth model

Page 33: Data and networks GIACS Conference Palermo 9-4-08

• We have a structure that resembles the bow-tie of the WWW

• We have a power-law decay for the degree distributions and also a power-law decay for the number of updates of a page

• Preferential attachment in the rewiring seems to be the driving force in the evolution of the system

• The microscopic structure of rewiring is very different from that of the WWW. In principle a user can change any series of edges and add as many pages as wanted. Still, most of the quantities are similar

•Wikipedia growth model

Page 34: Data and networks GIACS Conference Palermo 9-4-08

The fact that the PageRank of the pages is not related to the number of visits opens a very interesting scenario for further research work. Since, by definition, PageRank should give us the visit time of a page, and since it is actually completely independent of the number of visits, we wonder whether PageRank is a good measure of the authoritativeness of the pages in wikigraphs, and which modifications should be introduced in order to tune its performance.

•Wikipedia growth model

Page 35: Data and networks GIACS Conference Palermo 9-4-08

•River Networks

Page 39: Data and networks GIACS Conference Palermo 9-4-08

From satellite images one gets Digital Elevation Models (DEM)

156.4 132.4 111.4

170.8 161.3 108.2

182.4 154.5 106.0

From the DEM a spanning tree is computed (via steepest descent). From the spanning tree, the number of points uphill is computed

2 3 4

1 1 6

1 2 9
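The two grids above can be reproduced with a short sketch. It assumes that each cell drains to the lowest of its four nearest neighbours (diagonal neighbours are not considered) and that the count includes the cell itself; with these assumptions the elevations of the slide give exactly the counts of the slide. The function name is illustrative.

```python
def drainage_areas(dem):
    """Number of cells draining through each cell (the cell itself included).

    dem: list of rows of elevations. Each cell drains to the lowest of its
    four nearest neighbours if that neighbour is lower (steepest descent);
    a cell with no lower neighbour is an outlet.
    """
    rows, cols = len(dem), len(dem[0])
    downhill = {}
    for r in range(rows):
        for c in range(cols):
            best, best_h = None, dem[r][c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and dem[rr][cc] < best_h:
                    best, best_h = (rr, cc), dem[rr][cc]
            downhill[(r, c)] = best

    area = [[1] * cols for _ in range(rows)]
    # Visit cells from the highest to the lowest so that every cell has
    # collected all uphill contributions before passing them downstream.
    order = sorted(downhill, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in order:
        target = downhill[(r, c)]
        if target is not None:
            area[target[0]][target[1]] += area[r][c]
    return area

dem = [[156.4, 132.4, 111.4],
       [170.8, 161.3, 108.2],
       [182.4, 154.5, 106.0]]
areas = drainage_areas(dem)   # [[2, 3, 4], [1, 1, 6], [1, 2, 9]]
```

The downhill pointers form the spanning tree, and the accumulated counts are the drainage areas A whose distribution is studied for river networks.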

•River Networks

Page 40: Data and networks GIACS Conference Palermo 9-4-08

HACK’S LAW: $L_{\parallel} \sim A^{h}$

•River Networks

Page 42: Data and networks GIACS Conference Palermo 9-4-08

Data on Mars topography were collected through the Mars Orbiter Laser Altimeter (MOLA)

•River Networks

Page 45: Data and networks GIACS Conference Palermo 9-4-08

The result is that we can distinguish regions whose DEM networks have properties similar to those of river networks on Earth.

For rivers on Earth:

$P(A) \sim A^{-1.43}$

•River Networks

Page 46: Data and networks GIACS Conference Palermo 9-4-08

THE LIQUIDITY MARKET

Monetary Policy

Banks get liquidity from ECB through auctions

Monetary policy realised by ECB to control interest rates

BANKS MANAGE THEIR LIQUIDITY IN THE INTERBANK MARKET

Reserves

ECB

Page 47: Data and networks GIACS Conference Palermo 9-4-08

The Market

Money Market

•The EUROPEAN CENTRAL BANK provides LIQUIDITY to European banks through weekly auctions.
•EVERY BANK must DEPOSIT with its NATIONAL CENTRAL BANK 2% of all deposits and debts issued in the last two years. These reserves are supposed to help in the case of liquidity shocks.
•The 2% value fluctuates in time and is recomputed every month.

Banks sell and buy liquidity to adjust their liquidity needs and at the same time tend to reduce the value of the reserve.

ECB

Page 48: Data and networks GIACS Conference Palermo 9-4-08

The Market

Market Data

The interbank markets are basically managed by each European country. These markets are in almost all cases phone-based, which means that each bank has some brokers doing their transactions by phone. The only exception is the Italian market, which is totally screen-based, implying that each bank's operator can see real-time quotes of all other banks and carry out its transactions.

The recent paper by Boss et al. investigates the network of overall credit relationships in the Austrian interbank market. In their study the authors analyze all the liabilities among 900 banks for ten quarterly single-month periods between 2000 and 2003. They find a power-law distribution of contract sizes, and a power-law decay of the distribution of incoming and outgoing links (a link between two banks exists if the banks have an overall exposure with each other). Furthermore, they show that the most vulnerable vertices are those with the highest centrality (measured by the number of paths that go through them).

A different issue has been explored by Cocco et al., who have investigated the nature of lending relationships in the fragmented Portuguese interbank market over the period 1997-2001. In fragmented markets the amount and the interest rate of each loan are agreed on a one-to-one basis between borrowing and lending institutions. Other banks do not have access to the same terms, and no public information regarding the loan is available. The authors showed that frequent and repeated interactions between the same banks appear with a probability higher than that expected for random matching. In addition, they found that during illiquid periods, and in particular during the Russian financial crisis, preferential lending relationships increased.

Page 49: Data and networks GIACS Conference Palermo 9-4-08

The Market

Market Data

Italian Interbank Money Market

Banks operating on the Italian market: this market has been fully electronic for interbank deposits since 1990 (e-Mid)

*) Daily volume: 18 billion Euros

*) 200 participants

We report here the analysis of 196 Italian banks (plus 18 banks from abroad that interact with them), which made 85202 transactions in 2000.

Page 50: Data and networks GIACS Conference Palermo 9-4-08

INTRODUCTION

Time activity

Two time scales: the day and the one-month maintenance period

Page 51: Data and networks GIACS Conference Palermo 9-4-08

Statistical Properties

Market Data

The network shows a rather peculiar architecture. The banks form a disassortative network, where large banks interact mostly with small ones.

Page 52: Data and networks GIACS Conference Palermo 9-4-08

Statistical Properties

Market Data

Actually the banks form different groups roughly related to their “size” when considering the average volume of money exchanged.

Page 53: Data and networks GIACS Conference Palermo 9-4-08

Statistical Properties

Degree Distributions

Using the latter quantity we can divide the banks into four groups (the same number of classes as in the Bank of Italy classification): Group 1 with volume in the range 0-23 million Euro per day, Group 2 in the range 23-70 million Euro per day, Group 3 in the range 70-165 million Euro per day, Group 4 over 165 million Euro per day. In this way we find an overlap of more than 90% between the two classifications.
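As a small illustration, the grouping by average daily volume amounts to a threshold lookup. The thresholds are those quoted above; how a volume falling exactly on a boundary is classified is not stated, so assigning it to the higher group is an assumption of this sketch, as is the function name.

```python
import bisect

THRESHOLDS = [23.0, 70.0, 165.0]   # million Euro per day, from the slide

def volume_group(daily_volume):
    """Group 1-4; a volume equal to a threshold goes to the higher group
    (the slide does not say how boundaries are treated)."""
    return bisect.bisect_right(THRESHOLDS, daily_volume) + 1

groups = [volume_group(v) for v in (10.0, 23.0, 100.0, 500.0)]
```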

Page 54: Data and networks GIACS Conference Palermo 9-4-08

Communities

Separation of business

Two main communities emerge

Many small banks and a few large banks.

Second eigenvector of the normal matrix

Page 55: Data and networks GIACS Conference Palermo 9-4-08

Modelling

Model of bank network

We assign to each of the N nodes (N is the size of the system) a value $v_i$ drawn from the previous distribution. The origin and destination vertices of an edge are chosen with a probability $p_{ij}$ proportional to the sum of the respective sizes $v_i$ and $v_j$. In formulas:

$p_{ij} = \frac{v_i + v_j}{\sum_{i,j>i} (v_i + v_j)}$

where

$\sum_{i,j>i} (v_i + v_j) = \frac{1}{2} \sum_{i,j \neq i} (v_i + v_j) = 2(N-1)V_{tot}$

and

$V_{tot} = \frac{1}{2} \sum_i v_i$
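A minimal sketch of the edge-selection rule, taking the probability of a pair to be (v_i + v_j) divided by the sum of the same quantity over all pairs i < j. Whether repeated draws of the same pair are allowed is not specified, so sampling with repetition is an assumption here, as are the function names and the toy sizes.

```python
import itertools
import random

def pair_probabilities(sizes):
    """p_ij = (v_i + v_j) / sum_{i<j} (v_i + v_j) for every pair i < j."""
    pairs = list(itertools.combinations(range(len(sizes)), 2))
    norm = sum(sizes[i] + sizes[j] for i, j in pairs)
    return {(i, j): (sizes[i] + sizes[j]) / norm for i, j in pairs}

def sample_edges(sizes, n_edges, seed=0):
    """Draw n_edges pairs (with repetition) according to p_ij."""
    rng = random.Random(seed)
    probs = pair_probabilities(sizes)
    pairs = list(probs)
    return rng.choices(pairs, weights=[probs[p] for p in pairs], k=n_edges)

sizes = [1.0, 2.0, 3.0, 10.0]          # toy sizes v_i (hypothetical values)
p = pair_probabilities(sizes)
drawn = sample_edges(sizes, 50)
```

For these toy sizes the normalization equals (N - 1) times the sum of the sizes, that is 3 × 16 = 48, so the pair formed by the two largest nodes has probability 13/48.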

Page 56: Data and networks GIACS Conference Palermo 9-4-08

Modelling

Market Data

Page 57: Data and networks GIACS Conference Palermo 9-4-08

MODELLING

Model and clustering

To quantify the agreement between experimental and simulated networks, we define an overlap parameter m specifying how well the model reproduces the observed clustering. We proceed in the following way.

We define a 4 × 4 weighted matrix E, where the weights represent the number of connections between groups.

In order to measure the overlap between the matrices obtained from the data and from the computer model, we define a distance based on the differences between the elements of the matrices.

Page 58: Data and networks GIACS Conference Palermo 9-4-08

MODELLING

Model and clustering

We can define a distance between the number of intergroup edges in the experimental data and in the numerical simulation:

$d = \sum_{g,k \ge g} \left| E^{ex}_{g,k} - E^{num}_{g,k} \right|$

The sum of all elements is equal to $E_{tot}$ in both cases:

$\sum_{g,k \ge g} E^{ex}_{g,k} = \sum_{g,k \ge g} E^{num}_{g,k} = E_{tot}$

Therefore the maximum possible difference is $2E_{tot}$. This happens when all the links are between two groups in one case and between two other groups in the other. We use this maximum value to normalize the above expression and we then define the overlap parameter m: $m = 1 - d/(2E_{tot})$

WE HAVE AN OVERLAP m=98%
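The overlap parameter can be computed as in the sketch below, which takes d to be the sum of absolute differences over the upper triangle (diagonal included) of the two group matrices. The function name and the toy matrices are illustrative and do not reproduce the actual data.

```python
def overlap(E_ex, E_num):
    """m = 1 - d / (2 * E_tot) with d = sum_{g, k >= g} |E_ex - E_num|.

    Both arguments are square symmetric matrices (lists of rows) whose
    upper triangles, diagonal included, sum to the same E_tot.
    """
    size = len(E_ex)
    upper = [(g, k) for g in range(size) for k in range(g, size)]
    e_tot = sum(E_ex[g][k] for g, k in upper)
    d = sum(abs(E_ex[g][k] - E_num[g][k]) for g, k in upper)
    return 1.0 - d / (2.0 * e_tot)

same = overlap([[4, 6], [6, 10]], [[4, 6], [6, 10]])        # identical
disjoint = overlap([[10, 0], [0, 0]], [[0, 0], [0, 10]])    # no common links
```

Identical matrices give m = 1, and matrices whose links sit in entirely different group pairs give m = 0, the two extremes of the normalization.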

Page 59: Data and networks GIACS Conference Palermo 9-4-08

MODELLING

Model and clustering

To evaluate the relevance of the division in classes, we have to compare the value of $E_{g,k}$ with the corresponding quantity $E^{null}_{g,k}$ for a network where there is no division in classes (null hypothesis). The analytical expression for the null case is $E^{null}_{g,k} = E_{tot}/10$, where 10 is the number of possible couplings between the 4 groups. The comparison between the two networks shows that a division in groups emerges in the real case: the table reports the value of $E_{g,k}/E_{tot}$, in percent, for each possible combination of groups. In the null case, each element of the same matrix should be equal to 10.

Group   1   2   3   4
1       0   6   4   8
2       6   3   8  17
3       4   8   5  27
4       8  17  27  22

Page 60: Data and networks GIACS Conference Palermo 9-4-08

CONCLUSIONS

Market Data

Financial Networks can help

1. In distinguishing behaviour of different markets

2. In visualizing important features such as the business role

3. In testing the validity of market models

They might be an example of scale-free networks even more general than those described by growth and preferential attachment.

Page 61: Data and networks GIACS Conference Palermo 9-4-08

CONCLUSIONS

Thanks to Giulie

Giulia De Masi, Dep. Economics, Università delle Marche, Italy

Giulia Iori, Department of Economics, School of Social Science, City University, London, UK