The Science of Complex Networks and the Internet: Lies, Damned Lies, and Statistics. Walter Willinger, AT&T Labs-Research, walter@research.att.com

Post on 27-Dec-2015


The Science of Complex Networks and the Internet:
Lies, Damned Lies, and Statistics

Walter Willinger
AT&T Labs-Research
walter@research.att.com


Outline

• The Science of Complex Networks (“Network Science”)

• What “Network Science” has to say about the Internet

• What “engineering” has to say about the Internet

• Engineered vs. random network models

• Implications

Acknowledgments

• John Doyle (Caltech)
• David Alderson (Naval Postgraduate School)
• Steven Low (Caltech)
• Yin Zhang (Univ. of Texas at Austin)
• Matthew Roughan (U. Adelaide, Australia)
• Anja Feldmann (TU Berlin)
• Lixia Zhang (UCLA)
• Reza Rejaie (Univ. of Oregon)
• Mauro Maggioni (Duke Univ.)
• Bala Krishnamurthy, Alex Gerber, Shubho Sen, Dan Pai (AT&T)
• … and many of their students and postdocs


NETWORK SCIENCE

http://www.nap.edu/catalog/11516.html

• “First, networks lie at the core of the economic, political, and social fabric of the 21st century.”

• “Second, the current state of knowledge about the structure, dynamics, and behaviors of both large infrastructure networks and vital social networks at all scales is primitive.”

• “Third, the United States is not on track to consolidate the information that already exists about the science of large, complex networks, much less to develop the knowledge that will be needed to design the networks envisaged…”

January, 2006

“Network Science” in Theory …

• What? “The study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena.” (National Research Council Report, 2006)
• Why? “To develop a body of rigorous results that will improve the predictability of the engineering design of complex networks and also speed up basic research in a variety of applications areas.” (National Research Council Report, 2006)
• Who? Physicists (statistical physics), mathematicians (graph theory), computer scientists (algorithm design), etc.

Basic Questions Asked by Network Scientists

Question 1

To what extent does there exist a “network structure” that is responsible for large-scale properties in complex systems?

• Performance
• Robustness
• Adaptability / Evolvability
• “Complexity”

Basic Questions Asked by Network Scientists (cont.)

Question 2

Are there “universal laws” governing the structure (and resulting behavior) of complex networks? To what extent is self-organization responsible for the emergence of system features not explained from a traditional (i.e., reductionist) viewpoint?

Basic Questions Asked by Network Scientists (cont.)

Question 3

How can one assess the vulnerabilities or fragilities inherent in these complex networks in order to avoid “rare yet catastrophic” disasters? More practically, how should one design, organize, build, and manage complex networks?

Observation

• The questions motivating recent work in “Network Science” are “the right questions”
– network structure and function
– technological, social, and biological
• The issue is whether or not “Network Science” in its current form has been successful in providing scientifically solid answers to these (and other) questions.
• Our litmus test for examining this issue
– Applications of the current “Network Science” approach to real systems of interest (e.g., Internet)

As scientists, why should we care?

• “Network Science” as a new scientific discipline

Publications in Network Science Literature by Discipline
(As recorded by the Web of Science on October 1, 2007; courtesy D. Alderson)

[Figure: journal publications (cumulative) by year, 1998-2007*, broken down by discipline]

Discipline                      1998  1999  2000  2001  2002  2003  2004  2005  2006  2007*  Total
"high impact"                      1     1     5     4    17    13    22    16     9      4     92
physics                            1     7    26    62   124   139   230   260   350    286   1485
biology, chemistry, medicine       0     1     4    16    22    31    67    80    94     77    392
computer science                   0     1     2     7    10    22    47    61    64     19    233
sociology, economics               0     1     2     6     7    11    14    22    15     16     94
engineering                        0     0     1     2     7     4    13    15    22     12     76
complex systems                    0     1     1     2     3     7    11    13    18     22     78
applied mathematics                0     0     0     0     2     6     6    10    29     21     74
earth science                      0     1     1     2     7     4     6    11    11      0     43
business, management               0     0     0     1     2     1     4     6     9      1     24
Total                              2    13    42   102   201   238   420   494   621    458   2591

Most Cited Publications in Network Science Literature
(As recorded by the Web of Science on October 1, 2007; courtesy D. Alderson)

Article (cites)

1. Watts, DJ; Strogatz, SH. 1998. Collective dynamics of 'small-world' networks. NATURE 393(668). (2244)
2. Barabasi, AL; Albert, R. 1999. Emergence of scaling in random networks. SCIENCE 286(543). (2110)
3. Albert, R; Barabasi, AL. 2002. Statistical mechanics of complex networks. REV. OF MODERN PHYSICS 74(1). (1972)
4. Newman, MEJ. 2003. The structure and function of complex networks. SIAM REVIEW 45(2). (960)
5. Jeong, H; Tombor, B; Albert, R; et al. 2000. The large-scale organization of metabolic networks. NATURE 407(6804). (903)
6. Strogatz, SH. 2001. Exploring complex networks. NATURE 410(6825). (884)
7. Albert, R; Jeong, H; Barabasi, AL. 2000. Error and attack tolerance of complex networks. NATURE 406(6794). (747)
8. Dorogovtsev, SN; Mendes, JFF. 2002. Evolution of networks. ADV. IN PHYSICS 51(4). (636)
9. Giot, L; Bader, JS; Brouwer, C; Chaudhuri, A; Kuang, B; et al. 2003. A protein interaction map of Drosophila melanogaster. SCIENCE 302(5651). (550)
10. Milo, R; Shen-Orr, S; Itzkovitz, S; Kashtan, N; Chklovskii, D; Alon, U. 2002. Network motifs: Simple building blocks of complex networks. SCIENCE 298(5594). (489)
11. Amaral, LAN; et al. 2000. Classes of small-world networks. PROC. NAT. ACAD. SCI. 97(21). (475)
12. Ravasz, E; Somera, AL; Mongru, DA; Oltvai, ZN; Barabasi, AL. 2002. Hierarchical organization of modularity in metabolic networks. SCIENCE 297(5586). (457)
13. Pastor-Satorras, R; Vespignani, A. 2001. Epidemic spreading in scale-free networks. PHYS. REV. LETT. 86(14). (440)
14. Tong, AHY; et al. 2004. Global mapping of the yeast genetic interaction network. SCIENCE 303(5659). (412)
15. Barabasi, AL; Albert, R; Jeong, H. 1999. Mean-field theory for scale-free random networks. PHYSICA A 272. (364)

13279

As scientists, why should we care?

• “Network Science” as a new scientific discipline

• “Network Science” for the masses …


The “New Science of Networks”

As scientists, why should we care?

• “Network Science” as a new scientific discipline

• “Network Science” for the masses …

• “Network Science” for the (Internet) experts …


The “New Science of Networks”

As scientists, why should we care?

• “Network Science” as a new scientific discipline

• “Network Science” for the masses …

• “Network Science” for the Internet experts …

• “Network Science” for undergraduate/graduate students in Computer Science/Electrical Engineering


The “New Science of Networks”

• New course offerings
– http://www.cc.gatech.edu/classes/AY2010/cs8803ns_fall/
– http://www.netscience.usma.edu/about.php
– http://nicomedia.math.upatras.gr/courses/mnets/index_en.html
– http://www-personal.umich.edu/~mejn/courses/2004/cscs535/index.html
– http://www.phys.psu.edu/~ralbert/phys597_09-fall

As scientists, why should we care?

• “Network Science” as a new scientific discipline …

• “Network Science” for the masses …

• “Network Science” for the Internet experts …

• “Network Science” for undergraduate/graduate students in Computer Science/Electrical Engineering

• … and most importantly, because we want to know how serious a science “Network Science” is ….


The Main Points of this Talk …

I will show that in the case of the Internet …

The application of “Network Science” in its current form has led to conclusions that are not controversial but simply wrong.

I will deconstruct the existing arguments and generalize the potential pitfalls common to “Network Science.”

I will also be constructive and illustrate an alternative approach to “Network Science” based on engineering considerations.


What does “Network Science” say about the Internet

• Illustration with a case study
– Problem: Internet router-level topology
– Approach: Measurement-based
– Result: Predictive models with far-reaching implications
• Textbook example for the power of “Network Science”
– Appears solid and rigorous
– Appealing approach with surprising findings
– Directly applicable to other domains


What does “Network Science” say about the Internet

• Measurement technique
– traceroute tool
– traceroute discovers compliant (i.e., IP) routers along the path between selected network host computers


Running traceroute: Basic Experiment

• Basic “experiment”
– Select a source and destination
– Run traceroute tool
• Example
– Run traceroute from my machine in Florham Park, NJ, USA to www.duke.edu

Running “traceroute www.duke.edu” from NJ

1 fp-core.research.att.com (135.207.16.1) 2 ms 1 ms 1 ms
2 ngx19.research.att.com (135.207.1.19) 1 ms 0 ms 0 ms
3 12.106.32.1 1 ms 1 ms 1 ms
4 12.119.12.73 2 ms 2 ms 2 ms
5 tbr1.n54ny.ip.att.net (12.123.219.129) 4 ms 5 ms 3 ms
6 ggr7.n54ny.ip.att.net (12.122.88.21) 3 ms 3 ms 3 ms
7 192.205.35.98 4 ms 4 ms 8 ms
8 jfk-core-02.inet.qwest.net (205.171.30.5) 3 ms 3 ms 4 ms
9 dca-core-01.inet.qwest.net (67.14.6.201) 11 ms 11 ms 11 ms
10 dca-edge-04.inet.qwest.net (205.171.9.98) 11 ms 15 ms 11 ms
11 gw-dc-mcnc.ncren.net (63.148.128.122) 18 ms 18 ms 18 ms
12 rlgh7600-gw-to-rlgh1-gw.ncren.net (128.109.70.38) 18 ms 18 ms 18 ms
13 roti-gw-to-rlgh7600-gw.ncren.net (128.109.70.18) 20 ms 20 ms 20 ms
14 art1sp-tel1sp.netcom.duke.edu (152.3.219.118) 23 ms 20 ms 20 ms
15 webhost-lb-01.oit.duke.edu (152.3.189.3) 21 ms 38 ms 20 ms


traceroute-paths: (many) source-destination pairs


What does “Network Science” say about the Internet

• Measurement technique
– traceroute tool
– traceroute discovers compliant (i.e., IP) routers along the path between selected network host computers
• Available data: from large-scale traceroute experiments
– Pansiot and Grad (router-level, around 1995, France)
– Cheswick and Burch (mapping project since 1997, Bell Labs)
– Mercator (router-level, around 1999, USC/ISI)
– Skitter (ongoing mapping project, CAIDA/UCSD)
– Rocketfuel (state-of-the-art router-level maps of individual ISPs, UW Seattle)
– Dimes (ongoing EU project)

http://research.lumeta.com/ches/map/


http://www.isi.edu/scan/mercator/mercator.html


http://www.caida.org/tools/measurement/skitter/

http://www.cs.washington.edu/research/networking/rocketfuel/bb

http://www.cs.washington.edu/research/networking/rocketfuel/


What does “Network Science” say about the Internet (cont.)

• Inference
– Given: traceroute-based map (graph) of the router-level Internet (Internet service provider)
– Wanted: Metrics/statistics that characterize the inferred connectivity maps
– Main metric: Node degree distribution
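As a rough illustration of this inference step, the sketch below turns traceroute hop lists into an IP-level graph and tabulates its node degree distribution. The helper names (`hop_ips`, `add_path`, `degree_distribution`) and the regular expression are assumptions made for this example and are not taken from any of the mapping projects. The nodes here are interface IP addresses, not routers.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern: the first dotted-quad IPv4 address on a traceroute line.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def hop_ips(traceroute_lines):
    """Return the responding IP address of each hop, in path order."""
    ips = []
    for line in traceroute_lines:
        match = IP_RE.search(line)
        if match:
            ips.append(match.group(0))
    return ips

def add_path(adjacency, ips):
    """Add an undirected link between each pair of consecutive hops."""
    for a, b in zip(ips, ips[1:]):
        adjacency[a].add(b)
        adjacency[b].add(a)

def degree_distribution(adjacency):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(neighbors) for neighbors in adjacency.values())

# First four hops of the traceroute to www.duke.edu shown earlier.
sample = [
    "1 fp-core.research.att.com (135.207.16.1) 2 ms 1 ms 1 ms",
    "2 ngx19.research.att.com (135.207.1.19) 1 ms 0 ms 0 ms",
    "3 12.106.32.1 1 ms 1 ms 1 ms",
    "4 12.119.12.73 2 ms 2 ms 2 ms",
]
adjacency = defaultdict(set)
add_path(adjacency, hop_ips(sample))
print(degree_distribution(adjacency))
```

In a large-scale experiment, `add_path` would be called once per source-destination pair, and the resulting degree counts are what get plotted on log-log axes.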


http://www.isi.edu/scan/mercator/mercator.html


What does “Network Science” say about the Internet (cont.)

• Inference
– Given: traceroute-based map (graph) of the router-level Internet (Internet service provider)
– Wanted: Metrics/statistics that characterize the inferred connectivity maps
– Main metric: Node degree distribution
• Surprising finding
– Inferred node degree distributions follow a power law
– A few nodes have a huge degree, while the majority of nodes have a small degree


Power Laws and Internet Topology

Source: Faloutsos et al (1999)

Most nodes have few connections

A few nodes have lots of connections


What does “Network Science” say about the Internet (cont.)

• Inference
– Given: traceroute-based map (graph) of the router-level Internet (Internet service provider)
– Wanted: Metrics/statistics that characterize the inferred connectivity maps
– Main metric: Node degree distribution
• Surprising finding
– Inferred node degree distributions follow a power law
– A few nodes have a huge degree, while the majority of nodes have a small degree
• Motivation for developing new network/graph models
– Dominant graph models: Erdos-Renyi random graphs
– But: Node degrees of Erdos-Renyi random graph models follow a Poisson distribution
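To make the Poisson point concrete, here is a minimal sketch that samples an Erdos-Renyi graph G(n, p) and prints the observed degree frequencies next to the Poisson probabilities with mean (n - 1)p. The values of `n`, `p`, and the random seed are arbitrary choices for illustration. Strictly, each degree is Binomial(n - 1, p), which is close to Poisson when n is large and p is small.

```python
import math
import random
from collections import Counter

def erdos_renyi_degrees(n, p, rng):
    """Sample G(n, p): each possible link exists independently with probability p."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 1000, 0.004                 # illustrative values; mean degree is about 4
degrees = erdos_renyi_degrees(n, p, random.Random(1))
observed = Counter(degrees)
lam = (n - 1) * p
for k in range(max(degrees) + 1):
    # degree, observed fraction of nodes, Poisson probability
    print(k, observed[k] / n, round(poisson_pmf(k, lam), 4))
```

The degrees concentrate tightly around the mean, so no node comes close to the "huge degree" seen in the inferred maps.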


What does “Network Science” say about the Internet (cont.)

• New class of network models
– Preferential attachment (PA) growth model
  • Incremental growth: New nodes/links are added one at a time
  • Preferential attachment: a new node is more likely to connect to an already highly connected node (p(k) ∝ degree of node k)
– Captures popular notion of “the rich get richer”
– There exist many variants of this basic PA model
– Generally referred to as “scale-free” network models
• Key features of PA-type network models
– Randomness enters via attachment mechanism
– Exhibit power law node degree distributions
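The two ingredients above (incremental growth and degree-proportional attachment) can be sketched in a few lines. This is one simple variant written for illustration, not the code of any particular paper; the starting graph, the parameter `m` (links per new node), and the seed are assumptions.

```python
import random

def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    # Start from a complete graph on m + 1 nodes so every node has degree > 0.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `endpoints` once per incident link, so a uniform
    # draw from this list picks a node with probability proportional to degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

def degrees_from_edges(n, edges):
    """Count how many links touch each node."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

n, m = 2000, 2                     # illustrative sizes
edges = preferential_attachment(n, m, random.Random(7))
degree = degrees_from_edges(n, edges)
print("mean degree:", sum(degree) / n, "max degree:", max(degree))
```

Unlike the Erdos-Renyi case, the maximum degree ends up far above the mean: early nodes keep attracting links.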


PA-type Networks


What does “Network Science” say about the Internet (cont.)

• Model validation
– The models “fit the data” because they reproduce the observed node degree distributions
– The models are simple and parsimonious
• PA-type models have resulted in highly publicized claims about the Internet and its properties
– High-degree nodes form a hub-like core
– Fragile/vulnerable to targeted node removal
– Achilles’ heel
– Zero epidemic threshold
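The "targeted node removal" claim is a statement about hub-dominated graphs. The sketch below shows the effect on a hypothetical toy graph (two hubs with five leaves each), not on Internet data: removing a hub fragments the graph, while removing a leaf barely matters. The graph and the function name `largest_component` are made up for this example.

```python
from collections import deque

def largest_component(adjacency, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                       # breadth-first search from `start`
            node = queue.popleft()
            size += 1
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best

# Hypothetical toy graph: hubs 0 and 1 are linked to each other,
# and each hub has five degree-one leaves.
adjacency = {0: {1}, 1: {0}}
for leaf in range(2, 12):
    hub = leaf % 2
    adjacency[hub].add(leaf)
    adjacency[leaf] = {hub}

hub_node = max(adjacency, key=lambda v: len(adjacency[v]))
print(largest_component(adjacency))               # intact graph
print(largest_component(adjacency, {hub_node}))   # targeted removal of a hub
print(largest_component(adjacency, {11}))         # removal of a single leaf
```

Whether this picture says anything about the real Internet depends entirely on whether such hubs exist in its core, which is the question the rest of the talk examines.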


Cover Story: Nature 406, 2000.


Beyond the Internet …

• Social networks
• Information networks
• Biological networks
• Technological networks
– U.S. electrical power grid (data source: FEMA)


U.S. Electrical Power Grid


Beyond the Internet …

• Social networks
• Information networks
• Biological networks
• Technological networks
– U.S. electrical power grid (data source: FEMA)
– Western U.S. power grid: 4941 nodes, 6594 links
– J.-W. Wang and L.-L. Rong, “Cascade-based attack vulnerability on the US power grid,” Safety Science 47, 2009
– NYT article, April 18, 2010: “Academic paper in China sets off alarms in U.S.”
• Interdependent networks (e.g., Internet and power grid)
– S.V. Buldyrev, R. Parshani, G. Paul, H.E. Stanley and S. Havlin, “Catastrophic cascade of failures in interdependent networks,” Nature 464 (April 2010)


On the Impact of “Network Science” …

• On the scientific community as a whole
– General excitement (huge number of papers)
– The Internet story has been repeated in the context of biological networks, social networks, etc.
– Renewed hope that large-scale complex networks across the domains (e.g., engineering, biology, social sciences) exhibit common features (universal properties).


On the Impact of “Network Science” …

NYT 4/18/2010


On the Impact of “Network Science” …

• On the scientific community as a whole
– General excitement (huge number of papers)
– The Internet story has been repeated in the context of biological networks, social networks, etc.
– Renewed hope that large-scale complex networks across the domains (e.g., engineering, biology, social sciences) exhibit common features (universal properties).
• On domain experts (e.g., Internet researchers, biologists)
– General disbelief
– We “know” the claims are not true …
– Back to basics ….


Basic Question

Do the available Internet-related connectivity measurements and their analysis support the sort of claims that can be found in the existing complex networks literature?

Key Issues

• What about data hygiene?
• What about statistical rigor?
• What about model validation?

On Data Hygiene


On Measuring Internet Connectivity

• No central agency/repository
• Economic incentive for ISPs to obscure network structure
• Direct inspection is typically not possible
• Based on measurement experiments, hacks
• Mismatch between what we want to measure and what we can measure
• Specific examples covered in this talk
– Physical connectivity (routers, switches, links)


Measurements: traceroute tool

traceroute www.duke.edu
traceroute to www.duke.edu (152.3.189.3), 30 hops max, 60 byte packets
1 fp-core.research.att.com (135.207.16.1) 2 ms 1 ms 1 ms
2 ngx19.research.att.com (135.207.1.19) 1 ms 0 ms 0 ms
3 12.106.32.1 1 ms 1 ms 1 ms
4 12.119.12.73 2 ms 2 ms 2 ms
5 tbr1.n54ny.ip.att.net (12.123.219.129) 4 ms 5 ms 3 ms
6 ggr7.n54ny.ip.att.net (12.122.88.21) 3 ms 3 ms 3 ms
7 192.205.35.98 4 ms 4 ms 8 ms
8 jfk-core-02.inet.qwest.net (205.171.30.5) 3 ms 3 ms 4 ms
9 dca-core-01.inet.qwest.net (67.14.6.201) 11 ms 11 ms 11 ms
10 dca-edge-04.inet.qwest.net (205.171.9.98) 11 ms 15 ms 11 ms
11 gw-dc-mcnc.ncren.net (63.148.128.122) 18 ms 18 ms 18 ms
12 rlgh7600-gw-to-rlgh1-gw.ncren.net (128.109.70.38) 18 ms 18 ms 18 ms
13 roti-gw-to-rlgh7600-gw.ncren.net (128.109.70.18) 20 ms 20 ms 20 ms
14 art1sp-tel1sp.netcom.duke.edu (152.3.219.118) 23 ms 20 ms 20 ms
15 webhost-lb-01.oit.duke.edu (152.3.189.3) 21 ms 38 ms 20 ms


Traceroute measurements revisited (1)

• traceroute is strictly about IP-level connectivity
– Originally developed by Van Jacobson (1988)
– Designed to trace out the route to a host
• Using traceroute to map the router-level topology
– Engineering hack
– Example of what we can measure, not what we want to measure!
• Basic problem #1: IP alias resolution problem
– How to map interface IP addresses to IP routers
– Largely ignored or badly dealt with in the past
– New efforts in 2008 for better heuristics …
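A small sketch of why alias resolution matters for degree statistics: the same set of observed IP-level links yields very different node degrees depending on whether interface addresses are merged into routers. The addresses (from documentation ranges), the router name `R`, and the function `router_level_degrees` are hypothetical; real alias resolution relies on probing heuristics that are not modeled here.

```python
from collections import defaultdict

def router_level_degrees(ip_links, alias_of):
    """Collapse interface-level links to router-level links; return router degrees.

    ip_links: iterable of (ip_a, ip_b) pairs seen as consecutive traceroute hops.
    alias_of: dict mapping an interface IP to a router name; unmapped IPs are
              treated as routers of their own (an unresolved alias).
    """
    neighbors = defaultdict(set)
    for a, b in ip_links:
        ra, rb = alias_of.get(a, a), alias_of.get(b, b)
        if ra != rb:
            neighbors[ra].add(rb)
            neighbors[rb].add(ra)
    return {router: len(peers) for router, peers in neighbors.items()}

# Hypothetical data: router R has three interfaces, each observed on a
# different traceroute path.
ip_links = [
    ("198.51.100.1", "192.0.2.1"), ("192.0.2.1", "203.0.113.1"),
    ("198.51.100.2", "192.0.2.2"), ("192.0.2.2", "203.0.113.2"),
    ("198.51.100.3", "192.0.2.3"), ("192.0.2.3", "203.0.113.3"),
]
aliases = {"192.0.2.1": "R", "192.0.2.2": "R", "192.0.2.3": "R"}

print(router_level_degrees(ip_links, {}))        # no alias resolution
print(router_level_degrees(ip_links, aliases))   # interfaces merged into R
```

Without resolution the map shows three separate degree-2 nodes; with it, a single degree-6 router. Errors in either direction distort the inferred degree distribution.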


Interfaces 1 and 2 belong to the same router

Example: Abilene Network


IP Alias Resolution Problem for Abilene (thanks to Adam Bender)


Traceroute measurements revisited (2)

• traceroute is strictly about IP-level connectivity
• Basic problem #2: Layer-2 technologies (e.g., MPLS, ATM)
– MPLS is an example of a circuit technology that hides the network’s physical infrastructure from IP
– Sending traceroutes through an opaque Layer-2 cloud results in the “discovery” of high-degree nodes, which are simply an artifact of an imperfect measurement technique.
– This problem has been largely ignored in all large-scale traceroute experiments to date.
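The artifact can be shown with a deliberately simplified model. Assume, purely for illustration, that every edge router has a single physical link into a Layer-2 cloud, represented here by one node named `core`, and that traceroute sees each pair of edge routers as one IP hop apart. The router names and both helper functions are hypothetical.

```python
from itertools import combinations

def physical_links(edge_routers, core="core"):
    """Assumed physical layout: each edge router has one link into the cloud."""
    return {frozenset((r, core)) for r in edge_routers}

def ip_level_links(edge_routers):
    """What traceroute infers when the cloud is invisible at the IP layer:
    every pair of edge routers appears to be directly connected."""
    return {frozenset(pair) for pair in combinations(edge_routers, 2)}

def degree(node, links):
    """Number of links in `links` that touch `node`."""
    return sum(1 for link in links if node in link)

edge_routers = [f"edge{i}" for i in range(8)]           # hypothetical names
print(degree("edge0", physical_links(edge_routers)))    # physical degree
print(degree("edge0", ip_level_links(edge_routers)))    # inferred degree
```

With 8 edge routers, a node whose physical degree is 1 shows up with inferred degree 7, and the inflation grows with the size of the cloud.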



Traceroute measurements revisited (3)

• The irony of traceroute measurements
– The high-degree nodes in the middle of the network that traceroute reveals are not for real …
– If there are high-degree nodes in the network, they can only exist at the edge of the network where they will never be revealed by generic traceroute-based experiments …
• Additional sources of errors
– Bias in (mathematical abstraction of) traceroute
– Has been a major focus within CS/Networking literature
– Non-issue in the presence of above-mentioned problems


Traceroute measurements revisited (4)

• Bottom line
– (Current) traceroute measurements are of little use for inferring router-level connectivity
– It is unlikely that future traceroute measurements will be more useful for the purpose of router-level inference
• Lessons learned
– Key question: Can you trust the available data?
– Critical role of Data Hygiene in the Petabyte Age
– Corollary: Petabytes of garbage = garbage
– Data hygiene is often viewed as “dirty/unglamorous” work

On Model Validation

Taking Model Validation More Seriously …

• Criticism of conventional model validation
– For one and the same observed phenomenon, there are usually many different explanations/models
– The ability to reproduce a few graph statistics does not constitute “serious” model validation
– Model validation should be more than “data fitting”
• What constitutes “more serious” model validation?
– There is more to networks than connectivity …
– When “nodes” and “links” have specific meaning …
– What do real networks look like?


Cisco 12000 Series Routers

Chassis   Rack size   Slots   Switching Capacity
12416     Full        16      320 Gbps
12410     1/2         10      200 Gbps
12406     1/4         6       120 Gbps
12404     1/8         4       80 Gbps

• Modular in design, creating flexibility in configuration.
• Router capacity is constrained by the number and speed of line cards inserted in each slot.

Source: www.cisco.com

Router Technology Constraint
Cisco 12416 GSR, circa 2002

[Figure: log-log plot of bandwidth (Gbps) versus degree, showing the total bandwidth and the technology constraint curve, with four labelled configurations: 15 x 1-port 10 GE, 15 x 3-port 1 GE, 15 x 4-port OC12, 15 x 8-port FE. A router can have high bandwidth and low degree, or high degree and low bandwidth.]
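The trade-off between degree and bandwidth for the four labelled Cisco 12416 configurations can be worked out directly. The sketch below assumes nominal port speeds (10 GE = 10 Gbps, GE = 1 Gbps, OC12 = 0.622 Gbps, FE = 0.1 Gbps) and that every port connects to a distinct neighbor; both are simplifications made for this example.

```python
LINE_CARDS = 15  # line cards per chassis, as in the four labelled configurations

# (label, ports per line card, assumed nominal speed per port in Gbps)
CONFIGURATIONS = [
    ("15 x 1-port 10 GE", 1, 10.0),
    ("15 x 3-port 1 GE", 3, 1.0),
    ("15 x 4-port OC12", 4, 0.622),
    ("15 x 8-port FE", 8, 0.1),
]

def degree_and_bandwidth(ports_per_card, gbps_per_port, cards=LINE_CARDS):
    """Return (router degree, total port bandwidth in Gbps) for one configuration."""
    degree = cards * ports_per_card
    return degree, degree * gbps_per_port

for label, ports, speed in CONFIGURATIONS:
    degree, total = degree_and_bandwidth(ports, speed)
    print(f"{label}: degree {degree}, total bandwidth {total:.1f} Gbps")
```

Going from the first configuration to the last raises the degree from 15 to 120 while the total bandwidth drops from about 150 Gbps to about 12 Gbps, which is the high-bandwidth/low-degree versus high-degree/low-bandwidth constraint.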

Abilene Backbone Physical Connectivity (as of August 2004)

[Figure: map of the Abilene backbone. Backbone nodes: Seattle, Sunnyvale, Los Angeles, Denver, Kansas City, Houston, Indianapolis, Chicago, Atlanta, Wash D.C., New York. Attached regional, campus, and peer networks include SOX, OARNET, Merit, NYSERNet, CENIC, Pacific Northwest GigaPoP, Great Plains, MAGPI, ESnet, GEANT, and others. Link capacities are shown in four classes: 0.1-0.5 Gbps, 0.5-1.0 Gbps, 1.0-5.0 Gbps, 5.0-10.0 Gbps.]

CENIC Backbone (as of January 2004)

[Figure: map of the CENIC backbone. Router types: Cisco 750X, Cisco 12008, Cisco 12410. Sites: SAC, OAK, SVL, LAX, SDG, SLO, FRG, FRE, BAK, TUS, SOL, COR, with connections to Abilene Los Angeles and Abilene Sunnyvale. Link types: OC-3 (155 Mb/s), OC-12 (622 Mb/s), GE (1 Gb/s), OC-48 (2.5 Gb/s), 10GE (10 Gb/s).]

The Corporation for Education Network Initiatives in California (CENIC) acts as ISP for the state's colleges and universities (http://www.cenic.org).

Like Abilene, its backbone is a sparsely-connected mesh, with relatively low connectivity and minimal redundancy.

• no high-degree hubs?
• no Achilles’ heel?

70

Back to the Basic Question:

Do the available Internet-related connectivity measurements and their analysis support the sort of claims that can be found in the existing complex networks literature?

Short Answer: No!

Longer Answer:
• Real-world router-level topologies look nothing like PA-type networks
• The results derived from PA-type models of the Internet are not “controversial” – they are simply wrong!

• “The tragedy of science – the slaying of a beautiful hypothesis by an ugly fact.” (T. Huxley)


71

What Went Wrong?

• No critical assessment of available data

• Ignore all networking-related “details”
– Randomness enters via generic attachment mechanism
– Overarching desire to reproduce power law node degree distributions

• Low model validation standards
– Reproducing observed node degree distribution
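Why matching a degree distribution is a weak validation criterion can be seen in a small sketch. The two toy graphs below are hypothetical examples, not taken from the talk: they have identical node degree sequences, yet one is connected and the other is not.

```python
from collections import Counter, deque

def degree_sequence(edges):
    """Sorted list of node degrees for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())

def count_components(edges):
    """Number of connected components, found by breadth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components

# Hypothetical toy graph A: a single ring of six nodes.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
# Hypothetical toy graph B: two separate triangles on the same six nodes.
triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

same_degrees = degree_sequence(ring) == degree_sequence(triangles)
print(same_degrees, count_components(ring), count_components(triangles))
```

Both graphs give every node degree 2, so a check based only on the degree distribution cannot tell them apart, even though their connectivity differs.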


72

How to avoid such Fallacies?

• Know your data!

• Take model validation more seriously!

• Apply an engineering perspective to engineered systems!


73

Internet Modeling: An Engineering Perspective

• Surely, the way an ISP designs its physical infrastructure is not the result of a series of coin tosses …
– ISPs design their router-level topology for a purpose, namely to carry an expected traffic demand
– Randomness enters in terms of uncertainty in traffic demands
– ISPs are constrained in what they can afford to build, operate, and maintain (technology, economics).

• Decisions of ISPs are driven by objectives (performance) and reflect tradeoffs between what is feasible and what is desirable (heuristic optimization)
– Constrained optimization as modeling language
– Power law node degrees are a non-issue!


74

Heuristically Optimized Topologies (HOT)

Given realistic technology constraints on routers, how well is the network able to carry traffic?

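Step 1: Constrain to be feasible

[Figure: Abstracted Technologically Feasible Region. Log-log plot of bandwidth (Mbps) versus degree.]

Step 2: Pick traffic demand model

$x_{ij} \propto B_i B_j$

[Diagram: routers with bandwidths $B_i$ and $B_j$ exchanging traffic $x_{ij}$.]

Step 3: Compute max flow

$\max_{\rho} \sum_{i,j} x_{ij} = \max_{\rho} \sum_{i,j} \rho B_i B_j \quad \text{s.t.} \quad \sum_{i,j:\, k \in r_{ij}} x_{ij} \le B_k, \ \forall k$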
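Steps 2 and 3 can be sketched in a few lines, under stated assumptions: routing paths are fixed and given, demands follow a gravity model x_ij = rho * B_i * B_j with a single scaling factor rho, each unordered router pair is counted once, and every router k can carry at most B_k. The toy topology, the names `capacity`, `attach`, and `path`, and all numbers are hypothetical. With fixed routes, the largest feasible rho is the minimum over routers of B_k divided by the per-unit-rho load crossing k.

```python
from itertools import combinations

def max_gravity_throughput(capacity, routes):
    """Return (rho, total) for gravity demands x_ij = rho * B_i * B_j.

    capacity: router name -> bandwidth B_k (any consistent unit).
    routes:   (i, j) -> list of routers on the path r_ij, endpoints included.
    """
    load = {k: 0.0 for k in capacity}
    for (i, j), route in routes.items():
        for k in route:
            # Demand through router k per unit of rho.
            load[k] += capacity[i] * capacity[j]
    # Largest rho with rho * load_k <= B_k on every router carrying traffic.
    rho = min(capacity[k] / load[k] for k in capacity if load[k] > 0)
    total = rho * sum(capacity[i] * capacity[j] for (i, j) in routes)
    return rho, total

# Hypothetical toy network: two fast core routers, four slower edge routers.
capacity = {"c1": 100.0, "c2": 100.0,
            "e1": 10.0, "e2": 10.0, "e3": 10.0, "e4": 10.0}
attach = {"e1": "c1", "e2": "c1", "e3": "c2", "e4": "c2"}

def path(i, j):
    """Route between edge routers i and j via their core routers."""
    ci, cj = attach[i], attach[j]
    return [i, ci, j] if ci == cj else [i, ci, cj, j]

routes = {(i, j): path(i, j) for i, j in combinations(sorted(attach), 2)}
rho, total = max_gravity_throughput(capacity, routes)
print(rho, total)
```

In this toy case the edge routers, not the core routers, are the binding constraint. A full treatment would also optimize over routing and topology, which this sketch does not attempt.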


75

HOT Design Principles

[Figure: layered topology with Hosts, Edges, and Cores.]

Mesh-like core of fast, low degree routers.

High degree nodes are at the edges.


76

[Figure: node rank versus node degree on log-log axes, for the Preferential Attachment model and the HOT model.]
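A rank versus degree curve of the PA type can be produced with a short sketch of the basic preferential attachment growth rule, in which each new node links to m existing nodes chosen with probability proportional to their current degree. The function names and the parameters n, m, and seed are illustrative assumptions, not the settings behind the figure.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow an n-node graph; each new node attaches to m distinct existing
    nodes picked with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes so every node has degree > 0.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def rank_degree(edges):
    """Degrees sorted in decreasing order; list index + 1 is the node rank."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values(), reverse=True)

edges = preferential_attachment(n=200, m=2)
degrees = rank_degree(edges)
print(degrees[:5], degrees[-5:])
```

Plotting rank (1, 2, 3, ...) against `degrees` on log-log axes gives a curve of the kind labeled Preferential Attachment; the HOT curve would instead come from a designed topology and is not generated here.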


77

HOT- vs. PA-type Network Models

Features              HOT-type / Internet    PA-type models
Core nodes            Fast, low degree       Slow, high degree
High degree nodes     Edge                   Core
Degree distribution   Highly variable        Power law
Generation            Designed               Random
Performance           High throughput        Low throughput
Attack tolerance      Robust                 Fragile
Fragility             Hijack network         Attack hubs
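The "Attack tolerance" row can be illustrated with a rough sketch that removes the highest-degree node from two hypothetical toy topologies and measures the largest surviving connected component. Both edge lists are invented for illustration: `hub_core` puts one high-degree hub at the center, and `hot_like` uses a ring of low-degree core routers with high-degree routers at the edge. Neither is a validated model of any real network.

```python
def largest_component_after_top_removal(edges):
    """Remove the highest-degree node; return (largest component size,
    number of remaining nodes)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    victim = max(adj, key=lambda node: len(adj[node]))
    remaining = set(adj) - {victim}
    seen, best = set(), 0
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search that never enters the removed node
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr != victim and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best, len(remaining)

# Hypothetical hub-at-core topology: one hub, six routers, two hosts each.
hub_core = [("h", f"a{i}") for i in range(6)]
hub_core += [(f"a{i}", f"a{i}_host{j}") for i in range(6) for j in range(2)]

# Hypothetical HOT-like topology: a ring of four low-degree core routers,
# each serving one high-degree edge router with five hosts.
hot_like = [(f"c{i}", f"c{(i + 1) % 4}") for i in range(4)]
hot_like += [(f"c{i}", f"e{i}") for i in range(4)]
hot_like += [(f"e{i}", f"e{i}_host{j}") for i in range(4) for j in range(5)]

print(largest_component_after_top_removal(hub_core))
print(largest_component_after_top_removal(hot_like))
```

In the hub-at-core toy, losing the hub splits the network into small pieces. In the HOT-like toy, losing a high-degree edge router strands only its own hosts and the rest stays connected.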


78

Implications of this Engineering Perspective

• Important lessons learned
– Know your data! – they typically reflect what we can measure rather than what we would like to measure
– Avoid the allure of PA-type network models! – there exist more relevant, interesting, and rewarding network models that await discovery
– Details do matter! – layers, protocols, feedback control, etc.

• Network resilience – more than “knocking out” nodes/links
– NYC 9/11/2001, Baltimore tunnel fire (July 2001)
– Eastern US/Canada blackout (August 2003)
– Taiwan earthquake (December 2006)
– Hijack BGP (“blackholing”, YouTube and Pakistan ISP, 2008)


80

And always keep in mind …

“When exactitude is elusive, it is better to be approximately right than certifiably wrong.”

(B.B. Mandelbrot)


81

SOME RELATED REFERENCES

• L. Li, D. Alderson, W. Willinger, and J.C. Doyle. A first-principles approach to understanding the Internet’s router-level topology. Proc. ACM SIGCOMM 2004.

• J.C. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W. Willinger. The "robust yet fragile" nature of the Internet. PNAS 102(41), 2005.

• D. Alderson, L. Li, W. Willinger, and J.C. Doyle. Understanding Internet Topology: Principles, Models, and Validation. IEEE/ACM Trans. on Networking 13(6), 2005.

• R. Oliveira, D. Pei, W. Willinger, B. Zhang, and L. Zhang. In Search of the elusive Ground Truth: The Internet's AS-level Connectivity Structure. Proc. ACM SIGMETRICS 2008.

• B. Krishnamurthy and W. Willinger. What are our standards for validation of measurement-based networking research? Proc. ACM HotMetrics Workshop 2008.

• W. Willinger, D. Alderson, and J.C. Doyle. Mathematics and the Internet: A Source of Enormous Confusion and Great Potential. Notices of the AMS, Vol. 56, No. 2, 2009. Reprinted in: The Best Writing on Mathematics, Princeton University Press, 2010.

• M. Roughan, W. Willinger, O. Maennel, D. Perouli, and R. Bush. 10 Lessons from 10 years of measuring and modeling the Internet’s Autonomous Systems. IEEE JSAC Special Issue on “Measurement of Internet topologies,” 2011.