
Why We STILL Don’t Know How To Simulate Networks

Mostafa H. Ammar

College of Computing, Georgia Institute of Technology

Atlanta, GA

Disclaimer

My Personal Perspective: Networking Researcher and not Simulationist.
• Have written and used discrete event computer simulations for over 25 years
• Involved in the COMPASS project at GT for the last 7 years

The Main Message

The use of simulation has been growing in the networking community

Current shifts in the networking research landscape have increased the importance of simulation as a tool for evaluation

There is a crisis of credibility causing people to question the validity of simulations

Why and How to Fix it?

Evaluating Networks: A Spectrum

A spectrum of approaches

• Mathematical Analysis
• Computer Simulation
• Computer Emulation
• Prototype Testbed
• Real network testing/deployment

[Figure arrows along the spectrum: Increased Cost/Overhead; Decreased Realism/Accuracy]

A Brief History of Network Simulation

In the beginning, a combination of:
• Mathematical Analysis
• Small-scale prototypes
• Simulation

However, simulation was primitive and accessible only to people who had computers and knew how to program them.

Early Examples of Network Simulation

Kleinrock’s thesis (1962) used simulation to validate his Independence assumption.

“I invented effective dynamic routing procedures and also established the analytic model by which you could calculate delay . . . and to simulate it I had to make some fundamental assumptions - I simulated the hell out of it to show that the assumptions worked.”
Leonard Kleinrock, http://www.computer.org/internet/v1n3/kleinrock9702.htm


Paul Baran: On Distributed Communications: II. Digital Simulation of Hot-Potato Routing in a Broadband Distributed Communications Network
http://www.rand.org/publications/RM/RM3103

II. The Simulated Network

Description

The size of the network simulated was limited by the amount of storage available in the IBM 7090 computer using FORTRAN. A heavy storage requirement was dictated by the need for each simulated node or station to maintain a table of recorded handover numbers--the tag appended to each message indicating the number of times that message has been relayed. For each node, a table containing handover numbers to every other node via every one of up to a maximum of eight links is needed.

The Rise of Network Simulation

As computing became more accessible, more and more people started doing simulations

Papers using simulation:
• INFOCOM: 1985: 10%; 1992-98: ~60%
• SIGCOMM: 1989: 4/29; 1998: 13/26; 2004: 11/30






Networking Research Landscape

Early efforts dealt with relatively simple phenomena on small-scale networks.

Current research deals with complex phenomena on large-scale networks.

A long story …

Network Research Landscape

Systems are:
• Less tractable mathematically
• Difficult to prototype

And yet everyone has access to abundant computing
=> Simulation is more viable and often the only evaluation tool available






Crisis of Credibility

“Some claim that stochastic simulation as a performance evaluation tool of various dynamic systems, including telecommunication networks, is misused, and that the spread of this phenomenon is so wide that one can speak about a deep credibility crisis. It is even claimed that one cannot rely on the majority of the published results of performance evaluation studies of dynamic systems based on stochastic simulation.”

From: Pawlikowski, K., Jeong, H.-D. J., Lee, J.-S. R.: On Credibility of Simulation Studies of Telecommunication Networks. IEEE Comms., Jan. 2002, 132-139.


“I favor a stamp: WARNING: COMPUTER SIMULATION – MAY BE ERRONEOUS and UNVERIFIABLE. Like on Cigarettes.”

Michael Crichton in “State of Fear”


From: Cavin, Sasson and Schiper – On the accuracy of MANET Simulators

[Figure panels: Opnet, Ns-2, Glomosim]


A Typical Paper Review

“This paper should be rejected because its evaluation section is weak. The simulation (uses questionable models) and/or (simulates too small a network) and/or (does not have a valid statistical analysis of the simulation output) and/or … (your own critique here).”






Reasons for the Credibility Crisis

• Confusion regarding the role of simulation
• Impossibility of simulating Internet-scale networks
• Difficulty in building realistic models
• Lack of standards for validation and repeatability

The Roles of Simulation

• To validate approximate analysis
• To get/confirm first-order insights into new techniques
• To understand complex interactions among various entities/procedures
• To perform relative evaluation among alternatives
• To answer questions regarding deployability in a real network

The Roles of Simulation

Different tools may be needed for different roles

The burden on accuracy, repeatability and validity is highly dependent on the role

The role is not always (rarely?) stated up front

A Personal Experience

Parts and Holes in a Manufacturing Transfer Line

A Significant Failure

Simulation has not been able to answer wide-scale deployability questions:
• Multicast
• QoS
• RED
• …

Perhaps it’s a matter of simulation scale






Large-Scale Network Simulation

Large-scale network simulation offers the ability to:
• Verify validity of simulation results on small networks
• Examine issues of scale
• Validate theoretical models for large networks

But it has been quite challenging to build large-scale simulations.

Fujimoto, Perumalla, Park, Wu, Ammar, Riley, "Large Scale Simulation: How Big? How Fast?," Proceedings of the 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), October 2003.

Quantifying Simulator Performance

Execution time: T ≈ (NF × PF × HF) / PTS
• NF = number of flows
• PF = packets sent per flow
• HF = average hops per flow
• PTS = simulator speed (simulated packet transmissions / sec)
• The numerator NF × PF × HF is the number of packet transmissions (hops) to be simulated
• Ignores lost packets and protocol-generated packets (e.g., acks)

Example:
• 500,000 active UDP flows, 1.0 Mbps per flow, average of 8 hops to reach the destination
• Assume 1 KByte packets (125 packets per sec per flow)
• Workload: simulate 500 million packet transmissions per second of network operation
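The arithmetic in this example can be checked with a short Python sketch. The function names are illustrative, "1 KByte" is taken as 1000 bytes (the reading that gives 125 packets per second at 1.0 Mbps), and the simulator speed of 100,000 PTS is a hypothetical value chosen only to show how the formula is used; it is not a measurement from this talk.

```python
def packet_transmissions(num_flows, packets_per_flow, avg_hops):
    """Numerator of T = (NF * PF * HF) / PTS: hops to be simulated."""
    return num_flows * packets_per_flow * avg_hops


def execution_time(num_flows, packets_per_flow, avg_hops, pts):
    """Estimated run time in seconds; ignores losses and acks, as the formula does."""
    return packet_transmissions(num_flows, packets_per_flow, avg_hops) / pts


# Example figures from the slide.
flows = 500_000
flow_rate_bps = 1_000_000
packet_bits = 8 * 1000          # assumption: 1 KByte = 1000 bytes
avg_hops = 8

# Packets sent by one flow during one second of network operation.
packets_per_second_per_flow = flow_rate_bps / packet_bits

# Packet transmissions to simulate for one second of network operation.
workload = packet_transmissions(flows, packets_per_second_per_flow, avg_hops)

# Hypothetical simulator speed, for illustration only.
assumed_pts = 100_000
slowdown = execution_time(flows, packets_per_second_per_flow, avg_hops, assumed_pts)

print(f"packets/sec/flow: {packets_per_second_per_flow:.0f}")
print(f"workload per simulated second: {workload:,.0f} packet transmissions")
print(f"wall-clock seconds per simulated second at {assumed_pts:,} PTS: {slowdown:,.0f}")
```

Under these assumptions, one second of network operation costs thousands of wall-clock seconds on a single simulator, which is the motivation for the scalability discussion that follows.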

Scalability of Packet Level Simulators

[Figure: simulator speed (PTS, traffic that can be simulated in real time; log scale, 10^2 to 10^10) versus network size (hosts, routers, etc.; log scale, 1 to 10^8). Regions shown: sequential simulation, time-parallel simulation, and space-parallel simulation (parallel discrete event simulation), the last marked "Our focus".]

Approaches to Parallel Network Simulation

Build “from scratch” approach:
• Substantial effort to build & validate new models
• Users must learn a new simulator
• SSFNet, Qualnet, Javasim

Federated simulation approach:
• Simulators integrated via a software backplane/RTI
• Exploit existing software & validated model & user base
• Heterogeneous simulations
• PDNS

[Figure: a large-scale parallel network simulator made of several NS instances connected through a backplane/RTI]

Hardware Platforms

Sequential: Sun / Solaris
• Ultra-80, UltraSPARC-II 450MHz
• 4GB memory

Parallel: Intel / RedHat Linux 7.3
• 8-way Pentium-III XEON (2MB L2 cache) SMP
• 550MHz clock speed
• 4GB memory
• 17 SMPs (136 CPUs) connected via Gigabit Ethernet

Performance measurements are conservative (due to hardware performance)

Sequential Performance Comparison (Single Campus Network – ~ 500 nodes and links)

Metric | COTS (Sun/Solaris) | ns-2** (Sun/Solaris) | GTNetS (Sun/Solaris) | ns-2** (Intel/Linux)
Events | 30,700,649 | 9,107,023 | 9,143,553 | 9,117,070
Packet Transmissions* | 4,658,390 | 4,546,074 | 4,571,264 | 4,551,084
Events/Packet Transmission | 6.59 | 2.00 | 2.00 | 2.00
Run Time (sec) | 1,677 | 104 | 112.3 | 48
Packet Trans. / Sec. (PTS) | 2,778 | 43,712 | 40,706 | 94,814

* A packet transmission involves simulating a packet transmission over a single link
** Includes NixVectors optimization

Average end-to-end delay differed by less than 3%
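The two derived rows of the table (events per packet transmission, and PTS) follow directly from the raw counts and run times. A minimal Python sketch, using only the numbers reported above and illustrative names, reproduces them:

```python
# (events, packet transmissions, run time in seconds), copied from the table.
runs = {
    "COTS (Sun/Solaris)": (30_700_649, 4_658_390, 1677.0),
    "ns-2 (Sun/Solaris)": (9_107_023, 4_546_074, 104.0),
    "GTNetS (Sun/Solaris)": (9_143_553, 4_571_264, 112.3),
    "ns-2 (Intel/Linux)": (9_117_070, 4_551_084, 48.0),
}


def derived_metrics(events, transmissions, run_time_sec):
    """Return (events per packet transmission, packet transmissions per second)."""
    events_per_transmission = round(events / transmissions, 2)
    pts = round(transmissions / run_time_sec)
    return events_per_transmission, pts


for name, (events, transmissions, run_time_sec) in runs.items():
    per_tx, pts = derived_metrics(events, transmissions, run_time_sec)
    print(f"{name:22s} events/transmission = {per_tx:.2f}  PTS = {pts:,}")
```

The roughly threefold difference in events per packet transmission accounts for only part of the run-time gap between the COTS simulator and the others; the rest comes from per-event cost.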

PDNS Performance on Cluster (Perumalla/Park)

[Figure: packet transmissions per second (PTS), 0 to 2,500,000, versus number of processors, 8 to 120]

• Each processor simulates ~5000 nodes and links
• Up to 120 processors simulating 645,600 nodes

Lemieux Supercomputer

Pittsburgh Supercomputing Center
http://www.psc.edu/machines/tcs/lemieux.html

•750 HP-Alpha ES45 servers

•4Gbytes memory per server

•4 CPUs per server

•1GHz CPU

•3000 CPUs total

•64-bit computing

•Quadrics interconnect

PDNS Performance on PSC (Perumalla)

[Figure: million packet transmissions per second, 0 to 140, versus number of processors, 0 to 1536; curves: Ideal/Linear and PDNS Performance]

• 147K PTS on one CPU
• Campus network topology, FTP traffic (500 packets/flow, TCP)
• Scale problem size & number of CPUs (up to ~4 million network nodes)
• Performance up to 106 Million PTS

But… Can we build an Internet-scale Simulation?

A “back-of-the-envelope” calculation:
• 100 million Internet hosts
• 1 router for every 100 hosts, and each router has 4 links
• 50% of end-hosts have 56Kbps access and 50% have 10Mbps access
• Router-to-router links are as follows: 50% @ 10Mbps, 40% @ 100Mbps, 5% @ 655Mbps and 5% @ 2.4Gbps
• Utilization is 50% for access links and 10% for network links
• 1% of hosts have active connections
• Average packet size = 5000 bits

George Riley, Mostafa Ammar, "Simulating Large Networks: How Big is Big Enough?" Proceedings of First International Conference on Grand Challenges for Modeling and Simulation, January 2002.

Back of the Envelope Calculation (cont’d)

• 2.9 x 10^11 events per second
• Assume we can process 10^6 events per second (~500,000 PTS)
• => 290,000 CPU seconds (4 days) for every second of Internet time!!!
• => need 300 Terabytes of memory in ns, not including routing table space!!!
• => need 14 Terabytes for event logging for each second of simulation time!!!
• Requires 1000 parallel CPUs with 300 GB of main memory and 1.4 TB of disk storage in each!!!
• Would not speed things up much; simply allows the simulation to run
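The headline numbers above can be reproduced from the two rates quoted on the slide. The Python sketch below takes the event rate, per-CPU processing rate, total memory, and CPU count exactly as stated; the variable names are illustrative, and the final line assumes perfect linear speedup across the 1000 CPUs, which is an optimistic bound added here for illustration rather than a claim from the talk.

```python
# Figures quoted in the calculation above.
EVENTS_PER_INTERNET_SECOND = 290_000_000_000   # 2.9 x 10^11
EVENTS_PER_CPU_SECOND = 1_000_000              # 10^6 (~500,000 PTS)
TOTAL_MEMORY_TB = 300
NUM_CPUS = 1000

# CPU time needed to simulate one second of Internet time on a single CPU.
cpu_seconds = EVENTS_PER_INTERNET_SECOND / EVENTS_PER_CPU_SECOND
cpu_days = cpu_seconds / 86_400

# Memory each CPU must hold if the model is spread evenly (decimal TB to GB).
memory_per_cpu_gb = TOTAL_MEMORY_TB * 1000 / NUM_CPUS

# Assumption for illustration only: perfect linear speedup over all CPUs.
best_case_slowdown = cpu_seconds / NUM_CPUS

print(f"CPU seconds per Internet second: {cpu_seconds:,.0f} (about {cpu_days:.1f} days)")
print(f"memory per CPU: {memory_per_cpu_gb:,.0f} GB")
print(f"best-case wall-clock seconds per Internet second: {best_case_slowdown:,.0f}")
```

The single-CPU figure works out to about 3.4 days, the same order as the 4 days quoted above. Even the idealized parallel case remains hundreds of times slower than real time.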

Wait a few years and computing power will catch up

Possibly … but the network itself is also growing.

Even with a Moore’s Law increase in processing power, we will need 300x10^6 CPU seconds for every wallclock second (assuming typical Internet growth).

Open Question: What is the right simulation size to explore Internet-scale performance issues?

Many Challenges Remain

Tools & Parallel Simulation Issues
• Robust performance
• Making parallel simulation more transparent, “automatic” (BenchMap and AutoPart)
• Access to HPC platforms
• Visualization Tools

Modeling issues [Floyd/Paxson]
• Building credible large-scale models and scenarios
• Verifying and validating large-scale simulations
• Topology? Traffic?
• Methodologies and tools to effectively utilize the simulators






Building Realistic Models

The Simulation Modeler’s Dilemma:
• One needs to eliminate “unimportant” details in the simulation in order to speed up simulation (avoid kitchen-sink simulations)
• But how can one tell if a detail is unimportant?
• Simulate and see if there is any difference. This is considered wasted effort.
• Perhaps we should encourage these kinds of results!

Incorporating Packet-Level Details in P2P Simulations

Access bandwidth affects throughput significantly

Models which do not capture packet-level details do not reveal the difference

He, Ammar, Riley, Raj, Fujimoto, "Mapping Peer Behavior to Packet-Level Details: A Framework for Packet-Level Simulation of Peer-to-Peer Systems," Proceedings of MASCOTS 2003.

Building Realistic Models

A significant challenge especially for large-scale simulation

Significant attention to topology modeling but very little understanding of other important issues:
• Workload Modeling
• Cross-layer interactions (particularly for wireless networks)
• Modeling of operations and overheads

Cross-layer modeling

A perfect instance of the Modeler’s Dilemma

Split-stack composition may be helpful

Xu, Riley, Ammar, Fujimoto, "Split Protocol Stack Network Simulations using the Dynamic Simulation Backplane," Proceedings of the Ninth International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'01), August 2001.

Simulation Split Vertically

Each simulator simulates a portion of the protocol stack of the entire network

[Figure: Simulator 1 and Simulator 2 each cover nodes A, B, C, D, E, F]

Splitting Protocol Stack

Protocol stack split between TCP and IP

[Figure labels: ns2, Glomosim]

Workload Modeling

See our work presented at this conference on generating TCP workloads to match observed network utilization.

Qi He, Constantinos Dovrolis, Mostafa Ammar, "A Methodology for the Optimal Configuration of TCP Traffic in Network Simulation under Link Load Constraints," Proceedings of the 38th Annual Simulation Symposium, San Diego, April 2005.






Simulation Validation and Repeatability

The issue: Given that the simulation model is correct, how can one trust the results from the simulation?

Two types of problems:
• Technical
• Social

Technical Issues

• Code Trustworthiness: Open Source and Reusable Code is a big improvement
• Good Experimental Design
• Random Number Generation
• Correct Statistical Inference
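As one hedged illustration of the last three items, the Python sketch below runs independent replications of a toy single-server queue, gives each replication its own seeded random number stream, and reports a 95% confidence interval across replications. The queue model, parameter values, seeds, and function names are all assumptions chosen for the example; they are not taken from any simulator discussed in this talk. Using distinct integer seeds is a simple stand-in for properly separated random streams.

```python
import math
import random
import statistics

# 0.975 quantile of Student's t distribution with 9 degrees of freedom,
# the value needed for a 95% interval from 10 replications.
T_975_DF9 = 2.262


def mm1_mean_wait(seed, arrival_rate=0.5, service_rate=1.0, customers=20_000):
    """One replication: mean queueing delay in a toy M/M/1 queue (Lindley recursion)."""
    rng = random.Random(seed)      # separate generator per replication
    wait = 0.0                     # the first customer finds an empty queue
    total_wait = 0.0
    for _ in range(customers):
        total_wait += wait
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        # Delay of the next customer depends on the current one.
        wait = max(0.0, wait + service - interarrival)
    return total_wait / customers


def confidence_interval(samples, t_quantile):
    """Return (mean, half-width) of a t-based interval across replications."""
    mean = statistics.mean(samples)
    half_width = t_quantile * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


seeds = list(range(1, 11))         # recorded so the experiment can be repeated
results = [mm1_mean_wait(seed) for seed in seeds]
mean, half_width = confidence_interval(results, T_975_DF9)
print(f"mean wait = {mean:.3f} +/- {half_width:.3f} (95% CI, {len(seeds)} replications)")
```

Reporting the seeds, the number of replications, and the interval rather than a single run's average is the kind of detail that lets others repeat the experiment.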

Social Issues

Publication of enough details to allow repeatability – possibly even code

Allowance for Scholarly Credit for repeating experiments

Final Thoughts

• Be open within the community about this issue
• Provide acceptable guidelines for reporting simulation results: a checklist, with enough details for repeatability
• Stronger enforcement of guidelines
• Change reviewing process (perhaps only for journals)
• Give scholarly credit for repeating other experiments

Network Topologies: CampusNet (Dartmouth)

10 campus networks connected in a ring

Single campus network: 538 nodes, 543 links
