FABRIC Meeting, Poznan Poland, 25 Sep 2006, R. Hughes-Jones Manchester 1 Broadband Protocols WP 1.2.1 IP protocols, Lambda switching, multicasting Richard Hughes-Jones The University of Manchester www.hep.man.ac.uk/~rich/ then “Talks”


Page 1

Broadband Protocols

WP 1.2.1

IP protocols, Lambda switching, multicasting

Richard Hughes-Jones The University of Manchester

www.hep.man.ac.uk/~rich/ then “Talks”

Page 2

Protocols Document

Page 3

Protocols Document 1

• “Protocol Investigation for eVLBI Data Transfer”
  • Document JRA-WP1.2.1.001
  • Jodrell & Manchester folks, with hard work from Matt
  • Completed and on the EXPReS WIKI

• Introduces e-VLBI and its networking requirements
  • Continuously streamed data
  • Individual packets are not particularly valuable
  • Maintenance of the data rate is important
  • Quite different to applications where bit-wise correct transmission is required, e.g. file transfer
  • Forms a valuable use case for GGF GHPN-RG

• Presents the actions required in order to make an informed decision and to implement suitable protocols in the European VLBI Network
• Strategy document

Page 4

Protocols Document 2

• Protocols considered for investigation include:
  • TCP/IP
  • UDP/IP
  • DCCP/IP
  • VSI-E RTP/UDP/IP
  • Remote Direct Memory Access
  • TCP Offload Engines

• Very useful discussions at Haystack VLBI meeting
  • Agreement to make joint tests Haystack-Jodrell
  • Use of ESLEA 1 Gbit transatlantic link

• Work in progress – links to ESLEA UK e-science
  • Vlbi-udp – Simon: UDP/IP stability & the effect of packet loss on correlations
  • Tcpdelay – Stephen: TCP/IP and CBR data

Page 5

tcpdelay

Page 6

tcpdelay: VLBI Application Protocol

• Want to examine how TCP moves Constant Bit Rate (CBR) data
• tcpdelay, a test program:
  • Instrumented TCP program that emulates sending CBR data
  • Records relative 1-way delay
  • Records TCP stack activity with web100

[Figure labels: n bytes; number of packets; wait time; time]
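The pacing idea behind a CBR test sender can be sketched in a few lines of Python. This is a minimal illustration, not the tcpdelay source: the function names are hypothetical, and it assumes the wait time is the spacing between the starts of successive messages. The 1448 byte message size and 22 us wait time are the values quoted later in the talk for the UKLight test.

```python
def cbr_send_times(num_messages, wait_time_us):
    """Ideal send time of each message, in microseconds from the start."""
    return [i * wait_time_us for i in range(num_messages)]


def cbr_rate_mbit(message_bytes, wait_time_us):
    """Offered data rate in Mbit/s (bits per microsecond equals Mbit/s)."""
    return message_bytes * 8 / wait_time_us


# Assumed parameters: 1448 byte messages sent every 22 us.
times = cbr_send_times(num_messages=4, wait_time_us=22)
rate = cbr_rate_mbit(message_bytes=1448, wait_time_us=22)
print(times)           # ideal schedule for the first four messages
print(round(rate, 1))  # offered rate in Mbit/s
```

Under these assumptions the offered rate comes out at roughly 526 Mbit/s, in line with the 525 Mbit/s data rate given for the tests.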

Page 7

VLBI Application Protocol

[Figure: timing diagram with Sender, TCP & Network, and Receiver against a time axis. Labels: Data1 to Data4; Timestamp1 to Timestamp5; packet loss.]

VLBI data is produced at Constant Bit Rate

Page 8

Visualising the Results

• When packet loss is detected, TCP:
  • Reduces Cwnd
  • Halves the sending rate
• Expect a delay in the message arrival time

[Figure: arrival time against message number / time. Labels: expected arrival time at CBR; packet loss; delay in stream.]

Stephen Kershaw
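The delay in the stream can be expressed as the difference between each message's measured arrival time and its expected arrival time at CBR. The sketch below is an illustration only: the function name and the sample arrival times are hypothetical, and it assumes the first message arrives on schedule.

```python
def stream_delays(arrival_times_us, wait_time_us):
    """Delay of each message relative to its expected CBR arrival time.

    The expected arrival of message n is taken as the first arrival
    plus n * wait_time_us.
    """
    t0 = arrival_times_us[0]
    return [t - (t0 + n * wait_time_us)
            for n, t in enumerate(arrival_times_us)]


# Hypothetical arrivals: on schedule for three messages, then a 100 us
# stall (for example after a loss), after which the stream stays late.
arrivals = [0, 22, 44, 166, 188, 210]
delays = stream_delays(arrivals, wait_time_us=22)
print(delays)
```

A stream that catches up after a loss would show the delay shrinking back towards zero; one that cannot catch up keeps a constant or growing offset.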

Page 9

Arrival Times: UKLight JB-JIVE-Manc

• Message size: 1448 Bytes
• Wait time: 22 us
• Data Rate: 525 Mbit/s
• Route: JB-UKLight-JIVE-UKLight-Man
• RTT ~27 ms
• TCP buffer 32M bytes
• BDP @512Mbit 1.8Mbyte
• Estimate catchup possible if loss < 1 in 1.24M

[Figure: Effect of loss rate on message arrival time. x-axis: message number (0 to 10, ×10^4); y-axis: time / s (5 to 50). Curves: drop 1 in 5k; drop 1 in 10k; drop 1 in 20k; drop 1 in 40k; no loss.]
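The bandwidth-delay product (BDP) quoted on this slide follows from rate × RTT. The sketch below reproduces the arithmetic; the function name is hypothetical, and reading "32M bytes" as 32 MiB is an assumption.

```python
def bdp_bytes(rate_bit_s, rtt_s):
    """Bandwidth-delay product in bytes: the data in flight over one RTT."""
    return rate_bit_s * rtt_s / 8


bdp = bdp_bytes(rate_bit_s=512e6, rtt_s=0.027)  # 512 Mbit/s, RTT ~27 ms
buffer_bytes = 32 * 1024 * 1024                 # assumption: 32M = 32 MiB
print(f"BDP = {bdp / 1e6:.2f} MByte")
print("buffer exceeds BDP:", buffer_bytes > bdp)
```

This gives about 1.7 MByte, close to the 1.8 Mbyte quoted (the exact figure depends on the rate and RTT used), and well below the 32M byte TCP buffer.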

Page 10

TCP Web100: JB-Manc – Large buffer

• Message size: 1448 Bytes
• Wait time: 22 us
• Data Rate: 525 Mbit/s
• Route: JB-UKLight-JIVE-UKLight-Man
• RTT ~27 ms
• Standard TCP
• TCP buffer 930k
• Drop 1 in 40,000 packets
• Classic Cwnd behaviour
• Limited by ssthresh!
• TCP requires much care!!

[Figure, three panels against time in ms (5000 to 15000): (1) DataBytesOut (Delta), DataBytesIn (Delta) and CurCwnd (Value); (2) number of duplicate ACKs; (3) packet re-transmits.]

Page 11

iBOB

Page 12

Prototype iBOB with two sampler boards attached

FPGA based signal processing board from UC Berkeley

Page 13

Bryan Anderson

[Figure: iBOB block diagram. Labels: Station; Board; RAM; iBOB; 10GE; VSI; 10GE CX4; disk based system; VSI or headstack; CX4 - fibre media converter.]

• 10 Gigabit Ethernet now available
• UDP/IP module exists
• Use for demonstration of FPGA driven IP networking
• Link to PC NIC – diagnostics
• Test over GÉANT Onsala - Jodrell

Page 14

Multi-Gigabit Trials on GÉANT: Collaboration with Dante

• What is inside GÉANT2
• Why is the collaboration interesting?
  • 10 Gigabit Ethernet
  • UDP memory-2-memory flows
  • TCP flows with allocated bandwidth
• Options using the GÉANT Development Network
  • 10 Gbit SDH network
• Options using the GÉANT LightPath Service
  • PoP location for network tests

Page 15

GÉANT2 Topology

Page 16

GÉANT2: The Convergence Solution

[Figure labels: NREN Access; Existing IP Router; GÉANT2 POP A; GÉANT2 POP B; Managed Lambdas; 1626 LM; L2 Matrix; TDM Matrix; 1678 MCC; Dark Fiber; EXPReS PC 10 GE]

Page 17

From PoS to Ethernet

Connect. Communicate. Collaborate

•More Economical Architecture

•Highest Overall Network Availability

•Flexibility (VLAN management)

•Highest Network Performance (Latency)

[Figure labels: Router; IP Links; 1/10 Gigabit Ethernet; VC-4-nv Channels; L2 Matrix; TDM Matrix; 1678 MCC Transport Node; VLANs]

Page 18

What do we want to do?

• Set up 4 Gigabit Lightpath between GÉANT PoPs
  • Collaboration with Dante
  • PCs in their PoPs with 10 Gigabit NICs
• VLBI Tests:
  • UDP performance: throughput, jitter, packet loss, 1-way delay, stability
  • Continuous (days) data flows – VLBI_UDP and multi-Gigabit TCP performance with current kernels
  • Experience for FPGA Ethernet packet systems
• Dante Interests:
  • Multi-Gigabit TCP performance
  • The effect of (Alcatel) buffer size on bursty TCP when using BW limited Lightpaths
• Need a Collaboration Agreement
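As an illustration of the UDP quantities listed above, the sketch below derives loss, throughput and jitter from a receiver-side log. It is not the VLBI_UDP code: the function name, the log layout (sequence number, receive time in us, payload bytes) and the sample data are hypothetical, and jitter is taken here as the mean absolute deviation of the inter-arrival gaps, which is only one of several definitions in use.

```python
from statistics import mean


def udp_stats(packets, sent_count):
    """Summarise a receive log of (sequence, recv_time_us, payload_bytes)."""
    times = [t for _, t, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean_gap = mean(gaps)
    # Count only the bits that arrived after the first packet, so the
    # rate covers exactly the interval between first and last arrival.
    bits_after_first = 8 * sum(size for _, _, size in packets[1:])
    return {
        "loss_fraction": (sent_count - len(packets)) / sent_count,
        "throughput_mbit_s": bits_after_first / (times[-1] - times[0]),
        "mean_gap_us": mean_gap,
        "jitter_us": mean(abs(g - mean_gap) for g in gaps),
    }


# Hypothetical log: five packets sent, sequence number 3 never arrived.
sample = [(0, 0, 1000), (1, 1000, 1000), (2, 2000, 1000), (4, 4000, 1000)]
stats = udp_stats(sample, sent_count=5)
print(stats)
```

One-way delay is left out because it needs synchronised sender and receiver clocks, or a relative measure like the one tcpdelay records.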

Page 19

Options Using the GÉANT Development Network

• 10 Gigabit SDH backbone
• Alcatel 1678 MCC
• Node location: London, Amsterdam, Paris, Prague, Frankfurt
• Can do traffic routing, so can make long RTT paths
• Available Dec/Jan 07
• Less pressure for long term tests

Page 20

Options Using the GÉANT LightPaths

• Set up 4 Gigabit Lightpath between GÉANT PoPs
  • Collaboration with Dante
  • PCs in Dante PoPs
• 10 Gigabit SDH backbone
• Alcatel 1678 MCC
• Node location: Budapest, Geneva, Frankfurt, Milan, Paris, Poznan, Prague, Vienna
• Can do traffic routing, so can make long RTT paths
• Ideal: London Copenhagen

Page 21

4 Gigabit GÉANT LightPath

• Example of a 4 Gigabit Lightpath between GÉANT PoPs
  • PCs in Dante PoPs
  • 26 * VC-4s
  • 4180 Mbit/s

Page 22

PCs and Current Tests

Page 23

Test PCs Have Arrived

• Boston/Supermicro X7DBE
• Two Dual Core Intel Xeon Woodcrest 5130
  • 2 GHz
  • Independent 1.33 GHz FSBuses
• 530 MHz FD Memory (serial)
• Chipsets: Intel 5000P MCH – PCIe & Memory; ESB2 – PCI-X, GE etc.
• PCI
  • 3 8 lane PCIe buses
  • 3 * 133 MHz PCI-X
• 2 Gigabit Ethernet
• SATA

Page 24

Lab Tests 10 Gigabit Ethernet

• 10 Gigabit Test Lab being set up in Manchester
  • Cisco 7600
  • Cross Campus λ <1ms
  • Server quality PCs
  • Neterion NICs
  • Myricom & Chelsio being purchased
• B2B performance so far
  • SuperMicro X6DHE-G2
  • Kernel (2.6.13) & Driver dependent!
  • One iperf TCP data stream: 4 Gbit/s
  • Two bi-directional iperf TCP data streams: 3.8 & 2.2 Gbit/s
  • UDP disappointing
• Propose to install Fedora Core5 Kernel 2.6.17 on the new Intel dual-core PCs

Page 25

Any Questions?

Page 26

Backup Slides

Page 27

Bandwidth on Demand: Our Long-Term Vision

[Figure labels: Research Activity; Policy Middleware; Network Resource Mgr; Applications, e.g. GRID; Ethernet; 1678 MCC; Bandwidth Request; UNI-C Command; GMPLS]

Page 28

10 Gigabit Ethernet: UDP Throughput

• 1500 byte MTU gives ~2 Gbit/s
• Used 16144 byte MTU, max user length 16080
• DataTAG Supermicro PCs
  • Dual 2.2 GHz Xeon CPU, FSB 400 MHz
  • PCI-X mmrbc 512 bytes
  • Wire rate throughput of 2.9 Gbit/s
• CERN OpenLab HP Itanium PCs
  • Dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz
  • PCI-X mmrbc 4096 bytes
  • Wire rate of 5.7 Gbit/s
• SLAC Dell PCs
  • Dual 3.0 GHz Xeon CPU, FSB 533 MHz
  • PCI-X mmrbc 4096 bytes
  • Wire rate of 5.4 Gbit/s

[Figure: “an-al 10GE Xsum 512kbuf MTU16114 27Oct03”. x-axis: spacing between frames, us (0 to 40); y-axis: received wire rate, Mbit/s (0 to 6000). One curve per packet size: 16080, 14000, 12000, 10000, 9000, 8000, 7000, 6000, 5000, 4000, 3000, 2000 and 1472 bytes.]

Page 29

10 Gigabit Ethernet: Tuning PCI-X

• 16080 byte packets every 200 µs
• Intel PRO/10GbE LR Adapter
• PCI-X bus occupancy vs mmrbc
  • Measured times
  • Times based on PCI-X times from the logic analyser
  • Expected throughput ~7 Gbit/s
  • Measured 5.7 Gbit/s

[Figure: logic analyser traces for mmrbc 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096 bytes). Labels: CSR Access; PCI-X Sequence; Data Transfer; Interrupt & CSR Update.]

[Figure, two panels: “Kernel 2.6.1#17 HP Itanium Intel10GE Feb04” and “DataTAG Xeon 2.2 GHz”. x-axis: Max Memory Read Byte Count (0 to 5000); y-axis: PCI-X transfer time, us (0 to 10). Series: measured rate Gbit/s; rate from expected time Gbit/s; max throughput PCI-X.]
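One way to see why mmrbc matters is to count how many PCI-X read sequences a packet is split into. The sketch below is a simplified model with hypothetical function names: it assumes a 64-bit bus clocked at 133 MHz and ignores alignment, descriptor fetches and per-sequence overheads.

```python
import math


def pcix_sequences(packet_bytes, mmrbc_bytes):
    """Number of read sequences needed to move one packet across the bus."""
    return math.ceil(packet_bytes / mmrbc_bytes)


def pcix_peak_gbit_s(bus_width_bits=64, clock_mhz=133):
    """Raw bus bandwidth in Gbit/s, before any protocol overhead."""
    return bus_width_bits * clock_mhz / 1000


# 16080 byte packets, as used in the measurements on this slide.
sequences = {m: pcix_sequences(16080, m) for m in (512, 1024, 2048, 4096)}
peak = pcix_peak_gbit_s()
print(sequences)
print(peak)
```

In this model a larger mmrbc means fewer sequences per packet, so the fixed cost of starting each sequence is paid less often. That is consistent with the higher throughput measured at mmrbc 4096 bytes.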

Page 30

Bandwidth Challenge wins Hat Trick

• The maximum aggregate bandwidth was >151 Gbit/s
  • 130 DVD movies in a minute
  • Serve 10,000 MPEG2 HDTV movies in real-time
• 22 10 Gigabit Ethernet waves, Caltech & SLAC/FERMI booths
• In 2 hours transferred 95.37 TByte
• In 24 hours moved ~475 TBytes
• Showed real-time particle event analysis
• SLAC Fermi UK Booth:
  • 1 10 Gbit Ethernet to UK, NLR & UKLight: transatlantic HEP disk to disk; VLBI streaming
  • 2 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel StorCloud
  • 4 10 Gbit links to Fermi: Dcache data transfers

[Figure: traffic in to and out of the booth. Labels: SLAC-ESnet; FermiLab-HOPI; SLAC-ESnet-USN; FNAL-UltraLight; UKLight; SC2004 101 Gbit/s.]
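The transferred volumes can be turned into average rates for comparison with the peak figure. This is plain arithmetic with a hypothetical helper name, assuming 1 TByte = 10^12 bytes.

```python
def average_gbit_s(tbytes, hours):
    """Average rate in Gbit/s for a volume in TBytes moved over some hours."""
    bits = tbytes * 1e12 * 8  # assumption: 1 TByte = 10**12 bytes
    return bits / (hours * 3600) / 1e9


two_hour = average_gbit_s(95.37, 2)   # 95.37 TByte in 2 hours
full_day = average_gbit_s(475, 24)    # ~475 TBytes in 24 hours
print(round(two_hour, 1), round(full_day, 1))
```

Under that assumption the averages are roughly 106 Gbit/s over the 2 hours and 44 Gbit/s over the 24 hours, both below the >151 Gbit/s peak, as one would expect.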

Page 31

SC|05 Seattle-SLAC 10 Gigabit Ethernet

• 2 Lightpaths:
  • Routed over ESnet
  • Layer 2 over Ultra Science Net
• 6 Sun V20Z systems per λ
• dcache remote disk data access
  • 100 processes per node
  • Node sends or receives
  • One data stream 20-30 Mbit/s
• Used Neterion NICs & Chelsio TOE
• Data also sent to StorCloud using fibre channel links
• Traffic on the 10 GE link for 2 nodes: 3-4 Gbit per node, 8.5-9 Gbit on trunk

Page 32

10 Gigabit Ethernet: TCP Data transfer on PCI-X

• Sun V20z 1.8 GHz to 2.6 GHz Dual Opterons
• Connect via 6509
• XFrame II NIC
• PCI-X mmrbc 4096 bytes, 66 MHz
• Two 9000 byte packets b2b
• Ave rate 2.87 Gbit/s
• Burst of packets length 646.8 us
• Gap between bursts 343 us
• 2 interrupts / burst

[Figure labels: CSR Access; Data Transfer]

Page 33

10 Gigabit Ethernet: UDP Data transfer on PCI-X

• Sun V20z 1.8 GHz to 2.6 GHz Dual Opterons
• Connect via 6509
• XFrame II NIC
• PCI-X mmrbc 2048 bytes, 66 MHz
• One 8000 byte packet
  • 2.8 us for CSRs
  • 24.2 us data transfer
  • Effective rate 2.6 Gbit/s
• 2000 byte packet, wait 0 us: ~200 ms pauses
• 8000 byte packet, wait 0 us: ~15 ms between data blocks

[Figure labels: CSR Access 2.8 us; Data Transfer]
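The effective rate quoted on this slide can be checked from the packet size and the bus timings. The helper name is hypothetical; the figures are the 8000 byte packet, 24.2 us data transfer and 2.8 us CSR access given above.

```python
def rate_gbit_s(packet_bytes, time_us):
    """Rate in Gbit/s for one packet moved in time_us microseconds."""
    return packet_bytes * 8 / time_us / 1000


data_only = rate_gbit_s(8000, 24.2)       # data transfer phase only
with_csr = rate_gbit_s(8000, 24.2 + 2.8)  # including the CSR accesses
print(round(data_only, 2), round(with_csr, 2))
```

The data transfer phase alone gives about 2.64 Gbit/s, which matches the quoted 2.6 Gbit/s; adding the CSR access time lowers it to about 2.37 Gbit/s.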