
Page 1: Networking

Shawn McKee
University of Michigan
DOE/NSF Review, November 29, 2001

Page 2: Why Networking?

• Since the early 1980s, physicists have depended on leading-edge networks to enable ever larger international collaborations.

• Major HEP collaborations, such as ATLAS, require rapid access to event samples from massive data stores, not all of which can be stored locally at each computational site.

• Evolving integrated applications, such as Data Grids, rely on seamless, transparent operation of the underlying LANs and WANs.

• Networks are among the most basic Grid building blocks.

Page 3: Hierarchical Computing Model

[Diagram: the LHC hierarchical computing model. The Online System feeds the Offline Farm at the CERN Computer Centre (Tier 0+1, ~25 TIPS); data rates are ~1 PByte/sec off the detector and ~100 MBytes/sec into the farm. Tier 1 centers (BNL, France, Italy, UK) connect to Tier 0 over ~2.5 Gbits/sec links, as do the Tier 2 centers below them; institutes (Tier 3, ~0.25 TIPS) connect at 100-1000 Mbits/sec, feeding physics data caches and Tier 4 workstations. CERN/Outside resource ratio ~1:2; Tier 0 : (all Tier 1) : (all Tier 2) ~1:1:1. Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels.]

Page 4: MONARC Simulations

• MONARC (Models of Networked Analysis at Regional Centres) has simulated Tier 0/Tier 1/Tier 2 data processing for ATLAS.

• Networking implications: Tier 1 centers require ~140 Mbytes/sec to Tier 0 and ~200 Mbytes/sec to (each?) other Tier 1s, based upon 1/3 of the ESD being stored at each Tier 1.
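For scale, 140 MBytes/sec is ~1.1 Gbits/sec and 200 MBytes/sec is ~1.6 Gbits/sec, so either flow alone would fill roughly half to two-thirds of one of the ~2.5 Gbits/sec links in the hierarchical model above (a back-of-the-envelope conversion, not a figure from the simulations).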

Page 5: TCP WAN Performance

Mathis et al. (Computer Communications Review, v. 27 no. 3, July 1997) demonstrated the dependence of achievable TCP bandwidth on network parameters:

BW < (0.7 × MSS) / (RTT × √PkLoss)

BW – bandwidth
MSS – maximum segment size
RTT – round-trip time
PkLoss – packet loss rate

To get 90 Mbps via TCP/IP on a WAN link from LBL to IU (~70 ms RTT), the packet loss rate must be below 1.8e-6!
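A quick way to sanity-check such targets is to invert the formula for the worst tolerable loss rate. A minimal sketch (the 1460-byte MSS is an assumed Ethernet value; 70 ms is the LBL-IU RTT quoted above):

```python
# Mathis et al.: BW <= 0.7 * MSS / (RTT * sqrt(PkLoss)).
# Solving for PkLoss gives the largest loss rate that still
# sustains a target TCP bandwidth.
def max_loss_rate(target_bw_bps, mss_bytes=1460, rtt_s=0.070):
    mss_bits = mss_bytes * 8
    return (0.7 * mss_bits / (rtt_s * target_bw_bps)) ** 2

# 90 Mbps over the ~70 ms LBL-IU path:
print(f"{max_loss_rate(90e6):.1e}")  # ~1.7e-06, i.e. the < 1.8e-6 quoted above
```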

Page 6: Network Monitoring: Iperf

• We have set up testbed network monitoring using Iperf v1.2 (S. McKee (UMich), D. Yu (BNL)).

• We test both UDP (90 Mbps sending) and TCP between all combinations of our 8 testbed sites.

• Globus is used to initiate both the client and server Iperf processes.

(http://atgrid.physics.lsa.umich.edu/~cricket/cricket/grapher.cgi)
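For reference, a minimal sketch of this kind of all-pairs test driver (the hostnames are placeholders and the Iperf options illustrative; the real testbed launches both the Iperf server and client through Globus):

```python
# Illustrative all-pairs Iperf driver in the spirit of the
# testbed monitoring described above.
import itertools
import subprocess

SITES = ["anl.example.org", "bnl.example.org", "umich.example.org"]  # placeholders

def iperf_client(server, udp=False):
    """Run an Iperf client against `server` and return its report."""
    cmd = ["iperf", "-c", server, "-t", "20", "-w", "2M"]  # 20 s test, 2 MB window
    if udp:
        cmd += ["-u", "-b", "90M"]  # UDP at the 90 Mbps sending rate used above
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# Test every ordered pair of sites (only the client side is shown here;
# in the testbed each client is started at the source site via Globus).
for src, dst in itertools.permutations(SITES, 2):
    for udp in (False, True):
        print(f"{'UDP' if udp else 'TCP'} {src} -> {dst}")
        print(iperf_client(dst, udp=udp))
```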

Page 7: USATLAS Grid Testbed

[Map: the eight US ATLAS Grid Testbed sites: Brookhaven National Laboratory, LBNL-NERSC/UC Berkeley, Argonne National Laboratory, Boston University, Indiana University, U Michigan, University of Oklahoma, and University of Texas at Arlington. Prototype Tier 2 and HPSS sites are marked. Wide-area connectivity is via ESnet, Abilene, CalREN, NTON, MREN and NPACI links.]

Page 8: Testbed Network Measurements

Site  UDP (Mbps)  TCP (Mbps)  PkLoss (%)*  Jitter (ms)  TCP Window, Bottleneck (Mbps)
ANL   65.4/81.3   17.7/20.9   0.24/0.03    1.1/0.1      2 M, 100
BNL   66.4/83.5   10.5/13.6   0.51/0.19    1.7/0.5      4 M, 100
BU    63.4/78.6   10.8/13.4   0.70/0.25    2.4/1.27     128 K, 100
IU    35.8/40.3   26.7/35.0   0.31/0.048   0.9/0.55     2 M, 45
LBL   70.4/88.4   15.7/20.8   0.16/0.014   1.6/0.7      2 M, 100
OU    72.1/90.8   21.5/27.8   0.89/0.020   1.7/0.4      2 M, 100
UM    69.7/87.3   27.5/36.0   0.26/0.018   1.8/0.6      2 M, 100
UTA   9.5         3.8         0.57         1.3          128 K, 10
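One pattern in the table, TCP running well below UDP, follows directly from the window/RTT bound: a connection can have at most one window of data in flight per round trip. A minimal sketch (the 70 ms RTT is an assumed value for illustration, not a testbed measurement):

```python
# TCP throughput ceiling = window / RTT, regardless of link capacity.
def tcp_ceiling_mbps(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s / 1e6

print(tcp_ceiling_mbps(128 * 1024, 0.070))       # ~15 Mbps: a 128 K window caps sites like BU
print(tcp_ceiling_mbps(2 * 1024 * 1024, 0.070))  # ~240 Mbps: a 2 M window is no longer the limit
```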

Page 9: Networking Requirements

USATLAS needs more than simply adequate network bandwidth. We need:

– A set of local, regional, national and international networks able to interoperate transparently, without bottlenecks.

– Application software that works together with the network to provide high throughput and bandwidth management.

– A suite of high-level collaborative tools that will enable effective data analysis between internationally distributed collaborators.

The ability of USATLAS to effectively participate at the LHC is closely tied to our underlying networking infrastructure!

Page 10: Networking as a Common Project

• A new Internet2 working group has formed from the LHC Common Projects initiative: HENP (High Energy/Nuclear Physics), co-chaired by Harvey Newman (CMS) and Shawn McKee (ATLAS).

• Initial meeting hosted by IU in June; kick-off meeting held in Ann Arbor on October 26th.

• The issues this group is focusing on are the same ones that USATLAS networking needs to address.

• USATLAS gains the advantage of a greater resource pool dedicated to solving network problems, a "louder" voice in standards setting, and a better chance to realize necessary networking changes.

Page 11: Network Coupling to Software

• Our software and computing model will evolve as our network evolves… both are coupled.

• Very different computing models result from different assumptions about the capabilities of the underlying network (distributed vs. local).

• We must be careful to keep our software "network aware" while we work to ensure our networks will meet the needs of the computing model.

Page 12: Achieving High Performance Networking

• Server and client CPU, I/O and NIC throughput must be sufficient
– Consider firmware, hard disk interfaces, bus type/capacity
– Build a knowledge base of hardware: performance, tuning issues, examples

• TCP/IP stack configuration and tuning is absolutely required
– Large windows, multiple streams (see the sketch after this list)

• No local infrastructure bottlenecks
– Gigabit Ethernet "clear path" between selected host pairs
– To 10 Gbps Ethernet by ~2003

• Careful router/switch configuration and monitoring
– Enough router "horsepower" (CPUs, buffer size, backplane bandwidth)

• Packet loss must be ~zero (well below 0.1%)
– i.e. no "commodity" networks (need ESnet, I2-type networks)

• End-to-end monitoring and tracking of performance
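As a concrete illustration of the "large windows" item, an application can request socket buffers sized to the bandwidth-delay product instead of the small OS defaults. A minimal sketch, assuming the host kernel is configured to permit buffers this large:

```python
# Size windows to the bandwidth-delay product: a 100 Mbps path with a
# 70 ms RTT needs 100e6/8 * 0.070 ~ 875 KB in flight, so ~1-2 MB buffers.
import socket

WINDOW = 2 * 1024 * 1024  # 2 MB, matching the testbed TCP windows above

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WINDOW)

# The kernel caps these at its configured maximum (e.g. net.core.rmem_max
# on Linux), so system-level tuning is still required.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

Multiple parallel streams attack the same limit from another angle: n streams give roughly n windows in flight.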

Page 13: Local Networking Infrastructure

• LANs used to lead WANs in performance, capabilities and stability, but this is no longer true.

• WANs are deploying 10 Gigabit technology, compared with 1 Gigabit on leading-edge LANs.

• New protocols and services are appearing on backbones (DiffServ, IPv6, multicast) (ESnet, I2).

• Ensuring our ATLAS institutions have the required LOCAL level of networking infrastructure to participate effectively in ATLAS is a major challenge.

Page 14: Estimating Site Costs

Site costs                OC3 (155 Mbps)          OC12 (622 Mbps)        OC48 (2.4 Gbps)
Fiber/campus backbone     I2 req. (Sup. Gig)      I2 req. (Sup. Gig)     I2 req. (Sup. Gig)
Network interface         $100/conn. (Fast Eth.)  $1K/conn. (Gigabit)    $1K/conn. (Gigabit)
Routers                   $15-30K                 $40-80K                $60-120K
Telecom service provider  Variable (~$12K/y)      Variable (~$20K/y)     Variable (~$50K/y)
Network connection fee    $110K                   $270K                  $430K

Source: Network Planning for US ATLAS Tier 2 Facilities, R. Gardner, G. Bernbom (IU)

Page 15: Networking Plan of Attack

• Refine our requirements for the network
• Survey existing work and standards
• Estimate likely developments in networking and their timescales
• Focus on gaps between expectations and needs
• Adapt existing work for US ATLAS
• Provide clear, compelling cases to funding agencies about the critical importance of the network

Page 16: Network Efforts

• Survey of current/future network-related efforts
• Determine and document US ATLAS network requirements
• Problem isolation ("finger pointing" tools)
• Protocols (achieving high-bandwidth, reliable connections)
• Network testbed (implementation, Grid testbed upgrades)
• Services (QoS, multicast, encryption, security)
• Network configuration examples and recommendations
• End-to-end knowledge base
• Monitoring for both prediction and fault detection
• Liaison to network-related efforts and funding agencies

Page 17: Network Related FTEs/Costs

[Chart: "US ATLAS Networking" – FTEs (left axis, 0-4.5) and costs in K$ (right axis, 0-250) per year, 2002-2006, with FTE(tot), FTE(need) and K$ series. Total FTEs: 2.75 (2002), 4 (2003), 4 (2004), 4 (2005), 3.25 (2006).]

Network-related efforts to leverage and adapt existing efforts for ATLAS.

Page 18: Support for Networking?

• Traditionally, DOE and NSF have provided university networking support indirectly, through the overhead charged to grant recipients.

• National labs have network infrastructure provided by DOE, but not at the level we are finding we require.

• Unlike networking, computing for HEP has never been considered simply infrastructure.

• The Grid is blurring the boundaries of computing and the network is taking on a much more significant, fundamental role in HEP computing.

• It will be necessary for funding agencies to recognize the fundamental role the network plays in our computing model and to support it directly.

Page 19: What Can We Conclude?

• Networks will be vital to the success of our USATLAS efforts.

• Network technologies and services are evolving, requiring us to test and develop with current networks while planning for the future.

• We must raise and maintain awareness of networking issues for our collaborators, network providers and funding agencies.

• We must clearly present network issues to the funding agencies to get the required support.

• We need to determine what deficiencies exist in network infrastructure, services and support, and work to ensure those gaps are closed before they adversely impact our program.

Page 20: References

• US ATLAS Facilities Plan – http://www.usatlas.bnl.gov/computing/mgmt/dit/

• MONARC – http://monarc.web.cern.ch/MONARC/

• HENP Working Group – http://www.usatlas.bnl.gov/computing/mgmt/lhccp/henpnet/

• Iperf monitoring page – http://atgrid.physics.lsa.umich.edu/~cricket/cricket/grapher.cgi

Page 21: Network FTE Breakdown

                   2002       2003       2004       2005       2006
Survey             0.25       0.25       0.25       0.25       0.25
Requirements       0.5/0.25   0.5/0.25   0.25       0.25       0.25
Protocols          0.25       0.25       0.25       0.25       -
Services           0.25       0.5/0.25   0.75/0.25  0.5/0.25   0.5/0.25
Configuration      0.25       0.5/0.25   0.5/0.25   0.5/0.25   0.5/0.25
Testbed            0.25/0.25  0.5        0.5        0.5        -
Monitoring         0.25/0.25  0.25/0.25  0.25/0.25  0.5/0.25   0.5/0.25
End-to-End KB      0.25       0.5/0.25   0.5/0.5    0.5/0.5    0.5/0.5
Problem Isolation  0.25       0.5        0.5        0.5        0.5
Liaison            0.25/0.25  0.25/0.25  0.25/0.25  0.25/0.25  0.25/0.25

(Entries with two values are FTE(tot)/FTE(need); the FTE(tot) columns sum to the yearly totals charted on the FTEs/Costs slide.)

Page 22: Network K$ Breakdown

                   2002  2003  2004  2005  2006
Survey             4     4     4     2     2
Requirements       4     4     4     4     4
Protocols          5     5     10    5     -
Services           5     15    20    20    40
Configuration      5     10    10    15    20
Testbed            60    75    120   30    -
Monitoring         12    12    12    25    25
End-to-End KB      10    20    20    20    20
Problem Isolation  4     5     6     8     6
Liaison            7     7     8     8     8