Network Virtualization on Internet2
2014. 11. 6.
Eric Boyd Luke Fowler Joe Breen Jeronimo Bezerra Tom Lehman Xi Yang
Internet2 Mission
University Corporation for Advanced Internet Development
• Abundant Bandwidth – 100G, for now
• Network Programmability – SDN, Network Virtualization
• Friction-Free Science – Science DMZ
Internet2 Community Innovation Story
Network Virtualization on Internet2
• Control a slice of the national network!
• Enable:
  – Rapid prototyping of advanced applications
  – Rapid prototyping of new network services
  – Rapid advancement of network research
• Internet2’s support for network virtualization instantiates the Group B request from 2005/6 that asked for a multi-purpose, large-scale, multi-tenant innovation platform.
• Internet2’s network virtualization service implements the vision laid out at the April 2012 Member Meeting.
• What does the service enable?
  – Opportunity to deploy your own controller on the national backbone
  – Slice of the national backbone resources (e.g. VLAN range, flow table subset, etc.)
  – Ability to create a persistent nationwide service using a fraction of the national backbone
Realizing a Community Vision
• Network Virtualization: Puts members in control of (a slice of) the network
• Change in paradigm:
  – Turning the “commons” on its head
  – Private network capabilities with shared network costs
• Large scale networking is normally about lowest common denominator
• Large scale virtualized networking is about creating custom facilities
• Extend the local domain into the national (eventually global) arena
Network Virtualization on Internet2
[Figure: Internet2 Service Taxonomy. Labels: Fiber & Optical Transport; Circuits and Wavelengths - AL1S; Ethernet Switches; General Purpose VLAN Service - AL2S; Virtualized Ethernet Switching; Hypervisor; SDN Controller; I2-Run Service Specific Hardware; NVS (Network Virtualization Service); Layer 3 R&E IP and TR-CPS Services; NET+ External Provider Services; ESNET; NOAA; Connectors; GENI; XSEDE; ONOS; GENI Learning Switch; LHCONE. Legend: I2 Production Service, I2 Prototype Service, Service User, Implemented Using, Dependencies.]
Internet2 Service Taxonomy
• For most applications run in a campus environment, the traditional routed Layer 3 infrastructure provided by the Internet2 Advanced Layer 3 Service (AL3S) provides all the needed functionality and performance.
• For some applications, the ability to run on a server in a campus environment or on a GENI Rack, connected by a Layer 2 VLAN, should suffice.
• For a few advanced applications, particularly in the network research arena, there is a need to run their own controller on a virtual network.
Network Virtualization Use Case
• Production Service Staging
  – GENI wants to move to Stitching v3.0, but Stitching 2.0 is in wide use
  – Set up a slice, deploy a second OESS, deploy new version of FOAM Stitching Aggregator
  – When it’s tested and ready, move to the production OESS stack
• Network Research
  – Network researcher has a better idea how to do networking
  – Set up a slice, deploy new network controller, write paper
• Service Prototyping
  – Look at alternatives to AL3S
  – Implement a route server that speaks OpenFlow on southbound interface with no routers
  – Deploy in a slice, begin peering with other domains
  – Evaluate efficacy, operational savings
  – Over time transition to new service
Use Case Examples (1)
• Private Networks
  – Want something akin to Atlantic Wave, original vision for LHCONE, or GENI Virtual Network
  – Set up a distributed SDX across multiple domains
• Network virtualization experiments are already underway
  – Prototyping IP over SDN solution (no routers!)
  – Prototyping cloud-based services
  – Prototyping multi-domain virtual networks
  – DANCES
Use Case Examples (2)
• GEC21: Last week the GENI Program Office demonstrated the first non-Internet2 controller running a slice on the Internet2 network
• GENI developed a simple application:
  – Floods all traffic until path is learned
  – Learns MAC addresses
  – Reacts to network traffic by installing new flow rules (“learning”) allowing end-to-end communication
GENI Learning Switch
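The flood-then-learn behaviour described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the GENI application itself: the class name, the `packet_in` method, the `FLOOD` marker and the rule format are all hypothetical, and no real OpenFlow library is used.

```python
from dataclasses import dataclass, field

# Hypothetical marker meaning "send out every port except the ingress port".
FLOOD = "FLOOD"


@dataclass
class LearningSwitch:
    # Maps a source MAC address to the port it was last seen on.
    mac_to_port: dict = field(default_factory=dict)
    # Flow rules "installed" so far, as (dst_mac, out_port) pairs.
    flow_rules: list = field(default_factory=list)

    def packet_in(self, in_port, src_mac, dst_mac):
        """Handle one packet sent to the controller; return the output action."""
        # Learn (or refresh) where the sender lives.
        self.mac_to_port[src_mac] = in_port
        out_port = self.mac_to_port.get(dst_mac)
        if out_port is None:
            # Destination not learned yet: flood.
            return FLOOD
        # Destination known: record a flow rule so later packets skip the controller.
        rule = (dst_mac, out_port)
        if rule not in self.flow_rules:
            self.flow_rules.append(rule)
        return out_port


switch = LearningSwitch()
first = switch.packet_in(in_port=1, src_mac="aa:aa", dst_mac="bb:bb")
reply = switch.packet_in(in_port=2, src_mac="bb:bb", dst_mac="aa:aa")
again = switch.packet_in(in_port=1, src_mac="aa:aa", dst_mac="bb:bb")
```

The first packet is flooded because the destination is unknown; the reply and every later packet are forwarded on a learned port.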
• AL2S: Advanced Layer 2 Service
  – Enables provisioning of Layer 2 VLANs across the Internet2 Network
    • Fast provisioning
    • Automatic failover to redundant path
  – Enables interdomain provisioning through interdomain protocol
    • Slower provisioning
    • No automatic failover
Prototype Multi-Domain Layer 2 Service
• Backdrop:
  – Internet2 operates a Layer 2 Service
  – Campuses (e.g. University of Utah) operate a Layer 2 Service
  – Regional Networks (e.g. MAX) operate a Layer 2 Service
  – Exchange Points (e.g. AMPATH/FIU) operate a Layer 2 Service
• Is there a way to create a Multi-Domain Layer 2 Service?
  – Common capabilities
  – Willingness to collaborate
  – Willingness to contribute to a common project
  – Maintain local control
    • Withdraw at any time
  – Enable (illusion of) global control
    • Control remote administrative domains
  – No change in software, just configuration
Prototype Multi-Domain Layer 2 Service
[Figures: build-up diagrams for the prototype multi-domain Layer 2 service. Labels: OESS, MD-OESS, FlowSpaceFirewall, SDX, SDX1, SDX2, SDX3, Multi-Domain SDX, Local VLAN Provisioning Service, Physical switch, Virtual Switch, NSI. Domains shown: Internet2, MAX, Utah, FIU.]
Multi-Domain Sample Network
[Figure: sample network. Labels: Utah, MAX, Internet2 AL2S, FIU, Brazil; switches labelled Dell, HP and Brocade; six nodes labelled PS; numeric labels 3502, 3503, 3504, 3505.]
MAX FlowSpace Firewall Deployment and Testing
Tom Lehman and Xi Yang
University of Maryland
Mid-Atlantic Crossroads (MAX)
MAX FSFW Deployment
• MAX regional network is interested in using FSFW as part of the overall SDN capability set
• Provides the basis for the construction of application/workflow specific network topologies
  – includes layer2 isolation and protection of other network resources and slices on local network
  – provides the basis for automating instantiation of multi-domain private topologies
• MAX will build on this via integration of FSFW/OESS driven network provisioning and topologies with compute and storage resources
  – Building Software Defined Services (SDS)
MAX Current Deployment
• Two network elements
  – Brocade MLXe, Dell/Force10 S4810
• Two perfSONAR Nodes
• Two slices: MAX-OESS and MD-OESS
MAX Future Deployment: Software Defined Services
• Additional network elements
• Integrate High Performance Parallel File System (Ceph)
• Integrate Compute nodes and Workflow Hosting
Americas Lightpaths - AmLight
Jeronimo Bezerra [email protected]
What is AmLight?
AmLight Network
• Collaboration among FIU, RNP, ANSP, AURA, REUNA and RedCLARA to connect South America to the U.S.
• Works as a Distributed IXP
  – Connects AMPATH and SouthernLight IXPs
• Connects more than 1000 universities and research centers and 13 Research Networks
• 40Gbps (4x10Gbps)
Why does virtualization/programmability make sense in regional networks and/or IXP environments?
• Reduces the provisioning time
• Better support for new protocols and features
• Better support for experimentation / no need for overlays
What is AmLight’s current plan and configuration?
• All AmLight switches support OpenFlow 1.0
• Internet2’s FlowSpace Firewall for virtualization/programmability
• Internet2’s OESS to manage the AmLight “slice” (default users)
• OSCARS for inter-domain provisioning
What does AmLight plan for the future?
• Improve the troubleshooting techniques and skills
• Create better documentation and share information to help researchers use AmLight
• Upgrade the network to support more features: SDX, Security, Metering, etc.
University of Utah
Joe Breen [email protected]
Why
• Support of virtual slices with different security contexts and network characteristics
  – Network/Security experiment environment
    • Minimal protection
  – Science DMZ as a slice
    • More security than experimental slice, less than hospital/campus production environments; high performance characteristics
• Support of dynamic domain science workflow between resources, i.e. dedicated instrument and HPC clusters or storage
• Exploration of isolated environments for different security paradigms
• Minimize provisioning time for isolated environments
University of Utah’s current plans
• Migrate current InstaGENI and ProtoGENI infrastructure to AL2S from ION and DWDM infrastructure (http://geni.net, http://protogeni.net)
• Set up incoming research clusters and HPC infrastructure
  – Connect Apt cluster to AL2S infrastructure (http://aptlab.net)
  – Connect incoming Cloudlab infrastructure (http://cloudlab.us)
  – Connect portions of current HPC environment starting with Data Transfer Nodes
• Build out campus environment substrate for support of students and research
University of Utah future plans
• Deploy Network Management research platform in a slice
• Test new security research with “friendly users”
• Potentially deploy Entrepreneurship building (Engineering and Business collaborations) with slice capabilities for supporting student projects in “garage bays”
• Potentially work with other disciplines, such as the Entertainment Arts and Engineering (Gaming)
• Internet2 and Indiana University have developed a second-generation, open source hypervisor, called FlowSpace Firewall (FSFW).
  – FSFW divvies up the available VLANs on a network into VLAN ranges, known as slices.
  – FSFW acts as a proxy between one or more OpenFlow controllers and a set of switches within a single administrative domain.
  – FSFW only carries OpenFlow commands from a controller to a switch (or the reverse) if the command falls within the allocated range of VLANs for that controller.
  – FSFW acts as a resource protector, ensuring that no controller overconsumes scarce resources such as the rate at which OpenFlow rules can be fed to a switch or the number of OpenFlow entries in the Flow Table.
• Technology enables Innovation in the Internet2 Community
Technology behind Network Virtualization
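The allow-or-deny behaviour described for FSFW can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not FSFW code: the `Slice` fields, the `allow_flow_mod` function and the example VLAN range are hypothetical, and only two of the checks named on the slide (VLAN range and flow table share) are modelled.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    name: str
    vlan_min: int          # lowest VLAN tag allocated to this slice
    vlan_max: int          # highest VLAN tag allocated to this slice
    max_flows: int         # share of the flow table this slice may use
    installed_flows: int = 0


def allow_flow_mod(slice_, vlan_id):
    """Return True (and count the rule) if it fits the slice; otherwise deny."""
    # Deny anything outside the slice's VLAN range.
    if not (slice_.vlan_min <= vlan_id <= slice_.vlan_max):
        return False
    # Deny once the slice has used up its flow table share.
    if slice_.installed_flows >= slice_.max_flows:
        return False
    slice_.installed_flows += 1
    return True


experimenter = Slice("experimenter", vlan_min=3500, vlan_max=3505, max_flows=2)
results = [allow_flow_mod(experimenter, vlan) for vlan in (3502, 100, 3503, 3504)]
```

In this run the rule for VLAN 100 is denied because it is outside the slice, and the rule for VLAN 3504 is denied because the slice has already installed its two permitted flows.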
[Figures: software architecture build-up diagrams. Labels: OpenFlow Switch (x3), OpenFlow, FlowSpaceFirewall, FlowVisor, OESS, OESS UI, OESS API, FOAM, Stitching Aggregate, EXP APP, Exp OF App, OVX / ONOS, NSI, NSI API, IDCP, OSCARS, OSCARS API, NOX, API, MD-OESS, GENI Learning Switch, and FSFW instances at Utah, MAX and FIU. Key: OpenFlow Switch, Internet2 Software Stack, Experimenter Code.]
Software Architecture
• Network Virtualization Service (NVS)
  – This underpins AL2S and
  – This is a core piece of the Internet2 Innovation Platform and
  – This provides functionality needed by the advanced networking community
• How is the software documented?
  – http://globalnoc.iu.edu/software/sdn.html
Service Definition
• No impact on AL3S availability
• No impact on AL2S availability
• 2 or 3 early adopters rolled out by 12/31/14
  – including 1 by TechX
What does strong success look like?
• How many concurrent customers?
  – Actual?
  – Maximum, given resources?
• What is the typical timeline from first inquiry to deployment?
• What percentage of projects make it onto the Internet2 network?
• What is the prioritization distribution of projects?
• What does availability mean?
  – Problems within the slice due to customer code
  – Problems within the slice due to FSFW implementation
  – Problems in the slice that impact underlying hardware (exposing vendor issues)
  – Problems in the slice
• What is the nature of their effort?
  – Early stage development? (Discouraged … for now)
  – At-scale national deployment evaluation?
  – Prototype service?
  – Production service?
• What are their resource requirements?
  – FTEs?
  – VMs?
  – VLANs?
  – Flow rules?
  – Etc.
• What new features are required?
Metrics => Analysis and Growth
• Impact on other services?
  – Share with NTAC in quarterly reports
  – Impact on AL2S
  – Impact on AL3S
• Usefulness to the community?
  – How long to get from proposal to testing to production?
  – What percentage of proposed projects make it on to the network?
  – What percentage of production prototype services become production services?
How do we measure success?
• Constraints
  – We need tested, instrumented, operationally sound controllers
  – We have resource constraints on testing infrastructure
  – We have resource constraints on deployment infrastructure
  – We are constrained by the need for production vendor support of some functionality (e.g. metering -> QoS)
  – Controller must support VLAN translation and operating inside of a range of VLAN tags
• Scalability
  – We have a limited number of VLANs (<4000)
  – We have a limited flow table insertion rate
  – We have a limited packet-in rate
  – We have a limited flow table size
Constraints and Scalability
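A limited flow table insertion rate is the kind of constraint that is commonly enforced with a token bucket. The Python sketch below shows that general technique; it is an illustration only, and nothing here is taken from the Internet2 software: the class name, parameters and the rate of 2 insertions per second are assumptions. Time is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Allow at most `rate_per_sec` events per second, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # timestamp of the previous call, in seconds

    def allow(self, now):
        # Refill tokens for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        # Spend one token per flow insertion; deny when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limit: 2 flow insertions per second, burst of 2.
bucket = TokenBucket(rate_per_sec=2, burst=2)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 0.5, 0.5)]
```

The third request at time 0.0 is denied because the burst is spent; half a second later one token has been refilled, so one more insertion is allowed.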
• Risks:
  – We are complicating the software stack that supports AL2S and AL3S. By definition, that introduces risk.
  – There are unknown risks we have not planned for.
• Risk Mitigation:
  – The software has been designed to protect resources.
  – Our testing has been designed to expose resource overconsumption (failures by the software)
  – At 3 AM, we have a plan to back out of low priority services in order to maintain high priority services without waking up managers or developers
  – We have an escalation matrix
Risks and Mitigation
• Abstractions
  – We mask some underlying vendor deficiencies
    • For example, some vendors do not support idle or hard timeouts, but we can fake it in software
  – We abstract away VLAN tag coordination issues for controllers operating with untagged VLANs
• Expectations (if you supply your own controller)
  – You must supply a software package the Internet2 NOC can deploy
    • You don’t deploy your own controller
  – Your controller must have adequate logging to debug your problems
  – You must provide a test package we can test in the lab (a 4-node version of the Internet2 network).
  – You must help in developing a test plan that thoroughly tests all functionality of your controller
Abstractions
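One way idle and hard timeouts could be faked in software is for the controller side to track each rule itself and delete it when a timer expires. The Python sketch below shows that idea under stated assumptions; it is not how the Internet2 stack does it. The `TrackedFlow` fields and the `sweep` function are hypothetical, and `last_hit` is assumed to be refreshed elsewhere (for example by polling per-flow counters).

```python
from dataclasses import dataclass


@dataclass
class TrackedFlow:
    match: str
    installed_at: float   # when the rule was installed (seconds)
    last_hit: float       # last time traffic was seen on the rule (seconds)
    idle_timeout: float   # expire after this long without traffic; 0 disables
    hard_timeout: float   # expire this long after install; 0 disables


def expired(flow, now):
    """True if either emulated timeout has elapsed."""
    if flow.hard_timeout and now - flow.installed_at >= flow.hard_timeout:
        return True
    if flow.idle_timeout and now - flow.last_hit >= flow.idle_timeout:
        return True
    return False


def sweep(flows, now):
    """Split flows into (kept, to_delete); the caller would send the deletes to the switch."""
    kept = [f for f in flows if not expired(f, now)]
    to_delete = [f for f in flows if expired(f, now)]
    return kept, to_delete


flows = [
    TrackedFlow("a", installed_at=0, last_hit=9, idle_timeout=5, hard_timeout=0),
    TrackedFlow("b", installed_at=0, last_hit=2, idle_timeout=5, hard_timeout=0),
    TrackedFlow("c", installed_at=0, last_hit=9, idle_timeout=0, hard_timeout=10),
]
kept, to_delete = sweep(flows, now=10)
```

Flow "a" survives because it saw traffic recently, "b" is removed as idle, and "c" is removed because its hard timeout has elapsed.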
• Customer initiates process
  – Open a ticket with [email protected]
  – Fill out questionnaire.
• Internet2 replies with application constraints
  – VLAN Range
  – Constraints on number of flow rules
  – Constraints on rate of flow rule insertion
  – Constraints on rate of Packet-In/Packet-Out events
  – Etc.
• Internet2 tests application on iDREAM GENI test lab
• Internet2 (not the experimenter … yet) deploys application on Internet2 Network
Process: Deploying Your Own Controller
• Provide enough documentation to set up and configure your application
• Provide enough logging (to a file) to be able to debug your application
  – If it breaks, we will disable your slice and send you the log; your slice will not be re-enabled until the problem is fixed
• Any API (besides OpenFlow) or UI must be secure
• Provide involved and reactive developers
• Application should already have been tested with FlowSpace Firewall to verify it will function properly
  – FlowSpace Firewall does not re-write rules, it allows or denies rules.
  – Your app needs to be able to work on a set of VLANs (and they won’t be the same VLAN across all devices)
• Know the FlowSpace you want for your slice
  – Switches
  – EndPoints
  – Number of flows
  – Interfaces
What do you need to do …
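Because FlowSpace Firewall only allows or denies rules, an application has to pick the right VLAN tag for each device itself. The Python sketch below shows one way an application might derive per-hop rules from a FlowSpace description. It is a deliberate simplification with hypothetical names: the switch names, the tag values, the one-tag-per-switch model and the rule dictionary format are all assumptions, not an Internet2 or OpenFlow data format.

```python
# Hypothetical FlowSpace: the VLAN tag granted to the slice on each switch.
flowspace = {"sw-a": 3502, "sw-b": 3503, "sw-c": 3505}


def rules_for_path(path, flowspace):
    """Build one rule per hop: match the local tag, rewrite to the next hop's tag."""
    rules = []
    for i, switch in enumerate(path):
        local_tag = flowspace[switch]
        # On the last hop there is no next switch, so the tag is left unchanged.
        next_tag = flowspace[path[i + 1]] if i + 1 < len(path) else local_tag
        rules.append({"switch": switch, "match_vlan": local_tag, "set_vlan": next_tag})
    return rules


rules = rules_for_path(["sw-a", "sw-b", "sw-c"], flowspace)
```

Each rule stays inside the tag allowed on its own switch, so a filter that only allows or denies would accept all three.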
• Have well tested, well versioned, and packaged code
• Provide lots of documentation
• Provide lots of configurable logging
• Have a Ticketing/Bug reporting system
• Provide Installation and Operation instructions
• Given the FlowSpace, be able to generate the proper Configuration for your application
• Be patient, it’s a learning experience for all of us
What do we want you to do
Questions?
Eric Boyd [email protected]