NTT DOCOMO case study: OpenStack Summit 2016 Barcelona talk "Expanding and Deepening NTT..."
Copyright©2016 NTT DOCOMO, INC. All rights reserved.
Expanding and Deepening
NTT DOCOMO’s private cloud
NTT DOCOMO Inc. Jun Ishii
Kojiro Amano
VirtualTech Japan Hiromichi Ito
About us
Jun Ishii
o Research Engineer, NTT DOCOMO
o Developer, operator and technical consultant for the NTT DOCOMO private cloud
Hiromichi Ito
o CTO, VirtualTech Japan
o One of the original proposers of OpenStack Bare Metal Provisioning (now called "Ironic")
Kojiro Amano
o Research Engineer, NTT DOCOMO
o Security consultant for the NTT DOCOMO private cloud
Expanding strategy of
our private cloud
Scale-up
strategy
o One year after the launch of our private cloud, it keeps growing larger and larger!
                 Jun. 2015    Oct. 2016    Mar. 2017
Number of DCs    1            2            4
Number of HWs    50           300          900
Cores            1500         10000        Over 35000
o Why could we expand our cloud so fast?
o Main strategy: Forest and Tree
Make a forest
Fill in trees
o Three important details
Decided to migrate a large-scale in-house system
Obtained the budget for making a forest
Fast deployment methods by normalization
Enables us to create the forest quickly
Novel challenges for various users
Enrich the forest so that various trees can be planted
• L2GW
• GPU instance
• Reference model
• Security update
o Decided to migrate a large-scale in-house system
The whole system had to be renewed because its hardware reached end of life (EOL)
o An OpenStack-based cloud has many strengths.
Three-year TCO is better than on-premises
22% reduction in TCO (CAPEX and OPEX)
A distributed architecture is compatible with the cloud.
REST interfaces are suitable for maintaining systems.
Migration/replication over long distances is feasible
through the L2GW; details are given later.
o Fast deployment methods by normalization
DRY: Don't repeat yourself
o How to
deploy
add more [compute nodes, Swift storage nodes]
deal with hardware trouble
Ansible, KB (see our Tokyo Summit presentation)
o It takes only one month to deploy a new DC
Measured from the end of racking and cabling until the first QA test is finished
For over 300 nodes, HW configuration and OpenStack installation were finished in just 10 days by 5 operators.
o Novel challenges: satisfying the wishes of various users involves many difficulties and much know-how.
Add functions
L2GW
GPU instance
Reduce the time needed to construct and manage users' systems
Reference model
Security update
These reasons made our private cloud grow so FaT!!
How we are deepening our private cloud
Enrich
for users
L2 Gateway for
connecting existing large-scale networks
and inter-cloud networking.
I'm good at
connecting.
○ Overview
Our user has a large-scale existing network and proprietary computer systems.
– The network system has the great ability to provide Layer-2 connectivity nationwide.
– The proprietary computer systems do not have enough flexibility.
REST API
Service mobility
They decided to migrate to OpenStack at this renewal timing.
Migration work on the network system side must be minimal.
Our user requested two new network services.
– Connect tenant networks between the two data centers
– Connect instances and existing equipment at Layer 2
○ Before
[Figure: DC 1 and DC 2 connected over the nationwide WAN (RTT 20-40 ms); the DCs host dedicated equipment and proprietary computer systems.]
○ After
[Figure: DC 1 and DC 2 connected over the nationwide WAN (RTT 20-40 ms); the dedicated equipment remains, and L2 GW and RT elements now appear in the diagram.]
○ Our user's requirements
High Availability
– Do not share control services between data centers.
– Avoid Single-Point-Of-Failure (SPOF).
Technical limitations
– Do not change the IP addressing and routing architecture.
– Do not use Network Address Translation (NAT).
– Instances and existing equipment must be connected at Layer 2.
Performance
– The total throughput requirement is several dozen Gbps.
– The average packet size is smaller than in typical private clouds.
– Several hundred VLANs must be connected.
To avoid sharing control services between data centers, we chose the "Region" zoning model.
In the "Region" model, all services are properly separated.
Our base OpenStack deployment model already avoids SPOFs.
The existing system's IP addressing and routing architecture can be deployed on the overlay network.
NAT is required for floating IPs and for connecting to the external network.
However, this system does not require the floating IP address function.
Our base OpenStack deployment model uses an L3 ECMP fabric and VXLAN.
So we chose a VXLAN Layer 2 Gateway (L2 Gateway) to connect instances and existing equipment at Layer 2.
A software-based VXLAN L2 Gateway is not suited to a short-packet workload.
So we chose a hardware-based VXLAN L2 Gateway.
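To see why short packets are hard on a software gateway, the packet rate implied by a throughput target can be estimated as below. This is a minimal sketch: the 40 Gbps figure and the frame sizes in the usage note are illustrative assumptions, not numbers from the talk, and preamble, inter-frame gap and FCS are ignored. The 50-byte overhead is the usual VXLAN-over-IPv4 encapsulation (outer Ethernet 14 + IPv4 20 + UDP 8 + VXLAN 8).

```python
# Outer Ethernet + IPv4 + UDP + VXLAN headers added to every inner frame.
VXLAN_OVERHEAD_BYTES = 14 + 20 + 8 + 8


def packets_per_second(throughput_gbps: float, inner_frame_bytes: int) -> float:
    """Packets per second needed to carry throughput_gbps of inner frames."""
    bits_per_packet = inner_frame_bytes * 8
    return throughput_gbps * 1e9 / bits_per_packet


def vxlan_wire_gbps(throughput_gbps: float, inner_frame_bytes: int) -> float:
    """Underlay bandwidth once each inner frame carries the VXLAN overhead."""
    factor = (inner_frame_bytes + VXLAN_OVERHEAD_BYTES) / inner_frame_bytes
    return throughput_gbps * factor
```

For the same assumed 40 Gbps, 1250-byte frames need 4 million packets per second while 125-byte frames need 40 million, so the per-packet cost of the gateway dominates as frames get shorter.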
○ Equipment selection
Hardware VTEP
– Modern L3 switch chipsets have hardware VTEP gateway functions.
Intel FM6000, Broadcom Trident II, Trident II+, Tomahawk
– We examined L3 network switches based on both the Intel FM6000 and the Broadcom Trident II.
– Finally, we compared L3 switches from three vendors (A, D, J).
Comparison results
– Vendor A's L3 switch supports VXLAN in a Multi-Chassis LAG (MLAG) deployment. The other vendors' switches do not (as of June 2016).
– All vendors' L3 switches met the performance criteria.
– All vendors' OVSDB protocol support has some issues.
We chose vendor A's L3 switch because it supports MLAG.
○ Software test and proof of concept
Test target
– Neutron networking-l2gw
APIs and implementations to support L2 Gateways in Neutron.
– networking-l2gw provides the "L2GW Service Plugin" and the "L2GW Agent".
The "L2GW Service Plugin" provides L2GW API services.
The "L2GW Agent" controls the L3 switch via the OVSDB protocol.
Test results
– Several minor bugs (already fixed by the community)
– Missing features that are required for a production environment
SSL support (already implemented by the community)
Handling of the Mcast_Macs_Remote table (we created a modified patch for vendor A based on a community patch; not merged yet)
○ [Figure: L2GW control architecture.
Controller Node: Neutron Server (API, ML2, ML2 L2POP, L3, L2GW Service Plugin), Nova, Keystone, Glance, Cinder, Horizon.
Compute Node: ML2 OVS Agent (ML2 L2POP), Open vSwitch VTEP, Virtual Switch.
Network Node: L2GW Agent, ML2 OVS Agent, L3 Agent, Open vSwitch VTEP, Virtual Router.
Vendor A's management virtual appliance: OVSDB Server, controlled by the L2GW Agent via the OVSDB protocol.
Vendor A's hardware VTEPs: connected to VLANs and to the WAN.]
○ Results of pilot tests at production scale
Hardware L2GW side
– OVSDB server crash issue
When a large number of records were inserted at one time, the OVSDB server crashed. (This issue has already been fixed by the vendor.)
networking-l2gw side
– We encountered several critical bugs.
But they are hard to reproduce.
When we hit these bugs, the L2GW agent stopped.
– L2GW agent recovery from a crashed state is poor.
The L2 gateway agent continuously syncs state between the Neutron database and OVSDB.
Unfortunately, when the L2GW agent crashed or stalled, these two databases sometimes lost sync.
So we had to re-register L2GW connections manually when we hit these bugs.
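The lost-sync situation described above amounts to two record sets that no longer agree. A minimal sketch of detecting the difference follows; the (gateway, network, VLAN) tuple layout and the function name are hypothetical and do not reflect the actual networking-l2gw or OVSDB schemas.

```python
def diff_connections(neutron_records, ovsdb_records):
    """Compare L2GW connection records held on the two sides.

    Each record is a hypothetical (gateway, network, vlan) tuple.
    Returns what would have to be re-registered and what is stale.
    """
    expected = set(neutron_records)
    actual = set(ovsdb_records)
    return {
        "missing_in_ovsdb": sorted(expected - actual),
        "stale_in_ovsdb": sorted(actual - expected),
    }
```

The "missing_in_ovsdb" list corresponds to the connections an operator would need to re-register by hand.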
o Network trouble occurred every week without fail.
The L2GW agent is unstable.
The L2GW agent stops working correctly after a few days when users run a long test.
• The test includes continuous instance creation and deletion.
• The test includes continuous CRUD testing of Neutron virtual network ports.
Instances could not communicate with instances in the other region or with existing equipment.
o We decided not to use networking-l2gw in the production environment for the time being.
We could not reproduce the connection troubles between OVSDB and the L2GW agent that occurred weekly.
We could not fix all the critical bugs that we encountered.
In our environment, a port status issue occurred.
That issue causes L2GW agent problems.
We would rather not use l2population.
l2population does not have enough scalability yet.
We had to keep to the delivery date. A delivery delay would mean the death of our project.
o Our solution
We created a manual procedure to manage the L2GW.
Manage OVSDB by CLI
• "vtep-ctl" command
Manage the OVS flow table by CLI
• "ovs-ofctl" command
We created an automation system based on the manual procedure.
The upper-level management system calls this automation system.
This system is working correctly now.
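An automation layer over a manual CLI procedure can be as simple as generating the command lines for each connection. The sketch below only builds vtep-ctl argument lists and does not run them. The talk does not show its actual commands, so the choice of subcommands (add-ls, set Logical_Switch ... tunnel_key, bind-ls, add-ucast-remote) and all names in the usage note are assumptions.

```python
def l2gw_commands(logical_switch, vni, physical_switch, port, vlan, remotes):
    """Build vtep-ctl command lines that map one VLAN on a hardware VTEP port
    to a VXLAN logical switch and register remote MAC/VTEP pairs.

    remotes is a list of (mac, vtep_ip) tuples. Nothing is executed here.
    """
    commands = [
        ["vtep-ctl", "add-ls", logical_switch],
        ["vtep-ctl", "set", "Logical_Switch", logical_switch, f"tunnel_key={vni}"],
        ["vtep-ctl", "bind-ls", physical_switch, port, str(vlan), logical_switch],
    ]
    for mac, vtep_ip in remotes:
        commands.append(["vtep-ctl", "add-ucast-remote", logical_switch, mac, vtep_ip])
    return commands
```

For example, `l2gw_commands("ls0", 5000, "sw1", "eth1", 100, [("fa:16:3e:00:00:01", "10.0.0.11")])` yields four command lines that a caller could review, log, or hand to `subprocess.run`. Keeping generation separate from execution makes the procedure easy to dry-run.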
o Results
We provide a stable L2GW that connects instances in the two regions and existing equipment.
We passed all test criteria provided by the client.
OpenStack gives us excellent service flexibility.
The existing network configuration was kept, as our user requested.
o Next challenges of our L2GW project
Fix known issues in networking-l2gw.
We would like to provide an OpenStack API for managing the L2GW.
Fix the scalability issue of l2population.
Investigate EVPN for expanding L2GW services.
GPU instance
Machine learning
on OpenStack
Nvidia Tesla M40
for CUDA/cuDNN
o Deploy
To deploy GPU nodes, not only PCI pass-through but also IOMMU must be enabled.
Enable PCI pass-through
Whitelist, alias, flavor
See the OpenStack wiki*
Enable IOMMU
Add grub settings to grub.conf
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT intel_iommu=on"
Then check the kernel log:
dmesg | grep -e DMAR -e IOMMU
* https://wiki.openstack.org/wiki/Pci_passthrough
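When many GPU nodes are deployed, the kernel-log check above can be scripted. This is a minimal sketch that mirrors the grep filter on text already collected from dmesg; the function name is hypothetical, and matching lines are only a hint that still has to be read, not proof that pass-through works.

```python
import re

# Same two keywords as the grep filter used for the manual check.
_IOMMU_PATTERN = re.compile(r"DMAR|IOMMU")


def iommu_lines(dmesg_text: str) -> list:
    """Return the kernel log lines that mention DMAR or IOMMU."""
    return [line for line in dmesg_text.splitlines() if _IOMMU_PATTERN.search(line)]
```

The text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout` on each node, for example gathered by the deployment tooling.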
o Operate: take care with flavor memory size
According to our verification, IOMMU allocates all of the instances' memory when the instances are launched.
If flavor memory sizes are large and the maximum number of instances is launched, the OOM killer might kill the qemu process.
Swapping doesn't help because IOMMU allocates memory too fast.
[Figure: memory space of a normal compute node (host OS, Instance A, Instance B; instance memory allocated after ballooning) compared with an IOMMU-enabled compute node (host OS, Instance A, Instance B).]
o Workarounds for this problem all leave enough memory margin for the host OS.
Reduce flavor memory size
Sometimes too uncomfortable for GPU users
Set reserved_host_memory_mb in nova.conf to a large value
Also affects other flavors
Decrease the maximum number of instances per host
→ Any other solutions? Help us!
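The trade-off between these workarounds can be made concrete with a small capacity calculation. This is a sketch under a stated assumption: because instance memory is fully allocated at launch, no memory overcommit is possible. The host and flavor sizes in the usage note are illustrative and not taken from the talk.

```python
def max_instances(host_memory_mb: int, reserved_host_memory_mb: int,
                  flavor_memory_mb: int) -> int:
    """Number of instances that fit when every instance's memory is fully
    allocated at launch (no overcommit) and a margin is kept for the host OS."""
    usable_mb = host_memory_mb - reserved_host_memory_mb
    if usable_mb <= 0:
        return 0
    return usable_mb // flavor_memory_mb
```

With an assumed 256 GiB host and a 64 GiB flavor, reserving nothing allows four instances and leaves the host OS no margin, while reserving 16 GiB allows three.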
o How should we offer GPU flavors to in-house users?
o GPU with OpenStack: pros/cons
Aspect | Pros | Cons
Virtualization | More stable than containers | Some GPU card troubles need a host reboot
Immutability | Fast deployment, fast PDCA cycle | Difficult to share GPU resources fairly
Preparation before running machine learning | Can provide an image file with the device driver and CUDA pre-installed | Need to keep up with new versions and new combinations of driver, guest OS, CUDA version, library version…
Cooperation with GPU instance users is important for private cloud providers.
Reference Model
Aim: migrate some in-house applications to our cloud
Problem: strict security policies, with over 100 guidelines for system architecture, apply when users rebuild applications on our cloud
A lot of effort is required to meet the policies.
It would be nice if predefined models were provided :)
[Figure: matrix of security policy areas across Server, Network, Storage and Operation: Monitoring, Security vulnerability, Certificate, Remote Access, Identification and authentication, IDS/IPS, Log, Encryption, Firewall, Redundancy, Backup, Configuration management, …]
"Reference Model" on our cloud
System architecture based on many of the security policies
Sets of OSS stacks that have been heavily tested in our project
Mechanism
Input a template file into Heat
Heat templates
• Web basic model
• Web three-tier model
Images: WEB/LB, Jump, …
[Figure: Heat builds each model on our cloud from a template and images. Components shown: virtual router, virtual network, jump server, proxy server and LB/Web server, plus AP server and DB server for the Web three-tier model.]
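To illustrate what such a predefined model looks like as a template, here is a minimal sketch that builds a very small Heat (HOT) template as a Python dictionary and serializes it. It is not the authors' template: the resource names, parameters and default CIDR are assumptions, and only a network, a router and one server are shown. The output is JSON, which is also valid YAML.

```python
import json


def web_basic_template(cidr: str = "192.168.10.0/24") -> dict:
    """Return a minimal HOT-style template: one network, one router, one server."""
    return {
        "heat_template_version": "2015-04-30",
        "parameters": {
            "image": {"type": "string"},
            "flavor": {"type": "string"},
            "external_network": {"type": "string"},
        },
        "resources": {
            "net": {"type": "OS::Neutron::Net"},
            "subnet": {
                "type": "OS::Neutron::Subnet",
                "properties": {"network": {"get_resource": "net"}, "cidr": cidr},
            },
            "router": {
                "type": "OS::Neutron::Router",
                "properties": {
                    "external_gateway_info": {"network": {"get_param": "external_network"}}
                },
            },
            "router_interface": {
                "type": "OS::Neutron::RouterInterface",
                "properties": {
                    "router": {"get_resource": "router"},
                    "subnet": {"get_resource": "subnet"},
                },
            },
            "web_server": {
                "type": "OS::Nova::Server",
                "properties": {
                    "image": {"get_param": "image"},
                    "flavor": {"get_param": "flavor"},
                    "networks": [{"network": {"get_resource": "net"}}],
                },
            },
        },
    }


def render(template: dict) -> str:
    """Serialize the template so that it can be written to a file."""
    return json.dumps(template, indent=2, sort_keys=True)
```

A real reference model would add the jump, proxy, AP and DB servers along with security groups, which is where most of the policy work is encoded.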
System architecture of Reference Model
[Figure: End users reach redundant WEB/LB servers (SSL termination, LB VIP) from the Internet over HTTPS. Behind them are redundant AP servers, DB servers (DB VIP) and storage, plus backup storage. Operators connect over VPN (SSL-VPN); Proxy/NTP and Monitoring servers are also shown.
Networks: Public-NW 192.168.10.0/24, Private-NW 192.168.20.0/24, Management-NW 192.168.30.0/24, LB-HA-NW, DB-HA-NW, DB-repl-NW.]
WEB (Apache) / LB (LVS w/ Ultramonkey) server
HTTPS, dummy certificates installed by default
WAF for IDS/IPS
The key point is not only to install the software but also to complete the default security settings.
Why didn't we use LBaaS_v1?
LBaaS_v1 (Juno) doesn't satisfy our users' use cases.
Required to
• set a security group on the LB (LBaaS_v2: not yet)
• terminate SSL at the LB (LBaaS_v2: done)
• provide a sorry page (LBaaS_v2: not yet)
VPN (OpenVPN) server
SSL-VPN for secure remote access
Tools for operating the SSL-VPN, such as creating and revoking certificates
Why didn't we use VPNaaS_v1?
The authentication algorithm in IKE phase 1 accepted SHA-1, an algorithm that is losing its safety assurance.
VPNaaS in the recent "Newton" release accepts SHA-256.
Areas covered by the "Reference Model"
60% of the security policies
Future: update this model by adding the missing parts of the security policies
Aim to cover 100%
[Figure: the matrix of security policy areas across Server, Network, Storage and Operation, marking the areas covered so far.]
Security Update
Current daily operation for vulnerabilities
Most of the operation is manual.
Check vulnerabilities
Risk assessment of vulnerabilities
Manage the TODO list
Update our cloud
This causes
Human error
• Forgetting to check vulnerabilities
Time: 1 hour/day
Security operation becomes more important as our cloud expands
It would be nice if these operations could be automated :)
We proposed a way to automate the whole process.
We are testing whether it reduces human error.
Steps: Check vulnerabilities → Risk assessment → Manage TODO list → Update our cloud
Current operation: by checking vendor sites; by Excel
Semi-automatic operation: by a script that checks package versions related to vulnerabilities; by making an Ansible playbook
Check vulnerabilities
CVE & CVSS
CVE: an ID attached to each vulnerability
CVSS: a severity score for each vulnerability
The "CVE-search" API is used for the check
GitHub: https://github.com/cve-search/cve-search
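The core of a package-version check is a comparison between what is installed and what advisories list as affected. The sketch below works on plain Python data. The advisory dictionary layout, the placeholder CVE IDs and the package names are hypothetical, and it does not reproduce the response format of cve-search.

```python
def vulnerable_packages(installed: dict, advisories: list) -> list:
    """Return (cve_id, package, version) for each installed package whose
    exact version appears in an advisory's affected_versions list."""
    hits = []
    for advisory in advisories:
        version = installed.get(advisory["package"])
        if version is not None and version in advisory["affected_versions"]:
            hits.append((advisory["cve"], advisory["package"], version))
    return hits
```

In practice the `installed` mapping would come from the package manager on each host, and the advisories from the vulnerability database query.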
Risk assessment
Key point
The CVSS risk assessment does not always match our environment.
Important
The usage and version of the package (→ script)
Whether the host is on the internal NW or not
Vulnerabilities that let a guest OS invade the host OS
Need to re-evaluate the CVSS score for each host according to its environment
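Per-host re-evaluation can be expressed as a small adjustment function over the three factors listed above. This is purely a sketch: the adjustment values (minus 2.0 for internal-only hosts, plus 3.0 for guest-to-host escapes) are invented for illustration and are neither part of CVSS nor taken from the talk.

```python
def environment_score(cvss: float, package_in_use: bool,
                      internal_only: bool, guest_to_host: bool) -> float:
    """Adjust a CVSS base score for one host (illustrative weights only)."""
    if not package_in_use:
        return 0.0  # the vulnerable package is not used on this host
    score = cvss
    if internal_only:
        score -= 2.0  # assumed reduction: host is reachable only from the internal NW
    if guest_to_host:
        score += 3.0  # assumed increase: a guest OS could invade the host OS
    return max(0.0, min(10.0, score))
```

With these assumed weights, a base score of 4.0 becomes 7.0 on a hypervisor exposed to a guest-to-host escape, which is the kind of low score turning high that the slide warns about.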
Manage the TODO list
Do not forget high-risk vulnerabilities until the patch for the vulnerability is applied.
Important
Even if the CVSS score is low, the score sometimes becomes high in our environment.
Need to check the same vulnerabilities continuously
Update our cloud
Semi-automatic procedure
Manual intervention is required only at check points
Consider the influence on users' instances
Procedure: live-migrate users' instances → Check point ①: normality of the user's APP → security update → Check point ②: normality of OpenStack functions → return users' instances → Check point ③: normality of the user's APP
Future
Apply our proposed method in the test environment first
Extend our tools for users' self-checks
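A semi-automatic procedure with check points can be modeled as a fixed sequence of actions, each followed by a gate that stops the run when the check fails. The sketch below takes the actions and checks as callables, so it does not assume any OpenStack or Ansible API. The function name and the order of steps and check points follow one reading of the slide and are assumptions.

```python
def rolling_update(host, live_migrate, security_update, migrate_back,
                   check_app, check_openstack):
    """Run migrate -> update -> return, stopping at the first failed check point.

    Every argument after host is a callable taking the host name; the two
    check callables return True when everything looks normal.
    """
    steps = [
        ("live-migrate users' instances", live_migrate, "check point 1 (user's APP)", check_app),
        ("security update", security_update, "check point 2 (OpenStack functions)", check_openstack),
        ("return users' instances", migrate_back, "check point 3 (user's APP)", check_app),
    ]
    log = []
    for step_name, action, check_name, check in steps:
        action(host)
        log.append(step_name)
        if not check(host):
            log.append("stopped at " + check_name)
            return log
        log.append(check_name + " OK")
    return log
```

In a semi-automatic setup the check callables could simply prompt an operator to confirm, so that manual intervention happens only at those three points.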
Thank you for listening!