Anatomy Of OpenStack Neutron Through The Eagle Eyes Of Troubleshooters
Sadique Puthen
Cloud Success Architect
27/10/2016
Agenda
Explore troubleshooting Neutron and its anatomy using real-life troubleshooting examples.
Examples:
1. Security group rules are not effective.
2. Newly created instances cannot get an IP from DHCP.
3. Connection to a floating IP randomly fails.
4. Communication through provider networks is very, very slow.
○ Lessons learned.
Understand:
● We will explore only the limited anatomy associated with the problems explained here.
● The examples are real-life troubleshooting examples.
● The solutions apply to the versions on which the problems were hit.
○ They may not be relevant to the latest versions, as patches may have landed that fix some of them permanently.
● The prime focus of this session is not the problems and their solutions.
○ It is the anatomy of Neutron and the troubleshooting methods applied to solve them.
Security group rules are not effective
Security Groups Are Not Working
Rules are not effective; everything is allowed.
The rules say only ping and SSH should be allowed from source x, but everything is allowed from everywhere.
● Understand how the packets flow through multiple iptables chains
● Understand where exactly Security group rules are applied.
● Try with different security groups and rules including the default one.
Security Group Not Working
(Diagram: how a packet traverses the iptables chains.)
FORWARD -> neutron-openvswi-FORWARD -> neutron-openvswi-sgchain -> neutron-openvswi-oxxx-x (outgoing) or neutron-openvswi-ixxx-x (incoming)
Does it meet a RETURN rule?
Yes: process further rules and apply the chain's default policy (ACCEPT).
No: neutron-openvswi-sg-fallback -> DROP.
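To make the decision points in the diagram concrete, here is a toy Python model of one per-port chain. It is a simplification under stated assumptions: rules are modeled as Python predicates instead of iptables matches, and 10.0.0.5 is a hypothetical address standing in for "source x". The real evaluation happens in the kernel.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Packet:
    proto: str      # "icmp", "tcp", ...
    src: str        # source IP address
    dport: int = 0  # destination port, 0 when not applicable


# A rule is (match predicate, iptables target), e.g. (is_ssh, "RETURN").
Rule = Tuple[Callable[[Packet], bool], str]


def port_chain_verdict(pkt: Packet, rules: List[Rule]) -> str:
    """Simplified walk of a per-port chain (neutron-openvswi-ixxx-x / -oxxx-x)."""
    for match, target in rules:
        if match(pkt):
            if target == "RETURN":
                # Allowed: control returns to the calling chain, which
                # processes further rules and ends in ACCEPT.
                return "ACCEPT"
            return target
    # No RETURN rule matched: the chain falls through to
    # neutron-openvswi-sg-fallback, which drops the packet.
    return "DROP"


SOURCE_X = "10.0.0.5"  # hypothetical address standing in for "source x"
INGRESS_RULES: List[Rule] = [
    (lambda p: p.proto == "icmp" and p.src == SOURCE_X, "RETURN"),
    (lambda p: p.proto == "tcp" and p.dport == 22 and p.src == SOURCE_X, "RETURN"),
]
```

With these rules, port_chain_verdict(Packet("tcp", "10.0.0.5", 80), INGRESS_RULES) returns "DROP". On the affected system such traffic got through anyway, which is the first hint that packets were not entering these chains at all.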
● Understand how the packets flow through multiple chains.
● Verify that the rules are inserted into the required iptables chains.
● Understand where exactly security group rules are applied.
● Verify whether packets traverse a chain by using iptables logging.
● Example rule (inserted at the top of the chain, so that an earlier RETURN or jump to the fallback chain cannot hide the packet):
iptables -I CHAIN 1 -j LOG --log-prefix "CHAIN:SG:" --log-level <level>
● When we added the rule, nothing was logged, which showed that the packets never traversed iptables.
● This led us to hunt for global parameters that could make packets bypass iptables, and we found:
● The default kernel configuration does not send packets crossing a Linux bridge through iptables. The fix is net.bridge.bridge-nf-call-iptables = 1
● Nova now enables this dynamically.
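A quick check for this condition can be scripted. The sketch below only reads the sysctl through /proc; it assumes a Linux host and changes nothing.

```python
from pathlib import Path
from typing import Optional

# Controls whether frames crossing a Linux bridge (such as qbrxxx-x)
# are handed to iptables.
BRIDGE_NF_SYSCTL = Path("/proc/sys/net/bridge/bridge-nf-call-iptables")


def bridged_traffic_hits_iptables(path: Path = BRIDGE_NF_SYSCTL) -> Optional[bool]:
    """Return True if the sysctl is 1, False if 0, None if the file is absent.

    The file is absent when bridge netfilter support is not loaded
    (the br_netfilter module on newer kernels); in that case bridged
    traffic does not reach iptables either.
    """
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        return None
```

On an affected compute node the function returns False or None. The setting itself is changed with sysctl -w net.bridge.bridge-nf-call-iptables=1 and should also be persisted in the host's sysctl configuration.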
Newly created instances cannot get a DHCP IP
Newly Created Instances Cannot Get DHCP IP
(Diagram: the instance broadcasts a DHCP discover; each of the three DHCP servers replies with an offer; the instance sends a DHCP request naming server c0 (ctrl-0); that server replies with an ACK and the other two reply with a NACK.)
Understand how HA for DHCP works.
● When a network is created, a DHCP server is spawned on as many network nodes as dhcp_agents_per_network specifies; 3 in this case.
● First, the instance sends a DHCP discover.
● All DHCP servers respond with an offer.
● The instance replies with a DHCP request carrying the identifier of the server it chose.
● That server replies with an ACK; the rest either do not respond or send a NACK.
Instances created previously can still get their DHCP IP address on renewal or reboot.
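A toy simulation of this handshake helps separate "no server answered" from "the wrong server answered". The server names ctrl-0 to ctrl-2 and the MAC/IP values are hypothetical, and choosing the first offer is an assumption about client behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DhcpServer:
    """Toy stand-in for one dnsmasq instance in a qdhcp namespace."""
    server_id: str
    hosts: Dict[str, str]  # MAC -> IP, as in the static host file


def dhcp_exchange(mac: str, servers: List[DhcpServer]) -> Tuple[Optional[str], Dict[str, str]]:
    """Simulate DISCOVER/OFFER/REQUEST/ACK against several DHCP servers.

    Returns (chosen server id, {server id: reply}), or (None, {}) if no
    server offers an address.
    """
    # DISCOVER is broadcast; only servers that know the MAC send an OFFER.
    offers = [s.server_id for s in servers if mac in s.hosts]
    if not offers:
        return None, {}
    # Assumption: the client accepts the first OFFER and names that
    # server in its REQUEST (the server identifier).
    chosen = offers[0]
    # The chosen server ACKs; the others are modeled as NACK, although
    # in practice they may also stay silent.
    replies = {s.server_id: ("ACK" if s.server_id == chosen else "NACK") for s in servers}
    return chosen, replies
```

A MAC that is missing from every server's host list gets no offer at all, which is exactly the symptom seen for the newly created instances.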
Understand the L2 flow between instance and DHCP server
(Diagram: on the compute node, instance eth0 -> tapxxx-x -> qbrxxx-x -> qvbxxx-x -> qvoxxx-x -> br-int -> patch-tun -> patch-int -> br-tun -> ethx. The tunnel reaches each of the three network nodes, where the path is ethx -> br-tun -> patch-int -> patch-tun -> br-int -> tapxxxx -> dnsmasq. The discover, offer and request c0 exchange runs over this path to all three dnsmasq servers; one answers with an ACK and the other two with a NACK.)
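Knowing the device name at each hop is half the work of tracing this path. The sketch below assumes the naming scheme of the Nova/Neutron hybrid OVS plug: a prefix plus the first 11 characters of the Neutron port UUID, 14 characters in total, which matches names such as tapd5bf1700-5c seen on these nodes. The port UUID used in the example is hypothetical.

```python
from typing import Dict, List

# Nova truncates per-port device names to 14 characters
# (network_model.NIC_NAME_LEN), so each name is a prefix plus
# the first 11 characters of the Neutron port UUID.
NIC_NAME_LEN = 14


def port_device_names(port_id: str) -> Dict[str, str]:
    """Derive tap/qbr/qvb/qvo names for a port on the compute node."""
    return {prefix: (prefix + port_id)[:NIC_NAME_LEN] for prefix in ("tap", "qbr", "qvb", "qvo")}


def dhcp_capture_commands(port_id: str) -> List[List[str]]:
    """Build tcpdump commands for the capturable hops of one port.

    OVS patch ports (patch-tun, patch-int) have no kernel device, so
    they cannot be captured directly and are left out.
    """
    names = port_device_names(port_id)
    return [
        ["tcpdump", "-nn", "-e", "-i", names[prefix], "port 67 or port 68"]
        for prefix in ("tap", "qvb", "qvo")
    ]
```

For a hypothetical port 3f2a9c4e-71b0-4d2e-8c55-9a0b1c2d3e4f, the first command captures DHCP traffic on tap3f2a9c4e-71; running the same filter on qvb and qvo shows whether the discover leaves the Linux bridge and enters br-int.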
How does it work after the packet reaches the DHCP server?
nobody 27219 0.0 0.0 15552 540 ? S 13:35 0:00 dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo --pid-file=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/host --addn-hosts=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/opts --dhcp-leasefile=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/leases --dhcp-match=set:ipxe,175 --bind-interfaces --interface=tapd5bf1700-5c --dhcp-range=set:tag0,192.168.1.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
● We did not see any of the DHCP servers responding with an offer to the instance.
○ This requires exploring how Neutron DHCP works with dnsmasq.
● Each DHCP server is a dnsmasq process bound to a tapxxx interface in its own namespace.
● The DHCP server reads MAC -> IP mappings from a static host file (--dhcp-hostsfile) and, because the range is marked static (--dhcp-range=...,static,...), responds only to the MACs listed there.
● While exploring the host file, we found that it contained MAC -> IP mappings only for previously created instances.
○ The file never got populated with the MAC -> IP mapping of any newly created instance.
● This led to further investigation into who is responsible for populating this file, and how.
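A small helper makes this comparison repeatable. It assumes host-file lines of the form mac,hostname,ip with optional set:<tag> fields; that format is an assumption, so check it against your own file first. The expected MACs would come from Neutron's port list for the network.

```python
from typing import Dict, Iterable, Set


def parse_hosts_file(text: str) -> Dict[str, str]:
    """Map MAC -> IP from a dnsmasq --dhcp-hostsfile.

    Assumed line format: mac,hostname,ip with optional ',set:<tag>' fields.
    """
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f for f in line.split(",") if not f.startswith("set:")]
        mapping[fields[0].lower()] = fields[-1]
    return mapping


def macs_missing_from_hosts(hosts_text: str, port_macs: Iterable[str]) -> Set[str]:
    """Return port MACs that dnsmasq will never answer (absent from the file)."""
    known = set(parse_hosts_file(hosts_text))
    return {mac.lower() for mac in port_macs} - known
```

Feeding it the file at the --dhcp-hostsfile path from the dnsmasq command line above, together with the MACs of the network's ports, would list every newly created instance that cannot get an address.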
Who is responsible for updating this file?
● The dhcp-agent updates this file dynamically when it is notified of port changes over the message bus.
● This led us to explore the dhcp-agent logs, where we found:
2015-07-29 02:12:14.204 36387 TRACE root NotFound: Basic.consume: (404) NOT_FOUND - no queue 'dhcp_agent' in vhost '/'
The solution!
● We saw this error more than 5,000 times in each dhcp-agent.log.
● Further digging showed that the RPC (oslo.messaging) code was missing the patch that reconnects to the message bus when access to it is lost.
● The immediate problem was solved by restarting the dhcp-agent.
● A permanent fix was made by backporting the patch to always reconnect.
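Counting the error is a one-liner with grep; a Python version that also reports which queue is missing might look like this. The regular expression is written against the single log line above and is an assumption for other oslo.messaging versions.

```python
import re
from collections import Counter

# Matches the AMQP "no queue" error seen in dhcp-agent.log.
QUEUE_NOT_FOUND = re.compile(r"NOT_FOUND - no queue '([^']+)'")


def missing_queue_errors(log_text: str) -> Counter:
    """Count 'no queue' errors per queue name in an agent log."""
    return Counter(m.group(1) for m in QUEUE_NOT_FOUND.finditer(log_text))
```

A count in the thousands for dhcp_agent, as seen here, means the agent has stopped consuming port updates, even though the process is still running.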
Connection to Floating IP randomly fails
Communication to instance through floating IP randomly fails
Random failure.
● Ten pings to the floating IP succeed, then some pings are dropped, then pings succeed again and start dropping again.
○ It's purely random; there is no pattern.
● Layer 3 HA is used, configured with:
max_l3_agents_per_router=3
l3_ha=True
● This creates three instances of each router in active/passive mode.
For example, 10 pings succeed, then another 20 are lost, then pings succeed again before failing again.
● VXLAN tunneling is used for communication between the compute nodes and the network nodes.
● The floating IP network is a VLAN provider external network.
● Let us explore the anatomy of the L3 HA configuration before going deep into our problem.
Anatomy of L3 HA for floating IPs
(Diagram: on the compute node, instance eth0 -> tapxxx-x -> qbrxxx-x -> qvbxxx-x -> qvoxxx-x -> br-int -> patch-tun -> patch-int -> br-tun -> ethx. Each of the three network nodes runs ethx -> br-tun -> patch-int -> patch-tun -> br-int and hosts a qrouter-xxx namespace with the ports qg-xxx, qr-yyy and ha-zzz.)
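Troubleshooting steps.
(Diagram: on the compute node, instance eth0 -> tapxxx-x -> qbrxxx-x -> qvbxxx-x -> qvoxxx-x -> br-int -> patch-tun -> patch-int -> br-tun -> ethx. On the network node, ethx -> br-tun -> patch-int -> patch-tun -> br-int, which holds the qr-yyy, ha-zzz and qg-xxx ports of the qrouter-xxx namespace; br-int connects through int-br-ex and phy-br-ex to br-ex, and br-ex connects through ethx to the external network.)
Only the anatomy of the master network node for the router is shown in the diagram.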
● From the instance, ping the default gateway of the private network.
○ That is the IP of qr-yyy. 100% successful.
● From the instance, ping the base IP of qg-xxx. Every router has a base IP.
○ 100% successful.
● From the qrouter-xxx namespace, ping the default gateway of the external network.
○ This reproduces it!
ip netns exec qrouter-xxx ping <ip>
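The method here is to probe hops in order, nearest first, and stop at the first one that loses packets. A minimal sketch, assuming Linux ping output with an "N% packet loss" summary; the hops would be the qr-yyy IP, the qg-xxx base IP and the external gateway. Because the loss is intermittent, the default count is set high.

```python
import re
import subprocess
from typing import Callable, Optional, Sequence

LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def parse_loss(ping_output: str) -> float:
    """Extract the packet-loss percentage from ping's summary line."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def ping_loss(target: str, netns: Optional[str] = None, count: int = 50) -> float:
    """Ping target (optionally inside a network namespace) and return % loss."""
    cmd = ["ping", "-q", "-c", str(count), target]
    if netns:
        cmd = ["ip", "netns", "exec", netns] + cmd
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_loss(result.stdout)


def first_lossy_hop(hops: Sequence[str], loss: Callable[[str], float]) -> Optional[str]:
    """Return the first hop, nearest first, that shows any packet loss."""
    for hop in hops:
        if loss(hop) > 0.0:
            return hop
    return None
```

Passing lambda ip: ping_loss(ip, netns="qrouter-xxx") as the loss function reproduces the last manual step above; separating the probe from the search also lets the same loop run from the instance or from an external host.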
Troubleshooting steps.
Only the anatomy of the master network node for the router is shown in the diagram.
● From an external system connected to the same floating IP network, we pinged the base IP of qg-xxx.
○ This reproduces it.
○ This helped us focus on br-ex for the rest of the troubleshooting.
● We constantly monitored OVS MAC learning on the br-ex bridge. The MAC -> port mapping was flapping randomly for the MAC address of the instance.
# ovs-appctl fdb/show br-ex
 port  VLAN  MAC                Age
    1     0  00:2a:6a:8c:d6:c4   37
    2     0  00:17:a4:77:10:2c    1
# ovs-appctl fdb/show br-ex
 port  VLAN  MAC                Age
    1     0  00:2a:6a:8c:d6:c4   37
    1     0  00:17:a4:77:10:2c    1
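Watching the table by eye works, but a script catches flaps that happen between glances. The sketch assumes the port/VLAN/MAC/Age column layout shown above; the snapshots would come from running ovs-appctl fdb/show br-ex in a loop.

```python
from typing import Dict, List, Tuple


def parse_fdb(output: str) -> Dict[Tuple[str, str], str]:
    """Parse `ovs-appctl fdb/show <bridge>` output into {(vlan, mac): port}."""
    table: Dict[Tuple[str, str], str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have four columns: port VLAN MAC Age; skip the header.
        if len(fields) == 4 and fields[0] != "port":
            port, vlan, mac, _age = fields
            table[(vlan, mac.lower())] = port
    return table


def flapping_macs(snapshots: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Return MACs whose port changed across snapshots, with their port history."""
    history: Dict[Tuple[str, str], List[str]] = {}
    for snap in snapshots:
        for key, port in parse_fdb(snap).items():
            ports = history.setdefault(key, [])
            if not ports or ports[-1] != port:
                ports.append(port)
    return {key: ports for key, ports in history.items() if len(ports) > 1}
```

Run against the two snapshots above, it reports 00:17:a4:77:10:2c moving from port 2 to port 1; ovs-ofctl show br-ex translates those numbers into port names.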
Solution: Fix the loop in the switch/enclosure
# tcpdump -i eth0
15:20:03.050558 ARP, Request who-has 12.1.1.1 tell 12.1.1.2, length 28
15:20:03.050583 ARP, Request who-has 12.1.1.1 tell 12.1.1.2, length 28
15:20:03.050835 ARP, Reply 12.1.1.1 is-at 00:17:a4:77:10:2c, length 38
# ovs-ofctl show br-ex
 1 (eth0): addr:00:17:a4:77:10:14 …
 2 (phy-br-ex): addr:5e:b6:f8:49:06:41 …
● The ovs-ofctl show command gives the port number associated with each port in the OVS bridge.
● The instance MAC address should always be mapped to port phy-br-ex for packets to reach the instance.
● We ran tcpdump on the physical interface to understand why the flapping happens.
● This clearly indicated a loop from the switch that confuses OVS: OVS concludes that the instance's MAC is outside the system and flips the MAC -> port mapping to the physical interface.
● The loop on the hardware/switch was fixed to resolve this.
○ Beware: some bonding modes, or a misconfigured bond, can exhibit the same problem.
# ovs-appctl vlog/set ofproto_dpif_xlate dbg
2016-06-14T05:02:13.155Z|10769|ofproto_dpif_xlate(x)|DBG|bridge br-ex: learned that 00:17:a4:77:10:2c is on port eth1 in VLAN 13
2016-06-14T05:02:13.155Z|10770|ofproto_dpif_xlate(x)|DBG|bridge br-ex: learned that 00:17:a4:77:10:2c is on port phy-br-ex in VLAN 13
● We enabled DBG-level logging in ofproto_dpif_xlate to see what OVS learns when the loop happens.
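The telltale sign in the tcpdump output is the same ARP request appearing twice within microseconds. A sketch that flags such echoes, assuming tcpdump's default HH:MM:SS.ffffff timestamps with numeric addresses; the 1 ms window is an arbitrary assumption.

```python
import re
from typing import Dict, List, Tuple

ARP_REQ = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+) ARP, Request who-has ([^\s,]+) tell ([^\s,]+)")


def _seconds(ts: str) -> float:
    """Convert HH:MM:SS.ffffff into seconds since midnight."""
    hours, minutes, secs = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(secs)


def echoed_arp_requests(tcpdump_text: str, window: float = 0.001) -> List[Tuple[str, str]]:
    """Return (who-has, tell) pairs seen again within `window` seconds.

    A host normally sends one request at a time; seeing the very same
    request back on the wire microseconds later suggests it is looping.
    """
    last_seen: Dict[Tuple[str, str], float] = {}
    echoes: List[Tuple[str, str]] = []
    for line in tcpdump_text.splitlines():
        match = ARP_REQ.match(line.strip())
        if not match:
            continue
        t = _seconds(match.group(1))
        key = (match.group(2), match.group(3))
        previous = last_seen.get(key)
        if previous is not None and t - previous <= window:
            echoes.append(key)
        last_seen[key] = t
    return echoes
```

On the capture above it flags the request for 12.1.1.1 from 12.1.1.2, repeated 25 microseconds after the original.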
Communication through provider network is very slow
Communication to instance is very slow on provider network
Provider networks enable direct communication from instance to external network.
What are provider networks?
● They allow you to attach an instance directly to an external network.
○ The instance uses the gateway IP of the external network as its gateway.
● The compute node should be directly connected to the external network.
● The infrastructure was set up to route packets to the external network via br-ex -> bond0.301 -> bond0 -> slaves.
● Then a VLAN provider network was created using:
# neutron net-create provider-vlan171 --provider:network_type vlan --router:external true --provider:physical_network physnet1 --provider:segmentation_id 171 --shared
● Running tcpdump on the physical interface showed that packets were getting fragmented.
● Lowering the MTU to 1450 fixed the problem.
● But this is a VLAN network, not VXLAN. Is lowering the MTU the ultimate solution? Of course not!
● Let us explore the anatomy of the provider network to get to the bottom of it.
Communication to instance is very slow on provider network
Let us see how a provider network works.
● The diagram shows the packet flow on a compute node.
● When an outgoing packet reaches qvoxxx-x, OVS adds the internal VLAN tag associated with the provider network.
● When it reaches phy-br-ex, OVS strips the internal tag and adds the VLAN tag of the provider network.
● When the packet reaches bond0.301, another VLAN tag is added to the packet header.
Provider networks enable direct communication from instance to external network.
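(Diagram: packet flow on the compute node: instance eth0 -> tapxxx-x -> qbrxxx-x -> qvbxxx-x -> qvoxxx-x -> br-int -> int-br-ex -> phy-br-ex -> br-ex -> bond0.301 -> bond0 (slaves eth0 and eth1) -> external network. Tag labels in the diagram: 10 (internal VLAN on br-int), 301 (provider VLAN on br-ex), and 301x2 (tag 301 applied twice after bond0.301).)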
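The tag handling in the bullets can be modeled as a stack in which each hop either pushes a new tag or rewrites the outermost one. This is a model of the header changes only, not of OVS flow rules; the values 10 (internal) and 301 (provider) are read from the diagram labels.

```python
from typing import List, Tuple

# Each hop is (name, action, vlan_id): "push" adds an outer tag,
# "swap" rewrites the current outermost tag.
Hop = Tuple[str, str, int]


def tags_on_wire(path: List[Hop]) -> List[int]:
    """Return the VLAN tag stack after the path, outermost tag first."""
    stack: List[int] = []
    for _name, action, vid in path:
        if action == "push":
            stack.insert(0, vid)
        elif action == "swap":
            stack[0] = vid
        else:
            raise ValueError(f"unknown action {action!r}")
    return stack


# qvo tags with the internal VLAN, phy-br-ex swaps it for the provider
# VLAN, and the kernel VLAN device bond0.301 pushes one more tag.
MISCONFIGURED = [("qvoxxx-x", "push", 10), ("phy-br-ex", "swap", 301), ("bond0.301", "push", 301)]
FIXED = [("qvoxxx-x", "push", 10), ("phy-br-ex", "swap", 301)]
```

The misconfigured path leaves two 301 tags on the wire, while attaching the plain bond0 leaves one.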
Solution: Add the plain interface, not the tagged interface, to the OVS bridge
● This causes a double VLAN tag on the packet when it goes out, and the frame exceeds the MTU.
● The solution is simple: add bond0 to the OVS bridge br-ex instead of bond0.301.
● This was an admin error: the administrator was confused about how provider networks work and followed a document about flat provider networks while configuring.
● But troubleshooting was not that simple.
Avoid double VLAN tagging
(Diagram: the same compute node path, with br-ex connected directly to bond0 (slaves eth0 and eth1) -> external network. Tag labels: 10 (internal VLAN on br-int) and 301, a single tag on the wire.)
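The arithmetic behind "exceeds the MTU" is short. A sketch assuming a 14-byte Ethernet header, 4 bytes per 802.1Q tag, and a device that accepts frames of at most 1518 bytes (one tag on a 1500-byte MTU, FCS excluded); real limits depend on the NIC, driver and switch.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType, no FCS
VLAN_TAG = 4     # one 802.1Q tag


def frame_size(ip_mtu: int, vlan_tags: int) -> int:
    """Bytes on the wire (excluding FCS) for a full-MTU IP packet."""
    return ip_mtu + ETH_HEADER + VLAN_TAG * vlan_tags


def fits(ip_mtu: int, vlan_tags: int, max_frame: int = 1518) -> bool:
    """Check against an assumed ceiling of 1518 bytes: one tag on a 1500-byte MTU."""
    return frame_size(ip_mtu, vlan_tags) <= max_frame
```

Under these assumptions a double-tagged 1500-byte packet needs 1522 bytes and does not fit, while an MTU of 1450 needs only 1472 bytes. That is why lowering the MTU hid the symptom without removing the extra tag.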
Lessons learned
● Collecting the prerequisite information to start troubleshooting is time-consuming and confusing.
○ The compute node the instance runs on, instance name, port details, internal VLAN tag on each node, etc.
● There are too many hops at which to run tcpdump.
○ Patch peers are not easy to dump; traffic must be mirrored to another port.
● Understanding the OVS topology is time-consuming.
○ Can be mitigated significantly by using
● Do not assume that Neutron is always at fault.
○ It can be user error, OS issues, issues with supporting services, or the Neutron layer as well.
● Hunting for expertise in each of them is challenging.
● You may have to tread many wrong paths before you get on the right track.
Lessons learned.
Some of the lessons learned while troubleshooting
BREAKOUT SESSIONS - Thursday October 27th
Session | Speakers | Time
Anatomy Of OpenStack Neutron Through The Eagle Eyes Of Troubleshooters | Sadique Puthen | 9:00am-9:40am
The Ceph Power Show :: Hands-on Lab to learn Ceph "The most popular Cinder backend" | Brent Compton, Karan Singh | 9:00am-10:30am
Building self-healing applications with Aodh, Zaqar and Mistral | Zane Bitter, Lingxian Kong (Catalyst IT), Fei Long Wang (Catalyst IT) | 9:00am-9:40am
Writing A New Puppet OpenStack Module Like A Rockstar | Emilien Macchi | 9:50am-10:30am
Ambassador Community Report | Erwan Gallen, Kavit Munshi (Aptira), Jaesuk Ahn (SKT), Marton Kiss (Aptira), Akihiro Hasegawa (Bit-isle Equinix, Inc) | 9:50am-10:30am
VPP: the ultimate NFV vSwitch (and more!)? | Franck Baudin, Uri Elzur (Intel) | 9:50am-10:30am
Zuul v3: OpenStack and Ansible Native CI/CD | James Blair | 11:00am-11:40am
Container Defense in Depth | Thomas Cameron, Scott McCarty | 11:50am-12:30pm
Analyzing Performance in the Cloud : solving an elastic problem with a scientific approach | Alex Krzos, Nicholas Wakou (Dell) | 11:50am-12:30pm
One-stop-shop for OpenStack tools | Ruchika Kharwar | 1:50pm-2:30pm
OpenStack troubleshooting: So simple even your kids can do it | Vinny Valdez, Jonathan Jozwiak | 1:50pm-2:30pm
Solving Distributed NFV Puzzle with OpenStack and SDN | Rimma Iontel, Fernando Oliveira (VZ), Rajneesh Bajpai (BigSwitch) | 2:40pm-3:20pm
Ceph, now and later: our plan for open unified cloud storage | Sage Weil | 2:40pm-3:20pm
How to configure your cloud to be able to charge your users using official OpenStack components! | Julien Danjou, Stephane Albert (Objectif Libre), Christophe Sauthier (Objectif Libre) | 2:40pm-4:10pm
A dice with several faces: Coordinators, mentors and interns on OpenStack Outreach internships | Victoria Martinez de la Cruz, Nisha Yadav (Delhi Tech University), Samuel de Medeiros Queiroz (HPE) | 2:40pm-4:10pm
Yo dawg I herd you like Containers, so we put OpenStack and Ceph in Containers | Sean Cohen, Sebastien Han, Federico Lucifredi | 3:30pm-4:10pm
Picking an OpenStack Networking solution | Russell Bryant, Gal Sagie (Huawei), Kyle Mestery (IBM) | 4:40pm-5:20pm
Forget everything you knew about Swift Rings - here's everything you need to know about Swift Rings | Christian Schwede, Clay Gerrard (Swiftstack) | 5:30pm-6:10pm
@sadiquepp