Why would DHCP Discovery, Request, Offer, ACK repeat?
Recently some of our users have complained that their sessions to our IBM AS400 drop during the first hour or two of the workday, 8am-10am. This is a very small group of users, maybe a dozen out of 200-plus. They are on different switches, but on the same VLAN.
Everyone else on that VLAN seems to be working okay, but their apps are not as sensitive. We have also received reports that streaming video stutters for some users, and the event logs for the affected AS400 users show Outlook losing its connection to the Exchange server for a few seconds.
Using Wireshark, I can see that the typical DHCP process (discovery, request, offer, ack) repeats many times for users, typically about a dozen times. This morning I ran ipconfig /release and then ipconfig /renew on my computer to start the DHCP conversation, and it repeated 11 times. In two of the eleven, I noticed that the ACK to the previous request arrived after the next discovery. But the original (first) four-part DHCP conversation was in perfect order.
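One way to quantify "repeats many times" per client is to tally message types per client MAC. The sketch below assumes the client MAC and message type (the dhcp.option.dhcp value) of each DHCP packet have already been exported from the capture as pairs; the helper names and sample MACs are hypothetical. A client whose Discover and ACK counts are both well above 1 completed the exchange and then started over, which matches the behavior described here.

```python
from collections import Counter, defaultdict

# DHCP message type values (option 53, RFC 2132).
DISCOVER, ACK = 1, 5

def per_client_counts(events):
    """Tally DHCP message types per client.

    events: iterable of (client_mac, msg_type) pairs in capture order.
    """
    counts = defaultdict(Counter)
    for mac, msg_type in events:
        counts[mac][msg_type] += 1
    return counts

def repeating_clients(events, max_discovers=1):
    """Return {mac: (discovers, acks)} for clients that restarted DHCP
    more often than max_discovers. Equal counts suggest each exchange
    completed and then started over."""
    return {
        mac: (c[DISCOVER], c[ACK])
        for mac, c in per_client_counts(events).items()
        if c[DISCOVER] > max_discovers
    }

# Hypothetical sample data (documentation MAC range), not from the real capture.
healthy = [("00:00:5e:00:53:01", t) for t in (1, 2, 3, 5)]
looping = [("00:00:5e:00:53:02", t) for t in (1, 2, 3, 5)] * 3
print(repeating_clients(healthy + looping))  # {'00:00:5e:00:53:02': (3, 3)}
```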
I also noticed that the original four-part conversation had a lease time of one minute, but the second four-part conversation had the correct lease time as set on our MS DHCP servers. Perhaps this was because my IP address (which is reserved) hadn't cleared from some table? We have two DHCP servers. Total speculation on my part.
But I also noticed that some users have a series of repeated DHCP conversations lasting for minutes. One went on for at least 20 minutes this morning when that person added a new computer to the domain.
The two DHCP servers are on the server VLAN, and the users are on the user VLAN. Two helper addresses are part of the VLAN config.
I did several Google searches and couldn't really find anything that described this behavior. Most spoke to one end of the conversation not receiving or acknowledging part of the conversation, but that's not the case for us. The conversation completes but then repeats, over and over.
Where should I look in the capture, or on the network, to determine possible solutions?
Thank you, Garry
It would be helpful to see the actual PCAP to answer this question better. Timing may be an issue.
Please post it online to a publicly accessible share (e.g., Google Drive).
I can already tell you that a one-minute lease should trigger the DHCP renewal process after about 30 seconds in most setups I've seen. (Renewal begins when the clock reaches 50% of the lease.)
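As a quick sanity check on that arithmetic, here is a sketch of the default client timers from RFC 2131 (section 4.4.5): T1 (renewing) defaults to 0.5 of the lease and T2 (rebinding) to 0.875 of the lease. The function name is just for illustration, and a server can override both values with options 58 and 59, so treat these as defaults rather than what every client will do.

```python
def renewal_timers(lease_seconds):
    """Return the default (T1, T2) for a DHCP lease, per RFC 2131 section 4.4.5.

    T1: client starts renewing (unicast to the leasing server), 0.5 * lease.
    T2: client starts rebinding (broadcast to any server), 0.875 * lease.
    Servers may override both with options 58 and 59.
    """
    return 0.5 * lease_seconds, 0.875 * lease_seconds

t1, t2 = renewal_timers(60)   # the one-minute lease seen in the capture
print(t1, t2)                 # 30.0 52.5
```

So with a one-minute lease, the client would be back on the wire within about 30 seconds, and broadcasting to any server at about 52 seconds if its original server does not answer.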
Oh and it's usually DORA (Discover, Offer, Request, Ack) not DROA (discovery, request, offer, ack) ;)
Thanks for the correction on DORA, I inadvertently transposed them.
The problem is resolved.
But the cause was found by accident: I happened to recognize an IP address with some DHCP activity when, to my knowledge, it shouldn't have had any. I had not seen it involved in the DHCP conversations that concerned me, yet it was transmitting DHCP.
Based on the IP I recognized, we contacted the vendor responsible for that device and had them turn off DHCP despite their protests. We haven't had a problem since.
Post-incident, I went back to my capture file to see what I could have done differently to identify the problem. If not for recognizing the IP address and knowing it shouldn't be using DHCP, the problem identification process might still be ongoing :)
The links below are to JPEG screenshots. One shows the repeated DHCP conversations ...
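On the question of what could have surfaced the culprit sooner: one approach is to inventory every address that answered as a DHCP server and compare it against the servers you expect. The sketch below assumes each server reply (OFFER, ACK, NAK) has been exported as a (server identifier, message type) pair, where the server identifier is DHCP option 54. The addresses are hypothetical placeholders from the documentation range, not the ones from this incident.

```python
from collections import Counter

# Option 53 values that only a DHCP server sends (RFC 2132).
SERVER_MSG_TYPES = {2: "OFFER", 5: "ACK", 6: "NAK"}

def unexpected_servers(events, expected):
    """Count replies from servers that are not in `expected`.

    events: iterable of (server_id, msg_type), where server_id is the
    option 54 Server Identifier carried in the reply.
    """
    seen = Counter()
    for server_id, msg_type in events:
        if msg_type in SERVER_MSG_TYPES and server_id not in expected:
            seen[(server_id, SERVER_MSG_TYPES[msg_type])] += 1
    return seen

# Hypothetical addresses: .10 and .11 stand in for the two MS DHCP servers.
EXPECTED = {"192.0.2.10", "192.0.2.11"}
events = [
    ("192.0.2.10", 2), ("192.0.2.10", 5),   # normal OFFER and ACK
    ("192.0.2.99", 6), ("192.0.2.99", 6),   # replies from an unknown device
]
print(unexpected_servers(events, EXPECTED))  # Counter({('192.0.2.99', 'NAK'): 2})
```

Keying on the server identifier rather than the packet's source IP is a deliberate choice here, since replies relayed through the helper addresses arrive from the relay rather than directly from the server.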
You can filter on (or search for) DHCP NAK packets with the display filter
dhcp.option.dhcp==6
However, that assumes you already know that the NAK was the problem. Maybe the filter
dhcp.option.dhcp and not dhcp.option.dhcp in {1 2 3 5}
is more useful, as it will show all the non-DORA DHCP packets. See https://www.iana.org/assignments/bootp-dhcp-parameters/bootp-dhcp-parameters.xhtml for more DHCP option values.
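For anyone post-processing exported fields outside Wireshark, the second filter's logic can be sketched as follows. It assumes packets were exported as (frame number, dhcp.option.dhcp value) pairs, with None where the packet carries no message type; the function name is hypothetical. The name table covers the RFC 2132 values 1 to 8; the IANA page lists later additions.

```python
# DHCP message type values carried in option 53 (RFC 2132, section 9.6).
MSG_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
             5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}
DORA = {1, 2, 3, 5}

def non_dora(packets):
    """Yield (frame_no, name) for packets whose message type is outside DORA.

    Mirrors: dhcp.option.dhcp and not dhcp.option.dhcp in {1 2 3 5}
    packets: iterable of (frame_no, msg_type); None means no option 53.
    """
    for frame_no, msg_type in packets:
        if msg_type is not None and msg_type not in DORA:
            yield frame_no, MSG_TYPES.get(msg_type, f"type {msg_type}")

# Hypothetical frames: one DORA exchange, a stray NAK, and a non-DHCP frame.
packets = [(1, 1), (2, 2), (3, 3), (4, 6), (5, 5), (6, None)]
print(list(non_dora(packets)))  # [(4, 'NAK')]
```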
Welcome to the wonderful world of different implementations and RFC interpretations (and don't forget bugs)!
Thank you SYN-bit!