This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

TCP zero windows lead to repeating 5s deadlock

Hello, we have a thorny issue here that we cannot get to the bottom of. We have a client-server application where both client and server send and receive data at a high rate (the actual cycle is client -> server -> client -> server -> client). For application reasons we want to throttle this communication, so the client starts to sleep before sending outbound data and also stops reading data off the inbound side. This causes the client's receive and send buffers to fill up, and the server's send buffer to fill (the server still reads data). As expected, we get shrinking TCP windows on both ends, which causes network backpressure through to the application.

What we don't understand is that if we push things hard enough, the TCP stack seems to deadlock and then only releases data packets every 5s. If we throttle back before this happens, things go through nicely. I've attached a screenshot of Wireshark when this happens. You can see the two full TCP windows; what we don't understand is the complete absence of anything in the intervening 5s. We would expect to see at least an ACK for frame 213, rather than it being piggybacked on frame 215 (the first one after the pause). Nagle is off. Delayed ACK is off.

Update: this appears to be related to the socket buffer sizes. The client has send and receive buffers set to 32k. The server has a receive buffer of 4k and a send buffer of 64k. Changing the receive buffers of both to 1 MB and the send buffers to 64k seems to avoid the problem. I think this is partly because the network is now operating a lot more efficiently.
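
For anyone wanting to try the same buffer change, here is a minimal sketch of how the sizes could be set from Python. The helper names `make_tuned_socket` and `effective_buffers` are made up for illustration and are not from the application in question. The options are set before `connect()`/`listen()` because the TCP window scale is negotiated in the SYN packets. The OS may round, double, or cap the requested values, so it is worth reading them back.

```python
import socket


def make_tuned_socket(rcvbuf: int, sndbuf: int) -> socket.socket:
    """Create a TCP socket with requested buffer sizes and Nagle disabled.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the buffer sizes before connecting or listening, so the
    # window scale offered in the SYN can take them into account.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    # Disable Nagle, matching the setup described in the question.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


def effective_buffers(s: socket.socket) -> tuple:
    """Return (receive, send) buffer sizes as reported by the OS."""
    return (
        s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
    )


if __name__ == "__main__":
    # 1 MB receive buffer and 64k send buffer, as in the update above.
    sock = make_tuned_socket(rcvbuf=1024 * 1024, sndbuf=64 * 1024)
    print(effective_buffers(sock))
    sock.close()
```

The values printed are whatever the OS actually granted, which may differ from what was requested.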

Any ideas? Thx andy

Screen shot of wireshark

asked 04 Feb '16, 11:20

andypiper

edited 05 Feb '16, 11:39

Hi, without a trace (especially the SYN packets) it is nearly impossible to tell you anything reliable.

So can you share a capture in a publicly accessible spot, e.g. CloudShark, Google Drive, Dropbox?

Also you can use https://www.tracewrangler.com/ for sanitization and anonymization of PCAP and PCAPng files.

(04 Feb '16, 13:23) Christian_R
(04 Feb '16, 13:32) andypiper

Hi andypiper, I have converted your answer to a comment, because that makes it easier for everyone to follow this question.

The gap is system related. So what does the system monitor say at that moment?

(04 Feb '16, 14:15) Christian_R

Hi Christian, what should I be looking for? When you say it's system related, can you expand a little? This is on OS X; the same thing happens on Windows. The system is working: as you can see from the trace, once the 5s pause is hit it keeps happening.

(04 Feb '16, 14:35) andypiper

Application logs? Activity Monitor? I mean CPU, system load, or the application.

(04 Feb '16, 14:43) Christian_R

There is no obvious CPU or system spike. In the application, we are using non-blocking IO and the socket is basically not ready to write for 5s. So it's not that we are not trying to write data; we are, it's just that the socket won't allow it.

(06 Feb '16, 03:50) andypiper
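
The "socket not ready to write" state described in the comment above can be reproduced in isolation. The following is a rough sketch, not the application's code: the helper names (`loopback_pair`, `is_writable`, `fill_until_blocked`) and the 0.2 s settle time are assumptions made for illustration. It connects two TCP sockets over 127.0.0.1, never reads on the receiving side, and writes on a non-blocking sender until `select()` stops reporting it as writable. At that point the peer's receive buffer and the sender's send buffer are both full, which is the situation in which zero windows are advertised.

```python
import select
import socket


def loopback_pair():
    """Return (client, server_side) TCP sockets connected over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side


def is_writable(sock, timeout=0.0):
    """True if select() reports the socket ready for writing within timeout."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


def fill_until_blocked(sender, chunk=b"x" * 4096, settle=0.2):
    """Write on a non-blocking socket until it stays unwritable.

    Returns the number of bytes the kernel accepted. `settle` is an
    arbitrary wait used to decide that the socket is really stuck.
    """
    sender.setblocking(False)
    total = 0
    while is_writable(sender, settle):
        try:
            total += sender.send(chunk)
        except BlockingIOError:
            # Raced with the kernel; loop and re-check writability.
            continue
    return total


if __name__ == "__main__":
    client, server_side = loopback_pair()
    queued = fill_until_blocked(client)  # server_side never calls recv()
    print("bytes queued before the socket stopped accepting data:", queued)
    print("writable now:", is_writable(client))
    client.close()
    server_side.close()
```

How long the sender stays blocked once the receiver starts reading again depends on the OS, so this sketch only shows how the stalled state arises, not the 5s gap itself.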

https://ask.wireshark.org/questions/34338/gap-between-last-ack-and-zero-window-full-message looks to be exactly the same problem, even down to the 5s pause.

(06 Feb '16, 04:29) andypiper

I find the capture quite problematic, because

1) it's obviously an in-host communication (both IP addresses being loopback) and

2) the packets show typical localhost capture errors (CRCs, oversized frames, etc.)

I have seen similar captures that showed crazy things just because it wasn't a "real cable network" communication and contained details that would never happen on a cable connected to a real network card. In my experience symptoms in PCAPs captured like this cannot be trusted, and trying to troubleshoot with them can turn into a wild goose chase.

So can you reproduce this on a connection between two physical systems, over a cable? Or does this only happen on localhost connections?

(06 Feb '16, 17:16) Jasper ♦♦

I totally agree.

(06 Feb '16, 18:24) Christian_R