I have a NIC vendor that claims the server is not sending frames soon enough, and that the delayed ACK from the client is timing out, causing large latency.
I've uploaded part of a conversation which shows the delayed ACKs. See trace:
Frame 21 is an example where the delayed ACK waits 200 ms before acknowledging. Note that this ACK is for frame 19, which had the PSH bit set. Frame 19 is also not a full-sized frame.
Normally we turn off delayed ACK on the client, and this resolves our latency problem. However, in this case the TCP stack runs on the NIC, and the vendor says turning delayed ACK off is not supported.
My question: is the server not obeying TCP?
Below is what the NIC vendor is saying.
In summary, we believe that this latency phenomenon is a direct result of the TCP/IP stack implementation inside the target. Looking from the outside in it is hard to say, but it almost appears that there may be some form of deadlock inside the target, as it sends a single segment and then waits for an ACK before transitioning to full transmit. With delayed ACK enabled, the VendorXYZ initiator sends this ACK on every other segment, or when the 200 ms timer expires (note: this is in accordance with the RFC guidelines for delayed ACK). With the target not sending the next frame until the 200 ms timer expires on the initiator side, it is this delay that explains the overall latency.
asked 31 Jul '12, 09:08
The delay at frame 21 appears to be because the server has nothing more to send, not because it is waiting on a delayed ACK. Frame 1 is a SCSI Read request. In the Packet Details, its "ExpectedDataTransferLength" is 0x00004000. In decimal that's 16,384 bytes, which, with the server's data starting at relative sequence number 1, would bring us to sequence number 16,385. (I don't know the SCSI protocol, so I'm guessing at what "ExpectedDataTransferLength" means, but it seems reasonable that it is the amount of data that will be sent in response to the Read request.) As soon as the client issues the Read request, the server starts sending data.
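As a quick check of the hex-to-decimal step, here is a minimal Python sketch. It assumes, as the answer guesses, that ExpectedDataTransferLength is the number of bytes the server will return, and that the server's first data byte has relative sequence number 1 in Wireshark.

```python
# ExpectedDataTransferLength as shown in the Packet Details of frame 1
expected_len = 0x00004000
print(expected_len)  # 16384

# Assumption: the first data byte is at relative sequence number 1, so after
# expected_len bytes the next sequence number (and the ACK number that
# acknowledges all of the data) would be:
next_seq = 1 + expected_len
print(next_seq)  # 16385
```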
Every packet the server sends carries a full-sized (1,460-byte) TCP segment until frame 19. The fact that frame 19 doesn't contain a full-sized segment, and that the server sends no more data after it, suggests that #19 is the last packet of the data stream and the server has nothing more to send. At that point, we're up to ACK 16,433. That's not the 16,385 we calculated earlier, but it's awfully close, and 0x00004000 is a suspiciously round number.
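The byte counting in the two paragraphs above can be checked with a few lines of Python. This is a sketch that assumes, as the answer does, that the server's data starts at relative sequence number 1 and that every data segment before frame 19 carries the full 1,460 bytes; the variable names are illustrative only.

```python
MSS = 1460             # full-sized segment payload seen in the trace
expected_len = 0x4000  # 16,384 bytes requested by the SCSI Read in frame 1
final_ack = 16433      # client ACK number covering frame 19

# Assumption: the server's data started at relative sequence number 1.
bytes_sent = final_ack - 1
full_segments, tail = divmod(bytes_sent, MSS)  # full segments, size of the last one
extra = bytes_sent - expected_len              # bytes beyond the requested length

print(bytes_sent, full_segments, tail, extra)  # 16432 11 372 48
```

Under those assumptions the final, short segment would carry 372 bytes, and the total overshoots the requested length by exactly 48 bytes. One possible explanation, not confirmed by the trace, is protocol header overhead: an iSCSI Basic Header Segment is 48 bytes long.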
The last frame before #19 that the client acknowledged was #18, so, with delayed ACK, the client waits for a second data segment before ACKing. None arrives, so the client ACKs frame #19 when the delayed ACK timer expires.
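The receiver behaviour described above (and in the vendor's summary) can be modelled with a small simulation. This is a simplified sketch, not the VendorXYZ stack or any specific implementation: it assumes the receiver ACKs every second unacknowledged segment immediately and otherwise ACKs when a 200 ms timer, started by the oldest unacknowledged segment, expires. `delayed_ack_times` is a hypothetical helper; real stacks also ACK immediately in other cases, such as out-of-order arrivals.

```python
def delayed_ack_times(arrivals_ms, timeout_ms=200):
    """Return the times (ms) at which a simplified delayed-ACK receiver ACKs.

    Model (an assumption): ACK immediately on every second unacknowledged
    segment; otherwise ACK timeout_ms after the oldest unacknowledged
    segment arrived.
    """
    acks = []
    pending_since = None  # arrival time of the oldest unacknowledged segment
    for t in sorted(arrivals_ms):
        # If the timer expired before this segment arrived, the lone pending
        # segment was ACKed at timer expiry.
        if pending_since is not None and t >= pending_since + timeout_ms:
            acks.append(pending_since + timeout_ms)
            pending_since = None
        if pending_since is None:
            pending_since = t  # first unacknowledged segment: start the timer
        else:
            acks.append(t)     # second unacknowledged segment: ACK right away
            pending_since = None
    if pending_since is not None:
        # An odd final segment waits for the timer.
        acks.append(pending_since + timeout_ms)
    return acks


print(delayed_ack_times([0, 1, 2, 3, 4]))  # [1, 3, 204]
```

For five back-to-back segments arriving 1 ms apart, the pairs are acknowledged at once and only the odd final segment waits out the timer, as frame 19 does. That wait costs one timer period per response, but it only stalls the transfer if the sender is holding data back, and here the server has already sent everything it was asked for.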
Note that the server does NOT resume sending at this point. Almost a full second goes by, then in frame #22 the client issues another SCSI Read request. Only then does the server begin sending data again, and it does so immediately.
So, there was a 218 ms delay due to the delayed ACK, but a 981 ms delay waiting for the client to issue another Read request.
There seem to be two things going on here.
If you add up all the delays waiting on delayed ACKs, you get 3.49 seconds. If you add up all the delays waiting for the client to issue a SCSI Read request, you get 7.85 seconds, more than twice as much.
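The totals above come from classifying each idle gap by what the connection was waiting on. A minimal sketch of that bookkeeping follows; the event labels, the `attribute_gaps` name, and the 150 ms threshold separating a delayed ACK from a normal one are all assumptions for illustration, and in practice the events would first have to be extracted from the capture.

```python
from collections import defaultdict


def attribute_gaps(events, ack_threshold_ms=150):
    """Sum idle time (ms) by what the connection was waiting on.

    events: list of (time_ms, kind) in capture order, where kind is one of
    the hypothetical labels "read" (client SCSI Read request), "data"
    (server data segment) or "ack" (client pure ACK).
    """
    totals = defaultdict(int)
    for (prev_t, _), (t, kind) in zip(events, events[1:]):
        gap = t - prev_t
        if kind == "ack" and gap >= ack_threshold_ms:
            totals["delayed_ack"] += gap       # client held the ACK back
        elif kind == "read":
            totals["waiting_for_read"] += gap  # server idle until next request
    return dict(totals)


# One cycle shaped like frames 19 to 22, using the 218 ms and 981 ms figures.
cycle = [(0, "data"), (218, "ack"), (1199, "read"), (1200, "data")]
print(attribute_gaps(cycle))  # {'delayed_ack': 218, 'waiting_for_read': 981}
```

Summing the same two categories over every cycle in the capture is one way to arrive at totals like the 3.49 s and 7.85 s quoted above.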