
TCP Retransmits over industrial network

asked 2023-09-28 20:36:49 +0000

Hello everyone, Wireshark neophyte here with a question.
We have four HP Windows 10 clients and two HP Windows Server 2019 servers on an industrial network. Over the course of 2 or 3 days the clients become lethargic in response to mouse/keyboard clicks and require rebooting. I have run Wireshark on one of the clients to check the network traffic, and I see what looks to me like quite a few TCP retransmissions (1 or 2 per second). The capture can be found here: https://www.dropbox.com/scl/fi/1cxmc2...
The physical layer looks like this: client - fiber converter - patch panel - Cisco SF350 - server.
In the capture, the client IP is .33 and the server is .131.
Is this retransmit frequency normal?
Does this capture provide any clues about where I should look next: physical layer, NIC settings, Cisco, etc.?
Any help would be greatly appreciated!


2 Answers


answered 2023-09-29 17:44:28 +0000

Henrik

Hi,

This is a mismatch between the Windows client and the Windows server, and it comes down to the TCP profile each side uses for the traffic.

But please note: these retransmissions do not have any impact and do not seem to be responsible for the issues you are seeing!

These specific retransmissions are cosmetic.

I would recommend capturing the traffic at exactly the times when you have the issue. Also, 1:40 minutes of traffic is not enough to recommend anything.

Why this happens: the server is, I'm sure, sending traffic with a datacenter profile, where the RTO (Retransmission Timeout) is very low at 20 ms. The client is using the Internet profile with very conservative settings, where the delayed ACK timeout is set to 200 ms and the minimum RTO to 300 ms.

So the server sends payload to the client and uses its own RTO of 20 ms. That means that if no ACK is seen after 20 ms, it retransmits the packet. The client receives the packet, and its TCP stack holds back the ACK until the delayed ACK timer expires (max 200 ms), because the application may produce an answer, and the ACK and the answer can then be sent in the same frame. So the client tries to avoid sending two frames, and does so for at most 200 ms (the delayed ACK timeout).

So the RTO of the server is shorter than the delayed ACK timer of the client.
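The timer race described above can be sketched as a tiny model. This is only an illustration under stated assumptions, not how the Windows TCP stack is implemented: the function name and the `rtt_ms` parameter are made up for this example, the round-trip time is assumed constant, and no packets are lost. The 20 ms, 200 ms and 300 ms values are the ones quoted in this answer.

```python
def first_retransmit_time_ms(rto_ms, ack_delay_ms, rtt_ms=1.0):
    """Return when the sender retransmits, or None if the ACK arrives first.

    Simplified model: the ACK reaches the sender one round trip plus the
    receiver's ACK delay after the segment was sent. If the sender's
    retransmission timer (RTO) is shorter than that, it fires first.
    """
    ack_arrival_ms = rtt_ms + ack_delay_ms
    if rto_ms < ack_arrival_ms:
        return rto_ms   # timer expires before the ACK is seen
    return None         # ACK wins the race, no retransmission


if __name__ == "__main__":
    # Client ACKs immediately: the 20 ms timer never fires.
    print(first_retransmit_time_ms(rto_ms=20, ack_delay_ms=0))
    # Client holds the ACK for the full 200 ms: retransmit at 20 ms.
    print(first_retransmit_time_ms(rto_ms=20, ack_delay_ms=200))
    # With a 300 ms minimum RTO the delayed ACK always arrives in time.
    print(first_retransmit_time_ms(rto_ms=300, ack_delay_ms=200))
```

In this model a retransmission only appears when the client actually delays its ACK beyond the server's RTO, which fits the observation that most segments are acknowledged quickly and only some are retransmitted.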

Why the client mostly answers fast and sometimes does not is unclear at first glance.

You can check the available profiles on both systems with:

PowerShell: Get-NetTCPSetting

You can also check which TCP connection is using which profile: PowerShell: Get-NetTCPConnection

Additionally, you can see which filter is used to decide which profile is applied to a specific connection: PowerShell: Get-NetTransportFilter

Please also note that on a Windows client you can set this filter only via GPO. On the server side you can use the CLI to define the filter, as far as I know.

But in the end, those retransmissions are no issue at all :-)

Hope this helps
Cheers
Henrik


Comments

Great analysis! I have seen this mismatch before (actually filed a bug report for one of the endpoints not choosing the right TCP profile). However, you say that these retransmissions are no issue at all, which I don't think is the case. The many unnecessary retransmissions keep the congestion window low and I believe that is cached per IP. So any other conversation between these two IP addresses will be impacted when a large data transfer is needed. As the RTT probably is very low, the impact should not be too high, but still. What do you think @Henrik?

SYN-bit (2023-10-01 07:29:39 +0000)

Sorry for the late response. Yes, of course, every retransmission is a waste of resources, and yes, even a fast retransmit will cut the send congestion window to 1/2. The impact is seen more on links with higher latency than in a LAN with latency below 1 ms. So maybe "does not have any impact" isn't true ;-) but in LAN environments you may not notice it. It's also about the way you start analyzing issues. In this case, you see some dirt on the network, but you may not have any business impact. Therefore, I would recommend only starting when you have a business impact. Clearing any dirt in your network makes it clean, but you may also invest a lot of time that has no effect for anyone! (It's just my experience.) Typically, we (I) don't have the time today to hunt for any dirt in networks,.. which is ...(more)

Henrik (2023-11-24 18:35:08 +0000)
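To make the congestion window point in these comments concrete, here is a rough textbook-style (Reno-like) sketch. It is an assumption-laden illustration, not a model of what Windows does: the function name, the initial window of 10 segments and the initial threshold of 64 segments are chosen for this example, real stacks may use a different congestion provider, and some can detect a spurious retransmission and undo the reduction.

```python
def simulate_cwnd(rounds, timeout_rounds=(), fast_retransmit_rounds=(),
                  initial_cwnd=10, initial_ssthresh=64):
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd, ssthresh = initial_cwnd, initial_ssthresh
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in timeout_rounds:
            # Retransmission timeout: remember half the window, restart from 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif r in fast_retransmit_rounds:
            # Fast retransmit: continue from half the window.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per round trip
        else:
            cwnd += 1                       # congestion avoidance: +1 per round trip
    return history


if __name__ == "__main__":
    print(simulate_cwnd(5))                               # [10, 20, 40, 64, 65]
    print(simulate_cwnd(5, timeout_rounds=(2,)))          # [10, 20, 40, 1, 2]
    print(simulate_cwnd(5, fast_retransmit_rounds=(2,)))  # [10, 20, 40, 20, 21]
```

Because the window is rebuilt once per round trip, recovery takes well under a few milliseconds on a sub-millisecond LAN, while the same number of round trips on a high-latency link costs far more time. That is the asymmetry both comments point at.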

answered 2023-09-29 08:52:29 +0000

grahamb

I filtered the capture on tcp.stream eq 0, and it looks like the application is using some sort of telemetry request/response or event/acknowledge protocol. The .131 endpoint is very fussy about getting a timely response from the .33 endpoint: it sends a small packet (16 bytes payload) and then approx. 20 ms later retransmits that packet, after which the .33 endpoint responds very quickly with a TCP ACK.

I'm not sure why the .131 endpoint is retransmitting so quickly though; normally the TCP stack would wait a little longer for the TCP ACK before retransmitting. I suspect something in the application is causing this.
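For anyone who wants to measure the roughly 20 ms gap described above across a whole capture, a minimal sketch follows. It assumes the packets have already been exported to tuples of (time in seconds, sender, TCP sequence number, payload length); the function name and the sample values are invented for illustration and are not taken from the capture. It is also much cruder than Wireshark's own TCP analysis, since it only matches identical sequence numbers from the same sender.

```python
def retransmit_gaps(packets):
    """Return (src, seq, gap_ms) for each segment that repeats an earlier one.

    `packets` is an iterable of (time_s, src, seq, payload_len) tuples in
    capture order. A segment counts as a retransmission when the same
    sender has already sent payload with the same sequence number.
    """
    first_seen = {}
    gaps = []
    for time_s, src, seq, payload_len in packets:
        if payload_len == 0:
            continue  # pure ACKs carry no data to retransmit
        key = (src, seq)
        if key in first_seen:
            gap_ms = round((time_s - first_seen[key]) * 1000, 3)
            gaps.append((src, seq, gap_ms))
        else:
            first_seen[key] = time_s
    return gaps


# Made-up sample: a 16-byte segment, its copy 20 ms later, then the ACK.
sample = [
    (0.000, ".131", 1000, 16),
    (0.020, ".131", 1000, 16),
    (0.0205, ".33", 500, 0),
    (1.000, ".131", 1016, 16),
    (1.001, ".33", 500, 0),
]

if __name__ == "__main__":
    print(retransmit_gaps(sample))  # [('.131', 1000, 20.0)]
```

If nearly all gaps cluster around one value, that points to a fixed timer rather than to random packet loss on the wire.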


Comments

Thank you for the feedback. I have mentioned your observations to the developer to see what he comes back with. I may be able to extend the wait time with configuration parameter changes.

surfwithsharks (2023-09-29 17:02:30 +0000)


Stats

Asked: 2023-09-28 20:36:49 +0000

Seen: 318 times

Last updated: Sep 29 '23