The second TCP connection, ending around frame 871, should be considered complete. If you see FIN packets from both sides, the connection has ended from the standpoint of both endpoints.

Correction1: I originally wrote here that the client is re-using its source port. That was wrong: the source port does change between connections. My bad.
Even if the ports were being re-used, it should not be a problem because of the long delay of 15 minutes between the last FIN sequence and the new SYN sequence.

Comparing the SYN packets of the first two successful connections against the SYNs that lead to RST responses, there seems to be no significant difference that would cause the latter to provoke RSTs. This means the client is probably not responsible for these RSTs.

As recorded on the server, it is returning RSTs to inbound SYNs from the Linux client. Based on the observations above, it is more probable that the server is responsible for the RSTs. The cause should not be external (other TCP communications or packets) because we have a server-side capture.

Correction1: re-use is out of the question because the source port is moving.
Although a "TCP connection re-use" issue does not appear to be a likely reason for this behavior, can you (nevertheless) try making the client rotate its source port through a small pool of port numbers (say 3 or 10)? If the problem still reproduces with that change, re-use related issues are eliminated completely.
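If you do want to run the rotation test, here is a minimal sketch of a client that cycles through a small pool of source ports. The port numbers and the helper names (`source_port_cycle`, `connect_from`) are made up for illustration; pick ports that are free on your client, and note that `SO_REUSEADDR` is set only so a port still in TIME_WAIT can be bound again.

```python
import itertools
import socket

# Hypothetical pool of client source ports; choose values free on your host.
SOURCE_PORTS = [50001, 50002, 50003]


def source_port_cycle(ports):
    """Return an iterator that yields the given ports round-robin, forever."""
    return itertools.cycle(ports)


def connect_from(src_port, server_addr, timeout=5.0):
    """Open a TCP connection to server_addr from a fixed local source port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow binding a port whose previous connection is still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.settimeout(timeout)
    try:
        sock.bind(("", src_port))   # pin the source port before connecting
        sock.connect(server_addr)
    except OSError:
        sock.close()
        raise
    return sock
```

Each new session would then call `connect_from(next(ports), (server_ip, server_port))`, where `ports = source_port_cycle(SOURCE_PORTS)` and the server address is whatever your client already uses.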

As an aside, have you investigated the reason for the retransmissions in the first two TCP sessions?
Update1: Frames are also missing from the capture (for example, between frames 757 and 759 a number of client-sourced frames are absent).
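Wireshark flags these cases itself (the `tcp.analysis.lost_segment` and `tcp.analysis.retransmission` display filters), but the underlying check is simple enough to sketch. The function below is a simplified illustration, not how Wireshark implements it: it assumes you have exported `(sequence number, payload length)` pairs for one direction of one connection, in capture order, and it ignores sequence-number wraparound.

```python
def find_sequence_gaps(segments):
    """Return (expected_seq, seen_seq) pairs where bytes are missing.

    segments: iterable of (seq, payload_len) tuples for ONE direction of
    one TCP connection, in capture order.
    """
    gaps = []
    next_expected = None
    for seq, length in segments:
        # A segment starting beyond what we expect means earlier bytes
        # were never captured (lost on the wire or dropped by the capture).
        if next_expected is not None and seq > next_expected:
            gaps.append((next_expected, seq))
        end = seq + length
        # Retransmissions end at or before next_expected and are ignored.
        if next_expected is None or end > next_expected:
            next_expected = end
    return gaps
```

A gap in client-sourced data seen in a server-side capture means those bytes never reached the capture point, which is what the missing frames around 757 to 759 look like.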

One more aspect to check is the other communications that the Windows server is handling. If there are too many ongoing connections, that could overload its TCP stack. But a consistent reproduction, with failures starting at the 3rd connection at 15-minute intervals, does not seem to suggest "3rd party" interference.

Update1: Based on your comment that about 400 other clients are communicating with this server, with connection intervals on the order of seconds, there are two aspects to check on the TCP server system.

  1. Are all connections closing correctly on the server? If there are packet losses, some connections could be held open waiting on timeouts. This might prevent new connections from being set up on the server (too many open connections).
  2. Can the server sustain this load of connections? (Is this an Enterprise Edition? How much RAM and compute power does it have? etc.)
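One rough way to check the first point is to count connections per TCP state on the server. The sketch below assumes you have saved the text output of `netstat -an` from the Windows server (four columns: Proto, Local Address, Foreign Address, State); the function name `count_tcp_states` is made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state from Windows `netstat -an` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: TCP  <local addr>  <foreign addr>  <STATE>
        # Header rows and UDP rows (which have no state) are skipped.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            counts[fields[3]] += 1
    return counts
```

A large and growing number of connections stuck in states such as `CLOSE_WAIT` or `FIN_WAIT_2` would point to connections that are not closing cleanly.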

Can you access the server logs / events to check whether any errors are being logged for TCP communications?
Is the server behaving correctly for the other 400 clients, or are they all showing similar issues?

The missing inbound frames in the server PCAP sample also suggest that the server may be overrun by traffic on its interface.


Segue on the use of locally administered MAC addresses:
It appears from your PCAP sample that you might be using locally defined MACs on the Linux system. In such cases, it is prudent to (1) use addresses from the locally administered range (the second-least-significant bit of the first octet is set), and (2) confirm that the same MAC is not re-used on more than one client machine.
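Both checks are easy to script. This is a minimal sketch with made-up helper names; it relies only on the standard rule that a MAC address is locally administered when bit `0x02` of its first octet is set.

```python
def parse_mac(mac):
    """Parse 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' into 6 bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def is_locally_administered(mac):
    """True if the locally-administered bit (0x02 of octet 0) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


def find_duplicate_macs(host_to_mac):
    """Map each MAC used by more than one host to the list of those hosts."""
    seen = {}
    for host, mac in host_to_mac.items():
        key = ":".join(f"{b:02x}" for b in parse_mac(mac))
        seen.setdefault(key, []).append(host)
    return {mac: hosts for mac, hosts in seen.items() if len(hosts) > 1}
```

Feeding `find_duplicate_macs` a hostname-to-MAC inventory of your clients would show at a glance whether two machines share an address.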