RST + ACK + CWR message, beginner needs help (FIX application)

asked 2019-02-12 16:13:27 +0000

topoden

updated 2019-02-14 14:11:38 +0000

Good day,

I am trying to debug a problematic case in which a FIX message is lost on a regular basis. There are two applications (a FIX client and a FIX server). They establish a connection and start exchanging Heartbeat messages. When the client application finally decides to send an order message (this message is longer and cannot fit in one packet; it normally takes 2-3 packets), the connection drops with the message "An existing connection was forcibly closed by the remote host." produced by the network API (Windows) of both applications.

I have recorded a pcap file.

I wonder what the possible reason could be or, if that is unclear, what I should review / check next.

Thank you for your help.

Update 1: Adding previous day pcap file as requested by Christian_R in comments.

Update 2: Adding both server pcap and client pcap files. It looks like the client-side pcap has something very interesting in it at around Frame #325. What does that "TCP Out-Of-Order" sent by the client itself mean?


Comments

Interesting case. But can you trace a whole session: session setup, Heartbeat, start of sending, session drop?

I have some question marks in mind, especially the used IP Flags are not clear to me at the moment.

Christian_R ( 2019-02-12 16:49:51 +0000 )

Unfortunately, I started recording this session too late and do not have all the messages from the point the TCP connection was established. So even if I upload the entire pcap file, all it would have is a continuous set of Heartbeat messages going to / from the server / client. Do you think this will help?

Otherwise I can provide a pcap file from the previous day, where you can see the initial session setup, but not the error. I must say that the error happens on most days, but not all: there are some days (rarely, though) when the issue does not happen at all. What is also worth mentioning is that if the issue happens, it tends to happen to the first (longer) order message of the day. All the other order messages are delivered without problems after that (once the new connection and new session are set up). Let me know if the pcap file from day ...(more)

topoden ( 2019-02-12 18:37:10 +0000 )

A better trace would help.

Christian_R ( 2019-02-12 19:30:11 +0000 )

But in the meantime the session initiation of the old trace could help, too.

Christian_R ( 2019-02-12 19:31:10 +0000 )

OK, I've updated the question with the link to the previous day's pcap file. It has the messages from the point the connection was established until the first successful order message comes.

topoden ( 2019-02-12 19:43:59 +0000 )

1 Answer

answered 2019-02-13 20:08:02 +0000

updated 2019-02-15 10:41:14 +0000

Answer for the trace of Update 1:

I guess this trace was taken at the server side, too. Overall, the session looks a little strange in some details.

But I would guess there is something inside the order packet which causes the application to crash.

Another hint is that, before the session is finally initiated, the SYN often gets an RST as an answer.

=================================================

Answer for Update2 traces:

First of all, we see differences between the 3-way handshake at the client side and at the server side.

Handshake at client side:
- Client advertises 1460 MSS
- Server advertises 1460 MSS

Handshake at server side:
- Client advertises 1398 MSS
- Server advertises 1460 MSS
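The MSS values seen in the two handshakes can be turned into implied packet sizes with simple arithmetic. The sketch below assumes IPv4 and TCP headers of 20 bytes each (no options); the function name `implied_mtu` is only illustrative and is not part of any tool.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
TCP_HEADER = 20   # bytes, assuming a TCP header without options


def implied_mtu(mss: int) -> int:
    """Return the IP packet size produced by a full-sized segment with this MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


client_mss_at_client = 1460  # client MSS as seen in the client-side trace
client_mss_at_server = 1398  # the same client's MSS as seen in the server-side trace

print(implied_mtu(client_mss_at_client))            # 1500
print(implied_mtu(client_mss_at_server))            # 1438
print(client_mss_at_client - client_mss_at_server)  # 62
```

Under these assumptions, the client's MSS was lowered by 62 bytes somewhere on the path, while the server's 1460 reaches the client unchanged. That would let the client send 1500-byte packets toward the server even if the path cannot carry them.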

Packets 325-330 are too big for the tunnel and didn't make it through. In the end the client resets the session. From that point on we must switch to the server-side trace, as that trace is longer. After that reset, a few session retries happen, and in the end the client tries a session with fragmentation allowed, which mostly won't work well on tunnels.

So my recommendation is: please try to advertise an adjusted server MSS to the client, just as the client's MSS appears adjusted in the server-side trace.

Some routers are able to do so. See here: https://www.cisco.com/c/en/us/support...

Here you can find an explanation about MSS in general: https://crnetpackets.com/2016/01/27/t...


Comments

"...But I would guess there is something inside the order packet which causes the application to crash...."

Unfortunately, searching for issues in the software (applications) is where we started before we decided to move to packet capture. There is no indication that either side (client or server) crashes or has any errors. Both applications keep working (no restarts involved, they keep working in the same thread) and, in fact, re-establish the connection within several seconds (as you may see in the initial pcap I provided). After the new connection is re-established and a new FIX session is set up, the client (upon server request, following the FIX protocol) re-sends the lost 'order' message, and this time it is delivered without problems.

Just to clarify, I am not saying the issue is not in applications, I am rather saying that we are trying to see if packet capture gives us ...(more)

topoden ( 2019-02-13 20:23:37 +0000 )

"...Another hint is that before session finally is initiated the SYN often gets an RST as an answer."

Could you please elaborate a little more what you mean by this.

topoden ( 2019-02-13 20:24:37 +0000 )

That means that the port was not ready to establish a session, most likely because the service was down.

Christian_R ( 2019-02-13 22:08:36 +0000 )

"...That means, that the port was not ready to establish a session. Most likely because the service was down...."

Ah, yes, that is true. The client starts up a little before the server's scheduled start-up time, so the client keeps making connection attempts until the server is there and starts accepting new connections. So this is just the 'maybe funny' approach the two applications use now. I do not think this is related to the issue, do you?

topoden ( 2019-02-13 22:29:13 +0000 )

While capturing, did you have a capture filter applied? Packets 324 and 325 (client trace) have the same sequence number (not progressing), but #325 is sent with a reduced MTU. This is why #325 is called "out-of-order".

There is only 0.2 ms delay between the two packets, which means the client had received an (ICMP?) instruction to reduce MTU. But we don't see it in the trace.

It looks like ICMP is filtered out of the trace.
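The pattern described here, a segment that reuses an earlier sequence number but carries less payload, can be looked for mechanically. The following is a minimal sketch, assuming the frame number, sequence number, and payload length have already been exported from the capture; the `Segment` type, the function name, and the sample values are all invented for illustration and are not taken from the pcap.

```python
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    frame: int   # frame number in the capture
    seq: int     # TCP sequence number
    length: int  # TCP payload length in bytes


def find_shrunk_resends(segments: List[Segment]) -> List[Tuple[int, int]]:
    """Return (original_frame, resend_frame) pairs where a later segment
    reuses a sequence number with a smaller payload."""
    first_seen: Dict[int, Segment] = {}
    hits: List[Tuple[int, int]] = []
    for seg in segments:
        prev = first_seen.get(seg.seq)
        if prev is None:
            first_seen[seg.seq] = seg
        elif seg.length < prev.length:
            hits.append((prev.frame, seg.frame))
    return hits


# Hypothetical values: frame 2 resends the data of frame 1 in a smaller segment.
sample = [Segment(1, 1000, 1460), Segment(2, 1000, 1398), Segment(3, 2398, 62)]
print(find_shrunk_resends(sample))  # [(1, 2)]
```

A hit of this kind suggests the sender lowered its segment size between the two frames, which is what one would expect after a "fragmentation needed" ICMP message, if ICMP had been captured.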

The second question is: why didn't even the reduced packets get through? Do you have double tunnel encapsulation on the path, performed on two different routers?

Packet_vlad ( 2019-02-14 14:28:56 +0000 )

"...
While capturing, did you have capture filter applied? Packets 324 and 325 (client trace) have the same Seq.N. (not progressing), but reduced MTU. This is why #325 is called "out-of order". There is only 0.2 ms delay between the two packets, which means the client had received an (ICMP?) instruction to reduce MTU. But we don't see it in the trace. It looks like ICMP is filtered out of the trace...."

You are right, I filtered out ICMP (mistakenly...I wrote in the subject, I am a beginner. I did not even think that would be valuable, but now I understand it definitely is...). I only had TCP traffic captured.

topoden ( 2019-02-14 15:23:02 +0000 )

Yes, ICMP will be a valuable source. @Packet_vlad: As far as I can remember, fragmentation was allowed for one of the sides... but I have no laptop to check for the next few hours.

Christian_R ( 2019-02-14 15:51:04 +0000 )

@topoden: does your "working" example show the same client traffic? I can't fully understand how the network is built. It seems both NAT and IPsec are involved, because the IP addresses are very different in the client and server traces. Do you have any redundant links as well? I'm really missing a network diagram (at least the part you know of).

@Christian_R: cool observation. Indeed, in the "working" trace IP fragmentation is allowed in the client's packets, whereas in the "not working" traces DF is set. This is why it's interesting whether both captures show traffic originating from the same PC.
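For reference, the DF ("Don't Fragment") bit lives in the 16-bit flags / fragment-offset field at bytes 6-7 of the IPv4 header. The sketch below reads it from raw header bytes; the helper name and the hand-built sample headers are illustrative assumptions, not data from these traces. In Wireshark itself, a display filter such as `ip.flags.df == 1` should show the same information.

```python
import struct

DF_MASK = 0x4000      # "Don't Fragment" bit
MF_MASK = 0x2000      # "More Fragments" bit
OFFSET_MASK = 0x1FFF  # fragment offset, in 8-byte units


def ipv4_flags(header: bytes) -> dict:
    """Decode DF, MF and fragment offset from a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    (flags_frag,) = struct.unpack("!H", header[6:8])
    return {
        "df": bool(flags_frag & DF_MASK),
        "mf": bool(flags_frag & MF_MASK),
        "fragment_offset": flags_frag & OFFSET_MASK,
    }


def make_header(flags_frag: int) -> bytes:
    """Build a dummy 20-byte IPv4 header (checksum and addresses left as zero)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, flags_frag, 64, 6, 0,
                       bytes(4), bytes(4))


print(ipv4_flags(make_header(0x4000)))  # DF set: fragmentation not allowed
print(ipv4_flags(make_header(0x0000)))  # DF clear: fragmentation allowed
```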

Packet_vlad ( 2019-02-14 18:35:07 +0000 )

I got the same finding. Working trace: the reason for the problem is most likely the one in the answer I gave above. Not working example: most likely an MTU / tunnel problem. By the way, on the server side we can also see the adjusted client MSS.

Christian_R ( 2019-02-14 19:04:56 +0000 )

"....is your "working" example shows the same client traffic? I can't fully understand how the network is built. It seems both NAT and IPsec are involved because IP addresses are very different in Client and Server traces. Do you have any redundant links also? I'm so missing network diagram (at least a part you know of)....."

@Packet_vlad: If we speak about my "Update 2" and those two 'server' and 'client' pcaps, then yes, the traffic is definitely for the same machines. I verified it because I had noticed the different IPs too. We do have IPsec and maybe NAT (not sure, but I will try to confirm; the IPs are different, so most likely we do...). I am trying to get the missing details about the network from our IT (network) guys, but no details are available yet (I am just a developer who is investigating the weird issue...) :(

topoden ( 2019-02-14 19:25:10 +0000 )

"...I got the same finding. Working trace: Reason for problem is for most likely the answer I gave above. Not working example: Is most likely a MTU / Tunnel problem, btw. On server side we can also see adjusted client MSS...."

@Christian_R: It sounds like you figured something out. I now see that yes, the DF flag is set for some packets and not set for others. I am not entirely sure what exactly is causing it, or what I can / should do to fix it.

Could you please explain it in simpler words, if I am not asking too much :)

topoden ( 2019-02-14 19:33:41 +0000 )
