RST + ACK + CWR message, beginner needs help (FIX application)

asked 2019-02-12 16:13:27 +0000

topoden

updated 2019-02-14 14:11:38 +0000

Good day,

I am trying to debug a problem where a FIX message is lost on a regular basis. There are two applications, a FIX client and a FIX server. They establish a connection and start exchanging Heartbeat messages. When the client application finally decides to send an order message (this message is longer and does not fit in one packet; it normally takes 2-3 packets), the connection drops with the message "An existing connection was forcibly closed by the remote host." reported by the network API (Windows) of both applications.

I have recorded a pcap file.

I wonder what the possible reason could be or, if that is unclear, what I should review or check next.

Thank you for your help.

Update 1: Adding previous day pcap file as requested by Christian_R in comments.

Update 2: Adding both the server and client pcap files. The client-side pcap has something very interesting in it at around frame #325. What does that "TCP Out-Of-Order" packet sent by the client itself mean?

Comments

Interesting case. But can you trace a whole session: session setup, heartbeat, start of sending, session drop?

I have some question marks in mind; in particular, the IP flags used are not clear to me at the moment.

Christian_R ( 2019-02-12 16:49:51 +0000 )

Unfortunately, I started recording this session too late and do not have all the messages from the point where the TCP connection was established. So even if I upload the entire pcap file, all it would contain is a continuous run of Heartbeat messages going between server and client. Do you think this will help?

Otherwise I can provide the pcap file from the previous day, where you can see the initial session setup but not the error. I must say that the error happens on most days but not all; on some (rare) days the issue does not happen at all. It is also worth mentioning that when the issue happens, it tends to hit the first (longer) order message of the day. All the remaining order messages are delivered without problems after that (once the new connection and new session are set up). Let me know if the pcap file from day ...(more)

topoden ( 2019-02-12 18:37:10 +0000 )

A better trace would help.

Christian_R ( 2019-02-12 19:30:11 +0000 )

But in the meantime the session initiation of the old trace could help, too.

Christian_R ( 2019-02-12 19:31:10 +0000 )

OK, I've updated the question with the link to the previous day's pcap file. It has the messages from the point where the connection was established until the first successful order message arrives.

topoden ( 2019-02-12 19:43:59 +0000 )

"..When finally client application decides to send order message.." But in your first trace client didn't send anything, it's just getting RST out of nowhere.

Moreover, this RST has a jump in its Seq. number, which is why the client issues a Dup ACK. The second RST uses the correct Seq.

What's your link/environment? Are there many middleboxes?

Packet_vlad ( 2019-02-13 10:05:58 +0000 )

"....But in your first trace client didn't send anything, it's just getting RST out of nowhere.

Moreover this RST has jump in Seq. number, this is why the client issues Dup ACK. The second RST uses correct Seq...."

Yes, I agree with you. The seq number jumps and the root cause of that is what I am trying to understand.

It also seems that what I call client is what you think of as server (if I am not mistaken). To be precise 135.17 is what I call client, while 93.13 is what I call server (93.13 is listening on the port (5001), while 135.17 initiates connection)

The frame #6 is the frame in question. This is the frame I expect to be the one with the 'order message' in it. But it seems that the 'order message' frame(s) is(are) actually lost ...(more)

topoden ( 2019-02-13 15:44:58 +0000 )

"....What's your link/environment? Are there many middleboxes?...."

The machines are physically located very far away from each other, so I am not sure about 'middle boxes'. The machines communicate via an IPsec site-to-site VPN.

topoden ( 2019-02-13 15:46:54 +0000 )

"...A better trace would help...."

It always happens this way... The issue happens every day until you start hunting for it. It did not happen today, so I will continue recording tomorrow. I will also try adding a client-side pcap file if it helps.

topoden ( 2019-02-13 15:48:29 +0000 )

Ah, ok, I just assumed 93.13 to be a client because it has private range IP :) Wrong assumption. So this is server-side trace.

I guess the jump in Seq. N happens because the packet from the client gets lost in transit. A client-side trace would indeed help very much, because we'll see whether this (these) missing request packet(s) existed on the network.

So probably the client sends the "order" message (maybe even several times), gets no ACK and just resets the whole connection after a timeout.

As for the network itself, IPsec always tells me "check for MSS/MTU values and ICMP blackholes on the path".

"RST+ACK+CWR" - actually I'd not check this in the first place, but only after the MTU behavior on the client side. Probably the ECN feature is ON on the client, therefore it tells about it this way. Try to disable ECN on the server, so it won't ...(more)

Packet_vlad ( 2019-02-13 18:44:29 +0000 )
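To make the "RST+ACK+CWR" combination from the comment above concrete, here is a minimal Python sketch that decodes the low 8 bits of the TCP flags field. The bit values are the standard ones from the TCP header (CWR and ECE are the two ECN-related bits); the function name `decode_tcp_flags` is just an illustrative helper, not something from Wireshark.

```python
# Standard TCP flag bits (low 8 bits of the flags field), highest bit first.
TCP_FLAGS = [
    (0x80, "CWR"),  # Congestion Window Reduced (ECN)
    (0x40, "ECE"),  # ECN-Echo
    (0x20, "URG"),
    (0x10, "ACK"),
    (0x08, "PSH"),
    (0x04, "RST"),
    (0x02, "SYN"),
    (0x01, "FIN"),
]


def decode_tcp_flags(value):
    """Return the names of all flags set in the given flags byte."""
    return [name for bit, name in TCP_FLAGS if value & bit]


# 0x94 = 0x80 (CWR) + 0x10 (ACK) + 0x04 (RST)
print(decode_tcp_flags(0x94))  # ['CWR', 'ACK', 'RST']
```

A plain RST+ACK would be 0x14, so the only difference in the segment discussed here is the CWR bit, which fits the suggestion that ECN is involved rather than anything FIX-specific.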

"...As for network itself ipsec always tells me "check for MSS/MTU values and ICMP blackholes on the path"..."

Thank you, we will try to look into this now.

topoden ( 2019-02-13 21:31:31 +0000 )
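One way to approach the "check for MSS/MTU values" advice is to find the largest packet that still crosses the path with the Don't Fragment bit set. The sketch below only shows the search logic: `probe` is a hypothetical callable that returns True when a payload of the given size gets through (on Windows it could wrap something like `ping -f -l <size> <host>`), and the fake probe with a 1438-byte path MTU is an invented example, not a measurement from these traces.

```python
ICMP_OVERHEAD = 28  # 20-byte IPv4 header (no options) + 8-byte ICMP header


def largest_passing_size(probe, low, high):
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes that if a size passes, every smaller size passes too.
    Returns None when even `low` does not pass.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def fake_probe(payload_size):
    # Hypothetical path that silently drops IP packets larger than 1438 bytes.
    return payload_size + ICMP_OVERHEAD <= 1438


best = largest_passing_size(fake_probe, 1000, 1472)
print(best, best + ICMP_OVERHEAD)  # 1410 1438
```

If large probes simply time out instead of producing a "fragmentation needed" reply, that would be consistent with an ICMP blackhole somewhere on the path.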

1 Answer

answered 2019-02-13 20:08:02 +0000

updated 2019-02-15 10:41:14 +0000

Answer for the trace from Update 1:

I guess this trace was taken at the server side, too. The session as a whole looks a little strange in some details.

But I would guess there is something inside the order packet which causes the application to crash.

Another hint is that, before the session is finally initiated, the SYN often gets an RST as an answer.

=================================================

Answer for Update2 traces:

First of all, we see differences in the 3-way handshake between the client side and the server side.

Handshake at client side:
- Client advertises 1460 MSS
- Server advertises 1460 MSS

Handshake at server side:
- Client advertises 1398 MSS
- Server advertises 1460 MSS

Packets 325-330 are too big for the tunnel and didn't make it through. In the end the client resets the session. From there we must switch to the server-side trace, as it is longer. After that reset, a few session retries happen, and in the end the client tries a session with fragmentation allowed, which mostly won't work well over tunnels.

So my recommendation is: please try to advertise an adjusted server MSS to the client, the same way the client's MSS arrives adjusted in the server-side trace.

Some routers are able to do so. See here: https://www.cisco.com/c/en/us/support...

Here you can find an explanation about MSS in general: https://crnetpackets.com/2016/01/27/t...
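The MSS values from the two handshakes can be turned into packet sizes with a little arithmetic. This sketch assumes IPv4 and TCP headers of 20 bytes each with no options (TCP options such as timestamps would shift the numbers slightly), and the helper names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def packet_size_for_mss(mss):
    """IP packet size produced by a full-sized segment with this MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


def mss_for_mtu(mtu):
    """Largest MSS that still fits into a path with this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


unadjusted = packet_size_for_mss(1460)  # MSS the client sees from the server
adjusted = packet_size_for_mss(1398)    # client's MSS as it arrives at the server

print(unadjusted)             # 1500
print(adjusted)               # 1438
print(unadjusted - adjusted)  # 62
print(mss_for_mtu(adjusted))  # 1398
```

Under these assumptions a full-sized segment from the client is 62 bytes larger than what the adjusted MSS suggests the tunnel can carry, which matches the picture of the large order message being dropped while short Heartbeat messages pass.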

Comments

"...But I would guess there is something inside the order packet which causes the application to crash...."

Unfortunately, searching for the issue in the software (applications) is where we started, before we decided to move on to packet capture. There is no indication that either side (client or server) crashes or has any errors. Both applications keep working (no restarts involved, they keep running in the same thread) and, in fact, re-establish the connection within several seconds (as you may see in the initial pcap I provided). After the new connection is established and a new FIX session is set up, the client (upon server request, following the FIX protocol) re-sends the lost 'order' message, and this time it is delivered without problems.

Just to clarify, I am not saying the issue is not in applications, I am rather saying that we are trying to see if packet capture gives us ...(more)

topoden ( 2019-02-13 20:23:37 +0000 )

"...Another hint is that, before the session is finally initiated, the SYN often gets an RST as an answer."

Could you please elaborate a little more on what you mean by this?

topoden ( 2019-02-13 20:24:37 +0000 )

That means that the port was not ready to establish a session, most likely because the service was down.

Christian_R ( 2019-02-13 22:08:36 +0000 )

"...That means that the port was not ready to establish a session, most likely because the service was down...."

Ah, yes, that is true. The client starts up a little before the server's scheduled start-up time, so the client keeps making connection attempts until the server is up and starts accepting new connections. So this is just the 'maybe funny' approach the two applications use now. I do not think this is related to the issue, do you?

topoden ( 2019-02-13 22:29:13 +0000 )

While capturing, did you have a capture filter applied? Packets 324 and 325 (client trace) have the same Seq. N. (not progressing), but #325 is smaller (reduced MTU). This is why #325 is called "out-of-order".

There is only a 0.2 ms delay between the two packets, which means the client had received an (ICMP?) instruction to reduce the MTU. But we don't see it in the trace.

It looks like ICMP is filtered out of the trace.

The second question is: why didn't even the reduced packets get through? Do you have double tunnel encapsulation on the path, performed on two different routers?

Packet_vlad ( 2019-02-14 14:28:56 +0000 )
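The pattern described in the comment above (same sequence number, smaller segment) can be searched for in a list of data segments exported from a trace. This is only a rough sketch: the `Segment` tuple and `find_shrunk_resends` are hypothetical helpers, and the sequence numbers and lengths below are invented for illustration, not taken from the actual pcap.

```python
from collections import namedtuple

# frame: frame number, seq: TCP sequence number, length: TCP payload bytes
Segment = namedtuple("Segment", "frame seq length")


def find_shrunk_resends(segments):
    """Return (first, later) pairs where `later` restarts at the same
    sequence number as `first` but carries fewer payload bytes."""
    first_seen = {}
    hits = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data and are not resends
        first = first_seen.get(seg.seq)
        if first is not None and seg.length < first.length:
            hits.append((first, seg))
        first_seen.setdefault(seg.seq, seg)
    return hits


# Invented values for one direction of a single connection.
trace = [
    Segment(323, 1000, 100),
    Segment(324, 1100, 1460),
    Segment(325, 1100, 1398),  # same seq as #324, but smaller
    Segment(326, 2498, 62),
]

for first, later in find_shrunk_resends(trace):
    print(first.frame, later.frame, first.length, later.length)  # 324 325 1460 1398
```

A hit like this, arriving well under a retransmission timeout after the original, is what hints that the sender lowered its segment size because of a path MTU signal rather than because of an ordinary timeout.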
