RST + ACK + CWR message, beginner needs help (FIX application)

asked 2019-02-12 16:13:27 +0000

topoden

updated 2019-02-14 14:11:38 +0000

Good day,

I am trying to debug a problematic case in which a FIX message is lost on a regular basis. There are two applications (a FIX client and a FIX server). They establish a connection and start exchanging Heartbeat messages. When the client application finally decides to send an order message (this message is longer and cannot fit in one packet; it normally takes 2-3 packets), the connection drops with the message "An existing connection was forcibly closed by the remote host." produced by the network API (Windows) of both applications.

I have recorded a pcap file.

I wonder what the possible reason could be or, if that is unclear, what I should review / check next.

Thank you for your help.

Update 1: Adding previous day pcap file as requested by Christian_R in comments.

Update 2: Adding both the server and the client pcap files. The client-side pcap has something very interesting in it at around frame #325. What does that "TCP Out-Of-Order" sent by the client itself mean?

Comments

Interesting case. But can you trace a whole session: session setup, heartbeat, start of sending, session drop?

I still have some open questions; in particular, the IP flags used are not clear to me at the moment.

Christian_R ( 2019-02-12 16:49:51 +0000 )

Unfortunately, I started recording this session too late and do not have all the messages from the moment the TCP connection was established. So even if I upload the entire pcap file, all it would contain is a continuous series of Heartbeat messages going between server and client. Do you think this will help?

Otherwise I can provide a pcap file from the previous day, where you can see the initial session setup, but not the error. I must say that the error happens on most days, but not all: on some (rare) days the issue does not happen at all. It is also worth mentioning that, if the issue happens, it tends to happen to the first (longer) order message of the day. All the remaining order messages are delivered without problems after that (once the new connection and new session are set up). Let me know if the pcap file from day ...(more)

topoden ( 2019-02-12 18:37:10 +0000 )

A better trace would help.

Christian_R ( 2019-02-12 19:30:11 +0000 )

But in the meantime the session initiation of the old trace could help, too.

Christian_R ( 2019-02-12 19:31:10 +0000 )

OK, I've updated the question with the link to the previous day's pcap file. It has the messages from the moment the connection was established until the first successful order message arrives.

topoden ( 2019-02-12 19:43:59 +0000 )

1 Answer

answered 2019-02-13 20:08:02 +0000

Christian_R

updated 2019-02-15 10:41:14 +0000

Answer for the trace of Update 1:

I guess this trace was taken at the server side, too. Overall, the session looks a little strange in some details.

But I would guess there is something inside the order packet that causes the application to crash.

Another hint is that, before the session is finally initiated, the SYN often gets an RST as an answer.

=================================================

Answer for the Update 2 traces:

First of all we see differences in the 3-way handshake of client side and server side.

Handshake at client side:
- Client advertises 1460 MSS
- Server advertises 1460 MSS

Handshake at server side:
- Client advertises 1398 MSS
- Server advertises 1460 MSS

Packets 325-330 are too big for the tunnel and didn't make it through. In the end the client resets the session. From there we must switch to the server-side trace, as that trace is longer. After that reset, a few session retries happen, and in the end the client tries a session with fragmentation allowed, which mostly won't work well on tunnels.

So my recommendation is: please try to advertise an adjusted server MSS to the client, just as the client's MSS is adjusted in the server-side trace.

Some routers are able to do so; see here: https://www.cisco.com/c/en/us/support...

Here you can find an explanation about MSS in general: https://crnetpackets.com/2016/01/27/t...
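
The MSS arithmetic behind the recommendation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: IPv4 with a 20-byte IP header and a 20-byte TCP header, and the guess that the 1398 MSS seen in the server-side handshake reflects the usable path MTU through the tunnel. The traces do not confirm either assumption, and the function names are made up for this example.

```python
# Assumed header sizes (not taken from the traces):
IPV4_HEADER = 20  # bytes, IPv4 header without IP options
TCP_HEADER = 20   # bytes, base TCP header


def mss_for_mtu(mtu: int) -> int:
    """Largest MSS whose full-sized segment still fits in one packet of size mtu."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ip_packet_size(mss: int) -> int:
    """Size of the IPv4 packet that carries one full-sized segment for this MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


def fits_path(mss: int, path_mtu: int) -> bool:
    """True if a full-sized segment for this MSS fits the path without fragmentation."""
    return ip_packet_size(mss) <= path_mtu


# Path MTU implied by the 1398 MSS seen at the server side (an assumption).
assumed_path_mtu = ip_packet_size(1398)

for mss in (1460, 1398):
    print(mss, ip_packet_size(mss), fits_path(mss, assumed_path_mtu))
```

Under these assumptions, a sender that believes the peer's MSS is 1460 builds packets that are larger than the tunnel can carry, while a sender that sees 1398 does not. That would explain why only the longer order message is affected and the short Heartbeat messages are not.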

Comments

"...But I would guess there is something inside the order packet that causes the application to crash...."

Unfortunately, searching for the issue in the software (applications) is where we started before we decided to move to packet capture. There is no indication that either side (client or server) crashes or has any errors. Both applications keep working (no restarts involved, they keep working in the same thread) and, in fact, re-establish the connection within several seconds (as you may see in the initial pcap I provided). After the new connection is re-established and a new FIX session is set up, the client (upon server request, following the FIX protocol) re-sends the lost 'order' message, and this time it is delivered without problems.

Just to clarify, I am not saying the issue is not in applications, I am rather saying that we are trying to see if packet capture gives us ...(more)

topoden ( 2019-02-13 20:23:37 +0000 )

"...Another hint is that, before the session is finally initiated, the SYN often gets an RST as an answer."

Could you please elaborate a little more on what you mean by this?

topoden ( 2019-02-13 20:24:37 +0000 )

That means that the port was not ready to establish a session, most likely because the service was down.

Christian_R ( 2019-02-13 22:08:36 +0000 )

"...That means that the port was not ready to establish a session, most likely because the service was down...."

Ah, yes, that is true. The client starts up a little before the server's scheduled start-up time, so the client keeps making connection attempts until the server is there and starts accepting new connections. This is just the (maybe funny) approach the two applications use now. I do not think this is related to the issue, do you?

topoden ( 2019-02-13 22:29:13 +0000 )
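
For readers unfamiliar with the behaviour discussed in the last few comments: a SYN answered with an RST is what an application sees as "connection refused". The following Python sketch shows a client that keeps retrying until a listener appears. It is a hypothetical illustration, not the code of the FIX client in question; the function name and its parameters are made up for this example.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=1.0, timeout=3.0):
    """Try to open a TCP connection, retrying while the peer refuses it.

    Returns a connected socket, or None if every attempt was refused.
    """
    for attempt in range(1, attempts + 1):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except ConnectionRefusedError:
            # The SYN was answered with an RST: nothing is listening yet.
            if attempt < attempts:
                time.sleep(delay)
    return None
```

Each refused attempt in such a loop shows up in a capture as a SYN followed by an RST, which matches the pattern seen before the server's scheduled start-up.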

While capturing, did you have a capture filter applied? Packets 324 and 325 (client trace) have the same sequence number (not progressing), but #325 was sent with a reduced MTU. This is why #325 is flagged as "out-of-order".

There is only a 0.2 ms delay between the two packets, which means the client had received an (ICMP?) instruction to reduce the MTU. But we don't see it in the trace.

It looks like ICMP is filtered out of the trace.

The second question is: why didn't even the reduced packets get through? Do you have double tunnel encapsulation on the path, performed on two different routers?

Packet_vlad ( 2019-02-14 14:28:56 +0000 )
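
The ICMP message suspected in the comment above would normally be "Destination Unreachable, fragmentation needed" (type 3, code 4), which carries the next-hop MTU. The minimal Python sketch below recognises it in a raw IPv4 packet and also checks the Don't Fragment flag that the answer refers to. It assumes the input is a bare IPv4 packet (no Ethernet header); the function names are hypothetical.

```python
import struct

ICMP_PROTOCOL = 1      # IPv4 protocol number for ICMP
DEST_UNREACHABLE = 3   # ICMP type
FRAG_NEEDED = 4        # ICMP code: fragmentation needed and DF set


def frag_needed_mtu(ip_packet: bytes):
    """Return the next-hop MTU if this is an ICMP 'fragmentation needed' packet, else None."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None  # too short, or not IPv4
    header_len = (ip_packet[0] & 0x0F) * 4
    if ip_packet[9] != ICMP_PROTOCOL or len(ip_packet) < header_len + 8:
        return None
    # ICMP header: type, code, checksum, unused, next-hop MTU
    icmp_type, icmp_code, _checksum, _unused, next_hop_mtu = struct.unpack(
        "!BBHHH", ip_packet[header_len:header_len + 8]
    )
    if icmp_type == DEST_UNREACHABLE and icmp_code == FRAG_NEEDED:
        return next_hop_mtu
    return None


def dont_fragment_set(ip_packet: bytes) -> bool:
    """True if the DF bit is set in the IPv4 flags / fragment-offset field."""
    (flags_and_offset,) = struct.unpack("!H", ip_packet[6:8])
    return bool(flags_and_offset & 0x4000)
```

If no such ICMP packet appears in an unfiltered capture, the sender may never have been told the real path MTU; that is one reason adjusting the MSS on the path is often preferred over relying on path MTU discovery.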

Stats

Asked: 2019-02-12 16:13:27 +0000

Seen: 4,094 times

Last updated: Feb 15 '19