I am a Wireshark n00b and in desperate need of help. We have an SMPP client on one of our servers that connects to an SMSC. The two chat back and forth using SMPP, which is just a protocol over TCP. Everything was working great until 1 August 2012 (a very strange date for things to suddenly go pear-shaped).
One of the SMSCs that we connect to suddenly started exhibiting terrible connection stability. All the other SMSCs that we connect to have been perfect; we can maintain those connections for hours on end. This one SMSC drops the connection every couple of seconds.
We have run many, many traces looking at the SMPP protocol, and I am pretty sure that we are not doing anything wrong according to the SMPP spec (that, and every other SMSC is stable). So I was reading the Wireshark forums etc. and came across the advice to look at TCP RSTs on the connection. When I run a Wireshark trace against their IP address and open "Expert Infos", the log is full of them. Their IP is mostly the source. Occasionally we are the source, but that is because they have not answered our keep-alive packets. I also see a lot of TCP Dup ACKs (which I believe are a good indicator of packet loss). The connection also appears to be very unstable when we put them under load. During low traffic it seems to be fine, except that our receivers reset the binds every now and then because the SMSC does not acknowledge SMPP keep-alive packets.
Is there anybody here who can confirm my suspicions, look at this trace, and tell me what they think?
Sorry, just to add: in previous questions that I have read, the contributors always mention that it would be nice to get the developers' input. We are the developers of the SMPP software, so we will be able to answer any questions you have. For example, the cause of the TCP resets from our side is that the keep-alive packets are not answered within 30 seconds, so we assume a dead connection and restart.
There are a lot of these messages
Take a look at 'tcp.stream == 10' (first stream with 3-way handshake). Then take a look at 'tcp.stream == 11'. Apparently the SYN packet did not arrive and had to be retransmitted.
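If you want to check every stream for this pattern rather than just these two, here is a minimal Python sketch. It assumes you have already exported one row per packet as a `(stream_id, seq, syn, ack)` tuple; that row layout and the function name `retransmitted_syns` are made up for illustration and are not anything Wireshark produces directly. Inside Wireshark, a display filter such as `tcp.analysis.retransmission && tcp.flags.syn == 1` should show roughly the same packets.

```python
from collections import Counter


def retransmitted_syns(packets):
    """Return {stream_id: number_of_repeated_SYNs}.

    packets: iterable of (stream_id, seq, syn, ack) tuples.
    This tuple layout is an assumption for the sketch, not a Wireshark format.
    """
    syn_counts = Counter()
    for stream_id, seq, syn, ack in packets:
        # An initial SYN has SYN set and ACK clear; SYN/ACKs are ignored here.
        if syn and not ack:
            syn_counts[(stream_id, seq)] += 1

    repeated = {}
    for (stream_id, _seq), count in syn_counts.items():
        if count > 1:
            # Every copy after the first one is a retransmission.
            repeated[stream_id] = repeated.get(stream_id, 0) + count - 1
    return repeated


# Made-up sample: stream 10 has a clean handshake, stream 11 repeats its SYN.
sample = [
    (10, 0, True, False),
    (10, 0, True, True),
    (11, 0, True, False),
    (11, 0, True, False),
    (11, 0, True, True),
]
print(retransmitted_syns(sample))
```

Streams that show up in the result had to send their SYN more than once, which is what you see in stream 11.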
I guess you have some (or too much) packet loss somewhere along the way. The TCP resets are just the final act in that play. I suggest capturing in front of both systems and then comparing the capture files to verify that. Below is a "picture" with the suggested capture points (CP).
If the packet loss takes place on your systems (I don't think so, as several systems are affected), you can only debug your implementation.
If the packet loss takes place on the transfer network, you need to know all routers on that way (traceroute) and then work along that chain to find the place where the packets are lost.
If the routers are not within your control, you can run a TCP performance test (e.g. with xjperf) to verify the quality of the connection.
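To make the "compare the capture files" step a bit more concrete, here is a rough Python sketch. It assumes each capture has been reduced to a list of `(ip_id, seq, payload_len)` tuples for one direction of one connection. That key and the function name `lost_between` are my own choices for illustration; if a NAT or firewall on the path rewrites IP IDs or sequence numbers, you would need a different key.

```python
from collections import Counter


def lost_between(sent, received):
    """Return the packets from `sent` that never show up in `received`.

    sent / received: lists of (ip_id, seq, payload_len) tuples taken at two
    capture points. The tuple layout is an assumption for this sketch.
    """
    remaining = Counter(received)
    lost = []
    for key in sent:
        if remaining[key] > 0:
            # Seen at the second capture point: consume one matching packet.
            remaining[key] -= 1
        else:
            # Left the first capture point but never arrived at the second.
            lost.append(key)
    return lost


# Made-up data: the segment with IP ID 4003 is lost on the way and is
# retransmitted later with a new IP ID (4004).
cp1 = [(4001, 1, 16), (4002, 17, 16), (4003, 33, 16), (4004, 33, 16)]
cp2 = [(4001, 1, 16), (4002, 17, 16), (4004, 33, 16)]
print(lost_between(cp1, cp2))
```

Run it once per direction (CP1 to CP2, then CP2 to CP1). If you can capture at more points along the path, comparing neighbouring capture points the same way narrows down where the packets disappear.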
BTW: Did you upgrade any router/firewall/VPN firmware on the magic date, 1 August?