Capture Tips/Tricks

asked 2023-03-01 17:30:46 +0000

mr. em

Hello Wireshark community.

I keep getting plagued by a problem where a team keeps reporting failed communication that triggers alarms. They will report a 120-second loss even though our monitoring system's polling does not show one. The end user says it's not them. My suspicion is that it is related to latency the radio link introduces at times, but I am not responsible for that equipment. The team that is responsible keeps putting the issue back on the network, and the only data they provide is that there is very little packet loss. I am not seeing any switchport errors, and I see very little packet loss with an average latency of 129 ms. I cannot install Wireshark on the source system, nor can I capture at the remote end, because it is unmanned and the problem is intermittent. I have set up a rolling packet capture on the field firewall to capture traffic from the datacenter server to the device at the end of the radio link. I know it is not the best vantage point, but it is all I can really do.

My goal is to line up the next fault with some packet capture data. I figure I'll look at response time, retransmissions, and duplicate ACKs during the time of the fault. I would appreciate any guidance towards a good analysis strategy and any documentation you've found extremely useful for such a situation.

Thanks in advance.

-E
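One way to line up a fault with the capture, sketched in Python: build a Wireshark display filter that restricts the capture to the reported fault window and keeps only retransmissions and duplicate ACKs. The function names, the 30-second margin, and the example file name are assumptions made for illustration. The filter fields and the tshark options `-r`, `-Y`, `-T fields`, and `-e` are standard, but the quoted time format may need adjusting for older Wireshark releases, and the time strings are normally read as local time, so a UTC alarm timestamp may need converting first.

```python
from datetime import datetime, timedelta

# TCP analysis flags that Wireshark/tshark set on suspicious segments.
SYMPTOMS = (
    "tcp.analysis.retransmission"
    " || tcp.analysis.fast_retransmission"
    " || tcp.analysis.duplicate_ack"
)


def fault_window_filter(fault_start, fault_seconds=120, margin_seconds=30):
    """Return a display filter covering the fault window plus a margin.

    fault_seconds defaults to the 120-second loss reported in the post;
    margin_seconds is a hypothetical padding on either side.
    """
    begin = fault_start - timedelta(seconds=margin_seconds)
    end = fault_start + timedelta(seconds=fault_seconds + margin_seconds)
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f'frame.time >= "{begin.strftime(fmt)}"'
        f' && frame.time <= "{end.strftime(fmt)}"'
        f" && ({SYMPTOMS})"
    )


def tshark_command(pcap_path, display_filter):
    """Return an argument list that prints one line per matching packet."""
    fields = ["frame.number", "frame.time", "ip.src", "ip.dst", "tcp.stream"]
    cmd = ["tshark", "-r", pcap_path, "-Y", display_filter, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


# Example: a fault reported at 17:30:00 (local time of the analysis machine).
example_filter = fault_window_filter(datetime(2023, 3, 1, 17, 30, 0))
print(example_filter)
print(" ".join(tshark_command("rolling_00001.pcap", example_filter)))
```

The printed filter can be pasted into the Wireshark filter bar as-is, or the argument list can be passed to `subprocess.run` on a machine that has tshark installed. An empty result for the fault window is itself useful evidence that the network was not dropping or delaying segments at that time.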


Comments

A 120-second loss is triggering alarms on the reporting team's side but not on your side, so you must not be monitoring the same way. You need to understand which protocol and timers the reporting team uses to get a better understanding of the issue. You can always try using ICMP echoes (ping) to monitor packet loss and latency. Wireshark does a lot of checks and calculations for ICMP.

A little packet loss sometimes creates bigger problems, and so does higher latency. Are the latency and packet loss really expected, and expected at those levels? (Again, ICMP for the win.)

Not knowing more, I can't really point you anywhere but to the Wireshark User's Guide.

If you need help with a more specific issue (more specific protocol), you may upload a PCAP file on a public share for the community to look at. (Confirm ...

Spooky ( 2023-03-07 22:25:49 +0000 )

Thanks Spooky. I actually figured no one was going to respond. :-) I am pretty sure I solved the issue and submitted a detailed report to the appropriate team. Believe it or not, I am pretty sure Wireshark packet captures identified the problem.

Long story short, it appears to be an application issue. I was able to identify the problem by running a rolling packet capture and comparing working Modbus TCP traffic with the traffic from when it failed. I ruled out packet loss/latency by setting up a script that timestamps ICMP results to a text file. This was done from a datacenter jump server and a workstation local to the site. I used the latter to rule out packet loss or latency at the time it starts to fault. The packet capture also showed odd behaviour. When the application faults, you would see the Modbus request from the server, response from the client, followed ...

mr. em ( 2023-03-08 00:23:52 +0000 )
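The script that timestamps ICMP results to a text file is not shown in the thread. A minimal sketch of that idea in Python follows, assuming a Linux host with the iputils `ping` command (`-c` for count, `-W` for the reply timeout in seconds); Windows `ping` uses different options. The function names, log format, and default interval are hypothetical, not the poster's actual script.

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the round-trip time in typical ping output, e.g. "time=129 ms".
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt(ping_output):
    """Return the round-trip time in milliseconds, or None if no reply."""
    match = RTT_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None


def format_log_line(timestamp, host, rtt_ms):
    """Return one log line: ISO timestamp, target, and RTT or LOST."""
    result = f"rtt_ms={rtt_ms:.1f}" if rtt_ms is not None else "LOST"
    return f"{timestamp.isoformat(timespec='seconds')} {host} {result}"


def ping_once(host, timeout_seconds=2):
    """Send a single echo request; return the RTT in ms or None on loss."""
    try:
        completed = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_seconds), host],
            capture_output=True,
            text=True,
            timeout=timeout_seconds + 2,
        )
    except subprocess.TimeoutExpired:
        return None
    if completed.returncode != 0:
        return None
    return parse_rtt(completed.stdout)


def log_pings(host, log_path, interval_seconds=1.0, count=None):
    """Append one timestamped line per probe; count=None runs until stopped."""
    sent = 0
    with open(log_path, "a") as log:
        while count is None or sent < count:
            line = format_log_line(datetime.now(), host, ping_once(host))
            log.write(line + "\n")
            log.flush()  # keep the file current if the script is killed
            sent += 1
            time.sleep(interval_seconds)
```

Calling `log_pings("192.0.2.10", "ping_log.txt")` (a placeholder address) from both the jump server and a workstation local to the site would produce two logs that can be compared against the alarm timestamps, as long as the clocks on both machines and in the capture are synchronized.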