
End device goes offline randomly

asked 2021-09-16 15:26:18 +0000

updated 2021-09-16 15:34:17 +0000

grahamb

Hello,

Currently we are having issues with a certain type of end device that goes offline at random times and doesn't recover by itself. The device in question is an LCD display that receives data through a serial connection, but also communicates with a web service on a server at a remote location.

We have a lot of these devices in several locations and so far it seems that most if not all are having the same issues. The supplier isn't really cooperating in resolving the issue. He blames the network.

So, I installed Wireshark on the server and monitored the traffic until one of the LCD displays went offline (offline = no ICMP reply, monitored with Zabbix).

Now, I must admit I am not too knowledgeable when it comes to Wireshark and this level of detail. I am committed to learning this stuff, though!

This is the result I got from Wireshark:

74  11.851065   172.31.100.10   10.11.12.13     TLSv1   688     Application Data, Application Data
75  11.851827   10.11.12.13     172.31.100.10   TLSv1   1632    Application Data, Application Data
76  11.868948   172.31.100.10   10.11.12.13     TCP     60      49226 → 447 [ACK] Seq=635 Ack=1579 Win=68 Len=0
77  11.893959   172.31.100.10   10.11.12.13     TLSv1   752     Application Data, Application Data
78  11.905349   10.11.12.13     172.31.100.10   TLSv1   736     Application Data, Application Data
80  12.275970   10.11.12.13     172.31.100.10   TCP     736     [TCP Retransmission] 447 → 49226 [PSH, ACK] Seq=1579 Ack=1333 Win=1025 Len=682
81  12.838750   10.11.12.13     172.31.100.10   TCP     736     [TCP Retransmission] 447 → 49226 [PSH, ACK] Seq=1579 Ack=1333 Win=1025 Len=682
82  13.869762   10.11.12.13     172.31.100.10   TCP     736     [TCP Retransmission] 447 → 49226 [PSH, ACK] Seq=1579 Ack=1333 Win=1025 Len=682
102 15.916615   10.11.12.13     172.31.100.10   TCP     590     [TCP Retransmission] 447 → 49226 [ACK] Seq=1579 Ack=1333 Win=1025 Len=536
106 17.963494   10.11.12.13     172.31.100.10   TCP     590     [TCP Retransmission] 447 → 49226 [ACK] Seq=1579 Ack=1333 Win=1025 Len=536
114 20.010427   10.11.12.13     172.31.100.10   TCP     736     [TCP Retransmission] 447 → 49226 [PSH, ACK] Seq=1579 Ack=1333 Win=1025 Len=682
138 24.088593   10.11.12.13     172.31.100.10   TCP     736     [TCP Retransmission] 447 → 49226 [PSH, ACK] Seq=1579 Ack=1333 Win=1025 Len=682
145 26.416761   10.11.12.13     172.31.100.10   TCP     54      447 → 49226 [RST, ACK] Seq=2261 Ack=1333 Win=0 Len=0
242 42.676776   10.11.12.13     172.31.100.10   TCP     54      3005 → 49158 [FIN, ACK] Seq=1 Ack=1 Win=1024 Len=0
244 42.979161   10.11.12.13     172.31.100.10   TCP     54      [TCP Retransmission] 3005 ...

1 Answer


answered 2021-09-21 21:23:30 +0000

André

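This capture confirms what you already know: the communication stops somewhere between frame 76 (the last TCP ACK from the end device) and frame 80 (the first retransmission from the server). The last packet the server receives from the device is frame 77; the server's reply in frame 78 is never acknowledged, so the server keeps retransmitting it until it resets the connection in frame 145.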
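To make the timing concrete, here is a minimal Python sketch that works out the gaps between the server's sends of that one segment. The timestamps are copied by hand from the packet list in the question; the names `sends`, `rst` and `gaps` are only illustrative.

```python
# Timestamps (seconds) copied from the capture in the question.
# Frame 78 is the original segment (Seq=1579); the others are its retransmissions.
sends = [
    (78, 11.905349),
    (80, 12.275970),
    (81, 12.838750),
    (82, 13.869762),
    (102, 15.916615),
    (106, 17.963494),
    (114, 20.010427),
    (138, 24.088593),
]
rst = (145, 26.416761)  # the server gives up and resets the connection


def gaps(events):
    """Return (frame, seconds since the previous send) for each retransmission."""
    return [
        (frame, round(t - prev_t, 3))
        for (_, prev_t), (frame, t) in zip(events, events[1:])
    ]


for frame, gap in gaps(sends):
    print(f"frame {frame}: +{gap:.3f} s")

elapsed = rst[1] - sends[0][1]
print(f"RST in frame {rst[0]}, {elapsed:.3f} s after the original segment")
```

The gaps grow from roughly 0.4 s to roughly 4 s, and the reset follows about 14.5 s after frame 78. That looks like a server backing off and then giving up, with nothing at all coming back from the device in that window. It still does not say where the packets were lost.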

The capture was done at the server side, so when there is no more incoming traffic you still don't know the cause. It could be the 'end device' itself, or something could have gone wrong in the path between the end device and the server: for instance, a routing problem or a firewall that dropped the session after a timeout.

To know for sure, you need to (also) capture the traffic at the end device side, or as close to it as possible. Capture continuously (use a ring buffer) and stop the capture once the issue has occurred on this device. The best way to convince the supplier is to use a tap.
With a capture at the end device side you can check whether the retransmissions from the server arrive, and whether the end device really stops responding or its sent packets were dropped in the network.
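As a rough illustration of the ring-buffer idea, the Python sketch below builds a `dumpcap` command line using its `-b filesize:` and `-b files:` options, and stops the capture once the device has missed several pings in a row. The interface name `eth0`, the output file name, the sizes and thresholds, and all function names are assumptions for the example. The `ping` flags assume a Linux (iputils) `ping`.

```python
import shlex
import subprocess
import time


def ring_buffer_capture_cmd(interface, device_ip, out_file,
                            file_size_kb=100_000, file_count=20):
    """Build a dumpcap command that writes to a ring of capture files."""
    return [
        "dumpcap",
        "-i", interface,                   # capture interface (assumed name)
        "-f", f"host {device_ip}",         # capture filter: only this device
        "-b", f"filesize:{file_size_kb}",  # switch files after this many kB
        "-b", f"files:{file_count}",       # overwrite the oldest file after this many
        "-w", out_file,
    ]


def ping_once(ip, timeout_s=2):
    """Return True if a single ICMP echo is answered (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_offline(is_alive, max_misses=5, interval_s=10, sleep=time.sleep):
    """Return once is_alive() has failed max_misses times in a row."""
    misses = 0
    while misses < max_misses:
        misses = 0 if is_alive() else misses + 1
        if misses < max_misses:
            sleep(interval_s)


def capture_until_offline(interface, device_ip, out_file):
    """Run the ring-buffer capture until the device stops answering pings."""
    capture = subprocess.Popen(
        ring_buffer_capture_cmd(interface, device_ip, out_file))
    try:
        wait_until_offline(lambda: ping_once(device_ip))
    finally:
        capture.terminate()  # the newest ring-buffer files stay on disk
        capture.wait()


# Only print the command here; nothing is captured until
# capture_until_offline() is called explicitly.
cmd = ring_buffer_capture_cmd("eth0", "172.31.100.10", "lcd_display.pcapng")
print(shlex.join(cmd))
```

With these defaults the capture keeps running for at least 40 seconds after the first missed ping. That is long enough to see whether server retransmissions like those in the question reach the device side. The ping check has to run from a host that can actually reach the device, which a capture machine behind a passive tap may not be able to do.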


Comments

Hello,

First of all, thank you for taking the time to write this out, and for the information.

I will indeed try to capture this at the end device as well. It might be tricky, since it happens at random to (on average) 5 devices daily out of 100-150 spread all over the country.

In response to your remark about not knowing what is happening on the device side (since it’s a server-side capture): if the firewall were indeed closing the session at that time, would you deem it plausible that the device could only recover with a cold restart? So far no other actions have brought the device back online (physically disconnecting and reconnecting the UTP cable, rebooting the switch, etc.). Also, soon after the device goes offline, its MAC address gets removed from the switch's CAM table. So network-wise, the device seems “dead”. Serial ...

Majexon ( 2021-09-21 22:00:21 +0000 )

Hi M,
Disconnecting and reconnecting the UTP cable (restarting the port on the switch has the same effect) should be enough to reset the link and trigger the device to set up its connection(s) again. A firewall would see this as a new connection and pass it (if allowed).
So if this does not work and the MAC address gets removed from the switch, that is a strong indication that the device does indeed "die".
But you need to prove it with a capture to convince the supplier... Maybe take multiple captures on multiple devices, or try to reproduce it in a lab setting? Anyway, good luck :-)

André ( 2021-09-21 22:47:52 +0000 )
