Various errors, slow LAN
Hi,
At home I have about 50 Ethernet nodes connected through three cascaded switches; some nodes are connected via Wi-Fi.
DNS is provided by Pi-hole running in a VM on an Intel NUC.
At some point the LAN started getting slower and slower. Meanwhile I see several different effects. Some nodes lose their LAN connection sporadically, i.e. the regular ping I run against some nodes to check whether they are alive times out. This happens once or twice a night. I have also seen individual nodes being rate limited by Pi-hole (the limit is set to 1000 requests per 60 seconds).
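To see which clients come close to that limit, I guess something like the following could be run over the DNS queries from a capture. It is only a rough sketch: the function name and the input format (time-sorted pairs of timestamp in seconds and client IP) are my own assumptions, and it uses a sliding 60-second window, which is not necessarily how Pi-hole itself counts.

```python
from collections import defaultdict, deque


def clients_over_limit(queries, limit=1000, window=60.0):
    """Return {client: peak query count} for clients exceeding `limit`.

    `queries` is an iterable of (timestamp_seconds, client_ip) tuples,
    sorted by timestamp. The window is a sliding one (an assumption,
    not necessarily Pi-hole's own counting method).
    """
    recent = defaultdict(deque)  # per-client timestamps inside the window
    peak = defaultdict(int)      # per-client highest count seen so far
    for ts, client in queries:
        q = recent[client]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] >= window:
            q.popleft()
        peak[client] = max(peak[client], len(q))
    return {c: n for c, n in peak.items() if n > limit}


# Made-up sample data: one chatty client (20 queries/s), one quiet client.
sample = sorted(
    [(i * 0.05, "192.168.178.20") for i in range(1500)]
    + [(i * 1.0, "192.168.178.30") for i in range(100)]
)
offenders = clients_over_limit(sample)
print(offenders)
```

The IP addresses in the sample are invented; real input would come from the capture or from the Pi-hole query log.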
So I need to find the root cause.
I am not an expert at all; I have only used Wireshark (WS) a couple of times to peek into the LAN. As a starting point I thought it might help to capture some traffic with WS, so I mirrored one switch port to my monitoring PC.
What I have not found so far is a step-by-step guide for checking how "healthy" a LAN is.
So I did some first filtering of the trace, hoping it indicates "something" that helps me get some advice from you guys:
Captured packets: 1,171,197
20% bad TCP (tcp.analysis.flags)
7.7% TCP retransmissions (tcp.analysis.retransmission or tcp.analysis.fast_retransmission)
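A possible next step would be to check whether the retransmissions are spread over all nodes or concentrated on a few. The sketch below counts retransmitted packets per source/destination pair. It assumes the matching packets were first exported as "src,dst" lines, for example with tshark as shown in the comment; the capture file name and the function name are hypothetical.

```python
from collections import Counter

# Assumed export step (file name is a placeholder):
#   tshark -r capture.pcapng \
#     -Y "tcp.analysis.retransmission or tcp.analysis.fast_retransmission" \
#     -T fields -E separator=, -e ip.src -e ip.dst > retrans.csv


def top_retransmitters(lines, n=5):
    """Count retransmitted packets per (src, dst) pair.

    `lines` is an iterable of 'src_ip,dst_ip' strings, one per packet.
    Lines that do not contain exactly two non-empty fields are skipped
    (e.g. non-IPv4 packets, where ip.src/ip.dst are empty).
    """
    pairs = Counter()
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not all(parts):
            continue
        pairs[(parts[0], parts[1])] += 1
    return pairs.most_common(n)


# Made-up sample lines, just to show the input format.
sample = [
    "192.168.178.20,192.168.178.2",
    "192.168.178.20,192.168.178.2",
    "192.168.178.20,192.168.178.2",
    "192.168.178.31,192.168.178.2",
    ",",
]
result = top_retransmitters(sample)
for (src, dst), count in result:
    print(f"{src} -> {dst}: {count}")
```

If one or two pairs dominate the list, that would point at a specific node, cable or switch port rather than at the LAN as a whole.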
DNS stats:
Total Packets 22008
  rcode 22008
    Server failure 110
    Refused 34
    Not implemented 6
    No such name 233
    No error 21625
  opcodes 22008
    Standard query 21837
    Dynamic update 171
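To put the rcode counts into proportion, here is a small sketch that turns them into percentages. The numbers are copied from the stats above; the variable and function names are just my own choice.

```python
# rcode counts as reported in the DNS statistics
rcode_counts = {
    "No error": 21625,
    "No such name": 233,
    "Server failure": 110,
    "Refused": 34,
    "Not implemented": 6,
}


def rcode_shares(counts):
    """Return each rcode's share of all DNS packets, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = rcode_shares(rcode_counts)
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<16} {rcode_counts[name]:>6} {pct:6.2f} %")

# Share of responses that indicate a problem on the server side
# (Server failure, Refused, Not implemented).
error_share = sum(
    shares[k] for k in ("Server failure", "Refused", "Not implemented")
)
print(f"server-side errors: {error_share:.2f} %")
```

Server failure, Refused and Not implemented together stay below 1% of the DNS packets, so on these numbers alone DNS errors do not look like the main problem.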
Topic / Item                   Count   Average   Min Val   Max Val
request-response time (msec)    7286     40.56      0.32   3744.55
(the maximum looks super slow...)
Another observation: in the WS source column, some nodes are shown with the suffix ".local", others with ".fritz.box", which should be my local domain. Maybe I screwed things up here!?
Again, I am no expert, just trying to get help finding a good starting point.
Thanks