NFS performance troubleshooting

asked 2022-03-08 14:22:55 +0000

NullPointer

Hi,

We are having issues with our DB servers due to slow I/O. After several days of investigation by the third-party vendor, with multiple people involved, little progress seems to have been made. My role covers only the middleware tier and above, so my access to infrastructure, networking, and storage is limited, but I can communicate with those teams and request things from them if needed.

I've requested tcpdump captures from both the client and the server, taken while the issue was present (https://drive.google.com/file/d/1KIhE...).

However, this is all outside my domain, so I would appreciate it if you could point out anything abnormal or any clear indication of what the issue might be. I've noticed that packets are being dropped, which, judging from an ifconfig output taken before the issue started, was not happening previously.
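One thing worth keeping in mind: interface counters such as the ifconfig "dropped" figure reset at boot, so an output from before the restart is not directly comparable to one taken after it. A more useful check is whether drops keep accumulating while the slowdown is happening. Below is a minimal sketch, assuming the DB host is Linux and exposes the same counters in /proc/net/dev; the function names are hypothetical helpers, not part of any tool.

```python
def read_net_dev(path: str = "/proc/net/dev") -> str:
    """Return the raw text of /proc/net/dev (Linux only; assumed available)."""
    with open(path) as f:
        return f.read()


def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Map each interface name to its (rx_drop, tx_drop) counters."""
    drops = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        # Receive columns: bytes packets errs drop fifo frame compressed multicast
        # Transmit columns start at index 8, so the tx "drop" column is index 11.
        drops[name.strip()] = (fields[3], fields[11])
    return drops


def drop_delta(before: str, after: str) -> dict[str, tuple[int, int]]:
    """Drops accumulated between two snapshots, for interfaces present in both."""
    b, a = parse_net_dev(before), parse_net_dev(after)
    return {i: (a[i][0] - b[i][0], a[i][1] - b[i][1]) for i in a if i in b}
```

For example, call read_net_dev(), wait a few minutes while the issue is present, call it again, and pass both strings to drop_delta. Non-zero deltas on the interface that carries NFS traffic would suggest drops are still happening now, not just at some point since boot.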

Any ideas would be appreciated. I am not an expert, but I can see duplicate ACKs, retransmissions, etc.
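Duplicate ACKs and retransmissions appear in most captures to some degree, so the rate matters more than their presence. As a rough host-wide cross-check alongside the captures, here is a minimal sketch, assuming a Linux host where /proc/net/snmp has a "Tcp:" header line followed by a "Tcp:" values line. It covers all TCP traffic on the host, not only NFS, and since whether OutSegs includes retransmitted segments has varied between kernel versions, treat the ratio as an indicator rather than an exact loss rate. The helper names are hypothetical.

```python
def tcp_counters(snmp_text: str) -> dict[str, int]:
    """Parse the two 'Tcp:' lines (names, then values) from /proc/net/snmp text."""
    rows = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


def retrans_ratio(before: str, after: str) -> float:
    """RetransSegs per OutSegs between two snapshots (host-wide, all TCP)."""
    b, a = tcp_counters(before), tcp_counters(after)
    sent = a["OutSegs"] - b["OutSegs"]
    retrans = a["RetransSegs"] - b["RetransSegs"]
    # Rough indicator only: OutSegs semantics differ between kernel versions.
    return retrans / sent if sent else 0.0
```

Taking one snapshot during a quiet period and one during the slowdown, then comparing the two ratios, may show whether retransmissions actually track the I/O problem.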

From the OS perspective, ever since the issue started occurring, nfsiostat on the DB host has shown a sharp increase in r_avg_RTT, starting on the day the servers were restarted.
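To narrow down which mount and which operation the RTT increase comes from, the underlying counters can be read directly. My understanding is that on Linux, nfsiostat derives its per-operation "avg RTT" from /proc/self/mountstats as (change in cumulative RTT ms) / (change in operation count) between two samples; I'm assuming r_avg_RTT here corresponds to the read figure. Below is a minimal sketch under that assumption. The mount point /u01 and the helper names are hypothetical.

```python
def op_stats(mountstats: str, mountpoint: str, op: str = "READ") -> tuple[int, int]:
    """Return (ops, cumulative RTT in ms) for one NFS op on one mount point."""
    in_mount = False
    for line in mountstats.splitlines():
        if line.startswith("device "):
            # e.g. "device srv:/export mounted on /u01 with fstype nfs ..."
            parts = line.split()
            in_mount = len(parts) > 4 and parts[4] == mountpoint
        elif in_mount and line.strip().startswith(op + ":"):
            fields = [int(x) for x in line.split(":", 1)[1].split()]
            # ops, trans, timeouts, bytes_sent, bytes_recv, queue_ms, rtt_ms, exec_ms[, errors]
            return fields[0], fields[6]
    raise KeyError(f"no {op} statistics for {mountpoint}")


def avg_rtt_ms(before: str, after: str, mountpoint: str, op: str = "READ") -> float:
    """Average RTT per operation between two mountstats snapshots."""
    ops0, rtt0 = op_stats(before, mountpoint, op)
    ops1, rtt1 = op_stats(after, mountpoint, op)
    ops = ops1 - ops0
    return (rtt1 - rtt0) / ops if ops else 0.0
```

Comparing avg_rtt_ms for READ and WRITE on each NFS mount may show whether the increase is confined to one export or affects everything going over the same network path.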

The issue began after the whole server, including the VM, was restarted, so I assume something had been changed earlier and only took effect after the full restart.

Thanks in advance.
