This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

NFS Read latency - best way to identify culprit

0

Hi, I have a repeatable event with a specific job run on SLES10 box acessing an EMC celerra. I see very high READ latency (Max RTT). How can I see which file(s) is responsible for this latency. Here are two examples: I used the following filter with tshark: -q -z rpc,rtt,100003,3,'nfs.nfsstat3!=70' Note the very high Max RTT for READ procedure.

NFS Version 3 RTT Statistics:
Filter: nfs.nfsstat3!=70
Procedure        Calls   Min RTT   Max RTT   Avg RTT
NULL                 0   0.00000   0.00000   0.00000
GETATTR         1517132   0.00000   0.58692   0.00105
SETATTR             21   0.00088   0.00954   0.00155
LOOKUP          11581651   0.00001   0.98236   0.00036
ACCESS          887419   0.00000   0.18176   0.00024
READLINK         11462   0.00013   0.43970   0.00689
READ            376117   0.00009 8854.46070   0.46602
WRITE              748   0.00033   0.08924   0.01823
CREATE               8   0.00111   0.02262   0.00607
MKDIR                1   0.00633   0.00633   0.00633
SYMLINK              0   0.00000   0.00000   0.00000
MKNOD                0   0.00000   0.00000   0.00000
REMOVE               7   0.00021   0.00865   0.00342
RMDIR                1   0.00117   0.00117   0.00117
RENAME               2   0.00112   0.00113   0.00112
LINK                 0   0.00000   0.00000   0.00000
READDIR            601   0.00016   0.03231   0.00037
READDIRPLUS       8254   0.00015   0.21962   0.00372
FSSTAT               0   0.00000   0.00000   0.00000
FSINFO              27   0.00013   0.00383   0.00071
PATHCONF             0   0.00000   0.00000   0.00000
COMMIT               3   0.00393   0.01380   0.00765

NFS Version 3 RTT Statistics: Filter: nfs.nfsstat3!=70 Procedure Calls Min RTT Max RTT Avg RTT NULL 0 0.00000 0.00000 0.00000 GETATTR 1425693 0.00000 0.59928 0.00099 SETATTR 21 0.00038 0.00070 0.00056 LOOKUP 11613451 0.00001 0.68981 0.00029 ACCESS 814987 0.00000 0.04566 0.00022 READLINK 11505 0.00003 0.37694 0.00704 READ 364942 0.00011 7943.28530 2.72942 WRITE 845 0.00029 0.57347 0.01104 CREATE 8 0.00053 0.01336 0.00282 MKDIR 1 0.00073 0.00073 0.00073 SYMLINK 0 0.00000 0.00000 0.00000 MKNOD 0 0.00000 0.00000 0.00000 REMOVE 5 0.00077 0.01270 0.00509 RMDIR 1 0.00067 0.00067 0.00067 RENAME 2 0.00044 0.00051 0.00048 LINK 0 0.00000 0.00000 0.00000 READDIR 601 0.00016 0.10647 0.00156 READDIRPLUS 8254 0.00014 0.11047 0.00389 FSSTAT 0 0.00000 0.00000 0.00000 FSINFO 3 0.00026 0.00032 0.00028 PATHCONF 0 0.00000 0.00000 0.00000 COMMIT 3 0.00241 0.00939 0.00486

asked 07 Feb ‘11, 09:53

debugme's gravatar image

debugme
6336
accept rate: 0%

edited 07 Feb ‘11, 10:22

SYN-bit's gravatar image

SYN-bit ♦♦
17.1k957245


One Answer:

0

You can use tshark -r <file> -R "rpc.time>3600" to find all responses that took more than an hour.

answered 07 Feb '11, 10:26

SYN-bit's gravatar image

SYN-bit ♦♦
17.1k957245
accept rate: 20%

Thank you SYNbit. rpc.time > 3600 returns a number of frames with NFS [Illegal Segments], i'm not sure what that means ... reassembly issues, lock / permission issues. Are there doc's out there which explain what NFS [Illegal Segments] means?

thanks

(07 Feb '11, 13:02) debugme

There are no docs available on each and every field. As this is Wireshark generated (because of the square brackets around Illegal Segments), it is probably a problem in the re-assembly. Maybe because of a problem with the packets themselves (which might coincide with the problems you are facing) or maybe a bug in Wireshark.

Are you able to share the trace-file? If so, I can have a look if I can pinpoint it. See my profile for my e-mail address...

(07 Feb '11, 14:40) SYN-bit ♦♦

If there is a pcap file available we can take a look at it for you. The biggest issues with NFS is that v3 (pretty common) has a read block limit of 32KB. So per RTT, you can only transfer 32KB.

If the problem is CPU or IO issues within EMC, the trace file can point it out. If it's an intermittent problem, I would look at the usual suspects as a start (pkt loss, CPU, AntiVirus locking up files)

(07 Feb '11, 18:36) hansangb