I execute tshark command. Process is too slow. Sometimes take more than 6-7 seconds. Pcap file is really small (~500bytes). Actually strange thing is that re-execute same command consequently, process duration is decreasing dramatically (about 1 second). After wait for a while (without execution command), re-run same command and process completion duration increase again. Why process behave like that? How to solve this problem?

Command :

tshark -ta -r test.pcap

OS Details :

SunOS er 5.10 Generic_147441-01 i86pc i386 i86pc

Thanks

asked 13 Mar '13, 02:32

erdinc's gravatar image

erdinc
1113
accept rate: 0%


Actually strange thing is that re-execute same command consequently, process duration is decreasing dramatically (about 1 second)

That's most certainly due to DNS resolving. The seconds time it runs faster due to DNS caching. Please use option '-n' and tshark should run faster.

tshark -ta -n -r test.pcap

If that does not help try this:

tshark -ta -r test.pcap -o name_resolve:FALSE

Regards
Kurt

link

answered 13 Mar '13, 04:30

Kurt%20Knochner's gravatar image

Kurt Knochner ♦
17.0k732168
accept rate: 15%

edited 13 Mar '13, 04:31

Both of options does not solve my problem. Tshark behaviour is not changed.

(13 Mar '13, 04:56) erdinc

what is you tshark version (tshark -v)?

Can you please add the output of the following command for the first call (long run time) and the second call (short run time)?

Case #1:

truss -a -d -e -f -o /var/tmp/tshark_case1.out tshark -ta -r test.pcap

Case #2:

truss -a -d -e -f -o /var/tmp/tshark_case2.out tshark -ta -r test.pcap

Please post the content of the files tshark_case1.out and tshark_case2.out somewhere.

(13 Mar '13, 16:49) Kurt Knochner ♦

I did your test, Biggest differences is at brk function. brk function consume lots of time. Output files link is below. http://www.speedyshare.com/AwzRh/tshark.rar What is wrong about memory (brk), why brk need to time, how to resolve it?

(29 Mar '13, 01:55) erdinc

As far as I know, brk() is used my malloc() to expand the programs heap. However, I have no idea why it takes that much longer in case #1 (first load) than in case #3 (second load).

Can you please post the output of the command 'limit' or 'ulimit -a' (whatever works)?

(02 Apr '13, 10:08) Kurt Knochner ♦

ulimit -a

core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 10
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 27493
virtual memory (kbytes, -v) unlimited

(03 Apr '13, 03:02) erdinc

can you please try to increase the stack size (ulimit -s) and see if that changes anything?

(03 Apr '13, 03:28) Kurt Knochner ♦

I set stack size to 100.000 (increase 10 times), result is same.

(03 Apr '13, 06:27) erdinc

Hm.. looks like there is 'something' on your system that makes memory reallocation slow for the first time (brk calls) and faster for consecutive calls. Could be a specific behavior of the solaris memory management or just a specific behavior of your installation, like a large memory consumption on the system and thus swapping. Are there any processes that consume large amounts of memory (output of 'top' and 'swapon -s')?

I googled for slow brk() calls, but was unable to find something interesting. Can you please check my assumption about memory usage (above).

(03 Apr '13, 07:02) Kurt Knochner ♦

Yes, system use most of memory. 12 GB memory is installed. Top command says %63 percentage is used. Only one process is about half of total memory consuming. I observe similar behaviour at windows machine, I guess differences is not as big as at solaris

(03 Apr '13, 10:11) erdinc
showing 5 of 9 show 4 more comments

How much time a process takes is dependent on a lot of things. It looks like your initial tshark execution is taking more time because of disk IO. The second time you run tshark it will be faster since it will have cached things in memory. When you run tshark for the 3rd time after waiting a while, the cache will have flushed your data from cache and will need to re-read it from disk (or the network in case of NFS, iSCSI, etc).

You can check this by using the "time" utility, although I'm not sure what options you have with it on solaris. But on my OSX I get the following:

$ /usr/bin/time -l tshark -ta -r bug4716.cap 
  1 14:41:28.843078     10.0.0.1 -> 10.0.0.1     TCP 62 swrmi > EtherNet-IP-1 [SYN] Seq=0 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460 SACK_PERM=1
  2 14:41:29.169626     10.0.0.1 -> 10.0.0.1     TCP 60 swrmi > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
  3 14:41:29.712539     10.0.0.1 -> 10.0.0.1     TCP 93 swrmi > EtherNet-IP-1 [PSH, ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=39
  4 14:41:30.203991     10.0.0.1 -> 10.0.0.1     TCP 119 swrmi > EtherNet-IP-1 [PSH, ACK] Seq=40 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=65
  5 14:41:32.235725     10.0.0.1 -> 10.0.0.1     TCP 60 swrmi > EtherNet-IP-1 [PSH, ACK] Seq=105 Ack=2 Win=65534 [TCP CHECKSUM INCORRECT] Len=1
        1.21 real         0.22 user         0.20 sys
  88346624  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     22420  page reclaims
         0  page faults
         0  swaps
        55  block input operations
        34  block output operations
         0  messages sent
         0  messages received
         0  signals received
       161  voluntary context switches
        21  involuntary context switches
$ /usr/bin/time -l tshark -ta -r bug4716.cap 
  1 14:41:28.843078     10.0.0.1 -> 10.0.0.1     TCP 62 swrmi > EtherNet-IP-1 [SYN] Seq=0 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460 SACK_PERM=1
  2 14:41:29.169626     10.0.0.1 -> 10.0.0.1     TCP 60 swrmi > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
  3 14:41:29.712539     10.0.0.1 -> 10.0.0.1     TCP 93 swrmi > EtherNet-IP-1 [PSH, ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=39
  4 14:41:30.203991     10.0.0.1 -> 10.0.0.1     TCP 119 swrmi > EtherNet-IP-1 [PSH, ACK] Seq=40 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=65
  5 14:41:32.235725     10.0.0.1 -> 10.0.0.1     TCP 60 swrmi > EtherNet-IP-1 [PSH, ACK] Seq=105 Ack=2 Win=65534 [TCP CHECKSUM INCORRECT] Len=1
        0.56 real         0.22 user         0.19 sys
  88346624  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
     22420  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         2  block output operations
         0  messages sent
         0  messages received
         0  signals received
       106  voluntary context switches
       338  involuntary context switches
$

Please note the block input/output operations and also the "involuntary context switches" which are caused by other processes (and which could contribute to extra load time if the system is heavily loaded).

As you can see, the sys and user time are about the same for both executions, but the real time differs because of time spent waiting on the disk and/or other processes.

link

answered 13 Mar '13, 06:06

SYN-bit's gravatar image

SYN-bit ♦♦
15.0k848217
accept rate: 19%

I execute commands with time, my result is below. I can inference that slowness is not about tshark, because of slowness of computer. Is this assumption true? Another question (but not primarely important) Involuntary context switches can cause "real" time is big but waiting disc IO operations should be in "sys" time? real 2.6 user 0.1 sys 0.0

(13 Mar '13, 06:43) erdinc

user and sys time only refer to time spent on the CPU, not time waiting on disk.

Have a look at http://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1

(13 Mar '13, 07:53) SYN-bit ♦♦

I checked with DTrace script that calculate system call duration (below link).Total sys call duration is less than 0.1 second but all process is completed in 5 seconds. If I am correct, this dtrace script accumulate all context swiches and I/O operation durations. There should be another reason why tshark is completed >4 seconds.

http://www.brendangregg.com/DTrace/procsystime

(13 Mar '13, 08:25) erdinc

Pcap file is really small (~500bytes).

disk I/O and context switches as problem source for a 500 byte file? No way .. ;-))

There must be another reason, apparently some caching (DNS, filesystem, etc.)

@erdinc: Do you start tshark from an NFS mounted filesystem (@SYN-bit mentioned it)?

(13 Mar '13, 16:39) Kurt Knochner ♦

There is no any NFS mounting point on my system. If it helps to identify problem, mysql server is running and heavily used disc. iostat tells that %b around %50.

(14 Mar '13, 05:09) erdinc

disk I/O and context switches as problem source for a 500 byte file? No way .. ;-))

Well, I was not worried about the 500 byte file :-), but was thinking about the tshark executable and access to all the supporting files...

That the disk is 50% busy seems OK, but how about the "wait" and "%w" columns?

(14 Mar '13, 05:23) SYN-bit ♦♦

wait("w") is always zero.

(14 Mar '13, 06:05) erdinc

can you please add the truss output? (see the comment in my answer)...

(20 Mar '13, 11:13) Kurt Knochner ♦
showing 5 of 8 show 3 more comments
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×362
×32

Asked: 13 Mar '13, 02:32

Seen: 1,245 times

Last updated: 03 Apr '13, 10:11

powered by OSQA