Oracle 1.22s timeout after packet retransmit on AIX server

I have a slowness issue on an Oracle database after observing packet loss, but I don't really understand how to narrow the cause down between Oracle and the AIX server's TCP stack.

No pcap unfortunately, only this screenshot. The trace was taken with tcpdump on the server.

[Screenshot: Snap4.png]

TCP baseline is:

  • Client sends request of 145 bytes
  • Server answers with 182 bytes in less than 1ms

Sometimes packets are dropped on the way back from the server to the client. In that case:

  • Client sends request (#142)
  • Server sends answer (#150) in 0.14ms
  • After 200 ms, the client retransmits the request (#2357)
  • Server acks immediately (#2358)
  • After 1.22 s, the server sends the answer (#11903), which is a retransmission of #150
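To make the timeline easier to reason about, here is a minimal Python sketch that lists the gaps between consecutive packets and picks out the largest one. The frame numbers and delays come from the list above; the absolute timestamps are an assumption (in particular, the 1.22 s is taken as measured from the ack #2358), and the `Event` type and function names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Event:
    frame: int    # frame number as shown in the screenshot
    t_ms: float   # time since the request, in ms (hypothetical absolute values)
    sender: str
    note: str


# Timeline of the loss case. Assumption: 1.22 s is counted from the ack (#2358).
EVENTS = [
    Event(142, 0.0, "client", "request"),
    Event(150, 0.14, "server", "answer"),
    Event(2357, 200.0, "client", "request retransmitted"),
    Event(2358, 200.0, "server", "ack"),
    Event(11903, 1420.0, "server", "answer retransmitted"),
]


def gaps(events):
    """Return (previous frame, frame, delta in ms) for consecutive events."""
    return [
        (prev.frame, cur.frame, cur.t_ms - prev.t_ms)
        for prev, cur in zip(events, events[1:])
    ]


def largest_gap(events):
    """Return the (previous frame, frame, delta in ms) tuple with the biggest delta."""
    return max(gaps(events), key=lambda g: g[2])


if __name__ == "__main__":
    for prev_frame, frame, delta in gaps(EVENTS):
        print(f"#{prev_frame} -> #{frame}: {delta:.2f} ms")
    print("largest gap:", largest_gap(EVENTS))
```

Under these assumptions the largest gap sits between the server's ack and the server's own retransmission, which is the delay the rest of this question is about.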

First action: identify where packets are dropped on the network and reduce the loss.

First conclusion about the 1.22 seconds: Oracle is responsible for this delay, because it has to produce the answer a second time. The TCP stack acks the second request very quickly, which suggests that this is not an AIX TCP stack problem but an application one. The request also carries the PSH flag.

But: Oracle logs show that Oracle waits for the client for 1.5 s and processes the request in the database only once. Also, the retransmitted packet #11903 has the same sequence number as #150, which means that #11903 comes directly from the stack's buffer and is not a new answer from the application.
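The sequence-number argument can be sketched in a few lines of Python: a data segment that reuses the sender, sequence number and length of an earlier segment is a TCP-level retransmission, whereas a fresh answer from the application would occupy new sequence space. The sequence numbers below are hypothetical (the screenshot values are not reproduced here), the lengths are assumed to match the 145/182-byte baseline, and `Segment` and `find_retransmissions` are illustrative names only.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    frame: int
    src: str
    seq: int      # hypothetical sequence number
    length: int   # TCP payload length in bytes


def find_retransmissions(segments):
    """Return (original frame, retransmitted frame) pairs.

    A segment counts as a retransmission when an earlier data segment from
    the same sender had the same sequence number and payload length.
    """
    first_seen = {}
    pairs = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data and cannot be retransmitted data
        key = (seg.src, seg.seq, seg.length)
        if key in first_seen:
            pairs.append((first_seen[key], seg.frame))
        else:
            first_seen[key] = seg.frame
    return pairs


# Loss case from the question, with made-up sequence numbers.
TRACE = [
    Segment(142, "client", 1000, 145),
    Segment(150, "server", 5000, 182),
    Segment(2357, "client", 1000, 145),
    Segment(2358, "server", 5182, 0),    # ack without data
    Segment(11903, "server", 5000, 182),
]

if __name__ == "__main__":
    print(find_retransmissions(TRACE))
```

With this trace the function pairs #142 with #2357 and #150 with #11903, which matches the reading that #11903 was resent by the stack rather than regenerated by Oracle.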

So:

  • Why would the stack immediately ack the request without sending data if it retransmits the packet itself afterwards?
  • Why would the stack wait 1.22 s before resending a packet it already has in its buffer?
  • And most important: where should I troubleshoot next? Should I trace the process between AIX and Oracle?

Thanks for any feedback, Thomas