
What causes retransmissions?

asked 2020-03-09 00:28:47 +0000

I don't have enough points to post a picture, so here's what's happening.

I have two servers, both running Linux Mint 19.3. I tried Mellanox 10 Gbps cards (with a Mellanox DAC) and Intel 10 Gbps NICs (with an Intel-branded DAC), with no switch: a 5 meter DAC connects the two servers directly. Both servers also have a 1 Gbps NIC that was active the entire time. I edited the 'hosts' file on each server and entered the host name and 10 Gbps IP address of the other box.

When I copy a 15, 30 or 50 GB file between the two servers, I get about 450-500 MB/s in one direction, but when I copy the same file back in the other direction, speeds start off around 350-400 MB/s and quickly fall to around 150 MB/s. I've tested the I/O subsystem on both servers, and the SSDs inside them can read/write at about 550 MB/s.

I used Wireshark on one of the boxes and saw this:

Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)

I see that error repeated non-stop while a file copy is running. I'm not a Linux (or networking) expert, but I'm thinking this might be a case for setting up a proper route on the Linux boxes so that ALL traffic between these boxes stays on the 10 Gbps NICs. Since I'm no Linux expert, I'm stuck here.
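For context on the routing question: Linux picks the outgoing interface from the destination address, using the most specific (longest-prefix) matching route. If the two 10 Gbps NICs have addresses on their own subnet and the 'hosts' entries point at those addresses, traffic between the boxes should already leave through the 10 Gbps NICs without any extra route. The sketch below models that lookup; the subnets and interface names (`enp3s0`, `enp5s0f0`) are hypothetical, and it ignores policy routing and route metrics. On a real box, `ip route get <address>` reports the interface the kernel would actually use.

```python
import ipaddress

# Hypothetical routing table: subnets and interface names are assumptions,
# not taken from the servers in the question.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "enp3s0"),        # default route via the 1 Gbps NIC
    (ipaddress.ip_network("192.168.1.0/24"), "enp3s0"),   # 1 Gbps LAN
    (ipaddress.ip_network("10.10.10.0/24"), "enp5s0f0"),  # 10 Gbps point-to-point link
]


def pick_interface(dest: str) -> str:
    """Return the interface of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, dev) for net, dev in ROUTES if addr in net]
    # Longest prefix wins, as in a plain destination-based route lookup.
    _net, dev = max(matches, key=lambda m: m[0].prefixlen)
    return dev


if __name__ == "__main__":
    print(pick_interface("10.10.10.2"))    # peer's 10 Gbps address (hypothetical)
    print(pick_interface("192.168.1.20"))  # peer's 1 Gbps address (hypothetical)
    print(pick_interface("8.8.8.8"))       # anything else uses the default route
```

In this model only the destination address matters, which is why resolving the peer's host name to its 10 Gbps address is the part that keeps the copy on the fast link.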

iPerf shows 9.6 Gbps in both directions.
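To put the link and disk numbers side by side, a small conversion sketch, assuming decimal units (1 MB = 10^6 bytes) for both the iPerf and the file-copy figures:

```python
def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


LINK_GBPS = 9.6  # iPerf result from the question
SSD_MBPS = 550   # measured SSD read/write speed from the question

if __name__ == "__main__":
    link = gbps_to_mbytes_per_s(LINK_GBPS)
    ceiling = min(link, SSD_MBPS)  # the slower of network and disk limits a copy
    print(f"iPerf link rate: {link:.0f} MB/s")
    print(f"Ceiling for a disk-to-disk copy: {ceiling:.0f} MB/s")
    for observed in (500, 150):  # fast and slow directions from the question
        print(f"{observed} MB/s is {observed / ceiling:.0%} of that ceiling")
```

With these inputs the link works out to 1200 MB/s, so the roughly 550 MB/s SSDs, not the 10 Gbps link, set the ceiling for a disk-to-disk copy; the 150 MB/s direction sits well below both.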

If I can't figure out the Linux routing, should I just get a switch with 10 Gbps ports, have these servers talk through it, and pull their 1 Gbps Cat5 cables?


1 Answer


answered 2020-03-09 03:34:50 +0000

updated 2020-03-09 03:35:42 +0000

This is probably due to packet loss in your "slow" direction - but no loss in your "fast" direction.
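The Wireshark message in the question fits this picture: when a lost or delayed segment is resent, the retransmitted copy can cover bytes the receiver already holds. A simplified sketch of that overlap check over hypothetical (relative sequence number, payload length) pairs; it ignores sequence-number wraparound and SACK, and it is not Wireshark's own reassembly code:

```python
from typing import List, Tuple


def overlapping_segments(segments: List[Tuple[int, int]]) -> List[int]:
    """Return indices of segments whose payload overlaps bytes already received.

    segments: (relative sequence number, payload length) in arrival order.
    A segment that only fills a gap is NOT flagged, because it overlaps nothing.
    """
    received: List[Tuple[int, int]] = []  # byte ranges seen so far, end exclusive
    flagged: List[int] = []
    for i, (seq, length) in enumerate(segments):
        if length <= 0:
            continue  # pure ACKs carry no payload, so they cannot overlap
        end = seq + length
        if any(seq < r_end and r_start < end for r_start, r_end in received):
            flagged.append(i)
        received.append((seq, end))
    return flagged


if __name__ == "__main__":
    # Third full segment is followed by a resend of the second one.
    capture = [(1, 1448), (1449, 1448), (2897, 1448), (1449, 1448)]
    print(overlapping_segments(capture))
```

A steady stream of such overlaps during a copy is consistent with the sender repeatedly resending data, which is the loss the answer suspects.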

The TCP "Congestion Avoidance" algorithm slows down the transmit rate when it detects packet loss (and assumes "congestion" somewhere in the path).
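A rough model of that back-off, using the classic Reno-style additive-increase/multiplicative-decrease (AIMD) rule. This is an assumption for illustration: Linux defaults to CUBIC, which grows the window differently and backs off less sharply, but it still cuts the window when it sees loss.

```python
from typing import List, Set


def aimd_cwnd(rounds: int, loss_rounds: Set[int], start: float = 10.0) -> List[float]:
    """Congestion window (in segments) at the start of each round trip.

    Additive increase: +1 segment after each loss-free round trip.
    Multiplicative decrease: halve the window after a round trip with loss.
    """
    cwnd = start
    history: List[float] = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            cwnd = max(cwnd / 2, 1.0)  # never drop below one segment
        else:
            cwnd += 1.0
    return history


if __name__ == "__main__":
    clean = aimd_cwnd(20, loss_rounds=set())
    lossy = aimd_cwnd(20, loss_rounds={5, 10, 15})
    print("no loss :", clean[-1])
    print("3 losses:", lossy[-1])
```

With no loss the window reaches 29 segments after 20 round trips; with losses in rounds 5, 10 and 15 it ends at 7.875. Since throughput is roughly window × segment size / RTT, repeated loss keeps the rate low even on an otherwise idle 10 Gbps link.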

We'd need to see a packet capture file to prove this. However, you can see it for yourself in Wireshark's graph: Statistics > TCP Stream Graphs > Window Scaling.
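The arithmetic behind a window view is short: the advertised window is the 16-bit header field shifted left by the scale factor negotiated in the SYN and SYN-ACK (RFC 7323), and a sender can have at most one window of unacknowledged data in flight per round trip. A sketch, where the window value, shift and 0.2 ms RTT are assumed example values rather than figures from this thread:

```python
def effective_window(raw_window: int, scale_shift: int) -> int:
    """Receive window in bytes: 16-bit header field shifted by the window-scale count."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("raw window must fit in 16 bits")
    if not 0 <= scale_shift <= 14:
        raise ValueError("RFC 7323 limits the shift count to 14")
    return raw_window << scale_shift


def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound: at most one window of unacknowledged data per round trip."""
    return window_bytes / rtt_s


if __name__ == "__main__":
    win = effective_window(1444, 7)                 # assumed header value and shift
    rate = max_throughput_bytes_per_s(win, 0.0002)  # assumed 0.2 ms RTT
    print(f"window {win} bytes -> at most {rate / 1e6:.0f} MB/s")
```

If the window a receiver advertises shrinks in one direction, the same bound shows how directly that caps the transfer rate in that direction.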

Have a look at my answer to question number 15002.

https://ask.wireshark.org/question/15...


Comments

I guess my next obvious question is (and I know it's silly even to ask)... would you care to hazard a guess as to what could cause this congestion? My SSDs can read/write much faster than the speeds I'm seeing, so I don't think that's the problem. Tonight, I was sitting in front of server A accessing the NFS share on server B. The file copied over at 450 MB/s and was slowly, SLOWLY dropping in speed. Copying the same file back to server B started at around 200 MB/s and very, VERY quickly dropped to 60 MB/s. This weekend, I'll grab a packet capture from each PC and post them if I can.

I have a 2.5 m DAC on order that should arrive tomorrow. Maybe I'm at the edge of what a passive DAC is capable of? If that ...

Road Hazard (2020-03-10 00:37:37 +0000)

That's a good question, given that you are directly connected.

A capture would prove the TCP "congestion" hypothesis and/or any other TCP stack issues.

As you suggest, the odds are good that you're looking at a hardware issue, so swapping components around might be the answer.

I'm not a big fan of guessing, so capture files will provide more solid evidence.

Philst (2020-03-11 04:02:55 +0000)

A full capture can give you a good head start in pinpointing issues, but the exact reason a system is slower might not be revealed in a packet capture.

hugo.vanderkooij (2020-03-11 14:14:42 +0000)

True, but a capture will help you eliminate "network" causes. It can also tell you whether performance issues are inside a server and/or client, which transaction types are slow and which aren't, etc.

In the case above, I was suggesting that a capture would prove/disprove the TCP "Congestion Avoidance" hypothesis.

Philst (2020-03-17 01:39:16 +0000)
