Google Fixes A Longstanding, Important TCP Bug In The Linux Kernel
Google engineers managed to recently uncover a high profile TCP bug in the Linux kernel that has huge implications on network performance and efficiency.
Google landed this fix earlier this month into the Linux Git code. The issue and fix are well described via this blog post.
Google landed this fix earlier this month into the Linux Git code. The issue and fix are well described via this blog post.
The problem can be roughly summarized as the controller mistakenly characterizing the lack of congestion reports over a quiescent period as positive evidence that the network is not congested and therefore it should send at a faster rate when sending resumes. When put like this, its obvious that an endpoint that is not moving any traffic cannot use the lack of errors as information in its feedback loop.It's quite easy to hit this problem, so great to see the issue has been fixed after so many years.
The end result is that applications that oscillate between transmitting lots of data and then laying quiescent for a bit before returning to high rates of sending will transmit way too fast when returning to the sending state. This consequence of this is self induced packet loss along with retransmissions, wasted bandwidth, out of order packet delivery, and application level stalls.
21 Comments