IO_uring Network Zero-Copy Transmit Continues Looking Great For Linux


  • #11
    Originally posted by onlyLinuxLuvUBack View Post

    why not use firehose?
    what's that?
    Do you have a link? Googling firehose kinda gives firehoses :P



    • #12
      Originally posted by pomac View Post
      These are gigabit speeds - now apply this to 100 and 400 Gbit cards - I'd assume that the target here is RDMA-like performance
      Yes, the hyperscalers have 400GbE in production and 800GbE in various stages of development. Just like some of the Optane-based io_uring performance numbers, most consumers will only see improvements of a fraction of a percent, since you are not running at hyperscale.



      • #13
        Originally posted by CommunityMember View Post

        Yes, the hyperscalers have 400GbE in production and 800GbE in various stages of development. Just like some of the Optane-based io_uring performance numbers, most consumers will only see improvements of a fraction of a percent, since you are not running at hyperscale.
        Since most users use services from these cloud hosting companies, yes, this does make a difference to just about everyone. We just don't notice when pennies are saved here and there because fewer servers were bought thanks to these optimizations.



        • #14
          Originally posted by markg85 View Post

          what's that?
          Do you have a link? Googling firehose kinda gives firehoses :P
          It has disappeared from the face of Google.

          I remember using it a long time ago.



          • #15
            Originally posted by markg85 View Post
            While I do quite like io_uring and the massive improvements it brings, these numbers do need a bit more context before I fully get the benefit.

            Assume two PCs. For context, an iperf test between the two PCs is going to get ~980 Mbit/s, about the theoretical maximum.
            Now assume I can also send files between those two PCs (I made an application that uses sendfile to do that). This transfer also runs at near the theoretical maximum, say ~950 Mbit/s.

            With those in mind, what's the advantage of using io_uring here as a replacement? I mean in this very specific case! I'm very unlikely to hit much higher throughput and definitely won't hit a 2.2x improvement, as that's already impossible.

            Perhaps I'm answering myself here by saying this, but should I see the io_uring path in this specific case as just being more CPU-friendly to execute? As in, the throughput is likely the same but the CPU usage during that throughput is reduced?

            I'm keen on learning more about this!
            The problem is not about bandwidth, it's more about pps (packets per second). Try an iperf run with small packets (64 bytes, for example): you'll only be able to reach 2-3 million pps per core, not more.
            That's why there are kernel-bypass technologies like DPDK and XDP.

            (It's the same with NVMe disks: you can easily reach full bandwidth with big blocks, but with small blocks the CPU can become the limit.)
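
            To make that per-packet cost concrete, here is a minimal illustrative sketch (not from the article) that uses liburing to queue a burst of small datagrams and submit them with a single system call; BATCH, the payload and the send_burst() helper are made up, and a real sender would create the ring once rather than per burst.

            #include <liburing.h>
            #include <stddef.h>

            #define BATCH 64 /* illustrative burst size, not a tuned value */

            /* Queue BATCH small sends on an already-connected UDP socket and submit
             * them with one syscall instead of BATCH separate send() calls. */
            static int send_burst(int sockfd)
            {
                static const char payload[64];
                struct io_uring ring;
                struct io_uring_cqe *cqe;

                if (io_uring_queue_init(BATCH, &ring, 0) < 0)
                    return -1;

                for (int i = 0; i < BATCH; i++) {
                    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                    if (!sqe)
                        break;
                    io_uring_prep_send(sqe, sockfd, payload, sizeof(payload), 0);
                }

                /* One submission call covers the whole batch. */
                int submitted = io_uring_submit(&ring);

                /* Reap the completions; each cqe->res is bytes sent or -errno. */
                for (int i = 0; i < submitted; i++) {
                    if (io_uring_wait_cqe(&ring, &cqe) < 0)
                        break;
                    io_uring_cqe_seen(&ring, cqe);
                }

                io_uring_queue_exit(&ring);
                return 0;
            }

            The syscall-per-packet overhead is one of the per-packet costs that caps pps per core; batching the submissions is one way io_uring chips away at it.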



            • #16
              Originally posted by markg85 View Post
              While I do quite like io_uring and the massive improvements it brings, these numbers do need a bit more context before I fully get the benefit.

              Assume two PCs. For context, an iperf test between the two PCs is going to get ~980 Mbit/s, about the theoretical maximum.
              Now assume I can also send files between those two PCs (I made an application that uses sendfile to do that). This transfer also runs at near the theoretical maximum, say ~950 Mbit/s.

              With those in mind, what's the advantage of using io_uring here as a replacement? I mean in this very specific case! I'm very unlikely to hit much higher throughput and definitely won't hit a 2.2x improvement, as that's already impossible.

              Perhaps I'm answering myself here by saying this, but should I see the io_uring path in this specific case as just being more CPU-friendly to execute? As in, the throughput is likely the same but the CPU usage during that throughput is reduced?

              I'm keen on learning more about this!
              It translates to lower latency and lower CPU usage. However, with 10G and faster networks it is more noticeable, especially with small packets.
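
              For reference, a minimal sketch of what the zero-copy transmit path looks like from user space with liburing's io_uring_prep_send_zc() (IORING_OP_SEND_ZC, kernel 6.0+ / liburing 2.3+); the send_one_zc() helper and its arguments are assumptions for illustration, not code from the article:

              #include <liburing.h>
              #include <stddef.h>

              /* Submit one zero-copy send and wait for both of its completions. */
              static int send_one_zc(struct io_uring *ring, int sockfd,
                                     const void *buf, size_t len)
              {
                  struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
                  if (!sqe)
                      return -1;

                  /* The pages behind buf are handed to the NIC instead of being
                   * copied into kernel socket buffers. */
                  io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
                  io_uring_submit(ring);

                  /* A zero-copy send posts two CQEs: the usual result (flagged
                   * IORING_CQE_F_MORE), then a notification (IORING_CQE_F_NOTIF)
                   * once the buffer is safe to reuse. */
                  for (int seen = 0; seen < 2; seen++) {
                      struct io_uring_cqe *cqe;
                      if (io_uring_wait_cqe(ring, &cqe) < 0)
                          return -1;
                      io_uring_cqe_seen(ring, cqe);
                  }
                  return 0;
              }

              The CPU saving comes from the payload pages going to the NIC without first being copied into kernel socket buffers.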



              • #17
                Which kernel version includes this change?



                • #18
                  Originally posted by spirit View Post

                  The problem is not about bandwidth, it's more about pps (packets per second). Try an iperf run with small packets (64 bytes, for example): you'll only be able to reach 2-3 million pps per core, not more.
                  That's why there are kernel-bypass technologies like DPDK and XDP.

                  (It's the same with NVMe disks: you can easily reach full bandwidth with big blocks, but with small blocks the CPU can become the limit.)
                  io_uring is not going to help you much there either. Every piece of hardware, be it a NIC or a switch, has a hard pps limit that is quite low for consumer-grade gear. Even a high-end switch can have a chassis-wide maximum of 119 Mpps.

                  And an M.2 NVMe drive has a very low operations-per-second ceiling at the hardware level; a Samsung 970 EVO tops out at about 500K IOPS on reads and 480K IOPS on writes. Your CPU is way faster than that.
                  Last edited by F.Ultra; 22 December 2021, 06:19 AM.
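
                  As a rough back-of-the-envelope check on those figures (my own arithmetic, not from the post): a minimum-size Ethernet frame occupies 84 bytes on the wire (64-byte frame + 8-byte preamble + 12-byte inter-frame gap), so theoretical line rate works out to:

                  #include <stdio.h>

                  /* Theoretical line-rate packet rate at minimum Ethernet frame size. */
                  int main(void)
                  {
                      const double wire_bits = 84.0 * 8.0;   /* 64 B frame + 20 B preamble/gap */
                      const double gbps[] = { 1.0, 10.0, 100.0 };

                      for (int i = 0; i < 3; i++)
                          printf("%6.0f Gbit/s -> %7.2f Mpps\n",
                                 gbps[i], gbps[i] * 1e9 / wire_bits / 1e6);
                      return 0;   /* prints ~1.49, ~14.88 and ~148.81 Mpps */
                  }

                  So 2-3 Mpps per core already exceeds what a single gigabit link can even carry, and the 119 Mpps chassis figure sits below 100G line rate at minimum frame size.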



                  • #19
                    Originally posted by markg85 View Post
                    While I do quite like io_uring and the massive improvements it brings, these numbers do need a bit more context before I fully get the benefit.

                    Assume two PCs. For context, an iperf test between the two PCs is going to get ~980 Mbit/s, about the theoretical maximum.
                    Now assume I can also send files between those two PCs (I made an application that uses sendfile to do that). This transfer also runs at near the theoretical maximum, say ~950 Mbit/s.

                    With those in mind, what's the advantage of using io_uring here as a replacement? I mean in this very specific case! I'm very unlikely to hit much higher throughput and definitely won't hit a 2.2x improvement, as that's already impossible.

                    Perhaps I'm answering myself here by saying this, but should I see the io_uring path in this specific case as just being more CPU-friendly to execute? As in, the throughput is likely the same but the CPU usage during that throughput is reduced?

                    I'm keen on learning more about this!
                    This is not about raw speed but about processing cost; these are the scenarios where the benefit should be interesting:

                    1.) Hyperscale networks of 100Gb+, because it eats a lot less CPU/RAM by not having to copy the packets around in buffers for further processing.
                    2.) A small server serving a big number of clients with variable packet sizes (let's say something like Samba serving 500 clients), for the same reason as 1.
                    3.) Any fast connection with very small packet sizes, because you bypass a lot of copying around; see the sketch just below.
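
                    For comparison, the copy being avoided here is the same one the plain-socket zero-copy path (SO_ZEROCOPY + MSG_ZEROCOPY, available for TCP since kernel 4.14) skips; a minimal sketch, with the send_zerocopy() helper made up for illustration:

                    #include <stddef.h>
                    #include <sys/socket.h>

                    #ifndef SO_ZEROCOPY
                    #define SO_ZEROCOPY 60
                    #endif
                    #ifndef MSG_ZEROCOPY
                    #define MSG_ZEROCOPY 0x4000000
                    #endif

                    /* Opt a TCP socket into zero-copy TX and send one buffer without the
                     * usual copy into kernel socket buffers. */
                    static int send_zerocopy(int fd, const void *buf, size_t len)
                    {
                        int one = 1;

                        if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) < 0)
                            return -1;

                        if (send(fd, buf, len, MSG_ZEROCOPY) < 0)
                            return -1;

                        /* buf must stay untouched until the kernel reports completion on
                         * the socket error queue (recvmsg() with MSG_ERRQUEUE). */
                        return 0;
                    }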



                    • #20
                      Thanx folks for the informative replies!

                      I kinda missed the throughput at small packet sizes + the hyperscale networks.
                      In other words, for a local PC-to-PC large file transfer the io_uring route is unlikely to have any benefit. If there's a consumer benefit at all, it would be with small packet sizes, but a lot of them.

