Another Sizable Performance Optimization To Benefit Network Code With Linux 5.17

  • #31
    Originally posted by jacob
    Yeah, that's why there was also that student in Finland who had a 386 and wasn't happy: he wanted to take full advantage of the CPU to run a true 32-bit, multitasking OS with protected memory, and OS/2 or Windows was not it.
    Yes. Looking back now at the changes the world went through, that student in Finland pulled off a big achievement at the time. Look how it changed the world.

    • #32
      Originally posted by tuxd3v
      Yes. Looking back now at the changes the world went through, that student in Finland pulled off a big achievement at the time. Look how it changed the world.
      He had to revise his stance, though, because initially he was saying that whatever he was doing would never run on anything other than a 386 with an ISA bus and an IDE hard disk, because that's all he had.

      • #33
        Originally posted by tuxd3v
        But if you send a binary file native to your system over the network and you are on a little-endian machine, you need to convert it to network byte order. You don't even know this is happening, though; it's done in the background by the operating system.
        No, as asgavar (and yours truly, in an earlier comment in this thread) pointed out, TCP/IP endianness only applies to the packet headers. TCP/IP doesn't give a shit about the data payload; it's just an opaque bag of bits.

        Higher-level protocols designed to work in a mixed-endian environment need to decide how to handle it, but that's independent of the choice TCP/IP has made. HTTP <= 1.1, for instance, is text-based, so it doesn't have any endianness issues. And if you transfer a binary file (say, a JPEG) over HTTP, then just as TCP/IP doesn't care about the packet payload, HTTP doesn't care about the content of the binary file; it can just be blasted as-is over the wire. Now, that binary file format might specify the endianness, but that's independent of HTTP and independent of TCP/IP.
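
A minimal Python sketch of the header/payload split described above. It assumes a made-up three-field header (source port, destination port, sequence number) loosely modelled on the start of a TCP header; `build_segment` and `parse_segment` are hypothetical names. The `!` prefix in `struct` means network byte order, and only the header fields go through it; the payload, here the first four bytes of a JPEG file, is appended exactly as given.

```python
import struct

# Hypothetical, simplified header: src port, dst port, sequence number.
# "!" selects network byte order (big-endian) with no padding.
HEADER_FMT = "!HHI"


def build_segment(src_port: int, dst_port: int, seq: int, payload: bytes) -> bytes:
    """Serialize the header fields in network byte order; append payload as-is."""
    header = struct.pack(HEADER_FMT, src_port, dst_port, seq)
    return header + payload  # the payload is an opaque bag of bytes


def parse_segment(segment: bytes):
    """Inverse of build_segment: decode the header, return the payload untouched."""
    hdr_len = struct.calcsize(HEADER_FMT)
    src_port, dst_port, seq = struct.unpack(HEADER_FMT, segment[:hdr_len])
    return src_port, dst_port, seq, segment[hdr_len:]


if __name__ == "__main__":
    jpeg_start = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # first bytes of a JPEG file
    seg = build_segment(80, 51000, 1, jpeg_start)
    print(seg.hex())
    print(parse_segment(seg))
```

Whatever the host's endianness, the header bytes come out the same, and the JPEG bytes are never reinterpreted; any byte order inside them is the JPEG format's business.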

        • #34
          Originally posted by jacob

          Actually, they both have their benefits. On old big-endian CPUs like the m68000, where the data bus was only 16 bits wide and loading a 32-bit word from memory took twice as long as loading a 16-bit word, you could optimise a binary search tree by first loading only the first two bytes of each node, which were the high bytes, and only loading the rest if those high bytes were equal to those of your key.
          In this particular case, you could easily just do a halfword swap when storing, as part of your data structure design. The cost of the swap is zero or approximately zero, even on m68k.
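
A minimal Python sketch of that trick, using a sorted array instead of a tree for brevity (the per-node comparison is the same). It counts 16-bit loads to stand in for bus cycles on a 16-bit data bus; the function names and the load-counting model are illustrative assumptions. `pack_be` gives the m68k layout, and `pack_le_swapped` is the "halfword swap when storing" idea: little-endian halfwords, but with the high half stored first, so the same search works on both.

```python
import struct


def pack_be(keys):
    """m68k-style big-endian records: high halfword at offset 0."""
    return b"".join(struct.pack(">I", k) for k in keys)


def pack_le_swapped(keys):
    """Little-endian halfwords, stored high half first (halfword swap)."""
    return b"".join(struct.pack("<HH", k >> 16, k & 0xFFFF) for k in keys)


def search(buf, key, half_fmt):
    """Binary search over sorted 4-byte records in buf.

    half_fmt is the struct format of one 16-bit halfword (">H" or "<H").
    Returns (index or -1, number of 16-bit loads); each load stands in
    for one bus cycle on a CPU with a 16-bit data bus.
    """
    key_hi, key_lo = key >> 16, key & 0xFFFF
    lo, hi, loads = 0, len(buf) // 4 - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        (node_hi,) = struct.unpack_from(half_fmt, buf, mid * 4)
        loads += 1
        if node_hi == key_hi:
            # High halves tie: only now fetch the low halfword.
            (node_lo,) = struct.unpack_from(half_fmt, buf, mid * 4 + 2)
            loads += 1
            if node_lo == key_lo:
                return mid, loads
            go_right = node_lo < key_lo
        else:
            go_right = node_hi < key_hi
        if go_right:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, loads


if __name__ == "__main__":
    keys = [0x00010000, 0x00020005, 0x00030000, 0x00030007, 0x00FF0000]
    for pack, fmt in ((pack_be, ">H"), (pack_le_swapped, "<H")):
        print(search(pack(keys), 0x00020005, fmt))
```

For keys whose high halves differ, each probe costs one load instead of two; the low halfword is only fetched on a tie, and both layouts give identical results.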

          • #35
            Originally posted by tuxd3v
            Some do checksum offload; my "Qualcomm Atheros AR816x/AR817x Ethernet", using the alx driver, doesn't.
            I can't even set the TX/RX buffers.
            Looks like the NIC supports it, so it's a driver issue, I guess.
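
For context on what checksum offload takes off the CPU: IPv4, TCP and UDP use the 16-bit ones' complement "Internet checksum" described in RFC 1071. Below is a minimal Python sketch of that computation (not the kernel's or the alx driver's code); with offload, the NIC fills in this value instead of the CPU walking every byte of the packet. The sample header is just an illustrative IPv4 header with its checksum field zeroed.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Sample IPv4 header with its checksum field (bytes 10-11) zeroed.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(internet_checksum(header)))
```

Recomputing the sum over a header with the checksum filled in yields 0, which is how a receiver verifies it.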

            • #36
              Originally posted by jabl
              No, as asgavar (and yours truly, in an earlier comment in this thread) pointed out, TCP/IP endianness only applies to the packet headers. TCP/IP doesn't give a shit about the data payload; it's just an opaque bag of bits.

              Higher-level protocols designed to work in a mixed-endian environment need to decide how to handle it, but that's independent of the choice TCP/IP has made. HTTP <= 1.1, for instance, is text-based, so it doesn't have any endianness issues. And if you transfer a binary file (say, a JPEG) over HTTP, then just as TCP/IP doesn't care about the packet payload, HTTP doesn't care about the content of the binary file; it can just be blasted as-is over the wire. Now, that binary file format might specify the endianness, but that's independent of HTTP and independent of TCP/IP.
              Sooner or later it has to be done for a binary file: host to network, and network to host.
              See these:
              https://www.digital-detective.net/un...an-byte-order/
              https://dflund.se/~pi/endian.html
              https://www.geeksforgeeks.org/little...ndian-mystery/

              JPEG, like I said, is already in network byte order (big-endian), so nothing changes for it. A program on a little-endian machine opening a JPEG already knows it has to convert the multi-byte fields when reading, and the same goes for writing.

              ASCII text files are not changed, or at least they shouldn't be, since char arrays are indexed the same way on little-endian and big-endian machines. The content of the char arrays is already "big-endian": it's our natural way of writing and reading as humans (for non-Arabic languages).

              UTF-16 is sometimes different: a file sometimes starts with a header, the first 2 bytes, called the BOM (byte order mark).
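
A minimal Python sketch of the two cases mentioned: reading a big-endian field from a JPEG, and checking a UTF-16 BOM. It assumes the input starts with the JPEG SOI marker (FF D8) followed by a marker segment, whose 16-bit marker and length fields JPEG stores big-endian; the helper names are hypothetical. Because `struct`'s `>` format states the byte order explicitly, the same code gives the same answer on little- and big-endian hosts.

```python
import codecs
import struct


def first_jpeg_segment(data: bytes):
    """Return (marker, length) of the segment right after the SOI marker.

    JPEG stores these 16-bit fields big-endian, so ">HH" decodes them
    correctly regardless of the host's own byte order.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG SOI marker")
    marker, length = struct.unpack_from(">HH", data, 2)
    return marker, length


def utf16_byte_order(data: bytes):
    """Return 'little', 'big', or None depending on the UTF-16 BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big"
    return None  # no BOM: the byte order has to come from elsewhere


if __name__ == "__main__":
    print(first_jpeg_segment(bytes.fromhex("ffd8ffe000104a464946")))
    print(utf16_byte_order(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))
```

ASCII needs no such handling, since every character is a single byte.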

              • #37
                Originally posted by F.Ultra
                Looks like the NIC supports it, so it's a driver issue, I guess.
                Yeah, it's probably supported in hardware.
                The driver Qualcomm released for Linux supported some offload features, but when the maintainers looked at it, they saw that it wasn't built according to Linux standards, so they rewrote it, this time with very simple functionality, taking out the "most precious" things. So right now it's a POS driver.
                I found this about it:
                https://lwn.net/Articles/555179/
                Last edited by tuxd3v; 25 November 2021, 04:33 PM.

                • #38
                  Originally posted by tuxd3v
                  Yeah, it's probably supported in hardware.
                  The driver Qualcomm released for Linux supported some offload features, but when the maintainers looked at it, they saw that it wasn't built according to Linux standards, so they rewrote it, this time with very simple functionality, taking out the "most precious" things. So right now it's a POS driver.
                  I found this about it:
                  https://lwn.net/Articles/555179/
                  Good ol' TODO syndrome. Get anything into a barely usable state, and as soon as the original maintainer can't put in the time anymore (for whatever reason; all are valid, because it's their time), nobody steps up to finish the work.

                  • #39
                    Originally posted by sinepgib
                    Indeed. You have a very limited number of instructions you can run on the CPU to achieve those throughputs. Swapping is an extra instruction, and not only that, it's an extra instruction that necessarily introduces a data dependency, which in turn means it takes up extra space in your reorder buffer, slowing down your pipeline. The only operation you get for free is the one you don't do.
                    The swapping operation can actually be eliminated at decode time, and something tells me this low-hanging fruit is already implemented in AMD64 chips; you should see some of the stuff AMD64 decoders eliminate, even just with renaming. Then again, maybe they don't do this at decode time because the real cost is so low even when it's a separate operation in the ROB. To settle this for real, though, I think we'd need to either construct a very sophisticated benchmark or ask AMD and Intel whether they do it.
                    Last edited by microcode; 26 November 2021, 01:04 AM.
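
To put a rough number on "a very limited number of instructions", here is a back-of-the-envelope Python calculation. Assumptions: full-size frames with a 1500-byte MTU, 38 bytes of Ethernet overhead per frame on the wire (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap), no VLAN tags, and a hypothetical 3 GHz core; the function names are illustrative.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap = 38.
WIRE_OVERHEAD = 38


def packets_per_second(link_gbps: float, mtu: int = 1500) -> float:
    """Maximum full-size frames per second at line rate."""
    return link_gbps * 1e9 / ((mtu + WIRE_OVERHEAD) * 8)


def cycles_per_packet(link_gbps: float, cores: int, clock_ghz: float,
                      mtu: int = 1500) -> float:
    """CPU cycles available per frame if the load is spread evenly over cores."""
    return cores * clock_ghz * 1e9 / packets_per_second(link_gbps, mtu)


if __name__ == "__main__":
    for gbps in (100, 400):
        print(gbps, "Gbps:",
              round(packets_per_second(gbps) / 1e6, 1), "Mpps,",
              round(cycles_per_packet(gbps, cores=1, clock_ghz=3.0), 1),
              "cycles/frame on one 3 GHz core")
```

At 400 Gbps that works out to roughly 32.5 million frames per second, or about 92 cycles per frame for one such core, for everything: protocol processing, copies, and any byte swaps. Jumbo frames raise the budget per frame roughly in proportion to the frame size.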

                    • #40
                      Originally posted by tuxd3v
                      No, it's definitely NOT. You can see that with a 100 Gbps adapter we're at ~65 Gbps with MTU 1500, or more than 90 Gbps with jumbo frames, I believe, using 2 cores (it's in the previous networking article Michael wrote). Now just imagine if you want to saturate 400 Gbps.
                      What makes you say "it's definitely NOT" free? As far as I can tell, CPUs like those that implement AMD64 could easily eliminate endianness swaps in a store at decode time; and if you have seen some of the things that are eliminated at decode time in these chips, it's hard to imagine they didn't bother with this... unless it's already so cheap that it's not worth doing, which may also be true. I think people here genuinely don't understand how cheap basic integer operations are, let alone simple MOVs.
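
A 32-bit endianness swap is a fixed byte permutation with no branches and no data-dependent behaviour, which is part of why it is so cheap: on x86 it is a single BSWAP instruction, and CPUs with the MOVBE extension have a move instruction that swaps bytes as part of the load or store. The Python sketch below just spells the permutation out with shifts and masks and checks it against `socket.htonl`; `bswap32` and `to_network_order` are illustrative names, and the sketch says nothing about whether decoders eliminate the swap, which is the open question in this exchange.

```python
import socket
import sys


def bswap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit value using shifts and masks."""
    return (((x & 0x000000FF) << 24)
            | ((x & 0x0000FF00) << 8)
            | ((x & 0x00FF0000) >> 8)
            | ((x >> 24) & 0x000000FF))


def to_network_order(x: int) -> int:
    """What htonl() amounts to: a swap on little-endian hosts, a no-op on big-endian ones."""
    return bswap32(x) if sys.byteorder == "little" else x


if __name__ == "__main__":
    print(hex(bswap32(0x12345678)))
    print(to_network_order(0x12345678) == socket.htonl(0x12345678))
```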
