Another Sizable Performance Optimization To Benefit Network Code With Linux 5.17


  • #21
    Originally posted by tuxd3v View Post
    Programming Big Endian is amazing, "you are always on top of the cake", because of its natural representation in hexadecimal to humans, even in binary..
    I really don't see the big deal here.

    Originally posted by tuxd3v View Post
    To be honest, in my opinion, endianness doesn't matter in terms of efficiency; an algorithm created for Big Endian can be created for Little Endian with the same efficiency, or vice versa..
    It's the efficiency of the chip that is affected, plus the mixed-endianness issues. There are certain things that require extra operations to work in big endian compared to little endian. They can be done in hardware or in software, but in both cases they mean wasted energy.

    Originally posted by tuxd3v View Post
    Of course the algorithm will not be the same; it will be different if optimized for one of those.. but it will do the same task, in a different representation.. there is no bogeyman in efficiency for endianness..
    I can't think of a case where the (high-level, programmer-visible) algorithm would change due to endianness handling. Just whether you'll be doing swaps or not during serialization and deserialization.

    Originally posted by tuxd3v View Post
    However, tons and tons of code have been optimized for Little Endian over the years, and because of that you are left with the impression that Little Endian archs are faster.. it's not the arch, it's the way the code was written..
    Nowadays userspace code, at least the majority of it, is optimized for Little Endian
    The kind of optimization you would do here is the kind that is forced by the endianness itself, and even that would most likely be implicit (done by the compiler or architecture itself).

    Originally posted by tuxd3v View Post
    and that translates to lower efficiency if that code runs on Big Endian, or even to errors (the majority of times..), if proper attention is not paid to the problem..
    This has nothing to do with which endianness you used to write the software, but whether or not you handled mixing them.

    Originally posted by tuxd3v View Post
    but that doesn't make Big Endian less efficient per se; it just gives Little Endian an advantage because of the code optimized for it. When you run on Little Endian you see a better result, due to the optimizations it was coded for..
    I don't follow.

    Originally posted by tuxd3v View Post
    Why? Because it's easier to program for, it's already the network byte order, and it feels nice (at least to me..); others will say that Little Endian is the best format because they feel it is..
    Hmmm, it sounds like a bad idea. If LE weren't so popular but were still used by consumers, you would still need to write endian-aware code for anything not strictly backend. Besides, that's the problem: I don't think we should be using BE as the network endianness. But the harm is done and there's no going back.

    Originally posted by tuxd3v View Post
    even though Big Endian machines in the datacenter are huge, with big processing power and scalability (but they cost tons of money..), Big Endian machines also have tremendous performance when dealing with the network.. it's not by chance..
    No, it's not by chance; it's precisely because swaps and whatnot don't come for free, and for historical reasons we're stuck with BE for the network.

    Originally posted by tuxd3v View Post
    What costs me a bit is when I start to think of the amount of processing power we are wasting, every second, using the network with Little Endian.. but at the same time you have tons of processing on Big Endian to deal with software coded for Little Endian, and also with Little Endian standards, PCIe, etc.. it's a messy situation..
    Everybody talks about going green... how can we think of going green when we burn so much processing power..?!
    I'd ask the same about using BE for the network. There are two ways to solve the issue, after all.



    • #22
      Originally posted by sinepgib View Post
      I really don't see the big deal here.
      Not everyone has programmed big endian in assembler; that is where you will mostly notice it, but there are also situations where you also lose out.
      Originally posted by sinepgib View Post
      It's the efficiency of the chip that is affected, plus the mixed-endianness issues. There are certain things that require extra operations to work in big endian compared to little endian. They can be done in hardware or in software, but in both cases they mean wasted energy.
      Yes, that is my point: it's the chip architecture, fabrication node, etc., and it has nothing to do with endianness..

      There are also certain things that would require more operations in Little Endian, but the code was written to follow the logical path the programmer was coding for.. for example, testing whether a number is positive or negative..
      Originally posted by sinepgib View Post
      I can't think of a case where the (high-level, programmer-visible) algorithm would change due to endianness handling. Just whether you'll be doing swaps or not during serialization and deserialization.
      Algorithms are not a fixed thing; there are always, or almost always, situations where the algorithm can, and would, change to take advantage of one situation or another, independently of the arch. Of course the programmer is the one who decides that; sometimes he doesn't even realize he is coding for little endian, because it feels so natural, probably to someone who was taught to think that way. He can take advantage of comparisons for positive or negative, and maybe find a way to adapt his algorithm to exploit that.. it's only one example of a big endian optimization..

      Serialization/deserialization is the typical "elephant in the room" case where you need to explicitly swap bytes....
      Originally posted by sinepgib View Post
      The kind of optimization you would do here is the kind that is forced by the endianness itself, and even that would most likely be implicit (done by the compiler or architecture itself).
      There are situations that favour Big Endian or Little Endian; if you are on little endian, you will program to take advantage of what favours little endian.
      It's the way the algorithm was implemented that gives the advantage to big or little.. the majority of code was designed for Little Endian on GNU/Linux.. yes, the endianness maybe gives only marginal gains, but marginal upon marginal, over thousands and thousands of iterations or lines of code, makes a difference..
      Originally posted by sinepgib View Post
      This has nothing to do with which endianness you used to write the software, but whether or not you handled mixing them.
      That can also happen, yes..
      Originally posted by sinepgib View Post
      I don't follow.
      An algorithm is not a static thing; it can be moulded to take advantage of the endianness you are programming for..
      Sometimes it's easier to follow one route because you know in advance that it favours your little endian target when the compiler generates assembler code.., or even, in assembler, to do some operations instead of others..
      Originally posted by sinepgib View Post
      Hmmm, it sounds like a bad idea. If LE weren't so popular but were still used by consumers, you would still need to write endian-aware code for anything not strictly backend. Besides, that's the problem: I don't think we should be using BE as the network endianness. But the harm is done and there's no going back.
      Yes, you would, but in that case code could take advantage of Big Endian, because the majority of people would work there and would have moulded the code to be faster on Big Endian... in that case it would be the little endian CPUs carrying the burden in GNU/Linux..

      Well, now that we know the path CPUs and operating systems took, it's easy to say that Big Endian was the wrong move for the network, but at the time..
      Originally posted by sinepgib View Post
      I'd ask the same about using BE for the network. There are two ways to solve the issue, after all.
      Indeed.
      If today's consumer hardware were mixed, some would suffer from one thing, others from other things.
      If you were using BE for personal computers today, and if PCIe etc. had been invented little endian, it would be a mess either way as well..
      But I suspect that in that BE scenario, PCIe would have been invented BE (to take advantage of its endianness..).

      In any case, in my opinion, the ideal solution would have been to pick one endianness from the beginning and never change it..
      Last edited by tuxd3v; 24 November 2021, 01:33 AM.



      • #23
        Originally posted by ASBai View Post
        If I remember correctly, basically all 100Gb+ network cards have TCP checksum offloading, right? So when do we need to run the csum_partial() function on the CPU?
        Yes, and many 1Gbps cards do as well, so this code should only run on a small set of hardware.

        Run ethtool -k <nic> to see which types of offloads your NIC supports and which are enabled by the kernel.

        Code:
        # ethtool -k eno1
        Features for eno1:
        rx-checksumming: on
        tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
        scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
        tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
        generic-segmentation-offload: on
        generic-receive-offload: on
        large-receive-offload: off [fixed]
        rx-vlan-offload: on
        tx-vlan-offload: on
        ntuple-filters: on
        receive-hashing: on
        highdma: on
        rx-vlan-filter: on [fixed]
        vlan-challenged: off [fixed]
        tx-lockless: off [fixed]
        netns-local: off [fixed]
        tx-gso-robust: off [fixed]
        tx-fcoe-segmentation: off [fixed]
        tx-gre-segmentation: on
        tx-gre-csum-segmentation: on
        tx-ipxip4-segmentation: on
        tx-ipxip6-segmentation: on
        tx-udp_tnl-segmentation: on
        tx-udp_tnl-csum-segmentation: on
        tx-gso-partial: on
        tx-sctp-segmentation: off [fixed]
        tx-esp-segmentation: off [fixed]
        tx-udp-segmentation: off [fixed]
        fcoe-mtu: off [fixed]
        tx-nocache-copy: off
        loopback: off [fixed]
        rx-fcs: off [fixed]
        rx-all: off [fixed]
        tx-vlan-stag-hw-insert: off [fixed]
        rx-vlan-stag-hw-parse: off [fixed]
        rx-vlan-stag-filter: off [fixed]
        l2-fwd-offload: off
        hw-tc-offload: on
        esp-hw-offload: off [fixed]
        esp-tx-csum-hw-offload: off [fixed]
        rx-udp_tunnel-port-offload: on
        tls-hw-tx-offload: off [fixed]
        tls-hw-rx-offload: off [fixed]
        rx-gro-hw: off [fixed]
        tls-hw-record: off [fixed]
        Last edited by F.Ultra; 24 November 2021, 09:06 AM.



        • #24
          Originally posted by tuxd3v View Post
          Yeah, and also in a lot of cases, if you send files via the network, you need to convert them to the network byte order, depending on the format.
          One of the file formats stored in Big Endian is JPEG; even on little endian machines you always need to convert, because it's the format of the standard (even just to open an image a conversion is needed, also to store one, etc..), but a lot can be said about files over the network.. then there are the formats stored in Little Endian as well.
          Actually you don't; the endianness of TCP/IP only concerns the header fields, not the data payloads, so unless the higher-level protocol requires it (HTTP transporting a JPEG image wouldn't, f.ex.) you can send it as-is, with no swapping involved.
          Processing them is of course a different story, but I'd still say the swap cost is negligible compared to decompression, and surely irrelevant for the network throughput of the servers.



          • #25
            Originally posted by microcode View Post

            Yeah, I even disagree with the classic analogy of "endianness" (it comes from a children's tale about meaningless prejudice, the little-endians being those who eat a boiled egg starting from the little end, big-endians from the big end), because it implies that it isn't important. Little endian has real benefits, which is why it is the choice for the majority of processors.

            Heck, even in the moral tale the "endianness" metaphor comes from, there is a clear advantage to eating a boiled egg from the little end; the big end fits better in a soft-boiled egg cup.
            Actually they both have their benefits. On old big endian CPUs like the m68000, where the data bus was only 16 bits wide and loading a 32-bit word from memory took twice as long as a 16-bit word, you could optimise a binary search tree by first loading only the first two bytes of each node, which were the high bytes, and only loading the rest if those high bytes were equal to those of your key.



            • #26
              Originally posted by tuxd3v View Post
              At the time it was the correct choice.. since all the computers that were anything real were Big Endian, and we are basically talking about servers and supercomputers..
              Because at the time, personal computers didn't exist..
              The problem was that the Intel/Microsoft duopoly grew so much that they became the standard for personal computers, and now also the standard in the datacenter.. also IBM/Motorola, Sun, MIPS, HP, DEC (Alpha) and others left the CPU market because they couldn't compete with the low prices Intel/Microsoft were practising.. they also lacked Microsoft's cheap and buggy software..

              IBM's OS/2 was very stable and very advanced, but it cost both legs, and also both arms.. their hardware was top too, but far too costly..
              So Intel/Microsoft basically just continued on their track without any challenge.

              Now, if the network were designed today, of course it would be little endian, but that boat sailed long ago.. right now I think only MIPS, ARMv7, SPARC and PowerPC are bi-endian; maybe there are more, I don't know..

              But the software was optimized for little endian, and so a lot of CPUs started to work only in little endian.
              In fairness I don't see a big advantage in being big or little; however, the little endian concept came to stay. I believe PCIe is little endian as well (or was optimized for little endian..), something very much used today in diverse accelerators, and it uses a lot of bandwidth..

              So today you have a problem...
              • you want a very fast network: go big endian
              • you want PCIe accelerators (optimized for speed): go little endian
              Now, what if I want both things??
              We have a problem here, because 400Gbps Fibre Channel and FCoE are already in a lot of places nowadays, but in today's servers you also want tons of PCIe lanes.
              In the future, CPUs for networking will need to be big endian, or, if little endian, have offload accelerators (via PCIe, for example), but in that case you couldn't use other accelerators via PCIe, because PCIe lanes are limited..

              For a long time it worked well with little endian on the desktop (desktop will never be the real problem here, as desktop requirements are low.. at least not for the next 30 years or so..), and now also on the server, but with the increase in massive network bandwidth the problem starts to appear, and it's only the tip of the iceberg..
              I was looking at a previous article from Michael, and we can't even saturate 100Gbps with amd64; how will we saturate 400Gbps??
              And when the network is 800Gbps, how will we saturate that?? Big problem..



              OS/2 was a PC operating system, not a server OS. It was also far from being "very stable" and had a number of other issues as well. By the way, it was sold by IBM but designed and implemented by Microsoft, and it ran exclusively on Intel (and AMD) CPUs.

              NT was born as a total rewrite of OS/2 to fix its problems.



              • #27
                Originally posted by asgavar View Post
                Actually you don't; the endianness of TCP/IP only concerns the header fields, not the data payloads, so unless the higher-level protocol requires it (HTTP transporting a JPEG image wouldn't, f.ex.) you can send it as-is, with no swapping involved.
                Processing them is of course a different story, but I'd still say the swap cost is negligible compared to decompression, and surely irrelevant for the network throughput of the servers.
                You should send a JPEG as it is, because by design it is already in network byte order.. it's Big Endian.
                When you are opening a JPEG on little endian you need to convert the image, and likewise convert back to Big Endian when you are storing one..
                But there are also Little Endian formats, like GIF and others..

                But if you are sending a binary file native to your system over the network, and you are on Little Endian, you need to convert it to network byte order; you don't even know this process is happening, it's done in the background.. it's the operating system that does it..



                • #28
                  Originally posted by jacob View Post
                  OS/2 was a PC operating system, not a server OS. It was also far from being "very stable" and had a number of other issues as well. By the way, it was sold by IBM but designed and implemented by Microsoft, and it ran exclusively on Intel (and AMD) CPUs.

                  NT was born as a total rewrite of OS/2 to fix its problems.
                  Yeah, it was a personal computer OS.
                  Yes, Microsoft worked on it, and then based Windows on IBM's proprietary OS.

                  But the key advantage was the price of the computers: IBM PC compatible clones with Windows were selling a lot more, because people who could afford them started to appear.
                  My brother was one of them in the 90s; he bought a 286 or a 386, I don't remember now, for around ~3K€. The IBM ones cost far more at the time..

                  And like my brother, the rest of the world did the same. Intel/Microsoft got a massive-scale business, and prices dropped so much that it became impossible for others to compete with them.
                  But the most important button in Windows was not about getting work done; it was the "save document" button, because you never knew when the system would crash, so you were pressing "Save document" all the time, otherwise you would lose your unsaved work..



                  • #29
                    Originally posted by F.Ultra View Post
                    Yes, and many 1Gbps cards do as well, so this code should only run on a small set of hardware.

                    Run ethtool -k <nic> to see which types of offloads your NIC supports and which are enabled by the kernel.
                    Some do checksum offload; my "Qualcomm Atheros AR816x/AR817x Ethernet", using the alx driver, doesn't.
                    I can't even set the tx/rx buffers..



                    • #30
                      Originally posted by tuxd3v View Post
                      Yeah, it was a personal computer OS.
                      Yes, Microsoft worked on it, and then based Windows on IBM's proprietary OS.

                      But the key advantage was the price of the computers: IBM PC compatible clones with Windows were selling a lot more, because people who could afford them started to appear.
                      My brother was one of them in the 90s; he bought a 286 or a 386, I don't remember now, for around ~3K€. The IBM ones cost far more at the time..

                      And like my brother, the rest of the world did the same. Intel/Microsoft got a massive-scale business, and prices dropped so much that it became impossible for others to compete with them.
                      But the most important button in Windows was not about getting work done; it was the "save document" button, because you never knew when the system would crash, so you were pressing "Save document" all the time, otherwise you would lose your unsaved work..
                      Yeah, that's why there was also that student in Finland who had a 386 and wasn't happy, because he wanted to take full advantage of the CPU to run a true 32-bit, multitasking OS with protected memory, and OS/2 or Windows was not it.

