Another Sizable Performance Optimization To Benefit Network Code With Linux 5.17

  • #51
    Originally posted by microcode View Post
    AMD64 chips are not the home market of computers, they are the most common server chips in the world, and they are the machines that move the most TCP in software, period.
    What I was saying is that in heterogeneous environments, like mixed little-endian and big-endian environments, you have no chance of avoiding it.
    Well, I don't know about TCP/IP, but it seems to me that the payload, if not in network byte order, will not be understood by big-endian machines.
    Originally posted by microcode View Post
    What?
    If you are in mixed environments, big-endian/little-endian...
    I count 2 or 3 operations just for that. It will translate into latency, and we are talking about 64 bits. Multiply this many times within a packet, or across tons of packets, and it will translate to latency whether we want it or not.
    Also, compilers have evolved a lot in the last 10 years or a bit more.


    • #52
      Originally posted by sinepgib View Post
      On another note, out of complete ignorance, doesn't IPv6 require a swap for addresses too? Those are bigger AFAIR.
      I believe everybody avoids dealing with IPv6 at almost any cost, due to the processing power/latency required.


      • #53
        Originally posted by microcode View Post
        That depends on how you stored them to begin with, and whether you are computing on them as numbers (pretty uncommon); my understanding is that when you have a socket open like this, those IP fields are templated in.

        Overall, the bigger problem with TCP would not be byte swaps, but the bit manip required to fill those odd-shaped fields.
        Templated in, maybe, but in big-endian, at least according to the standards. It's a stream of bytes; the order matters.
        All operations count; all add work.


        • #54
          Originally posted by F.Ultra View Post
          Going by the tables at https://www.agner.org/optimize/instruction_tables.pdf, it looks like MOVBE is equal to MOV in both ops and reciprocal throughput (at least on Zen), so it does look like modern CPUs actually do this at zero cost (as long as the compiler optimizes to use MOVBE and not BSWAP+MOV).
          From what I read there, in MOVBE only one operand can be a memory address; the other needs to be a register. It would take 2 operations, to swap and then store in memory, I think. But it will be 2 ops, plus a return from the builtin function. I don't know if MOVBE is standard in all processors.


          • #55
            Originally posted by tuxd3v View Post
            From what I read there, in MOVBE only one operand can be a memory address; the other needs to be a register. It would take 2 operations, to swap and then store in memory, I think. But it will be 2 ops, plus a return from the builtin function. I don't know if MOVBE is standard in all processors.
            It's part of the AMD64 mnemonics. And you have to load or store the value at some point, so you perform the swap either at the load or at the store. It would be interesting to benchmark on some ARM systems that can run in both big- and little-endian modes and see if there is any real-world difference.
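
To make the "swap at the load or at the store" point concrete, here is a minimal sketch in C. The names `load_be32` and `store_be32` are hypothetical helpers, not taken from the kernel patch. They are written with plain shifts so the result is the same on little- and big-endian hosts. An optimizing compiler may fold each function into a single swapping load or store (for example MOVBE on x86-64 when the target supports it), but that depends on the compiler, flags, and CPU, so treat it as something to check in the generated assembly rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian (network order) value
 * from a byte buffer. The byte swap, if any, happens at the load. */
static uint32_t load_be32(const uint8_t *p) {
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: write a 32-bit value to a byte buffer in
 * big-endian (network order). The byte swap, if any, happens at the store. */
static void store_be32(uint8_t *p, uint32_t v) {
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Between the load and the store, the value sits in a register in native order, so no further swapping is needed there.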


            • #56
              Originally posted by tuxd3v View Post
              From what I read there, in MOVBE only one operand can be a memory address; the other needs to be a register. It would take 2 operations, to swap and then store in memory, I think. But it will be 2 ops, plus a return from the builtin function. I don't know if MOVBE is standard in all processors.
              Why would you want to move between registers swapping endianness? You generally want to either have native-endian values in registers or not take it into account (if you're only doing bitwise operations you may not care about byte order, for example). It only makes sense for reading from memory to a register or storing to memory from a register.
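
As an illustration of the bitwise case, here is a rough sketch in C, assuming POSIX `struct in_addr` from `<arpa/inet.h>`. The function name `in_subnet` is hypothetical. The address, network, and mask all stay in network byte order; since AND and equality treat every byte independently, the check gives the same answer without any swap.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if addr lies inside net/mask.
 * All three values remain in network byte order. AND and == work on
 * each byte independently, so no byte swap is needed on either
 * little- or big-endian hosts. */
static bool in_subnet(const struct in_addr *addr,
                      const struct in_addr *net,
                      const struct in_addr *mask) {
    assert(addr != NULL && net != NULL && mask != NULL);
    return (addr->s_addr & mask->s_addr) == net->s_addr;
}
```

Arithmetic on the address (for example, incrementing it) would be a different story, because carries cross byte boundaries and byte order then matters.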


              • #57
                Originally posted by tuxd3v View Post

                Templated in, maybe, but in big-endian, at least according to the standards. It's a stream of bytes; the order matters.
                All operations count; all add work.
                No, like literally no operations differ between little endian and big endian in this case; they're both memcpy.


                • #58
                  Originally posted by sinepgib View Post
                  Why would you want to move between registers swapping endianness?
                  Ideally, with MOVBE you swap bytes either when loading from memory or when storing to memory.
                  Either you do it first or you do it later.
                  See F.Ultra's comment above.


                  • #59
                    Originally posted by microcode View Post
                    No, like literally no operations differ between little endian and big endian in this case; they're both memcpy.
                    The template and the number of places will not change, but the information needs to be swapped to become big-endian; that was my point.


                    • #60
                      Originally posted by tuxd3v View Post
                      The template and the number of places will not change, but the information needs to be swapped to become big-endian; that was my point.
                      No it doesn't. An IP address is never stored in the wrong order to begin with.
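
A small sketch of why that is, assuming the POSIX sockets API: `inet_pton` writes the address into `struct in_addr` already in network byte order, so placing it into a header is a plain byte copy. The helper name `put_ipv4_addr` and the bare 20-byte buffer are hypothetical; this is not how the kernel code in question is written.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write an IPv4 address into a header buffer.
 * addr->s_addr is already in network byte order, so this is a plain
 * byte copy on both little- and big-endian hosts. */
static void put_ipv4_addr(uint8_t *hdr, size_t offset,
                          const struct in_addr *addr) {
    assert(hdr != NULL && addr != NULL);
    memcpy(hdr + offset, &addr->s_addr, sizeof addr->s_addr);
}
```

For example, after `inet_pton(AF_INET, "192.0.2.1", &src)` and `put_ipv4_addr(hdr, 12, &src)` (offset 12 is the source address field of an IPv4 header), the four bytes at that offset are 192, 0, 2, 1 regardless of host endianness.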
