Glibc 2.29 Released With getcpu() On Linux, New Optimizations

  • #41
    Originally posted by F.Ultra View Post
    The SIMD examples are somewhat different since they are special utility registers that you cannot touch with the normal ALU and only with the SIMD instruction set and you could say that in that instance the CPU is actually 256-bit (for AVX2),
    Um, no.

    Originally posted by F.Ultra View Post
    and more to the point, if you already have designed your CPU to be able to manipulate a 256-bit value then why not also expand the ALUs to the same?
    Because nobody needs 256-bit addressing and only a few more need to compute with 256-bit scalars.

    And, just maybe because register and ALU width ain't free. Worse, if you look at how ALU operations are implemented, you're increasing the critical path length by at least log2(n) for n-bit wordlength. So, a wider chip will not only be hotter and bigger (and thus more expensive), but also slower.

    Compare that with vector arithmetic, where element-wise operations on a k-element vector occupy only k times the logic and datapath needed to operate on a single one of those elements.
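    To make the critical-path point concrete, here's a toy model (my own sketch, not how real adders are laid out in silicon): in an n-bit ripple-carry adder the worst-case carry chain is n stages long, while a parallel-prefix (carry-lookahead) design needs roughly log2(n) stages. Either way, depth grows with width.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Toy model: count how many gate "stages" a carry must ripple
     * through in an n-bit ripple-carry adder (worst case: n stages),
     * versus the ~log2(n) stages of a parallel-prefix adder. */

    /* Longest consecutive carry chain when adding a and b. */
    static int ripple_depth(uint64_t a, uint64_t b, int bits) {
        int carry = 0, longest = 0, run = 0;
        for (int i = 0; i < bits; i++) {
            int x = (int)((a >> i) & 1), y = (int)((b >> i) & 1);
            int new_carry = (x & y) | (x & carry) | (y & carry);
            run = new_carry ? run + 1 : 0;  /* consecutive carry stages */
            if (run > longest) longest = run;
            carry = new_carry;
        }
        return longest;
    }

    /* Stages in a parallel-prefix carry network: ceil(log2(bits)). */
    static int prefix_depth(int bits) {
        int stages = 0;
        while ((1 << stages) < bits) stages++;
        return stages;
    }

    int main(void) {
        /* 0xFFFFFFFF + 1 forces the carry through every bit. */
        printf("32-bit ripple worst case: %d stages\n",
               ripple_depth(0xFFFFFFFFu, 1, 32));
        printf("32-bit parallel prefix:   %d stages\n", prefix_depth(32));
        printf("64-bit parallel prefix:   %d stages\n", prefix_depth(64));
        return 0;
    }
    ```

    Even with the best-known prefix networks, going from 64-bit to 256-bit adds stages to every single scalar add, which is exactly the latency cost described above.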

    Originally posted by F.Ultra View Post
    The only reason imho that Intel did not do that when they introduced the SIMD instructions was due to politics (aka they wanted people to move to Itanium).
    Seriously, WTF? Quit yer trollin'.

    Comment


    • #42
      Originally posted by cbxbiker61 View Post
      The bottom line is it wouldn't really matter "what" caused the improvement, just that there is an improvement.
      Yes, no argument there.

      Originally posted by cbxbiker61 View Post
      But again it's interesting that the only one putting up real-world results is getting attacked.
      Attacked? All I did was point out that the picture is more complex than you seemed to realize.

      Go back and read my first reply - I just asked a question - one that, if you'd really tried to answer it, should've led you to my point.

      Is your ego really so fragile that you're threatened by such a question? Would you prefer I said nothing, only to miss a chance to learn something? I'd happily leave you alone, if that's what you want.

      Originally posted by cbxbiker61 View Post
      ******************
      [email protected] 32 bit ramsmp run
      ******************
      ramsmp -b 1 -p 4
      Again, a question: what is this really showing us? Is it showing how fast the CPU can write data to cache (or RAM), or is it showing how fast it can write one word at a time?

      I wonder what you'd get by just using the most-optimized memset(). I guess probably little or no difference between 32-bit and 64-bit modes.

      (Edit: I'm imagining it would use 128-bit NEON instructions, in both cases. Of course, the 32-bit code would need a fallback implementation for chips without NEON.)
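      A minimal sketch of the memset() comparison I'm suggesting (buffer size and iteration count are my own arbitrary picks; a serious run would pin the core and repeat many times):

      ```c
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>

      /* Rough memset() bandwidth probe. glibc's IFUNC machinery should
       * pick the widest available path (NEON on ARM, SSE/AVX on x86),
       * in both 32-bit and 64-bit builds. */
      #define BUF_SIZE (64 * 1024 * 1024)  /* 64 MiB, larger than typical L2 */
      #define ITERS    8

      int main(void) {
          unsigned char *buf = malloc(BUF_SIZE);
          if (!buf) return 1;

          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (int i = 0; i < ITERS; i++)
              memset(buf, i & 0xFF, BUF_SIZE);
          clock_gettime(CLOCK_MONOTONIC, &t1);

          double secs = (double)(t1.tv_sec - t0.tv_sec)
                      + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
          double mib = (double)BUF_SIZE * ITERS / (1024.0 * 1024.0);
          printf("memset: %.0f MiB in %.3f s = %.0f MiB/s\n",
                 mib, secs, mib / secs);

          free(buf);
          return 0;
      }
      ```

      If the 32-bit and 64-bit numbers come out about the same here, that would support the idea that ramsmp's word-at-a-time loops are measuring instruction throughput more than memory bandwidth.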

      Originally posted by cbxbiker61 View Post
      ****************
      [email protected] 64 bit ramsmp run
      ****************
      ramsmp -b 1 -p 4 1
      I don't know if it matters, but this has an extra argument - '1'.
      Last edited by coder; 02-04-2019, 08:38 PM.

      Comment


      • #43
        Originally posted by coder View Post
        Seriously, WTF? Quit yer trollin'.
        So you really don't agree that Intel dragged their feet with 64-bit x86 due to them wanting to push the enterprises to IA64? That's the story I've heard over and over (which of course does not make it true).

        Comment


        • #44
          Originally posted by coder View Post
          Because nobody needs 256-bit addressing and only a few more need to compute with 256-bit scalars.

          And, just maybe because register and ALU width ain't free. Worse, if you look at how ALU operations are implemented, you're increasing the critical path length by at least log2(n) for n-bit wordlength. So, a wider chip will not only be hotter and bigger (and thus more expensive), but also slower.

          Compare that with vector arithmetic, and element-wise operations on a k-element vector only occupy k times as much as the same logic and datapath for operating on a single one of those elements.
          Exactly this. A vector scales linearly in power consumption and space wasted on the die. Double the width, double the cost (and performance).

          Meanwhile, a multiply, for example, scales roughly quadratically. So a 64-bit multiply is about 4 times more complex than a 32-bit multiply, a 128-bit multiply about 16 times, and so on. And if you only ever use it for small numbers, that's a lot of wasted power for no reason.

          For vectors, obviously 32-bit is twice as fast as 64-bit, because you can simply fit twice as many elements in a single instruction.
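          A quick way to see the quadratic scaling: building a 2n-bit multiply out of n-bit multiplies takes four partial products. (This is schoolbook decomposition; real hardware multipliers use Wallace trees and the like, but the operation count grows the same way.)

          ```c
          #include <stdint.h>
          #include <stdio.h>

          /* Schoolbook 64x64 -> 128-bit multiply built from four
           * 32x32 -> 64-bit partial products: doubling the operand
           * width quadruples the number of n-bit multiplies. */
          static void mul64x64(uint64_t a, uint64_t b,
                               uint64_t *hi, uint64_t *lo) {
              uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
              uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

              uint64_t p0 = a_lo * b_lo;  /* four partial products */
              uint64_t p1 = a_lo * b_hi;
              uint64_t p2 = a_hi * b_lo;
              uint64_t p3 = a_hi * b_hi;

              uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
              *lo = (mid << 32) | (uint32_t)p0;
              *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
          }

          int main(void) {
              uint64_t hi, lo;
              /* (2^64 - 1) * 2 = 2^65 - 2, so hi = 1. */
              mul64x64(0xFFFFFFFFFFFFFFFFull, 2, &hi, &lo);
              printf("hi=%llu lo=0x%llx\n",
                     (unsigned long long)hi, (unsigned long long)lo);
              return 0;
          }
          ```

          Go from 64-bit to 128-bit operands and each of those four partial products itself splits into four, hence 16 times the work.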

          Comment


          • #45
            Originally posted by Weasel View Post
            Exactly this. A vector scales linearly in power consumption and space wasted on the die. Double the width, double the cost (and performance).

             Meanwhile, a multiply, for example, scales roughly quadratically. So a 64-bit multiply is about 4 times more complex than a 32-bit multiply, a 128-bit multiply about 16 times, and so on. And if you only ever use it for small numbers, that's a lot of wasted power for no reason.

             For vectors, obviously 32-bit is twice as fast as 64-bit, because you can simply fit twice as many elements in a single instruction.
            I often like your posts, like that one; when you're calm & friendly.

            Just sayin'

            Comment


            • #46
              Originally posted by F.Ultra View Post
              So you really don't agree that Intel dragged their feet with 64-bit x86 due to them wanting to push the enterprises to IA64? That's the story I've heard over and over (which of course does not make it true).
               No, you were talking about 256-bit scalars, or some such. That doesn't even make sense, not least because Itanium itself was only 64-bit.

              SSE is 128-bit, so even if you were suggesting Intel didn't add 128-bit scalar support for the sake of market segmentation, I still say it's nonsense.

              Comment


              • #47
                Originally posted by coder View Post
                No, you were talking about 256-bit scalars, or some such. That doesn't even make sense, not least because Itanium only had 64-bit.

                SSE is 128-bit, so even if you were suggesting Intel didn't add 128-bit scalar support for the sake of market segmentation, I still say it's nonsense.
                OK, I might have been a little unclear on that one, sorry. What I was referring to was the address bus width, since addressing large amounts of RAM is what the big enterprise users were after.

                Comment
