Glibc 2.29 Released With getcpu() On Linux, New Optimizations

  • #21
Also: look into what double data rate RAM does to the effective bus width, i.e. when you have two 64-bit RAM sticks paired up... on a 2018 AMD64 system.

    Comment


    • #22
      Originally posted by cbxbiker61 View Post
      Interesting that you keep using examples from 23 years ago as if they are relevant today.

His point is not arbitrary... I challenge you to find one current 64-bit chip that is using memory less than 64 bits wide. My Mycroft example is a current real-world scenario where 64-bit memory access improves performance, therefore the chip idles more and uses less power.
WTF are you even saying? The bus width has nothing to do with the architecture. Sure, the minimum bus width is usually the size of a pointer in the given architecture, but we're talking about 32-bit chips today, and those are built with a bus width higher than 32 bits even though the arch is 32-bit.

      The point is that bus widths in today's CPUs are larger than the pointer size and have nothing to do with it. Nobody cares about the "minimum" bus width in the real world, only your theoretical nonsense.

      Comment


      • #23
        Originally posted by Weasel View Post
WTF are you even saying? The bus width has nothing to do with the architecture. Sure, the minimum bus width is usually the size of a pointer in the given architecture, but we're talking about 32-bit chips today, and those are built with a bus width higher than 32 bits even though the arch is 32-bit.

        The point is that bus widths in today's CPUs are larger than the pointer size and have nothing to do with it. Nobody cares about the "minimum" bus width in the real world, only your theoretical nonsense.
Which 32-bit CPUs have a wider data bus than 32 bits? The reverse has been true on many systems (CPUs having a narrower data bus than their internal ALU), requiring them to spend multiple cycles to perform loads, but a CPU with a wider data bus than its ALUs?

edit: And of course all x86s since the Pentium 60 have had 64-bit data buses (earlier x86s had only 32-bit, though) although their ALUs were still 32-bit. Turns out that I'm just too old to have thought about what the prefetcher can actually do for a CPU, and that it thus can perform wider loads than the CPU itself can handle.

However, prefetcher or not, the CPU still has to perform loads from the cache, so it will still read 32 bits at a time and not 64, even though the cache itself can be fed in 64-bit chunks. So internal memory shuffling is still faster with a 64-bit ALU than with a 32-bit one.
        Last edited by F.Ultra; 02-03-2019, 12:02 AM.

        Comment


        • #24
          Originally posted by arjan_intel View Post
          it's statistically not horrible, meaning if you want to use it as an index into, say, an array of "mostly per cpu" structures, you'll get 95% or even 99% or more the "right" answer, and if the cost of being wrong is just a few cache misses/etc (e.g. performance) then that can be a very valid use of this
Then why is it a system call? Why doesn't the thread just have a pointer to some kind of execution context? Whenever the thread is woken, its execution context pointer can be set to that of the core or hardware thread on which it's running.

          This should be a macro - not a system call.

          Comment


          • #25
            Originally posted by cybertraveler View Post
if an app doesn't need or benefit from bigger types (i.e. doing arithmetic with bigger numbers) and it doesn't need to address more than 4 GB of memory, then a 32-bit CPU will always (as far as I know) be superior to a 64-bit CPU (all other things being equal, e.g. equivalent instruction sets), as you will need less CPU cache and less memory to do the same job.
            That assumption is often not valid, because some (most?) ISAs add performance-enhancing features (e.g. more registers, new instructions) in their 64-bit mode. The transition from 32-bit to 64-bit usually provides a good opportunity for the chip maker to update the ISA in ways that also improve performance. Larger pointers can be a small price to pay for this.

            Of course, most microcontrollers and CPUs for wearables are still 32-bit, for the reasons you mentioned. But, you have to look beyond the Pi's A53 and go for ARM's Cortex-M cores.

            https://en.wikipedia.org/wiki/ARM_Cortex-M

            Comment


            • #26
              Originally posted by F.Ultra View Post
              Well a wider data bus will allow for faster loads and stores to cache/memory for applications that shuffle lots of memory around.
              Databus width and CPU word-length are independent. A common Intel desktop CPU has a 128-bit memory interface, but that doesn't make it a 128-bit CPU.

              Comment


              • #27
                Originally posted by cbxbiker61 View Post
                My Mycroft example is a current real-world scenario where 64bit memory access improves performance, therefore the chip idles more and uses less power.
                Huh? How do you know that? I think you don't actually know why your program is faster, in 64-bit mode. You're certainly not changing how many lines of the databus are active.

                Comment


                • #28
                  Originally posted by coder View Post
                  That assumption is often not valid, because some (most?) ISAs add performance-enhancing features (e.g. more registers, new instructions) in their 64-bit mode. The transition from 32-bit to 64-bit usually provides a good opportunity for the chip maker to update the ISA in ways that also improve performance. Larger pointers can be a small price to pay for this.

                  Of course, most microcontrollers and CPUs for wearables are still 32-bit, for the reasons you mentioned. But, you have to look beyond the Pi's A53 and go for ARM's Cortex-M cores.

                  https://en.wikipedia.org/wiki/ARM_Cortex-M
                  You quoted me as saying "if all other things equal; e.g. equivalent instruction sets".

                  Comment


                  • #29
                    Originally posted by coder View Post
                    Databus width and CPU word-length are independent. A common Intel desktop CPU has a 128-bit memory interface, but that doesn't make it a 128-bit CPU.
Yes, I know that we have moved on from that in the last decades. However, if the ALUs are still 64-bit, then even if the data bus is 128-bit the CPU cannot process all 128 bits in one cycle arithmetically. Now there is of course "nothing" that prevents the CPU makers from creating a CPU with 128-bit ALUs that still uses a 32-bit address bus and thus 32-bit pointers; it would of course violate the currently defined data models, but then this is embedded (which sometimes is more of a wild-west kind of land).

                    Comment


                    • #30
                      Originally posted by F.Ultra View Post
Yes, I know that we have moved on from that in the last decades. However, if the ALUs are still 64-bit, then even if the data bus is 128-bit the CPU cannot process all 128 bits in one cycle arithmetically.
                      Superscalar CPUs often have > 1 load/store unit, enabling multiple word-lengths of data to be loaded from L1 cache in a single cycle.

                      Also, let's not forget vector instructions (like ARM's 128-bit NEON or Intel's 256-bit AVX).

                      But, you were talking about the memory bus. And that generally operates at the granularity of cachelines. These days, that's often 64 bytes (512 bits).
                      Last edited by coder; 02-03-2019, 04:01 PM.

                      Comment
