Glibc 2.29 Released With getcpu() On Linux, New Optimizations

coder replied

03 February 2019, 02:48 PM
Originally posted by cbxbiker61

My Mycroft example is a current real-world scenario where 64bit memory access improves performance, therefore the chip idles more and uses less power.

Huh? How do you know that? I think you don't actually know why your program is faster in 64-bit mode. You're certainly not changing how many lines of the data bus are active.
Likes 1
coder replied

03 February 2019, 02:45 PM
Originally posted by F.Ultra

Well a wider data bus will allow for faster loads and stores to cache/memory for applications that shuffle lots of memory around.

Data bus width and CPU word length are independent. A common Intel desktop CPU has a 128-bit memory interface (two 64-bit channels), but that doesn't make it a 128-bit CPU.
Likes 1
coder replied

03 February 2019, 02:43 PM
Originally posted by cybertraveler

If an app doesn't need or benefit from bigger types (i.e. doing arithmetic with bigger numbers) and it doesn't need to address more than 4 GB of memory, then a 32-bit CPU will always (as far as I know) be superior to a 64-bit CPU (all other things being equal, e.g. equivalent instruction sets), as you will need less CPU cache and less memory to do the same job.

That assumption is often not valid, because some (most?) ISAs add performance-enhancing features (e.g. more registers, new instructions) in their 64-bit mode. The transition from 32-bit to 64-bit usually provides a good opportunity for the chip maker to update the ISA in ways that also improve performance. Larger pointers can be a small price to pay for this.

Of course, most microcontrollers and CPUs for wearables are still 32-bit, for the reasons you mentioned. But, you have to look beyond the Pi's A53 and go for ARM's Cortex-M cores.
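To put a rough number on the "larger pointers" cost, here is a minimal Python sketch using `ctypes`. The `Node32`/`Node64` layouts and the 64-byte cache line are assumptions made for illustration; they model a linked-list node with a 4-byte versus an 8-byte pointer field and are not taken from any particular ABI document.

```python
import ctypes

class Node32(ctypes.Structure):
    # Hypothetical list node with a 4-byte "pointer" and a 4-byte payload,
    # as a 32-bit ABI would typically lay it out.
    _fields_ = [("next", ctypes.c_uint32), ("value", ctypes.c_int32)]

class Node64(ctypes.Structure):
    # Same node with an 8-byte "pointer"; on common 64-bit ABIs the struct
    # is padded so that its size is a multiple of 8.
    _fields_ = [("next", ctypes.c_uint64), ("value", ctypes.c_int32)]

CACHE_LINE = 64  # bytes; an assumption, common but not universal

def nodes_per_line(node_type) -> int:
    """How many whole nodes of this type fit in one cache line."""
    return CACHE_LINE // ctypes.sizeof(node_type)

if __name__ == "__main__":
    for node in (Node32, Node64):
        print(node.__name__, ctypes.sizeof(node), nodes_per_line(node))
```

For pointer-heavy data this is the footprint penalty in question. Whether extra registers and newer instructions outweigh it depends on the workload, so this only shows one side of the trade-off.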

ARM Cortex-M - Wikipedia

https://en.wikipedia.org/wiki/ARM_Cortex-M
coder replied

03 February 2019, 02:32 PM
Originally posted by arjan_intel

It's statistically not horrible, meaning if you want to use it as an index into, say, an array of "mostly per-CPU" structures, you'll get the "right" answer 95% or even 99% or more of the time, and if the cost of being wrong is just a few cache misses etc. (i.e. performance), then that can be a very valid use of this.

Then why is it a system call? Why doesn't the thread just have a pointer to some kind of execution context? Whenever the thread is woken, its execution context pointer can be set to that of the core or hardware thread on which it's running.

This should be a macro - not a system call.
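For reference, the "mostly per-CPU" indexing pattern from the quote can be sketched as below. This is only an illustration, assuming Linux with glibc: it calls the long-standing `sched_getcpu()` wrapper through `ctypes` rather than the `getcpu()` wrapper the release adds, and `current_cpu`, `bump` and `counters` are made-up names. As far as I know, on x86-64 this call is usually served from the vDSO without entering the kernel, which takes some of the sting out of the "system call" objection.

```python
import ctypes
import os

# Handle to the C library already loaded into the interpreter
# (assumes Linux/glibc; other platforms may lack sched_getcpu).
_libc = ctypes.CDLL(None, use_errno=True)

def current_cpu() -> int:
    """Best-effort CPU number for the calling thread; 0 if unavailable."""
    try:
        cpu = _libc.sched_getcpu()
    except AttributeError:  # symbol not present in this C library
        return 0
    return cpu if cpu >= 0 else 0

NCPU = os.cpu_count() or 1
counters = [0] * NCPU  # one "mostly per-CPU" slot per logical CPU

def bump() -> int:
    """Increment the slot for the CPU we were on when we asked."""
    # The thread may migrate right after the call, so the answer is only a
    # hint; the modulo keeps the index valid even with sparse CPU numbering.
    slot = current_cpu() % NCPU
    counters[slot] += 1
    return slot
```

A stale answer here just means updating a neighbour's slot, which costs performance but not correctness. That is the kind of use the quote has in mind.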
F.Ultra replied

02 February 2019, 11:53 PM
Originally posted by Weasel

WTF are you even saying? The bus width has nothing to do with the architecture. Sure, the minimum bus width is usually the size of a pointer in the given architecture, but we're talking about 32-bit chips today, and those are built with a bus wider than 32 bits, even if the arch is 32-bit.

The point is that bus widths in today's CPUs are larger than the pointer size and have nothing to do with it. Nobody cares about the "minimum" bus width in the real world, only in your theoretical nonsense.

Which 32-bit CPUs have a data bus wider than 32 bits? The reverse has been true on many systems (CPUs with a data bus narrower than their internal ALU), requiring them to spend multiple cycles to perform loads, but a CPU with a data bus wider than its ALUs?

Edit: And of course all x86 CPUs since the Pentium 60 have had 64-bit data buses (earlier x86 chips such as the 486 had only 32-bit, though), although their ALUs were still 32-bit. Turns out I'm just too old to think about what the prefetcher can actually do for a CPU, and that it can thus perform wider loads than the CPU itself can handle.

However, prefetcher or not, the CPU still has to perform loads from cache, so it will still have to read the data in 32-bit and not 64-bit chunks, even though the cache itself can be fed in 64-bit chunks. Internal memory shuffling is therefore still faster with a 64-bit ALU than with a 32-bit one.
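As a back-of-the-envelope check of that claim, the sketch below counts how many register-sized load/store pairs a plain copy loop needs. It is a simplified model under stated assumptions: `word_moves` is a made-up helper, only general-purpose registers are considered, and vector registers (which real `memcpy` implementations commonly use) are ignored entirely.

```python
def word_moves(n_bytes: int, word_bits: int) -> int:
    """Load/store pairs needed to copy n_bytes one register-width at a time."""
    word_bytes = word_bits // 8
    # Ceiling division: a trailing partial word still costs one move.
    return -(-n_bytes // word_bytes)

if __name__ == "__main__":
    page = 4096  # bytes; an illustrative buffer size
    print(word_moves(page, 32))  # 32-bit registers
    print(word_moves(page, 64))  # 64-bit registers
```

Halving the number of moves is an upper bound on the benefit. Once vector loads and stores are in play, the general-purpose register width matters much less for bulk copies.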

Last edited by F.Ultra; 03 February 2019, 12:02 AM.
Weasel replied

02 February 2019, 05:55 PM
Originally posted by cbxbiker61

Interesting that you keep using examples from 23 years ago as if they are relevant today.

His point is not arbitrary...I challenge you to find one current 64bit chip that is using memory that is less than 64bits wide. My Mycroft example is a current real-world scenario where 64bit memory access improves performance, therefore the chip idles more and uses less power.

WTF are you even saying? The bus width has nothing to do with the architecture. Sure, the minimum bus width is usually the size of a pointer in the given architecture, but we're talking about 32-bit chips today, and those are built with a bus wider than 32 bits, even if the arch is 32-bit.

The point is that bus widths in today's CPUs are larger than the pointer size and have nothing to do with it. Nobody cares about the "minimum" bus width in the real world, only in your theoretical nonsense.
Likes 2
cybertraveler replied

02 February 2019, 11:08 AM
Also: look into what double data rate RAM does to the effective bus width, i.e. when you have two 64-bit RAM sticks paired up... on a 2018 AMD64 system.
cybertraveler replied

02 February 2019, 10:59 AM
cbxbiker61 - you failed to acknowledge my previous comments pointing out the glaring issues with your prior post. I'm not spending further time discussing this with you.
cbxbiker61 replied

02 February 2019, 09:52 AM
Originally posted by cybertraveler

True, but that's somewhat arbitrary because the bus can be wider or narrower than the processor architecture. You could have a 32-bit CPU with a 128-bit bus. That Nintendo page I linked above also stated that the 64-bit CPU was connected to another chip via a 32-bit bus.

Also, the bus frequency affects the speed too.

Interesting that you keep using examples from 23 years ago as if they are relevant today.

His point is not arbitrary...I challenge you to find one current 64bit chip that is using memory that is less than 64bits wide. My Mycroft example is a current real-world scenario where 64bit memory access improves performance, therefore the chip idles more and uses less power.

Sure, there can be scenarios where 32bit might be an advantage over 64bit. But your statement "I expect that the vast majority of embedded systems and very low power devices would be more efficient if they used a 32 bit CPU architecture." is misguided. I'm assuming that you mean machines running a full OS, as I already pointed out that current micro-controllers are 32bits.

In any case, any engineer putting together a real solution will have the savvy to weigh the pros and cons and make an informed decision. I doubt they'll be weighing data from 23-year-old designs very heavily.
cybertraveler replied

02 February 2019, 07:48 AM
Originally posted by F.Ultra

Well a wider data bus will allow for faster loads and stores to cache/memory for applications that shuffle lots of memory around.

True, but that's somewhat arbitrary because the bus can be wider or narrower than the processor architecture. You could have a 32-bit CPU with a 128-bit bus. That Nintendo page I linked above also stated that the 64-bit CPU was connected to another chip via a 32-bit bus.

Also, the bus frequency affects the speed too.
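The width-times-frequency point can be made concrete with the usual peak-bandwidth arithmetic. This is a minimal sketch: the function name is made up, and the transfer rates are illustrative figures chosen for the example, not measurements of any system mentioned in this thread.

```python
def peak_bandwidth_gb_s(bus_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Illustrative configurations (assumed numbers):
single = peak_bandwidth_gb_s(64, 3200)       # one 64-bit channel at 3200 MT/s
dual = peak_bandwidth_gb_s(128, 3200)        # two such channels side by side
narrow_fast = peak_bandwidth_gb_s(32, 6400)  # half the width, twice the rate

if __name__ == "__main__":
    print(single, dual, narrow_fast)
```

In this model a 32-bit bus at twice the transfer rate moves as many bytes per second as a 64-bit one, and none of it depends on the CPU's word length.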
Likes 1
