Raspberry Pi OS 32-bit vs. 64-bit Performance

  • #31
    Can binaries be compiled for armhf under 64-bit Raspberry Pi OS? Under Ubuntu this is possible, and in my RAM-intensive applications armhf beats aarch64. This is similar to the x32 ABI vs. 64-bit on Intel/AMD.
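
    A minimal probe to confirm which ABI a build actually targets; the cross-compile command is an assumption based on Debian's stock gcc-arm-linux-gnueabihf package, which is what provides armhf builds on a 64-bit install:

        /* abi_probe.c - prints the ABI a binary was built for.
         * Assumed build commands (Debian/Ubuntu):
         *   gcc abi_probe.c                      -> native aarch64 binary
         *   arm-linux-gnueabihf-gcc abi_probe.c  -> 32-bit armhf binary */
        #include <stdio.h>

        int main(void)
        {
            /* Pointer width is where armhf wins in RAM-bound code:
             * 4-byte pointers halve pointer-heavy data structures. */
            printf("pointer size: %zu bytes\n", sizeof(void *));
        #if defined(__aarch64__)
            puts("compiled for aarch64");
        #elif defined(__arm__) && defined(__ARM_PCS_VFP)
            puts("compiled for armhf (32-bit, hard-float)");
        #endif
            return 0;
        }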

    Comment


    • #32
      Did the Pi just outperform my Phenom server by a factor of 5 in the LAME test?

      Wow! Now it's only half the speed of my 6700K!!!

      Comment


      • #33
        Originally posted by tildearrow View Post
        Did the Pi just outperform my Phenom server by a factor of 5 in the LAME test?

        Wow! Now it's only half the speed of my 6700K!!!
        with that improvement, most of the old performance expectations are obsolete...

        "only half the speed of my 6700K"

        this is in fact impressive for such a device...
        Phantom circuit Sequence Reducer Dyslexia

        Comment


        • #34
          Originally posted by discordian View Post
          And ARMv7 still had integer division optional, even though the hardware was available almost everywhere. Heck, even their Cortex-M microcontrollers have it as mandatory.
          Linux and most software will bake in function calls instead of a single instruction for that reason, unless you set -march to a CPU with integer divide. (To add insult to injury, there's no GCC flag to just enable it.)
          Just curious, but why emit a function call instead of just trapping the instruction? I thought interrupts were somewhat cheap on ARM?

          Trapping the instruction would shift the cost onto the presumably rarer division-less processors. It would be crazy to run a heavyweight OS like Linux on a processor without integer division anyway...

          Edit: Actually, trapping instructions is a great way to make CPU instruction sets more flexible. E.g. x86 is full of obsolete instructions which could be trapped and emulated on modern processors to save some silicon.
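
          As a rough userspace illustration of the mechanics (and of the cost: one kernel round trip per trapped instruction), one could catch SIGILL and resume past the faulting instruction. A minimal sketch, assuming 32-bit ARM Linux built in ARM mode (gcc -marm), with the actual decode/emulate step elided:

              /* sigill_demo.c - trap-and-emulate sketch, 32-bit ARM Linux only.
               * Build: gcc -marm -O2 sigill_demo.c
               * The kernel delivers SIGILL for an undefined instruction; a real
               * emulator would decode the word at arm_pc and update the saved
               * register file. Here we only skip it to show the control flow. */
              #define _GNU_SOURCE
              #include <signal.h>
              #include <stdio.h>
              #include <ucontext.h>

              static void on_sigill(int sig, siginfo_t *si, void *ctx)
              {
                  ucontext_t *uc = ctx;
                  (void)sig; (void)si;
                  /* The decode/emulate step would go here. */
                  uc->uc_mcontext.arm_pc += 4;   /* skip the 4-byte instruction */
              }

              int main(void)
              {
                  struct sigaction sa = { .sa_sigaction = on_sigill,
                                          .sa_flags = SA_SIGINFO };
                  sigaction(SIGILL, &sa, NULL);
                  asm volatile(".word 0xe7f000f0");   /* permanently undefined */
                  puts("resumed after the trapped instruction");
                  return 0;
              }
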
          Last edited by Veto; 07 February 2022, 05:38 PM.

          Comment


          • #35
            Originally posted by Veto View Post
            Just curious, but why emit a function call instead of just trapping the instruction? I thought interrupts were somewhat cheap on ARM?
            You thought wrong; it's only kinda true on the microcontroller line (Cortex-M). And even then, you would have to read, decode and emulate the binary code of the instruction, all while in a state that is impossible, or at least really complicated, to interrupt.
            Originally posted by Veto View Post
            Trapping the instruction would shift the cost onto the presumably rarer division-less processors. It would be crazy to run a heavyweight OS like Linux on a processor without integer division anyway...
            Not sure about now, but ~3 years ago I compiled my ARM kernels with some patches for that reason.
            What Linux does is "live-patching" early on: overwriting the call to the division helper with the instruction if the hardware is available. But at compile time it still has to be assumed that a function call takes place, so the stack might need to be set up and registers are considered clobbered. Suboptimal code, in other words.
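
            The effect is easy to see in compiler output. A minimal sketch (the flags and emitted instructions below are illustrative, from memory):

                /* idiv.c - why compiled code must assume a helper call.
                 * Assumed compile commands and results:
                 *   arm-linux-gnueabihf-gcc -O2 -S -march=armv7-a idiv.c
                 *       emits "bl __aeabi_idiv" (AEABI runtime helper)
                 *   arm-linux-gnueabihf-gcc -O2 -S -march=armv7ve idiv.c
                 *       emits a single "sdiv r0, r0, r1"
                 * The helper call clobbers the call-clobbered registers and may
                 * force stack setup, which is the overhead described above. */
                int quotient(int a, int b)
                {
                    return a / b;
                }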

            Originally posted by Veto View Post
            Edit: Actually, trapping instructions is a great way to make CPU instruction sets more flexible. E.g. x86 is full of obsolete instructions which could be trapped and emulated on modern processors to save some silicon.
            Yeah, could be, but x86 still has everything down to 16-bit on-chip (even if it's in some form of microcode).

            MIPS/RISC-V is nice in that there are no separate instruction sets; 32-bit is just a subset of the instructions. Not two (or three with Thumb-2) instruction sets and decoders, as ARM has to implement.

            Comment


            • #36
              Amazing. I thought of this benchmark yesterday and here it is... thanks for the work.

              Comment


              • #37
                Originally posted by atomsymbol

                Given the particular set of benchmarks in this Phoronix article: the performance difference isn't caused by the ISA being 64-bit rather than 32-bit, but by the fact that the 64-bit AArch64 ISA happens to be a redesigned ISA. If AArch64 were ported to 32 bits then, obviously, the port would outperform AArch64 on a 4 GB Raspberry Pi.

                A performance advantage of 64-bit integers over 32-bit integers can indeed be demonstrated, but only with a different set of benchmarks than the set used in this Phoronix article.

                Your mind fails to understand the core idea behind the argument "You don't usually need 64 bits, thus 32 bits is faster".
                I don't think anyone is saying that the only way 64-bit ARM is better than 32-bit ARM (on this particular model) is because the data types are twice as big. They would be foolish to do that, as the best that could yield is a doubling of performance. Clearly there are other aspects involved. You did mention a few of them, but let's enumerate a nice list:
                1. Double the operand size
                2. Better designed ISA
                3. Double the registers
                4. Improved calling convention
                5. Newer instructions
                But there's more to it than that. This isn't simply a comparison between 32-bit and 64-bit code on Linux; it's about the differences between these particular distributions. You have a 32-bit distro compiled for the lowest common target (ARM11 with hard float) versus a 64-bit distro whose lowest common target is ARMv8 in the form of the Cortex-A53. Now, they do a lot of tricks to use more appropriate code paths at run time, so they do get a degree of improvement from running on newer processors, but not as much as they could if they targeted the actual processor in the machine. Even if you compile your own code, as Michael does with his benchmarks, you still rely on system libraries which were compiled with the lowest common instruction set in mind.
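
                A minimal sketch of that run-time selection, assuming a 32-bit ARM Linux target (the HWCAP names come from the kernel's asm/hwcap.h):

                    /* hwcap_probe.c - the kind of run-time dispatch the distro
                     * libraries use: ask the kernel which ISA features the CPU
                     * has and pick a code path. 32-bit ARM only; aarch64 uses a
                     * different HWCAP set (HWCAP_ASIMD and friends). */
                    #include <stdio.h>
                    #include <sys/auxv.h>
                    #include <asm/hwcap.h>

                    int main(void)
                    {
                        unsigned long caps = getauxval(AT_HWCAP);
                        /* A library built for the ARMv6 baseline can still pick
                         * a NEON loop here, but only inside explicit
                         * dispatchers; ordinary compiled code stays at ARMv6. */
                        puts(caps & HWCAP_NEON  ? "NEON path" : "scalar path");
                        puts(caps & HWCAP_IDIVA ? "hw divide" : "helper call");
                        return 0;
                    }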

                There was one benchmark that improved to 5x the original. I don't think a person could look at that and conclude that the only difference was a doubled operand size; there had to be other factors at play.

                Certainly, one could take the time to run further tests and determine which of the listed factors impacted each benchmark, but that's well outside the scope of this article. The question it sought to answer was "For identical hardware, what difference in performance is there between the 32-bit and 64-bit Raspbian OS for self-compiled code?"
                Last edited by willmore; 07 February 2022, 06:32 PM. Reason: Grammar and spelling were optional...

                Comment


                • #38
                  Some pretty impressive gains. I only run a DNS server and a few minor things on my Pi, but I will certainly be changing over to the 64-bit OS when I have some time.

                  Comment


                  • #39
                    Originally posted by discordian View Post
                    You thought wrong; it's only kinda true on the microcontroller line (Cortex-M). And even then, you would have to read, decode and emulate the binary code of the instruction, all while in a state that is impossible, or at least really complicated, to interrupt.
                    Not sure about now, but ~3 years ago I compiled my ARM kernels with some patches for that reason.
                    What Linux does is "live-patching" early on: overwriting the call to the division helper with the instruction if the hardware is available. But at compile time it still has to be assumed that a function call takes place, so the stack might need to be set up and registers are considered clobbered. Suboptimal code, in other words.


                    Yeah, could be, but x86 still has everything down to 16-bit on-chip (even if it's in some form of microcode).

                    MIPS/RISC-V is nice in that there are no separate instruction sets; 32-bit is just a subset of the instructions. Not two (or three with Thumb-2) instruction sets and decoders, as ARM has to implement.
                    Yes, it is likely too expensive to trap and emulate instruction by instruction on current CPUs. However, CPUs could likely optimize it a bit with some form of assisted dispatch/vectoring and dedicated registers to make the trap lightweight. Another possibility would be to trap on the first unsupported instruction in a binary and then trigger a "live-patching" run (JIT-like). Or it could simply prompt the OS to reschedule the program onto a single fat core on the die with full instruction support, while the other N-1 cores are kept lean.

                    There is really no reason why we should waste all that silicon on legacy instructions.

                    Comment


                    • #40
                      Originally posted by tuxd3v View Post
                      Are you sure that Raspbian 32-bit is ARMv6?
                      I was expecting ARMv7.
                      I thought everyone knew RPis have used ARMv6 code to remain compatible with the original 256 MB, single-core, 700 MHz ARMv6 Pi. They've always argued the performance difference is so tiny that it doesn't make sense to switch to ARMv7, as some other distros such as Armbian have done.
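
                      It is easy to check which level a given compiler targets. A minimal sketch (that a default build on 32-bit Raspberry Pi OS reports 6 is an assumption based on the distro defaults above):

                          /* arch_probe.c - prints the architecture level the
                           * compiler targets; __ARM_ARCH is a standard ACLE
                           * macro. A plain "gcc arch_probe.c" on 32-bit
                           * Raspberry Pi OS should report 6. */
                          #include <stdio.h>

                          int main(void)
                          {
                          #ifdef __ARM_ARCH
                              printf("targeting ARMv%d\n", __ARM_ARCH);
                          #else
                              puts("not an ARM target");
                          #endif
                              return 0;
                          }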

                      Comment
