Originally posted by PerformanceExpert
One of my early experiences with this was optimizing a large separable convolution on a VLIW DSP*. I got the 1D pass nice and fast. Performance was quite near the theoretical peak. Then, I just had to optimize the transpose, which was obviously thrashing the cache (this CPU had only about 8 KB of L1 cache and no L2... it was a while ago). At the urging of my boss, I measured the relative times of the convolution passes and the transpose... and even though the transpose was horribly inefficient for what it was, it was still insignificant compared with the actual convolution. So, it just goes to show the golden rule of optimization: measure first, because data is king!
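For anyone unfamiliar with the structure being described, here is a minimal sketch of a separable pass: a 1D row convolution, plus a transpose so the second direction can also be processed along contiguous rows. The function names, float types, and the deliberately naive transpose are my own assumptions for illustration, not the original DSP code.

```c
#include <stddef.h>

/* 1D "valid" convolution of one padded row:
 * out[i] = sum over k of row[i + k] * kernel[k], for i in [0, width). */
static void convolve_row(const float *row, float *out, size_t width,
                         const float *kernel, size_t klen)
{
    for (size_t i = 0; i < width; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < klen; k++)
            acc += row[i + k] * kernel[k];
        out[i] = acc;
    }
}

/* Naive transpose: src is rows x cols, dst is cols x rows.
 * On a tiny L1 with no L2, the strided writes thrash the cache, yet the
 * total work is still small next to the convolution itself. */
static void transpose(const float *src, float *dst, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            dst[c * rows + r] = src[r * cols + c];
}
```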
* In truth, it's a little bit of a misnomer to call this chip a DSP. Most DSPs have a 2:1 load-to-store ratio, but this chip had a 1:1 ratio. However, it also had 64 registers. So, what I did was break the kernel into fixed-size chunks that I could pre-load into a set of registers. Then, I convolved the input row one chunk at a time, accumulating the intermediates in a temporary buffer. The row and the temporary buffer were both small enough to fit in L1 cache. Also, since this was written in C, I was continually looking at the compiler output to make sure it didn't generate spills or have too many nop slots. These fast convolutions were one of our signature features and helped sell a lot of product.
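A rough sketch of that chunking scheme in portable C (not the original DSP code): the kernel is split into fixed-size groups of taps, each group is copied into locals standing in for the pre-loaded registers, and partial sums are accumulated into a temporary row buffer that stays resident in L1. CHUNK, the function name, and the float types are assumptions for illustration.

```c
#include <stddef.h>

#define CHUNK 8  /* taps per chunk; chosen so one chunk fits in registers (assumption) */

/* Chunked 1D convolution: process the kernel CHUNK taps at a time,
 * accumulating partial results for the whole row into acc[]. */
static void convolve_row_chunked(const float *row, float *acc, size_t width,
                                 const float *kernel, size_t klen)
{
    for (size_t i = 0; i < width; i++)
        acc[i] = 0.0f;

    for (size_t base = 0; base < klen; base += CHUNK) {
        size_t n = (klen - base < CHUNK) ? klen - base : CHUNK;

        /* Copy this chunk of taps into locals; with enough architectural
         * registers the compiler can keep them resident (and spill-free)
         * for the entire sweep over the row. */
        float taps[CHUNK];
        for (size_t k = 0; k < n; k++)
            taps[k] = kernel[base + k];

        /* Sweep the row once per chunk, adding this chunk's contribution. */
        for (size_t i = 0; i < width; i++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += row[i + base + k] * taps[k];
            acc[i] += sum;
        }
    }
}
```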