Linux Kernel Developers Discuss Dropping x32 Support

  • #71
    There are many programs in a conventional Linux distro that might reasonably be built using x32. I'd guess that it would be the vast majority.
    It would be interesting to build a distro with x32 as the default and only selected programs as full x86_64.

    What would benefit from the full pointer width? Programs that might do massive buffering (e.g. database programs, Firefox, graphics editors, VM providers). Any others?

    The first fallout would be the discovery of stupid portability bugs. So many C programs erroneously assume that a pointer fits in an int, for no serious benefit. Flushing those out would be all to the good, but the work would fall on the wrong people. A cheap version of this experiment would be to build most things as x86_32 instead of x32; after all, most programs have already been tested on that architecture. But x32 ought to be more performant.
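
    Roughly, the bug looks like this (a contrived sketch, not from any particular program): stashing a pointer in an int happens to work on 32-bit targets (x86_32 and x32 alike) and silently truncates on x86_64.

      #include <stdint.h>
      #include <stdlib.h>

      /* The bug: works by accident when pointers are 32-bit, truncates on LP64. */
      void broken(void)
      {
          char *buf = malloc(64);
          int   h   = (int)(intptr_t)buf;   /* int stays 32-bit on x86_64 */
          char *p   = (char *)(intptr_t)h;  /* upper 32 bits of the pointer are gone */
          free(p);                          /* undefined behavior if truncated */
      }

      /* The fix: intptr_t is wide enough for a pointer on any of these ABIs. */
      void fixed(void)
      {
          char *buf = malloc(64);
          intptr_t h = (intptr_t)buf;
          free((char *)h);
      }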

    The second result would probably be a modest saving in disk space for programs. My wild guess: 10%.

    I would not be confident of much performance improvement, since it might turn out that most memory and processor time is used by the programs left as x86_64. On my desktop, most of the resources are consumed by Firefox most of the time.



    • #72
      Originally posted by Hugh View Post
      I would not be confident of much performance improvement, since it might turn out that most memory and processor time is used by the programs left as x86_64. On my desktop, most of the resources are consumed by Firefox most of the time.
      How large is your L1 cache? Or perhaps total CPU cache?

      Think about that for a second before you say "memory savings are minuscule", when uncached memory access is usually the largest bottleneck in normal desktop apps (which are not computationally heavy). This applies even to games: I recall a very interesting article about Doom 3's linked lists, and how re-orienting them to fit better in memory and be more cache friendly massively boosted performance (a normal linked list wastes a lot of memory on redundant pointers and fragmentation).
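
      The general technique, as a rough sketch (illustrative code, not the actual Doom 3 source): instead of leaving every node as a separate heap allocation scattered around memory, pack the nodes into one contiguous pool so a traversal walks adjacent cache lines.

        #include <stddef.h>
        #include <stdlib.h>

        struct node { struct node *next; int payload; };

        /* Naive: each node is its own allocation, so every step of a
         * traversal is likely a cache miss. */
        struct node *push_naive(struct node *head, int v)
        {
            struct node *n = malloc(sizeof *n);
            n->payload = v;
            n->next = head;
            return n;
        }

        /* Pool-backed: nodes live contiguously in one block, so a traversal
         * moves forward through memory and stays cache and prefetcher friendly. */
        struct node *build_pooled(struct node *pool, size_t n)
        {
            for (size_t i = 0; i + 1 < n; i++)
                pool[i].next = &pool[i + 1];
            if (n)
                pool[n - 1].next = NULL;
            return pool;
        }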

      Seriously guys.



      • #73
        Originally posted by Weasel View Post
        How large is your L1 cache? Or perhaps total CPU cache?

        Think about that for a second before you say "memory savings are minuscule", when uncached memory access is usually the largest bottleneck in normal desktop apps (which are not computationally heavy). This applies even to games: I recall a very interesting article about Doom 3's linked lists, and how re-orienting them to fit better in memory and be more cache friendly massively boosted performance (a normal linked list wastes a lot of memory on redundant pointers and fragmentation).

        Seriously guys.
        Linked lists are quite the worst case scenario for cache efficiency, though...



        • #74
          Originally posted by AsuMagic View Post
          Linked lists are quite the worst case scenario for cache efficiency, though...
          Indeed, the point was to show that cache efficiency is a very important thing, and a real bottleneck, even in games (where a lot of the work happens on the GPU as well).

          In general, fast memory is not cheap. RAM is slow, very slow. It may seem fast to people, but it's almost two orders of magnitude slower than L1 cache. And the CPU's caches already take up about half of the entire die, so it's not like they can easily "just increase the cache, man".
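
          As a rough sanity check on "two orders of magnitude" (typical ballpark figures, not numbers from this thread): an L1 hit costs around 4 cycles, while a DRAM access takes on the order of 100 ns, i.e. about 300 cycles on a 3 GHz core:

            100 ns * 3 GHz = 300 cycles;  300 / 4 = 75x, close to two orders of magnitude.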



          • #75
            Originally posted by Weasel View Post
            Indeed, the point was to show that cache efficiency is a very important thing, and a real bottleneck, even in games (where a lot of the work happens on the GPU as well).

            In general, fast memory is not cheap. RAM is slow, very slow. It may seem fast to people, but it's almost two orders of magnitude slower than L1 cache. And the CPU's caches already take up about half of the entire die, so it's not like they can easily "just increase the cache, man".
            Boosting cache efficiency only matters if the app you're running doesn't fit in the cache, though. That is to say, once you bust the cache you fall off a performance cliff, but if your app's busy loops fit in less than half the L1, it doesn't really matter how much further you shrink them. You either have enough or you don't, and obviously compilers and CPU designers try to ensure most apps do fit already in x86_64 mode.
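
            The cliff is easy to see with a pointer-chasing microbenchmark (a hypothetical sketch; a real version would shuffle the chain to defeat the hardware prefetcher):

              #include <stdio.h>
              #include <stdlib.h>
              #include <time.h>

              int main(void)
              {
                  /* Walk a pointer chain in buffers from 4 KiB to 64 MiB; the
                   * ns/access figure jumps each time the buffer outgrows a cache level. */
                  for (size_t bytes = 1u << 12; bytes <= 1u << 26; bytes <<= 1) {
                      size_t n = bytes / sizeof(void *);
                      void **buf = malloc(n * sizeof(void *));
                      for (size_t i = 0; i + 1 < n; i++)
                          buf[i] = &buf[i + 1];
                      buf[n - 1] = &buf[0];     /* close the loop */

                      void **p = buf;
                      const long iters = 20 * 1000 * 1000;
                      clock_t t0 = clock();
                      for (long i = 0; i < iters; i++)
                          p = (void **)*p;      /* dependent loads: pure latency */
                      double ns = (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / iters;

                      printf("%8zu KiB: %6.2f ns/access\n", bytes / 1024, ns);
                      if (p == NULL)            /* keep the chase from being optimized out */
                          return 1;
                      free(buf);
                  }
                  return 0;
              }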

            Anyway, the end result is that using x32 to reduce cache usage can massively benefit a small number of apps, but for the most part you'll only see very small benefits.
            Last edited by smitty3268; 11 January 2019, 12:04 AM.



            • #76
              Originally posted by smitty3268 View Post
              You either have enough or you don't, and obviously compilers and CPU designers try to ensure most apps do fit already in x86_64 mode.
              Compilers do absolutely nothing about re-ordering data, except on the stack, and that is very small data (since it's per-function, and it doesn't reorder across most function calls either). For struct layout they aren't even allowed to in the first place.

              Sorry to burst that bubble, but yes, you need actual programming skill to reorder your data yourself; there's no magic pill or compiler switch, as most people oblivious to low-level details (i.e. crappy programmers) seem to think.

              This is not just about cache use, btw. The same goes for auto-vectorization. If your data is laid out wrong, the compiler won't be able to properly auto-vectorize it at all. It's not that it isn't smart enough; it's that the language forces it to use your stupid layout. You told it to use that data layout, so that's what it will use. By design.

              E.g. if you perform the same operation on one member across many elements, then put that member in its own array (or its own struct, or a basic type). Make one array per member, a struct of arrays, instead of an array of structs. Too bad if this "uglifies" your "pure code", but it's what you have to do if you want proper auto-vectorization.
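
              A minimal sketch of what that looks like (illustrative names, not from the post): the array-of-structs loop drags the unused y and z through the cache and reads x with a stride, while the struct-of-arrays version gives the vectorizer a dense, unit-stride array.

                #include <stddef.h>

                /* Array of structs: consecutive x values are sizeof(struct) apart. */
                struct particle_aos { float x, y, z; };

                void scale_x_aos(struct particle_aos *p, size_t n, float k)
                {
                    for (size_t i = 0; i < n; i++)
                        p[i].x *= k;          /* strided access: vectorizes poorly */
                }

                /* Struct of arrays: all x values are contiguous. */
                struct particles_soa { float *x, *y, *z; };

                void scale_x_soa(struct particles_soa *p, size_t n, float k)
                {
                    for (size_t i = 0; i < n; i++)
                        p->x[i] *= k;         /* unit stride: vectorizes cleanly */
                }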

              Sorry, no magic switch.
              Last edited by Weasel; 11 January 2019, 01:07 PM.



              • #77
                Originally posted by Weasel View Post
                Compilers do absolutely nothing about re-ordering data, except on the stack, and that is very small data (since it's per-function, and it doesn't reorder across most function calls either). For struct layout they aren't even allowed to in the first place.

                Sorry to burst that bubble, but yes, you need actual programming skill to reorder your data yourself; there's no magic pill or compiler switch, as most people oblivious to low-level details (i.e. crappy programmers) seem to think.

                This is not just about cache use, btw. The same goes for auto-vectorization. If your data is laid out wrong, the compiler won't be able to properly auto-vectorize it at all. It's not that it isn't smart enough; it's that the language forces it to use your stupid layout. You told it to use that data layout, so that's what it will use. By design.

                E.g. if you perform the same operation on one member across many elements, then put that member in its own array (or its own struct, or a basic type). Make one array per member, a struct of arrays, instead of an array of structs. Too bad if this "uglifies" your "pure code", but it's what you have to do if you want proper auto-vectorization.

                Sorry, no magic switch.
                I wasn't talking about re-ordering data, but thanks for that...



                • #78
                  Originally posted by smitty3268 View Post
                  I wasn't talking about re-ordering data, but thanks for that...
                  Yeah, you were talking about some magical method of using the cache more efficiently "automatically", which is pure fantasy.



                  • #79
                    Originally posted by Weasel View Post
                    Yeah, you were talking about some magical method of using the cache more efficiently "automatically", which is pure fantasy.
                    Dude...

                    Simple things like CSE (common subexpression elimination) and GCM (global code motion) can improve cache hit rates just by optimizing your code. That's all I meant.
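
                    For reference, a generic illustration of the CSE case (not code from the thread): the compiler computes a repeated subexpression once, which among other things removes a redundant memory load.

                      struct pt { int x; };

                      /* As written: p->x appears twice. */
                      int f(const struct pt *p, int a, int b)
                      {
                          return (p->x + a) * (p->x + b);
                      }

                      /* What CSE effectively produces: one load, kept in a register. */
                      int f_cse(const struct pt *p, int a, int b)
                      {
                          int x = p->x;
                          return (x + a) * (x + b);
                      }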

                    Is that a magic button? No, but I never said it was. In fact, it was pretty much a throwaway line, which you for some reason decided to shitpost about, seizing on that single word while completely ignoring the actual point of my entire post: that decreasing cache usage only matters if you don't already have enough of it.

                    Which is why any kind of discussion with you is pointless. You seize on a single meaningless word and spend days arguing about it in order to win an internet argument you created out of nothing, while ignoring what other people are actually saying. I'm done with you.
                    Last edited by smitty3268; 12 January 2019, 05:43 PM.



                    • #80
                      Originally posted by smitty3268 View Post
                      Dude...

                      Simple things like CSE (common subexpression elimination) and GCM (global code motion) can improve cache hit rates just by optimizing your code. That's all I meant.

                      Is that a magic button? No, but I never said it was. In fact, it was pretty much a throwaway line, which you for some reason decided to shitpost about, seizing on that single word while completely ignoring the actual point of my entire post: that decreasing cache usage only matters if you don't already have enough of it.

                      Which is why any kind of discussion with you is pointless. You seize on a single meaningless word and spend days arguing about it in order to win an internet argument you created out of nothing, while ignoring what other people are actually saying. I'm done with you.
                      I'm not sure what that has to do with "try to ensure most apps do fit already" (emphasis mine, your exact words). Since the topic is halving the size of pointers, the context is clearly about fitting more stuff in cache.

                      And your optimizations work equally well in x32 and x86_64 mode, yet x86_64 still wastes more cache; it's not like compilers apply more aggressive cache optimizations just because pointers are larger. You have to do that yourself.
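
                      To put a number on the pointer-size point (a hypothetical struct, with the usual ILP32/LP64 sizes): the same pointer-heavy node is exactly half the size under x32, so twice as many fit in any given amount of cache.

                        #include <stdio.h>

                        struct node {
                            struct node *next;
                            struct node *prev;
                            int          value;
                        };

                        int main(void)
                        {
                            /* x32 (ILP32): 4 + 4 + 4 = 12 bytes.
                             * x86_64 (LP64): 8 + 8 + 4 = 20, padded to 24 for alignment. */
                            printf("sizeof(struct node) = %zu\n", sizeof(struct node));
                            return 0;
                        }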

