Announcement

**oleid** · 22 December 2020, 11:12 AM

Originally posted by pal666 View Post

xz is faster than bzip2 on decompress
and they compress kernel

Not only, that algos can be used anywhere in the kernel, even the file system

**programmerjake** · 22 December 2020, 12:14 PM

Originally posted by cl333r View Post

I don't know assembly so I didn't understand why using signed instead of unsigned would be 2x more efficient:

uint8_t8 *block;
block[uint32_t] // much slower
block[int32_t] // much faster .. why?

both index numbers wrap when they get to their maximum values.. it's just that uint32_t has more precision, so what's the deal?

actually, int32_t causes Undefined Behavior when it would wrap, so that means the compiler can do whatever it likes there, giving it the freedom to use a more efficient implementation. (such as promoting the 32-bit value to a 64-bit value since that matches pointer size, allowing it to skip a zero-extension step and use less instructions. Or at least that's my guess without looking at the code...)

**cl333r** · 22 December 2020, 12:29 PM

Originally posted by programmerjake View Post

actually, int32_t causes Undefined Behavior when it would wrap, so that means the compiler can do whatever it likes there, giving it the freedom to use a more efficient implementation. (such as promoting the 32-bit value to a 64-bit value since that matches pointer size, allowing it to skip a zero-extension step and use less instructions. Or at least that's my guess without looking at the code...)

So it seems it's just because x86 has an accelerated path for signed integers and doesn't for unsigned ones, if so then bzip code doesn't seem stupid, it's just that the programmer didn't know enough assembly like most programmers including me.

PS: I'm curious because Vulkan is using a lot of unsigned integers (maybe in a different context) when I'd prefer signed ones and was wondering if it would also be a performance issue.

**david-nk** · 22 December 2020, 01:14 PM

Originally posted by cl333r View Post

So it seems it's just because x86 has an accelerated path for signed integers and doesn't for unsigned ones, if so then bzip code doesn't seem stupid, it's just that the programmer didn't know enough assembly like most programmers including me.

It's not necessarily stupid, but the software is open-source, so you would expect someone to have improved it by now. It's probably just that no one cares about bzip, it has slow decompression and modern LZ-based compression algorithms come fairly close in terms of compression ratio in the one discipline bzip actually performs well in: compressing text and source code.

Also, to be clear: it's not that x86 has an accelerated path for signed integers, it's just that overflowing a signed integer is undefined behavior according to the C++ standard, so the compiler may assume that an overflow will never occur, while it HAS to account for the possibility of an overflow when an unsigned integer is used.

PS: I'm curious because Vulkan is using a lot of unsigned integers (maybe in a different context) when I'd prefer signed ones and was wondering if it would also be a performance issue.

Perhaps, but since most of the work is done by a GPU when using APIs like Vulkan, the impact would be much lower than it would normally be.

**cl333r** · 22 December 2020, 02:42 PM

Originally posted by david-nk View Post

it's not that x86 has an accelerated path for signed integers, it's just that overflowing a signed integer is undefined behavior according to the C++ standard, so the compiler may assume that an overflow will never occur, while it HAS to account for the possibility of an overflow when an unsigned integer is used.

So in layman's terms the compiler doesn't have to insert extra assembly instructions to deal with possible overflow and that's what makes using signed integers faster. What an interesting optimization. It's probably the case on ARM too since it's the c++ standard, right?

**archkde** · 22 December 2020, 04:33 PM

Originally posted by cl333r View Post

So in layman's terms the compiler doesn't have to insert extra assembly instructions to deal with possible overflow and that's what makes using signed integers faster. What an interesting optimization. It's probably the case on ARM too since it's the c++ standard, right?

Pretty much. It's not that the compiler inserts extra instructions for overflow checking, but it actually has to emit a check+branch something like "if (x < x + 1)". This can lead to a cascade of prevented optimization.

**cl333r** · 22 December 2020, 04:39 PM

Originally posted by archkde View Post

Pretty much. It's not that the compiler inserts extra instructions for overflow checking, but it actually has to emit a check+branch something like "if (x < x + 1)". This can lead to a cascade of prevented optimization.

OK, but now I'm curious - when does it do this? Because I guess it doesn't insert such branches for any possible overflow of unsigned integers like as typical as this:
uint32_t x = ...;
uint32_t y = ...;
uint32_t total = x + y; // here to check that total or x doesn't overflow

**archkde** · 22 December 2020, 05:09 PM

Originally posted by cl333r View Post

OK, but now I'm curious - when does it do this? Because I guess it doesn't insert such branches for any possible overflow of unsigned integers like as typical as this:
uint32_t x = ...;
uint32_t y = ...;
uint32_t total = x + y; // here to check that total or x doesn't overflow

This exact code should give the same assembly for both signed and unsigned in isolation. However, now throw some constants and comparisons into the mix, maybe even a loop or two, and the optimizer quickly finds more opportunities to optimize your program for signed numbers.

**Mangix** · 22 December 2020, 05:39 PM

Seems people didn't hear what was explained in the video.

The assembly generated on a 32-bit system is the efficient one. It's the horrible assembly on 64-bit.

It's because an array index is a pointer. It has nothing to do with signed/unsigned. size_t would also be an acceptable solution.

The point of the whole video was to explain undefined behavior. It just so happens that bzip2 code was in there.

**cl333r** · 22 December 2020, 06:05 PM

Originally posted by Mangix View Post

Seems people didn't hear what was explained in the video.

The assembly generated on a 32-bit system is the efficient one. It's the horrible assembly on 64-bit.

It's because an array index is a pointer. It has nothing to do with signed/unsigned. size_t would also be an acceptable solution.

The point of the whole video was to explain undefined behavior. It just so happens that bzip2 code was in there.

If you watch the video he clearly mentions the issue with unsigned vs signed in the context of pointer arithmetic/assembly and how that maps to 64 vs 32 bit code.
But the author of video explains it pretty vaguely, in that he talks much about it to the point you don't understand what is actually the problem unless you know assembly.
At 44:40 he says "..to have twice as many instructions (..) just because we used an unsigned integer when we didn't need that extra precision (...)"

Announcement

Patch Proposed For Removing BZIP2 Support From The Linux Kernel

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment