
Fedora Developers Discuss Raising Base Requirement To AVX2 CPU Support


  • Originally posted by carewolf View Post
    SSE3 is not that interesting. It would give too little over the current SSE2 minimum to be worth it. To get something that is worth updating the minimum for, you would need at least SSSE3, which has the all-important byte-shuffle instruction that makes compiler auto-vectorization much better.
    Or, even better, SSE4.1, which has the zero/sign-extend instructions, the blend instructions and the 32-bit integer multiply instruction (yeah, that was not in SSE2). But if you require SSE4.1 you might as well require SSE4.2; I think only two processors have been produced that have 4.1 and not 4.2.

    Though I still have a server at home with a Phenom II that only has SSE3, so it would not be nice for me. But then again, I had a 2x Athlon 1800MP for over a decade, until it was finally impossible to run anything on it because too much required SSE2.
    SSE4.2 was introduced with Nehalem, the first generation of Intel Core i processors (Core i3, i5, i7). This means that none of the Intel Core (2) Duo/Quad CPUs support it. A C2Q is still doing well even with multimedia tasks, so I don't see the point of cutting off support for it over such insignificant benefits.
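
    For reference, the byte-shuffle carewolf is talking about is SSSE3's pshufb. A minimal, hypothetical sketch (not from any real codebase) that swizzles four packed BGRA pixels to RGBA, using the _mm_shuffle_epi8 intrinsic from <tmmintrin.h>; build with e.g. gcc -O2 -mssse3:

        #include <tmmintrin.h>  /* SSSE3 intrinsics */

        /* Swap the R and B channels of four packed BGRA pixels in one instruction. */
        static __m128i bgra_to_rgba(__m128i px)
        {
            const __m128i shuf = _mm_set_epi8(15, 12, 13, 14,   /* pixel 3 */
                                              11,  8,  9, 10,   /* pixel 2 */
                                               7,  4,  5,  6,   /* pixel 1 */
                                               3,  0,  1,  2);  /* pixel 0 */
            return _mm_shuffle_epi8(px, shuf);                  /* pshufb */
        }

    Plain SSE2 has no general byte permute, so a compiler targeting the current baseline has to synthesize shuffles like this out of several unpack/shift/or steps, which is a big part of why SSSE3 helps auto-vectorization of byte-oriented code.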



    • Originally posted by the_scx View Post
      SSE4.2 was introduced with Nehalem, the first generation of Intel Core i processors (Core i3, i5, i7). This means that none of the Intel Core (2) Duo/Quad CPUs support it. A C2Q is still doing well even with multimedia tasks, so I don't see the point of cutting off support for it over such insignificant benefits.
      Insignificant? It makes many image handling routines run 3x faster without any hand optimization (that is, for the 96% of users with such a CPU).
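
      To put a rough, hypothetical shape on that claim, consider scaling 8-bit pixels with 32-bit intermediate math (this loop is made up for illustration, not taken from any package):

          #include <stddef.h>
          #include <stdint.h>

          /* Scale 8-bit pixel values by a fixed-point gain (gain = 256 means 1.0). */
          void scale_pixels(uint8_t *dst, const uint8_t *src, size_t n, int32_t gain)
          {
              for (size_t i = 0; i < n; i++) {
                  int32_t v = ((int32_t)src[i] * gain) >> 8;   /* widen + 32-bit multiply */
                  dst[i] = (uint8_t)(v > 255 ? 255 : v);       /* clamp back to 8 bits    */
              }
          }

      With -msse4.1, GCC can auto-vectorize this using pmovzxbd (zero-extend) and pmulld (32-bit multiply); with plain -msse2 it has to emulate both. Compiling with gcc -O3 -msse2 versus -msse4.1 and timing it (or just diffing the assembly) is the sort of extension-vs-baseline comparison this thread keeps coming back to.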



      • Originally posted by carewolf View Post
        Insignificant? It makes many image handling routines run 3x faster without any hand optimization (that is, for the 96% of users with such a CPU).
        Comparing SSE 4.2 to SSE 4.1? Nonsense.



        • Originally posted by carewolf View Post
          It makes many image handling routines run 3x faster without any hand optimization.
          I think that's the kind of hard data that will be needed to go forward with increasing the ISA requirements.

          Namely:
          • Actual Fedora user data on processor models being used to see what percentage of people would be affected by any change.
          • Benchmarks that compare the new extensions vs. baseline to see what performance increases we can expect and which packages stand to benefit from them.



          • Originally posted by the_scx View Post
            Comparing SSE 4.2 to SSE 4.1? Nonsense.
            The context here was SSE2 to SSE4.1.

            From 4.1 to 4.2 you would only get a speedup if you needed fast CRC32, but that is not auto-vectorized.
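
            For what it's worth, that CRC path is about as simple as it gets. A minimal sketch using the _mm_crc32_u8 intrinsic from <nmmintrin.h> (build with -msse4.2; note the instruction computes CRC-32C, the Castagnoli polynomial, not zlib's CRC-32):

                #include <nmmintrin.h>  /* SSE4.2 intrinsics */
                #include <stddef.h>
                #include <stdint.h>

                /* Byte-at-a-time CRC-32C; real implementations feed 8-byte
                 * chunks to _mm_crc32_u64 for much higher throughput. */
                uint32_t crc32c(const uint8_t *buf, size_t len)
                {
                    uint32_t crc = 0xFFFFFFFFu;
                    for (size_t i = 0; i < len; i++)
                        crc = _mm_crc32_u8(crc, buf[i]);
                    return crc ^ 0xFFFFFFFFu;
                }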



            • Originally posted by carewolf View Post
              The context here was SSE2 to SSE4.1.

              From 4.1 to 4.2 you would only get a speedup if you needed fast CRC32, but that is not auto-vectorized.
              But Penryn/Wolfdale support SSE4.1. Moreover, all Intel Core CPUs support SSE3 and SSSE3. In fact, SSE3 was supported even by some NetBurst processors (Pentium 4 Prescott).
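
              In practice, a lot of software splits the difference by keeping the SSE2 baseline and picking faster code paths at runtime. A minimal sketch using GCC/Clang's __builtin_cpu_supports() to report what the machine at hand actually offers:

                  #include <stdio.h>

                  int main(void)
                  {
                      __builtin_cpu_init();  /* populate the feature flags (needed on older GCC) */
                      printf("ssse3:  %d\n", __builtin_cpu_supports("ssse3")  != 0);
                      printf("sse4.1: %d\n", __builtin_cpu_supports("sse4.1") != 0);
                      printf("sse4.2: %d\n", __builtin_cpu_supports("sse4.2") != 0);
                      printf("avx2:   %d\n", __builtin_cpu_supports("avx2")   != 0);
                      return 0;
                  }

              The question in the thread is essentially whether Fedora should keep relying on that kind of per-package dispatching or just raise the compile-time baseline.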



              • Originally posted by IroLix View Post

                Then it'd be better to have two official Fedora versions. One for Pre-AVX2 systems and the other for Post-AVX2 systems.
                Maintenance costs increase.



                • Originally posted by blueweb View Post

                  Are technical decisions (not just the 32-bit issue) made based on volunteer interest? Sounds like a lack of leadership.

                  Besides, we're talking about Fedora, a major distro and the testing ground for RHEL. This isn't some random person's hobby project.

                  If some distro states a goal/vision that requires certain compromises, I can understand even if I don't agree. But I don't buy this excuse about being volunteer-run to escape proper justification of decisions.
                  I think at this point we've reached a bit of a 'telephone game' scenario, because this point isn't *quite* about "being volunteer-run". It's more about...well, the point is, stuff gets done only if people do it.

                  So, when things like the i686 debate come up, some people argue very strongly that Fedora should keep i686 support. But very few of them go from "I think Fedora should have i686 support!" to "...and I will help make it happen!"

                  So, at that point, they're basically stating a belief that someone else ought to do it. Which is fine, you're allowed to do that. But it won't necessarily happen. For it to happen, someone - whether that's a volunteer or not - has to buy the case for i686 so hard that *they* go and do the work.

                  So, let's think about Red Hat. Yup, Fedora is a major distro and a proving ground for RHEL (among other things). Which means Red Hat contributes to it. But that doesn't necessarily mean RH is gonna say "well, all these people think Fedora should support i686, but they don't want to work on it, so we'll pay a couple of people to do it". If RH judged there was a huge volume of people who really want Fedora to run on i686, maybe it would - because Fedora having a wide userbase is indirectly useful to RH in lots of ways. But if RH thinks there's probably really only a pretty small number of people who are going to use it, RH isn't going to put resources into it unless there's some other kind of benefit to RH. Which there just isn't, take this from me, RH gets zero direct or strategic benefit from Fedora running on i686.

                  So the point isn't really about "volunteers", it's more about...for something to happen, someone has to value it enough to do the work on it. Whether that "someone" is Red Hat or a community volunteer or anyone else. If some people really wish a thing would happen but don't convince anyone else that it's worth the effort or step up to do it themselves...it's probably not going to happen.



                  • Originally posted by discordian View Post
                    My comment about x86 was partly hyperbole, but it stands that there is no other CPU that's messed up to anywhere near that degree.
                    Yeah, I was just trying to point out that all architectures have ISA revisions. So, this problem won't go away, and particularly folks interested in > 128-bit vector extensions currently have nothing to gain by jumping to ARMv8 (while SVE exists, it's currently very uncommon).

                    But I certainly won't argue that x86 has a lot of garbage that's not doing the world any favors.

                    Originally posted by discordian View Post
                    RISC-V extensions are meant to allow diversification for microprocessors and other special niches. The "desktop ISA" is a set of mandatory extensions, which makes things a lot simpler.
                    Except, what's the deal with different subsets already seeming to have several revisions? Wouldn't that land us in the same boat of having to pick a point in time, and lose most of the benefits that came after?

                    That's why I like the idea of compiling everything down to some portable, intermediate representation. Then, you can do optimization (including LTO) on the final deployed platform as (normally) JIT + caching. Of course, for those packages containing hand-coded assembly, that stuff would have to get compiled and linked in as normal.

                    Originally posted by discordian View Post
                    For the vector extensions, RISC-V has no fixed vector width; you query the vector width at runtime and step through the loop by that width.
                    As I understand it, this is like ARM's SVE - the architectural width of the vectors is variable and handled at runtime.

                    The thing about this is that you can optimize your code better if you know how large the vectors are and the latency of the vector pipeline. That enables software pipelining. So, again, we come back to the point that some sort of JIT would be best.
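
                    For what it's worth, the vector-length-agnostic style looks something like this. A rough sketch using ARM's SVE ACLE intrinsics from <arm_sve.h> (assuming a toolchain built with SVE support, e.g. -march=armv8-a+sve); the RISC-V V intrinsics follow the same query-the-width-and-stride pattern:

                        #include <arm_sve.h>

                        /* The same binary runs on 128-bit and 512-bit SVE parts: the
                         * hardware reports how many 32-bit lanes it has, and the
                         * predicate masks off the tail iteration. */
                        void vec_add(float *dst, const float *a, const float *b, int n)
                        {
                            for (int i = 0; i < n; i += (int)svcntw()) {      /* lanes per vector */
                                svbool_t pg = svwhilelt_b32_s32(i, n);        /* active-lane mask */
                                svfloat32_t va = svld1_f32(pg, a + i);
                                svfloat32_t vb = svld1_f32(pg, b + i);
                                svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
                            }
                        }

                    The flip side is exactly the scheduling point above: the compiler can't see the real width or latencies at build time, which is where the JIT idea comes in.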



                    • Originally posted by skeevy420 View Post
                      Which is why I mentioned 8-16GB of whatever memory attached to the APU for just the GPU.
                      So, most likely, the HBM scenario I mentioned. But yeah, you could have some extra VRAM soldered on board (would add a lot of cost to motherboards - cooling it would also be a challenge, since it would have to be located right next to the CPU).

                      Originally posted by skeevy420 View Post
                      8 GB just for the GPU, with whatever amount is installed on the quad-channel DDR4 for the system, would change things up quite a bit, and is what would make a SuperAPU worth having over a StandardAPU.
                      Okay, so 256-bit interface for GDDR and then another 256-bit (quad-channel) for DDR4 or DDR5? You know what other CPU has a 512-bit memory interface? EPYC, in its massive socket.

                      Maybe you could get away with shrinking the GDDR interface to 128-bit, if it used GDDR6. That could get you 225 GB/sec. However, now you're veering into RX 570 territory, which would also shrink the market and make the product less viable.
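
                      (Back-of-the-envelope, assuming 14 Gbps GDDR6 pins: a 128-bit bus moves 16 bytes per transfer, and 16 B × 14 GT/s ≈ 224 GB/s, which is roughly where that 225 GB/sec figure comes from.)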


                      Originally posted by skeevy420 View Post
                      "needs retarded levels of cooling" hasn't stopped AMD before; just look at the FX-9000 series. IMHO, this is the only real technical challenge with a SuperAPU, and 7 nm and smaller nodes will help with that.
                      First, the FX-9000 was none too popular, IIRC; AMD wouldn't still be in the CPU business if they'd continued down that path. Secondly, as we've seen with Radeon VII and now Navi, 7 nm doesn't automatically mean low power. Those GPUs take the energy-savings dividend and re-invest it in higher clocks in order to deliver competitive performance. So either you're leaving more performance on the table, or you've still got a monster heat problem.

                      Also, I don't know how well their power scales down if you drop the clocks of those GPUs. 7 nm surely has more leakage than 14 nm. Ryzen 3k shows there are efficiency gains to be had, but they're not that amazing.

                      Originally posted by skeevy420 View Post
                      Tin foil hat answer: Either Sony or Microsoft has a "no SuperAPU clause" on AMD to not supply the consumer market with an APU that's better than their current gen console.
                      I don't know that AMD would agree to that. For instance, the real threat to Sony and MS is each other - much more so than PCs. So, if they got exclusivity agreements on anything, it would be to try to keep tech out of the hands of the other. However, that hasn't really seemed to happen in any major way, as those platforms have mostly moved in lock-step.

                      In fact, we know that Sony pioneered the expansion of the async compute queues in the original PS4. AMD turned around and made this improvement in subsequent generations of GCN. Eventually, MS would've gotten it in the Xbox One X.

                      Originally posted by skeevy420 View Post
                      Or AMD doesn't want to compete with itself. If there were a SuperAPU, quite a few people would just buy it instead of a CPU and a GPU.
                      AMD has 3 ways to profit: CPU, GPU, and motherboard. A super APU at least gives them 2. Plus, as discussed, the price of those two would be substantial, giving them plenty of room for decent margins. And it still wouldn't out-compete dGPUs, so there'd still be the upgrade path you have with conventional APUs.

                      Plus, lots of people are probably pairing Ryzen with Nvidia GPUs, so they're not automatically getting all 3, even today.

                      Anyway, there's no hard technical reason it can't be done. Multiple generations of consoles stand as testimony to the fact that it can. So, IMO, it's less of a technical challenge and more of a business-case issue. Although trying to do it in a standard, socketed PC form factor certainly does add some technical constraints.

                      One of the more interesting claims in that article about the Subor Z+ was that:
                      a company in China invested the best part of 400 million RMB / $60 million USD in a custom processor for its upcoming console and PC hybrid system.
                      So, that gives you a lower bound on the volume you'd have to hit for the project to be economically viable.

                      Anyway, they sold the Subor Z+ as both a console and a PC (separate products, same base hardware) in China. So, presumably some folks have got Linux running on it. That might be the best monster-APU option you get, at least for a good while.

