
Fedora Developers Discuss Raising Base Requirement To AVX2 CPU Support


  • Originally posted by carewolf View Post
    SSE3 is not that interesting. It would give too little over the current SSE2 minimum to be worth it. To get something that is worth updating the minimum for, you would need at least SSSE3, which has the all-important byte-shuffle instruction that makes compiler auto-vectorization much better.
    Or, even better, SSE4.1, which has the zero/sign-extend instructions, the blend instructions and the 32-bit integer multiply instruction (yeah, that was not in SSE2). But if you require SSE4.1 you might as well require SSE4.2; I think only two processors have been produced that have 4.1 and not 4.2.

    Though I still have a server at home with a Phenom II that only has SSE3, so it would not be nice for me. But then again, I had a 2x Athlon 1800MP for over a decade, until it was finally impossible to run anything on it because too much required SSE2.
    SSE4.2 was introduced with Nehalem, the first generation of Intel Core i processors (Core i3, i5, i7). This means that none of the Intel Core (2) Duo/Quad CPUs support it. A C2Q is still doing well even with multimedia tasks, so I don't see the point of cutting off support for it over such insignificant benefits.
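
    For reference, the byte-shuffle carewolf is talking about is SSSE3's pshufb. A minimal, hypothetical sketch (not from any real codebase) that swizzles four packed BGRA pixels to RGBA, using the _mm_shuffle_epi8 intrinsic from <tmmintrin.h>; build with e.g. gcc -O2 -mssse3:

        #include <tmmintrin.h>  /* SSSE3 intrinsics */

        /* Swap the R and B channels of four packed BGRA pixels in one instruction. */
        static __m128i bgra_to_rgba(__m128i px)
        {
            const __m128i shuf = _mm_set_epi8(15, 12, 13, 14,   /* pixel 3 */
                                              11,  8,  9, 10,   /* pixel 2 */
                                               7,  4,  5,  6,   /* pixel 1 */
                                               3,  0,  1,  2);  /* pixel 0 */
            return _mm_shuffle_epi8(px, shuf);                  /* pshufb */
        }

    Plain SSE2 has no general byte permute, so a compiler targeting the current baseline has to synthesize shuffles like this out of several unpack/shift/or steps, which is a big part of why SSSE3 helps auto-vectorization of byte-oriented code.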



    • Originally posted by the_scx View Post
      SSE4.2 was introduced with Nehalem, the first generation of Intel Core i processors (Core i3, i5, i7). This means that none of the Intel Core (2) Duo/Quad CPUs support it. A C2Q is still doing well even with multimedia tasks, so I don't see the point of cutting off support for it over such insignificant benefits.
      Insignificant? It makes many image handling routines run 3x faster without any hand optimization (that is, for the 96% of users with such a CPU).
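
      To put a rough, hypothetical shape on that claim, consider scaling 8-bit pixels with 32-bit intermediate math (this loop is made up for illustration, not taken from any package):

          #include <stddef.h>
          #include <stdint.h>

          /* Scale 8-bit pixel values by a fixed-point gain (gain = 256 means 1.0). */
          void scale_pixels(uint8_t *dst, const uint8_t *src, size_t n, int32_t gain)
          {
              for (size_t i = 0; i < n; i++) {
                  int32_t v = ((int32_t)src[i] * gain) >> 8;   /* widen + 32-bit multiply */
                  dst[i] = (uint8_t)(v > 255 ? 255 : v);       /* clamp back to 8 bits    */
              }
          }

      With -msse4.1, GCC can auto-vectorize this using pmovzxbd (zero-extend) and pmulld (32-bit multiply); with plain -msse2 it has to emulate both. Compiling with gcc -O3 -msse2 versus -msse4.1 and timing it (or just diffing the assembly) is the sort of extension-vs-baseline comparison this thread keeps coming back to.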



      • Originally posted by carewolf View Post
        Insignificant? It makes many image handling routines run 3x faster without any hand optimization (that is, for the 96% of users with such a CPU).
        Comparing SSE 4.2 to SSE 4.1? Nonsense.



        • Originally posted by carewolf View Post
          It makes many image handling routines run 3x faster without any hand optimization.
          I think that's the kind of hard data that will be needed to go forward with increasing the ISA requirements.

          Namely:
          • Actual Fedora user data on processor models being used to see what percentage of people would be affected by any change.
          • Benchmarks that compare the new extensions vs. baseline to see what performance increases we can expect and which packages stand to benefit from them.



          • Originally posted by the_scx View Post
            Comparing SSE 4.2 to SSE 4.1? Nonsense.
            The context here was SSE2 to SSE4.1.

            From 4.1 to 4.2 you would only get a speedup if you needed fast CRC32, but that is not auto-vectorized.
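
            For what it's worth, that CRC path is about as simple as it gets. A minimal sketch using the _mm_crc32_u8 intrinsic from <nmmintrin.h> (build with -msse4.2; note the instruction computes CRC-32C, the Castagnoli polynomial, not zlib's CRC-32):

                #include <nmmintrin.h>  /* SSE4.2 intrinsics */
                #include <stddef.h>
                #include <stdint.h>

                /* Byte-at-a-time CRC-32C; real implementations feed 8-byte
                 * chunks to _mm_crc32_u64 for much higher throughput. */
                uint32_t crc32c(const uint8_t *buf, size_t len)
                {
                    uint32_t crc = 0xFFFFFFFFu;
                    for (size_t i = 0; i < len; i++)
                        crc = _mm_crc32_u8(crc, buf[i]);
                    return crc ^ 0xFFFFFFFFu;
                }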



            • Originally posted by carewolf View Post
              The context here was SSE2 to SSE4.1.

              From 4.1 to 4.2 you would only get a speedup if you needed fast CRC32, but that is not auto-vectorized.
              But Penryn/Wolfdale support SSE4.1. Moreover, all Intel Core CPUs support SSE3 and SSSE3. In fact, SSE3 was supported even by some NetBurst processors (Pentium 4 Prescott).
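
              In practice, a lot of software splits the difference by keeping the SSE2 baseline and picking faster code paths at runtime. A minimal sketch using GCC/Clang's __builtin_cpu_supports() to report what the machine at hand actually offers:

                  #include <stdio.h>

                  int main(void)
                  {
                      __builtin_cpu_init();  /* populate the feature flags (needed on older GCC) */
                      printf("ssse3:  %d\n", __builtin_cpu_supports("ssse3")  != 0);
                      printf("sse4.1: %d\n", __builtin_cpu_supports("sse4.1") != 0);
                      printf("sse4.2: %d\n", __builtin_cpu_supports("sse4.2") != 0);
                      printf("avx2:   %d\n", __builtin_cpu_supports("avx2")   != 0);
                      return 0;
                  }

              The question in the thread is essentially whether Fedora should keep relying on that kind of per-package dispatching or just raise the compile-time baseline.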



              • Originally posted by IroLix View Post

                Then it'd be better to have two official Fedora versions. One for Pre-AVX2 systems and the other for Post-AVX2 systems.
                Maintenance costs increase.



                • Originally posted by blueweb View Post

                  Are technical decisions (not just the 32-bit issue) made based on volunteer interest? Sounds like a lack of leadership.

                  Besides, we're talking about Fedora, a major distro and the testing ground for RHEL. This isn't some random person's hobby project.

                  If some distro states a goal/vision that requires certain compromises, I can understand even if I don't agree. But I don't buy this excuse about being volunteer-run to escape proper justification of decisions.
                  I think at this point we've reached a bit of a 'telephone game' scenario, because this point isn't *quite* about "being volunteer-run". It's more about...well, the point is, stuff gets done only if people do it.

                  So, when things like the i686 debate come up, some people argue very strongly that Fedora should keep i686 support. But very few of them go from "I think Fedora should have i686 support!" to "...and I will help make it happen!"

                  So, at that point, they're basically stating a belief that someone else ought to do it. Which is fine, you're allowed to do that. But it won't necessarily happen. For it to happen, someone - whether that's a volunteer or not - has to buy the case for i686 so hard that *they* go and do the work.

                  So, let's think about Red Hat. Yup, Fedora is a major distro and a proving ground for RHEL (among other things). Which means Red Hat contributes to it. But that doesn't necessarily mean RH is gonna say "well, all these people think Fedora should support i686, but they don't want to work on it, so we'll pay a couple of people to do it". If RH judged there was a huge volume of people who really want Fedora to run on i686, maybe it would - because Fedora having a wide userbase is indirectly useful to RH in lots of ways. But if RH thinks there's probably really only a pretty small number of people who are going to use it, RH isn't going to put resources into it unless there's some other kind of benefit to RH. Which there just isn't, take this from me, RH gets zero direct or strategic benefit from Fedora running on i686.

                  So the point isn't really about "volunteers", it's more about...for something to happen, someone has to value it enough to do the work on it. Whether that "someone" is Red Hat or a community volunteer or anyone else. If some people really wish a thing would happen but don't convince anyone else that it's worth the effort or step up to do it themselves...it's probably not going to happen.



                  • Originally posted by discordian View Post
                    My comment about x86 was partly hyperbole, but it stands that there is no other CPU that's messed up to anywhere near that degree.
                    Yeah, I was just trying to point out that all architectures have ISA revisions. So, this problem won't go away, and particularly folks interested in > 128-bit vector extensions currently have nothing to gain by jumping to ARMv8 (while SVE exists, it's currently very uncommon).

                    But I certainly won't argue that x86 has a lot of garbage that's not doing the world any favors.

                    Originally posted by discordian View Post
                    RISC-V extensions are meant to allow diversification for microprocessors and other special niches. The "desktop ISA" is a set of mandatory extensions, which makes things a lot simpler.
                    Except, what's the deal with different subsets already seeming to have several revisions? Wouldn't that land us in the same boat of having to pick a point in time, and lose most of the benefits that came after?

                    That's why I like the idea of compiling everything down to some portable, intermediate representation. Then, you can do optimization (including LTO) on the final deployed platform as (normally) JIT + caching. Of course, for those packages containing hand-coded assembly, that stuff would have to get compiled and linked in as normal.

                    Originally posted by discordian View Post
                    For the vector extensions, RISC-V has no fixed vector width; you query the vector width at runtime and step through the loop by that width.
                    As I understand it, this is like ARM's SVE - the architectural width of the vectors is variable and handled at runtime.

                    The thing about this is that you can optimize your code better if you know how large the vectors are and the latency of the vector pipeline. That enables software pipelining. So, again, we come back to the point that some sort of JIT would be best.
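
                    For what it's worth, the vector-length-agnostic style looks something like this. A rough sketch using ARM's SVE ACLE intrinsics from <arm_sve.h> (assuming a toolchain built with SVE support, e.g. -march=armv8-a+sve); the RISC-V V intrinsics follow the same query-the-width-and-stride pattern:

                        #include <arm_sve.h>

                        /* The same binary runs on 128-bit and 512-bit SVE parts: the
                         * hardware reports how many 32-bit lanes it has, and the
                         * predicate masks off the tail iteration. */
                        void vec_add(float *dst, const float *a, const float *b, int n)
                        {
                            for (int i = 0; i < n; i += (int)svcntw()) {      /* lanes per vector */
                                svbool_t pg = svwhilelt_b32_s32(i, n);        /* active-lane mask */
                                svfloat32_t va = svld1_f32(pg, a + i);
                                svfloat32_t vb = svld1_f32(pg, b + i);
                                svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
                            }
                        }

                    The flip side is exactly the scheduling point above: the compiler can't see the real width or latencies at build time, which is where the JIT idea comes in.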



                    • Originally posted by skeevy420 View Post
                      Which is why I mentioned 8-16GB of whatever memory attached to the APU for just the GPU.
                      So, most likely, the HBM scenario I mentioned. But yeah, you could have some extra VRAM soldered on board (would add a lot of cost to motherboards - cooling it would also be a challenge, since it would have to be located right next to the CPU).

                      Originally posted by skeevy420 View Post
                      8 GB just for the GPU, with whatever amount is installed on the quad-channel DDR4 for the system, would change things up quite a bit, and is what would make a SuperAPU worth having over a StandardAPU.
                      Okay, so 256-bit interface for GDDR and then another 256-bit (quad-channel) for DDR4 or DDR5? You know what other CPU has a 512-bit memory interface? EPYC, in its massive socket.

                      Maybe you could get away with shrinking the GDDR interface to 128-bit, if it used GDDR6. That could get you 225 GB/sec. However, now you're veering into RX 570 territory, which would also shrink the market and make the product less viable.
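
                      (Back-of-the-envelope, assuming 14 Gbps GDDR6 pins: a 128-bit bus moves 16 bytes per transfer, and 16 B × 14 GT/s ≈ 224 GB/s, which is roughly where that 225 GB/sec figure comes from.)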


                      Originally posted by skeevy420 View Post
                      "needs retarded levels of cooling" hasn't stopped AMD before; just look at the FX-9000 series. IMHO, this is the only real technical challenge with a SuperAPU, and 7 nm and smaller nodes will help with that.
                      First, the FX-9000 was none too popular, IIRC; AMD wouldn't still be in the CPU business if they'd continued down that path. Secondly, as we've seen with Radeon VII and now Navi, 7 nm doesn't automatically mean low power. Those GPUs take the energy-savings dividend and re-invest it in higher clocks in order to deliver competitive performance. So either you're leaving more performance on the table, or you've still got a monster heat problem.

                      Also, I don't know how well their power scales down if you drop the clocks of those GPUs. 7 nm surely has more leakage than 14 nm. Ryzen 3k shows there are efficiency gains to be had, but they're not that amazing.

                      Originally posted by skeevy420 View Post
                      Tin foil hat answer: Either Sony or Microsoft has a "no SuperAPU clause" on AMD to not supply the consumer market with an APU that's better than their current gen console.
                      I don't know that AMD would agree to that. For instance, the real threat to Sony and MS is each other - much more so than PCs. So, if they got exclusivity agreements on anything, it would be to try to keep tech out of the hands of the other. However, that hasn't really seemed to happen in any major way, as those platforms have mostly moved in lock-step.

                      In fact, we know that Sony pioneered the expansion of the async compute queues in the original PS4. AMD turned around and made this improvement in subsequent generations of GCN. Eventually, MS would've gotten it in the Xbox One X.

                      Originally posted by skeevy420 View Post
                      Or AMD doesn't want to compete with itself. If there were a SuperAPU, quite a few people would just buy it instead of a CPU and a GPU.
                      AMD has 3 ways to profit: CPU, GPU, and motherboard. A super APU at least gives them 2. Plus, as discussed, the price of those two would be substantial, giving them plenty of room for decent margins. And it still wouldn't out-compete dGPUs, so there'd still be the upgrade path you have with conventional APUs.

                      Plus, lots of people are probably pairing Ryzen with Nvidia GPUs, so they're not automatically getting all 3, even today.

                      Anyway, there's no hard technical reason it can't be done. Multiple generations of consoles stand as testimony to the fact that it can. So, IMO, it's less of a technical challenge and more of a business-case issue. Although trying to do it in a standard, socketed PC form factor certainly does add some technical constraints.

                      One of the more interesting claims in that article about the Subor Z+ was that:
                      a company in China invested the best part of 400 million RMB / $60 million USD in a custom processor for its upcoming console and PC hybrid system.
                      So, that gives you a lower bound on the volume you'd have to hit for the project to be economically viable.

                      Anyway, they sold the Subor Z+ as both a console and a PC (separate products, same base hardware) in China. So, presumably some folks have got Linux running on it. That might be the best monster-APU option you get, at least for a good while.

