Cleaning Up A Mess: Linux 6.9 Likely To Land Rework Of x86 CPU Topology Code


  • #11
    Or to put it even more simply, here's a good metaphor for intel's vision of the perfect cpu:



    And the even better idea to "fix" linux to facilitate it...
    Attached Files
    Last edited by ddriver; 17 February 2024, 01:59 AM.

    Comment


    • #12
      Originally posted by ddriver View Post


      FTFY. There's a world of difference between a company that does technical innovation and a company in an abusive monopolistic position frantically trying to mitigate its inability to innovate or compete.

      BTW, I was under the impression the "E" stood for "efficient", but looking at real world results, I can't see the E cores solving either intel's core count disadvantage or their even worse efficiency disadvantage.

      If a cpu that is "2/3" "efficient" cores is dreadfully losing to a competing cpu with ZERO efficiency cores... that's not a good look, nor evidence of intel being on the right track...


      I'd encourage intel to make 1 good core and scale it up or down, rather than continue adding more fake cores and needless complexity to the mix. It is not something the tech world needs; it is something intel needs as a substitute for being competitive and competent in their designs.

      Because adding yet another core type for each and every way in which your product sucks does not a good product make.

      It. Only. Makes. It. Worse!

      Here's a brief on intel's issue:

      Actual issue:
      our core is bloated, under-performing, and criminally inefficient, and we are losing the precious enterprise market at a fatal rate

      Actual solution:
      get off your hands and fix your bad design, and that's the end of it

      Non-solution:
      shove another of your bad cores in there, and continue to bloat your bloated core as your main performance improvement strategy
      For Intel, the efficiency cores (E cores) are about space-efficiency more than energy-efficiency.
      With Intel's model, you can put 4 E cores in the space of 1 P core on a chip. With AMD's upcoming implementations, it's 2 C cores per P core.

      If I can trade you 1 of your P cores for a collection of smaller cores that together do more work than that 1 P core, or do the same work with better energy efficiency, that is a tradeoff you can make. And you still keep your other P cores, which are going to help single-threaded tasks complete the fastest.
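
      A rough back-of-the-envelope sketch of that area tradeoff, in Python. The 4:1 and 2:1 area ratios are the ones quoted above; the relative per-core throughput numbers are made-up placeholders, not measured figures, so treat this as an illustration of the reasoning rather than real data:

      Code:
      # Toy area-vs-throughput model for hybrid core mixes.
      # Area ratios (E per P, C per P) come from the discussion above;
      # the relative per-core throughput values are invented for illustration.
      P_AREA = 1.0          # die area of one P core (normalized)
      E_AREA = P_AREA / 4   # Intel: ~4 E cores fit in one P core's area
      C_AREA = P_AREA / 2   # AMD: ~2 C cores fit in one P core's area

      P_PERF = 1.0          # MT throughput of one P core (normalized)
      E_PERF = 0.45         # assumed throughput of one E core vs a P core
      C_PERF = 0.75         # assumed throughput of one C core vs a P core

      def throughput_in_area(area, core_area, core_perf):
          """How many cores fit in a fixed area budget, and their combined throughput."""
          cores = int(area // core_area)
          return cores, cores * core_perf

      for name, core_area, core_perf in [("P", P_AREA, P_PERF),
                                         ("E", E_AREA, E_PERF),
                                         ("C", C_AREA, C_PERF)]:
          cores, perf = throughput_in_area(4 * P_AREA, core_area, core_perf)
          print(f"{cores:2d} {name} cores in 4 P-core area -> {perf:.2f}x MT throughput")

      With those assumed numbers, the small cores win on aggregate throughput per unit of die area, while the P cores you keep still cover the latency-sensitive single-threaded work. That's the whole tradeoff in a nutshell.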

      And you know who else is interested in heterogeneous architecture, and who knows that there are clear tradeoffs and tweaks worth making? AMD, Qualcomm, Apple, Arm, and other organizations. Not everybody does it the same way, but everybody is or will be doing it going forward.

      There's no way Intel doesn't know they have a lot of work to do to catch up on their P core efficiency, but that is clearly a parallel effort, and P cores shouldn't be the only thing Intel offers. Companies do multiple things at once. Some challenges are harder to solve than others. And by the time Intel beats current hardware, all of Intel's competitors will have improved their hardware too. Each year, when we compare every CPU KPI, there will be winners and losers.

      So now, Intel has to compete in a heterogeneous CPU world. Add that to the list of battles they are fighting. On your other note, I agree that Intel has a colorful history of abusive monopolistic behavior, stagnation, market segmentation, and dishonest and terrible marketing. Heterogeneous architecture itself is not evidence that they're drumming their fingers on their P cores and intentionally getting their butt kicked.
      Last edited by Mitch; 17 February 2024, 03:33 AM.

      Comment


      • #13
        Originally posted by pomac View Post
        I've said it before: Thomas Gleixner is a hero. It's very hard to take code like this and rewrite it... He did the same with the interrupt subsystem a while ago.
        I do agree with you that this is very valuable work and way overdue.

        I do disagree that the work is particularly difficult - it's just long, boring, and with a low initial payoff. It takes dedicated effort to insert yourself into the problem domain and stay there, but that is no different than, say, programming a game.

        The biggest reason most people think this is difficult is that most software developers have never worked with low level programming before. An unfortunate stigma has therefore emerged that low level programming is hard and that you need to know a ton of math and CS just to work in that field. Not really true, though yes, like any skill it takes practice - say, writing a couple of FreeRTOS based schedulers and doing some Microchip development.

        The reason so few people do it is not that you need superhuman intelligence, but that the learning threshold is a bit steep. But as we all know with vim, sometimes that threshold is worth surpassing.

        So thank you, Thomas, for doing this, and yes, you are a hero for doing it. My gripe is that you shouldn't have to be a hero; this is something more people could help with, but due to the stigma currently surrounding low level programming, few are willing to even pop the hood.

        Sorry for going off on a tangent.

        Comment


        • #14
          Originally posted by Mitch View Post

          And you know who else is interested in heterogeneous architecture, and who knows that there are clear tradeoffs and tweaks worth making? AMD, Qualcomm, Apple, Arm, and other organizations. Not everybody does it the same way, but everybody is or will be doing it going forward.
          Amd has done it right. Not for marketing numbers, but for cost optimization.

          Amd has no performance discrepancy or feature disparity between its "big" and "little" cores. It is just an optimization technique - there's no point in every CPU core having the same amount of resources, considering that all-core workloads will always run at massively lower clocks than lower thread count code.

          That's the benefit of having a good, well rounded core. You don't have to invite an increasing amount of crutches into your design to "cure" its limp.

          What makes amd so successful in the enterprise is that its cores, unlike those of pretty much every other vendor with meaningful market share, are optimized for real world work rather than for scoring in meaningless synthetic isolated benchmarks. It is just hard to take intel seriously with such priorities and designs... oh, and let's not forget all the "just wait" hype. And it is nice to see amd doubling down on that in its core cost optimizations. I like performance consistency, and not having to wonder what black box totally out of my control may go wrong and choke my pipeline.
          Last edited by ddriver; 17 February 2024, 10:36 AM.

          Comment


          • #15
            Originally posted by ddriver View Post
            Amd has no performance discrepancy
            I think the lack of performance discrepancy is actually one of the problems with the Intel hybrid approach, TBH. In theory Intel's approach should give them an advantage in both ST and MT workloads, but we don't see this clearly and universally in practice. That's because the P core is not as powerful as it should be and the number of E cores is not high enough (and they are most likely a little too big). The way I see it, the P core should be a huge, fat, high-IPC, high-frequency core, and their count should probably cap at 6 these days. The E core, on the other hand, should be a bit smaller and lower frequency, and there should be multiple dozens of them; they should act like a conceptual mini "phi" accelerator built into the CPU for an MT boost. So yeah, perhaps "LP" is the type they should stick to. But that's not the case. Intel's P core has around the same IPC as AMD's competing "ordinary" core, while the E core count is not high enough to make a practical difference against AMD in practical workloads. Historical CPUs such as, for example, the 12700K should not even exist, because 4 E cores are basically useless in practical desktop use cases.

            I think Intel is quite far away from its hybrid approach "endgame", and that's the main problem. But even if they got there, I would still think that a many-core CPU is too niche these days in the consumer desktop segment. Hypothetically speaking, if Intel removed the E cores from the desktop line and added an extra 2 P cores with unlocked AVX512 (and perhaps a little more cache), this would benefit a lot more desktop users than it would hurt, IMHO, because the majority of desktop users use their PCs for gaming and single/few-thread-driven apps like CADs, photo editors, etc., while they tend to use the GPU for encoding/rendering. Statistically speaking, I think an average user would benefit more from a more powerful, feature-rich P core than from E cores.
            Last edited by drakonas777; 17 February 2024, 07:01 AM.

            Comment


            • #16
              Originally posted by wertigon View Post

              I do agree with you that this is very valuable work and way overdue.

              I do disagree that the work is particularly difficult - it's just long, boring, and with a low initial payoff. It takes dedicated effort to insert yourself into the problem domain and stay there, but that is no different than, say, programming a game.
              And no one else does it even though it's needed - it's something that feels huge and insurmountable, and yet he does it.

              Originally posted by wertigon View Post
              The biggest reason most people think this is difficult is that most software developers have never worked with low level programming before. An unfortunate stigma has therefore emerged that low level programming is hard and that you need to know a ton of math and CS just to work in that field. Not really true, though yes, like any skill it takes practice - say, writing a couple of FreeRTOS based schedulers and doing some Microchip development.

              The reason so few people do it is not that you need superhuman intelligence, but that the learning threshold is a bit steep. But as we all know with vim, sometimes that threshold is worth surpassing.

              So thank you, Thomas, for doing this, and yes, you are a hero for doing it. My gripe is that you shouldn't have to be a hero; this is something more people could help with, but due to the stigma currently surrounding low level programming, few are willing to even pop the hood.

              Sorry for going off on a tangent.
              Yes and no. It's low-level, but it's the whole process of reading, understanding, figuring out what's wrong, redoing it, and making sure that everything else still works with it.

              Linux needs this, and Thomas is a hero for the effort and time he puts into this. Even the effort he puts into the changelogs is quite large =)

              Basically, anyone who tackles the hard problems and the boring things, and makes things better for all of us, is a hero by definition =)

              Comment


              • #17
                Originally posted by ddriver View Post

                Amd has done it right. Not for marketing numbers, but for cost optimization.

                Amd has no performance discrepancy or feature disparity between its "big" and "little" cores. It is just an optimization technique - there's no point in every CPU core having the same amount of resources, considering that all-core workloads will always run at massively lower clocks than lower thread count code.

                That's the benefit of having a good, well rounded core. You don't have to invite an increasing amount of crutches into your design to "cure" its limp.

                What makes amd so successful in the enterprise is that its cores, unlike those of pretty much every other vendor with meaningful market share, are optimized for real world work rather than for scoring in meaningless synthetic isolated benchmarks. It is just hard to take intel seriously with such priorities and designs... oh, and let's not forget all the "just wait" hype. And it is nice to see amd doubling down on that in its core cost optimizations. I like performance consistency, and not having to wonder what black box totally out of my control may go wrong and choke my pipeline.
                There are ways you can have cores massively different in power and performance capabilities and still have a great chip. Look up M1's model. A lot of the success in Apple's heroic power efficiency comes from the fact that their E and P cores are very different. It's not an issue when you get the scheduling right.

                The Asahi project is a great example: they recently implemented Energy Aware Scheduling on the M series and hit heroic levels of power efficiency.
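
                For the curious, the per-core capacity values that EAS works from are visible from userspace. A minimal sketch, assuming a kernel that exposes the standard cpu_capacity sysfs attribute (present on heterogeneous arm64 machines; on most x86 boxes the file simply isn't there):

                Code:
                # Print the relative capacity the Linux scheduler assigns to each CPU.
                # On a big.LITTLE-style arm64 machine the E and P cores show clearly
                # different values; we handle the missing-file case for other systems.
                from pathlib import Path

                cpus = sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*"),
                              key=lambda p: int(p.name[3:]))
                for cpu in cpus:
                    cap_file = cpu / "cpu_capacity"
                    if cap_file.exists():
                        print(f"{cpu.name}: capacity {cap_file.read_text().strip()}")
                    else:
                        print(f"{cpu.name}: no cpu_capacity attribute exposed")

                The scheduler uses exactly this kind of asymmetry information to place tasks, which is a big part of what "getting the scheduling right" means on chips like the M1.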

                Your belief that Intel engineered chips with E cores and abandoned reliable AVX instructions, purely with marketing intentions, is your opinion. And it's unfalsifiable.

                Comment


                • #18
                  Nice.

                  Comment


                  • #19
                    I think I have the perfect solution: eliminate multiple cores completely and go back to the good old days of just one core.

                    Here's the reality: you don't need multiple cores; multiple cores were an evolution of SMT.

                    Consider a hypothetical single core chip with only 1 ALU used for integer math, 1 FPU used for floating point math, and 1 ALU used purely for boolean comparisons (yes/no, that sort of thing); it has an L1 cache and that's it.

                    A very simple design.

                    What happened over the years is they added more ALUs, more FPUs, a SIMD unit, L2 and L3 caches and so on.

                    At some point they realized that many of the execution resources within the core were not being used, either because the task at hand was linear in nature, or because programmers were writing their code in a serialized fashion, or because programmers were targeting the lowest common denominator, i.e. CPUs with fewer execution units.

                    So then the idea was: let's present the core as 2 or more logical processors to the OS and applications, so that more of the execution units can be utilized at the same time.

                    That's all well and good, until they decided: hey, let's put 2 cores on the same chip, basically double the L1, double the ALUs, double the L2, and to the OS and applications it will look like 2 processors.

                    The problem was that this was not necessarily faster; in fact, I had a Pentium D that under certain tasks was slower and less responsive than a P4 2.4c.

                    So they decided to modify the dual core design: have the 2 cores share an L2 cache for sharing data, add some execution units, and call it a day.

                    Of course, if 2 cores are good then 4 must be better, and we ended up with quad core CPUs.

                    But these quad cores had the same problem the single core processors had, namely too many idle units, and so SMT was reintroduced.

                    This process continued and today we have processors with over 100 cores.

                    Here's the thing: as more and more cores were added, scheduling threads became more and more difficult for the OS, especially as power consumption considerations came into play.

                    But think about this for a second, why do we need this many cores?

                    Going back to the theoretical simple one core CPU: if we built a 20 core chip based on that design, we would have 20 ALUs for integer math, 20 ALUs for boolean logic, 20 FPUs, and 20 L1 caches that appeared as 20 cores to the OS and applications.

                    Why can't we just consolidate all those execution units into one core, unify all the caches into one large pool of cache, and use a hardware thread scheduler to assign thread execution to whichever units are free?

                    The thread scheduler could be very simple: monitor the power consumption of each ALU and FPU, and when the consumption of any one unit goes a certain percentage above baseline, move work to another unit of the same type, and so on until power consumption normalizes across all similar units.

                    I know it's a radical idea, and marketing would never go for it on the theory that consumers would not accept a reduction in core count, but I think if it were presented as still having the same number of execution units, just in a more efficient design that is easier to schedule threads for, consumers would respond positively.
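
                    Just to make the idea concrete, here is a toy software model of that dispatch policy. Every number in it (unit counts, power figures, the 20% threshold) is invented purely for illustration, and a real hardware scheduler would obviously not be a Python loop:

                    Code:
                    import random
                    from dataclasses import dataclass

                    BASELINE_W = 1.0   # assumed baseline power per unit (made up)
                    THRESHOLD = 1.20   # steer work away once a unit is 20% above baseline

                    @dataclass
                    class Unit:
                        kind: str              # "ALU" or "FPU"
                        power: float = BASELINE_W
                        jobs: int = 0

                    def pick_unit(units, kind):
                        """Dispatch to the coolest unit of the right kind, preferring ones under threshold."""
                        same_kind = [u for u in units if u.kind == kind]
                        under = [u for u in same_kind if u.power < BASELINE_W * THRESHOLD]
                        return min(under or same_kind, key=lambda u: u.power)

                    units = [Unit("ALU") for _ in range(4)] + [Unit("FPU") for _ in range(2)]

                    for _ in range(100):                              # 100 fake dispatch cycles
                        kind = random.choice(["ALU", "ALU", "FPU"])   # mostly integer work
                        unit = pick_unit(units, kind)
                        unit.jobs += 1
                        unit.power += 0.05                            # pretend each job heats its unit
                        for u in units:                               # everything cools back toward baseline
                            u.power = max(BASELINE_W, u.power - 0.02)

                    for u in units:
                        print(f"{u.kind}: {u.jobs} jobs, final power {u.power:.2f} W")

                    The interesting part is the balancing rule itself; everything else here is just scaffolding.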

                    Comment


                    • #20
                      Cleaning up old mess in an existing codebase is a very welcome activity. Too bad that many commercial projects can't afford it, as it doesn't directly contribute to business value.

                      Comment
