Intel i9-12900K Alder Lake Linux Performance In Different P/E Core Configurations


  • #81
    Originally posted by birdie View Post
    ARM licensees as well as Apple have been using big.LITTLE designs for many years now. AMD has been rumored to migrate to a big.LITTLE uArch as well sometime in the future. Maybe what Intel has implemented is not that bad after all. Maybe if the Linux kernel properly supported ADL CPUs we wouldn't have such a salty conversation.
    Brings back memories of the old Surface and its Windows RT* operating system, which ran rather fast despite being Microsoft's first real attempt at a desktop Windows implementation on ARM, and the Tegra 3 it ran on was also big.LITTLE. Considering that Windows 10 also has an ARM64 version, the only feasible explanation is that Microsoft merged what they learned from Windows RT into the mainline Windows kernel and scheduler.

    *Windows RT 8.0 release only. RT 8.1 was badly broken.

    • #82
      Originally posted by sdack View Post
      AMD's Zen CPUs are not as uniform as people think; AMD just hides the details and presents a uniform multi-core CPU. One can certainly find cases where AMD's design shows weaknesses, but I have not yet seen anything as worrying as what was shown here with Alder Lake.

      The idea of hybrid designs is certainly not bad. What is bad is not having the software ready on release to benefit from it, as there is evidently much to gain.
      I am getting the impression that the biggest issue is Intel trying to provide a solution for something that, from at least my OS studies back at uni, is not really solvable, i.e. automagic scheduling on a big.LITTLE design that generally works better than the alternative. big.LITTLE designs work best when developers specifically code into their applications how to use the cores, i.e. a virus scanner would pretty much always want to use an E core, and likewise for background tasks like checking for emails or indexing for fuzzy file search.

      This is because knowing what should run on an E core and what should run on a P core is primarily a subjective thing: whether making something deliberately slower to save power is acceptable depends on the context of the application.
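      To make the hand-placement idea concrete, here is a minimal sketch of a background worker pinning itself to the E cores on Linux. The CPU IDs are an assumption (E cores often enumerate as CPUs 16-23 on an i9-12900K); real code would have to discover them at runtime rather than hardcode them:

      Code:
      /* Hypothetical sketch: confine a background worker (scanner, indexer)
       * to assumed E cores. CPU IDs 16-23 are an assumed i9-12900K layout. */
      #define _GNU_SOURCE
      #include <sched.h>
      #include <stdio.h>

      int main(void)
      {
          cpu_set_t ecores;
          CPU_ZERO(&ecores);
          for (int cpu = 16; cpu <= 23; cpu++)  /* assumed E-core IDs */
              CPU_SET(cpu, &ecores);

          /* pid 0 = the calling process itself */
          if (sched_setaffinity(0, sizeof(ecores), &ecores) != 0) {
              perror("sched_setaffinity");
              return 1;
          }

          /* ... run the background work here, confined to the E cores ... */
          printf("worker confined to assumed E cores\n");
          return 0;
      }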

      • #83
        Originally posted by Grinness View Post

        Back into your rat hole
        Right, because you have nothing to argue?

        The fact remains that for all the benchmarks Michael has done showing Linux outperforming Windows, the results simply don't carry forward to real-world computing. To this day nobody has provided a reasonable explanation as to why Windows boots, launches programs and generally responds to application input faster than Linux on the same hardware, especially on low-power hardware like Atoms.

        • #84
          Originally posted by mdedetrich View Post

          I am getting the impression that the biggest issue is Intel trying to provide a solution for something that, from at least my OS studies back at uni, is not really solvable, i.e. automagic scheduling on a big.LITTLE design that generally works better than the alternative. big.LITTLE designs work best when developers specifically code into their applications how to use the cores, i.e. a virus scanner would pretty much always want to use an E core, and likewise for background tasks like checking for emails or indexing for fuzzy file search.

          This is because knowing what should run on an E core and what should run on a P core is primarily a subjective thing: whether making something deliberately slower to save power is acceptable depends on the context of the application.
          Processor architectures are created by humans, as are operating system kernels and schedulers. Even an NP-complete problem can be handled well enough with crude heuristics.

          Optimus is supposed to know which applications should run on the powerful Nvidia GPU cores while lesser applications get punted off to the iGPU. Sounds magical but really isn't. What Nvidia implemented was a dictionary of applications with static declarations of what should run where. Worked well enough that it did not get in the way of users who knew nothing about Optimus, while the more savvy ones went around editing said dictionary for their individual preferences.

          • #85
            Originally posted by Sonadow View Post

            Optimus is supposed to know which applications should run on the powerful Nvidia GPU cores while lesser applications get punted off to the iGPU. Sounds magical but really isn't. What Nvidia implemented was a dictionary of applications with static declarations of what should run where. Worked well enough that it did not get in the way of users who knew nothing about Optimus, while the more savvy ones went around editing said dictionary for their individual preferences.
            Yes, but GPUs have a very different workload; they are much more specialized than general-purpose CPUs. Also, Optimus is incredibly crude: it just watches how much load is on the GPU, and if the load is very high it moves the work to the discrete GPU.

            What we are dealing with in big.LITTLE is an entirely different problem, and it's already been shown numerous times that the scheduler does close to jack shit when it comes to optimizing anything. Steve from Gamers Nexus did extensive performance testing on Windows 10 (which has no Alder Lake-aware scheduler) vs Windows 11, and the difference is almost completely negligible (talking about <1%, if anything).

            This is what I meant when I said earlier that a "completely magical scheduler" doesn't really exist, at least not one that makes enough of a difference.

            • #86
              Originally posted by mdedetrich View Post

              Yes, but GPUs have a very different workload; they are much more specialized than general-purpose CPUs. Also, Optimus is incredibly crude: it just watches how much load is on the GPU, and if the load is very high it moves the work to the discrete GPU.

              What we are dealing with in big.LITTLE is an entirely different problem, and it's already been shown numerous times that the scheduler does close to jack shit when it comes to optimizing anything. Steve from Gamers Nexus did extensive performance testing on Windows 10 (which has no Alder Lake-aware scheduler) vs Windows 11, and the difference is almost completely negligible (talking about <1%, if anything).

              This is what I meant when I said earlier that a "completely magical scheduler" doesn't really exist, at least not one that makes enough of a difference.
              And yet Alder Lake performs much better on Windows than it does in Linux.

              There is definitely something going on in the Windows scheduler, that much is for sure. Microsoft is no stranger to big.LITTLE; they dealt with it before in Windows RT on the Tegra 3 and they recently worked with it again on the SQ1 and SQ2 for the Surface Pro X. It would hardly be a surprise if the lack of a performance difference for Alder Lake between Windows 10 and Windows 11 simply comes down to the fact that Microsoft has already worked on the scheduler to the point where the version in Windows 10 is punting jobs between the P cores and the E cores properly.

              • #87
                Originally posted by mdedetrich View Post

                I am getting the impression that the biggest issue is Intel trying to provide a solution for something that, from at least my OS studies back at uni, is not really solvable, i.e. automagic scheduling on a big.LITTLE design that generally works better than the alternative. big.LITTLE designs work best when developers specifically code into their applications how to use the cores, i.e. a virus scanner would pretty much always want to use an E core, and likewise for background tasks like checking for emails or indexing for fuzzy file search.

                This is because knowing what should run on an E core and what should run on a P core is primarily a subjective thing: whether making something deliberately slower to save power is acceptable depends on the context of the application.
                Intel is not trying to provide an "automagic" solution.

                Developers can still choose to directly schedule workloads onto each processor type (or, ideally, give the OS affinity hints as to what the workload type is); see this:
                "However, it may be more optimal to run background worker threads on the Efficient-cores. The API references in the next section lists many of the functions available, ranging from those providing OS level guidance through weak affinity hits, such as SetThreadIdealProcessor() and SetThreadPriority(), through stronger control like SetThreadInformation() and SetThreadSelectedCPUSets(), to the strongest control of affinity using SetThreadAffinityMask()."

                The idea is that both the software and the hardware (Thread Director) provide hints to the OS (Windows in this case), and the Windows scheduler matches the workload to the right core; Intel isn't forcing any sort of automagic scheduling (again, Thread Director only gives the OS hints about the current state of the cores).

                See the diagram and description in this section: "Intel® Thread Director and Operating System Vendor (OSV) Optimizations for the Performance Hybrid Architecture"

                To reiterate, a developer can also still choose hard affinities if they want: "through stronger control like SetThreadInformation() and SetThreadSelectedCPUSets(), to the strongest control of affinity using SetThreadAffinityMask()".

                All of the quotes are from this Intel developer guide: https://www.intel.com/content/www/us...per-guide.html
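                As a concrete illustration of that strongest-control path, here is a hedged Win32 sketch in C pinning a thread to assumed E cores; the mask (logical processors 16-23) is an assumption about one i9-12900K layout, and real code should query the topology (e.g. with GetLogicalProcessorInformationEx) instead of hardcoding it:

                Code:
                /* Hedged sketch using the documented Win32 calls quoted above.
                 * The E-core mask is an assumed i9-12900K layout (LPs 16-23). */
                #include <windows.h>
                #include <stdio.h>

                int main(void)
                {
                    HANDLE self = GetCurrentThread();

                    /* Weak hint: this is low-priority background work. */
                    SetThreadPriority(self, THREAD_PRIORITY_BELOW_NORMAL);

                    /* Strongest control: hard affinity to the assumed E cores. */
                    DWORD_PTR ecoreMask = 0xFF0000;  /* bits 16-23, assumed */
                    if (SetThreadAffinityMask(self, ecoreMask) == 0) {
                        printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
                        return 1;
                    }

                    /* ... background worker runs here on the assumed E cores ... */
                    return 0;
                }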

                • #88
                  Originally posted by Sonadow View Post

                  And yet Alder Lake performs much better on Windows than it does in Linux.

                  There is definitely something going on in the Windows scheduler, that much is for sure. Microsoft is no stranger to big.LITTLE; they dealt with it before in Windows RT on the Tegra 3 and they recently worked with it again on the SQ1 and SQ2 for the Surface Pro X. It would hardly be a surprise if the lack of a performance difference for Alder Lake between Windows 10 and Windows 11 simply comes down to the fact that Microsoft has already worked on the scheduler to the point where the version in Windows 10 is punting jobs between the P cores and the E cores properly.
                  Linux doesn't currently have the code to tell e-cores and p-cores apart, so the scheduler treats them identically, with terrible results. That is easily fixed and will be fixed soon.

                  Windows 11 apparently relies largely on the Intel hardware scheduler, but overrides it in certain places, for example by always making sure a graphical application that has focus gets put on a P core. I'd be surprised if their scheduler is actually even on par with Linux's; it's just that it's done and working, while the Linux code is completely turned off at the moment, so it's doing something brain-dead.
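                  For what it's worth, userspace can already see the P/E split once a hybrid-aware kernel is running. A small sketch, assuming the kernel exposes the hybrid PMU nodes /sys/devices/cpu_core/cpus and /sys/devices/cpu_atom/cpus (an assumption that only holds on recent hybrid-aware kernels):

                  Code:
                  /* Sketch: read which CPUs the kernel classifies as P and E cores.
                   * The sysfs paths are an assumption for hybrid-aware kernels. */
                  #include <stdio.h>

                  static void print_cpulist(const char *path, const char *label)
                  {
                      char buf[256];
                      FILE *f = fopen(path, "r");
                      if (!f) {
                          printf("%s: not exposed by this kernel\n", label);
                          return;
                      }
                      if (fgets(buf, sizeof(buf), f))
                          printf("%s: %s", label, buf);  /* e.g. "0-15" */
                      fclose(f);
                  }

                  int main(void)
                  {
                      print_cpulist("/sys/devices/cpu_core/cpus", "P cores");
                      print_cpulist("/sys/devices/cpu_atom/cpus", "E cores");
                      return 0;
                  }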

                  • #89
                    Originally posted by smitty3268 View Post

                    Windows 11 apparently relies largely on the Intel hardware scheduler, but overrides it in certain places, for example by always making sure a graphical application that has focus gets put on a P core.
                    Which is the most basic principle that nobody seems to get.

                    If an application is in focus, it means that the user intends to use it right there and then. There is no reason it should get shafted by being assigned to a lower-priority queue or, in Alder Lake's case, an E core. This has been the default behavior in Windows since Windows Vista, unlike Linux, where every user application gets assigned a nice value of 0 regardless of the amount of resources and focus time it gets.

                    Originally posted by smitty3268
                    I'd be surprised if their scheduler is actually even on par with Linux's; it's just that it's done and working, while the Linux code is completely turned off at the moment, so it's doing something brain-dead.
                    Right, a production scheduler that Intel and Microsoft built for Intel's own hardware on Windows, one that is in widespread deployment right now, is inferior to the Linux scheduler; so much so that the "inferior" option actually works properly today, while the Intel developers have to keep the corresponding code disabled in the Linux scheduler for various reasons.
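                    For contrast with the nice-0 point above, here is the manual counterpart available on Linux today: a process volunteering for lower priority via the standard setpriority() call (a minimal sketch; the focus-based boosting Windows does would still have to come from the scheduler itself):

                    Code:
                    /* Minimal sketch: a Linux process lowering its own priority,
                     * the manual counterpart to Windows' automatic focus boosting. */
                    #include <sys/resource.h>
                    #include <stdio.h>

                    int main(void)
                    {
                        /* Raise our nice value from the default 0 to 10. */
                        if (setpriority(PRIO_PROCESS, 0, 10) != 0) {
                            perror("setpriority");
                            return 1;
                        }
                        printf("now running at nice 10\n");
                        /* ... background work continues at lower priority ... */
                        return 0;
                    }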

                    • #90
                      Birdie is definitely not an Intel/NVIDIA fanboy, he just possesses a normal need to defend a certain company and a certain product across a 9-page comment thread. Just that, a normal need to defend a company/product, NOT a fanboy :lol: Anyway.

                      ADL is a decent CPU. The main issue is PL2, which is insane on the i9: totally impractical and beyond the optimal and sane operating parameters of such a CPU. I guess they needed an aggressive power mode to demonstrate more of an advantage over Zen 3 in the benchmarks. For me personally, none of the K-series SKUs, except maybe the 12600K, makes sense. Non-K and F SKUs will have all the benefits and will cost less.

                      As for the hybrid architecture on the desktop itself, it has potential in the future, where E cores will grow significantly in count. For Alder Lake, however, two more P cores instead of the 8 E cores would have made more sense. It would be a more universal architecture needing less software magic, and performance would be about the same.
