Intel Posts Big Linux Patch Set For "Classes of Tasks" On Hybrid CPUs, Thread Director


  • #11
    Originally posted by Linuxxx View Post

    I'm not quite sure if this will actually help to implement an asymmetric ISA.

    To me, it sounds like this is intended strictly for performance (IPC) reasons:



    Maybe someone more knowledgeable on this topic can chime in...
    I have no special knowledge in this area, but I read it the same way you do. As it stands today, this does not enable heterogeneous AVX-512 support.

    That said, I do think it opens the door for future work that uses it to detect AVX-512/AMX instructions and sort threads accordingly.

    Comment


    • #12
      Originally posted by atomsymbol

      After reading the description provided with the patch series, I have doubts whether this Linux implementation of Thread Director technology will be able to schedule Linux threads -- when the number of currently running Linux threads is larger than 10 -- among P-cores and E-cores so that the end-result will be close to the best placement of those Linux threads among all the cores, if the only optimality measure is how long (wall-clock time) it takes to complete the tasks (without taking power consumption into account).

      The examples described in the patch series are only about SMT scheduling (P-cores have SMT, E-cores don't have it). It is highly questionable whether enabling the INTEL_THREAD_DIRECTOR kernel option solves something other than just the SMT scheduling problem on homo-ISA hetero-SMT CPUs.
      So no way forward for a hypothetical asymmetric ISA?

      Which then would mean that both full-width AVX-512 & AMX are not coming to any consumer Intel CPU for the foreseeable future.

      Comment


      • #13
        Originally posted by atomsymbol

        A user-space app already has the ability to measure its IPC by using Linux performance counters/events and, based on the measured IPC, it can decide whether to pin its thread to a specific CPU core or to a specific subset of CPU cores. ---- What additional missing "user-space component" do you mean? ---- The question then would be: If an app isn't already using the existing Linux perf counters to optimize the placement of its threads on the CPU, then why (for what reason) would the app be using the user-space component of Intel Thread Director technology? In other words: who (what apps) would be using the user-space component?
        Are you saying all Linux applications must be rewritten to properly support big.LITTLE? Not going to happen. It's not how it's done in Android/Windows/MacOS/iOS either. I'm talking about an application/daemon/service similar to earlyoom/systemd-oomd. The kernel has no idea which apps are important/unimportant, or in foreground/background, etc.
        Last edited by birdie; 10 September 2022, 03:07 PM.
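
        For reference, the approach described in the quote (measure IPC, then pin) could be sketched roughly as below. This is only an illustration: the core numbering in P_CORES and E_CORES, the 1.5 threshold, and the rule "high IPC goes to P-cores" are all assumptions made up for the example, and the instruction and cycle counts are expected to come from whatever perf counter reader the app already has.

```python
import os

# Hypothetical core layout, for illustration only; a real program would
# discover it at runtime instead of hard-coding it.
P_CORES = {0, 1, 2, 3}
E_CORES = {4, 5, 6, 7}


def ipc(instructions, cycles):
    """Instructions per cycle from two raw counter readings."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def pick_cores(measured_ipc, threshold=1.5):
    """Choose a core set from the measured IPC (threshold is an assumption)."""
    return P_CORES if measured_ipc >= threshold else E_CORES


def pin_current_thread(cpus):
    """Restrict the calling thread to those of `cpus` it is allowed to use."""
    allowed = os.sched_getaffinity(0)
    target = set(cpus) & allowed
    if not target:
        # None of the requested CPUs are available: leave affinity untouched.
        return allowed
    os.sched_setaffinity(0, target)
    return target
```

        Usage would be something like pin_current_thread(pick_cores(ipc(instructions, cycles))). Every application has to carry this logic itself, which is the point being argued over in this post.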

        Comment


        • #14
          Originally posted by birdie View Post
          Never optimized for SMT? I'd say this is a dead OS in this case. HT/SMT absolutely requires optimization or otherwise your performance may tank. Say you're running a lightly multithreaded app with e.g. 2 threads and the system is idle: instead of scheduling the app on physical cores 0 and 1, you schedule it on core 0 and its sibling SMT/HT thread. Now, instead of getting double the performance, you're getting (at best) a 30% performance increase. And there are a ton of similar use cases where SMT/HT absolutely needs to be taken into consideration. And of course you absolutely need to optimize for NUMA and Zen CCXes.
          apple m1/m2 has no HT/SMT ... so it is not clear if this is the future. if you just go with ARM cpus and ignore intel technology...
          Phantom circuit Sequence Reducer Dyslexia
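
          To make the quoted SMT example concrete, here is a small sketch of how one logical CPU per physical core could be picked from the Linux sysfs topology files, so that two threads do not land on SMT siblings. The helper names are made up for this example; the sysfs path is the standard thread_siblings_list file, and the fallback for a missing file is an assumption.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,8" or "0-1" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings_by_cpu):
    """Keep only the lowest-numbered logical CPU of every physical core."""
    return sorted({parse_cpu_list(s)[0] for s in siblings_by_cpu.values()})


def read_siblings():
    """Read thread_siblings_list for every CPU this process may run on."""
    result = {}
    for cpu in os.sched_getaffinity(0):
        path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
        try:
            with open(path) as f:
                result[cpu] = f.read()
        except OSError:
            # Assumption: without topology info, treat the CPU as its own core.
            result[cpu] = str(cpu)
    return result
```

          On a machine where CPUs 0 and 4 share a core, one_cpu_per_core(read_siblings()) would keep 0 and drop 4, which is the placement the quote asks the scheduler to prefer for a two-thread app.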

          Comment


          • #15
            Originally posted by qarium View Post

            apple m1/m2 has no HT/SMT ... so it is not clear if this is the future. if you just go with ARM cpus and ignore intel technology...
            I'd say that the "future" is actually multiple cores and processor units uniquely optimized for performance versus energy and thermal efficiency rather than playing tricks with how many threads can play at one time on the head of a single core.

            This is already playing out with the various processing units already being deployed on cell phones and the Apple laptops. Intel is moving in that same direction less obviously (in the way they are marketing things) but they already have neural units, image processing units, graphical processing units, general processing units, client cryptographic units, DSPs on the desktop alone. That doesn't include the vectorizations, data processing units, server side cryptographic processing units, network packet processing units, analytic units, etc etc for server specific processing.

            Software developers are going to have to learn how to push their data around better and more efficiently than in the past. (Read what Netflix has done with FreeBSD for an outstanding example of figuring out which processors work best for which loads.)

            Regardless of what OpenBSD decides to do, any operating system that can't dump the proper loads to whichever processing unit best supports that functionality won't be used by anyone but niche diehards. But don't believe for one moment any of the proprietary software or firmware support those processing units require will actually make it to Linux (or BSD or...) unless someone manages to reverse engineer them. Microsoft Windows is still the bread and butter of all of those big names, and Microsoft's customers are businesses which run Windows client on the desktop and Windows Server in the back office (which isn't the same thing as the data center). Those hardware vendors will only push out just enough of their technology into the open source world to produce efficient (enough) servers for their particular customers (by and large the data center owners excluded above). The rest of it will remain closed off (much like the IPU in the 12th Gen Intel platform).

            Comment


            • #16
              Originally posted by atomsymbol

              There's already nice, ionice, cgroups and ulimit.
              I thought we were talking about a user-friendly OS. And, yeah, we already have taskset, but how usable is it for 99.99% of people out there? It may as well not exist at all.

              Comment


              • #17
                Has anyone considered a 'really dumb' scheduler for big.LITTLE-type systems? One where all tasks start round-robin with strong affinity across E cores, only migrating to a P core when they've used 1-5 seconds tapping a core at >80%? I feel like KISS might be the best approach from a practical perspective.
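
                A toy model of that policy, just to pin down the rules. It is a sketch and not kernel code: the class name, the 2-second promotion window (picked from the 1-5 second range above), and the choice to reset the timer whenever utilization drops are all assumptions. Demotion back to an E core is not modeled because the idea above doesn't describe it.

```python
from itertools import cycle


class DumbHybridScheduler:
    """Start every task on an E core; promote it after sustained high load."""

    def __init__(self, e_cores, p_cores, busy_threshold=0.80, promote_after_s=2.0):
        self._next_e = cycle(e_cores)   # round-robin over E cores
        self._next_p = cycle(p_cores)   # round-robin over P cores
        self.p_cores = set(p_cores)
        self.busy_threshold = busy_threshold
        self.promote_after_s = promote_after_s
        self.placement = {}             # task id -> core id
        self.busy_time = {}             # task id -> seconds continuously busy

    def admit(self, task):
        """Place a new task on the next E core."""
        self.placement[task] = next(self._next_e)
        self.busy_time[task] = 0.0
        return self.placement[task]

    def observe(self, task, utilization, interval_s):
        """Feed one utilization sample (0.0-1.0) covering interval_s seconds."""
        if self.placement[task] in self.p_cores:
            return self.placement[task]          # already promoted
        if utilization > self.busy_threshold:
            self.busy_time[task] += interval_s
        else:
            self.busy_time[task] = 0.0           # streak broken, start over
        if self.busy_time[task] >= self.promote_after_s:
            self.placement[task] = next(self._next_p)
        return self.placement[task]
```

                With E cores [4, 5] and P cores [0, 1], a task admitted to core 4 that reports 90% utilization for two consecutive 1-second samples ends up on core 0, while a task whose load dips in between stays on its E core.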

                Comment


                • #18
                  Originally posted by atomsymbol

                  You were claiming that "The kernel has no idea which apps are important/unimportant, or in foreground/background, etc" which isn't true. In my opinion, the truth is that many apps can freely keep neglecting most of the already existing performance&memory optimization mechanisms offered by the Linux kernel because those mechanisms aren't being enforced by the kernel. The usage of those mechanisms is optional, informal (that is: it doesn't have a formal mathematical representation):
                  • A suboptimal scheduling decision doesn't result in the app receiving a signal from the Linux kernel (in C/C++: #include <signal.h> and similar header files)
                  • If an app is starting, the Linux kernel isn't forcing the app to examine whether the previous run of the app crashed
                  In summary: The base problem is that, in Linux as well as in other operating systems, performance doesn't have the form of binary data that could be read by a user-space executable.
                  The kernel does not know which apps are in background/foreground/important/unimportant, period. There's no API to interact with the process scheduler to give it these hints outside of process priority (niceness value) or scheduling policies (SCHED_FIFO, SCHED_RR, SCHED_DEADLINE, SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE) and even though they exist, no desktop environment or window manager that I'm aware of is using them. Lastly, the Linux kernel does not allow the user to raise the priority of the task once it's been decreased which makes things even more complicated.

                  There has to be a new mechanism to hint the kernel about which cores are preferable for execution because niceness and scheduling priorities serve a whole different purpose. Looks like you've no idea what you're talking about. big.LITTLE requires optimizations and APIs which the normal kernel does not yet provide.
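
                  For anyone who wants to see what the existing hints look like from user space, here is a minimal sketch using Python's os module on Linux. The function names are invented for the example, only the policies in the table are covered, and whether the policy change succeeds depends on the caller's privileges, so failure is reported instead of raised.

```python
import os

# Human-readable names for the policies covered by this sketch.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe(pid=0):
    """Return (policy name, nice value) for a process; pid 0 is the caller."""
    policy = os.sched_getscheduler(pid)
    nice = os.getpriority(os.PRIO_PROCESS, pid)
    return POLICY_NAMES.get(policy, f"policy {policy}"), nice


def mark_background(pid=0):
    """Try to switch a process to SCHED_BATCH; return True on success."""
    try:
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
        return True
    except OSError:
        # Covers PermissionError and any other refusal by the kernel.
        return False
```

                  Both knobs say how much CPU time a task should get relative to others. Neither one names a core type.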

                  Comment


                  • #19
                    Originally posted by birdie View Post
                    Lastly, the Linux kernel does not allow the user to raise the priority of the task once it's been decreased which makes things even more complicated.
                    That's not true.

                    From nice(2) manpage:

                    since Linux 2.6.12, an unprivileged process can decrease the nice value of a
                    target process that has a suitable RLIMIT_NICE soft limit; see getrlimit(2) for details.
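
                    As a quick way to check what that limit allows on a given system: getrlimit(2) documents the ceiling as 20 - rlim_cur, so the lowest reachable nice value can be computed as in the sketch below. The helper names are made up, and treating an unlimited soft limit as -20 is an assumption.

```python
import resource


def nice_floor(soft_limit):
    """Lowest nice value allowed by an RLIMIT_NICE soft limit.

    A result above 19 means the process may not lower its nice value at all.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return -20  # assumption: unlimited maps to the lowest nice value
    return max(20 - soft_limit, -20)


def current_nice_floor():
    """Apply nice_floor() to the calling process's own soft limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NICE)
    return nice_floor(soft)
```

                    A soft limit of 0 gives a floor of 20, so priority can never be raised again, while a limit of 30 would let the process go back down to nice -10.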

                    Comment


                    • #20
                      Originally posted by birdie View Post
                      There has to be a new mechanism to hint the kernel about which cores are preferable for execution because niceness and scheduling priorities serve a whole different purpose. Looks like you've no idea what you're talking about. big.LITTLE requires optimizations and APIs which the normal kernel does not yet provide.
                      That brings us two questions:
                      - who is allowed to give these hints (i.e. only same user or CAP_SYS_ADMIN)?
                      - how to treat these hints? What if these hints are wrong?

                      Lastly, Arm switched to the big.LITTLE architecture long ago, and they didn't seem to need any of this?

                      Comment
