New Linux System Call Proposed To Let User-Space Pin Themselves To Specific CPU Cores


  • #21
    No, cpu_set_t is a POD without any default-initialization. Clearing it via CPU_ZERO before setting with CPU_SET is correct.
    [EDIT]: I was partly wrong and partly right. Initializing with = {} value-initializes (zeroes) the POD's members, but it isn't guaranteed that this is the same as doing CPU_ZERO. So you should still stick with CPU_ZERO.
    Last edited by Flodul; 22 January 2020, 02:35 PM.
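The clear-then-set pattern recommended in this post can be sketched as follows. This assumes glibc on Linux, where the CPU_* macros come from <sched.h> and need _GNU_SOURCE (g++ defines it by default). make_single_cpu_mask is a hypothetical helper name used only for illustration.

```cpp
#include <sched.h>   // cpu_set_t, CPU_ZERO, CPU_SET, CPU_SETSIZE (glibc)
#include <cassert>

// Hypothetical helper: build a mask that contains exactly one CPU,
// using only the documented macros and never touching the members directly.
inline cpu_set_t make_single_cpu_mask(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;        // contents are indeterminate at this point
    CPU_ZERO(&set);       // documented way to clear the set
    CPU_SET(cpu, &set);   // add the one CPU we want
    return set;
}
```

CPU_COUNT and CPU_ISSET can then be used to inspect the result, again without relying on how cpu_set_t is laid out internally.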


    • #22
      This idea sounds ill conceived. It would be better to put this facility in /proc.


      • #23
        Originally posted by Neraxa
        This idea sounds ill conceived. It would be better to put this facility in /proc.
        Grouping a number of threads to a group of hw-threads with a shared cache through /proc? WTF?


        • #24
          Originally posted by Flodul
          I think it would be best if Linux partially copied the semantics of Windows' SetThreadIdealProcessor.
          With that, a thread gets a preferred processor / HW thread to run on, but it may also run on another core if the ideal core is currently occupied. It returns to the ideal core once that core becomes available again, to keep its working set in the ideal processor's caches.
          What is the use case for that API? Once the thread has been scheduled onto another core because the ideal core was occupied, the caches are thrashed anyway. The Linux scheduler basically has this behaviour out of the box: it tries to keep a thread running on the same core when scheduling, so this API would change more or less nothing.

          Now, my Windows experience is quite old, but back in the W2K days the Windows scheduler would constantly shift a thread between cores. If that is still the case, then I can see that this particular API has more merit on Windows, to prevent the core hopping.


          • #25
            Originally posted by -MacNuke-

            At least on Linux you can do it pretty simply with pthreads (the example uses std::thread in C++; i is the CPU/thread number, so thread 1 is pinned to CPU 1 and so on):
            Code:
            cpu_set_t cpuset;
            CPU_ZERO(&cpuset);
            CPU_SET(i, &cpuset);
            int rc = pthread_setaffinity_np(threads[i].native_handle(), sizeof(cpu_set_t), &cpuset);
            The problem is mentioned in the proposal. This approach goes nuts when you start to remove / hotplug CPUs
            How often does that happen for the majority of Linux users? I don’t even know of any local systems with hot-pluggable CPUs. It does make me wonder whether, in the opposite direction, this would have any value on embedded systems.
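A minimal sketch of a somewhat more defensive variant of the quoted snippet: it first asks the kernel which CPUs the thread may currently use, then pins to one of those and checks the result. To avoid needing a std::thread handle, it swaps in sched_setaffinity with pid 0 (the calling thread) for pthread_setaffinity_np; the same mask could be passed to pthread_setaffinity_np(threads[i].native_handle(), ...) instead. The helper names are hypothetical, and the code assumes glibc on a system with at most CPU_SETSIZE (1024) CPUs. It does not solve the hotplug problem from the proposal; it only reports failure instead of silently assuming that CPU i exists.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros (glibc)
#include <cassert>
#include <cerrno>

// Hypothetical helper: lowest-numbered CPU the calling thread is currently
// allowed to run on, or -1 if the affinity mask cannot be read.
inline int first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) return cpu;
    }
    return -1;
}

// Hypothetical helper: pin the calling thread to a single CPU.
// Returns 0 on success, otherwise the errno value from sched_setaffinity
// (for example EINVAL when the CPU is offline or not permitted).
inline int pin_self_to(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : errno;
}

// Hypothetical helper: true if the calling thread's mask is exactly {cpu}.
inline bool self_is_pinned_to(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return false;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

A worker would call first_allowed_cpu() (or pick any other CPU from the allowed mask) at startup and treat a non-zero return from pin_self_to() as "run unpinned".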


            • #26
              Originally posted by F.Ultra
              What is the use case for that API? Once the thread has been scheduled onto another core because the ideal core was occupied, the caches are thrashed anyway.
              The L1 cache maybe, but not necessarily higher core-specific cache levels.


              Originally posted by F.Ultra
              Now, my Windows experience is quite old, but back in the W2K days the Windows scheduler would constantly shift a thread between cores. If that is still the case, then I can see that this particular API has more merit on Windows, to prevent the core hopping.
              Win10 doesn't have this behaviour anymore.


              • #27
                Originally posted by Flodul

                The L1 cache maybe, but not necessarily higher core-specific cache levels.


                Win10 doesn't have this behaviour anymore.
                And don't forget that L1 is shared by the two (or more) threads in an SMT-capable processor.

                And, yes, Win10 still does this. Maybe it's not supposed to, but I can watch it do it.


                • #28
                  Originally posted by wizard69

                  How often does that happen for the majority of Linux users? I don’t even know of any local systems with hot-pluggable CPUs. It does make me wonder whether, in the opposite direction, this would have any value on embedded systems.
                  See my previous comment. There are (or at least were) CPUs where the added silicon for frequency scaling was not justifiable, so the OS (Linux) removed the CPU cores from the list of CPUs instead. Unfortunately I can't find that notebook's review anymore (it was in the era of the Radeon 9000, so it's ancient).

                  Originally posted by Ladis
                  I even remember one notebook with a MIPS CPU (it had a Radeon 9000/9100 iGPU in the chipset, a long time ago) where frequency changing worked only for the first core; the other 3 cores didn't support frequency scaling but could be turned off completely (they even disappeared from the list of CPUs in Linux).
                  Also, thinking about it, ARM chips with the big.LITTLE design originally "hot-plugged" CPU cores (when switching between long battery life and maximum performance, they added the 4 big or 4 little cores, moved the running processes over, and removed the other 4 cores). Nowadays big.LITTLE ARMs simply run all cores simultaneously (so vendors can say they are "8 cores"; some cheap phones advertised as 8-core simply had 8 small cores and no big core).
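Cores that appear and disappear like this can be observed from user space. A minimal sketch, assuming the Linux sysfs file /sys/devices/system/cpu/online, which lists the currently online CPUs in the kernel's "cpulist" text format (for example "0-3,6"). parse_cpu_list and online_cpus are hypothetical helper names, and the parser assumes well-formed input with ascending ranges.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expand cpulist text such as "0-3,6\n" into {0,1,2,3,6}.
inline std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        // Skip leading whitespace; ignore items that contain no digits.
        const auto first = item.find_first_of("0123456789");
        if (first == std::string::npos) continue;
        item = item.substr(first);
        const auto dash = item.find('-');
        const int lo = std::stoi(item.substr(0, dash));
        const int hi = (dash == std::string::npos)
                           ? lo
                           : std::stoi(item.substr(dash + 1));
        assert(lo <= hi);  // assumption: ranges are written low-high
        for (int cpu = lo; cpu <= hi; ++cpu) cpus.push_back(cpu);
    }
    return cpus;
}

// Hypothetical helper: read the list of online CPUs.
// Returns an empty vector if the file cannot be read.
inline std::vector<int> online_cpus(
        const std::string& path = "/sys/devices/system/cpu/online") {
    std::ifstream in(path);
    std::string line;
    if (!std::getline(in, line)) return {};
    return parse_cpu_list(line);
}
```

A program that pins threads could re-read this list before choosing a CPU number, although a core can still go offline between the read and the affinity call.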


                  • #29
                    Originally posted by Flodul

                    The L1 cache maybe, but not necessarily higher core-specific cache levels.

                    Win10 doesn't have this behaviour anymore.
                    That assumes that both threads (or however many more get scheduled in between) have very small datasets and do very little data shuffling, at which point the actual benefit of the affinity is even lower, unless you have a very specific use case where your thread is mostly idle (or touches very little data) but must have realtime semantics at random intervals.

                    If Win10 does not have this behaviour anymore, then that API is even more pointless.


                    • #30
                      > Is there any hardware that supports CPUs with different speeds and capabilities (extensions) of the same architecture?

                      Originally posted by Britoid

                      ARM big.LITTLE chips.
                      big.LITTLE cores have different speeds (both clock speed and instruction latency/throughput) but not different capabilities (extensions). All the cores support the same instruction sets, so programs can be moved freely between cores.
                      Last edited by phuclv; 27 January 2020, 12:59 AM.
