Mesa Drops Support For AMD Zen L3 Thread Pinning, Will Develop New Approach

  • #11
    How come the kernel doesn't know about the CPU topology? I mean, it makes perfect sense to schedule app threads as close together as possible: in the same CCX, or at least on nearby nodes where the NUMA hops are fewer. The same goes for the virtual memory -> physical memory layout.
    I thought this was already implemented, given that Linux is used on virtually all supercomputers.
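    For what it's worth, that grouping is visible from userspace too. A minimal sketch (my own illustration: it assumes sysfs is mounted and that cache index3 is the L3, which is what bounds a CCX on Zen; a robust version would check each index's level file) that prints which cores share an L3:

        /* List which CPUs share an L3 cache with each CPU.
         * Assumes cache/index3 is the L3; stops at the first
         * CPU that has no such sysfs entry. */
        #include <stdio.h>

        int main(void)
        {
            char path[128], list[256];
            for (int cpu = 0; ; cpu++) {
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                         cpu);
                FILE *f = fopen(path, "r");
                if (!f)
                    break;  /* no more CPUs (or no L3 info exposed) */
                if (fgets(list, sizeof(list), f))
                    printf("cpu%d shares L3 with: %s", cpu, list);
                fclose(f);
            }
            return 0;
        }

    On a Zen part each line should show one CCX's worth of cores; on a single-L3 CPU every line comes out identical.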



    • #12
      Originally posted by boxie
      Interesting, is that because this info passing does not exist in the kernel?
      Right... the black art is defining/communicating the information that a scheduler would need to "make the right decisions". In the absence of that, applications do things like pinning processes to cores or core groups, but that is not always beneficial.

      The obvious question is "why is it not always beneficial, what makes those scenarios different from the rest, and what information would the scheduler use to recognize this special case and know that a different rule set is appropriate?".
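      To make "pinning" concrete: a minimal sketch of what an application might do, not Mesa's actual code (the CPU numbers 0-3 are an assumption standing in for one CCX):

          /* Pin the calling thread to an assumed core group (cores 0-3),
           * the way an app might try to keep itself on one CCX. */
          #define _GNU_SOURCE
          #include <pthread.h>
          #include <sched.h>
          #include <stdio.h>

          int main(void)
          {
              cpu_set_t set;
              CPU_ZERO(&set);
              for (int cpu = 0; cpu < 4; cpu++)  /* assumed CCX: cores 0-3 */
                  CPU_SET(cpu, &set);
              int rc = pthread_setaffinity_np(pthread_self(),
                                              sizeof(set), &set);
              if (rc != 0)
                  fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);
              printf("now running on cpu %d\n", sched_getcpu());
              return 0;
          }

      Which is exactly where the question above bites: this can help when the pinned threads share data, and can hurt when those cores are busy while others sit idle.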



      • #13
        Originally posted by Drago
        How come the kernel doesn't know about the CPU topology? I mean, it makes perfect sense to schedule app threads as close together as possible: in the same CCX, or at least on nearby nodes where the NUMA hops are fewer. The same goes for the virtual memory -> physical memory layout.
        I thought this was already implemented, given that Linux is used on virtually all supercomputers.
        That's probably not the issue. Linux knows how cores are grouped if that information is available. I have done some benchmarking of the Linux scheduler (nothing big), and it seems that CFS has no problem keeping tasks on the same CCX when that's possible. Whether it will schedule related tasks on the same CCX, that I haven't tried. How you define a related task is another question to answer, but that's probably a fairly easy one. A harder one is when it's actually useful to schedule those related tasks on the same CCX and when they should instead be on another CCX for better performance.
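        One way to try it: a toy sketch (entirely my own construction, not a real benchmark) where two threads bounce a byte over pipes and then report where they ran; comparing the two CPU numbers against the L3 groups shows whether CFS kept the "related" pair on one CCX:

            /* Two communicating threads: does the scheduler co-locate them? */
            #define _GNU_SOURCE
            #include <pthread.h>
            #include <sched.h>
            #include <stdio.h>
            #include <unistd.h>

            static int ping[2], pong[2];   /* two pipes for the ping-pong */

            static void *worker(void *arg)
            {
                (void)arg;
                char b;
                for (int i = 0; i < 100000; i++) {
                    read(ping[0], &b, 1);
                    write(pong[1], &b, 1);
                }
                printf("worker last ran on cpu %d\n", sched_getcpu());
                return NULL;
            }

            int main(void)
            {
                pthread_t t;
                char b = 0;
                pipe(ping);
                pipe(pong);
                pthread_create(&t, NULL, worker, NULL);
                for (int i = 0; i < 100000; i++) {
                    write(ping[1], &b, 1);
                    read(pong[0], &b, 1);
                }
                printf("main   last ran on cpu %d\n", sched_getcpu());
                pthread_join(t, NULL);
                return 0;
            }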



        • #14
          Originally posted by boxie
          Interesting, is that because this info passing does not exist in the kernel?
          Schedulers tend to be relatively simple. Not quite round-robin level, but something that doesn't spend too many cycles deciding which task should run next. Notably, CFS ditched the heuristics that the previous scheduler had and improved performance. There were other things that contributed to it too, but generally you don't want to make scheduling too expensive.

          Usually schedulers know how cores are connected (NUMA, for example) and will try to schedule so that caches are used efficiently (there can be shared caches, for example). There have been some attempts to do things a bit differently, but I haven't seen anything yet that is much (well, measurably) better than CFS in the general case. Of course you can use whatever works best for you. Anyway, these new desktop NUMA-like architectures and chiplet designs are something that might spin up new ideas about how to redesign scheduling, so I'd not be too surprised if someone came up with something that performs better on those systems. We've had CFS for over ten years now.
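          As a small illustration of the connectivity information involved (a sketch reading the standard sysfs NUMA distance table, nothing scheduler-internal):

              /* Print each NUMA node's distance to every other node;
               * the scheduler builds its domains from data like this. */
              #include <stdio.h>

              int main(void)
              {
                  char path[96], line[256];
                  for (int node = 0; ; node++) {
                      snprintf(path, sizeof(path),
                               "/sys/devices/system/node/node%d/distance",
                               node);
                      FILE *f = fopen(path, "r");
                      if (!f)
                          break;          /* no more nodes */
                      if (fgets(line, sizeof(line), f))
                          printf("node%d distances: %s", node, line);
                      fclose(f);
                  }
                  return 0;
              }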

          Something that would be interesting to try, but probably not worth it, would be to use neurons or neural networks to make scheduling decisions. That would basically require one core to collect information about performance and update the weights, so lots of overhead, but it might be an interesting experiment anyway. Whether something like performance counters could be used to give feedback to the neurons is one problem. It's also not quite clear how much they could affect scheduling, because choosing random tasks (before the network is trained) is not going to work. This might sound a bit ridiculous, but AMD has used perceptrons for years in their branch predictor (at least Agner Fog says so) to learn patterns, and they really are just some numbers that are adjusted based on feedback.
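          To show how little machinery "a perceptron" means here, a toy branch-predictor-style sketch (the history length, training rule, and branch pattern are all made up for illustration; real predictors also train when the sum's magnitude is below a threshold):

              /* A perceptron predictor: integer weights over recent
               * branch history (+1 taken / -1 not taken), nudged on
               * every misprediction. */
              #include <stdio.h>

              #define HIST 8

              static int w[HIST + 1];    /* weights; w[0] is the bias */
              static int h[HIST];        /* global history, +1 / -1   */

              static int predict(void)
              {
                  int sum = w[0];
                  for (int i = 0; i < HIST; i++)
                      sum += w[i + 1] * h[i];
                  return sum >= 0;       /* 1 = predict taken */
              }

              static void train(int taken)
              {
                  int t = taken ? 1 : -1;
                  if (predict() != taken) {   /* adjust only on a miss */
                      w[0] += t;
                      for (int i = 0; i < HIST; i++)
                          w[i + 1] += t * h[i];
                  }
                  for (int i = HIST - 1; i > 0; i--)
                      h[i] = h[i - 1];        /* shift in the outcome */
                  h[0] = t;
              }

              int main(void)
              {
                  int hits = 0, n = 100000;
                  for (int i = 0; i < n; i++) {
                      int taken = (i % 4) != 0;   /* a repeating branch pattern */
                      hits += (predict() == taken);
                      train(taken);
                  }
                  printf("accuracy: %.1f%%\n", 100.0 * hits / n);
                  return 0;
              }

          After a short warm-up the weights settle and the pattern is predicted almost perfectly; the "learning" really is nothing more than those integer adjustments.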



          • #15
            Originally posted by bridgman

            Right... the black art is defining/communicating the information that a scheduler would need to "make the right decisions". In the absence of that, applications do things like pinning processes to cores or core groups, but that is not always beneficial.

            The obvious question is "why is it not always beneficial, what makes those scenarios different from the rest, and what information would the scheduler use to recognize this special case and know that a different rule set is appropriate?".
            This is a good question to ask! The answer, I'm guessing, is that sometimes knowing too much slows you down.

            As an interested bystander, I do thank you for your response; your answers are always insightful. Thank you for taking the time to educate us!



            • #16
              Originally posted by Tomin

              Schedulers tend to be relatively simple. Not quite round-robin level, but something that doesn't spend too many cycles deciding which task should run next. Notably, CFS ditched the heuristics that the previous scheduler had and improved performance. There were other things that contributed to it too, but generally you don't want to make scheduling too expensive.

              Usually schedulers know how cores are connected (NUMA, for example) and will try to schedule so that caches are used efficiently (there can be shared caches, for example). There have been some attempts to do things a bit differently, but I haven't seen anything yet that is much (well, measurably) better than CFS in the general case. Of course you can use whatever works best for you. Anyway, these new desktop NUMA-like architectures and chiplet designs are something that might spin up new ideas about how to redesign scheduling, so I'd not be too surprised if someone came up with something that performs better on those systems. We've had CFS for over ten years now.
              This all makes sense. I always thought the kernel knew more (a god-mode view of the system, so to speak) and knew which thread belongs where. It makes sense not to spend a large amount of time optimising; one of those "90% of the performance for 50% of the work" things.

              Originally posted by Tomin
              Something that would be interesting to try, but probably not worth it, would be to use neurons or neural networks to make scheduling decisions. That would basically require one core to collect information about performance and update the weights, so lots of overhead, but it might be an interesting experiment anyway. Whether something like performance counters could be used to give feedback to the neurons is one problem. It's also not quite clear how much they could affect scheduling, because choosing random tasks (before the network is trained) is not going to work. This might sound a bit ridiculous, but AMD has used perceptrons for years in their branch predictor (at least Agner Fog says so) to learn patterns, and they really are just some numbers that are adjusted based on feedback.
              An interesting idea, and if your processor has some sort of neural cores, it might actually be worth doing.

              As an interesting thought: I do wonder how long it will be before CPUs come with small "kernel cores" that have their own dedicated resources. We are seeing the advent of lots of other things that might be leading to this (secure enclaves, etc.).

