**FireBurn** · 04 December 2021, 11:13 AM

Any idea if AMD have looked into utilising this for their CCD/CCX's?

**CommunityMember** · 04 December 2021, 07:30 PM

Originally posted by FireBurn

Any idea if AMD have looked into utilising this for their CCD/CCX's?

I have little doubt that most manufacturers have investigated how best to schedule their CPUs (some of the classic NUMA designs are the poster-child examples, but newer architectures have similar issues), because it makes their systems look and work better. Historically, Intel and IBM have had the largest teams looking at these issues for their processors, while Arm's big.LITTLE has introduced new opportunities. AMD, while apparently hiring the right teams, has not yet shown it is in the same league as those early vendors. It should be noted that "optimal" scheduling is still considered a very hard problem (it is typically difficult to know the future), and heuristics are still sometimes just a WAG. The good (?)(*) news is that some hyperscalers care enough to have invested significant resources from their teams to get the most they can from their systems. We can hope the results will be positive for most of us.

(*) It is good in that it means more (often very smart) eyes on the problem. It is potentially bad in that the priority is a class of use that may not apply to anyone who is not a hyperscaler, although sometimes the dregs have value too.

**Alex/AT** · 05 December 2021, 02:55 AM

Originally posted by FireBurn

Any idea if AMD have looked into utilising this for their CCD/CCX's?

I don't think it's actually needed for Zen. Only the L3 is special on larger desktop and many server CPUs, where it may be split between CCDs.
L1 and L2 are still per core, and L3 latency is already high enough that the relatively small extra delay from accessing a 'foreign' L3 slice can be disregarded.
So preferring to schedule tasks on the same core normally does a good job there, no need to diverge. The poor Windows 11 experience clearly shows this as well.

**yump** · 05 December 2021, 10:46 AM

Originally posted by Alex/AT

I don't think it's actually needed for Zen. Only the L3 is special on larger desktop and many server CPUs, where it may be split between CCDs.
L1 and L2 are still per core, and L3 latency is already high enough that the relatively small extra delay from accessing a 'foreign' L3 slice can be disregarded.
So preferring to schedule tasks on the same core normally does a good job there, no need to diverge. The poor Windows 11 experience clearly shows this as well.

Home slice vs foreign slice latency is only half the problem. The other half is that if you schedule two unrelated threads on the same CCX, they will create cache pressure for each other.

Looking at comparison tests between the Ryzen 3 3100 (8+8 MiB L3) and the 3300X (16+0 MiB L3), I expect it could make a significant difference.
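For anyone who wants to see which logical CPUs actually share an L3 on their own Linux box, here is a rough Python sketch. It is not taken from the cluster scheduling patches, just an illustration. It assumes the usual sysfs layout where `cache/index3` is the L3 (the `level` file is checked to be safe), and the function names `parse_cpu_list`, `l3_domains`, `share_l3` and `read_l3_lists` are made up for this example.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_domains(shared_lists):
    """Return the distinct L3 sharing sets as a sorted list of tuples.

    shared_lists maps a CPU number to the text of its shared_cpu_list file.
    """
    return sorted({tuple(parse_cpu_list(text)) for text in shared_lists.values()})


def share_l3(cpu_a, cpu_b, domains):
    """True if both CPUs appear in the same L3 sharing set."""
    return any(cpu_a in d and cpu_b in d for d in domains)


def read_l3_lists(root="/sys/devices/system/cpu"):
    """Read shared_cpu_list for cache index3 of every CPU (Linux only).

    Assumption: index3 is the L3 cache. Entries whose 'level' file does not
    say 3 are skipped. Returns an empty dict if the directories are missing.
    """
    result = {}
    for cache_dir in Path(root).glob("cpu[0-9]*/cache/index3"):
        if (cache_dir / "level").read_text().strip() != "3":
            continue
        cpu = int(cache_dir.parts[-3][3:])  # 'cpu12' -> 12
        result[cpu] = (cache_dir / "shared_cpu_list").read_text()
    return result
```

Calling `l3_domains(read_l3_lists())` should print one tuple per L3 domain. On a part with a split L3 you would expect two or more tuples, and two unrelated heavy threads for which `share_l3` returns true are the ones that would be fighting over the same cache.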


Intel Making Cluster Scheduling Configurable, Disabled For Alder Lake Hybrid CPUs
