AMD Ryzen 9 7900X3D Linux Performance


  • #11
    The 13900K makes me happy: it matches a 7900X in non-gaming workloads even though 16 of its "24" cores are E-cores. 24 "full" vs. 24 "partial" cores, yet the partial ones keep up. Granted, that's at a much higher power consumption, but it does match it in the end. Pardon my ignorance, but I'm assuming the Linux kernel now fully supports Alder Lake's P-core/E-core split, since its performance here appears to be quite good? I was under the impression Linux still had scheduling problems. From the other thread I was happy to find out that Intel's CPPC implementation does work on Linux. Sorry, just getting excited that I might be able to come back to Linux sooner rather than later with my 13700K.



    • #12
      Michael, your tease about non-Linux benchmarks excites me on this hardware. I don't think you are gonna have a good time getting these brand new processors up on any *BSD or Illumos OS, but man, if you do, it would be AWESOME! You could completely fit a running *BSD system in this thing's L3 cache, since they measure around 30MB (though I've had a tty-only CentOS 7 system use only around 38MB before too...).



      • #13
        Originally posted by coder View Post
        I think it's a lot easier to address than Intel's situation with P-cores and E-cores.
        I don't think so. An Intel P-core is universally faster than an E-core. An AMD 3D CCD is not universally faster than a non-3D one, and vice versa, which basically means the scheduler needs to be a lot smarter to make optimal decisions.



        • #14
          Originally posted by drakonas777 View Post
          I don't think so. An Intel P-core is universally faster than an E-core.
          Not if the P-core is being shared by 2 threads. And if we're talking about a low-ILP task that's memory-bound, then it doesn't really matter where it's running.

          If it were such an easy problem, Intel wouldn't have created a hardware block (i.e. the Thread Director) for accumulating metrics about threads to help the OS' scheduler decide where to run them.

          Intel even claimed to have developed a deep learning model to translate the raw metrics into a classification the OS scheduler can use more easily.

          Originally posted by drakonas777 View Post
          An AMD 3D CCD is not universally faster than a non-3D one, and vice versa,
          The difference in clock speed is small enough that if you have a thread where the additional L3 cache makes a significant difference in hit rate, then it's a pretty obvious win to put it on the die with the additional cache.
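
          To make that concrete: the kind of signal that would identify such a thread is its cache hit rate, which you can already sample from userspace with perf counters. Below is a minimal sketch using the standard Linux perf_event_open interface. It only illustrates the measurement, not how an in-kernel scheduler would actually gather it, and the target PID is whatever you pass in:

// Illustration only: sample a task's cache references/misses with perf
// counters to judge whether extra L3 is likely to help it. An in-kernel
// scheduler would gather this differently; this is just the userspace view.
// Watching another PID needs a suitable perf_event_paranoid setting (or root).
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int open_counter(pid_t pid, uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    /* disabled is left at 0, so the counter starts as soon as it is opened */
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0);
}

int main(int argc, char **argv)
{
    pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : 0;   /* 0 = this process */
    int refs   = open_counter(pid, PERF_COUNT_HW_CACHE_REFERENCES);
    int misses = open_counter(pid, PERF_COUNT_HW_CACHE_MISSES);
    if (refs < 0 || misses < 0) {
        perror("perf_event_open");
        return 1;
    }

    sleep(1);   /* observe the task for one second */

    uint64_t r = 0, m = 0;
    if (read(refs, &r, sizeof(r)) < 0 || read(misses, &m, sizeof(m)) < 0)
        perror("read");
    printf("cache refs=%llu  misses=%llu  miss ratio=%.1f%%\n",
           (unsigned long long)r, (unsigned long long)m,
           r ? 100.0 * (double)m / (double)r : 0.0);
    return 0;
}

          A task that touches memory heavily and shows a meaningful miss ratio is the sort of candidate for the cache die; one that barely touches memory won't care where it runs.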



          • #15
            Originally posted by pieman View Post
            The 13900K makes me happy: it matches a 7900X in non-gaming workloads even though 16 of its "24" cores are E-cores. 24 "full" vs. 24 "partial" cores, yet the partial ones keep up.
            Err... if you mean the 7900X has "24 full" cores, you are off by a factor of 2. The 7900X has just 12 cores. So yes, the 13900K can keep up with the 7900X3D, but only with twice the number of CPU cores and almost twice the power consumption.



            • #16
              Originally posted by coder View Post
              Not if the P-core is being shared by 2 threads
              That does not change the fact that the P-core is the faster core.

              Originally posted by coder View Post
              And if we're talking about a low-ILP task that's memory-bound, then it doesn't really matter where it's running.
              It does not matter for this particular workload. Yet again, that does not change the fact that the P-core is the faster core.

              Originally posted by coder View Post
              If it were such an easy problem, Intel wouldn't have created a hardware block (i.e. the Thread Director) for accumulating metrics about threads to help the OS' scheduler decide where to run them.
              What Intel did is not proof that something is easier or harder.



              • #17
                Originally posted by agd5f View Post
                It's not an easy problem to solve regardless of the OS. If it performs well today in your use cases, I wouldn't worry about how much further it could be optimized. The scheduler doesn't really know what apps will benefit from more cache vs. more speed and there is not really a magic way to tell. Windows does not have native support for these sorts of asymmetries in their scheduler either. There are a number of ideas floating around (perf counters to look at historic trends in the app, adding hints to the binary in the compiler, etc.). I expect this will be a big area of research in the near future.
                This wouldn't be a "problem" if they didn't go with a stupid asymmetric design in the first place.

                I don't know who the hell came up with this trend, but it needs to stop. Give me large cache on ALL CORES god damn.



                • #18
                  Originally posted by coder View Post
                  I can't imagine performance counters would take more than a couple nanoseconds to read, whereas the duration of a timeslice is usually multiple milliseconds. So, you're only off by a mere 6 orders of magnitude or so. And you wouldn't even have to sample them every timeslice. Nice try, though.


                  I've done enough performance tuning to have seen my share of surprises. It will definitely take some experimentation and tuning of different approaches. But to just throw up your hands strikes me as very lame. This is a sufficiently straightforward problem that I'm sure there are scheduling strategies that can deliver a net win, or at least break even, on the substantial majority of workloads.


                  That has a strong temporal aspect to it, which tends to make it more challenging.
                  I like your enthusiasm, and I would love to be pleasantly surprised, but asymmetric multi-core CPUs have been around for more than a decade and I would argue it's still not a solved problem. If it were, why didn't Intel just build on one of the asymmetric scheduling approaches already available in Linux rather than doing the whole Thread Director thing? I realize some of it was probably NIH and marketing, but presumably they at least looked into it.
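
                  For reference, the main asymmetry support that already exists in the Linux scheduler is capacity-aware (EAS-style) scheduling, where each CPU advertises a relative capacity. A minimal sketch of peeking at those values, assuming the sysfs attribute is populated (it typically is on ARM big.LITTLE platforms whose firmware/devicetree describes per-core capacities, and typically isn't on x86):

// Illustration only: read the per-CPU capacity values that Linux's
// capacity-aware scheduling uses on asymmetric systems. The sysfs file
// is usually only present on platforms (e.g. ARM big.LITTLE) whose
// firmware/devicetree describes per-core capacities.
#include <stdio.h>

int main(void)
{
    for (int cpu = 0; ; cpu++) {
        char path[64];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpu_capacity", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;   /* no such CPU, or no capacity info exposed */
        long cap = 0;
        if (fscanf(f, "%ld", &cap) == 1)
            printf("cpu%d: capacity %ld\n", cpu, cap);
        fclose(f);
    }
    return 0;
}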



                  • #19
                    Originally posted by Weasel View Post
                    This wouldn't be a "problem" if they didn't go with a stupid asymmetric design in the first place.

                    I don't know who the hell came up with this trend, but it needs to stop. Give me large cache on ALL CORES god damn.
                    Is it a problem? Even without any special scheduling, it's still a net win in most things. Having better scheduling would just be icing on the cake.



                    • #20
                      Originally posted by Weasel View Post
                      This wouldn't be a "problem" if they didn't go with a stupid asymmetric design in the first place.

                      I don't know who the hell came up with this trend, but it needs to stop. Give me large cache on ALL CORES god damn.
                      Asymmetric designs are the appropriate future for desktop and laptop use, and perhaps for specific HEDT workstations and servers as well. After all, using GPUs for compute is effectively an asymmetric approach, though it comes with some obvious challenges. We're reaching a point where it's getting harder to balance cost, performance, and efficiency. The vast majority of workloads we have, even as power users, don't require every core to be a huge-cache P-core (to combine AMD's and Intel's approaches). So long as each component isn't sitting idle (or, at the very least, is barely drawing any power when it is), I think it's fine to sacrifice some peak performance for the sake of a chip that consumes a fraction of the power it otherwise would. Or, if you don't care that much about power consumption: a chip that is significantly cheaper.

                      V-cache is expensive and limits peak performance due to thermal constraints. As of today, there are very few situations where a non-HEDT PC would justify the cost of V-caching all 12-16 cores. The same goes for Intel releasing a monolithic die with 10+ P-cores.

                      While I think the scheduler can be tweaked to intelligently figure out which tasks go to which cores/threads, I think we're reaching a point where software needs to specify what kind of resources it needs. Generally speaking, user-initiated processes could default to the V-cache/P cores while system-initiated processes could default to the non-V-cache/E cores.
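
                      Until such hints exist, the closest userspace stand-in is plain CPU affinity. Here is a minimal sketch of pinning a process to one CCD, assuming (and this is only an assumption; verify with lscpu -e or the cache entries under /sys/devices/system/cpu/) that the V-cache CCD is physical cores 0-5 with SMT siblings 12-17:

// Illustration only: pin the calling process to one CCD via sched_setaffinity().
// The CPU numbers are an assumption (cores 0-5 plus SMT siblings 12-17);
// check the real topology on your machine before relying on them.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int core = 0; core < 6; core++) {
        CPU_SET(core, &set);        /* cores assumed to sit on the V-cache CCD */
        CPU_SET(core + 12, &set);   /* their assumed SMT siblings */
    }
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* From here on, this process (and children it spawns, which inherit the
       mask by default) will only run on the selected CCD. */
    puts("pinned to the assumed V-cache CCD");
    return 0;
}

                      The shell equivalent would be taskset -c 0-5,12-17 <command>, with the same caveat about the numbering.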

