The 13900K makes me happy that it matches a 7900X in non-gaming workloads, considering 16 of its "24" cores are E-cores. 24 "full" vs 24 "partial", yet the partial matches it. Granted, at much higher power consumption, but it does match it in the end. Pardon my ignorance, but I'm assuming the Linux kernel now fully supports Alder Lake's P-core vs E-core split, since its performance appears to be doing quite well? I was under the impression Linux still had scheduling problems. From the other thread I was happy to find out that Intel's CPPC version does work on Linux. Sorry, just getting excited that I might be able to come back to Linux sooner rather than later with my 13700K.
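For what it's worth, one quick way to see whether a kernel has Intel's hybrid/ITMT core-priority scheduling active is the sched_itmt_enabled sysctl that the intel_pstate driver flips on for supported CPUs. A minimal sketch (the knob only exists on recent x86 kernels, so the fallback branch is expected elsewhere):

```python
from pathlib import Path

# Rough check: intel_pstate enables ITMT ("core priority") scheduling on
# supported CPUs; the sysctl only exists on recent x86 kernels.
KNOB = Path("/proc/sys/kernel/sched_itmt_enabled")

def itmt_status() -> str:
    if KNOB.is_file():
        return f"sched_itmt_enabled = {KNOB.read_text().strip()}"
    return "ITMT knob not present (older kernel, non-x86, or non-hybrid CPU)"

print(itmt_status())
```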
AMD Ryzen 9 7900X3D Linux Performance
Michael, your tease about non-Linux benchmarks excites me on this hardware. I don't think you are gonna have a good time getting these brand-new processors up on any *BSD or Illumos OS, but man, if you do, it would be AWESOME! You could completely fit a running *BSD system in this thing's L3 cache, since they measure around 30MB (though I've had a tty-only CentOS 7 system use only around 38MB before too...).
Likes 1

Comment
Originally posted by coder: I think it's a lot easier to address than Intel's situation with P-cores and E-cores.
Comment
Originally posted by drakonas777: I don't think so. Intel P core is universally faster than an E core.
If it were such an easy problem, Intel wouldn't have created a hardware block (i.e. the Thread Director) for accumulating metrics about threads to help the OS' scheduler decide where to run them.
Intel even claimed to have developed a deep learning model to translate the raw metrics into a classification the OS scheduler can use more easily.
Originally posted by drakonas777: AMD 3D CCD is not universally faster than a non-3D one and vice versa,
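As a toy illustration only (Intel's actual Thread Director model and thresholds are not public in this form), the kind of metrics-to-class mapping the scheduler is being fed might look like:

```python
# Toy stand-in for Thread Director-style classification. Illustrative only:
# the metric names and thresholds here are invented, not Intel's real model.
def classify_thread(ipc: float, vector_ratio: float) -> str:
    """Map coarse per-thread metrics to a core-placement hint."""
    if vector_ratio > 0.25:      # heavy AVX/VNNI work favors a P-core
        return "P-core"
    if ipc < 0.5:                # likely memory-bound; an E-core is fine
        return "E-core"
    return "P-core" if ipc > 2.0 else "either"
```

The real hardware accumulates such metrics continuously and hands the OS a class ID, so the scheduler never has to read raw counters itself.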
Comment
Originally posted by pieman: The 13900K makes me happy that it matches a 7900X in non-gaming workloads, considering 16 of its "24" cores are E-cores. 24 "full" vs 24 "partial", yet the partial matches it.
Likes 2

Comment
Originally posted by coder: Not if the P-core is being shared by 2 threads.
Originally posted by coder: And if we're talking about a low-ILP task that's memory-bound, then it doesn't really matter where it's running.
Originally posted by coder: If it were such an easy problem, Intel wouldn't have created a hardware block (i.e. the Thread Director) for accumulating metrics about threads to help the OS' scheduler decide where to run them.
Comment
Originally posted by agd5f: It's not an easy problem to solve regardless of the OS. If it performs well today in your use cases, I wouldn't worry about how much further it could be optimized. The scheduler doesn't really know which apps will benefit from more cache vs. more speed, and there is no magic way to tell. Windows does not have native support for these sorts of asymmetries in its scheduler either. There are a number of ideas floating around (perf counters to look at historic trends in the app, adding hints to the binary in the compiler, etc.). I expect this will be a big area of research in the near future.
This wouldn't be a "problem" if they didn't go with a stupid asymmetric design in the first place.
I don't know who the hell came up with this trend, but it needs to stop. Give me large cache on ALL CORES god damn.
Comment
Originally posted by coder: I can't imagine performance counters would take more than a couple nanoseconds to read, whereas the duration of a timeslice is usually multiple milliseconds. So, you're only off by a mere 6 orders of magnitude or so. And you wouldn't even have to sample them every timeslice. Nice try, though.
I've done enough performance tuning to have seen my share of surprises. It will definitely take some experimentation and tuning of different approaches. But to just throw up your hands strikes me as very lame. This is a sufficiently straightforward problem that I'm sure there are scheduling strategies that can deliver a net win or break even on the substantial majority of workloads.
That has a strong temporal aspect to it, which tends to make it more challenging.
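The order-of-magnitude argument above is easy to sanity-check. The exact costs below are assumed round numbers, not measurements:

```python
# Back-of-envelope check: sampling a few perf counters (~tens of ns via
# rdpmc/rdmsr) vs. a typical scheduler timeslice (~1-10 ms). Both figures
# are assumptions for illustration, not measured values.
counter_read_ns = 50          # assumed cost of reading a few counters
timeslice_ns = 4_000_000      # assumed 4 ms timeslice

overhead = counter_read_ns / timeslice_ns
print(f"overhead per timeslice: {overhead:.2e}")  # ~1e-5, i.e. negligible
```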
Likes 1

Comment
Originally posted by Weasel: This wouldn't be a "problem" if they didn't go with a stupid asymmetric design in the first place.
I don't know who the hell came up with this trend, but it needs to stop. Give me large cache on ALL CORES god damn.
Likes 3

Comment
Originally posted by Weasel: This wouldn't be a "problem" if they didn't go with a stupid asymmetric design in the first place.
I don't know who the hell came up with this trend, but it needs to stop. Give me large cache on ALL CORES god damn.
V-cache is expensive and limits peak performance due to thermal constraints. As of today, there are very few situations where a non-HEDT PC would justify the cost of V-caching all 12-16 cores. The same goes for Intel releasing a monolithic die with 10+ P-cores.
While I think the scheduler can be tweaked to intelligently figure out which tasks go to which cores/threads, I think we're reaching a point where software needs to specify what kind of resources it needs. Generally speaking, user-initiated processes could default to the V-cached/P cores while system-initiated processes default to the non-V-cached/E cores.
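Until that kind of hinting exists, software can approximate it today with CPU affinity. A minimal sketch using Linux's sched_setaffinity; the core numbering below is a guess for a 7900X3D and must be checked against /sys/devices/system/cpu/ on the actual machine, since the kernel's CPU enumeration varies:

```python
import os

# Hypothetical layout for a 7900X3D: CCD0 (cores 0-5, SMT siblings 12-17)
# carries the 3D V-Cache. This numbering is an assumption -- verify it via
# /sys/devices/system/cpu/cpuN/topology/ before relying on it.
VCACHE_CPUS = set(range(0, 6)) | set(range(12, 18))

def pin_to_vcache(pid: int = 0) -> set:
    """Pin the given process (0 = calling process) to the assumed
    V-Cache CCD and return the resulting affinity mask."""
    os.sched_setaffinity(pid, VCACHE_CPUS)
    return os.sched_getaffinity(pid)
```

The kernel silently intersects the requested mask with the CPUs that actually exist, so on a smaller machine the returned set will be the valid subset.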
Likes 2

Comment