Updated Ubuntu 24.10 Install Image Released For Snapdragon X1 Elite Laptops

  • Dukenukemx
    Senior Member
    • Nov 2010
    • 1385

    #31
    Originally posted by coder View Post
    I promise you that Apple doesn't give a shit about Cinebench and definitely isn't optimizing for it.
    Nobody knows that. That's why synthetic benchmarks are sus.

    Originally posted by coder View Post
    But 3DMark uses its own rendering engine, right? Cinebench is not a purpose-built benchmark, but rather a wrapper around the production renderer in Maxon's Cinema4D. I'll bet if 3DMark simply used Unreal Engine or maybe Unity3D, it would correlate better with actual game performance.
    Nvidia has been caught cheating in 3DMark before. It's not even that Apple might be cheating so much as that they might be optimizing just for those tests.

    Originally posted by coder View Post
    It's not that I don't want more tests, but Michael simply isn't running the single-threaded benchmarks that would tell us how the individual cores compare, so we have to live with what we've got.
    How many applications do you use that depend on single-core performance? That's why I wouldn't get a Lunar Lake based laptop: the multi-core performance is terrible. The real uses for single-core performance are still gaming and maybe Photoshop. That, and how snappy a computer feels, which you can't realistically benchmark effectively.

    Originally posted by coder View Post
    In his M4 mini review, we got only one single-threaded benchmark where it did indeed spank the x86 crew, but someone pointed out that it was a compression benchmark and we can't rule out the possibility that the M4 simply won by virtue of fitting more of the tables in its L2 cache.
    I'd hardly call the FLAC benchmark a spanking of x86. Apple's M4 scored 9.0 vs. Intel's 285K at 9.9. The Ryzen AI 9 HX 370 got 17.8; that would be a spanking.

    Originally posted by coder View Post
    Probably the first thing you're going to point out is how the Zen 5 desktop CPUs beat M3 Pro. That's a laptop CPU, however. If you compare it to the HX 370, Zen 5 ain't looking so good.
    Not really. The thing that sticks out is that they used a variety of OSes to run these benchmarks: Debian 12 along with Asahi Linux as well as macOS. When Apple Silicon runs Asahi Linux, it's slower than on macOS. Also, faster RAM really helps the 9800X3D.

    Originally posted by coder View Post
    No, just because games correlate well with single-thread performance doesn't make them single-threaded benchmarks. All games are multi-threaded, which is the first complication they pose, since that makes them susceptible to sub-optimal scheduling and frequency-scaling issues. The next issue is that they're doing lots of I/O and synchronization with the GPU. None of these are factors you want to deal with, when you're trying to tease out subtle differences between CPU microarchitectures.
    Most games tend to use one or two cores at 100% and spread the other tasks across the remaining cores. Why do you think AMD created 3D V-Cache, which is just extra Level 3 cache? Not all single-threaded applications hit the CPU the same way.

    Originally posted by coder View Post
    Your understanding is too simplistic. Michael ran plenty of benchmarks on X3D CPUs and it has a wide diversity of impacts. Some of them love the extra L3 cache and others are unaffected by it. The principal factor is how much of the working set can fit in the L3 cache with/without the extra cache die. If the added cache die doesn't make a meaningful difference (either because most of the working set fit in the smaller L3 capacity or because the working set is so huge that the extra L3 cache hardly makes a dent), then the benchmark is simply going to prefer the CPU with a higher clock speed.
    The L3 cache is meant to help single-core performance in applications that access memory linearly. The reason it doesn't benefit all applications is the same reason GPUs hardly have any cache: GPUs deal with math, and math doesn't benefit much from branch prediction.

    Originally posted by coder View Post
    It's not worthless, because it measures an actual application, which is Cinema4D. It correlates well with the performance of other renderers. Finally, most other programs people use don't employ AVX-512, either.
    Doesn't Cinema4D make use of the GPU? How useful is it to only use the CPU?



    • coder
      Senior Member
      • Nov 2014
      • 8825

      #32
      Originally posted by Dukenukemx View Post
      How many applications do you use that depend on single-core performance?
      That's a different argument. You're trying to change the subject at every possible opportunity, which just shows you're aware of how weak x86 is. Decoding is a bottleneck for it, and there's just no way around that. AMD drew the line at 4-wide decoders and simply added a second one for SMT, in Zen 5. When only one thread is running on a core, just one 4-wide decoder is active, which is the same as it's been since Zen 1.

      Intel has gone wider, but they've also had to get really creative to work around the complexity of x86. Since Tremont, they've been adding extra 3-wide decode clusters that can follow different branches.

      Meanwhile, the ARM folks are now up to 10-wide decode, last I checked. In fact, AArch64 is so cheap to decode that ARM even got rid of its MOP caches.
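
      To make the decode-width point concrete, here's a toy sketch in C (my own illustration, not how any real front-end works; actual decoders rely on predecode bits, uop caches, and so on). With a fixed 4-byte encoding, every decode slot can locate its instruction independently; with a variable-length encoding like x86's, each slot depends on the lengths of all the instructions ahead of it.

      #include <stdio.h>
      #include <stddef.h>

      /* Toy model only. Fixed-length (AArch64-style): slot i starts at
       * i * 4, so every slot can be located in parallel. */
      static size_t fixed_slot_offset(size_t slot) {
          return slot * 4;
      }

      /* Variable-length (x86-style): the start of slot i is only known after
       * the lengths of slots 0..i-1 have been determined, a serial dependency. */
      static size_t variable_slot_offset(const unsigned char *lengths, size_t slot) {
          size_t off = 0;
          for (size_t i = 0; i < slot; i++)
              off += lengths[i];   /* must walk every earlier instruction */
          return off;
      }

      int main(void) {
          /* hypothetical instruction lengths (x86 instructions are 1-15 bytes) */
          const unsigned char len[] = {3, 1, 7, 2, 5, 4, 6, 2, 3, 1};

          for (size_t slot = 0; slot < 10; slot++)
              printf("slot %zu: fixed @ byte %zu, variable @ byte %zu\n",
                     slot, fixed_slot_offset(slot),
                     variable_slot_offset(len, slot));
          return 0;
      }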

      Originally posted by Dukenukemx View Post
      The real uses for single-core performance are still gaming and maybe Photoshop.
      That's bullshit. Web-browsing is typically dominated by just a couple threads, even if parts of it are more parallelized. Much of the computer boot, login, and desktop loading is dominated by a small number of threads. Most commandline tools are single-threaded. When I'm compiling stuff, most of the time I'm doing incremental builds that often involve just a few files that need to be recompiled and linked.

      That's not to say I don't care about multi-threaded performance, but I only really care about that when I'm working on larger software projects, which I'm currently doing only at my job.

      Originally posted by Dukenukemx View Post
      That, and how snappy a computer feels, which you can't realistically benchmark effectively.
      So, you agree that it counts for something, even if you aren't sure how to measure it. Well, there are plenty of benchmark suites which try to capture that, but it generally correlates well with scalar, integer performance.

      Originally posted by Dukenukemx View Post
      I'd hardly call the FLAC benchmark a spanking of x86. Apple's M4 scored 9.0 vs. Intel's 285K at 9.9. The Ryzen AI 9 HX 370 got 17.8; that would be a spanking.
      A 10% margin isn't enormous, but look at how much power they each used:





      The only one that came close to the M4's power utilization is the HX 370, which strongly suggests that the thread got scheduled on a C-core. The 285K that came closest used 3.8 times as much power. The 245K was 84.7% as fast as the M4, while using 2.9 times as much power.
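
      Spelling out the arithmetic (a rough sketch using only the ratios above, not the absolute wattages from the chart): performance per watt is just relative speed divided by relative power, so 84.7% of the speed at 2.9x the power works out to roughly 3.4x worse efficiency.

      #include <stdio.h>

      /* Rough perf-per-watt comparison using the ratios quoted above.
       * speed_ratio = contender's speed relative to the M4 (1.0 = equal),
       * power_ratio = contender's power draw relative to the M4. */
      static double relative_efficiency(double speed_ratio, double power_ratio) {
          return speed_ratio / power_ratio;   /* perf/W relative to the M4 */
      }

      int main(void) {
          /* 245K: 84.7% of the M4's speed at ~2.9x the power */
          double eff_245k = relative_efficiency(0.847, 2.9);
          /* 285K: 9.0s vs 9.9s FLAC encode time -> ~91% of the M4's speed,
           * at ~3.8x the power */
          double eff_285k = relative_efficiency(9.0 / 9.9, 3.8);

          printf("245K perf/W relative to M4: %.2f (~%.1fx worse)\n",
                 eff_245k, 1.0 / eff_245k);
          printf("285K perf/W relative to M4: %.2f (~%.1fx worse)\n",
                 eff_285k, 1.0 / eff_285k);
          return 0;
      }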

      Originally posted by Dukenukemx View Post
      The thing that sticks out is that they used a variety of OSes to run these benchmarks: Debian 12 along with Asahi Linux as well as macOS. When Apple Silicon runs Asahi Linux, it's slower than on macOS.
      The main thing that's hurt Apple silicon running Asahi, when Michael has tested it recently, is bad P-core vs. E-core scheduling. In general, MacOS doesn't confer the sort of advantage butt-hurt Apple-haters like to claim it does, as clearly shown here:

      Originally posted by Dukenukemx View Post
      Why do you think AMD created 3D V-Cache, which is just extra Level 3 cache?
      They did it for server CPUs, and someone decided to try hacking together some desktop CPUs from spare X3D dies and benchmarking them. When AMD discovered how well they ran games, the 5800X3D was born.

      "The reason AMD opted to research 3D-Vcache functionality on Ryzen in the first place, was due to an “accident” during the production of presumably prototype Epyc 3D-VCache chips where 7 CCDs were left over in a batch that couldn’t be utilized in an EPYC chip — since EPYC CPUs required 8 CCDs at the time.

      This led Mehra and his cohorts to re-purpose the seven V-Cache-equipped dies for desktop use, building out multiple designs including 8, 12, and 16 core variants. This is what lead AMD to research the capabilities of 3D-VCache in desktop workloads and discover the incredible gaming performance V-Cache offers, giving birth to the Ryzen 7 5800X3D."

      Source: https://www.tomshardware.com/news/am...ache-prototype


      Originally posted by Dukenukemx View Post
      The L3 cache is meant to help single-core performance in applications that access memory linearly.
      Again, you're out of your depth, dude. Your understanding of these concepts is way too simplistic. It really has nothing to do with how "linear" the access patterns are, but rather the working set size and the degree of contention vs. other cores sharing the same L3 slice.
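
      If you want to see the working-set effect for yourself, here's a minimal pointer-chasing sketch (assumes a POSIX system; it's crude, with no warm-up or core pinning): the average access latency jumps each time the buffer outgrows a cache level, no matter how "linear" or not the code looks.

      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      /* Crude working-set demo: chase a randomly permuted pointer chain through
       * buffers of increasing size. Average access latency climbs each time the
       * working set outgrows a cache level (L1 -> L2 -> L3 -> DRAM). */
      static double chase(size_t *chain, size_t steps) {
          struct timespec t0, t1;
          size_t idx = 0;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (size_t i = 0; i < steps; i++)
              idx = chain[idx];              /* serial dependent loads */
          clock_gettime(CLOCK_MONOTONIC, &t1);
          volatile size_t sink = idx; (void)sink;   /* keep the loop alive */
          return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / steps;
      }

      int main(void) {
          for (size_t kb = 16; kb <= 128 * 1024; kb *= 2) {   /* 16 KiB .. 128 MiB */
              size_t n = kb * 1024 / sizeof(size_t);
              size_t *chain = malloc(n * sizeof(size_t));
              if (!chain) return 1;

              /* build a random cyclic permutation (Sattolo's algorithm) */
              for (size_t i = 0; i < n; i++) chain[i] = i;
              for (size_t i = n - 1; i > 0; i--) {
                  size_t j = rand() % i;
                  size_t tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
              }

              printf("%8zu KiB working set: %6.2f ns/access\n",
                     kb, chase(chain, 10 * 1000 * 1000));
              free(chain);
          }
          return 0;
      }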

      Originally posted by Dukenukemx View Post
      The reason it doesn't benefit all applications is the same reason GPUs hardly have any cache: GPUs deal with math, and math doesn't benefit much from branch prediction.
      Wow. Just wow. You need to actually learn some things about computer architecture, or I guess just stick to looking at benchmarks and playing games, because that word salad is utter nonsense.

      Originally posted by Dukenukemx View Post
      Doesn't Cinema4D make use of the GPU? How useful is it to only use the CPU?
      Traditionally, it was CPU-only. GPU support is fairly recent. I know some people still do CPU rendering when their scene is too large to fit in GPU memory, although I don't know to what extent this applies to Cinema4D.


      • Dukenukemx
        Senior Member
        • Nov 2010
        • 1385

        #33
        Originally posted by coder View Post
        That's a different argument. You're trying to change the subject at every possible opportunity, which just shows you're aware of how weak x86 is.
        I'm just saying that single-core performance doesn't have many uses outside of gaming. It's the reason AMD's Bulldozer failed, and why everyone hates Intel's 285K: it does well in productivity benchmarks, but it doesn't do well in gaming.

        Originally posted by coder View Post
        That's bullshit. Web-browsing is typically dominated by just a couple threads, even if parts of it are more parallelized.
        I have hundreds of tabs open in Firefox and I'm not about to close them.

        Originally posted by coder View Post
        Much of the computer boot, login, and desktop loading is dominated by a small number of threads. Most commandline tools are single-threaded. When I'm compiling stuff, most of the time I'm doing incremental builds that often involve just a few files that need to be recompiled and linked.
        Again, I'm not saying single-threaded performance doesn't matter, but it matters less today than it did in the past, especially since more applications today can actually make use of those extra threads.

        Originally posted by coder View Post
        So, you agree that it counts for something, even if you aren't sure how to measure it. Well, there are plenty of benchmark suites which try to capture that, but it generally correlates well with scalar, integer performance.
        I never said that single-core performance doesn't count. My use for single-threaded performance is still gaming. That's all gaming PCs are: massive single-threaded performance monsters.

        Originally posted by coder View Post
        A 10% margin isn't enormous, but look at how much power they each used
        And yes, the x86 chips suck at it. The Ryzen AI 9 HX 370 is the only chip that comes close in power consumption, and it still used more than twice the power to get a score that's nearly twice as bad. Keep in mind this is one of the few instances where that happens, besides Geekbench and Cinebench. Even in Cinebench R23 we see the Ryzen AI 9 HX 370 beating the M3 Max in single-core performance: the Apple M4 gets 2281 in R23 ST while the Ryzen AI 9 HX 370 gets 2010 ST. Apple's M-series chips gain a big boost in Cinebench R24, where the Ryzen AI 9 HX 370 averages 111 while Apple's M4 gets 175. So which version do we go by? Not all software hits the same.

        Originally posted by coder View Post
        The only one that came close to the M4's power utilization is the HX 370, which strongly suggests that the thread got scheduled on a C-core.
        I doubt it, unless we still have problems on Linux with which core applications get scheduled on. That's usually a Windows issue.

        Originally posted by coder View Post
        The 285K that came closest used 3.8 times as much power. The 245K was 84.7% as fast as the M4, while using 2.9 times as much power.
        Intel is still a mess and that shouldn't shock anyone.

        Originally posted by coder View Post
        The main thing that's hurt Apple silicon running Asahi, when Michael has tested it recently, is bad P-core vs. E-core scheduling. In general, MacOS doesn't confer the sort of advantage butt-hurt Apple-haters like to claim it does, as clearly shown here
        Anything problematic with Linux on Apple hardware is entirely Apple's fault. Again, we depend on VTubers to get Linux working on undocumented Apple Silicon, while AMD and Intel both have many people working to ensure Linux works the moment their hardware is released. Qualcomm has a similar problem, except that Qualcomm did hire people; it just waited until after its chips were released to market to start tackling Linux.

        Originally posted by coder View Post
        Wow. Just wow. You need to actually learn some things about computer architecture, or I guess just stick to looking at benchmarks and playing games, because that word salad is utter nonsense.
        If Apple Silicon were so good at single-threaded performance, then shouldn't it be the best at gaming? You're probably one of those people who think video games are a waste of time. If Apple were the performance winner, the PCMasterRace would become the AppleMasterRace. Apple isn't winning any gaming benchmarks, and don't say bad ports, because the PC has had bad ports forever. Your single-threaded argument doesn't hold water when it doesn't show up in the one place where it matters most. Different workloads hit CPUs differently, which is a tale as old as time.

