Updated Ubuntu 24.10 Install Image Released For Snapdragon X1 Elite Laptops


  • Dukenukemx
    replied
    Originally posted by coder View Post
    That's a different argument. You're trying to change the subject at every possible opportunity, which just shows you're aware of how weak x86 is.
    I'm just saying that single-core performance doesn't have many uses outside of gaming. It's the reason AMD's Bulldozer failed, and why Intel's 285K does well in productivity benchmarks yet everyone hates the chip because it doesn't do well in gaming.

    That's bullshit. Web-browsing is typically dominated by just a couple threads, even if parts of it are more parallelized.
    I have hundreds of tabs open in Firefox and I'm not about to close them.
    Much of the computer boot, login, and desktop loading is dominated by a small number of threads. Most commandline tools are single-threaded. When I'm compiling stuff, most of the time I'm doing incremental builds that often involve just a few files that need to be recompiled and linked.
    Again, I'm not saying single-threaded performance doesn't matter, but it matters less today than it did in the past, especially since more applications today can actually make use of those extra threads.
    So, you agree that it counts for something, even if you aren't sure how to measure it. Well, there are plenty of benchmark suites which try to capture that, but it generally correlates well with scalar, integer performance.
    Never said that single core performance doesn't count. My use for single threaded performance is still gaming. That's all gaming PCs are, just massive single threaded performance monsters.
    A 10% margin isn't enormous, but look at how much power they each used
    And yes, the x86 chips suck at it. The Ryzen AI 9 HX 370 is the only chip that comes close in power consumption, and it still used more than twice the power to get a score that's nearly twice as bad. Keep in mind that this is one of the few instances where that happens, besides Geekbench and Cinebench. Even in Cinebench R23 we see the Ryzen AI 9 HX 370 beating the M3 Max in single-core performance. The Apple M4 gets 2281 in R23 ST while the Ryzen AI 9 HX 370 gets 2010 ST. Apple's M-series chips gain a big boost in Cinebench R24: the Ryzen AI 9 HX 370 averages 111 while Apple's M4 gets 175. So which version do we go by? Not all software hits the CPU the same.

    The only one that came close to the M4's power utilization is the HX 370, which strongly suggests that the thread got scheduled on a C-core.
    I doubt it. Unless we still have problems with which core gets used in applications on Linux. That's usually a Windows issue.
    The 285K that came closest used 3.8 times as much power. The 245K was 84.7% as fast as the M4, while using 2.9 times as much power.
    Intel is still a mess and that shouldn't shock anyone.
    The main thing that's hurt Apple silicon running Asahi, when Michael has tested it recently, is bad P-core vs. E-core scheduling. In general, MacOS doesn't confer the sort of advantage butt-hurt Apple-haters like to claim it does, as clearly shown here
    Anything problematic on Linux for Apple is entirely Apple's fault. Again, we depend on Vtubers to get Linux working on undocumented Apple Silicon. AMD and Intel both have many people working on ensuring Linux works the moment their hardware is released. Qualcomm has a similar problem, except that Qualcomm did hire people but waited until after their chips were released to market to tackle Linux.
    Wow. Just wow. You need to actually learn some things about computer architecture, or I guess just stick to looking at benchmarks and playing games, because that word salad is utter nonsense.
    If Apple silicon were good at single-threaded performance, then shouldn't it be the best at gaming? You're probably one of those people who think video games are a waste of time. If Apple were the performance winner, then the PCMasterRace would become the AppleMasterRace. Apple isn't winning any gaming benchmarks. Don't say bad ports, because the PC has had bad ports since forever. Your single-threaded argument doesn't hold water when it doesn't show up in the one place where it matters the most. Different workloads hit CPUs differently, which is a tale as old as time.



  • coder
    replied
    Originally posted by Dukenukemx View Post
    How many applications do you use that make use of single-core performance?
    That's a different argument. You're trying to change the subject at every possible opportunity, which just shows you're aware of how weak x86 is. Decoding is a bottleneck for it, and there's just no way around that. AMD drew the line at 4-wide decoders and simply added a second one for SMT, in Zen 5. When only one thread is running on a core, just one 4-wide decoder is active, which is the same as it's been since Zen 1.

    Intel has gone wider, but they also have had to get really creative to work around the complexity of x86. Since Tremont, they've also been adding additional 3-wide decode blocks that can follow different branches.

    Meanwhile, the ARM folks are now up to 10-wide decode, last I checked. In fact, AArch64 is so cheap to decode that ARM even got rid of its MOP caches.

    Originally posted by Dukenukemx View Post
    The real use for single-core performance remains gaming and maybe Photoshop.
    That's bullshit. Web-browsing is typically dominated by just a couple threads, even if parts of it are more parallelized. Much of the computer boot, login, and desktop loading is dominated by a small number of threads. Most commandline tools are single-threaded. When I'm compiling stuff, most of the time I'm doing incremental builds that often involve just a few files that need to be recompiled and linked.

    That's not to say I don't care about multi-threaded performance, but I only really care about that when I'm working on larger software projects, which I'm currently doing only at my job.

    Originally posted by Dukenukemx View Post
    That, and how snappy computers feel, which you can't realistically benchmark effectively.
    So, you agree that it counts for something, even if you aren't sure how to measure it. Well, there are plenty of benchmark suites which try to capture that, but it generally correlates well with scalar, integer performance.

    Originally posted by Dukenukemx View Post
    I'd hardly call the FLAC benchmark a spanking of x86: Apple's M4 at 9.0 vs Intel's 285K at 9.9. The Ryzen AI HX 370 got a 17.8, and that would be a spanking.
    A 10% margin isn't enormous, but look at how much power they each used:

    The only one that came close to the M4's power utilization is the HX 370, which strongly suggests that the thread got scheduled on a C-core. The 285K that came closest used 3.8 times as much power. The 245K was 84.7% as fast as the M4, while using 2.9 times as much power.

    Originally posted by Dukenukemx View Post
    The thing that sticks out is that they used a variety of OS's to run these benchmarks. Debian 12 along with Asahi Linux as well as MacOS. When Apple Silicon is running on Asahi Linux then it's slower compared to MacOS.
    The main thing that's hurt Apple silicon running Asahi, when Michael has tested it recently, is bad P-core vs. E-core scheduling. In general, MacOS doesn't confer the sort of advantage butt-hurt Apple-haters like to claim it does, as clearly shown here:

    Originally posted by Dukenukemx View Post
    Why do you think AMD created 3D V-Cache, otherwise known as Level 3 cache?
    They did it for server CPUs and someone decided to try hacking together some desktop CPUs with some spare X3D dies and benchmarking it. When AMD discovered how well it ran games, the 5800X3D was born.

    "The reason AMD opted to research 3D-Vcache functionality on Ryzen in the first place, was due to an “accident” during the production of presumably prototype Epyc 3D-VCache chips where 7 CCDs were left over in a batch that couldn’t be utilized in an EPYC chip — since EPYC CPUs required 8 CCDs at the time.

    This led Mehra and his cohorts to re-purpose the seven V-Cache-equipped dies for desktop use, building out multiple designs including 8, 12, and 16 core variants. This is what lead AMD to research the capabilities of 3D-VCache in desktop workloads and discover the incredible gaming performance V-Cache offers, giving birth to the Ryzen 7 5800X3D."

    Source: https://www.tomshardware.com/news/am...ache-prototype


    Originally posted by Dukenukemx View Post
    The L3 cache is meant to help with single-core performance in applications that work linearly.
    Again, you're out of your depth, dude. Your understanding of these concepts is way too simplistic. It really has nothing to do with how "linear" the access patterns are, but rather the working set size and the degree of contention vs. other cores sharing the same L3 slice.
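
    If you want to see the working-set effect for yourself, here's a rough sketch of the kind of microbenchmark I mean (sizes and the stride are arbitrary, not tuned for any particular CPU): a pointer-chasing loop over progressively bigger buffers. Time per access stays low while the buffer fits in cache and jumps once it spills to DRAM; the extra V-Cache die only moves where that jump happens.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static volatile size_t sink;   /* keeps the chase from being optimized away */

    static double chase_ns(size_t *buf, size_t n, size_t iters)
    {
        /* large fixed stride: mostly defeats the hardware prefetcher */
        for (size_t i = 0; i < n; i++)
            buf[i] = (i + 4097) % n;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t idx = 0;
        for (size_t i = 0; i < iters; i++)
            idx = buf[idx];                 /* dependent loads: latency-bound */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        sink = idx;

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
        return ns / (double)iters;
    }

    int main(void)
    {
        for (size_t mib = 1; mib <= 256; mib *= 2) {
            size_t n = mib * 1024 * 1024 / sizeof(size_t);
            size_t *buf = malloc(n * sizeof(size_t));
            if (!buf)
                break;
            printf("%4zu MiB working set: %6.2f ns/access\n",
                   mib, chase_ns(buf, n, 20u * 1000 * 1000));
            free(buf);
        }
        return 0;
    }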

    Originally posted by Dukenukemx View Post
    The reason it doesn't benefit all applications is the same reason GPUs hardly have any cache. GPUs deal with math, and math doesn't benefit much from branch prediction.
    Wow. Just wow. You need to actually learn some things about computer architecture, or I guess just stick to looking at benchmarks and playing games, because that word salad is utter nonsense.

    Originally posted by Dukenukemx View Post
    Doesn't Cinema4D make use of the GPU? How useful is it to only use the CPU?
    Traditionally, it was CPU-only. GPU support is fairly recent. I know some people still do CPU rendering when their scene is too large to fit in GPU memory, although I don't know to what extent this applies to Cinema4D.



  • Dukenukemx
    replied
    Originally posted by coder View Post
    I promise you that Apple doesn't give a shit about Cinebench and definitely isn't optimizing for it.
    Nobody knows that. This is why synthetic benchmarks are sus.
    But 3DMark uses its own rendering engine, right? Cinebench is not a purpose-built benchmark, but rather a wrapper around the production renderer in Maxon's Cinema4D. I'll bet if 3D Mark simply used Unreal Engine or maybe Unity3D, it would correlate better with actual game performance.
    Nvidia has been caught cheating in 3DMark. It's not even that Apple might be cheating so much as that they might be optimizing just for those tests.
    It's not that I don't want more tests, but Michael simply isn't running the single-threaded benchmarks that would tell us how the individual cores compare, so we have to live with what we've got.
    How many applications do you use that make use of single-core performance? That's why I wouldn't get a Lunar Lake-based laptop: the terrible multicore performance. The real use for single-core performance remains gaming and maybe Photoshop. That, and how snappy computers feel, which you can't realistically benchmark effectively.
    In his M4 mini review, we got only one single-threaded benchmark where it did indeed spank the x86 crew, but someone pointed out that it was a compression benchmark and we can't rule out the possibility that the M4 simply won by virtue of fitting more of the tables in its L2 cache.
    I'd hardly call the FLAC benchmark a spanking of x86: Apple's M4 at 9.0 vs Intel's 285K at 9.9. The Ryzen AI HX 370 got a 17.8, and that would be a spanking.
    Probably the first thing you're going to point out is how the Zen 5 desktop CPUs beat M3 Pro. That's a laptop CPU, however. If you compare it to the HX 370, Zen 5 ain't looking so good.
    Not really. The thing that sticks out is that they used a variety of OS's to run these benchmarks. Debian 12 along with Asahi Linux as well as MacOS. When Apple Silicon is running on Asahi Linux then it's slower compared to MacOS. Also faster ram really helps the 9800X3D.
    No, just because games correlate well with single-thread performance doesn't make them single-threaded benchmarks. All games are multi-threaded, which is the first complication they pose, since that makes them susceptible to sub-optimal scheduling and frequency-scaling issues. The next issue is that they're doing lots of I/O and synchronization with the GPU. None of these are factors you want to deal with, when you're trying to tease out subtle differences between CPU microarchitectures.
    Most games tend to use one or two cores at 100% and then spread out other tasks to other cores. Why do you think AMD created 3D V-Cache, otherwise known as Level 3 cache? Not all single-threaded applications hit the CPU the same.
    Your understanding is too simplistic. Michael ran plenty of benchmarks on X3D CPUs and it has a wide diversity of impacts. Some of them love the extra L3 cache and others are unaffected by it. The principal factor is how much of the working set can fit in the L3 cache with/without the extra cache die. If the added cache die doesn't make a meaningful difference (either because most of the working set fit in the smaller L3 capacity or because the working set is so huge that the extra L3 cache hardly makes a dent), then the benchmark is simply going to prefer the CPU with a higher clock speed.
    The L3 cache is meant to help with single-core performance in applications that work linearly. The reason it doesn't benefit all applications is the same reason GPUs hardly have any cache. GPUs deal with math, and math doesn't benefit much from branch prediction.
    It's not worthless, because it measures an actual application, which is Cinema4D. It correlates well with the performance of other renderers. Finally, most other programs people use don't employ AVX-512, either.
    Doesn't Cinema4D make use of the GPU? How useful is it to only use the CPU?




  • coder
    replied
    Originally posted by Dukenukemx View Post
    It's not the reviewers so much as companies catering their hardware and software to maximize them.
    I promise you that Apple doesn't give a shit about Cinebench and definitely isn't optimizing for it.

    Originally posted by Dukenukemx View Post
    It's much easier to spot this with Qualcomm's Snapdragon X chips, where they do perform well in 3DMark but are horrible at 3D outside of it. This is why the PCMR frowns upon synthetic tests: they are easy to, if not outright cheat on, then at least optimize specifically for.
    But 3DMark uses its own rendering engine, right? Cinebench is not a purpose-built benchmark, but rather a wrapper around the production renderer in Maxon's Cinema4D. I'll bet if 3D Mark simply used Unreal Engine or maybe Unity3D, it would correlate better with actual game performance.

    Originally posted by Dukenukemx View Post
    This is why doing lots of tests will give you a better idea of the performance, compared to Geekbench and Cinebench. Let's be honest here: the tests done by Michael are more realistic than anything you could get from Geekbench and Cinebench.
    It's not that I don't want more tests, but Michael simply isn't running the single-threaded benchmarks that would tell us how the individual cores compare, so we have to live with what we've got. In his M4 mini review, we got only one single-threaded benchmark where it did indeed spank the x86 crew, but someone pointed out that it was a compression benchmark and we can't rule out the possibility that the M4 simply won by virtue of fitting more of the tables in its L2 cache.

    However, I've found some SPECint2017 rate-1 benchmarks that include a nice diversity of CPUs:
    Probably the first thing you're going to point out is how the Zen 5 desktop CPUs beat M3 Pro. That's a laptop CPU, however. If you compare it to the HX 370, Zen 5 ain't looking so good.

    He also went to the trouble of computing perf/MHz, which is a rough estimate of IPC. I think this is highly enlightening:

    Originally posted by Dukenukemx View Post
    ​​​The best single threaded benchmarks are games as games heavily depend on good single-threaded performance.
    No, just because games correlate well with single-thread performance doesn't make them single-threaded benchmarks. All games are multi-threaded, which is the first complication they pose, since that makes them susceptible to sub-optimal scheduling and frequency-scaling issues. The next issue is that they're doing lots of I/O and synchronization with the GPU. None of these are factors you want to deal with, when you're trying to tease out subtle differences between CPU microarchitectures.
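
    For what it's worth, when I want scheduling out of the picture, I pin the measurement to one known core. A minimal Linux-only sketch (core 0 is just an assumption; on a hybrid chip you'd pick a known P-core, and frequency scaling is still on you):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        /* pin the calling thread so the scheduler can't migrate it mid-run */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                    /* assumed P-core; adjust as needed */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        volatile unsigned long long acc = 0; /* placeholder scalar workload */
        for (unsigned long long i = 0; i < 500000000ULL; i++)
            acc += i;

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("pinned run: %.3f s (acc=%llu)\n", secs, acc);
        return 0;
    }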

    Originally posted by Dukenukemx View Post
    Because Cinebench says so? There's a reason why I mentioned that AMD's V-Cache has no benefits in Cinebench. Here we see that the 9800X3D is slower than the 9700X in Cinebench. It makes no sense since the level 3 cache is meant to boost single threaded performance?
    Your understanding is too simplistic. Michael ran plenty of benchmarks on X3D CPUs and it has a wide diversity of impacts. Some of them love the extra L3 cache and others are unaffected by it. The principal factor is how much of the working set can fit in the L3 cache with/without the extra cache die. If the added cache die doesn't make a meaningful difference (either because most of the working set fit in the smaller L3 capacity or because the working set is so huge that the extra L3 cache hardly makes a dent), then the benchmark is simply going to prefer the CPU with a higher clock speed.

    Originally posted by Dukenukemx View Post
    ​​​even the ARM CPU manufacturers are starting to lean away from efficiency cores.
    You mean because they're moving away from including A5xx cores? What happened is basically an inflationary situation. ARM added the X tier of cores, which sit above the A7xx tier. SoC vendors became so hungry for performance that they eagerly embraced them and they became the new P-cores, while the A7xx tier became the new E-core. A5xx is becoming less relevant, because ARM is heavily optimizing them for area and energy efficiency, but that makes them so slow that they become a scheduling hazard for general-purpose workloads. Basically the A5xx cores have inherited the role formerly played by the A3xx tier.

    Originally posted by Dukenukemx View Post
    ​​​Cinebench is worthless because it doesn't take advantage of modern CPU designs like AMD's Zen5. Since tech reviewers are lazy, especially Apple reviewers, they tend to just run it and analyze it for 30 minutes while declaring a winner.
    It's not worthless, because it measures an actual application, which is Cinema4D. It correlates well with the performance of other renderers. Finally, most other programs people use don't employ AVX-512, either.

    Furthermore, Maxon releases new versions of Cinema4D every couple years. Perhaps the next release will make better use of AVX-512 & AVX10.

    Originally posted by Dukenukemx View Post
    Look at FFMPEG, where they gained a 94x speedup after implementing AVX-512.
    That turned out to be either fraudulent or at least a gross misunderstanding due to comparing against a baseline with compiler optimizations completely disabled. The code they were tweeting about wasn't even ffmpeg, which is why I say it could've just been a misunderstanding.

    More to the point, if you just look at the speedup gained by the other hand-coded versions, you can see that most of the benefits are gained simply by going to SSSE3. The actual improvements between AVX2 and AVX-512 were: -3.2%, 15.2%, 96.6%, and 40.2%. Except for the first one, which was actually a regression, those aren't small improvements. However, these were micro-benchmarks that measured basically a single loop. The overall benefit to AV1 decoding performance would be much smaller.

    I'm not saying it's not a good thing. SVE shares many of its key features, like per-lane predication. And yes, neither Snapdragon X nor Apple has SVE, yet. It's a good bet this will change, so if we're talking about ISA differences, I don't put AVX-512 in the "win" column for x86. Plenty of CPUs, like Amazon's Graviton 4, Nvidia's Grace, Google's Axion, not to mention the last couple generations of phone SoCs, have cores supporting SVE2.
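
    To be concrete about what per-lane predication buys you, here's a toy AVX-512 routine (SVE expresses the same idea with predicate registers). The mask lets one vector loop cover the tail elements without a scalar cleanup loop. Purely illustrative; needs an AVX-512 CPU and something like -mavx512f:

    #include <immintrin.h>
    #include <stddef.h>

    void scale_f32(float *dst, const float *src, float k, size_t n)
    {
        __m512 vk = _mm512_set1_ps(k);
        for (size_t i = 0; i < n; i += 16) {
            /* one bit per lane that's still in bounds */
            size_t remaining = n - i;
            __mmask16 m = (remaining >= 16) ? (__mmask16)0xFFFF
                                            : (__mmask16)((1u << remaining) - 1);
            __m512 v = _mm512_maskz_loadu_ps(m, src + i);            /* masked load */
            _mm512_mask_storeu_ps(dst + i, m, _mm512_mul_ps(v, vk)); /* masked store */
        }
    }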

    Originally posted by Dukenukemx View Post
    This is why distros like CachyOS are using a V4 repository to boost performance, because V4 tries to make use of AVX-512. Like I said, AVX-512 is just hardly used and this includes games. I don't think there's a single game that uses it.
    To get the most benefit from it, code either needs to be written with asm/intrinsics, or it needs to at least be written in a way that's easy for the compiler to vectorize. As most code doesn't meet that standard, its overall benefits aren't large. In specific niches, it's a big win, but I don't have a CPU with AVX-512 and I'm not running out to buy one anytime soon.
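
    Here's roughly what "easy for the compiler to vectorize" means in practice, as a sketch (built with something like gcc -O3 -march=x86-64-v4, which enables AVX-512): the first loop has independent iterations and gets vectorized, while the second carries a dependence from one iteration to the next, so the compiler generally leaves it scalar.

    #include <stddef.h>

    /* independent per-element work: trivially vectorizable */
    void saxpy(float *y, const float *x, float a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    /* loop-carried dependence: each value needs the previous one, so it runs serially */
    float smooth(float *y, const float *x, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++) {
            acc = acc * 0.5f + x[i];
            y[i] = acc;
        }
        return acc;
    }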

    Originally posted by Dukenukemx View Post
    Variety is how you do benchmarks. Would you have preferred that he was sponsored? I personally avoid benchmarks that involve sponsorships. What would you recommend then? Please don't say Cinebench.
    A couple or three years ago, Michael mentioned that it would take over a month to run all of PTS on a high-end CPU. So, he has very many benchmarks to choose from. With so many possible benchmarks to run, he does weird things like include a dozen OpenVINO test cases, which really favor AVX-512. Back in the Zen 4 era, I recomputed the geomean in one of his reviews by excluding them, and found that they were significantly skewing it.
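
    The recomputation itself is trivial, by the way: exp of the mean of logs, once over everything and once with the AVX-512-heavy entries dropped. The scores below are made-up stand-ins, not Michael's numbers.

    /* build: cc geomean.c -lm */
    #include <math.h>
    #include <stdio.h>

    static double geomean(const double *v, const int *skip, int n)
    {
        double logsum = 0.0;
        int kept = 0;
        for (int i = 0; i < n; i++) {
            if (skip && skip[i])
                continue;               /* drop excluded benchmarks */
            logsum += log(v[i]);
            kept++;
        }
        return exp(logsum / kept);
    }

    int main(void)
    {
        /* hypothetical relative scores; the last three stand in for OpenVINO cases */
        double scores[]   = { 1.05, 0.98, 1.10, 1.02, 1.65, 1.70, 1.80 };
        int    openvino[] = { 0, 0, 0, 0, 1, 1, 1 };
        int n = (int)(sizeof(scores) / sizeof(scores[0]));

        printf("geomean, all tests:    %.3f\n", geomean(scores, NULL, n));
        printf("geomean, w/o OpenVINO: %.3f\n", geomean(scores, openvino, n));
        return 0;
    }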

    Now, we can only guess at his methodology for picking which benchmarks to run and how many cases of each, because there's rarely any transparency, there. As for sponsorship, most of the hardware used for his benchmarks is donated by the manufacturers. While he sometimes buys laptops, mini-PCs, or GPUs, he cannot possibly afford to buy big server CPUs or systems on his own dime.

    He's also not transparent about who donates to the site, or how much. So, we can't even say there are no financial ties either directly to the companies or their employees.

    With all that said, I still appreciate Phoronix. I just mention it because you have to be circumspect and thoughtful about what it's showing.

    Originally posted by Dukenukemx View Post
    Do me a favor and tell me what I should be paying attention to?
    Which benchmarks he runs in which articles. Depending on the focus of the article, sometimes the selection is obvious and logical. Other times, it seems quite a bit more arbitrary and like it could be tilting the scales.



  • Dukenukemx
    replied
    Originally posted by coder View Post
    How? There are 3rd party reviewers running these benchmarks, so how is Apple going to manipulate them?
    It's not the reviewers so much as companies catering their hardware and software to maximize them. Some reviewers do try to skew results, like one guy who tested some games on the M4 Max vs a laptop with an RTX 4090m. Except the laptop has an Ultra 9 185H, which is not a CPU you'd want to couple with an RTX 4090m. Synthetic tests have historically had manufacturers cater as much as possible to make themselves look good. It's much easier to spot this with Qualcomm's Snapdragon X chips, where they do perform well in 3DMark but are horrible at 3D outside of it. This is why the PCMR frowns upon synthetic tests: they are easy to, if not outright cheat on, then at least optimize specifically for.
    This is a mismatched comparison, because obviously the thing with more cores & threads is going to be more efficient at scale. If you're comparing two products against each other to understand how things like ISA, manufacturing, and microarchitectural differences affect performance, then you need to use workloads that don't unfairly favor one vs. the other on the basis of thread or core count.
    This is why doing lots of tests will give you a better idea of the performance, compared to Geekbench and Cinebench. Let's be honest here: the tests done by Michael are more realistic than anything you could get from Geekbench and Cinebench.
    This is why Lunar Lake is a perfect point of comparison. When comparing anything else vs. M3, if the goal is really to support detailed analysis, then the next best option is to use single-threaded benchmarks.
    The best single threaded benchmarks are games as games heavily depend on good single-threaded performance. How many games do you see performing better on M4s, if at all? We'll get back to that later.
    This is a more interesting subject, for me. Where initial benchmarks showed the weakest results on Arrow Lake, I think it had a lot to do with P-core vs. E-core scheduling. Because it reuses the problematic I/O tile of Meteor Lake, things like memory latency seem to be an issue for it. For these reasons, I plan to skip Arrow Lake.

    But, this discussion isn't really about Arrow Lake, anyhow.
    You generally don't see these problems on Linux. This has been the case for both AMD's Zen5 and now Intel's Arrow Lake. Arrow Lake is particularly bad because Intel didn't save that much power, but at least AMD was able to get a good 30% power savings from Zen5 compared to Zen4.

    But Zen 5's single-threaded performance generally lags both the M3's and Arrow Lake's.
    Because Cinebench says so? There's a reason why I mentioned that AMD's V-Cache has no benefits in Cinebench. Here we see that the 9800X3D is slower than the 9700X in Cinebench. It makes no sense since the level 3 cache is meant to boost single threaded performance? Cinebench is a math-heavy application that can run code in any order, which means branch prediction is of little use here. AMD's 3D V-Cache is meant to boost performance with code that has in-order traversal: you know, if/else, while loops, and so on. You put any M4 against AMD's 9800X3D in a game and there's no chance it'll come close in performance. Games love single-threaded performance and Apple's M4s have it in spades, so why is Apple terrible when it comes to gaming? Include something like AVX-512, which no game uses as far as I know, but RPCS3 does, as do some other emulators, and again the M4 can't match the performance. The reason for this is the lack of SVE2. You can pick and choose your battles and will certainly win, but that's the problem with Cinebench. You run a variety of real-world tests and then you can determine the benefits of the hardware.

    With their P-cores, AMD is more concerned about perf/area and perf/W than Intel. Because Intel has E-cores, they lean harder into just making their P-cores fast, yet they still struggle against Apple.
    I don't think Intel even knows what they want. For the most part Intel is copying Apple and hoping to beat them at their own game. AMD hasn't gone that route and even the ARM CPU manufacturers are starting to lean away from efficiency cores. I think AMD has the right idea.
    Yeah? That just underscores that you need a diversity of benchmarks to fully characterize a CPU. It doesn't tell us that Cinebench is worthless. If one benchmark could tell us everything about a CPU, then Michael wouldn't need thousands of them in PTS.
    Cinebench is worthless because it doesn't take advantage of modern CPU designs like AMD's Zen5. Since tech reviewers are lazy, especially Apple reviewers, they tend to just run it and analyze it for 30 minutes while declaring a winner.
    That just makes the M4's vector/FP performance that much more impressive!
    Except when AVX-512 is used, Apple's vector performance looks terrible. The problem with AVX-512 is that the applications that can benefit from it haven't been updated to do so. Look at FFMPEG, where they gained a 94x speedup after implementing AVX-512. Or just read the blog from the RPCS3 developer where he shows the benefits. This is why distros like CachyOS are using a V4 repository to boost performance, because V4 tries to make use of AVX-512. Like I said, AVX-512 is just hardly used and this includes games. I don't think there's a single game that uses it.

    From left to right: SSE2, SSE4.1, AVX2/FMA, and Icelake tier AVX-512.

    Of-fucking-course he is! He knows which tests tend to favor which types of CPUs and he knows how to play to his audience and/or sponsors. He bought this Mac on his own dime, which means he's under no obligation or pressure to show it in a good light. He provides virtually no transparency into his benchmark selection and it varies quite a lot, from one article to the next!
    Variety is how you do benchmarks. Would you have preferred that he was sponsored? I personally avoid benchmarks that involve sponsorships. What would you recommend then? Please don't say Cinebench.
    This just shows you haven't been paying attention.
    Do me a favor and tell me what I should be paying attention to? Michael even showed power consumption, which heavily favored Apple.
    Last edited by Dukenukemx; 01 December 2024, 02:36 AM.



  • coder
    replied
    Originally posted by Dukenukemx View Post
    They're both synthetic tests, and that's easy to manipulate.
    How? There are 3rd party reviewers running these benchmarks, so how is Apple going to manipulate them?

    Originally posted by Dukenukemx View Post
    ​AMD's Strix Point were sometimes more power efficient.
    This is a mismatched comparison, because obviously the thing with more cores & threads is going to be more efficient at scale. If you're comparing two products against each other to understand how things like ISA, manufacturing, and microarchitectural differences affect performance, then you need to use workloads that don't unfairly favor one vs. the other on the basis of thread or core count.

    This is why Lunar Lake is a perfect point of comparison. When comparing anything else vs. M3, if the goal is really to support detailed analysis, then the next best option is to use single-threaded benchmarks.

    Originally posted by Dukenukemx View Post
    ​​You do know that Apple doesn't cater to the FOSS community either?
    Yes. I already said I'm not a fan of them, as a company. I never have and never will buy their products, even just to run Linux on them. I don't want any part of that whole ecosystem.

    Originally posted by Dukenukemx View Post
    ​​​Yes but, Intel is still a mess. Look at their Arrow Lake chips and how bad the performance is and how little power savings there is compared to 14900K.
    This is a more interesting subject, for me. Where initial benchmarks showed the weakest results on Arrow Lake, I think it had a lot to do with P-core vs. E-core scheduling. Because it reuses the problematic I/O tile of Meteor Lake, things like memory latency seem to be an issue for it. For these reasons, I plan to skip Arrow Lake.

    But, this discussion isn't really about Arrow Lake, anyhow.

    Originally posted by Dukenukemx View Post
    ​​​​Intel's x86 is 82% bigger, but not AMD.
    But Zen 5's single-threaded performance generally lags both the M3's and Arrow Lake's.

    With their P-cores, AMD is more concerned about perf/area and perf/W than Intel. Because Intel has E-cores, they lean harder into just making their P-cores fast, yet they still struggle against Apple.

    Originally posted by Dukenukemx View Post
    ​​​​​AMD is even putting in V-Cache in their X3D chips that increases the size massively, but also doesn't show any performance increases in Cinebench. It does show up for games and certain applications.
    Yeah? That just underscores that you need a diversity of benchmarks to fully characterize a CPU. It doesn't tell us that Cinebench is worthless. If one benchmark could tell us everything about a CPU, then Michael wouldn't need thousands of them in PTS.

    Originally posted by Dukenukemx View Post
    ​​​​​​AMD also supports AVX-512 in all their cores, while Apple's M4's aren't even using SVE let alone SVE2.
    That just makes the M4's vector/FP performance that much more impressive!

    Originally posted by Dukenukemx View Post
    ​​​​​​​I don't think Michael is cherry picking anything.
    Of-fucking-course he is! He knows which tests tend to favor which types of CPUs and he knows how to play to his audience and/or sponsors. He bought this Mac on his own dime, which means he's under no obligation or pressure to show it in a good light. He provides virtually no transparency into his benchmark selection and it varies quite a lot, from one article to the next!

    Originally posted by Dukenukemx View Post
    ​​​​​​​​I think he just ran his usual tests and that's it.
    This just shows you haven't been paying attention.



  • dfyt
    replied
    I was gifted a MacBook Pro M3 Max 2 weeks ago. When it comes to performance, what people ignore is the horrendous throttling on it. I was running an AV1 encode and it sits at 118C. Speed after about 20 mins drops to about 1/3. On the Apple threads they always defend it by saying the Max beats the 9950X. Benchmarks rarely factor in the throttling of real workloads. I so so so wish I could run Linux. It's like a glorified paperweight, and then there's the lack of a numpad, Home and Delete keys, and Type-A USB. My gosh, it sucks.
    Last edited by dfyt; 30 November 2024, 01:12 PM.



  • Dukenukemx
    replied
    Originally posted by coder View Post
    Cinebench certainly represents real world performance, since it's based on a production renderer. Geekbench indeed has weird MT scaling, so I just look at the ST numbers.
    They're both synthetic tests, and that's easy to manipulate. Hardly anyone benchmarking Apple products runs real world applications. Strix Point is just better against Apple's M3.
    That's not a good comparison, since it included only one other laptop SoC and that was AMD's top-end Strix Point model that has more cores and used about double the power.
    AMD's Strix Point were sometimes more power efficient. Specifically Kvazaar and there were others where it was close. There were other tests where it used 3x more power too, but again it's on 4nm. Even Intel's Lunar Lake was on an older 3nm process compared to M4. I'm not sure if AMD's new Strix Halo is going to use 3nm, but it will be their new top-end mobile chip.
    If you strictly compare either on the basis of single-threaded tests or vs. something like Lunar Lake, that has the same number of cores & threads, Apple comes out well ahead. The other wildcard is the amount of optimization, like hand-coded AVX2 or AVX-512.
    We know Lunar Lake isn't exactly a performer. Especially when it comes to multi-threaded performance. Who at Intel thought it was a good idea to remove Hyper Threading?
    Right at the top of the front page, it says:

    Latest Linux Hardware Reviews, Open-Source News & Benchmarks

    Linux and Open-Source. So, a bad idea to assume someone on here gives a shit about proprietary software, because the site caters to the FOSS community. I'm not saying nobody does, but enough of us don't that you can't just assume someone does.
    You do know that Apple doesn't cater to the FOSS community either? If you know Apple, you know that their dealings with open source are terrible. Apple doesn't exactly open source MacOS, because otherwise we'd have distros based on it. The industry is moving towards open source, and here we are with Apple, who's behind the times. As for closed software, it's not like the industry is going to start to open source it. It would be great, but if you do any professional work then you need closed-source software, like Adobe.
    Intel's Lunar Lake is the most efficient x86, at least in the ballpark of 4P + 4E cores and if you pick a mid-range, like the 256V (which Michael has and mysteriously omitted from that M4 comparison).
    Yes but, Intel is still a mess. Look at their Arrow Lake chips and how bad the performance is and how little power savings there is compared to 14900K. Much like AMD, they are finding hidden performance that might uplift something like the 285K.
    Same. That's because I don't like Apple as a company. However, I think their hardware is a good example of what's possible.
    I think the industry needs to benchmark their hardware better to really see what is possible. I'm not convinced that Apple should be looked up to when it comes to good CPU performance. Cinebench is still a bad benchmark, but things like Davinci Resolve and games are good examples of real use cases.

    Depends on what for. Lunar Lake is held back by 8 cores / 8 threads. So, for a software development machine, I'd probably also prefer a HX 370. However, if I'm mainly using it for video calls, productivity apps, web, and remote access, then I'd go with Lunar Lake for sure.
    I don't like the idea of limiting what I can do with my hardware. The only reason to go with Lunar Lake is better power efficiency, and I don't often find myself that far away from an outlet when using a laptop.
    Where the fuck did you get that number? Even just the compute tile of Lunar Lake is 140 mm^2! The I/O die adds another 80 mm^2, for a total of 219.7 mm^2.
    Yeah, you're right, it's not 100 mm^2. I don't think it's 219.7 either. This is all from the Lunar Lake wiki. As for the M3, it might be closer to 150 mm^2. I initially went with Google AI search results and they weren't accurate.
    Compute tile (TSMC N3B): 140 mm^2
    Platform controller tile (TSMC N6): 46 mm^2
    That's also a misleading comparison, because of things like iGPUs and NPUs dominating the dies (or the M3's at least) and we're just talking about CPU cores here, not GPU. If you compare the CPU cores, the M3's P-core is 2.49 mm^2, while Lunar Lake's is 4.53 mm^2. That puts the x86 core at 81.9% bigger!
    Intel's x86 is 82% bigger, but not AMD. AMD is on 4nm and even their Zen5c cores aren't entirely cut down either. AMD is even putting in V-Cache in their X3D chips that increases the size massively, but also doesn't show any performance increases in Cinebench. It does show up for games and certain applications. AMD also supports AVX-512 in all their cores, while Apple's M4's aren't even using SVE let alone SVE2. But hey, at least Apple took the time to include SME which Geekbench happily tested and gave 400 points towards Apple's M4's. Sometimes comparing cores isn't exactly useful when different cores have different use cases. AMD and Intel will pump a lot of cache because people do play games on these CPU's and cache really benefits games. AVX-512 also benefits games... more specifically emulators.
    I don't care about Apple, as a company, or even ARM, for that matter. I'm just interested in Apple as an example of what's technically possible and I'm interested in ARM because it has the maturity to be a practical alternative to x86. Even if there are a few benchmarks Michael can cherry-pick that run better on x86, it doesn't negate the fact that most software and tools have excellent ARM support, following from the decades of work that's gone into the mobile and then server software ecosystems on ARM.
    I don't think Michael is cherry picking anything. I think he just ran his usual tests and that's it. The problem with x86 is Windows, as it really sucks for performance. Since he tested Ubuntu 24.04 with a 6.10 kernel, it's going to perform a lot faster compared to Windows, especially Windows 11. This is why his tests were shocking: not many people expected that difference, when reviewers running Cinebench and Geekbench on Windows 11 laptops were getting vastly different results. Keep in mind that distros like CachyOS, which cater to V3/V4 plus optimized kernels, may widen that gap further. Windows has been a sh*tshow for AMD and Intel recently. They both keep finding hidden performance from stupid things like an admin account and a patch that was delayed. CPU performance is still faster on Linux. Like I said, DaVinci Resolve was 15% faster on Linux... from 5 years ago. That could be the difference between Apple being faster or AMD/Intel being faster. People really need to do more tests.
    Last edited by Dukenukemx; 30 November 2024, 09:49 AM.



  • coder
    replied
    Originally posted by bernstein View Post
    No, it shows that Apple either had a bigger engineering budget or made more out of its budget. It also shows Qualcomm had a much smaller budget (or did worse engineering) than Apple.
    Lunar Lake is Intel, not Qualcomm. Apple has deep pockets, but I seriously doubt their CPU core design team is bigger than Intel's. But, it's clear from this error that you don't even know what you're talking about, so I'll just leave it at that.

    Originally posted by bernstein View Post
    AWS is heavily subsidizing ARM because they want more competition in the server CPU space
    You can believe this if you want, but I think it doesn't make economic sense for Amazon to take big losses in hope of gaining slightly better pricing in the future. Most of Amazon's server fleet now runs on ARM-based Graviton CPUs, which isn't something they'd do without deriving an economic benefit from it.



  • bernstein
    replied
    Originally posted by coder View Post
    Apple M3 vs. Lunar Lake shows that ARM cores can provide better performance on the same process node.
    Originally posted by coder View Post
    Typical PC Master Race copium. No, look at Apple M3 vs. Lunar Lake. Exact same process node. Lunar Lake is the most power-efficient x86 and the M3 still beats it.
    No, it shows that Apple either had a bigger engineering budget or made more out of its budget. It also shows Qualcomm had a much smaller budget (or did worse engineering) than Apple.
    Originally posted by coder View Post
    ​This is also a lie. Lunar Lake cores are much bigger than the M3's. Lunar Lake has significantly more total cache, as well.​
    TL;DR: The efficiency advantage of Apple boils down to: a) vertical integration of software & hardware, b) node advantage.
    Originally posted by coder View Post
    ​​The spot pricing on AWS Graviton 4 instances suggest that TCO of ARM is still lower than any x86 options.​
    AWS is heavily subsidizing ARM because they want more competition in the server CPU space (than the Intel/AMD duopoly). Doing your own ARM SoC design is way more expensive than buying the relatively few Gravitons they build at TSMC.
    Originally posted by coder View Post
    ​​> ​​​for the foreseeable x86 will remain the default in server.
    Wishing for a thing and saying it doesn't make it so.​​
    Certainly not, but industry shifts like that take at least a decade, likely far longer.
    Originally posted by coder View Post
    > x86 is still miles ahead in software support,
    Not in anything I care about.
    Hardware independence is probably the one you care the most about. Currently, for every ARM SBC the image has to be built separately. There is no system in place to handle the different device trees. That's why it's comparatively cumbersome to support ARM SBCs compared to x86. Not too much of a problem for current hardware, but a huge problem when it's ten years old, because unless it was VERY popular (like the Pi 3) no one will continue building images.

