Apple Announces Its New M2 Processor


  • #51
    Originally posted by WannaBeOCer View Post

    What's wrong with being stuck on macOS aside from gaming?
    That it is shit?

  • #52
    Originally posted by torsionbar28 View Post

    I don't think that's exactly the case, as the AMD and Intel products are certainly capable of this behavior. It's just that it isn't their default configuration. AMD and Intel are going for maximum processing power within the available TDP envelope. So long as there is thermal headroom available, they ramp up the clocks to consume it.
    No, they really aren't: nothing currently competes with Apple on performance per watt, and there is extensive testing to show this.

  • #53
    Originally posted by tildearrow View Post

    What about idle? The majority of desktop/server/workstation x86 motherboards idle at 5-25W, whereas the M1 and Raspberry Pi idle at <5W.
    I don't think most people care about idle draw below that of an incandescent light bulb, unless there is a need to put x86 into something around a Pico-ITX form factor, which is doable with 6-15W TDP x86 chips. It's the pushing of CPUs and GPUs to absurd power levels that is going to get noticed. And AMD is following in Intel's footsteps on CPU power usage by raising the maximum TDP to 170W on the AM5 socket.

    It looks like idle could come down a bit for Zen 4 chiplet-based designs because of the move from a 14nm to a 6nm I/O die. Motherboard power will depend on the chipset layout: X670/X670E will use dual chipsets, which improves signal integrity for PCIe 5.0 but presumably doubles chipset power consumption compared to B650/A620 (+5-7 W?).

    https://www.anandtech.com/show/17399...m5-coming-fall
    The new IOD also affords AMD the opportunity for some significant platform power savings. Not only is TSMC's 6nm process well ahead of GlobalFoundries' old 14nm process, but the design process has allowed AMD to incorporate many of the power-saving technologies that were first developed for the Ryzen 6000 Mobile series, such as additional low power states and active power management capabilities. As a result, Ryzen 7000 should fare much better at idle and low utilization workloads, and it's a reasonable assumption to see the IOD drawing less power at load, as well (at least with graphics disabled). Though at full load, with up to 16 cores running at over 5GHz, the CCDs are still going to draw a lot of power.
    On the power delivery front, AMD has confirmed that AM5 will support AMD's Serial Voltage 3 (SVI3) standard. First introduced as part of the Ryzen 6000 Mobile series, SVI3 allows for finer grained power control and significantly faster voltage response capabilities. And for desktop boards in particular, SVI3 also supports a larger number of power phases, which will be especially useful for high-end X670E motherboards.
    I think I saw a table somewhere listing chipset power consumption but I couldn't find it.
    Last edited by jaxa; 07 June 2022, 05:03 AM.

  • #54
    Originally posted by birdie View Post

    It can decode AV1 4K@120fps in software without breaking a sweat.
    That is true, but native support will make it more power-efficient and faster. I think they will add AV1 once it is supported across all platforms and apps; it also seems like it will come to the iPhone first.

  • #55
    Originally posted by qarium View Post

    this all has a logical error: linux on the apple M1 is faster on CPU tasks than macOS...

    apple can compensate for the inferior product "macOS" with their good hardware.

    but just think about this: what if apple switched to the linux kernel?...
    Isn't their kernel open source too?

  • #56
    Originally posted by benjiro View Post

    https://www.notebookcheck.net/AMD-Ry....623763.0.html

    Do people still believe those stories, or what...

    Device (CPU)                                      Limit  Measured  MT score  Points/W (limit / measured)
    Asus ZenBook S 13 (Ryzen 7 6800U)                 28W    25.5W     10468     374 / 411
    Apple MacBook Pro 14 (M1 Pro, 8 cores)            25W    21W       9581      383 / 456

    It's barely a difference of a few percent when comparing power vs. performance under MT load. Alder Lake is the power drainer, not AMD.

    Apple's advantage is that it hides a lot of its performance behind its dedicated engines (media encoding, among others) for a lot of tasks. When the CPU is put on pure CPU tasks, that famous power efficiency scales very close. Notice how the 6nm and 5nm parts pull very close to the same power in MT tasks.

    Apple's gains are mostly in ST tasks, where it reports better single-core performance. We see 4W * 4 plus the efficiency cores (21W) resulting in 9581 in MT. But AMD can deliver 10468 on a 25W power budget, because x86, when not turbo-boosted to hell, is actually very efficient.

    Asus ZenBook S 13 (Ryzen 7 6800U)                 28W    25.5W     10468     374 / 411
    Apple MacBook Pro 14 (M1 Pro, 8 cores)            25W    21W       9581      383 / 456
    Asus Zenbook S 13, whisper mode (Ryzen 7 6800U)   12W    10W       6725      560 / 672

    Take a look at the 10W MT result for AMD: 6725. How is it possible for AMD to use only 10W and still deliver 70% of the performance that took the Apple M1 21W? These results are funny, are they not? And that is on 6nm, a process that is not supposed to deliver an increase in power efficiency over 7nm, unlike the 5nm that gives Apple a 20% gain.

    It's been clear for YEARS that AMD and Intel have been turbo-boosting their CPUs way too much for ST tasks. So why is AMD so efficient in MT tasks? Because CPUs are not designed laptop-first or desktop-first; they are designed server-first, where you want great MT performance at the best possible power usage (most server CPUs that use the exact same cores are sold with very conservative clock speeds for that reason).

    Then those CPUs get filtered down to desktop, where they need to show great benchmark/gaming results, so up goes the clock speed, because that is the easiest way to reuse the same design. That CPU then needs to conform to laptops, and, well, you're just trying to shoehorn server/desktop-designed CPUs into laptops. That becomes harder and harder, but when you really analyse the results on a more equal playing field, ARM is not that special.

    We already see how smartphones are becoming hotboxes because of that same drive for more performance at any cost, despite it also being ARM technology.
    I don't know if we are reading the same article at https://www.notebookcheck.net/AMD-Ry....623763.0.html but it clearly shows that the M1 is the most efficient chip per watt in all of their tests, and that's not only for hardware-accelerated tasks.

    If your argument is that specifically at 10W AMD is faster, well, yeah: the Firestorm cores, which are what Apple uses in the M1 for performance, aren't designed for high efficiency at such a low TDP, and Apple's laptops contain both Firestorm and Icestorm cores (the Icestorm cores are the ones designed for maximum efficiency at such low TDPs).

    This makes sense, though, because it's very rare for a laptop to run at such a low TDP while doing compute-intensive tasks; the Apple engineers made a tradeoff in their optimization, sacrificing the 10W point in favor of the 50-70W sweet spot. The problem can simply be put as: the Apple M1 cannot be set to "only" use Icestorm cores, since that's an atypical, contrived scenario. Unless you can somehow trick the Apple M1 into running only the Icestorm cores for whatever you are benchmarking, it's not surprising you get these results.
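    Incidentally, the trailing pairs in the quoted rows are just points per watt, computed once against the power limit and once against the measured draw; a quick sketch reproduces them, with the numbers taken straight from the rows above:

    ```python
    # Reproduce the points-per-watt pairs from the quoted notebookcheck rows:
    # (device, MT score, power limit in W, measured power in W)
    rows = [
        ("Asus ZenBook S 13 (Ryzen 7 6800U)", 10468, 28, 25.5),
        ("Apple MacBook Pro 14 (M1 Pro, 8 cores)", 9581, 25, 21),
        ("Asus Zenbook S 13, whisper mode (Ryzen 7 6800U)", 6725, 12, 10),
    ]

    for name, score, limit_w, measured_w in rows:
        # Efficiency at the configured limit and at the actual measured draw.
        print(f"{name}: {score / limit_w:.0f} / {score / measured_w:.0f} points/W")
    ```

    This prints 374 / 410, 383 / 456 and 560 / 672, matching the quoted figures up to rounding.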
    Last edited by mdedetrich; 07 June 2022, 05:20 AM.

  • #57
    Originally posted by qarium View Post

    you say modern systems do not "do a good job of parallelization", and this is right if you consider the fact that there are single-socket 128-core systems, and if you use today's software stack on those 128-core CPUs, the result on everyday tasks is that it cannot utilize the 128 cores. but that's not the point at all, because it is pointless: the big part of the market does not buy 128-core CPUs.
    No, again. Systems vs. applications. For applications taken individually, one can't tell whether most do or don't do a good job of parallelization. But your claim was that they did. Systems do. Systems do an awesome job at parallelization, mostly because it's easier to do so: you just need many processes running (pretty much any system these chips are targeted at has them), and those processes are quite light in terms of communication and sharing with each other, so as long as the OS doesn't thrash the cache they should be really, really happy using a multicore computer.
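    As a minimal sketch of that point (in Python, since it comes up below): independent CPU-bound processes share nothing, so the OS scheduler spreads them across cores essentially for free:

    ```python
    import time
    from multiprocessing import Pool

    def burn(n: int) -> int:
        # Purely CPU-bound work with no shared state: the kind of independent
        # load an OS scheduler happily spreads across cores.
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":
        work = [2_000_000] * 8

        start = time.perf_counter()
        for n in work:              # one process: jobs run back to back
            burn(n)
        serial = time.perf_counter() - start

        start = time.perf_counter()
        with Pool() as pool:        # one worker per core: jobs overlap
            pool.map(burn, work)
        parallel = time.perf_counter() - start

        print(f"serial: {serial:.2f}s  parallel: {parallel:.2f}s")
    ```

    On a typical 8-core machine the second timing is several times smaller, which is all the system-level argument needs: no single application was "parallelized", yet every core is busy.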
    Seriously, take a breath, read what I write, then respond, because I'm not contradicting the main thesis at all.

    Originally posted by qarium View Post

    you want these numbers: "but statistics about multicore applications and how they scale." you already have them without knowing it.

    the same reason why the main part of the market does not buy 128-core CPUs is the same reason why they buy 8-core CPUs.

    they buy 8-core CPUs because today's software stack scales on these 8-core CPUs. they do not even buy 12-core or 16-core CPUs because this does not """scale"""
    That's your explanation. Mine is that most users won't ever fully utilize 128 cores. It has less to do with how the applications are programmed than with sheer load. And those systems are mad expensive, so you really need to convince me to pay for one when I have the option of going for 8 cores at a fraction of the price.

    Originally posted by qarium View Post

    "Kernel calls are not multicore from the POV of Python:"

    whatever python calls in the kernel, the kernel does multicore internally.
    Originally posted by qarium View Post

    "Essentially, pure Python can only parallelize IO."

    yeah right, first you say python cannot do multicore and now you claim python can do multicore for IO
    No. I used parallelism in a vague sense there. You generally talk about concurrent threads when all threads make progress after a certain time. They may do so by proper CPU parallelism (multicore execution is one case) or by interleaving execution. What Python does by default (i.e. without going to a native extension that releases the GIL) is interleave execution, but it's smart enough to release the GIL when what's about to happen is IO. IO doesn't use the CPU (well, a few instructions, obviously) but waits for external events instead. While that IO is in progress and the thread is blocked, because the GIL is released the OS scheduler may (and unless your load is absurd, will) run another Python thread, but here's the thing: it will be the only Python thread running at that time. It is not multicore.
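    To make the distinction concrete, a minimal sketch: two CPU-bound threads take about as long as doing the work twice, because the GIL interleaves them on one core, while two IO-style threads overlap almost perfectly, because the GIL is released while they wait:

    ```python
    import threading
    import time

    def cpu_task():
        # Holds the GIL while computing: two such threads interleave on one
        # core rather than running on two cores at once.
        sum(i * i for i in range(5_000_000))

    def io_task():
        # time.sleep() releases the GIL, so waiting threads overlap freely,
        # just like threads blocked on real IO.
        time.sleep(1.0)

    def timed(target, count):
        threads = [threading.Thread(target=target) for _ in range(count)]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.perf_counter() - start

    print(f"2 CPU-bound threads: {timed(cpu_task, 2):.2f}s (about 2x one thread)")
    print(f"2 IO-bound threads:  {timed(io_task, 2):.2f}s (about the time of one)")
    ```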

    Originally posted by qarium View Post

    your system runs multiple python programs in parallel... just in case you missed this point.
    Sorry, but that's precisely the point you've been missing all along.

    Originally posted by qarium View Post

    "Sorry, but what do you suppose my theory is, exactly?"

    in my point of view, your theory is that single-core performance is all you need, and because of this you buy an "overpowered single-core CPU"; well, because this no longer exists in the market, you buy whatever CPU has the highest single-core performance on the market.
    It was rhetorical. I noticed that's what you think I'm proposing. I have stated several times that I'm not, and that the only thing I disagree about is what individual applications make of it.

    Originally posted by qarium View Post

    "in fact I asserted, that multicore performance is by today's standards absolutely more important than single core performance."

    you say this, yes, but you claim otherwise: you claim your python app is only single-core...

    so how can multicore performance be more important if your python app is only single-core?
    That does not follow in the slightest. Python is indeed single-core. That doesn't contradict the idea that multicore performance matters more, because it is not the only program that will be running. How many times do you need me to repeat it?
    The only reason Python is part of the discussion is that you asked for a counterexample to your claim about individual applications, so Python counts for individual applications.
    There's a thing called implication in logic. A => B means that if A is true, then B is true. A being false doesn't mean B is false. B being false means A is false.
    Now, let's say we have three assertions and two implications between them:
    A: most applications (as individual entities) make good use of multiple cores.
    B: most systems (as aggregates of applications) have many processes running at a time.
    C: optimizing hardware for multicore performance will have more impact than optimizing it for single-core performance.

    Now, it's easy to see that A => C. B => C is not really valid by itself, but another reasonable assertion we can use is:
    B': processes that are not part of the same program tend not to need synchronization, so a set of processes makes good use of multiple cores.
    And then it obviously follows that B & B' => C.
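    The same argument in notation, nothing new added:

    ```latex
    \[
      A \Rightarrow C \qquad \text{(the disputed premise, not actually needed)}
    \]
    \[
      (B \land B') \Rightarrow C, \quad B, \quad B' \;\therefore\; C
      \qquad \text{(modus ponens, independently of } A \text{)}
    \]
    ```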

    I left implicit that system performance is what one cares about, because I think that's obvious enough for everyone. Nobody cares if your video decodes at 260fps if in the meantime you can't even move the mouse and your mic is not working and whatnot.

    So, you see: I may disagree that A is true as a matter of opinion, and be pretty sure that even if it is true, you need really good data to back up the claim. That doesn't mean I disagree with C. In fact, I claimed B and B', which are quite obviously true, to prove C.
    Is that clear enough?

    Originally posted by qarium View Post

    in reality it is nonsense: as soon as your system runs 2 different python tasks that do not need to sync, you are already in the multicore world
    The nonsense is that you still think, after this long conversation, that I'm saying otherwise. I don't think I'll be responding to any more posts that insist I'm arguing against something I'm not, when I've said so for several posts already.

  • #58
    Originally posted by tildearrow View Post

    What about idle? The majority of desktop/server/workstation x86 motherboards idle at 5-25W, whereas the M1 and Raspberry Pi idle at <5W.
    That's the most modern TSMC process plus obvious Apple SoC design choices, where development is done purely for mobile devices and then migrated to desktop. And for mobile you definitely need low idle.

  • #59
    Originally posted by Developer12 View Post

    X86 chips still pay the price despite all the instruction caching they claim. There's no free lunch for having a bad ISA. That caching is of limited size, consumes massive amounts of die area in addition to the decoding circuitry, and the ISA still imposes a low limit on how quickly you can decode *new* code while following program execution. Since the dawn of the Pentium, x86 has always spent more than double the number of transistors to achieve the same performance.
    Seriously, I very much doubt your claim here. What performance: raw numbers, or perf/watt? In comparison with what, exactly? I hope you understand that you can't compare two CPUs precisely when they are on different processes, right? And you don't know their transistor counts either...

  • #60
    Originally posted by tildearrow View Post

    What about idle? The majority of desktop/server/workstation x86 motherboards idle at 5-25W, whereas the M1 and Raspberry Pi idle at <5W.
    Remember to compare apples to apples. In the Windows/Linux desktop/laptop space we typically quote the power of the entire platform, not just that of the processor, whereas Apple is talking about the CPU only. Memory is often the most energy-consuming part at idle on a desktop, especially if you are not using low-power memory (which very few non-Apple desktop manufacturers do). And once you turn on the screen and include that, the idle power consumption of the CPU becomes a rounding error. If you want lower idle power on a Windows desktop, start by turning off your screen; if you want to save more power, replace your memory with low-power memory and put it to sleep.
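    If you want to see how small the CPU's share is, one option on Linux is to read the RAPL energy counters and compare against a wall-plug meter. A minimal sketch, assuming an Intel CPU that exposes the usual powercap sysfs path (the path can differ per machine, and reading it may require root):

    ```python
    import time
    from pathlib import Path

    # CPU package energy counter in microjoules (Intel RAPL via powercap).
    counter = Path("/sys/class/powercap/intel-rapl:0/energy_uj")

    before = int(counter.read_text())
    time.sleep(10)                  # idle for the measurement window
    after = int(counter.read_text())

    # Average package power over the window. Whatever your wall-plug meter
    # shows beyond this is screen, memory, chipset, fans, PSU losses, etc.
    print(f"CPU package: {(after - before) / 10 / 1e6:.2f} W")
    ```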
    Last edited by carewolf; 07 June 2022, 06:01 AM.
