**willmore** · 07 January 2023, 10:43 PM

Originally posted by qarium

right now we have 3 different asymmetric CPU models:

big.little = ARM
big.bigger = Intel
Fat-Cache.highclock = AMD

and it looks like the Intel CPUs are the hardest ones to make useful scheduling decisions for..

For Intel, I think you mean "big.crippled", or are we going to pretend that they didn't put AVX512 in the big cores but not in the little ones?

FWIW, ARM has gone through a lot of pain to make sure that there is a 'little' core to match every 'big' core so that they didn't have that problem. Some vendors chose not to take advantage of that (Samsung comes to mind) and suffered scheduling problems because of it. There's no reason Intel shouldn't have seen that coming.

**coder** · 08 January 2023, 02:49 AM

Originally posted by willmore

For Intel, I think you mean "big.crippled", or are we going to pretend that they didn't put AVX512 in the big cores but not in the little ones?

Eh, doesn't matter if it's physically there or not, because the microcode no longer lets you enable it.

Originally posted by willmore

FWIW, ARM has gone through a lot of pain to make sure that there is a 'little' core to match every 'big' core so that they didn't have that problem.

Well, you mean usually. There was this little incident:

Linux 6.1 Drops BF16 Support For Cortex-A510 Due To Hardware Bug - Phoronix

https://www.phoronix.com/news/Linux-6.1-ARM64-Updates

**Keats** · 08 January 2023, 04:05 AM

Originally posted by qarium

your imagination lacks creativity. I think AMD will release a Threadripper for the AM5 platform.

AMD did it in the past with the Threadripper 2950X and the 2990WX: the 2950X was focused on high-performance desktop and the 2990WX was an HPC-focused CPU...

AMD can do the same on AM5... I watched videos about a problem that all the ultra-modern nodes (3nm, 4nm and 5nm) have: SRAM cells no longer scale. An SRAM cache on 6/7nm takes the same size as the same cache on 5nm, 4nm or 3nm, which is a big problem for all of the companies. Right now only AMD has a solution to this, through chiplet design. In the RDNA3 GPUs they put the SRAM cache on larger, older 6nm dies while the rest of the GPU is 5nm, and the newest AMD APUs will be 4nm.

If you look inside the 7950X with its 3 chiplets, there is room left in 2D space, and there is also room in 3D space.

In the near future AMD will go from 8 cores per chiplet to 12 cores per chiplet, and all the SRAM cache will move to older nodes like 6nm/7nm, stacked.

AMD may also go from 3 chiplets on an AM5 CPU to 4 chiplets; this is possible with the move to 4nm...

Then you have up to 36 cores on 3 chiplets of 4nm, with 3D cache stacked on all the CPU dies and also the IO chip...

I am pretty sure they will call it Threadripper for AM5...

A "threadripper" with dual-channel RAM and 20-odd PCIe lanes would be a joke.

**willmore** · 08 January 2023, 11:50 AM

Originally posted by coder

Eh, doesn't matter if it's physically there or not, because the microcode no longer lets you enable it.

Well, you mean usually. There was this little incident:

Linux 6.1 Drops BF16 Support For Cortex-A510 Due To Hardware Bug - Phoronix

https://www.phoronix.com/news/Linux-6.1-ARM64-Updates

And if you read that you'll see that the problems with that core are due to specific ways vendors choose to implement it. It's almost like I just said that. And Linux choosing not to support a feature on a core due to a potential erratum isn't the same as Intel retroactively fusing off functionality that they initially shipped.

**coder** · 08 January 2023, 12:57 PM

Originally posted by willmore

And if you read that you'll see that the problems with that core are due to specific ways vendors choose to implement it.

Nice try. No, the A510 was designed by ARM to be used in either a fused-FPU mode or with distinct FPUs. Qualcomm's only fault was ordering the wrong item off the menu. The chef (ARM) is the one who (accidentally) poisoned the dish.

https://www.anandtech.com/show/16693...0-cortexa510/4

Originally posted by willmore

It's almost like I just said that.

If you're going to hear only what you want, I guess talking to you is pointless.

Originally posted by willmore

And Linux choosing not to support a feature on a core due to a potential erratum isn't the same as Intel retroactively fusing off functionality that they initially shipped.

First, Intel didn't intend that AVX-512 should be used in Alder Lake. It was disabled by default, in every single Alder Lake system anyone ever bought. It's just that a couple of motherboards' BIOSes gave you the option to re-enable it. Intel warned people against doing that, as they said they had not even validated that part of the chip - it could have design or manufacturing defects. Then, to be certain people wouldn't start using it and complaining when they encountered problems, Intel also disabled it in microcode. That was the only retroactive part.

Apparently, it's not even an uncommon practice for chips to ship with experimental features disabled. The main difference, in this case, is that Intel seems to have disabled it to retain ISA-symmetry with the E-cores. So, it's merely unvalidated -- not exactly experimental.

The thing that's worse about what happened with ARM is that the chips did ship with defective IP that was completely enabled. Merely patching the OS not to advertise the bf16 instructions is the least-desirable option, since it's still voluntary for programs to even check the CPUID bits. If, for instance, they were compiled with -march=native before picking up that patch, they could still execute the errant instructions.

Worse, ARM disabled bf16 for A510 cores in all CPUs -- even ones without the defect. I get that mistakes happen, but this damage-control measure was unfortunate in a variety of respects.
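
To make the "voluntary check" point concrete, here is a minimal sketch of what a well-behaved program does on Linux, assuming it reads the advertised flags from /proc/cpuinfo-style text (the "flags" line on x86, the "Features" line on arm64). The function names and the code-path labels are hypothetical, picked only for illustration. A binary built with -march=native never runs a check like this, which is why hiding the flag in the OS does nothing for it.

```python
def cpu_feature_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text.

    x86 kernels print a "flags" line and arm64 kernels print a
    "Features" line; both are whitespace-separated lists of names.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def pick_kernel(flags):
    """Choose a code path from the advertised flags (labels are hypothetical)."""
    if "bf16" in flags:
        return "bf16"
    if "avx512f" in flags:
        return "avx512"
    return "generic"
```

On a real system you would feed it the contents of /proc/cpuinfo; once the kernel stops advertising bf16, `pick_kernel` falls back to another path, but only for programs that bother to ask.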

**qarium** · 08 January 2023, 01:53 PM

Originally posted by coder

I don't expect AMD to continue that approach, in future generations. I think they're just still in the learning curve, regarding cache-stacking. We'll see.

Why? ... It sounds like a very good solution... If a task needs high clock speed, it goes to a core without stacked 3D cache;
if a task needs a lot of cache, it goes to the cores with stacked 3D cache.

And it also sounds like it works better than Intel's big.bigger design.
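
The placement idea above can be sketched in a few lines, assuming a dual-CCD part where one CCD has the stacked cache and the other clocks higher. The CPU numbering and the "cache"/"clock" task hints are assumptions for illustration only; real layouts differ per system, and this is not how any shipping scheduler or driver is known to do it.

```python
import os

# Hypothetical layout: logical CPUs 0-7 on the cache-stacked CCD,
# 8-15 on the higher-clocking CCD. Real numbering varies by system.
VCACHE_CPUS = set(range(0, 8))
HIGH_CLOCK_CPUS = set(range(8, 16))


def preferred_cpus(task_kind):
    """Map a task hint ("cache" or "clock") to a CPU set."""
    if task_kind == "cache":
        return VCACHE_CPUS
    if task_kind == "clock":
        return HIGH_CLOCK_CPUS
    return VCACHE_CPUS | HIGH_CLOCK_CPUS


def pin_current_process(task_kind):
    """Restrict the calling process to the preferred CPUs (Linux only)."""
    wanted = preferred_cpus(task_kind)
    allowed = os.sched_getaffinity(0)
    usable = wanted & allowed
    # Keep the existing mask if none of the wanted CPUs are available.
    os.sched_setaffinity(0, usable or allowed)
    return os.sched_getaffinity(0)
```

The hard part is not the pinning but knowing which kind a task is, and that is the bit the sketch simply takes as an input.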

**qarium** · 08 January 2023, 01:59 PM

Originally posted by willmore

For Intel, I think you mean "big.crippled", or are we going to pretend that they didn't put AVX512 in the big cores but not in the little ones?
FWIW, ARM has gone through a lot of pain to make sure that there is a 'little' core to match every 'big' core so that they didn't have that problem. Some vendors chose not to take advantage of that (Samsung comes to mind) and suffered scheduling problems because of it. There's no reason Intel shouldn't have seen that coming.

The last 2-3 generations of Intel desktop CPUs do not have AVX512, so it is not a big.crippled design compared to ARM big.LITTLE.
The Intel design is big.bigger because even the small efficiency cores are bigger than the big cores of ARM designs.

In the future Intel will bring AVX512 to both the performance cores and the efficiency cores, like this: the big cores will get the full-fat AVX512 implementation and the small cores will get a double-pumped 256-bit emulation of AVX512, similar to how AMD CPUs emulate AVX512 on 256-bit hardware via a double-pumped design... (There was double pumping in the Pentium 4, but that is not the same thing, and the P4 is so old now that people no longer remember that failure.)

With full AVX512 on the big cores and 256-bit double-pumped emulation on the small cores it is still a big.bigger design, because again the small cores of Intel are bigger than the big cores of ARM...
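
Roughly what "double-pumped" means here: one 512-bit operation is executed as two passes through 256-bit hardware. A toy sketch, with plain Python lists standing in for vector registers (16 lanes of 32 bits = 512 bits). The function names are made up, and this only illustrates the idea, not how any particular core is implemented.

```python
LANES_512 = 16  # 16 x 32-bit lanes = 512 bits
LANES_256 = 8   # 8 x 32-bit lanes = 256 bits


def add_256(a, b):
    """Stand-in for a native 256-bit add: exactly 8 lanes per call."""
    assert len(a) == len(b) == LANES_256
    return [x + y for x, y in zip(a, b)]


def add_512_double_pumped(a, b):
    """512-bit add done as two passes through the 256-bit unit."""
    assert len(a) == len(b) == LANES_512
    low = add_256(a[:LANES_256], b[:LANES_256])
    high = add_256(a[LANES_256:], b[LANES_256:])
    return low + high
```

The program sees the same 512-bit result either way; the narrower core just spends two passes getting there.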

**qarium** · 08 January 2023, 02:01 PM

Originally posted by Keats

A "threadripper" with dual-channel RAM and 20-odd PCIe lanes would be a joke.

Why? This 4nm 36-core Threadripper would have ~512 MB of L3 cache, directly or via stacked 3D cache...

It would still be faster than a 32-core Threadripper 2990WX, for example.

**edwaleni** · 08 January 2023, 08:54 PM

I like where AMD is headed in the low power/high core space. A 16c/32t at 30W or less would make a great low power server. Each generation of Ryzen/Epyc gets less wattage per core. With server CPUs getting into the 400W+ range, it's nice to see these enhancements coming in at the bottom of the range.

AMD Announces Ryzen 7040/7045HX Mobile CPUs, Ryzen 7000 Series X3D, Instinct MI300