**kozman** · 02 February 2023, 05:59 PM

Kind of sad that this is languishing even though the benefits are real. And to read that AMD has zero interest in figuring out this problem? Come on. 96 cores, 128 cores. You KNOW 192, 256, 384, 512 cores are coming down the road for sure. To let Intel take away some glory from AMD, in that this patch seems to be OK with Intel chips but not AMD? All those datacenters that went out and bought tons of AMD CPUs are gonna be gnashing their teeth knowing this feature is a no-go for them for the foreseeable future.

So the v5 patch re-basing against 6.1 kernel will do what exactly? Still seems it's a dud for AMD CPUs unless something in 6.1 may better reveal what's failing on the AMD side?

**waxhead** · 02 February 2023, 07:30 PM

In the old days you could be mocked for "using a Unix system and get laid about as often as you needed to restart it" and still... in some sense be kind of proud... sigh... those days are long gone it seems.

...and perhaps just as good in some sense. It could be quite a thrill to see if all the stuff that was running came up again after you had to "do a Windows" after wrecking your three digit uptime day counter...

On a more serious note, boot time is only part of the problem. What is interesting is of course the time from when you ask the system to restart until the services you provide are up and running again AND responding at an adequate rate. Not that I have anything against improving the time it takes to bring CPUs online, but there is more to the equation, and maybe there are other things that are as important, if not more so...

**pipe13** · 02 February 2023, 07:40 PM

Originally posted by kozman

Kind of sad that this is languishing even though the benefits are real. And to read that AMD has zero interest in figuring out this problem? Come on. 96 cores, 128 cores. You KNOW 192, 256, 384, 512 cores are coming down the road for sure. To let Intel take away some glory from AMD, in that this patch seems to be OK with Intel chips but not AMD? All those datacenters that went out and bought tons of AMD CPUs are gonna be gnashing their teeth knowing this feature is a no-go for them for the foreseeable future.

So the v5 patch re-basing against 6.1 kernel will do what exactly? Still seems it's a dud for AMD CPUs unless something in 6.1 may better reveal what's failing on the AMD side?

Possibly what's failing on the AMD side is the failure of any AMD user to step up with a use case. From TFA:

With the revised patches this week, doing the INIT/SIPI/SIPI in parallel on a 128 CPU core system dropped the boot time from ~700ms to around 100ms.

6 tenths of a second. About a slow blink of an eye. What is the total boot time of your workstation?

Now, someone at Amazon -- David Woodhouse -- thought for whatever reason that it was worth the effort for their Intel use case. Whatever reason that might be. But if no one is clamoring for this feature for Zen 3 or Zen 4, perhaps it will hurt no one for AMD to put it off until Zen 5 or Zen 6. There will be a substantial architecture update with Zen 5.

**hotaru** · 02 February 2023, 08:27 PM

Originally posted by pipe13

Possibly what's failing on the AMD side is the failure of any AMD user to step up with a use case. From TFA:

6 tenths of a second. About a slow blink of an eye. What is the total boot time of your workstation?

Now, someone at Amazon -- David Woodhouse -- thought for whatever reason that it was worth the effort for their Intel use case. Whatever reason that might be. But if no one is clamoring for this feature for Zen 3 or Zen 4, perhaps it will hurt no one for AMD to put it off until Zen 5 or Zen 6. There will be a substantial architecture update with Zen 5.

6 tenths of a second is really nothing when you consider how many servers take a full minute or longer just to POST.

**Paradigm Shifter** · 02 February 2023, 08:57 PM

Originally posted by hotaru

6 tenths of a second is really nothing when you consider how many servers take a full minute or longer just to POST.

Ain't it the truth.

Our new Epyc server takes multiple minutes to finish all of its pre-POST routines, bringing up AGESA, BMC, etc, etc, etc... POST itself is fairly fast but is still about 90 seconds including the RAID controller. Handing off to the OS and actually loading that takes... eh, about 10 seconds? It's fast enough that it's almost blink-and-miss-it compared to the rest of the boot process.

I thought our Intel servers were slow, but Epyc really takes the cake, and the CPUs aren't even the super-high-core-count variants.
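To put the numbers in this thread side by side, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the ~700 ms to ~100 ms figure quoted from the article applies to a server with the roughly 90 s POST and 10 s OS load described above. The constant and function names are made up for this example, and the multi-minute pre-POST phase is left out because no figure is given for it.

```python
# Figures quoted in the thread. Applying them all to one machine is an
# assumption made only for this illustration.
SERIAL_BRINGUP_S = 0.700    # ~700 ms for 128 cores before the patches
PARALLEL_BRINGUP_S = 0.100  # ~100 ms with parallel INIT/SIPI/SIPI
POST_S = 90.0               # "about 90 seconds including the RAID controller"
OS_LOAD_S = 10.0            # "about 10 seconds" to hand off and load the OS


def saving_share(saved_s: float, total_s: float) -> float:
    """Return the time saved as a percentage of the given total."""
    return 100.0 * saved_s / total_s


saved = SERIAL_BRINGUP_S - PARALLEL_BRINGUP_S
share_of_os = saving_share(saved, OS_LOAD_S)
share_of_boot = saving_share(saved, POST_S + OS_LOAD_S)

print(f"saved: {saved:.1f} s")
print(f"share of OS load: {share_of_os:.1f}%")
print(f"share of POST + OS load: {share_of_boot:.1f}%")
```

Under those assumptions the saving is about 6% of the OS load time but only about 0.6% of POST plus OS load, which is the point being made about firmware dominating the reboot.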

**sawyerbergeron** · 02 February 2023, 09:35 PM

If I had to guess, if this carries over to VMs as well, this could actually mean pretty significant cost savings. Consider how many VMs are provisioned every year on AWS, and how this could mean a multiple-percent improvement in deployment latency for fast ramp-up of a service that needs more capacity immediately.

**NobodyXu** · 02 February 2023, 09:45 PM

Originally posted by kozman

You KNOW 192, 256, 384, 512 cores are coming down the road for sure.

I don't think 384/512 physical cores, or even 192/256 physical cores, are coming soon: current silicon technology has almost reached its limit, and continuing to scale up is hard.

AMD's best server CPU, the AMD EPYC™ 9654P, has 96 physical cores, and the Ampere Altra has 80 physical cores.
I'm not aware of any with more physical cores than those.

Getting to 192 physical cores, a 2x increase in core count, would take one of two things. Either the silicon technology shrinks the node by 2x, which is impossible as we have almost reached the limit.

Or AMD can scale down single-core performance to fit more cores into one CPU, and potentially stack silicon on top, which also causes heating issues and will force lower single-core clocks.

That's a valid solution, maybe the right solution for some highly parallel programs, but I'm not entirely sure this is the right solution for the cloud.

While cloud providers mostly care about core counts, since they have a lot of programs/VMs running on each machine, single-core performance still matters because not everything can be run in parallel.

Even if that is the right decision, doubling the physical core count is still going to be quite hard.

**Paradigm Shifter** · 02 February 2023, 10:03 PM

Originally posted by NobodyXu

I don't think 384/512 physical cores, or even 192/256 physical cores, are coming soon: current silicon technology has almost reached its limit, and continuing to scale up is hard.

AMD's best server CPU, the AMD EPYC™ 9654P, has 96 physical cores, and the Ampere Altra has 80 physical cores.
I'm not aware of any with more physical cores than those.

Ampere have a 128 core model.

And Intel are talking up their new Xeons which are efficiency-cores-only and have an absolutely colossal socket for 2024 (I'll believe it when I see it). Videocardz has come out with some pretty wild extrapolations regarding Intel of late, which are, given the issues Intel have had with even getting Sapphire Rapids out the door, a little hard to believe until solid evidence is presented (i.e.: I can place an order for said CPU and have it arrive the same month).

So it seems like the solution might be "make the CPUs physically larger". Which runs into potential yield issues... unless a chiplet design is utilised, of course...

**jeisom** · 02 February 2023, 10:07 PM

Originally posted by NobodyXu

Or AMD can scale down single-core performance to fit more cores into one CPU, and potentially stack silicon on top, which also causes heating issues and will force lower single-core clocks.

The caches take up most of the space. They are already stacking it on the X3D chips. I can imagine that future chips would just stack the cache on top. I don't know how much more density that would allow, but it shouldn't be negligible. But yeah, we are approaching space, size and power limits.

Work Revived On Parallel CPU Bring-Up To Boot Linux Faster On Large Systems/Servers
