
**schmidtbag** · 05 December 2023, 12:43 PM

Originally posted by Anux

They are still boosting just with more conservative thresholds.

Having not used a product with E-cores: if the package temperature is relatively cool (let's say below 50 °C), is it known that E-cores reduce their boost due to thermal limits? Because if an i7 with E-cores can maintain its boost clocks, surely a more climate-controlled server with lower-frequency E-cores could do the same.

No. Heat builds up in a single core more or less instantly, while temperature changes across the die take several seconds. On modern 5 nm silicon one core is tiny, and the material can't spread the heat fast enough. If you only measured the neighboring core, your CPU would already have crashed.
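A toy two-node lumped thermal model can illustrate why a sensor on a neighboring core is not good enough: one core dissipates power, its idle neighbor is coupled to it through the silicon, and both shed heat to the heatsink. Every constant below is made up for illustration rather than taken from real silicon, so only the shape of the result matters, not the time scale or the absolute temperatures.

```python
def simulate(steps=2000, dt=1e-4):
    """Explicit-Euler simulation of a busy core A next to an idle core B.
    All constants are hypothetical, chosen only to show the qualitative effect."""
    ambient = 40.0   # °C, assumed baseline temperature of the package
    c_core = 0.001   # J/K, tiny heat capacity of one core (hypothetical)
    r_link = 20.0    # K/W, thermal resistance between adjacent cores (hypothetical)
    r_sink = 5.0     # K/W, resistance from each core to the heatsink (hypothetical)
    power_a = 5.0    # W dissipated by core A; core B is idle
    t_a = t_b = ambient
    trace = []
    for step in range(steps):
        q_ab = (t_a - t_b) / r_link  # heat flowing from A into B through the silicon
        d_a = (power_a - q_ab - (t_a - ambient) / r_sink) / c_core
        d_b = (q_ab - (t_b - ambient) / r_sink) / c_core
        t_a += d_a * dt
        t_b += d_b * dt
        trace.append(((step + 1) * dt, t_a, t_b))  # (time in s, core A °C, core B °C)
    return trace

trace = simulate()
# Early on, A has heated several kelvin while B has barely moved; even at
# steady state, B (the "neighbor sensor") reads far below the real hotspot.
```

With these made-up numbers core A settles around 60.8 °C while core B sits near 44.2 °C, so a controller that only sees B would badly underestimate A.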

Understood, but since we're talking server E-cores, their upper limit is low enough that I don't think these cores would be at any significant risk of overheating. I acknowledge I could be wrong, but this goes back to my previous question.

Valid question; the answer is that it's not sufficient. There are actually many sensors per core; see here for Zen 1 (the count has only increased since):

Interesting - thanks for the link.

I don't think any server provider cares about individual core temps. It's more likely needed by drivers that control boosting, power management, etc.

Shouldn't the boosting be controlled by the firmware rather than the OS? As for power management, isn't the wattage sensor all that's needed?

**Anux** · 06 December 2023, 05:49 AM

Originally posted by schmidtbag

Having not used a product with E-cores: if the package temperature is relatively cool (let's say below 50 °C), is it known that E-cores reduce their boost due to thermal limits? Because if an i7 with E-cores can maintain its boost clocks, surely a more climate-controlled server with lower-frequency E-cores could do the same.

I don't see why boosting would be different for any kind of core. The lower your temps, the higher you can boost without instability. Of course, boosting is limited not only by temp but also by max current, time, TDP, usage of the other cores and your energy preference.

On a µs time scale, you would see a core boost to its highest frequency, reach the thermal limit, reduce its boost level, cool down, boost again, move to a cooler core and boost there, get limited by any of the above reasons, and so on.
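The boost/throttle cycle described above can be sketched as a toy per-core governor: each tick it steps the frequency up while there is thermal and power headroom and steps it down once a limit is hit. The frequency table, power curve, thermal constants and limits are all hypothetical, and real firmware weighs many more inputs (current, time windows, other cores, energy preference).

```python
# Hypothetical frequency steps for one core, in MHz.
FREQ_STEPS_MHZ = [2000, 2500, 3000, 3500, 4000, 4500]

def core_power_w(freq_mhz):
    """Toy power curve that grows much faster than frequency (made-up constant)."""
    ghz = freq_mhz / 1000
    return 0.3 * ghz ** 3

def run_governor(ticks=400, ambient=45.0, t_max=95.0, p_max=25.0, hysteresis=3.0):
    """Simulate a simple step-up/step-down boost loop; returns (freq, temp) per tick."""
    temp = ambient
    step = 0
    history = []
    for _ in range(ticks):
        power = core_power_w(FREQ_STEPS_MHZ[step])
        # Thermal update: heating from power, cooling toward ambient (hypothetical constants).
        temp += 0.5 * power - 0.05 * (temp - ambient)
        if temp >= t_max or power > p_max:
            step = max(step - 1, 0)  # over a limit: throttle one step
        elif temp < t_max - hysteresis and step < len(FREQ_STEPS_MHZ) - 1:
            step += 1  # headroom available: boost one step
        # Record the frequency chosen for the next tick and the current temperature.
        history.append((FREQ_STEPS_MHZ[step], round(temp, 1)))
    return history

history = run_governor()
```

In this toy run the core briefly reaches 4500 MHz, overshoots the temperature limit because heat lags behind the frequency change, falls back to 2000 MHz, cools, and then keeps cycling between the lower steps. The hysteresis band keeps it from flipping up and down on every tick.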

Understood, but since we're talking server E-cores, their upper limit is low enough that I don't think these cores would be at any significant risk of overheating. I acknowledge I could be wrong, but this goes back to my previous question.

It's not about overheating; Intel's desktop lineup shows that > 100 °C is fine for stable operation. The problem is that you need more voltage to operate at higher temperatures -> the cores become inefficient, and you don't want that in a server CPU.
Running a core stably depends on a mix of manufacturing process (Intel's 10 nm+++++), architecture (E-cores), frequency, temperature and current flow. And server CPUs run with a higher stability margin than desktops, hence the lower clocks.

You want the most accurate temp readings to operate each core at its sweet spot; if you had only 1 sensor for 4 cores, you would need an even higher stability margin.
Edit: also, not every core is the same: some clock better, some are more power efficient, some are just bad. So you might need an even higher margin if your sensor is on the best of the 4 cores.
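The margin argument can be made concrete with a hypothetical snapshot of a 4-core cluster: if only one core carries the sensor, the controller has to budget for the gap between that core and the hottest one, and if it cannot choose the placement it must budget for the worst case. The temperatures are invented for illustration.

```python
# Hypothetical snapshot of one 4-core cluster under load, in °C (invented numbers).
CLUSTER_TEMPS_C = [78.0, 84.5, 81.0, 90.0]

def guard_band(temps_c, sensor_idx):
    """Extra margin (K) needed when only core `sensor_idx` is measured:
    the hottest core may be this much warmer than what the sensor reports."""
    return max(temps_c) - temps_c[sensor_idx]

def worst_case_guard_band(temps_c):
    """Margin to budget if you can't pick which core gets the sensor
    (e.g. because per-core silicon quality varies from chip to chip)."""
    return max(guard_band(temps_c, i) for i in range(len(temps_c)))

for i in range(len(CLUSTER_TEMPS_C)):
    print(f"sensor on core {i}: guard band {guard_band(CLUSTER_TEMPS_C, i)} K")
# With the sensor on the coolest core (index 0) the gap is 12.0 K.
# With a sensor on every core, the hottest core is read directly and the gap is 0.
```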

Shouldn't the boosting be controlled by the firmware rather than the OS? As for power management, isn't the wattage sensor all that's needed?

You're right, I'm actually not sure where this is needed. Maybe just to report the highest temp: you still need all the temps to decide which one is the highest.
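On Linux, Intel's coretemp driver exposes its sensors through the hwmon sysfs interface: each sensor has a `tempN_input` file (millidegrees Celsius) and usually a matching `tempN_label` such as "Core 0" or "Package id 0". A minimal sketch of "read every core and pick the hottest" might look like this; the function names are hypothetical, and which sensors appear depends on the CPU and kernel.

```python
import glob
import os

def read_coretemp(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN/<label>": temp_c} for every sensor exposed by the coretemp driver."""
    readings = {}
    for hwmon in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            with open(os.path.join(hwmon, "name")) as f:
                if f.read().strip() != "coretemp":
                    continue  # some other sensor chip (ACPI, NVMe, ...)
        except OSError:
            continue
        for input_path in sorted(glob.glob(os.path.join(hwmon, "temp*_input"))):
            label_path = input_path[: -len("_input")] + "_label"
            try:
                with open(label_path) as f:
                    label = f.read().strip()
                with open(input_path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor missing or unreadable; skip it
            # Key by hwmon directory too, since each package restarts at "Core 0".
            readings[f"{os.path.basename(hwmon)}/{label}"] = millideg / 1000.0
    return readings

def hottest_core(readings):
    """Return (key, temp_c) of the hottest "Core N" sensor, or None if there are none."""
    cores = {k: v for k, v in readings.items() if k.split("/", 1)[1].startswith("Core")}
    return max(cores.items(), key=lambda kv: kv[1]) if cores else None
```

Calling `hottest_core(read_coretemp())` on a machine with coretemp loaded returns something like `("hwmon3/Core 7", 71.0)`; note that it has to read every per-core sensor to get there.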

**schmidtbag** · 06 December 2023, 11:44 AM

Originally posted by Anux

I don't see why boosting would be different for any kind of core. The lower your temps, the higher you can boost without instability. Of course, boosting is limited not only by temp but also by max current, time, TDP, usage of the other cores and your energy preference.

Server chips have traditionally used significantly lower boost clocks. So, if an E-core in an i9 (which is basically a worst-case scenario for boost clocks, climate control, and heat dissipation) doesn't lower its boost clocks for lack of thermal headroom (not sure whether it does with a proper heatsink), then surely E-cores that aren't pushed hard in a well-controlled environment won't face that either. The key word there is "if".

It's not about overheating; Intel's desktop lineup shows that > 100 °C is fine for stable operation. The problem is that you need more voltage to operate at higher temperatures -> the cores become inefficient, and you don't want that in a server CPU.
Running a core stably depends on a mix of manufacturing process (Intel's 10 nm+++++), architecture (E-cores), frequency, temperature and current flow. And server CPUs run with a higher stability margin than desktops, hence the lower clocks.

Understood, but as you alluded to, Intel's desktop lineup can boost to more than double the frequency of their server chips. Frequency and wattage do not scale proportionately beyond a certain point. Intel's desktop parts push the limits of what can run stably on lower-quality motherboards, and part of how they achieve that is presumably a higher voltage than may be necessary. So this is where I question whether E-cores in a server chip are at enough risk of reaching thermal limits.
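To put rough numbers on "frequency and wattage do not scale proportionately", the usual first-order model of CMOS switching power is below, with activity factor α, switched capacitance C, supply voltage V and clock frequency f. It ignores leakage (which also grows with voltage and temperature) and assumes, purely for illustration, that near the top of the voltage/frequency curve the required voltage rises roughly in proportion to frequency.

$$
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3}
$$

$$
\frac{P_{\text{dyn}}(2f)}{P_{\text{dyn}}(f)} \approx 2^{3} = 8
$$

Under that assumption, doubling the clock costs roughly eight times the switching power, so a core held well below desktop boost clocks dissipates a small fraction of the heat.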

You want the most accurate temp readings to operate each core at its sweet spot; if you had only 1 sensor for 4 cores, you would need an even higher stability margin.
Edit: also, not every core is the same: some clock better, some are more power efficient, some are just bad. So you might need an even higher margin if your sensor is on the best of the 4 cores.

If it is known that the cores in a many-core server environment will suffer temperature-caused stability issues (when unchecked), then yes, more sensors are absolutely necessary.


Intel's Linux CPU Temperature Driver Being Adapted To Handle 128+ CPU Cores
