Intel's Linux CPU Temperature Driver Being Adapted To Handle 128+ CPU Cores


  • schmidtbag
    replied
    Originally posted by Anux View Post
    I don't see why boosting would be different for any kind of core. The lower your temps, the higher you can boost without instability. Of course, boosting is not limited only by temperature but also by max current, time, TDP, usage of the other cores and your energy preference.
    Server chips have traditionally used significantly lower boost clocks. So, if an E-core in an i9 (which is basically a worst-case scenario of boost clocks, climate control, and heat dissipation) doesn't lower its boost clocks for lack of thermal headroom (not sure if it does with a proper heatsink), then surely E-cores that aren't pushed hard in a well-controlled environment aren't going to face that either. Key word there is "if".
    It's not about overheating; Intel's desktop lineup shows that > 100 °C is fine for stable operation. The problem is you need more voltage to operate at higher temperatures -> cores become inefficient, and you don't want that in a server CPU.
    Running a core stably is a mixture of manufacturing process (Intel's 10 nm+++++), architecture (E-cores), frequency, temperature and current flow. And server CPUs run with a higher stability margin than desktops, hence the lower clocks.
    Understood, but as you alluded to, Intel's desktop lineup can more than double the boost frequency compared to their server chips. Frequency and wattage do not scale proportionately beyond a certain point. Intel's desktop parts are pushing the limits of what can be stable in lower-quality motherboards, and part of their way of achieving that would be to use a higher voltage than may be necessary. So, this is where I question whether E-cores in a server chip are at enough of a risk of reaching thermal limits.
    You want to have the most accurate temp readings to operate each core at its sweet spot; if you only had 1 sensor for 4 cores, you would need an even higher stability margin.
    Edit: also, not every core is the same: some clock better, some are more power efficient, some are just bad. So you might need an even higher margin if your sensor is on the best of the 4 cores.
    If it is known that the cores in a many-core server environment will suffer temperature-caused stability issues (when unchecked), then yes, more sensors are absolutely necessary.



  • Anux
    replied
    Originally posted by schmidtbag View Post
    Having not used a product with E-cores: if the package temperature is relatively cool (let's say below 50 °C), is it known that E-cores reduce their boost due to thermal limits? Because if an i7 with E-cores can maintain its boost clocks, surely a more climate-controlled server with lower-frequency E-cores could do the same.
    I don't see why boosting would be different for any kind of core. The lower your temps, the higher you can boost without instability. Of course, boosting is not limited only by temperature but also by max current, time, TDP, usage of the other cores and your energy preference.

    When you look at it on a µs time scale, you would see a core boost to its highest frequency, then reach its thermal limit, reduce its boost level, cool down, boost again, hand off to a cooler core and boost there, become limited by any of the above reasons, and so on.
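    Sketched as a toy control loop (my own illustration with made-up constants, not Intel's actual algorithm; the real hardware also honours the current/TDP/time limits above):

        # Toy model of one core's boost behaviour on a µs time scale.
        # All constants are invented for illustration.
        T_LIMIT = 100.0            # junction limit in °C
        F_BASE, F_MAX = 3.0, 5.0   # clocks in GHz
        temp, freq = 40.0, F_BASE

        for step in range(50):
            if temp >= T_LIMIT:
                freq = max(F_BASE, freq - 0.1)  # throttle in 100 MHz steps
            else:
                freq = min(F_MAX, freq + 0.1)   # ramp toward peak boost
            # Crude thermal model: heat scales with clock, cooling with delta-T.
            temp += freq * 0.8 - (temp - 40.0) * 0.05
            print(f"step {step:2d}: {freq:.1f} GHz, {temp:5.1f} °C")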

    Understood, but since we're talking server E-cores, their upper limit is low enough that I don't think these cores would be at any significant risk of overheating. I acknowledge I could be wrong, but this goes back to my previous question.
    It's not about overheating; Intel's desktop lineup shows that > 100 °C is fine for stable operation. The problem is you need more voltage to operate at higher temperatures -> cores become inefficient, and you don't want that in a server CPU.
    Running a core stably is a mixture of manufacturing process (Intel's 10 nm+++++), architecture (E-cores), frequency, temperature and current flow. And server CPUs run with a higher stability margin than desktops, hence the lower clocks.

    You want to have the most accurate temp readings to operate each core at its sweet spot; if you only had 1 sensor for 4 cores, you would need an even higher stability margin.
    Edit: also, not every core is the same: some clock better, some are more power efficient, some are just bad. So you might need an even higher margin if your sensor is on the best of the 4 cores.

    Shouldn't the boosting be controlled by the firmware rather than the OS? As for power management, isn't the wattage sensor all that's needed?
    You're right; I'm actually not sure where this is needed. Maybe just to report the highest temp: you still need all the temps to decide which is the highest.
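    For the "highest temp" case, userspace doesn't need much; a minimal sketch against the standard Linux hwmon sysfs layout that the coretemp driver exposes (the hwmon index varies per machine, hence the glob):

        import glob

        def hottest_core():
            """Return (label, °C) of the hottest sensor the coretemp driver exposes."""
            temps = {}
            for hw in glob.glob("/sys/class/hwmon/hwmon*"):
                with open(hw + "/name") as f:
                    if f.read().strip() != "coretemp":
                        continue
                # temp*_input is in millidegrees C; temp*_label names the sensor.
                for inp in glob.glob(hw + "/temp*_input"):
                    with open(inp.replace("_input", "_label")) as f:
                        label = f.read().strip()
                    with open(inp) as f:
                        temps[label] = int(f.read()) / 1000.0
            return max(temps.items(), key=lambda kv: kv[1])

        print(hottest_core())  # e.g. ('Core 12', 67.0)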
    Last edited by Anux; 06 December 2023, 05:53 AM.



  • schmidtbag
    replied
    Originally posted by Anux View Post
    They are still boosting, just with more conservative thresholds.
    Having not used a product with E-cores: if the package temperature is relatively cool (let's say below 50 °C), is it known that E-cores reduce their boost due to thermal limits? Because if an i7 with E-cores can maintain its boost clocks, surely a more climate-controlled server with lower-frequency E-cores could do the same.
    No, the heat per core rises more or less instantly, while changes across the die take several seconds. With modern 5 nm silicon, one core is super small and the material is not fast enough to spread the heat. Your CPU would already have crashed if you only measured the neighboring core.
    Understood, but since we're talking server E-cores, their upper limit is low enough that I don't think these cores would be at any significant risk of overheating. I acknowledge I could be wrong, but this goes back to my previous question.
    Valid question; the answer is, it's not sufficient. There are actually many sensors per core; see here for Zen 1 (the count only increased afterwards):
    Interesting - thanks for the link.
    I don't think any server provider cares about individual core temps. It's more likely needed for drivers that control boosting/power management, etc.
    Shouldn't the boosting be controlled by the firmware rather than the OS? As for power management, isn't the wattage sensor all that's needed?
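    (On Linux the OS does at least get a say: assuming the intel_pstate driver is active, the turbo knob is visible from userspace; other cpufreq drivers expose /sys/devices/system/cpu/cpufreq/boost instead.)

        # Read the OS-visible turbo knob; "0" means turbo is allowed.
        with open("/sys/devices/system/cpu/intel_pstate/no_turbo") as f:
            print("turbo enabled" if f.read().strip() == "0" else "turbo disabled")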



  • Anux
    replied
    Originally posted by schmidtbag View Post
    Remember: we're talking about E-cores in a server chip here. So, there wouldn't be any high boosting going on
    They are still boosting, just with more conservative thresholds.

    and the heat would be spread through the large surface area of the package.
    No, the heat per core rises more or less instantly, while changes across the die take several seconds. With modern 5 nm silicon, one core is super small and the material is not fast enough to spread the heat. Your CPU would already have crashed if you only measured the neighboring core.

    But also like I said: E-cores are roughly 1/4 the size of a P-core. If your logic is that 1 sensor for 4 E-cores won't show you the real temp, why would 1 sensor be sufficient for a P-core?
    Valid question; the answer is, it's not sufficient. There are actually many sensors per core; see here for Zen 1 (the count only increased afterwards): https://en.wikichip.org/wiki/amd/mic...ures/zen#Power What you see as a user is just the hotspot across all the sensors in that area.

    And yes, it is more a sensor-per-area thing than a sensor-per-core thing.

    Well, isn't the point of this driver for the user?
    I don't think any server provider cares about individual core temps. It's more likely needed for drivers that control boosting/power management, etc.



  • schmidtbag
    replied
    Originally posted by Anux View Post
    It is totally necessary: if you measure core 1 but load core 4, then core 1 will never show the real temp, probably a difference of 20 °C or more. That is what decides between a BSOD and a super high turbo boost.
    Remember: we're talking about E-cores in a server chip here. So, there wouldn't be any high boosting going on and the heat would be spread through the large surface area of the package. If one of these cores gets too hot, chances are the environment as a whole is too hot.
    Like I said, for desktop chips (in particular, P-cores), having a sensor per core makes sense, for the reasons you mentioned.
    But also like I said: E-cores are roughly 1/4 the size of a P-core. If your logic is that 1 sensor for 4 E-cores won't show you the real temp, why would 1 sensor be sufficient for a P-core? After all, you never really use 100% of a core's transistors at a time. Different sections are bound to get hotter than others. So, wouldn't that mean we should see multiple sensors per P core?
    If you mean showing this data to the end user, then even 1/4th is too much. Most users don't care about temps, and those that do would probably be fine with one peak and one average value.
    Only in very specific cases do you care about individual temps (overclocking etc.).
    Well, isn't the point of this driver for the user?
    Last edited by schmidtbag; 05 December 2023, 10:52 AM.



  • Anux
    replied
    Originally posted by schmidtbag View Post
    ... I'm beginning to question whether it's really necessary to have a sensor for each core.
    It is totally necessary: if you measure core 1 but load core 4, then core 1 will never show the real temp, probably a difference of 20 °C or more. That is what decides between a BSOD and a super high turbo boost.

    If you mean showing this data to the end user, then even 1/4th is too much. Most users don't care about temps, and those that do would probably be fine with one peak and one average value.
    Only in very specific cases do you care about individual temps (overclocking etc.).



  • schmidtbag
    replied
    We're reaching a point of core count and transistor density where I'm beginning to question whether it's really necessary to have a sensor for each core. In Intel's case, one E-core is almost 1/4 the size of a single P-core, yet E-cores inherently run cooler, so wouldn't one sensor for a cluster of 4 cores be sufficient?
    None of this is to say the driver shouldn't be updated anyway, but I just don't think it's necessary to measure per-core temperatures in servers anymore. I do feel it is necessary for desktop CPUs, since their high clock speeds mean a lot of energy density in a small area that isn't necessarily climate-controlled very well.



  • Anux
    replied
    Originally posted by rene View Post
    don't tell me Intel is also going to unprofessionally just glue CPU dies together like AMD, ...! :-/
    Even better, they are running a new campaign in the same vein: https://wccftech.com/intel-calls-out...ths-marketing/ despite having done the exact same thing several times. Not to mention they have no standing against either AMD CPU, and their new core extreme lineup is actually slower than their previous generation.
    I guess this is just Intel's way of saying they can't compete anymore. A little bit sad, but also funny.



  • rene
    replied
    don't tell me Intel is also going to unprofessionally just glue CPU dies together like AMD, ...! :-/



  • timofonic
    replied
    Six months later...

    AMD releases a CPU with 512 powerful cores and others optimized for power efficiency.

