AMDGPU VEGA hot spot temperature

  • AMDGPU VEGA hot spot temperature

    Hi. I made a simple patch for the AMDGPU driver that adds additional temperature sensors for VEGA graphics cards:
    the main graphics card temperature (temp2_input) and the GPU hot spot temperature.

    Link: https://github.com/matszpk/amdgpu-vega-hotspot

    It is now possible to read the VEGA GPU hot spot temperature from Linux. Before this patch, only the AMDGPU-PRO drivers read what is very likely the VEGA hot spot temperature. This patch is for anybody who needs better thermal control of their own VEGA graphics card. Under Windows, some utilities (like GPU-Z) report a VEGA hot spot temperature, but the official driver control panel does not.

    EDIT: The current version now adds only one temperature sensor: the hot spot (ASIC_MAX).
    EDIT2: Added the hot spot (ASIC_MAX) sensor for all graphics cards.
    Last edited by matszpk; 04-25-2018, 01:02 PM.

  • #2
    The packaged drivers use the same kernel driver as what is upstream. Your vega10_thermal_get_temperature_ctf() function does the same thing we do in the current get_temp function:
    https://git.kernel.org/pub/scm/linux...thermal.c#n351
    Maybe you are using an older kernel?


    • #3
      I added two temperature sensors: the main graphics temperature (CTF) and, very likely, the GPU hot spot (ASIC_MAX). I added the first one because the original AMDGPU-PRO driver reads the hot spot temperature, and I wanted to distinguish the main temperature sensor provided by the driver from these two specific sensors. Maybe removing the first new temperature sensor makes some sense. I plan to port this patch to the AMDGPU-PRO driver.


      • #4
        OK. Maybe it is a good idea to remove the first additional sensor, since it duplicates an existing one. I will make a separate patch for the AMDGPU-PRO driver later.


        • #5
          It's not really two sensors. Both are aggregates. ASIC_MAX is just the highest returned temperature across all internal sensors. What CTF returns is configurable, but by default I think it returns the highest temperature as well.
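
To illustrate what "aggregate" means here, a minimal Python sketch. It assumes we already have a set of per-sensor readings in degrees Celsius; the sensor names and values are made up for illustration, and the real driver reads a hardware register rather than computing this in software.

```python
def hottest_reading(sensor_temps_c):
    """Return (sensor_name, temperature) for the highest reading.

    This mimics the ASIC_MAX idea described above: the reported value is
    the maximum over all internal sensors, not a sensor of its own.
    The dict of named readings is a hypothetical input format.
    """
    if not sensor_temps_c:
        raise ValueError("no sensor readings supplied")
    name = max(sensor_temps_c, key=sensor_temps_c.get)
    return name, sensor_temps_c[name]


if __name__ == "__main__":
    # Hypothetical internal sensor readings, in degrees Celsius.
    readings = {"sensor_a": 61.0, "sensor_b": 74.5, "sensor_c": 68.0}
    print(hottest_reading(readings))
```

Which sensor supplies the maximum can change from one reading to the next, which is why the aggregate behaves like a hot spot value without being tied to one physical location.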


          • #6
            Thank you for the information, agd5f. I suspected that the mmCG_MULT_THERMAL_STATUS register has a different meaning than I thought. So, can you tell me which registers give access to the real sensors? I tested some thermal registers (mmTHM_TCON_LOCALX) and got raw values that correlate with the values given by CTF or ASIC_MAX. Maybe mmTHM_TMON0_RDILX_DATA can provide some interesting raw data?

            However, ASIC_MAX does match the meaning of a hot spot, because it returns the highest temperature (a hot spot being the point where the temperature is highest), even though it is not a single sensor at some point inside (or between) the cores that returns the highest possible temperature. The CTF returns the same temperature as the Windows driver (Crimson settings, WattMan panel), so I guessed that this register returns the overall (main) temperature of the graphics card.
            Last edited by matszpk; 04-25-2018, 02:01 AM.


            • #7
              Hi,

              Sorry to bump this thread. I just want to mention that I find these patches potentially useful (I haven't tried them yet), and to thank the author.

              I made another thread asking about the possibility of getting more sensor data here. It seems these cards do have more sensors, as shown for example in this thread, where we have GPU, GPU Hotspot, HBM (seems bogus), VRM SOC and VRM Mem temperatures. Have there been any attempts to get more than one GPU sensor and the Hotspot sensor data under Linux? Is there anything I can do to help make this a reality?

              Are there any plans to integrate this patch into the mainline kernel?


              • #8
                These patches are not my own work, and I have no plans to merge them into the mainline kernel or any other kernel tree.
                About getting the second temperature sensor: you can use my utility, AMDCOVC (https://github.com/matszpk/amdcovc), or you can read the second temperature from the command line (this is described in README.md on the GitHub project page).
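
For readers who would rather script this than use the command line, here is a rough Python sketch of reading temperature files through the standard Linux hwmon sysfs interface, which reports `temp*_input` values in millidegrees Celsius. It assumes the GPU's hwmon device reports the name `amdgpu`, and it does not assume which `tempN_input` file the patch uses for the hot spot; check the README for that. The function names are my own, not part of the patch or of AMDCOVC.

```python
from pathlib import Path


def read_hwmon_temp_celsius(path):
    """Read one hwmon temp*_input file (millidegrees C) as degrees C."""
    raw = Path(path).read_text().strip()
    return int(raw) / 1000.0


def find_amdgpu_temps(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN/tempM_input": degrees C} for amdgpu hwmon devices.

    Assumption: the driver registers its hwmon device under the name
    "amdgpu". Devices with any other name are skipped.
    """
    readings = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        if not name_file.is_file():
            continue
        if name_file.read_text().strip() != "amdgpu":
            continue
        for temp_file in sorted(dev.glob("temp*_input")):
            key = f"{dev.name}/{temp_file.name}"
            readings[key] = read_hwmon_temp_celsius(temp_file)
    return readings


if __name__ == "__main__":
    for key, temp_c in find_amdgpu_temps().items():
        print(f"{key}: {temp_c:.1f} C")
```

On a machine without an amdgpu hwmon device, `find_amdgpu_temps()` simply returns an empty dict.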
