Gigabyte Motherboard WMI Temperature Driver Queued Ahead Of Linux 5.13

  • Gigabyte Motherboard WMI Temperature Driver Queued Ahead Of Linux 5.13

    Phoronix: Gigabyte Motherboard WMI Temperature Driver Queued Ahead Of Linux 5.13

    Earlier this month I reported on a WMI temperature driver for Gigabyte motherboards being worked on by an independent developer. That "gigabyte-wmi" driver is now slated for inclusion in the upcoming Linux 5.13 cycle...

    https://www.phoronix.com/scan.php?pa...For-Linux-5.13

  • #2
    Meanwhile ASUS users are screwed.



    • #3
      Well, that got in quick! Anyone know why zenpower is left out?



      • #4
        A screenshot showing the result of this feature would be appreciated.



        • #5
          Originally posted by Azrael5
          A screenshot showing the result of this feature would be appreciated.
          Type "sensors" in the terminal. I posted my terminal output in the last thread. This added the gigabyte_wmi-virtual-0 entries.
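If you would rather script this than read the `sensors` output, here is a rough sketch that reads the same values straight from the kernel's hwmon sysfs tree. It assumes the driver registers under the hwmon name `gigabyte_wmi` (inferred from the `gigabyte_wmi-virtual-0` entries mentioned above) and that temperatures appear as `temp*_input` files in millidegrees Celsius, which is the usual hwmon convention. The function name and its parameters are made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(chip_name="gigabyte_wmi", root="/sys/class/hwmon"):
    """Return {"hwmonN/tempM_input": degrees_C} for hwmon devices named chip_name."""
    temps = {}
    base = Path(root)
    if not base.is_dir():
        return temps
    for dev in sorted(base.glob("hwmon*")):
        name_file = dev / "name"
        # Skip devices that belong to other chips (assumed name: gigabyte_wmi).
        if not name_file.is_file() or name_file.read_text().strip() != chip_name:
            continue
        for sensor in sorted(dev.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                temps[f"{dev.name}/{sensor.name}"] = int(sensor.read_text()) / 1000.0
            except (OSError, ValueError):
                continue  # sensor unreadable or not a number; ignore it
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} C")
```

An empty result only means no hwmon device with that name was found, for example because the driver is not loaded or the board does not expose the sensors via WMI.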



          • #6
            Knowing your temps is nice I guess. But knowing when you have memory errors is even more important. A board (and/or CPU) that doesn't support ECC is useless. Memory sizes are large enough now that you WILL experience bit flips in RAM. From just this morning, dmesg on my home server reports the following two events:

            [14401.374881] mce: [Hardware Error]: Machine check events logged
            [14401.374909] [Hardware Error]: Corrected error, no action required.
            [14401.374971] [Hardware Error]: CPU:4 (15:2:0) MC4_STATUS[-|CE|MiscV|-|AddrV|-|-|CECC]: 0x9c5c400002080a13
            [14401.375053] [Hardware Error]: Error Addr: 0x0000001adc63ffc0
            [14401.375098] [Hardware Error]: MC4 Error (node 1): DRAM ECC error detected on the NB.
            [14401.375177] EDAC MC1: 1 CE on mc#1csrow#0channel#1 (csrow:0 channel:1 page:0x1adc63f offset:0xfc0 grain:0 syndrome:0x2b8)
            [14401.375181] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)

            [14851.395420] mce: [Hardware Error]: Machine check events logged
            [14851.395440] [Hardware Error]: Corrected error, no action required.
            [14851.395488] [Hardware Error]: CPU:4 (15:2:0) MC4_STATUS[-|CE|MiscV|-|AddrV|-|-|CECC]: 0x9c5c400002080a13
            [14851.395545] [Hardware Error]: Error Addr: 0x0000001adc63ffc0
            [14851.395578] [Hardware Error]: MC4 Error (node 1): DRAM ECC error detected on the NB.
            [14851.395634] EDAC MC1: 1 CE on mc#1csrow#0channel#1 (csrow:0 channel:1 page:0x1adc63f offset:0xfc0 grain:0 syndrome:0x2b8)
            [14851.395637] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)


            Notice where it says "Corrected error, no action required"? That's ECC doing its job. Without ECC, the errors would not be corrected, and they would not even be logged. Silent errors, which would lead to data corruption and/or software crashes. I'm with Linus on this one, ECC is a required feature.
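For anyone who wants to pull the details out of such a log, here is a small sketch that parses the `EDAC MC1: 1 CE ...` line and rebuilds the physical address from the `page` and `offset` fields. It assumes 4 KiB pages (a shift of 12) and only handles the exact line layout shown above; the regex and function name are illustrative, not part of any kernel tool.

```python
import re

# Matches the "EDAC MCn: k CE ... (csrow:.. channel:.. page:0x.. offset:0x.." layout
# seen in the dmesg excerpt; other kernel versions may format this differently.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*?"
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+) "
    r"page:0x(?P<page>[0-9a-fA-F]+) offset:0x(?P<offset>[0-9a-fA-F]+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_edac_ce(line):
    """Return a dict describing a corrected-error line, or None if it does not match."""
    m = EDAC_CE.search(line)
    if m is None:
        return None
    page = int(m["page"], 16)
    offset = int(m["offset"], 16)
    return {
        "mc": int(m["mc"]),
        "count": int(m["count"]),
        "csrow": int(m["csrow"]),
        "channel": int(m["channel"]),
        # Physical address = page frame number shifted up, plus offset in the page.
        "address": (page << PAGE_SHIFT) | offset,
    }
```

With the line from the log, `page:0x1adc63f` and `offset:0xfc0` combine to `0x1adc63ffc0`, which agrees with the `Error Addr: 0x0000001adc63ffc0` line. Both events hit the same address.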



            • #7
              Originally posted by torsionbar28
              Notice where it says "Corrected error, no action required"? That's ECC doing its job. Without ECC, the errors would not be corrected, and they would not even be logged. Silent errors, which would lead to data corruption and/or software crashes. I'm with Linus on this one, ECC is a required feature.
              I'm pretty sure you're at least partially wrong there. I had those same messages in dmesg - including the "corrected error, no action required" part - but I don't have ECC RAM. I suspect these are the same errors that get reported on Windows as "WHEA" errors, and they occur with certain BIOS versions (more likely when overclocking).

              Other than that I agree that ECC would be a good thing to have everywhere.

              On topic: it's great to get more support in the sensors area - but it seems to me that all the big motherboard manufacturers (Asus, ASRock, MSI, Gigabyte) are similarly bad at supporting Linux. So we depend on guesswork and independent contributors (thanks a lot to Thomas Weißschuh!).
              For example, my motherboard, a Gigabyte X570 Aorus Master, also has the IT8688 chip, but no sensors are exposed via WMI. There is a second ITE chip that is recognized by the it87 driver and that does show some sensors, but I have no idea if that's all of them or how the chips work together. And Gigabyte doesn't give out any info, not even to developers doing their work for them.
              Last edited by mazumoto; 14 April 2021, 09:24 AM.



              • #8
                Originally posted by mazumoto
                I'm pretty sure you're at least partially wrong there. I had those same messages in dmesg - including the "corrected error, no action required" part - but I don't have ECC RAM. I suspect these are the same errors that get reported on Windows as "WHEA" errors, and they occur with certain BIOS versions (more likely when overclocking).
                The errors are in a standard format, so while they may appear similar at first glance, the details are important. AFAIK modern x86 CPU caches utilize ECC even if the main memory doesn't. So potentially you could have seen a similar error on a PC lacking ECC main memory, if a bit flip occurs in L2/L3 cache. In the examples I posted above, however, the message specifically states "detected on the NB". NB being the "North Bridge", aka the main memory controller. So this error must have occurred in main memory, as L2/L3 cache accesses do not go through the NB.
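To illustrate the "standard format" point, here is a minimal sketch that decodes the flag bits of the `MC4_STATUS` value from the log. The positions for VAL/OVER/UC/EN/MISCV/ADDRV/PCC (bits 63 down to 57) follow the common x86 machine-check status layout. The CECC/UECC positions (bits 46 and 45) are an assumption based on how the kernel appears to decode AMD status registers, so treat those two as unverified for other CPU families.

```python
# Common x86 MCi_STATUS flag positions (bit numbers).
MCI_STATUS_FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected)
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register is valid
    "ADDRV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context corrupt
}

# Assumption: AMD-specific ECC indicator bits.
AMD_CECC_BIT = 46  # correctable ECC error
AMD_UECC_BIT = 45  # uncorrectable ECC error


def decode_mci_status(status):
    """Return a dict of flag name -> bool for a 64-bit machine-check status value."""
    flags = {name: bool((status >> bit) & 1) for name, bit in MCI_STATUS_FLAGS.items()}
    flags["CECC"] = bool((status >> AMD_CECC_BIT) & 1)
    flags["UECC"] = bool((status >> AMD_UECC_BIT) & 1)
    return flags
```

For `0x9c5c400002080a13` this reports UC as clear, MISCV and ADDRV as set, and CECC as set, which lines up with the `[-|CE|MiscV|-|AddrV|-|-|CECC]` summary the kernel printed.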

                To be clear, if you don't have ECC main memory and a bit flip occurs there, you will never be notified - not by the BIOS, not by WHEA, not by anything - as that capability does not exist without ECC. And the error will not be corrected, which means you'll have unexplained corrupted data and/or software crashes.
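One way to check whether a Linux machine is reporting ECC events at all is to look at the EDAC counters in sysfs. The sketch below reads `ce_count` and `ue_count` for each memory controller under `/sys/devices/system/edac/mc`; the function name is made up for illustration. An empty result means no EDAC memory controller is registered, which may indicate no ECC support but could also just mean the EDAC driver for the platform is not loaded.

```python
from pathlib import Path


def edac_error_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mcN": {"corrected": int, "uncorrected": int}} from EDAC sysfs."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text())
            uncorrected = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # counters missing or unreadable for this controller
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts


if __name__ == "__main__":
    found = edac_error_counts()
    if not found:
        print("No EDAC memory controllers registered")
    for name, c in found.items():
        print(f"{name}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```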
                Last edited by torsionbar28; 14 April 2021, 10:51 AM.



                • #9
                  Originally posted by mazumoto

                  I'm pretty sure you're at least partially wrong there. I had those same messages in dmesg - including the "corrected error, no action required" part - but I don't have ECC RAM. I suspect these are the same errors that get reported on Windows as "WHEA" errors, and they occur with certain BIOS versions (more likely when overclocking).

                  Other than that I agree that ECC would be a good thing to have everywhere.

                  On topic: it's great to get more support in the sensors area - but it seems to me that all the big motherboard manufacturers (Asus, ASRock, MSI, Gigabyte) are similarly bad at supporting Linux. So we depend on guesswork and independent contributors (thanks a lot to Thomas Weißschuh!).
                  For example, my motherboard, a Gigabyte X570 Aorus Master, also has the IT8688 chip, but no sensors are exposed via WMI. There is a second ITE chip that is recognized by the it87 driver and that does show some sensors, but I have no idea if that's all of them or how the chips work together. And Gigabyte doesn't give out any info, not even to developers doing their work for them.

                  My Aorus Gaming K7 X370 also has two sets of sensors. The standard IT87 driver detects one set, but those are not all the sensors available. I need to use the patched IT87 driver and add acpi_enforce_resources=lax to my kernel boot line so that the other set of sensors gets exposed. Hopefully, with this new driver, I will no longer have to do that.



                  • #10
                    Originally posted by birdie
                    Meanwhile ASUS users are screwed.
                    I was super hopeful that I just didn't have the experience to understand the issue... Crap

