Next-Gen AMD EPYC Changes To EDAC Driver Sent In For Linux 5.2 Kernel

  • Next-Gen AMD EPYC Changes To EDAC Driver Sent In For Linux 5.2 Kernel

    Phoronix: Next-Gen AMD EPYC Changes To EDAC Driver Sent In For Linux 5.2 Kernel

    The notable change with the "EDAC" changes for Linux 5.2 comes down to the "Zen 2" support for the new AMD EPYC processors launching later this year...

    http://www.phoronix.com/scan.php?pag...en-2-EPYC-EDAC

  • #2
    Unified Memory Controllers ... looks like Rome will indeed be a UMA design, not complex NUMA. Fun times ahead.



    • #3
      Off Topic:

      Lenovo is finally offering Ryzen on their top-of-the-line T-series ThinkPad: https://www.notebookcheck.net/Lenovo....418208.0.html



      • #4
        Originally posted by pegasus
        Unified Memory Controllers ... looks like Rome will indeed be a UMA design, not complex NUMA. Fun times ahead.
        The statement really says nothing about the system memory topology.
        Only that they have increased the possible number of unified memory controllers per _die_ from two to eight, not per socketed CPU or any higher.
        I'm not saying that you are wrong. But this could mean that you can potentially have 8*4 channels NUMA in a 4-chiplet CPU for example.



        • #5
          Originally posted by milkylainen

          The statement really says nothing about the system memory topology.
          Only that they have increased the possible number of unified memory controllers per _die_ from two to eight, not per socketed CPU or any higher.
          I'm not saying that you are wrong. But this could mean that you can potentially have 8*4 channels NUMA in a 4-chiplet CPU for example.
          I really doubt that NUMA is going away. The more cores you have, the more sense it makes, especially when you consider the markets AMD is going after. This could mean lots of things, though, from more cores per chiplet to special function blocks on each chiplet.

          In any event, I love to see AMD being resurrected from the dead. In some ways I see their tech as the better solution for a variety of users.



          • #6
            Originally posted by wizard69
            I really doubt that NUMA is going away.
            It's not going away for multi-socket, but Rome is definitely UMA for single socket. All chiplets have identical paths to memory with the same latency (chiplet -> I/O die -> memory). There is a small possibility that chiplets physically further from the I/O die have slightly higher latency, but I assume that's insignificant.

            I'm pretty much relying on this to fix the Windows scheduling problems with the 24- and 32-core TRs, since Microsoft obviously doesn't care.
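
For anyone who wants to see how the operating system itself views the memory topology, here is a minimal Python sketch, assuming a Linux system that exposes NUMA nodes under `/sys/devices/system/node`. The function name `list_numa_nodes` and the `root` parameter are made up for illustration. A single reported node only means the kernel sees one memory domain; it says nothing about latency differences inside the package.

```python
import re
from pathlib import Path

# Standard Linux sysfs location for NUMA node directories (node0, node1, ...).
NODE_ROOT = Path("/sys/devices/system/node")


def list_numa_nodes(root=NODE_ROOT):
    """Return the sorted NUMA node IDs found under `root` (empty if none)."""
    root = Path(root)
    if not root.is_dir():
        # Kernels built without NUMA support may not expose this directory.
        return []
    node_ids = []
    for entry in root.iterdir():
        match = re.fullmatch(r"node(\d+)", entry.name)
        if match and entry.is_dir():
            node_ids.append(int(match.group(1)))
    return sorted(node_ids)


if __name__ == "__main__":
    nodes = list_numa_nodes()
    if len(nodes) <= 1:
        print("Kernel reports a single memory domain (UMA from its point of view)")
    else:
        print(f"Kernel reports {len(nodes)} NUMA nodes: {nodes}")
```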



            • #7
              This patch does not work for the new Ryzen 3000 Zen2 series of AMD CPUs.

              I got the Ryzen 3900X with ECC memory, and this EDAC patch does not support the new Ryzen as far as I can tell. The new Ryzen I have is Family 17h, model 71h, and the patch was for F17_M30H (models 0x30 to 0x3F only). So there is currently no ECC support in the Linux kernel for the new Ryzen CPUs that have been released. I managed to patch the Linux kernel (5.2.1) by changing the PCI device IDs, which appear to be different from all other devices so far in the AMD EDAC driver (different from F17 M30H as well). It appears to load the EDAC driver on boot and detect all ECC DIMMs properly, but it does not report any ECC CE or UE errors, although they appear to be happening, based on my memory overclocking test, and being corrected when ECC is enabled.

              It is a bit disappointing that it does not work. When I read this some time ago, I felt that this would support all new Zen 2 CPUs, not just EPYC.
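
To make the model mismatch concrete, here is a small Python sketch of the range check described above. It is not the kernel's code: the `F17_M30H` range (0x30 to 0x3F) is the one quoted in the post, while the `F17_M70H` entry (0x70 to 0x7F) and the names `match_f17h_model`, `F17H_MODEL_RANGES` and `ASSUMED_F17H_MODEL_RANGES` are assumptions for illustration only.

```python
# Model ranges for Family 17h. The F17_M30H range is the one quoted in the
# post; everything else here is hypothetical and only illustrates the idea.
F17H_MODEL_RANGES = {
    "F17_M30H": (0x30, 0x3F),
}

ASSUMED_F17H_MODEL_RANGES = {
    **F17H_MODEL_RANGES,
    "F17_M70H": (0x70, 0x7F),  # ASSUMPTION: not part of the patch discussed
}


def match_f17h_model(model, ranges):
    """Return the name of the first range that contains `model`, or None."""
    for name, (low, high) in ranges.items():
        if low <= model <= high:
            return name
    return None


if __name__ == "__main__":
    ryzen_3900x_model = 0x71  # Family 17h, model 71h, as reported in the post
    print(match_f17h_model(ryzen_3900x_model, F17H_MODEL_RANGES))
    print(match_f17h_model(ryzen_3900x_model, ASSUMED_F17H_MODEL_RANGES))
```

With only the quoted range, model 0x71 matches nothing, which fits the behaviour described: the driver has no entry for that CPU until a range covering it is added.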



              • #8
                Originally posted by Jeff
                This patch does not work for the new Ryzen 3000 Zen2 series of AMD CPUs.

                I got the Ryzen 3900X with ECC memory, and this EDAC patch does not support the new Ryzen as far as I can tell. The new Ryzen I have is Family 17h, model 71h, and the patch was for F17_M30H (models 0x30 to 0x3F only). So there is currently no ECC support in the Linux kernel for the new Ryzen CPUs that have been released. I managed to patch the Linux kernel (5.2.1) by changing the PCI device IDs, which appear to be different from all other devices so far in the AMD EDAC driver (different from F17 M30H as well). It appears to load the EDAC driver on boot and detect all ECC DIMMs properly, but it does not report any ECC CE or UE errors, although they appear to be happening, based on my memory overclocking test, and being corrected when ECC is enabled.

                It is a bit disappointing that it does not work. When I read this some time ago, I felt that this would support all new Zen 2 CPUs, not just EPYC.
                Not 100% sure what you mean by "does not support the new Ryzen as far as I can tell", as you posted no actual findings, but does the output below prove, in your opinion, that Linux 5.4 does support it?

                Ubuntu 19.10 (Linux kernel 5.3)

                [email protected]:~# find /lib/modules/5.3.0-19-generic/ | grep -i -E 'edac'
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i7core_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/skx_edac.ko

                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/amd64_edac_mod.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i5100_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i10nm_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/x38_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i3000_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/sb_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i3200_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i7300_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i5400_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i82975x_edac.ko

                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/edac_mce_amd.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/e752x_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/pnd2_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/ie31200_edac.ko
                /lib/modules/5.3.0-19-generic/kernel/drivers/edac/i5000_edac.ko
                [email protected]:~# apt list edac-utils
                Listing... Done

                edac-utils/eoan,now 0.18-1build1 amd64 [installed]
                edac-utils/eoan 0.18-1build1 i386
                [email protected]:~# edac-util -vs
                edac-util: EDAC drivers loaded. No memory controllers found
                [email protected]:~# edac-util -v
                edac-util: Error: No memory controller data found.
                [email protected]:~#


                Fedora Rawhide (Linux kernel 5.4)

                [[email protected] ~]# find /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/ | grep -i -E 'edac'
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac

                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/amd64_edac_mod.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/e752x_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/edac_mce_amd.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i10nm_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i3000_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i3200_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i5000_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i5100_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i5400_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i7300_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i7core_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/i82975x_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/ie31200_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/pnd2_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/sb_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/skx_edac.ko.xz
                /lib/modules/5.4.0-0.rc3.git0.1.fc32.x86_64/kernel/drivers/edac/x38_edac.ko.xz
                [[email protected] ~]# yum info edac-utils
                Last metadata expiration check: 0:01:47 ago on Sun 27 Oct 2019 11:44:47 PM CET.
                Installed Packages
                Name : edac-utils
                Version : 0.16
                Release : 21.fc31
                Architecture : x86_64
                Size : 101 k
                Source : edac-utils-0.16-21.fc31.src.rpm
                Repository : @System
                From repo : rawhide
                Summary : Userspace helper for kernel EDAC drivers
                URL : http://sourceforge.net/projects/edac-utils/
                License : GPLv2+
                Description : EDAC is the current set of drivers in the Linux kernel that handle
                : detection of ECC errors from memory controllers for most chipsets
                : on i386 and x86_64 architectures. This userspace component consists
                : of an init script which makes sure EDAC drivers and DIMM labels
                : are loaded at system startup, as well as a library and utility
                : for reporting current error counts from the EDAC sysfs files.
                [[email protected] ~]# edac-util -vs
                edac-util: EDAC drivers are loaded. 1 MC detected:

                mc0:F17h_M70h
                [[email protected] ~]# edac-util -v
                mc0: 0 Uncorrected Errors with no DIMM info
                mc0: 0 Corrected Errors with no DIMM info
                mc0: csrow2: 0 Uncorrected Errors
                mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
                mc0: csrow2: mc#0csrow#2channel#1: 0 Corrected Errors
                mc0: csrow3: 0 Uncorrected Errors
                mc0: csrow3: mc#0csrow#3channel#0: 0 Corrected Errors
                mc0: csrow3: mc#0csrow#3channel#1: 0 Corrected Errors
                [[email protected] ~]#
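
The same counters can be read without `edac-util`. Below is a minimal Python sketch, assuming the usual EDAC sysfs layout with `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/mc*`. The function name `read_edac_counts` and its `root` parameter are made up for illustration, and the sketch only reads per-controller totals, not the per-csrow breakdown shown above.

```python
from pathlib import Path

# Usual location of the EDAC memory-controller sysfs tree on Linux.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce": n, "ue": n}}; a value is None if unreadable."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        # No EDAC driver bound to a memory controller, or no EDAC support.
        return counts
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc_dir / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No memory controllers found")
    for name, entry in found.items():
        print(f"{name}: {entry['ue']} uncorrected, {entry['ce']} corrected")
```

An empty result corresponds to the "No memory controllers found" case in the Ubuntu 19.10 output, where the module exists on disk but no controller is registered.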



                • #9
                  This is using a Ryzen 3600 on an ASRock Rack X470D4U2-2T (sorry for the 2nd post, but I don't see an edit button?? @ Mods: Feel free to merge)
