AMD Preparing 5-Level Paging Linux Support For Future CPUs

  • AMD Preparing 5-Level Paging Linux Support For Future CPUs

    Phoronix: AMD Preparing 5-Level Paging Linux Support For Future CPUs

    Future AMD CPUs -- potentially AMD EPYC 7004 "Genoa" -- will be supporting 5-level paging...

    https://www.phoronix.com/scan.php?pa...vel-Paging-KVM

  • #2
    Hi, I just noticed that the link to the white paper doesn't work!

    Here's one that works
    https://software.intel.com/content/d...hite-paper.pdf

    Best wishes



    • #3
      I assume this relates to some degree to CXL-attached memory support becoming available with the next-gen servers. With those you'll be able to plug in many terabytes of memory as RAM under one system, without mucking about with strange NUMA configs and so on. Looking forward to systems with 1TB of RAM per core ...
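For context on why another paging level matters for very large memory configurations, here is a minimal sketch of the arithmetic. It assumes the usual x86-64 layout with 4 KiB pages: a 12-bit page offset plus 9 bits of index per table level. The function names are made up for illustration.

```python
# Sketch: virtual address width for N-level x86-64 paging with 4 KiB pages.
PAGE_OFFSET_BITS = 12  # 4 KiB pages -> 12 offset bits
BITS_PER_LEVEL = 9     # 512 entries per table -> 9 index bits per level


def virtual_address_bits(levels: int) -> int:
    """Number of virtual address bits translated by `levels` table levels."""
    return PAGE_OFFSET_BITS + BITS_PER_LEVEL * levels


def address_space_tib(levels: int) -> int:
    """Size of the virtual address space in TiB."""
    return 2 ** virtual_address_bits(levels) // 2 ** 40


for levels in (4, 5):
    print(f"{levels} levels: {virtual_address_bits(levels)} bits, "
          f"{address_space_tib(levels)} TiB")
```

Under these assumptions, 4 levels give 48 bits (256 TiB) and 5 levels give 57 bits (131072 TiB, i.e. 128 PiB) of virtual address space.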



      • #4
        I was wondering what would happen if a kernel was designed that used Huge Pages natively, inside the kernel itself, userland, buffers, and caches. Would the page table efficiency benefits overcome the loss of a few megabytes of RAM?



        • #5
          Originally posted by mangeek View Post
          I was wondering what would happen if a kernel was designed that used Huge Pages natively, inside the kernel itself, userland, buffers, and caches. Would the page table efficiency benefits overcome the loss of a few megabytes of RAM?
          How huge? 1 TiB pages with a 512 GiB page table should cover the entire 64-bit address space.
          Last edited by pkunk; 10 August 2021, 12:10 PM.
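Figures like these are easy to sanity-check. The sketch below assumes a single flat (one-level) table with 8-byte entries; neither assumption is stated in the post, so treat the results as one possible reading rather than a correction.

```python
# Sketch: size of a hypothetical single-level page table.
ENTRY_BYTES = 8  # assumption: 8-byte entries, as in today's x86-64 tables


def flat_table_bytes(address_bits: int, page_bytes: int,
                     entry_bytes: int = ENTRY_BYTES) -> int:
    """Bytes needed for one flat table mapping the whole address space."""
    entries = 2 ** address_bits // page_bytes
    return entries * entry_bytes


MIB = 2 ** 20
GIB = 2 ** 30
TIB = 2 ** 40

print(flat_table_bytes(64, 1 * TIB) // MIB, "MiB table for 1 TiB pages")
print(flat_table_bytes(64, 256 * MIB) // GIB, "GiB table for 256 MiB pages")
```

Under these assumptions, 1 TiB pages would need a 128 MiB table, while a 512 GiB table corresponds to 256 MiB pages. A different entry size or table layout would change the numbers.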



          • #6
            Originally posted by pkunk View Post

            How huge?
            I'm thinking the standard 1MB or 2MB hugepages instead of 4K. I know my desktop is fine with THP 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

            I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.
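The 4096 figure can be checked with a few lines. This is a rough sketch that counts only leaf entries (one per mapped page), assumes 8-byte entries, and ignores the upper table levels.

```python
# Sketch: leaf page-table entries needed to map 8 GiB of RAM.
KIB = 2 ** 10
MIB = 2 ** 20
GIB = 2 ** 30
ENTRY_BYTES = 8  # assumption: 8-byte page table entries


def leaf_entries(ram_bytes: int, page_bytes: int) -> int:
    """One leaf entry per page of mapped memory."""
    return ram_bytes // page_bytes


for label, page in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB)):
    entries = leaf_entries(8 * GIB, page)
    print(f"{label} pages: {entries} entries, "
          f"{entries * ENTRY_BYTES // KIB} KiB of leaf tables")
```

With 4 KiB pages that is 2097152 entries (16 MiB of leaf tables); with 2 MiB pages it is 4096 entries (32 KiB).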



            • #7
              Originally posted by mangeek View Post

              I'm thinking the standard 1MB or 2MB hugepages instead of 4K. I know my desktop is fine with THP 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

              I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.
              I'm certainly no expert either, but AFAIK, there was a time when Linux actually had Huge Pages enabled by default; however, it became apparent that the daemon responsible ('khugepaged' AFAIR) could cause noticeable system stalls of up to a few seconds[!], which is why upstream came up with the nowadays sane default of "madvise" in the first place.
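To see which mode a given machine uses, the kernel exposes the current THP setting in sysfs, with the active choice shown in square brackets (e.g. `always [madvise] never`). A minimal sketch for reading it; the helper names are made up for illustration, and the file may be absent on kernels built without THP.

```python
# Sketch: report the active transparent hugepage mode on Linux.
import re
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the sysfs file contents."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Active THP mode, or None if the sysfs file does not exist."""
    if not path.exists():
        return None
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    print(current_thp_mode())
```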



              • #8
                Originally posted by Linuxxx View Post

                I'm certainly no expert either, but AFAIK, there was a time when Linux actually had Huge Pages enabled by default; however, it became apparent that the daemon responsible ('khugepaged' AFAIR) could cause noticeable system stalls of up to a few seconds[!], which is why upstream came up with the nowadays sane default of "madvise" in the first place.
                Oh yeah, I think RHEL/CentOS/Fedora still have it on by default. What I'm saying is to try doing it differently, jump -all the way in- so the huge pages for apps aren't swimming around in a sea of 4K pages, with khugepaged trying to clear out space for THP allocations. I'm WAY out of my league, but it seems like the stalling wouldn't be a problem if all the pages were huge to start with, including inside kernel memory structures and caches and stuff. The goal would be to dramatically reduce TLB misses across the board.
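The TLB argument comes down to "reach": how much memory the TLB can cover without a miss. A rough sketch follows; the entry count of 1536 is a made-up illustrative value, and real CPUs have separate TLB levels and often fewer entries for huge pages.

```python
# Sketch: TLB reach = number of entries * page size.
KIB = 2 ** 10
MIB = 2 ** 20
TLB_ENTRIES = 1536  # hypothetical entry count, for illustration only


def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Memory covered by a fully populated TLB."""
    return entries * page_bytes


print(tlb_reach_bytes(TLB_ENTRIES, 4 * KIB) // MIB, "MiB reach with 4 KiB pages")
print(tlb_reach_bytes(TLB_ENTRIES, 2 * MIB) // MIB, "MiB reach with 2 MiB pages")
```

With the same number of entries, 2 MiB pages cover 512 times more memory than 4 KiB pages (here 3072 MiB versus 6 MiB).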



                • #9
                  Please note:

                  The fact that "Intel worked on the 5-level paging Linux kernel support going back five years", *does not* automatically mean that Intel's work on the 6-level paging support goes back six years.



                  • #10
                    Originally posted by mangeek View Post

                    I'm thinking the standard 1MB or 2MB hugepages instead of 4K. I know my desktop is fine with THP 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

                    I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.
                    This could work, but only if we rewrite the dynamic linker, plus lots of other stuff that depends on 4K pages, to use the fallback flag.

                    Almost every dynamic linker and ELF binary assumes 4K pages, so that part of the executable can be mapped read-only or execute-only while other parts are RW.

                    If the kernel switched to 1MB/2MB pages by default, then I think all of that would have to be rewritten, and it's unclear how much software out there assumes 4K pages.

                    Most of the benefit of huge pages is said to go to databases, though, which can indeed map their data in 1MB/2MB or even 1G pages to speed up lookups by reducing TLB misses.

                    IDK whether huge pages benefit other memory-hungry applications like web servers, web browsers and the JVM.
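On the ELF point above: permissions are applied per page, so segments with different permissions cannot share a page and each one is rounded up to a page boundary. A rough sketch of what that costs with larger pages; the segment sizes are invented for illustration, not taken from any real binary.

```python
# Sketch: memory occupied by ELF-style segments when each must start on
# its own page because it has different permissions.
KIB = 2 ** 10
MIB = 2 ** 20

# Hypothetical segment sizes: (name, permissions, size in bytes).
SEGMENTS = [
    ("text", "r-x", 300 * KIB),
    ("rodata", "r--", 80 * KIB),
    ("data", "rw-", 20 * KIB),
]


def round_up(size: int, page_bytes: int) -> int:
    """Round size up to the next multiple of page_bytes."""
    return -(-size // page_bytes) * page_bytes


def mapped_bytes(segments, page_bytes: int) -> int:
    """Total memory used when every segment occupies whole pages."""
    return sum(round_up(size, page_bytes) for _, _, size in segments)


print(mapped_bytes(SEGMENTS, 4 * KIB) // KIB, "KiB with 4 KiB pages")
print(mapped_bytes(SEGMENTS, 2 * MIB) // KIB, "KiB with 2 MiB pages")
```

For this made-up binary, 400 KiB of segments occupy 400 KiB with 4 KiB pages but 6 MiB with 2 MiB pages, which is the footprint trade-off discussed earlier in the thread.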

