AMD Preparing 5-Level Paging Linux Support For Future CPUs


  • mangeek
    replied
Consider that THP is already enabled by default on RHEL, and THP for the page cache and swap are recent experimental kernel options that seem to help a lot. I have a suspicion that allocating memory within the kernel in huge pages might 'unclog' the TLB and lead to significant performance gains. If the TLB gets clobbered every time the system drops into kernel code, the full advantages of THP aren't being realized.




  • NobodyXu
    replied
    Originally posted by mangeek View Post

I'm thinking the standard 2MB hugepages instead of 4K. I know my desktop is fine with THP set to 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some memory-footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

    I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.
This could work, but only if the dynamic linker and the lots of other software that depends on 4K pages were rewritten to use the fallback flag.

Almost every dynamic linker and ELF loader assumes 4K pages so that it can map parts of the executable as read-only or execute-only while other parts are read-write.

If the kernel switched to 2MB by default, then I think they would have to rewrite that, and it's unclear how much software out there assumes 4K pages.

Most of the benefit of huge pages is said to come from databases, though, which can indeed benefit from mapping their data in 2MB or even 1GB pages to speed up lookups by reducing TLB misses.

I don't know whether huge pages benefit other memory-hungry applications like web servers, web browsers and the JVM.



  • lowflyer
    replied
    Please note:

    The fact that "Intel worked on the 5-level paging Linux kernel support going back five years", *does not* automatically mean that Intel's work on the 6-level paging support goes back six years.



  • mangeek
    replied
    Originally posted by Linuxxx View Post

I'm certainly no expert either, but AFAIK there was a time when Linux actually had Huge Pages enabled by default; however, it became apparent that the daemon responsible ('khugepaged', AFAIR) could cause noticeable system stalls of up to a few seconds[!], which is why upstream came up with today's sane default of "madvise" in the first place.
Oh yeah, I think RHEL/CentOS/Fedora still have it on by default. What I'm saying is to try doing it differently: jump all the way in, so the huge pages for apps aren't swimming around in a sea of 4K pages with khugepaged trying to clear out space for THP allocations. I'm WAY out of my league, but it seems like the stalling wouldn't be a problem if all the pages were huge to start with, including inside kernel memory structures and caches. The goal would be to dramatically reduce TLB misses across the board.



  • Linuxxx
    replied
    Originally posted by mangeek View Post

I'm thinking the standard 2MB hugepages instead of 4K. I know my desktop is fine with THP set to 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some memory-footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

    I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.
I'm certainly no expert either, but AFAIK there was a time when Linux actually had Huge Pages enabled by default; however, it became apparent that the daemon responsible ('khugepaged', AFAIR) could cause noticeable system stalls of up to a few seconds[!], which is why upstream came up with today's sane default of "madvise" in the first place.



  • mangeek
    replied
    Originally posted by pkunk View Post

    How huge?
I'm thinking the standard 2MB hugepages instead of 4K. I know my desktop is fine with THP set to 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some memory-footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

    I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.



  • pkunk
    replied
    Originally posted by mangeek View Post
    I was wondering what would happen if a kernel was designed that used Huge Pages natively, inside the kernel itself, userland, buffers, and caches. Would the page table efficiency benefits overcome the loss of a few megabytes of RAM?
How huge? 1 TiB pages with a 512 GiB page table should cover the entire 64-bit address space.
    Last edited by pkunk; 10 August 2021, 12:10 PM.



  • mangeek
    replied
    I was wondering what would happen if a kernel was designed that used Huge Pages natively, inside the kernel itself, userland, buffers, and caches. Would the page table efficiency benefits overcome the loss of a few megabytes of RAM?



  • pegasus
    replied
I assume this relates to some degree to CXL-attached memory support becoming available with next-gen servers. With those you'll be able to plug many terabytes of memory into one system as RAM, without mucking with strange NUMA configs and so on. Looking forward to systems with 1TB of RAM per core...



  • phielix
    replied
Hi, I just noticed that the link to the white paper doesn't work!

    Here's one that works


    Best wishes

