AMD Preparing 5-Level Paging Linux Support For Future CPUs


  • mangeek
    replied
Consider that THP is already enabled by default on RHEL, and THP for the page cache and swap are recent experimental kernel options that seem to help a lot. I have a suspicion that allocating memory within the kernel in huge pages might 'unclog' the TLB and lead to significant performance gains. If the TLB gets clobbered every time the system drops into kernel code, the full advantages of THP aren't being realized.




  • NobodyXu
    replied
    Originally posted by mangeek View Post

I'm thinking the standard 2MB hugepages instead of 4K. I know my desktop is fine with THP set to 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some memory-footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

    I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.
This could work, but only if the dynamic linker and the lots of other software that depends on 4K pages were rewritten to use the fallback flag.

Almost every dynamic linker and ELF loader assumes 4K pages so that it can map parts of the executable as read-only or execute-only while other parts are read-write.

If the kernel switched to 2MB by default, then I think they would have to rewrite that, and it's unclear how much software out there assumes 4K pages.

Most of the benefit of huge pages is said to come from databases, though, which can indeed benefit from mapping their data in 2MB or even 1GB pages to speed up lookups by reducing TLB misses.

I don't know whether huge pages benefit other memory-hungry applications like web servers, web browsers and the JVM.



  • lowflyer
    replied
    Please note:

    The fact that "Intel worked on the 5-level paging Linux kernel support going back five years", *does not* automatically mean that Intel's work on the 6-level paging support goes back six years.



  • mangeek
    replied
    Originally posted by Linuxxx View Post

I'm certainly no expert either, but AFAIK there was a time when Linux actually had Huge Pages enabled by default; however, it became apparent that the daemon responsible ('khugepaged', AFAIR) could cause noticeable system stalls of up to a few seconds[!], which is why upstream came up with today's sane default of "madvise" in the first place.
Oh yeah, I think RHEL/CentOS/Fedora still have it on by default. What I'm saying is to try doing it differently: jump all the way in, so the huge pages for apps aren't swimming around in a sea of 4K pages with khugepaged trying to clear out space for THP allocations. I'm WAY out of my league, but it seems like the stalling wouldn't be a problem if all the pages were huge to start with, including inside kernel memory structures and caches. The goal would be to dramatically reduce TLB misses across the board.



  • Linuxxx
    replied
    Originally posted by mangeek View Post

I'm thinking the standard 2MB hugepages instead of 4K. I know my desktop is fine with THP set to 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some memory-footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

    I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.
I'm certainly no expert either, but AFAIK there was a time when Linux actually had Huge Pages enabled by default; however, it became apparent that the daemon responsible ('khugepaged', AFAIR) could cause noticeable system stalls of up to a few seconds[!], which is why upstream came up with today's sane default of "madvise" in the first place.



  • mangeek
    replied
    Originally posted by pkunk View Post

    How huge?
I'm thinking the standard 2MB hugepages instead of 4K. I know my desktop is fine with THP set to 'always', but there are a lot of things that are still handled 4K at a time. What would happen if the kernel's internals sacrificed some memory-footprint efficiency and allocated everything they could in 2MB chunks? What if my 8GB desktop only had 4096 page table entries? What if 4K pages were the 'exception' and not the rule (e.g., 'mUNadvise' in userland, and all drivers/filesystems/swap/kernel pages using 2MB)?

    I'm not an expert on any of this, but five levels of page tables seems like a lot, and I think casual desktop-and-server workloads might benefit from Huge Pages throughout.



  • pkunk
    replied
    Originally posted by mangeek View Post
    I was wondering what would happen if a kernel was designed that used Huge Pages natively, inside the kernel itself, userland, buffers, and caches. Would the page table efficiency benefits overcome the loss of a few megabytes of RAM?
How huge? 1 TiB pages with a 512 GiB page table should cover the entire 64-bit address space.
    Last edited by pkunk; 10 August 2021, 12:10 PM.



  • mangeek
    replied
    I was wondering what would happen if a kernel was designed that used Huge Pages natively, inside the kernel itself, userland, buffers, and caches. Would the page table efficiency benefits overcome the loss of a few megabytes of RAM?



  • pegasus
    replied
I assume this relates to some degree to CXL-attached memory support becoming available with next-gen servers. With those you'll be able to plug many terabytes of memory into one system as RAM, without mucking with strange NUMA configs and so on. Looking forward to systems with 1TB of RAM per core...



  • phielix
    replied
Hi, I just noticed that the link to the white paper doesn't work!

    Here's one that works


    Best wishes

