
The Linux Kernel Is Preparing To Enable 5-Level Paging By Default


  • caligula
    replied
    Originally posted by torsionbar28:
    Ok, so with "some vendors" hitting 64 TiB of physical memory, that must mean either 14 sockets of latest Xeon Platinum, or 32 sockets of EPYC. That's a huge machine, something in the class of an HP Superdome. IME, those types of machines typically support hardware partitioning, so they rarely run a single OS instance. In my 10 years of working on Superdome (and AlphaServer GS before that) I don't think I ran into even a single customer who was running a single OS instance across all the sockets. I guess I can see the need to increase this memory limit, but we're talking very niche use case right now.
    Well you know, it takes years to get these things in mainline. Linux is still struggling with the year 2038 problem. It will still take 10 years until all enterprise distros are year 2038 compatible, that is, in 2029. Some unlucky customers might still be running RHEL 6 with 2.6 kernels when the Y2038 problem kicks in. Probably some embedded systems will run Linux 2.2 in 2040.
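    For reference, the exact moment the Y2038 problem "kicks in" falls straight out of epoch arithmetic (a quick sketch, assuming a 32-bit signed time_t, which overflows at 2**31 - 1 seconds past the Unix epoch):

    ```python
    from datetime import datetime, timedelta, timezone

    # A 32-bit signed time_t overflows at 2**31 - 1 seconds past the
    # Unix epoch of 1970-01-01 00:00:00 UTC.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    rollover = epoch + timedelta(seconds=2**31 - 1)
    print(rollover)  # 2038-01-19 03:14:07+00:00
    ```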



  • caligula
    replied
    Originally posted by andyprough:
    Only 4 PiB? Apparently I'm going to have to downgrade my laptop. Not sure how I will run Chrome with 8 trillion open tabs all streaming Netflix now. Never going to get through all the episodes of It's Always Sunny in Philadelphia at this rate.
    It's only a matter of time. My first laptop had 16 MB of RAM (in 1997). My latest laptop has had 32 GB for almost a year now. It will probably take a similar amount of time (about 20 years) to reach 32 TB of RAM in every laptop. It's easy to imagine even small workstations and servers having 10-100 times more capacity.
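    For what it's worth, the 20-year guess roughly matches the growth rate implied by the two data points in the post (16 MB in 1997, 32 GB around 2018; both figures come from the comment, the rest is just arithmetic):

    ```python
    import math

    # 16 MB (1997) -> 32 GB (~2018): a 2048x increase over ~21 years.
    growth = (32 * 2**30) / (16 * 2**20)
    doubling_years = 21 / math.log2(growth)   # ~1.9 years per doubling

    # At that rate, 32 GB -> 32 TB (another 1024x) takes about 19 years.
    years_to_32tb = math.log2((32 * 2**40) / (32 * 2**30)) * doubling_years
    print(round(doubling_years, 1), round(years_to_32tb))  # 1.9 19
    ```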



  • Alex/AT
    replied
    I guess Intel is aligning the industry the wrong way yet again. Paging for paging for paging for paging.

    IMOSNSHO, instead of adding more nested levels of fixed-size pages, 4K-aligned extents should just be introduced to the page-management 'API', along with 'interrupts' that let the OS kernel (re)designate a block of physical RAM for CPU paging operations, per paging namespace (CPL or VM nesting level).

    Then CPUs would be free to arrange memory management internally any way they want (direct extent-based MMU operations with translation caching, or 3 levels of fixed pages, 100 levels, a zillion levels, whatever). That would give enough flexibility and compatibility while simplifying software-side memory-management logic at the same time. The huge-page transparency issue would be covered as well.
    Last edited by Alex/AT; 15 September 2019, 05:12 AM.



  • yoshi314
    replied
    Intel's 5-level paging works by extending the size of virtual addresses to 57 bits from 48 bits.
    Wait, how does that work? Why such an odd number of bits?
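    The odd-looking widths fall out of the page-table geometry: each x86-64 translation level is itself one 4 KiB page holding 512 eight-byte entries, so every level resolves 9 bits of the virtual address, on top of the 12-bit page offset. A quick sketch:

    ```python
    # Each x86-64 page-table level is a 4 KiB page of 512 eight-byte
    # entries, so it indexes log2(512) = 9 bits of the virtual address.
    PAGE_OFFSET_BITS = 12   # offset within a 4 KiB page
    BITS_PER_LEVEL = 9      # log2(4096 / 8)

    def va_bits(levels: int) -> int:
        """Virtual-address width for a given page-table depth."""
        return PAGE_OFFSET_BITS + levels * BITS_PER_LEVEL

    print(va_bits(4))  # 48 -> 256 TiB of virtual address space
    print(va_bits(5))  # 57 -> 128 PiB of virtual address space
    ```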



  • andyprough
    replied
    Only 4 PiB? Apparently I'm going to have to downgrade my laptop. Not sure how I will run Chrome with 8 trillion open tabs all streaming Netflix now. Never going to get through all the episodes of It's Always Sunny in Philadelphia at this rate.



  • HyperDrive
    replied
    Originally posted by chithanh:
    Actually, the comment on Power was on the mark. PowerPC uses hash-table paging so does not need to introduce a new translation layer just because that much memory could exist in a system.
    And Power ISA 3.0 (POWER9) introduced radix tree page tables because, guess what, hashed page tables suck for cache locality.



  • torsionbar28
    replied
    Originally posted by Space Heater:
    From the documentation on 5-level paging:
    Original x86-64 was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space. We are already bumping into this limit: some vendors offer servers with 64 TiB of memory today.
    Ok, so with "some vendors" hitting 64 TiB of physical memory, that must mean either 14 sockets of latest Xeon Platinum, or 32 sockets of EPYC. That's a huge machine, something in the class of an HP Superdome. IME, those types of machines typically support hardware partitioning, so they rarely run a single OS instance. In my 10 years of working on Superdome (and AlphaServer GS before that) I don't think I ran into even a single customer who was running a single OS instance across all the sockets. I guess I can see the need to increase this memory limit, but we're talking very niche use case right now.
    Last edited by torsionbar28; 14 September 2019, 09:03 PM.
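    The quoted limits check out numerically: 4-level paging gives 48-bit virtual and 46-bit physical addresses, and 5-level paging raises that to 57-bit virtual and 52-bit physical (the 46/52-bit physical widths are the limits the kernel documentation is describing, not something I measured):

    ```python
    TIB, PIB = 2**40, 2**50

    # 4-level paging: 48-bit virtual, 46-bit physical addresses
    assert 2**48 == 256 * TIB   # 256 TiB of virtual address space
    assert 2**46 == 64 * TIB    # 64 TiB of physical address space

    # 5-level paging: 57-bit virtual, 52-bit physical addresses
    assert 2**57 == 128 * PIB   # 128 PiB of virtual address space
    assert 2**52 == 4 * PIB     # 4 PiB of physical address space
    print("limits check out")
    ```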



  • coder
    replied
    Originally posted by abott:
    Servers can use non-RAM storage as RAM cache. That alone is a reason to increase it to as large as is possible at any given time.
    That would only explain the virtual address space increase, but they also increased the physical address space to 4 PiB.



  • coder
    replied
    Originally posted by torsionbar28:
    today, Xeon can only do 768 GiB and EPYC can do 2 TiB per socket? How are we in jeopardy of hitting this 256 TiB limit today or in the near future?
    The 8280L Xeon can allegedly support up to 4.5 TB of memory, and that's per-CPU. I believe it scales up to 8-socket configurations.

    https://ark.intel.com/content/www/us...-2-70-ghz.html

    Edit: Oops, I see someone beat me to the punch. Well, Setif's post doesn't mention multi-socket, so I'll leave this here.
    Last edited by coder; 14 September 2019, 05:27 PM.



  • coder
    replied
    Originally posted by ThoreauHD:
    This also smells like Zen 3 hbm/3D die stacking prep to me.
    IMO, it's all about supporting Optane DIMMs. Nonvolatile storage is the only way I see them getting to petabytes.

