CVE-2018-3665: Lazy State Save/Restore As The Latest CPU Speculative Execution Issue

  • #31
    Originally posted by chithanh View Post
    The introduction of 5-level-paging last year is testament to that.
    You are either delusional or have literally no idea how paging works. Without this indirection, the increase in memory usage (physical memory, not swappable) would be linear with the addressing space. You do realize there are page-table entries stored in memory for each page, right?

    That is, the larger the addressing space, the more RAM would be used, linearly (twice the addressing space = twice the RAM usage). Yes, actual, physical RAM -- not just addressing space limits. Do you have terabytes of RAM that you can simply waste on the stupid page tables? (this RAM will be completely unusable for anything else)

    tl;dr 5-level paging is a solution to a 64-bit problem. If you don't like the penalty when accessing memory that hits the page tables, then don't use a 64-bit OS. Or get terabytes of RAM (I think that's still too little, btw). Or learn how stuff works before parroting nonsense hating on x86's success.

    Also, we'll actually need at least 6-level paging to address the full 64-bit addressing space, so I wonder what you will say then. Maybe it will finally seem like a waste to have such a large addressing space after all?
    Last edited by Weasel; 15 June 2018, 08:39 AM.



    • #32
      Originally posted by Weasel View Post
      tl;dr 5-level paging is a solution to a 64-bit problem. If you don't like the penalty when accessing memory that hits the page tables, then don't use a 64-bit OS. Or get terabytes of RAM (I think that's still too little, btw). Or learn how stuff works before parroting nonsense hating on x86's success.

      Also, we'll actually need at least 6-level paging to address the full 64-bit addressing space, so I wonder what you will say then. Maybe it will finally seem like a waste to have such a large addressing space after all?
      Not true. You are delusional if you think that addressing 64 bits of memory requires 5-level (or even 6-level) paging in any way. The 5-level scheme is entirely an artifact of early design decisions that legacy software now depends on, and it is now forever enshrined in the lower 4 levels of paging. In other words, unfixable x86 crap.



      • #33
        Delusional is one who thinks that mapping virtual memory to physical memory arbitrarily (in any order) with a single layer can be done with anything other than linear cost. (Linear means, in case you don't know, that increasing the addressing space by a factor X increases the page-table size by the same factor X.) This isn't about the x86 design; it's a logical fact.

        Explain how paging works down to the assembly level and how you would design it otherwise, where each virtual memory page can be mapped to any physical memory page, or even out to the disk (paging file), in a completely random way. That is, without wasting terabytes of RAM. People like you are a joke, especially for talking about things you have ZERO CLUE about (have you even coded one userland application in x86 assembly, which is already much simpler? So why speak about x86?).


        In 32-bit mode (which is far simpler), you have the page directory and the page tables. A page table is an array of 1024 32-bit entries, where the address field inside each entry (it's a struct with bitfields) points to the physical address that the page is mapped to. There are 1024 page tables and 1024 directory entries, so you have 1024*1024 mappings, and since each page is 4 KiB, that is enough to map the entire 32-bit addressing space.
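        For reference, here is a minimal C sketch of that layout (toy user-space code, not kernel code; the example address is made up). It just shows the 10/10/12 split of a 32-bit virtual address into directory index, table index, and page offset:

            /* How a 32-bit virtual address is split for the classic
             * two-level x86 walk: 10 bits of page-directory index,
             * 10 bits of page-table index, 12 bits of offset into
             * the 4 KiB page. */
            #include <stdint.h>
            #include <stdio.h>

            int main(void)
            {
                uint32_t vaddr = 0xB7F2C123u;                  /* arbitrary example address */

                uint32_t dir_index   = (vaddr >> 22) & 0x3FF;  /* top 10 bits  */
                uint32_t table_index = (vaddr >> 12) & 0x3FF;  /* next 10 bits */
                uint32_t offset      =  vaddr        & 0xFFF;  /* low 12 bits  */

                printf("vaddr 0x%08X -> PDE %u, PTE %u, offset 0x%03X\n",
                       (unsigned)vaddr, (unsigned)dir_index,
                       (unsigned)table_index, (unsigned)offset);

                /* 1024 directory entries x 1024 table entries x 4 KiB pages
                 * covers the whole 4 GiB 32-bit addressing space. */
                return 0;
            }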

        So in 32-bit you have ~1 million structures just for mapping pages, and since they're 32 bits each, that is 4 MiB of RAM wasted just on mapping a 4 GiB addressing space. It sounds minuscule, but if you were to increase the addressing space to 64 bits (which is 4 BILLION times more -- that factor X), you would need 4 MiB * 4 billion = 16 million gigabytes = 16 thousand terabytes = 16 petabytes of RAM just for the page mappings. Note that this is PERMANENTLY USED PHYSICAL RAM.

        That is obviously without the multi-level paging that you think is so bad, Mr. Delusion.
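        And to make the scaling concrete, a tiny C sketch of the same arithmetic (it assumes 4 KiB pages and 4-byte entries for every width, exactly as in the numbers above; real long-mode entries are 8 bytes, which would double the figures):

            /* Size of a single flat (one-level) page table that maps every
             * page of an address space, assuming 4 KiB pages and 4-byte
             * entries as in classic 32-bit x86. */
            #include <stdint.h>
            #include <stdio.h>

            int main(void)
            {
                const uint64_t entry_size = 4;          /* 32-bit entries  */
                const int widths[] = { 32, 48, 64 };    /* address widths  */

                for (int i = 0; i < 3; i++) {
                    int bits = widths[i];
                    /* number of 4 KiB pages = 2^bits / 2^12 = 2^(bits - 12) */
                    uint64_t pages = 1ULL << (bits - 12);
                    double mib = (double)pages * entry_size / (1024.0 * 1024.0);
                    printf("%2d-bit space: %llu entries, %.0f MiB of page tables\n",
                           bits, (unsigned long long)pages, mib);
                }
                /* prints 4 MiB for 32-bit and ~16 PiB worth of MiB for 64-bit */
                return 0;
            }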

        How much RAM do you have on your machine again?



        • #34
          I am not only talking about computational cost. It adds complexity to the whole thing, which is bad in its own right. I encourage you, too, to watch Gal Diskin's talk, especially the part about nested paging and VT-d nested translation (with the latter, a virtual address is already 7 translation levels removed from a physical address).

          And not all of the translations are there to make memory access more efficient. The translation between the system's view of memory and physical memory (where chipset firmware etc. reserves memory for itself) does not in any way reduce page-table size.
          Last edited by chithanh; 15 June 2018, 10:34 AM.



          • #35
            Who said anything about computational cost? I'm talking about *memory usage*.

            Do you understand the simple (logical) fact that mapping memory zones requires stored data (i.e. page tables)? Stuff like "if you see virtual address X, then it refers to physical address Y" is information that has to be stored in RAM (it's not just that, of course, I'm keeping it simple...). Without multi-level paging, a 64-bit addressing space would simply need way too much memory for this.

            If you dislike such translation, then don't use paging at all. Use 16-bit real mode, or an architecture that doesn't use paging (i.e. one in which you access memory directly, like MS-DOS but on a different architecture). Obviously, the system will be extremely fragile and insecure: imagine one application able to take down the entire system with a simple buffer overflow. There will be no "page fault" or "segmentation fault", just silent data corruption or a crash, since those are protections offered by PAGING.
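            A throwaway C snippet to illustrate exactly that protection (the address is arbitrary and the behavior is technically undefined in C, but on any OS with paging the MMU turns the stray write into a clean kill instead of silent corruption):

                /* Touching memory the process never mapped raises a page
                 * fault, which the OS delivers as SIGSEGV: this process dies,
                 * everything else on the system is untouched. Without the
                 * translation layer (real-mode DOS style), the same stray
                 * write would just scribble over whatever happened to live
                 * at that physical address. */
                #include <stdio.h>

                int main(void)
                {
                    volatile int *unmapped = (volatile int *)0xDEADBEEF; /* almost certainly never mapped */

                    printf("writing through an unmapped pointer...\n");
                    *unmapped = 42;               /* page fault -> SIGSEGV here */

                    printf("never reached\n");    /* the process was already killed */
                    return 0;
                }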


            Obviously it adds complexity to the whole thing. The entire paging mechanism does, but that complexity is desirable. Or do you want to use MS-DOS? That's a very simple OS running in real mode, which uses no paging whatsoever and accesses memory directly -- no translation at all. FreeDOS (modern DOS) is a great simple OS, but that's the problem: most people find it extremely limited, because that's how "simple" OSes are. No paging means no protection, no separation of addresses between processes, no virtual memory at all. Any process can crash any other process or damage the entire system. That's what no paging gets you.

            The multi-layer thing is a 64-bit problem though. People who go like "but mah 64-bit address space is teh free!!! put 64-bits powah even on 1GB of RAM dudez, it's teh fashion coolzors" are always hilariously funny parrots to me.


            Also, I don't know who Gal Diskin is; he might be right and you misunderstood him, or he's totally wrong. I don't have to listen to others, since I know this stuff from having developed around it back in the day (I've obviously simplified it, but feel free to look it up and confirm it for yourself). These days I do mostly C/C++ though, not much asm anymore (and no systems programming, only userland).

            That said, I'm not surprised that you might need 7 layers of translation for virtualization. I've never programmed anything involving virtualization, but it does make sense that you'd have to add at least one extra layer, if not more. So he's probably right.
            Last edited by Weasel; 15 June 2018, 11:12 AM.



            • #36
              Originally posted by brrrrttttt View Post
              Yup. You can either have the complexity at run time in the CPU, or in the compiler. In the CPU is the status quo, so there are no (good) compilers for VLIW. Classic chicken and egg.
              Not entirely, because not only are there no good compilers for VLIW, but no compiler for VLIW can ever be written that is as efficient as one for CISC or RISC. Runtime scheduling is by nature more efficient than compile-time scheduling: more and better information is available at runtime, so better scheduling can be performed.



              • #37
                Originally posted by carewolf View Post

                Not entirely, because not only are there no good compilers for VLIW, but no compiler for VLIW can ever be written that is as efficient as one for CISC or RISC. Runtime scheduling is by nature more efficient than compile-time scheduling: more and better information is available at runtime, so better scheduling can be performed.
                Each has wins.

                I think that the fundamental problem is how to deal with memory latency.

                Do you use a hardware managed cache? If you do, then the memory latency of each reference in the code is (in general) not known by an ahead-of-time compiler. But the wins of caches are so tasty! So AoT scheduling is not sufficient for most workloads. To be a little more honest: a software-managed cache (which is really what general-purpose registers are) can do a pretty good job, but there are limits and real programs appear to benefit from hardware-managed opportunistic caches (this is opinion, not fact).

                On the other hand, scheduling AoT does have its own wins. For one thing, scheduling in hardware (part of which is the speculative execution that causes Spectre) requires a lot of hardware, all of which is hot (in use most of the time), all of which is on critical paths.

                Some computations have such predictable data-flows that memory accesses are best scheduled by the AoT compiler. Typically these are big numerical codes. Processors these days have instructions to assist with this (memory prefetch instructions and loads that bypass the cache, for example).
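                As a small illustration of that compiler-side assistance, here is a sketch using the GCC/Clang __builtin_prefetch() hint, which typically lowers to an x86 prefetch instruction; the 16-element prefetch distance is just an arbitrary tuning guess, not a recommendation:

                    /* Streaming sum over a predictable access pattern, with an
                     * explicit software prefetch a fixed distance ahead. */
                    #include <stddef.h>

                    double sum_with_prefetch(const double *data, size_t n)
                    {
                        double sum = 0.0;
                        for (size_t i = 0; i < n; i++) {
                            if (i + 16 < n)
                                /* args: address, 0 = prefetch for read,
                                 * 0 = data has little temporal locality */
                                __builtin_prefetch(&data[i + 16], 0, 0);
                            sum += data[i];
                        }
                        return sum;
                    }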

                Another way of hiding memory latency is by using SMT. When one thread is blocked on a memory fetch, run a different thread. This requires SMT to be designed with this goal. One earlyish example of this was the Sun Niagara architecture.

