The Linux Kernel Is Preparing To Enable 5-Level Paging By Default


  • #41
    With RDMA and InfiniBand this becomes less of an issue. Lustre also supports mmap file I/O. According to this page:

    The full POSIX test suite passes in an identical manner to a local EXT4 file system, with limited exceptions on Lustre clients. In a cluster, most operations are atomic so that clients never see stale data or metadata. The Lustre software supports mmap() file I/O.
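
    To make "mmap() file I/O" concrete, here's a minimal POSIX sketch that reads a file through a mapping instead of read() calls. The /mnt/lustre/data.bin path is a made-up example; nothing below is Lustre-specific, which is exactly the point of the POSIX claim above:

    ```c
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a file and read it through ordinary pointers. On a Lustre
     * client the path would sit on the Lustre mount; the kernel pages
     * data in on demand via page faults, with no read() calls. */
    int main(void)
    {
        const char *path = "/mnt/lustre/data.bin";  /* hypothetical path */
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += (unsigned char)p[i];  /* faults pull pages in on demand */
        printf("byte sum = %lu\n", sum);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }
    ```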



    • #42
      Originally posted by HyperDrive View Post
      And Power ISA 3.0 (POWER9) introduced radix tree page tables because, guess what, hashed page tables suck for cache locality.
      If and how much this is better depends squarely on the type of workload. AI/HPC in particular benefits from radix tree page tables, and POWER is now heavily marketed towards it, so the decision is understandable. And it's not like 5-level paging is without performance issues (as mentioned in the article); how much of that the latest implementation manages to solve remains to be seen.
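
      To make the radix structure concrete, here's a minimal sketch (plain user-space arithmetic, not kernel code) of how a 5-level radix walk carves up an x86-64 virtual address: five 9-bit table indices plus a 12-bit page offset, which is exactly where the 57-bit address width comes from. The level names follow Linux's PGD/P4D/PUD/PMD/PTE convention:

      ```c
      #include <stdint.h>
      #include <stdio.h>

      /* Decode the index fields of an x86-64 virtual address under
       * 5-level paging: 4 KiB pages (12 offset bits) and five
       * 512-entry tables (9 index bits each) = 57 bits total. */
      #define PAGE_SHIFT 12
      #define INDEX_BITS 9
      #define LEVELS     5

      int main(void)
      {
          uint64_t vaddr = 0x00ff123456789abcULL;  /* arbitrary example */
          const char *names[LEVELS] = { "PGD", "P4D", "PUD", "PMD", "PTE" };

          for (int level = 0; level < LEVELS; level++) {
              int shift = PAGE_SHIFT + INDEX_BITS * (LEVELS - 1 - level);
              uint64_t idx = (vaddr >> shift) & ((1ULL << INDEX_BITS) - 1);
              printf("%s index: %3llu\n", names[level],
                     (unsigned long long)idx);
          }
          printf("page offset: %#llx\n",
                 (unsigned long long)(vaddr & ((1ULL << PAGE_SHIFT) - 1)));
          return 0;
      }
      ```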



      • #43
        Originally posted by coder View Post
        Node shrinks make transistors cheaper and more power-efficient - stacking does not.
        Well, the original discussion revolved around capacity, not price. Sure, larger capacity results in more expensive designs.

        Originally posted by coder View Post
        You do get a one-time power-efficiency dividend with stacking, but as DRAM dies still burn power, even if stacking would somehow let you have more of them (just for the sake of argument), those capacity increases would not be applicable to power-constrained use cases, like laptops.
        You can decrease the speed to conserve (dynamic) power. It's a tradeoff, but a larger memory capacity might be the more desirable property at some point. After all, more memory means you need to swap stuff out less often. Larger memory capacities also enable building systems that store some data offline, e.g. 3D XPoint bcache/swap.

        Originally posted by coder View Post
        Only as a side-effect of HBM2, but you don't get any more capacity from doing that.
        Of course you do. Having more memory channels implies you have more DRAM sockets, too. Four sockets can provide twice as much memory as two. For example, many desktop systems support up to 16 or 32 GB of RAM with 2 DDR4 sockets, up to 32 or 64 GB with 4 sockets, and up to 64-128 GB with 8 sockets. There's plenty of space inside the chassis. I don't know why laptops keep getting smaller each year, but it's technically 100% feasible to build 3 kg laptops instead of 1.2 kg ultrabooks. My first laptops were even larger than that. Just enlarge the chassis and put more memory sockets inside. It doesn't automatically lead to designs that need heavyweight external batteries.



        • #44
          Originally posted by caligula View Post
          You can decrease the speed to conserve (dynamic) power.
          The "one-time power-efficiency dividend" I mentioned is a result of lowering the interface speed. As for the DRAM, itself, it requires dynamic refresh - and that takes power. Just by having the dies, you need to power them. The more dies or the larger they are, the more idle power they'll require - stacking only saves interface power.

          Originally posted by caligula View Post
          Of course you do. Having more memory channels implies you have more DRAM sockets, too.
          I thought you were talking about more channels as a side-effect of HBM2. I agree with that, because HBM2 channels (in GPUs, at least) are narrow, and the interfaces are wider due to being in-package. So, you somewhat naturally get more channels as a side-effect.

          However, if you're talking about adding more channels for out-of-package memory, then I reject that scenario. Doing so has the inevitable consequences of:
          • increasing power consumption, by having to drive more memory
          • increasing system memory costs, by requiring more DIMMs
          • increasing board cost, by requiring more traces, layers, and possibly DIMM slots
          • increasing CPU/package cost, by requiring more memory controllers (needs more silicon) and requiring more pins.
          It's an expensive and power-intensive way to add bandwidth, it doesn't scale well, and it doesn't apply to your laptop use case.
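
          To put numbers on that: peak bandwidth scales linearly with channel count, so every doubling buys the same bandwidth increment at the full cost of extra pins, traces, and DIMMs again. A back-of-the-envelope sketch, assuming DDR4-3200 on 64-bit channels (illustrative figures, not any specific platform):

          ```c
          #include <stdio.h>

          /* Peak-bandwidth math for out-of-package DRAM. Assumed
           * figures: DDR4-3200 does 3200 MT/s over a 64-bit (8-byte)
           * channel, i.e. 25.6 GB/s peak per channel. */
          int main(void)
          {
              const double transfers_per_s = 3200e6;
              const double bytes_per_transfer = 8.0;

              for (int channels = 1; channels <= 8; channels *= 2) {
                  double gbps = channels * transfers_per_s
                                * bytes_per_transfer / 1e9;
                  printf("%d channel(s): %5.1f GB/s peak\n", channels, gbps);
              }
              return 0;
          }
          ```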

          Originally posted by caligula View Post
          it's technically 100% feasible to have 3 kg laptops instead of 1,2 kg ultrabooks.
          These exist, but they're expensive, not very popular, and deliver poor performance (or poor battery life) when running on battery. Over the years, I'm pretty sure I've seen mobile workstations with E5 Xeons, but I'm currently not finding any based on Xeon W or Threadripper - probably because both companies now offer so many cores in their mainstream desktop socket. And, BTW, they're almost certainly more than 3 kg.



          • #45
            Originally posted by AsuMagic View Post
            [insert xkcd joke]
            The "Oracle StorageTek SL8500 Modular Library System" supports 2.1EBs (and you can chain them together), CleverSafe claims 10EBs, then there is this old quote that's easy to find: "The Large Hadron Collider generates around 15 petabytes of data every year. AT&T transfers approximately 20 petabytes of data through its network every day.".


            Some of the comments here are the same kind of thinking that perpetuated the myth that humans only use 10% of their brains; those same people also don't use 0.01% of the storage currently in use.
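
            For a sense of scale against the thread topic, here's a quick back-of-the-envelope check using the figures quoted above and the standard x86-64 address widths (48-bit for 4-level paging, 57-bit for 5-level). Note the storage figures are decimal units (10^15, 10^18) while address spaces are binary powers:

            ```c
            #include <stdio.h>

            /* Scale check: x86-64 virtual address space vs the
             * storage figures quoted above. */
            int main(void)
            {
                const double tib = (double)(1ULL << 40);
                double va48 = (double)(1ULL << 48);  /* 4-level paging */
                double va57 = (double)(1ULL << 57);  /* 5-level paging */
                double sl8500 = 2.1e18;              /* 2.1 EB library */
                double lhc_per_year = 15e15;         /* 15 PB/year */
                double att_per_day = 20e15;          /* 20 PB/day */

                printf("4-level VA space: %.0f TiB\n", va48 / tib);
                printf("5-level VA space: %.0f TiB (= 128 PiB)\n", va57 / tib);
                printf("SL8500 vs 5-level VA space: %.1fx larger\n",
                       sl8500 / va57);
                printf("AT&T/year vs LHC/year: %.0fx\n",
                       att_per_day * 365 / lhc_per_year);
                return 0;
            }
            ```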

