
7.4M IOPS Achieved Per-Core With Newest Linux Patches


  • coder
    replied
    Originally posted by nils_ View Post
    Not at all, they are quite expensive (compared to regular NVMe drives) and difficult to get due to low stock, but they don't cost as much as a (good) car. I paid 1800€ for the 800GB P5800X.
    I just ran across some "lightly used" 400GB P5800X drives on eBay for $625 (buy-it-now).

    Note they're 2.5" U.2 drives, so you'll need a cable kit to use them.

  • coder
    replied
    Originally posted by pal666 View Post
    compiling is easily parallelizable, which hides read latency.
    Even doing parallel builds on HDDs, about a decade ago, I didn't see any real need or benefit from running more jobs than hardware threads. Disk cache & write buffering seem to do a very good job of alleviating disk bottlenecks. Of course, I was nearly always building C++ code with -O2, so my experiences could differ from someone doing kernel builds, for instance.

    Again, linking could be a different story, depending on whether all of the input files were still in disk cache. This comes down to a question of how much RAM you have vs. the size of the codebase you're building.
    Last edited by coder; 17 October 2021, 02:08 PM.

  • pal666
    replied
    Originally posted by coder View Post
    I'm guessing that has something to do with the reason Intel has yet to release any Gen2 Optane devices for consumers.
    IIRC, Intel decided to kill consumer Optane, so don't hold your breath.

  • pal666
    replied
    Originally posted by bug77 View Post
    And to actually provide a use case: compiling a program is almost exclusively about 4k random access. Imagine if incremental compiling in the background suddenly became feasible. It would make writing compiled code feel almost like scripting.
    Compiling is easily parallelizable, which hides read latency, and usually everything your compiler reads is already in the page cache. I.e., while you'd probably suffer compiling on an HDD, it would be very hard to measure the difference between Optane and any decent SSD.
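    To check that page-cache claim on a real machine, here's a rough sketch (my own illustration, not from any of the posts here) that uses mmap() plus mincore() to report how much of a given file, e.g. a header your compiler keeps re-reading, is currently resident in the Linux page cache:

    [CODE]
    /* pagecache.c -- report page-cache residency of a file.
     * Build: cc -O2 -o pagecache pagecache.c
     * Usage: ./pagecache <file>   (the file name is whatever you pass in) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
        if (st.st_size == 0) { fprintf(stderr, "empty file\n"); return 1; }

        long psize = sysconf(_SC_PAGESIZE);
        size_t pages = (st.st_size + psize - 1) / psize;

        void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        unsigned char *vec = malloc(pages);
        if (mincore(map, st.st_size, vec) < 0) { perror("mincore"); return 1; }

        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;        /* bit 0 = page resident in cache */

        printf("%zu of %zu pages resident (%.1f%%)\n",
               resident, pages, 100.0 * resident / pages);

        free(vec);
        munmap(map, st.st_size);
        close(fd);
        return 0;
    }
    [/CODE]

    Run it on a header right after a build and you'll typically see 100% residency, which is why the storage device barely matters for recompiles.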

  • coder
    replied
    Originally posted by quaz0r View Post
    When somebody re-engineers code to do something way faster and more efficiently than before, that means the previous implementation was doing it wrong. The new thing can be great and brilliant and cause for celebration and all, but it still also means the previous thing was doing it wrong, and I think we do ourselves as programmers a disservice by never acknowledging that. If you one day discover a direct route to the grocery store, where before your route consisted of first driving 500 miles in the opposite direction and then driving in circles for a week, it's not so much that you engineered a brilliant new path, it's that the previous thing was doing it wrong.
    There's something else that occurred to me: you seem to be suggesting io_uring is simpler, which it definitely is not; not in its implementation, and certainly not in its usage. That's yet another reason I don't consider legacy I/O APIs to be "wrong".
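    To illustrate, here's a minimal sketch (liburing-based; link with -luring; the file path is just an arbitrary example) of what even the simplest possible io_uring read involves, versus a one-line pread() with the legacy API:

    [CODE]
    /* uring_read.c -- read one block via io_uring.
     * Build: cc -O2 -o uring_read uring_read.c -luring */
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct io_uring ring;
        if (io_uring_queue_init(8, &ring, 0) < 0) {         /* set up SQ/CQ rings */
            perror("io_uring_queue_init");
            return 1;
        }

        int fd = open("/etc/hostname", O_RDONLY);           /* arbitrary example file */
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring); /* grab a submission slot */
        if (!sqe) { fprintf(stderr, "SQ full\n"); return 1; }
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);   /* describe the read */
        io_uring_submit(&ring);                             /* hand it to the kernel */

        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0) {           /* wait for completion */
            perror("io_uring_wait_cqe");
            return 1;
        }
        printf("read %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);                      /* mark CQE consumed */

        /* The legacy-API equivalent of all of the above:
         *     ssize_t n = pread(fd, buf, sizeof(buf), 0);          */

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }
    [/CODE]

    The payoff, of course, is that the ring lets you keep hundreds of such reads in flight without threads or signals; but "simpler" it is not.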

  • coder
    replied
    Originally posted by bug77 View Post
    All the things you do on a computer need 4k random reads (or maybe writes). Most of these are linked to (but not limited to) reading (and saving) state/config.
    4k random access is not always the bottleneck, but improving 4k performance is like improving the 99th percentile for games' fps.
    For reads, caching and read-ahead work very well. For those times when they don't, the latency of regular SSDs is good enough that it almost doesn't matter.

    As far as writes go, write buffering is pretty powerful stuff. That and caching are the main reasons PCs with modern operating systems were quite usable with HDDs.
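    For the cases where caching doesn't help, the number in question is QD=1 4k random-read latency. Here's a rough sketch of how you could measure it yourself (assumptions: Linux, a filesystem or block device supporting O_DIRECT, and a pre-existing test file at least <blocks> * 4 KiB long):

    [CODE]
    /* qd1lat.c -- average QD=1 4k random-read latency, page cache bypassed.
     * Build: cc -O2 -o qd1lat qd1lat.c
     * Usage: ./qd1lat <file> <blocks>   (reads random 4k blocks within <blocks>) */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define BLOCK 4096
    #define ITERS 1000

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <file> <blocks>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY | O_DIRECT);  /* bypass the page cache */
        if (fd < 0) { perror("open"); return 1; }

        long nblocks = atol(argv[2]);
        void *buf;
        if (posix_memalign(&buf, BLOCK, BLOCK))       /* O_DIRECT needs alignment */
            return 1;

        srand(42);                                    /* fixed seed: repeatable runs */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
            off_t off = (off_t)(rand() % nblocks) * BLOCK;
            if (pread(fd, buf, BLOCK, off) != BLOCK) { perror("pread"); return 1; }
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec))
                    / 1e3 / ITERS;
        printf("avg QD=1 4k random-read latency: %.1f us\n", us);

        free(buf);
        close(fd);
        return 0;
    }
    [/CODE]

    On a decent NAND SSD you'll typically see numbers in the tens of microseconds; Optane's headline feature is cutting that to roughly 10us or less. The question is whether that gap is ever user-visible outside a benchmark.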

    Regarding the analogy with 99th percentile gaming framerates, it's not a bad one but also kind of pointless. For games, the reason 99th percentile matters is that it's a realtime application. When the framerate drops, it's very noticeable. Even if it happens just a couple times in a session, it's still enough to potentially cause problems for the player. Though it might not often cause them to get killed or miss a shot, it could happen at inopportune times that really spike their stress levels, and that makes it very memorable and a problem worth trying to solve.

    However, when we're talking about run-of-the-mill computer usage, 99th-percentile stuff is likely to go unnoticed and doesn't really matter, because, hey, it's less than 1% of the time. And a lot of that is going to be when the user isn't expecting an immediate response, anyhow.

    Now, I think we can imagine there's some barely-perceptible improvement in startup times of big apps. That would be consistent with the initial reviews I read of Intel's 900p and 905p consumer Optane drives, at least. So, I'm not trying to say Optane is completely irrelevant for consumers, but it's generally not even close to justifying the price delta. I'm guessing that has something to do with the reason Intel has yet to release any Gen2 Optane devices for consumers.

    Originally posted by bug77 View Post
    And to actually provide a use case: compiling a program is almost exclusively about 4k random access.
    This is BS. I've been messing around with parallel and distributed builds since 2005, far back into the HDD era. At that time, I was even doing builds on NFS mounts. If ever disk I/O should've been a bottleneck, that was it. Yet, only during linking did I sometimes see the disk I/O bottleneck actually bite. Again, caching and write buffering do a tremendous job of hiding the raw latency and media transfer performance of the underlying storage device.

    Originally posted by bug77 View Post
    Imagine if incremental compiling in the background would suddenly become feasible. It would make writing compiled code, feel almost like scripting.
    I wouldn't know from experience, but I think there are some IDEs that have done that for quite a while. At least, to the degree of telling you when you have a syntax error or have referenced a nonexistent symbol.
    Last edited by coder; 16 October 2021, 10:29 AM.

  • coder
    replied
    Originally posted by ermo View Post
    For the sake of argument, say you use virtual machines on a workstation as part of your $DAYJOB. Would taking regular on-the-fly snapshots of VM state, for a high degree of data protection, be a use case where Optane would offer a benefit over NVMe SSDs that's noticeable to the user?
    Copying entire images should be a mostly sequential operation, which NAND SSDs can handle quite well. I'd just check that they can sustain write throughput for at least the size of your images; write performance usually falls off a cliff once their pseudo-SLC buffers are exhausted. You can read a bit about that here:

    (From the linked review: "The Samsung 980 Pro was the best-performing consumer SSD we've tested to date, more than 2x the numbers of its competitors in some areas.")

    You'll definitely want to steer clear of QLC drives. There aren't any consumer-grade MLC options, AFAIK. I think everything is now TLC or QLC.
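    If you want to see that cliff for yourself, here's a rough sketch (my own illustration; destructive to the target file, so point it at scratch space) that streams sequential writes and prints per-GiB throughput. On many TLC/QLC drives the numbers drop sharply once the pseudo-SLC buffer fills:

    [CODE]
    /* slccliff.c -- watch sequential write throughput per GiB written.
     * Build: cc -O2 -o slccliff slccliff.c
     * Usage: ./slccliff <scratch-file> <gib-to-write>   (OVERWRITES the file) */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK   (1 << 20)           /* 1 MiB per write() */
    #define PER_GIB 1024                /* chunks per GiB */

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <scratch-file> <gib>\n", argv[0]);
            return 1;
        }

        /* O_DIRECT so the page cache can't hide the device's real behavior. */
        int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, CHUNK)) return 1;
        memset(buf, 0xA5, CHUNK);       /* non-zero data, in case of compression */

        int gib = atoi(argv[2]);
        for (int g = 0; g < gib; g++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int i = 0; i < PER_GIB; i++)
                if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("GiB %3d: %6.0f MiB/s\n", g + 1, 1024.0 / sec);
        }

        free(buf);
        close(fd);
        return 0;
    }
    [/CODE]

    If your VM images are bigger than the point where the throughput drops, the drive's spec-sheet sequential numbers won't tell the whole story for your snapshot workload.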

    Originally posted by ermo View Post
    BTW, Intel mentions that for a specific workload including medical imaging ... So that tracks with your assertion.
    I'd characterize it as a mere supposition, but thanks for sharing the link.
    Last edited by coder; 15 October 2021, 10:08 PM.

  • coder
    replied
    Originally posted by nils_ View Post
    The consumer drives aren't particularly useful
    I didn't mean for your purposes. They did have good QD=1 IOPS numbers, even if their sequential throughput wasn't above leading NAND-based competitors of their day.

    Originally posted by nils_ View Post
    and haven't been refreshed for PCIe4.0.
    Well, we can still hope, I suppose.

  • bug77
    replied
    Originally posted by coder View Post
    That's a performance metric, not a use case. A use case is an example of the sort of task a user would perform that would noticeably benefit from high random 4k IOPS. Reboots would be one such example, and about the only thing that comes to mind where a normal user could probably observe a performance improvement.

    Examples of things professionals might do could involve searching through GIS data or maybe volumetric medical imaging on a dataset that's too big to fit in memory.
    All the things you do on a computer need 4k random reads (or maybe writes). Most of these are linked to (but not limited to) reading (and saving) state/config.
    4k random access is not always the bottleneck, but improving 4k performance is like improving the 99th percentile for games' fps.

    And to actually provide a use case: compiling a program is almost exclusively about 4k random access. Imagine if incremental compiling in the background suddenly became feasible. It would make writing compiled code feel almost like scripting.

  • ermo
    replied
    Originally posted by coder View Post
    That's a performance metric, not a use case. A use case is an example of the sort of task a user would perform that would noticeably benefit from high random 4k IOPS. Reboots would be one such example, and about the only thing that comes to mind where a normal user could probably observe a performance improvement.

    Examples of things professionals might do could involve searching through GIS data or maybe volumetric medical imaging on a dataset that's too big to fit in memory.
    For the sake of argument, say you use virtual machines on a workstation as part of your $DAYJOB. Would taking regular on-the-fly snapshots of VM state, for a high degree of data protection, be a use case where Optane would offer a benefit over NVMe SSDs that's noticeable to the user?

    BTW, Intel mentions that for a specific workload involving medical imaging and analysis at an Italian university, Optane cut the necessary analysis time from 40 minutes to just 2 minutes (source). So that tracks with your assertion.
    Last edited by ermo; 15 October 2021, 09:38 AM.
