Axboe Achieves 8M IOPS Per-Core With Newest Linux Optimization Patches
-
Originally posted by coder:
Realistically, anyone doing anything like that amount of IOPS is probably going to use NVDIMMs and PMEMFILE.

Sure, desktops rarely need 2M IOPS, but games are often written for ease of programming rather than optimal I/O. Additionally, 3D environments generate large amounts of I/O: z-buffers, objects loaded as you run/fly/drive around, on-demand textures (in multiple resolutions), and so on. Sure, it might not be 10M IOPS, but having to dedicate 5% of a single core instead of 10% is a win. Doubly so if *gasp* you actually multitask while in games, maybe recording a video stream of the game or running anything else intensive. Even rather sedate games like MS Flight Simulator can generate a fair bit of I/O.
On more mobile platforms running on battery, using 5-10% less power for I/O can be a noticeable savings.
-
Originally posted by yump:
64 cores * 3 GHz / (165 MIOP/s) is a little over 1000 CPU cycles per I/O. That doesn't sound like much to me.

However, if they have some reason not to, then don't forget that these numbers only accounted for a single CPU. You could scale up to more CPUs. In the future, CPUs could scale up to more cores, there's potential clock scaling, IPC improvements, DDR5, chip stacking (AMD's V-Cache, for instance), and CPUs are continually adding tweaks like TSX or Intel's upcoming userspace interrupts, which could serve to further optimize some otherwise-stubborn syscall overheads. So, I wouldn't worry about CPUs running out of gas anytime soon.
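To make that back-of-the-envelope budget concrete, here's a tiny C sketch that just restates the figures quoted in this thread (a 64-core Epyc at roughly 3 GHz against the ~165M IOPS that 30 of those Optane drives would supply at ~5.5M IOPS each). It's arithmetic, not a measurement:

/* Restating the thread's numbers; arithmetic, not a benchmark. */
#include <stdio.h>

int main(void)
{
    double cores        = 64.0;     /* 64-core Epyc */
    double clock_hz     = 3.0e9;    /* ~3 GHz */
    double storage_iops = 165.0e6;  /* 30 Optane drives x ~5.5M IOPS each */

    double cycles_per_io = cores * clock_hz / storage_iops;
    printf("budget: ~%.0f CPU cycles per I/O\n", cycles_per_io);
    return 0;
}

It prints a budget of roughly 1160 CPU cycles per I/O, which matches the "a little over 1000" in the quote above.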
And, if that's still not enough compute, CXL's recently-added support for memory devices will enable you to even scale up to more than 2 Epyc CPUs sharing a pool of nonvolatile memory.
As a matter of fact, it's really Optane that's running out of gas! Intel's 2nd generation Optane has only managed 4 layers, while 3D NAND is now up to something like 384 layers?
According to this, Samsung is developing 5-layer DDR5 DRAM. I don't know how the areal density of DRAM compares with 3D XPoint, but it'd be ironic if Optane even lost the density and GB/$ race to DDR5.
-
Originally posted by coder:
To put some numbers to it, I think Axboe said the single SSD could handle only 5.5M IOPS. If you put 30 of them on a single 64-core Epyc, then that's just 165M IOPS worth of SSD capacity. At 8M IOPS per core, linear scaling would predict 512M IOPS. Of course, the server CPUs run at a lower clock speed and we know scaling won't be linear, but I also didn't count the SMT threads.
Of course, that's all very simplistic, but I think it's clear the CPU is still far ahead of storage, leaving plenty of cycles for the network stack and for userspace code to do interesting things with the data.
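As a companion to the cycle-budget arithmetic elsewhere in the thread, here's the same kind of sketch for the CPU-side headroom implied by those figures. It only restates the numbers above, and, as noted, linear scaling is optimistic:

/* Numbers from the post above; linear scaling is optimistic. */
#include <stdio.h>

int main(void)
{
    double iops_per_core = 8.0e6;    /* 8M IOPS per core */
    double cores         = 64.0;     /* 64-core Epyc */
    double storage_iops  = 165.0e6;  /* 30 drives x ~5.5M IOPS */

    double cpu_ceiling = iops_per_core * cores;   /* ~512M IOPS if scaling were linear */
    printf("CPU ceiling (linear): ~%.0fM IOPS\n", cpu_ceiling / 1e6);
    printf("share needed to feed the drives: ~%.0f%%\n",
           100.0 * storage_iops / cpu_ceiling);   /* roughly a third */
    return 0;
}

In other words, even under these rough assumptions the drives would consume only about a third of the CPU's theoretical IOPS ceiling, which is the "far ahead of storage" point above.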
-
Originally posted by sdack:
What you have is a case of whataboutism.

You clearly stated the following:

Originally posted by sdack:
UNIX/Linux systems have always dominated the server market, because of their persistency. No other OS could deliver the reliability and thus uptimes that UNIX/Linux could.

Originally posted by sdack:
UNIX/Linux beat the dominance of Microsoft's operating systems, because one cannot run a reliable service when every software update requires a reboot.

Originally posted by sdack:
Other OSes did not manage to dominate, not because they did not offer persistency, but because they lacked other qualities, which UNIX/Linux has in addition to its persistency.

Originally posted by sdack:
As you may know, UNIX has also become unpopular and it is now mostly only Linux.
-
A print-on-demand Phoronix T-shirt could read: "Go Axboe or go home."
-
Originally posted by Space Heater:
IBM i, z/OS, OpenVMS, and HPE NonStop are clear examples of operating systems that typically have greater availability compared with Unix and Unix-like systems. Yet Unix and Unix-like systems still took over the market.
-
Originally posted by sdack:
You do not want CPUs to do menial workloads of shuffling memory around. This is part of the point here. Memory is handled by memory controllers and MMUs,

Of course, someone is probably going to chime in about the new data-streaming accelerator engines in Ice Lake SP or Sapphire Rapids (I forget which). However, if your goal is to replace DRAM with nonvolatile memory, then you can't outsource all memory accesses to a separate engine - it's got to be the CPU accessing it.
Originally posted by sdack:
when Axboe drops his Intel box for an AMD one, because the Intel box could not max out the full potential of an Intel storage device ... You did notice it, too, right?!

Originally posted by sdack:
Intel will solve it not by tweaking the CPUs or relying on software, but they will seek to develop standards with the memory industry to integrate these new technologies on the hardware level without relying too much on a CPU's processing power.
They are also pushing CXL and probably helped drive the inclusion of memory devices into a recent version of the spec.
Originally posted by sdack:
Consumer SSDs are now reaching 7 GB/sec, although weak in IOPS, but as you can see, this limitation is falling quickly.
-
Originally posted by blackshard:
I understand, but it is not the idea of optimizing the API that I'm criticizing, but the numbers!
As long as there is not a serious benchmark with consistent variables, all those numbers (7M, 7.4M, 8M IOPS...) are just trash...
I mean: I could take a 5900X and do 8M IOPS. Then I overclock the 5900X to some stellar frequency and do 9M IOPS, and so I reach a new record; but what does it matter? The API/algorithm underneath isn't any better; it's just throwing out a bigger, useless number.

Originally posted by blackshard:
The numbers are not contextualized. We don't know the variables in the game, so we can't say how much of the bigger number is due to io_uring optimization and how much is due to just more powerful and capable hardware.
It's all by the same guy, and he shares more info in the tweets that the articles reference.
He's only used two different systems, and the Optane storage he tested against remained the same until he saturated its controller. I don't recall the exact figure before he upgraded to the newer system; I think it might have been around 3.8M or something. The linked tweet is about his new record on the upgraded system, where he notes he got it to the point that one of these storage devices alone was the new bottleneck, regardless of CPU speed.
So with that goal achieved, he added a 2nd Optane disk (same model), this time to see how much he could get a single CPU core to handle across both devices. We're now at 8M, and these devices handle around 5M IOPS each.
So yes, there may be a slight boost from the CPU/system upgrade, but he has made steady improvements on both systems before and after it. You don't have to pay attention to the specific numbers, but the scale/ratio of the improvements is worthwhile. We went from roughly 2M to 8M, a 4x improvement, and that was already a big improvement over what AIO was capable of, IIRC.
As for benchmarking: besides him being the only source, with the hardware changing only once, he details that he uses fio, a common disk I/O benchmarking tool. The linked tweet thread even has him share the command he's been using to get these results (see the sketch at the end of this post for what those flags correspond to on the application side): taskset -c 0 t/io_uring -b512 -d128 -s32 -c32 -p1 -F1 -B1 -n1 /dev/nvme2n1
So... not that many "variables in the game"? The bulk of the reported improvements are from optimizations, very little from more powerful hardware.
Although in order to benefit from those new records, you would need hardware of similar capability (e.g. without the Optane, you'd bottleneck on the storage device considerably earlier: top NVMe products like the WD SN850 or Samsung 980 Pro peak at around 1M IOPS, and regular SATA SSDs like a Crucial MX500 at around 100k IOPS). If you don't saturate the storage device, then, like the developer, you need a CPU that can handle such a load.
In practice, most of us won't run workloads that push hardware like that... we probably don't even regularly saturate a SATA SSD's IOPS. AFAIK, it's only going to matter if you can't already saturate the IOPS capability because of a CPU bottleneck. That shouldn't be an issue for the SATA SSD with its 100k IOPS, though less CPU usage should be required to perform the same number of IOPS; for short bursts of random I/O you'd probably not notice... 1k I/Os issued on that 100k IOPS device would be done in 10ms? The blip in CPU usage wouldn't be perceived. (I could be completely misunderstanding the topic here; not an expert.)
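For anyone curious about the flags in that command: if I'm reading fio's t/io_uring options right, -b512 is the block size, -d128 the queue depth, -s32/-c32 the submit/complete batch sizes, -p1 polled I/O, -F1 registered files, -B1 registered (fixed) buffers, and -n1 a single thread. Below is a minimal liburing sketch of my own (not Axboe's tool) that exercises the three tricks those flags enable: polled completions, a registered file, and a registered buffer. It assumes liburing is installed (link with -luring) and a device opened with O_DIRECT whose queue supports polling, and it skips most error handling:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/uio.h>
#include <liburing.h>

int main(int argc, char **argv)
{
    struct io_uring ring;
    struct io_uring_params p = { 0 };
    struct iovec iov;
    int fd, ret;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file or block device>\n", argv[0]);
        return 1;
    }

    /* O_DIRECT is required for IORING_SETUP_IOPOLL. */
    fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* Polled completions, roughly what -p1 selects; drop the flag to
     * fall back to ordinary interrupt-driven completions. */
    p.flags = IORING_SETUP_IOPOLL;
    ret = io_uring_queue_init_params(128, &ring, &p);   /* depth, like -d128 */
    if (ret < 0) { fprintf(stderr, "queue_init: %d\n", ret); return 1; }

    /* Register the buffer and the file up front (like -B1 and -F1), so the
     * kernel can skip per-I/O page pinning and fd lookups. */
    if (posix_memalign(&iov.iov_base, 4096, 512)) return 1;
    iov.iov_len = 512;
    io_uring_register_buffers(&ring, &iov, 1);
    io_uring_register_files(&ring, &fd, 1);

    /* One 512-byte read at offset 0, using buffer index 0 and file index 0. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read_fixed(sqe, 0, iov.iov_base, 512, 0, 0);
    sqe->flags |= IOSQE_FIXED_FILE;

    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    ret = io_uring_wait_cqe(&ring, &cqe);   /* reaps via polling under IOPOLL */
    if (ret == 0) {
        printf("read returned %d\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}

The benchmark keeps 128 such requests in flight and submits/reaps them in batches of 32, which amortizes the io_uring_enter() syscall across many I/Os; that batching plus the registered file/buffer and polling is where most of the per-I/O CPU savings come from, as far as I understand it.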