That's the benchmark I was looking for. OpenFOAM and other HPC workloads especially are benefiting a lot.
AMD Ryzen 7 5800X3D On Linux: Not For Gaming, But Very Exciting For Other Workloads
Originally posted by skeevy420 View Post
Zstd as well. Like Michael points out in the article, that probably had good benefits for file systems using Zstd for compression. I wonder if LZ4, XZ, and other codecs get performance improvements as well.
LZ4 uses a sliding window for compression (see LZ77), so it has only a tiny dictionary of sorts, and it'll likely be slower on the 3D variant.
XZ can use a big dictionary. It would likely benefit from the increased cache allowing faster operation with bigger dictionaries.
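To make the sliding-window point concrete, here is a minimal, brute-force Python sketch of LZ77-style match finding. This is purely illustrative: the function name is made up, and real LZ4 uses a hash table over its 64 KiB window rather than a linear scan. The point is that the matcher only ever touches the last `window` bytes, which is why its working set stays small and cache-resident.

```python
def lz77_longest_match(data: bytes, pos: int, window: int = 64 * 1024):
    """Find the longest back-reference for data[pos:] within the
    preceding sliding window (LZ4's window is 64 KiB).
    Returns (distance, length); (0, 0) means no match."""
    start = max(0, pos - window)
    best_len, best_dist = 0, 0
    for cand in range(start, pos):
        length = 0
        # Overlapping matches are allowed in LZ77: the comparison may
        # run past `pos` because those bytes are already determined.
        while (pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - cand
    return best_dist, best_len
```

For `b"abcabcabc"` at position 3 this finds distance 3, length 6, i.e. the classic overlapping run copy. The cache-friendliness follows from `start = max(0, pos - window)`: nothing older than the window is ever read again.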
Originally posted by domih View Post
Thanks for this article as well as the one on Milan X!
I'm not a gamer so I'm mostly interested in the possible performance increase brought by 3D V-Cache in development-related tools and servers. What about (beyond ML/DL):
- JSON parsing (in various languages),
- XML parsing and other XML operations (in various languages),
- MySQL, PostgreSQL,
- Cassandra,
- Large Python or PHP list and dict handling,
- JIT compilation in Java, Python, PHP,
- Crypto (AES, RSA),
- JavaScript,
- Web servers (Apache, Nginx).
For MySQL, it may help with filesorts, though you'll need to increase sort_buffer_size to take advantage of it (and if you're doing filesorts where that matters, your schema, indexes, or queries really need a rethink).
For Cassandra, I can't see it helping much. Usually the heap in a Java app is much bigger than 96 MB. When I run Cassandra in production, I generally set newgen to several gigabytes. Compactions spew new objects that end up polluting oldgen if newgen is too small.
PHP is unlikely to benefit. In all my own benchmarking, PHP cares mostly about memory latency. It suffers from the higher cache latency in Epyc (even Zen 3) compared to Xeon. Ditto memcache. It wouldn't surprise me if Python behaves differently, but I don't have any highly used Python running.
Java apps can be a mixed bag. Some care about cache latency, like API servers. Others don't, like Kafka. I could see the 3D variant being useful in some situations.
For PHP, Python, Java, and JavaScript, a lot of what the programs do often results in pointer-chasing. If your working set fits in the 96 MB cache and not in the 32 MB cache, that can make a big difference. The JIT process doesn't get run often.
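As a sketch of what "pointer-chasing" means here: the snippet below (my own illustration, not anything from the benchmarks) builds a single random cycle of indices, so every load depends on the result of the previous one and the hardware prefetcher can't run ahead. Python interpreter overhead will swamp actual cache timings, so this only shows the access-pattern shape; the working-set knob is `n` — the same traversal gets dramatically cheaper in a compiled language once `n` times the element size fits in 96 MB instead of spilling past 32 MB.

```python
import random

def build_chain(n: int, seed: int = 0):
    """order[i] holds the next index to visit; the indices form one
    random cycle, so traversal is a serial chain of dependent loads."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    order = [0] * n
    for k in range(n):
        order[perm[k]] = perm[(k + 1) % n]
    return order

def chase(order, steps, start=0):
    """Follow the chain for `steps` hops: each load's address depends
    on the previous load's value, defeating prefetch."""
    i = start
    for _ in range(steps):
        i = order[i]
    return i
```

Because the chain is one cycle of length `n`, chasing `n` steps from any element returns to it — handy for checking the chain was built correctly.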
Crypto is all frequency oriented as keys fit in the L1 cache. 3D won't help here.
Nobody is using Apache where performance matters. Context switches galore. The extra cache might help keep all the processes' memory in L3, maybe.
It could be a boost for Nginx. The extra cache could keep more static files in L3 (great for mapped IO) and could also be used for TCP buffers if Nginx is producing traffic... but on a highly loaded server with tens of thousands of connections, TCP buffers consume hundreds of MB of memory even if default sizes are small. Though Nginx would have no trouble saturating a 10 Gb connection with a 5800X3D well before the cache would help. You'd probably have to go 100 Gb before it would matter: but then the number of connections would need too much buffer memory anyway.
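The "hundreds of MB" claim is easy to sanity-check with back-of-envelope arithmetic. The helper below is my own illustration; the 16 KB per-socket figure is an assumption (Linux autotunes per-socket buffer sizes, and defaults vary), but even that modest figure dwarfs a 96 MB L3 at high connection counts.

```python
def tcp_buffer_mb(connections: int, per_sock_kb: int = 16) -> float:
    """Aggregate TCP buffer memory in MB for `connections` sockets,
    assuming a modest fixed per-socket buffer (an assumption; real
    kernels autotune these sizes)."""
    return connections * per_sock_kb / 1024

print(tcp_buffer_mb(50_000))  # → 781.25 MB across 50k sockets
```

So at tens of thousands of connections, buffer memory alone is roughly eight times the 5800X3D's entire L3, which is why the cache can't hold the TCP state on a busy server.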
I'd be interested to see how it impacts over-subscribed environments. I bet you could run more VMs/containers/processes comfortably on a 5800X3D compared to the 5800X. Maybe on a busy dev box with a replicated stack it would give better performance when paging from a bigger L3. But if you're running that many processes, a CPU with more cores is probably a better idea. On a dev box, I'd probably spend the money on a better NVMe drive first.
Originally posted by Michael View Post
Click the openbenchmarking.org link on the last page of the article, some of those are covered. Others are coming.
Navigating from this page will also yield in-progress metrics for other tests - https://openbenchmarking.org/s/AMD+R...5800X3D+8-Core
Originally posted by LinuxID10T View Post
I think it is just the type of games and applications being used. For games, most Windows games are larger and more complex and need a larger amount of space in the cache. You only see increases in performance where the core logic of a game is too big to fit in a traditional L3 cache and has to be fetched from memory. As for applications, most publications didn't really test any specialized deep learning or HPC applications due to the audience. Linux is just much bigger than Windows in those spaces.
DDraceNetwork? Xonotic? Tesseract? Those aren't going to show any benefit no matter what OS you're on.
It would be interesting to see someone do some tests on actual Windows AAA games through Proton, to see how they perform. I'm not saying they will definitely show any benefit, but it's certainly possible they'll show a large one, and the Deus Ex results make me think it's likely they would. And yes, I know why Michael doesn't do that. Doesn't change the fact that they'd be far more interesting results.
Last edited by smitty3268; 25 April 2022, 08:45 PM.
Originally posted by willmore View Post
You're speaking in the context of compression programs here and you are dead wrong. Saying that a compression program is using too much memory is like saying that an algorithm is lazy because it didn't find a way to solve the Traveling Salesman problem in polynomial time. I don't want to quote the Cat in the Hat, but he's right.
To address your overly pedantic opinion, a program is well written if it takes advantage of the hardware on which it runs. LZ4 doesn't happen to be able to do that as it's already fully satisfied by even a basic processor, but that's because it's designed for a little processor. Honestly, it's a bit silly to be using it as a benchmark in this way. It's about as good of a benchmark as 'grep "cpu MHz" /proc/cpuinfo'.
Originally posted by willmore View Post
LZ4 is very cache friendly. It reads through its input buffer and copies that to the output buffer, and occasionally reads back a little bit in the output buffer and copies that to the current end of the output buffer. This makes it very tolerant of cache eviction, etc. It's very easy for simple fetch predictors to keep up with the way LZ4 accesses memory. Other compression programs use different methods with much larger cache footprints, but LZ4 is not one of them. It's suitable for microcontrollers, etc. If you have enough memory for the input buffer and the output buffer, you can do LZ4 without extra storage. You can't do that with BZ2 or other transform-based systems.
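The "reads back a little bit in the output buffer" step is the back-reference copy, which can be sketched in a few lines. This is an illustration of the general LZ77 mechanism, not LZ4's actual (heavily optimized) decoder; the function name is made up. Note the deliberate byte-by-byte loop: when `distance < length` the copy overlaps itself, repeating recently produced output, which is how these codecs encode runs.

```python
def copy_match(out: bytearray, distance: int, length: int) -> None:
    """LZ77-style back-reference: append `length` bytes copied from
    `distance` bytes back in the already-produced output. Copying one
    byte at a time makes overlapping copies (distance < length) work,
    since each byte may read output written earlier in this same call."""
    start = len(out) - distance
    for k in range(length):
        out.append(out[start + k])

buf = bytearray(b"ab")
copy_match(buf, 2, 6)  # overlap: "ab" is repeated three more times
```

This also shows why decompression is so prefetcher-friendly: the only memory touched is the tail of the output buffer, which is the hottest data there is.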
Originally posted by Raka555 View Post
Being a bit pedantic here, but the apps don't "take advantage" of a larger cache.
It's more that bloated apps require larger caches.
Originally posted by miskol View Post
It would be nice to see benchmarks of the 5800X vs the 5800X3D at the same frequency,
so we can see how much the V-Cache adds,
as the 5800X and 5800X3D have different frequencies.
Or downclock the 5800X.