Benchmarking AMD FX vs. Intel Sandy/Ivy Bridge CPUs Following Spectre, Meltdown, L1TF, Zombieload


  • Benchmarking AMD FX vs. Intel Sandy/Ivy Bridge CPUs Following Spectre, Meltdown, L1TF, Zombieload

    Phoronix: Benchmarking AMD FX vs. Intel Sandy/Ivy Bridge CPUs Following Spectre, Meltdown, L1TF, Zombieload

Now with MDS / Zombieload being public and seeing an 8~10% performance hit in the affected workloads as a result of the new mitigations to these Microarchitectural Data Sampling vulnerabilities, what does the overall performance look like now when going back to the days of AMD FX Vishera and Intel Sandy Bridge / Ivy Bridge processors? If Spectre, Meltdown, L1TF/Foreshadow, and now Zombieload had come to light years ago, would it have shaken that pivotal point in the industry? Here are benchmarks looking at the performance today with and without the mitigations to the known CPU vulnerabilities to date.

    http://www.phoronix.com/vr.php?view=27898
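
    For anyone wanting to reproduce the with/without-mitigations comparison locally, a rough sketch (assuming a reasonably recent kernel that understands the combined mitigations= switch; the exact set of sysfs entries depends on the kernel version):

    # List the vulnerabilities the running kernel knows about and how each one is mitigated
    grep . /sys/devices/system/cpu/vulnerabilities/*

    # Benchmark the default (mitigated) state, then reboot with the mitigations disabled
    # by appending this to the kernel command line and re-run the same workloads:
    #   mitigations=off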

  • skeevy420
    replied
    Originally posted by atomsymbol View Post

    It is probable you will regret your behavior 20 years from now.

    Monkeys cannot write Shakespeare's work in finite time. https://en.wikipedia.org/wiki/Infinite_monkey_theorem

    You have free will to choose whether to incline to monkeys or to Shakespeare.
    If you had any idea how obscene Shakespeare actually is you probably wouldn't use that as your example. The puns and wordplay used back then don't come off the same way these days...but if you do know what to look for in those regards, Shakespeare makes for some good and funny reading.

    Honestly, I can tell you right now I'm not going to regret saying "IO fucked" or calling a certain "testing mouse" "retarded".

    Are you passing -pipe to the compiler?
    Yes I am. Is it detrimental in a ramdisk context? Note that I have a 24GB ramdisk (systemd default) and 24GB of system memory available while it's compiling (48GB total).

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by skeevy420
    Because ...
    It is probable you will regret your behavior 20 years from now.

    Monkeys cannot write Shakespeare's work in finite time. https://en.wikipedia.org/wiki/Infinite_monkey_theorem

    You have free will to choose whether to incline to monkeys or to Shakespeare.

    Originally posted by skeevy420
    it's the compile that messes with the browser tabs and everything else. IO intensive parts of compiles bring desktops to a standstill.
    Are you passing -pipe to the compiler?
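
    For context, -pipe just tells GCC/Clang to pass the intermediate output between the compiler stages through pipes instead of temporary files in TMPDIR, so its benefit is largest when TMPDIR sits on a slow disk and mostly a wash when /tmp is already a tmpfs. A minimal way to pass it (the -j value here is only an example):

    # -pipe avoids writing preprocessed/assembly intermediates to TMPDIR between stages
    make CFLAGS="-O2 -pipe" CXXFLAGS="-O2 -pipe" -j16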

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by skeevy420 View Post
    Multitask during a Wine or kernel compile on spinners without using a ramdisk and get back to me.

    I suppose if all one is doing is just compiling something, sure, probably not as necessary. The second you want to ... comment on Phoronix with Firefox you'll throw your computer out of the window.
    Are you starting a fresh new instance of Firefox (that is: not a new tab) to comment on Phoronix?

    I don't see how a new browser tab can heavily interfere with compilation on a spinning disk.

    I am running a web browser all the time. I never shut down the machine for the night; I suspend it (sync; echo mem > /sys/power/state).

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by skeevy420 View Post
    Multitask during a Wine or kernel compile on spinners without using a ramdisk and get back to me.

    I suppose if all one is doing is just compiling something, sure, probably not as necessary. The second you want to watch Game of Thrones with SMPlayer or comment on Phoronix with Firefox you'll throw your computer out of the window.

    EDIT: And that's with keeping my sources on one disk, my os on another, and my media on yet another disk. It's very easy to IO fuck yourself with spinners.
    I don't understand why you are using vulgarisms in your posts.

    I would recommend being more careful about publicly admitting that you are violating copyright law by watching Game of Thrones with SMPlayer.

    Leave a comment:


  • skeevy420
    replied
    Originally posted by atomsymbol View Post

    I am not entirely convinced a ramdisk makes sense on a spinning hard disk for compilation tasks if the machine has a lot of RAM. A lot of RAM means that almost all intermediate data generated during the task will be served from the Linux kernel's disk cache rather than be re-read from the disk. Writes to the spinning disk happen asynchronously, so they aren't interrupting the task as long as they do not exceed the disk's write bandwidth. The initial (cold) read from the disk is there in both cases; a ramdisk does not reduce the amount of initial cold reads from the spinning disk. Compared to a cache, a ramdisk prevents automatic eviction of data from RAM: the user has explicit control over which data is in RAM and which data is on disk.

    In summary, ramdisk is only useful when:
    • Task data writes exceed the disk write bandwidth (which is about 100 MB/s in the case of a spinning disk)
      • The task write bandwidth requirement can be lowered by data compression (gzip/xz/zstd on individual files, compressed debug sections, ccache compression, btrfs filesystem compression, jpeg instead of png, ...)
    • The task's data access pattern does not match the Linux kernel's disk cache eviction policy
    Multitask during a Wine or kernel compile on spinners without using a ramdisk and get back to me.

    I suppose if all one is doing is just compiling something, sure, probably not as necessary. The second you want to watch Game of Thrones with SMPlayer or comment on Phoronix with Firefox you'll throw your computer out of the window.

    EDIT: And that's with keeping my sources on one disk, my os on another, and my media on yet another disk. It's very easy to IO fuck yourself with spinners.

    Leave a comment:


  • numacross
    replied
    Originally posted by Wielkie G View Post
    How are they different? The SMT is symmetric - each hardware thread is equal to the other one.

    Try your favorite workload on core 0 (by setting core affinity) and then core 1. See that there is no difference between these two results.

    Now try to run two instances of the workload, one on core 0 and the other on core 1. You will see that each workload is slower, but the aggregate throughput might be higher. For example, if each core's throughput is 60% of the original, then the aggregate is 120% and the SMT performance uplift is +20%.

    For example (Windows, as I don't have access to Linux right now), the 7zip compression benchmark on my machine (i7-3770K) shows 4500-4600 MIPS on core 0 and on core 1 when only one core is being used. When I run two instances (one on core 0 and the other on core 1) they show 3000-3100 MIPS each - that's 6000-6200 MIPS aggregate and a 30-40% uplift.
    7-zip is a well behaved integer load which scales great on HT. The decompression benchmark would produce even greater numbers.

    Trying it with ffmpeg version 4.1.1 which abuses every SIMD including AVX2 on my 4790K (Turbo disabled, constant 4.3GHz):
    • Single instance limited to 1 thread (both decoding and encoding) in a cmd.exe started with /affinity 1 - core 0
    ffmpeg -threads 1 -i test.mp4 -benchmark -preset slow -crf 22 -c:a copy -threads 1 test_out.mkv
    bench: rtime=150.166s
    • Single instance limited to 2 threads in a cmd.exe started with /affinity 3 - cores 0 and 1
    ffmpeg -threads 2 -i test.mp4 -benchmark -preset slow -crf 22 -c:a copy -threads 2 test_out2.mkv
    bench: rtime=130.316s
    • Single instance limited to 1 thread in a cmd.exe started with /affinity 1 - core 0, with a prime95 FMA3 load running on core 1 at the same time
    bench: rtime=257.803

    As you can see, even a well-behaved AVX2 load (-threads 2) actually got sped up a little by HT; however, when the physical core is already loaded with AVX2, both virtual threads have to compete for shared resources. To be fair, the aggregate throughput is still a bit better than with -threads 1.

    I'm not arguing that HT doesn't work - in most cases it does, but it has quirks and disadvantages with some loads. It can also complicate matters if you are reliant on execution latency or are running threads with long chains of computation dependencies.

    Another potential trap is the cost of keeping all 8 logical (HT) cores synchronized vs. the cost of doing it for only 4 physical cores. It's not unheard of for games to perform better with HT disabled.
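
    For anyone wanting to repeat roughly the same experiment on Linux, taskset can stand in for cmd.exe's /affinity; which logical CPU is the HT sibling of core 0 varies between machines, so check the topology first (the "0,4" below and the ffmpeg flags simply mirror the runs above):

    # Find the two hardware threads sharing physical core 0
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list   # e.g. "0,4"

    # One encoder thread pinned to a single hardware thread of core 0
    taskset -c 0 ffmpeg -threads 1 -i test.mp4 -benchmark -preset slow -crf 22 -c:a copy -threads 1 test_out.mkv

    # Two encoder threads pinned to both hardware threads of the same physical core
    taskset -c 0,4 ffmpeg -threads 2 -i test.mp4 -benchmark -preset slow -crf 22 -c:a copy -threads 2 test_out2.mkv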

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by skeevy420 View Post
    if you compile a lot of source it [ramdisk] is a damn great speed up for IO on spinners
    I am not entirely convinced a ramdisk makes sense on a spinning hard disk for compilation tasks if the machine has a lot of RAM. A lot of RAM means that almost all intermediate data generated during the task will be served from the Linux kernel's disk cache rather than be re-read from the disk. Writes to the spinning disk happen asynchronously, so they aren't interrupting the task as long as they do not exceed the disk's write bandwidth. The initial (cold) read from the disk is there in both cases; a ramdisk does not reduce the amount of initial cold reads from the spinning disk. Compared to a cache, a ramdisk prevents automatic eviction of data from RAM: the user has explicit control over which data is in RAM and which data is on disk.

    In summary, ramdisk is only useful when:
    • Task data writes exceed the disk write bandwidth (which is about 100 MB/s in the case of a spinning disk)
      • The task write bandwidth requirement can be lowered by data compression (gzip/xz/zstd on individual files, compressed debug sections, ccache compression, btrfs filesystem compression, jpeg instead of png, ...)
    • The task's data access pattern does not match the Linux kernel's disk cache eviction policy
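
    A rough way to check the first point on a Linux box is simply to watch the disk while the build runs (assuming the sysstat tools are installed and sda is the spinning disk):

    # Watch write throughput (wMB/s) on the spinner during the build, refreshed every second
    iostat -xm sda 1
    # Compare the observed figure against the ~100 MB/s write ceiling mentioned above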

    Leave a comment:


  • skeevy420
    replied
    Originally posted by atomsymbol View Post

    I am not sure I understand the advantages of ramdisks. Assuming the machine already has a SSD/NVMe disk mounted to /, is there a measurable performance advantage to using a ramdisk for /tmp?
    You assume too much with my system.

    But, yeah, if you compile a lot of source it's a damn great speed-up for IO on spinners and reduces drive wear regardless of the underlying disk. You can also move games over to them and be able to load up assets just a hair faster... I've been known to set mine as large as 40GB to play modded Skyrim.

    While it has never happened to me, I'd rather my ramdisk get filled with a buggy program's super log spam than my hard drive.
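
    For reference, the "systemd default" mentioned earlier is just /tmp mounted as a tmpfs capped at 50% of RAM; a separate, explicitly sized build/games ramdisk is a one-liner, and the /tmp cap itself can be overridden (the 40G size, mount point and fstab line below are only examples):

    # Dedicated ramdisk, independent of the systemd-managed /tmp
    sudo mkdir -p /mnt/ramdisk
    sudo mount -t tmpfs -o size=40G tmpfs /mnt/ramdisk

    # Or pin /tmp itself to a fixed size via /etc/fstab (overrides the 50% default)
    # tmpfs  /tmp  tmpfs  size=24G,mode=1777,nosuid,nodev  0  0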

    Leave a comment:


  • atomsymbol
    replied
    Originally posted by skeevy420 View Post
    That's sort of how I factored RAM for my current system: 8 cores * 2 for HT = 16 threads * 2GB = 32GB. So I figured 32GB was a decent starting point and ended up getting 48GB because it was $10 more. Why not? What I didn't account for was systemd using half of that for /tmp by default, so it really comes out to 48GB / 2 (systemd) = 24GB / 16 threads = 1.5GB per thread. Just means I need to pick up another 24GB of RAM to get that 2GB per CPU (with a 36GB ramdisk as a bonus). I do all my compiles on my current 24GB ramdisk except for Firefox with PGO... 24GB ain't enough for that (seriously), so I could actually make use of 72GB of RAM.
    I am not sure I understand the advantages of ramdisks. Assuming the machine already has a SSD/NVMe disk mounted to /, is there a measurable performance advantage to using a ramdisk for /tmp?
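
    One crude way to get a number for that, assuming /tmp is the tmpfs and $HOME sits on the SSD (4096 x 1 MiB blocks here is just an arbitrary test size):

    # Sequential write throughput: SSD-backed home directory vs. tmpfs-backed /tmp
    dd if=/dev/zero of=$HOME/ddtest bs=1M count=4096 conv=fdatasync
    dd if=/dev/zero of=/tmp/ddtest bs=1M count=4096 conv=fdatasync
    rm $HOME/ddtest /tmp/ddtest

    A compile run with TMPDIR pointed at each location would be the fairer test, since builds stress small files and metadata more than raw streaming writes, but this at least shows the ceiling.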

    Leave a comment:
