The Speculative Execution Impact For A 4-Core POWER9


  • phoronix
    started a topic The Speculative Execution Impact For A 4-Core POWER9

    Phoronix: The Speculative Execution Impact For A 4-Core POWER9

    Last year we looked at the Spectre mitigation cost on POWER9 using the high-end Talos II server. Now, several kernel releases later and with the desktop Blackbird system in our lab, here is a look at the Spectre/Meltdown mitigation impact on a 4-core IBM POWER9 processor running Ubuntu 19.04.

    http://www.phoronix.com/vr.php?view=27977
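For anyone who wants to check which Spectre/Meltdown mitigations their own kernel reports before comparing numbers: Linux exposes one status file per vulnerability under /sys/devices/system/cpu/vulnerabilities/ (for example spectre_v1, spectre_v2, meltdown). The sketch below only reads those files; the exact set of entries depends on kernel version and architecture, and it says nothing about how the article's own benchmark runs were configured. The function name read_mitigations is made up for this example.

```python
from pathlib import Path


def read_mitigations(root="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability_name: status_line} from the kernel's sysfs files.

    Each file holds a single line such as "Not affected", "Vulnerable" or
    "Mitigation: ...". Returns an empty dict if the directory is missing
    (for example on older kernels or non-Linux systems).
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(base.iterdir())
        if entry.is_file()
    }
```

Running `for name, status in read_mitigations().items(): print(name, status)` on the machine under test gives a quick record of the mitigation state to attach to any benchmark result.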

  • toxicdragon
    replied
    Hi phoronix, any news on testing the Blackbird against, e.g., current Intel and AMD systems? I'm really waiting for such a test (premium member writing).


  • angrypie
    replied
    Originally posted by Konstantin A.


    AMD is going to release a 64-core/128-thread TR in Q4 2019.

    I guess it will suffice.
    Those are only rumors for now, but it won't surprise me if they actually release it. After all, they will be selling 12- and 16-core CPUs on a mainstream platform, so they might as well ramp up the core count on TR to twist the knife in Intel a little more.


  • smitty3268
    replied
    Originally posted by milkylainen
    Trust me. If there were a lot of magic gains to be had from adding SMT4 and SMT8, Intel and AMD would have done it.
    I'd much rather get performance up by other means and be done with the unpredictability of SMT once and for all.
    Rumor is that Zen 3 will provide SMT4. Take that with a big grain of salt, obviously. The rumor is that the desktop parts might be limited to SMT3, because code has to be written carefully to get any gains from SMT4+. Server/TR would have SMT4.


  • Zan Lynx
    replied
    I've heard rumors that AMD is planning to use SMT-4 for the next generation of EPYC targeted for late 2020 or 2021.

    Whether it is a good idea or not depends entirely on what the code is like. Tight mathematical computation is bad for SMT; branch-heavy database lookups love it.
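The intuition behind that claim can be sketched with a deliberately crude model (an assumption for illustration, not a description of POWER9 or Zen): a core can issue at most `issue_width` instructions per cycle, a single thread alone achieves `per_thread_ipc`, and extra SMT threads can only fill the slots the first thread leaves idle. A tight math loop already near the issue width leaves little to fill, while a stall-heavy lookup workload leaves a lot. Cache, TLB and predictor contention are ignored, so real gains are lower and can even be negative. The function names and the width of 4 are hypothetical.

```python
def smt_throughput(per_thread_ipc, threads, issue_width):
    """Upper-bound instructions per cycle for one core running `threads`
    identical hardware threads (toy model: shared issue slots only)."""
    if per_thread_ipc <= 0 or threads < 1 or issue_width <= 0:
        raise ValueError("arguments must be positive")
    # The core can never retire more than its issue width per cycle.
    return min(issue_width, per_thread_ipc * threads)


def smt_speedup(per_thread_ipc, threads, issue_width):
    """Core throughput with `threads` SMT threads relative to one thread."""
    single = smt_throughput(per_thread_ipc, 1, issue_width)
    return smt_throughput(per_thread_ipc, threads, issue_width) / single
```

With a hypothetical width of 4, a math-heavy thread at 3.5 IPC gains only about 14% from SMT2 and nothing more from SMT4 (`smt_speedup(3.5, 2, 4)` equals `smt_speedup(3.5, 4, 4)`), whereas a stall-bound thread at 0.8 IPC reaches 2x with SMT2 and 4x with SMT4 in this idealized model.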


  • Konstantin A.
    replied
    Originally posted by DoMiNeLa10

    I see the potential of more aggressive SMT in workstation loads, where you want your compile job to finish as quickly as possible. I think it would be nice if AMD released a Threadripper with SMT4, it could be a performance gain for certain uses.

    AMD is going to release a 64-core/128-thread TR in Q4 2019.

    I guess it will suffice.


  • DoMiNeLa10
    replied
    Originally posted by damentz

    SMT is great for servers where bulk processing is run without any interactivity. Going with SMT4/8 would be even better. But SMT interferes with the kernel's ability to schedule processing time fairly.

    The original purpose of SMT was to reduce stalled pipelines on the CPU, keeping it busy as close to 100% as possible. Synthetic benchmarks show SMT enabled processors scoring higher in high multithreaded workloads. But make a search online for the Intel 9700K vs 9900K in gaming benchmarks. It turns out in latency sensitive applications, like video games, SMT reduces 0.1% and 1% lows when there's enough cores to process all active threads in the game engine.

    This translates somewhat to desktop interactivity as well, but most process schedulers (CFS, MuQSS), are somewhat smart enough to not schedule unrelated tasks on the same core through SMT when it can avoid it. And interestingly, MuQSS avoids scheduling work on thread siblings for SCHED_IDLEPRIO tasks entirely. This is the only sure way to guarantee that background tasks cannot interfere with foreground tasks.

    TLDR; SMT was designed to keep processors busy doing work when stalled, boosting throughput. Gaming benchmarks show SMT gives lower FPS in 0.1% and 1% lows (jank/stutter), when there's enough cores to run the game without waiting. Going for more than 2 threads in SMT will only benefit servers and non-interactive applications.
    I see the potential of more aggressive SMT in workstation loads, where you want your compile job to finish as quickly as possible. I think it would be nice if AMD released a Threadripper with SMT4, it could be a performance gain for certain uses.


  • jrch2k8
    replied
    Originally posted by milkylainen

    There is little need for SMT to begin with. Unless your CPU, application, compiler is constructed like a pile of crap.
    The purpose of SMT is to fill up UNUSED execution slots. Unused execution slots are side effects of a lot of different things.
    SMT was added because it is easier than doing the "other things" right.
    Adding more threads create more context switches and contention for CPU resources and OS administrative tasks.
    More synthetic execution threads does not make your CPU execute faster.
    Unless you have an extremely wide CPU and rubbish dynamic execution model you're unlikely to get any gains by tacking a new number to SMT.

    Trust me. If there were a lot of magic gains to be had from adding SMT4 and SMT8, Intel and AMD would have done it.
    I'd much rather get performance up by other means and be done with the unpredictability of SMT once and for all.
    Well, I partially agree with you, but I have my reservations about latency.

    If we are talking about Windows, I do agree with you, especially since it is public knowledge that Microsoft doesn't have one decent engineer who understands how a scheduler is supposed to work; even the Windows Server scheduler is atrocious and falls apart more often than not.

    If we are talking about Linux, I don't completely agree with you, because in my experience Linux handles SMP/NUMA and locality-aware allocation pretty damn well by default, and even better with a custom scheduler.

    About gaming: it is true that SMP/NUMA/SMT adds a bit more latency compared to real cores, but since I recently removed my Windows partition and migrated all my games to Wine/esync/DXVK, I have noticed that even though the FPS is indeed a bit lower, the smoothness is impressive, and so are the load times. For example:

    1.) Windows: SATA3 SSD (NTFS) plus a 2 TB NTFS HDD for games and mods. Skyrim SE: main game on the SSD + 250 mods (MO2) on the 2 TB disk

    Startup time: around 2-5 min (depending on warm vs. cold start)
    FPS: around 80
    Constant stutter, because Windows either randomly starts thrashing my I/O with random process spawns or completely chokes on streaming my 4K textures, especially when moving through the wilderness and the scenery changes
    CPU-dependent stuff, like mod actions under heavy scripting, sometimes stalls for no good damn reason (CPU usage is low overall), and a second later my character suddenly tries to do 10 things at once
    Loading screens: in short, time to get a coffee, especially in outdoor zones (huge amounts of 4K+ textures)

    2.) Linux/Wine: I added the SSD back alongside my existing Linux one (both 256 GB) as a ZFS RAID1 and converted the game drive to ZFS as well

    Startup: between 10 s and 1 min (depending on warm vs. cold start)
    FPS: around 65-70
    Smooth AF; I couldn't believe my eyes, even when my VRAM usage is around 7 GB and I move to a completely different area
    Loading screens: 20-30 seconds tops (I don't fully understand yet why the difference is so huge, though)
    All my scripts run when they are supposed to (except for known broken mods and such) and don't affect the smoothness of the gameplay

    I also noticed that Linux never pegged one or two specific cores but most of the time kept the load distributed as evenly as possible, while Windows always hammers core 0.
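One way to check the "load spread vs. pegged core" observation on the Linux side is to sample /proc/stat twice while the game runs and compare per-CPU busy fractions. This is a minimal sketch, assuming the standard per-CPU line layout (user, nice, system, idle, iowait, irq, softirq, steal, ...) and treating idle + iowait as not busy; the function names are made up for the example, and Windows has no /proc/stat, so this only covers half of the comparison.

```python
def parse_proc_stat(text):
    """Map "cpuN" -> (busy_ticks, total_ticks) from /proc/stat contents.

    Uses the first eight counters (user, nice, system, idle, iowait, irq,
    softirq, steal); guest time is already included in user/nice.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-CPU lines ("cpu0", "cpu1", ...) and skip the "cpu" total.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:9]]
        total = sum(ticks)
        not_busy = ticks[3] + ticks[4]  # idle + iowait
        result[fields[0]] = (total - not_busy, total)
    return result


def per_cpu_utilization(before, after):
    """Busy fraction per CPU between two parse_proc_stat() snapshots."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        delta_total = total1 - total0
        usage[cpu] = (busy1 - busy0) / delta_total if delta_total else 0.0
    return usage
```

Reading `open("/proc/stat").read()` a second apart and passing both snapshots through these functions shows whether one CPU sits near 1.0 while the rest idle, or whether the values are similar across CPUs.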

    A friend of mine reproduced something similar on a Threadripper 1920X/Vega 56 system (also converted from Windows to Linux, since he was pissed off about the game feeling like crap when modded on that beast of a CPU/GPU). His CPU is in NUMA mode, BTW, not Game Mode, and he is using 2x1TB M.2 SSDs in a ZFS RAID1 (according to him, Skyrim SE doesn't have load screens anymore LOL).

    My system is a Xeon E3-1231 v3 / 16 GB / RX 470 8 GB (under Linux I underclocked the RAM and undervolted the core to be safe; on Windows it was overclocked with AMD's software).

    I have similar experiences with Witcher 3, DOA6, DMC5 and others.

    Note: before anyone asks, I haven't found a way to use ENB on DXVK yet; I use ReShade for now.


  • milkylainen
    replied
    Originally posted by sophisticles
    I wish AMD and Intel would take notice, why are we still using SMT when SMT4 and SMT8 exist? Imagine if either AMD or Intel released a 4C/16T or 4C/32T processor, everyone would go nuts. SMT can be added to a cpu for relatively cheap from an added circuitry standpoint and tests with 12C/96T cpu vs a 24C/96T cpu show that under some workloads the former is faster than the latter.
    There is little need for SMT to begin with, unless your CPU, application, or compiler is constructed like a pile of crap.
    The purpose of SMT is to fill up UNUSED execution slots. Unused execution slots are side effects of a lot of different things.
    SMT was added because it is easier than doing the "other things" right.
    Adding more threads creates more context switches and more contention for CPU resources and OS administrative tasks.
    More synthetic execution threads do not make your CPU execute faster.
    Unless you have an extremely wide CPU and a rubbish dynamic execution model, you're unlikely to get any gains by tacking a bigger number onto SMT.

    Trust me. If there were a lot of magic gains to be had from adding SMT4 and SMT8, Intel and AMD would have done it.
    I'd much rather get performance up by other means and be done with the unpredictability of SMT once and for all.


  • damentz
    replied
    Originally posted by sophisticles
    I wish AMD and Intel would take notice, why are we still using SMT when SMT4 and SMT8 exist? Imagine if either AMD or Intel released a 4C/16T or 4C/32T processor, everyone would go nuts. SMT can be added to a cpu for relatively cheap from an added circuitry standpoint and tests with 12C/96T cpu vs a 24C/96T cpu show that under some workloads the former is faster than the latter.
    SMT is great for servers where bulk processing is run without any interactivity. Going with SMT4/8 would be even better. But SMT interferes with the kernel's ability to schedule processing time fairly.
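A scheduler can only treat SMT siblings specially if it knows which logical CPUs share a physical core. Linux exposes this per logical CPU in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list as a cpulist string such as "0,4" or "0-3". A minimal sketch that groups logical CPUs by core; the helper names are hypothetical, and on a 4-core POWER9 in SMT4 mode one would expect four groups of four, though that is an expectation rather than something checked here.

```python
from pathlib import Path


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0-3,8-11" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def smt_sibling_groups(root="/sys/devices/system/cpu"):
    """Return one sorted tuple of logical CPU ids per physical core."""
    groups = set()
    # "cpu[0-9]*" matches cpu0, cpu12, ... but not cpufreq or cpuidle.
    for topo in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpulist(topo.read_text())))
    return sorted(groups)
```

Each tuple is a set of logical CPUs that compete for the same core's execution resources; pinning two latency-sensitive threads into different tuples is the manual version of what the schedulers try to do on their own.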

    The original purpose of SMT was to reduce stalled pipelines on the CPU, keeping it busy as close to 100% as possible. Synthetic benchmarks show SMT-enabled processors scoring higher in highly multithreaded workloads. But search online for the Intel 9700K vs. 9900K in gaming benchmarks. It turns out that in latency-sensitive applications, like video games, SMT lowers the 0.1% and 1% lows when there are enough cores to process all active threads in the game engine.
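The "1% low" and "0.1% low" figures come from per-frame times rather than the average FPS. Tools do not agree on one definition; the sketch below assumes the variant that averages the slowest N% of frames and converts that mean frame time to FPS (other tools report the 99th/99.9th-percentile frame time instead, so numbers from different tools are not directly comparable). The function name is hypothetical.

```python
import math


def percent_low_fps(frame_times_ms, percent):
    """Average FPS over the slowest `percent`% of frames.

    Assumed definition: sort frame times (ms) slowest first, keep the
    slowest ceil(N * percent / 100) frames (at least one), and convert
    their mean frame time to frames per second.
    """
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    worst_first = sorted(frame_times_ms, reverse=True)
    count = max(1, math.ceil(len(worst_first) * percent / 100))
    mean_ms = sum(worst_first[:count]) / count
    return 1000.0 / mean_ms
```

For example, 990 frames at 10 ms plus 10 frames at 40 ms average out to roughly 97 FPS, yet `percent_low_fps(frames, 1)` is 25 FPS; that gap is the stutter the lows are meant to expose.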

    This translates somewhat to desktop interactivity as well, but most process schedulers (CFS, MuQSS) are smart enough to avoid scheduling unrelated tasks on the same core through SMT when they can. And interestingly, MuQSS avoids scheduling work on thread siblings for SCHED_IDLEPRIO tasks entirely. This is the only sure way to guarantee that background tasks cannot interfere with foreground tasks.
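To try this with a background job, the job has to run under the idle policy. Mainline Linux calls it SCHED_IDLE; as far as I know, MuQSS reuses the same policy number for SCHED_IDLEPRIO, but treat that as an assumption. The sibling-avoidance behaviour itself is up to the kernel's scheduler and is not something this code controls. A minimal sketch (the helper name run_as_idle is hypothetical) that switches only the child process before exec:

```python
import os
import subprocess


def run_as_idle(cmd):
    """Run `cmd` (an argv list) under the SCHED_IDLE policy; Linux only.

    The policy is set in the child after fork and before exec, so the
    calling process keeps its normal policy. Raises CalledProcessError if
    the command exits non-zero.
    """
    if not hasattr(os, "SCHED_IDLE"):
        raise OSError("SCHED_IDLE is not available on this platform")

    def switch_to_idle():
        # SCHED_IDLE requires a static priority of 0.
        os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))

    return subprocess.run(cmd, preexec_fn=switch_to_idle, check=True,
                          capture_output=True, text=True)
```

From a shell, util-linux's `chrt --idle 0 <command>` does the same thing; wrapping a compile job this way (e.g. `run_as_idle(["make", "-j8"])`) is one way to test whether it still disturbs a game running in the foreground.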

    TL;DR: SMT was designed to keep processors busy doing work when stalled, boosting throughput. Gaming benchmarks show SMT gives lower FPS in the 0.1% and 1% lows (jank/stutter) when there are enough cores to run the game without waiting. Going for more than 2 threads in SMT will only benefit servers and non-interactive applications.
    Last edited by damentz; 06-12-2019, 10:12 PM. Reason: Add TLDR
