A Look At Linux Application Scaling Up To 128 Threads

  • bridgman
    replied
    Originally posted by Michael View Post
    Not sure if Bridgman played a role, but I'm told some VP heard I didn't have a dual EPYC server and figured I needed one for testing, given all of my Linux benchmarking.
    In this case I passed along a reminder that it would be good to get a dual-Epyc system that Michael could use, but someone else had already noticed and gotten the discussion going.



  • nuetzel
    replied
    Originally posted by Michael View Post

    Not sure if Bridgman played a role, but I'm told some VP heard I didn't have a dual EPYC server and figured I needed one for testing, given all of my Linux benchmarking.
    GREAT to hear! And a BIG compliment for _your_ GREAT (Linux) support!

    All I 'know' is that Bridgman was involved (working on it) the whole time, because I hammered on him... ;-)
    Last edited by nuetzel; 14 October 2018, 12:43 PM.



  • nuetzel
    replied
    @Michael:
    Now add M.2 and/or Intel Optane 800P/900P 'disks' for _real_ server comparisons with the 'others'...



  • Michael
    replied
    Originally posted by nuetzel View Post

    Thank you, AMD, then.
    Thank you, bridgman, you didn't forget it...
    Not sure if Bridgman played a role, but I'm told some VP heard I didn't have a dual EPYC server and figured I needed one for testing, given all of my Linux benchmarking.



  • nuetzel
    replied
    Originally posted by Michael View Post

    AMD sent it out, not Dell.
    Thank you, AMD, then.
    Thank you, bridgman, you didn't forget it...



  • mhartzel
    replied
    Michael, would it make sense to do some 1920 x 1080 H.264 encoding benchmarks? They seem to be missing from most of the 7601 benchmarks. Video encoding is one of the most demanding workloads for both home and enterprise use (for example, a TV broadcast house), so I would really like to see it included in every multicore benchmark.
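    A minimal sketch of the kind of 1080p H.264 encode timing being asked for here, assuming ffmpeg with libx264 is installed; the input_1080p.mp4 clip and the thread counts are placeholders, not anything from the article:

    ```python
    # Rough sketch of a 1080p H.264 encode timing run, not an actual Phoronix test profile.
    # Assumptions: ffmpeg with libx264 is on the PATH, and "input_1080p.mp4" is a
    # placeholder name for whatever 1920x1080 source clip you benchmark with.
    import subprocess
    import time

    SOURCE = "input_1080p.mp4"           # hypothetical sample clip
    THREAD_COUNTS = [8, 16, 32, 64, 128]

    for threads in THREAD_COUNTS:
        cmd = [
            "ffmpeg", "-y", "-i", SOURCE,
            "-c:v", "libx264", "-preset", "medium",
            "-threads", str(threads),
            "-f", "null", "-",           # discard the output; only the encode time matters
        ]
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        print(f"{threads:>3} threads: {time.perf_counter() - start:.1f} s")
    ```

    In practice libx264 tends to stop scaling long before 128 threads at 1080p, which would itself be part of what such a benchmark shows.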



  • ypnos
    replied
    Very interesting benchmark, thank you very much. Interestingly enough, on our dual EPYC 7551 system we render BMW27 in 54.67 seconds. I wonder what is holding back the Dell server; I would expect a better, not worse, result from an EPYC 7601 system. We also only have eight memory channels populated, compared to the 16 in the Dell system.

    The benchmark result in question: https://opendata.blender.org/benchma...b-71687bbdf83e
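
    For anyone wanting to reproduce that number, a minimal sketch of timing a headless BMW27 render, assuming blender is on the PATH and that bmw27_cpu.blend is a locally downloaded copy of the BMW27 demo scene (the file name is an assumption):

    ```python
    # Sketch: time a background (headless) render of the BMW27 scene.
    # Assumptions: "blender" is on the PATH and "bmw27_cpu.blend" is a locally
    # downloaded copy of the BMW27 demo scene; neither comes from this thread.
    import subprocess
    import time

    SCENE = "bmw27_cpu.blend"

    start = time.perf_counter()
    subprocess.run(
        ["blender", "-b", SCENE, "-f", "1"],   # -b: run without UI, -f 1: render frame 1
        check=True,
        capture_output=True,
    )
    print(f"BMW27 render: {time.perf_counter() - start:.2f} s")
    ```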



  • duby229
    replied
    Originally posted by varikonniemi View Post
    what magic do stockfish and vgr do that going from 32 to 64 threads more than doubles performance?
    It probably has a lot to do with NUMA.



  • willmore
    replied
    Originally posted by varikonniemi View Post
    what magic do stockfish and vgr do that going from 32 to 64 threads more than doubles performance?
    We would need to know how those 32 and 64 threads are allocated on cores. Let's look at the two different situations:
    1) 32 threads all on one package, 64 threads spread across two packages, and no SMT

    In the 32 thread case, a NUMA-aware allocator would keep all memory allocations local to the CCX (if possible). This would limit memory bandwidth to that of one package. When we go to 64 threads, we get twice the memory bandwidth and twice the cache. This can lead to superlinear scaling when working set sizes now fit in the increased cache space where they didn't before. The increase in memory bandwidth only allows linear scaling, so it's unlikely to be the factor. If the memory allocator isn't NUMA-aware, or if the program in question doesn't have strict memory locality, then communication overheads can cause either sub- or superlinear scaling; that's hard to analyze without detailed knowledge of the link bandwidth, the link latencies, and how bursty these accesses are.

    2) 32 threads and 64 threads are both spread across both packages

    Much like the previous situation, the NUMA awareness of the allocator is a big factor, as is the locality of the program's data accesses. Thermals may help in this case: 32 cores may clock higher when spread across two packages than when confined to one. This would lead to sublinear scaling, as the per-core thermal headroom would halve when going from 32 to 64.

    Summary: there are way too many variables to really tell; further analysis would require detailed knowledge of the programs in question and a better understanding of the low-level architecture of this processor.
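
    One way to separate these situations experimentally is to run the same workload pinned to one package and then spread across both, for example with numactl. A minimal sketch, where ./workload is a hypothetical stand-in for the benchmark binary, and the node ranges assume a dual EPYC 7601 (Naples) box exposing four NUMA nodes per socket; check numactl --hardware for the real topology:

    ```python
    # Sketch: run the same workload confined to one package vs. spread over both,
    # to see how much of the 32 -> 64 thread jump comes from the second socket's
    # memory channels and cache rather than from the extra cores alone.
    # Assumptions: numactl is installed; "./workload" is a hypothetical stand-in
    # for the real benchmark; on dual EPYC 7601 (Naples), nodes 0-3 are typically
    # one socket and 0-7 are both; verify with `numactl --hardware` first.
    import subprocess
    import time

    CASES = {
        "one package (nodes 0-3)":   ["numactl", "--cpunodebind=0-3", "--membind=0-3"],
        "both packages (nodes 0-7)": ["numactl", "--cpunodebind=0-7", "--membind=0-7"],
    }

    for label, numa_prefix in CASES.items():
        start = time.perf_counter()
        subprocess.run(numa_prefix + ["./workload"], check=True)
        print(f"{label}: {time.perf_counter() - start:.1f} s")
    ```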



  • torsionbar28
    replied
    Originally posted by schmidtbag View Post
    Entirely correct. Servers like this are really meant for multitasking, not highly parallel workloads. This is why I don't see myself buying a CPU beyond 16 threads for a very long time. I don't have any workloads that are especially demanding of an entire CPU, and if I feel the need to multitask, I'll just use more than one computer at the same time.
    x2, 16 threads is probably the sweet spot for a single-user workstation. Personally, the most demanding tasks I do are transcoding video with HandBrake, which will happily eat as many cores as you can give it, and running desktop VMs in VirtualBox, where it's nice to load up the VM(s) while still having enough free cores to keep the rest of the machine snappy and responsive. And the occasional code compile.

    FWIW, my aging Ivy Bridge Xeon E5-2680 v2 workstation is still quite capable in this regard with its 10c/20t and 32 GB of ECC DDR3-1866. But I will likely replace it with EPYC next year, after the 7 nm Zen 2 parts hit the market. EPYC is clearly superior to Xeon technically, and it's priced better. Win-win. The process improvement going from 14 nm to 7 nm is massive; I'm anticipating some truly amazing numbers from Zen 2.
    Last edited by torsionbar28; 11 October 2018, 12:35 PM.

