**numacross** · 10 February 2020, 03:40 AM

Originally posted by tildearrow

Hmmmm...

My 6700K took 8 hours to compile Chromium...

...which means: 64/4=16

(8/16)*4.0/2.9=approx. 0.69 hours on a 3990X (not considering architecture differences)

Unfortunately it's not as simple as that. From the few compilation benchmarks I've seen, scaling between CPUs is not linear and depends heavily on what you're actually compiling. For example, GCC sees almost no improvement at high core counts, while Firefox scales better. Hopefully Michael can supply us with some more compile tests.
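To put rough numbers on why the linear estimate is optimistic, here is a minimal Python sketch comparing the quoted back-of-the-envelope calculation with Amdahl's law. The parallel fractions are purely hypothetical values chosen for illustration, not measurements of a Chromium build, and the Amdahl estimate ignores clock speed and architecture differences entirely.

```python
def naive_estimate(hours, cores_old, cores_new, clock_old, clock_new):
    """Linear scaling by core count and clock speed, as in the quoted post."""
    return hours / (cores_new / cores_old) * (clock_old / clock_new)


def amdahl_estimate(hours, cores_old, cores_new, parallel_fraction):
    """Amdahl's law: only the parallel fraction of the work speeds up with cores."""
    p = parallel_fraction
    # Infer the hypothetical single-core time from the time measured on cores_old.
    single_core = hours / ((1 - p) + p / cores_old)
    # Project that single-core time onto cores_new.
    return single_core * ((1 - p) + p / cores_new)


if __name__ == "__main__":
    # 8 hours on a 4-core 4.0 GHz part, projected onto 64 cores at 2.9 GHz.
    print(f"naive linear estimate: {naive_estimate(8, 4, 64, 4.0, 2.9):.2f} h")
    # Hypothetical parallel fractions; real builds have to be measured.
    for p in (1.0, 0.99, 0.95, 0.90):
        print(f"parallel fraction {p:.2f}: {amdahl_estimate(8, 4, 64, p):.2f} h")
```

Even a small serial portion (linking, single huge translation units, configure steps) pushes the projected time well above the linear figure, which fits the observation that some projects scale much better than others.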

**willmore** · 10 February 2020, 07:35 AM

Originally posted by xinorom

Way to make your censorship obvious, tildearrow. If you're going to delete my original post and his quoting of it, you may just as well hide your tracks and delete his whole post too.

Looks like we got a new Reddit-style power mod over here.

Or, crazy idea, try to stay on topic and not drag unrelated political crap into every discussion?

There's a # of posts counter, can we get a "# of posts that were OT and useless" counter? Maybe a ratio? As of this moment, you have 74 posts and I'm willing to bet a good chunk of them are useless crap like you posted earlier.

Back on topic:

I picked up a used dual Xeon 5660 server a while back (6 cores/12 threads each CPU) and I've been having a fun time seeing how well threaded (or not) different pieces of software are. It's been quite an education coming from a desktop/laptop CPU background where you have maybe 4 threads to work with. Most software fits into three categories: not threaded at all, lightly threaded, and more-cores-please. It's the middle category that I've found to be the most interesting. If I had been asked, I would have guessed that programs like x265 would be heavily threaded, but that's not the case. Even with aggressive settings, it uses around 6 threads.

Other than knowing how different programs approach threading, would there be a way to indicate the 'threadiness' of a task in the benchmark results? I know that Michael often comments on the performance and uses that (or headings) to indicate the single-threaded/multi-threaded nature of certain benchmarks, but we see people way too often say silly things like "I see this 64 core processor is slower than a <insert high clocked low core CPU>, LOLz."

**numacross** · 10 February 2020, 09:02 AM

Originally posted by willmore

I picked up a used dual Xeon 5660 server a while back (6 cores/12 threads each CPU) and I've been having a fun time seeing how well threaded (or not) different pieces of software are. It's been quite an education coming from a desktop/laptop CPU background where you have maybe 4 threads to work with. Most software fits into three categories: not threaded at all, lightly threaded, and more-cores-please. It's the middle category that I've found to be the most interesting. If I had been asked, I would have guessed that programs like x265 would be heavily threaded, but that's not the case. Even with aggressive settings, it uses around 6 threads.

Interesting, I launched ffmpeg with -c:v libx265 and it's saturating my 8c16t Ryzen most of the time, but it's not constant 100% utilization. Maybe yours is limited by the NUMA of your system? It might be a performance optimization to reduce the cost of going across CPU packages for RAM.
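One way to start checking the NUMA theory on Linux is to look at which CPUs belong to which node. The sketch below reads the `cpulist` files the kernel exposes under `/sys/devices/system/node`; the helper names are made up for this example, and the CPU numbering in the usage note is hypothetical, not taken from the Xeon box discussed here.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-5,12-17' into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means the node has no CPUs
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node name to its CPUs; returns {} if sysfs exposes no nodes."""
    nodes = {}
    for path in sorted(glob.glob(os.path.join(root, "node[0-9]*", "cpulist"))):
        with open(path) as f:
            node_name = os.path.basename(os.path.dirname(path))
            nodes[node_name] = parse_cpulist(f.read())
    return nodes


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"{node}: {len(cpus)} CPUs -> {cpus}")
```

On a dual-socket machine this would typically print two nodes; if an encoder's busy threads all sit on the CPUs of one node, that would be a hint (not proof) that something is keeping the work on a single package.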

Originally posted by willmore

Other than knowing how different programs approach threading, would there be a way to indicate the 'threadiness' of a task in the benchmark results? I know that Michael often comments on the performance and uses that (or headings) to indicate the single-threaded/multi-threaded nature of certain benchmarks, but we see people way too often say silly things like "I see this 64 core processor is slower than a <insert high clocked low core CPU>, LOLz."

Other than measuring it directly while benchmarking, not really, since there's just too much variation.
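For what "measuring it directly" could look like, here is a minimal sketch (Unix only, and not how the Phoronix Test Suite does it as far as I know) that runs a command and divides the CPU time its process tree consumed by the wall-clock time. The result is roughly the average number of cores kept busy; the function name and the default command are just placeholders for this example.

```python
import resource
import subprocess
import sys
import time


def average_cores_used(cmd):
    """Run cmd and return (child CPU seconds) / (wall-clock seconds)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system time accumulated by the waited-for child processes.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu / wall


if __name__ == "__main__":
    # Placeholder workload: a single-threaded Python loop, unless a command is given.
    cmd = sys.argv[1:] or [sys.executable, "-c", "sum(range(10_000_000))"]
    print(f"average cores busy: {average_cores_used(cmd):.2f}")
```

A value near 1 means effectively single-threaded, a value near the hardware thread count means more-cores-please, and anything in between is the lightly threaded middle category. It is only an average over the whole run, so a job that alternates between serial and parallel phases will still need a per-interval view.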

**willmore** · 10 February 2020, 09:18 AM

Originally posted by numacross

Interesting, I launched ffmpeg with -c:v libx265 and it's saturating my 8c16t Ryzen most of the time, but it's not constant 100% utilization. Maybe yours is limited by the NUMA of your system? It might be a performance optimization to reduce the cost of going across CPU packages for RAM.

Interesting. I'm using it as part of HandBrake, so there's a bunch of other options on the command line. I've tried it on the dual Xeon system and on a Ryzen 3700X and seen similar thread use. When I looked into it, I found a lot of people having the same problem and no solutions. It's probably worth looking into again.

**pszilard** · 11 February 2020, 02:33 PM

Michael, any reason why there are no GROMACS benchmarks in here -- in particular as the 39x0X benchmarks did include it? Of the molecular dynamics codes it has the best SIMD support (well, AFAIK NAMD and LAMMPS do not have AVX2/AVX512 kernels written in SIMD intrinsics), and in general it is quite well-tuned for CPU instruction sets compared to most other codes. Hence, it can give a good idea of how a well-tuned code that does take advantage of vector instructions, including Intel's 512-bit AVX, performs.

**Michael** · 11 February 2020, 02:35 PM

Originally posted by pszilard

Michael, any reason why there are no GROMACS benchmarks in here -- in particular as the 39x0X benchmarks did include it? Of the molecular dynamics codes it has the best SIMD support (well, AFAIK NAMD and LAMMPS do not have AVX2/AVX512 kernels written in SIMD intrinsics), and in general it is quite well-tuned for CPU instruction sets compared to most other codes. Hence, it can give a good idea of how a well-tuned code that does take advantage of vector instructions, including Intel's 512-bit AVX, performs.

Only had so much time between Wednesday and Friday for the embargo lift. (I re-test all processors each time.) But I think GROMACS is in some new 3990X benchmarks out later today IIRC.

**dud225** · 12 February 2020, 01:05 AM

Hi Michael,
You should try the kernel compilation test again in light of the recent optimization Linus has made!

**nuetzel** · 12 February 2020, 01:20 AM

Originally posted by dud225

Hi Michael,
You should try the kernel compilation test again in light of the recent optimization Linus has made!

Maybe @Michael's 'make' has the mentioned fix already...
...mine (openSUSE TW) has.

* Mo Jul 16 2018 [email protected]
- pselect-non-blocking.patch: Use a non-blocking read with pselect to avoid
hangs (bsc#1100504)

**xinorom** · 12 February 2020, 03:09 AM

Originally posted by nuetzel

Maybe @Michael's 'make' has the mentioned fix already...
...mine (openSUSE TW) has.

* Mo Jul 16 2018 [email protected]
- pselect-non-blocking.patch: Use a non-blocking read with pselect to avoid
hangs (bsc#1100504)

What part of this patch looks like the Make codebase to you?

**nuetzel** · 12 February 2020, 03:35 AM

Originally posted by xinorom

What part of this patch looks like the Make codebase to you?

You have to read Linus' whole explanation:

Real World Technologies - Forums - Thread: Nuances related to Spinlock implementation and the Linux Scheduler

https://www.realworldtech.com/forum/?threadid=189711&curpostid=189958

AMD Ryzen Threadripper 3990X Offers Incredible Linux Performance
