Mesa's Disk Cache Code Now Better Caters To 4+ Core Systems


  • #11
    Originally posted by schmidtbag View Post
    Holy crap people give it a rest. It's really not a big deal. We're only talking a few threads here. If you're really that worried, chances are you already have crappy performance to begin with.
    The problem is that stupid code like this gets rewritten in Mesa every few months... they should just write it once and be done with it instead of re-optimising it every time the typical number of CPUs changes. Obviously it shouldn't bother spawning more threads than whatever get_nprocs() returns.

    Comment


    • #12
      Originally posted by x_wing View Post

      multiprocessing is a Python package that ships with the interpreter by default (AFAIK). Since we're talking about C code, I think the only option here is to parse /proc/cpuinfo, but that is probably not portable.
      See the comment I was writing at the same time you were writing yours.

      glibc provides an API just as simple and easy to use as the Python one:
      Code:
      #include <stdio.h>
      #include <sys/sysinfo.h>
      
      int main(int argc, char *argv[]) {
          printf("This system has %d processors configured and "
                  "%d processors available.\n",
                  get_nprocs_conf(), get_nprocs());
          return 0;
      }
      -- https://linux.die.net/man/3/get_nprocs
      Sure, it's glibc-specific, but it's only an extra four or five lines to #ifdef your way back to the current behaviour, and "do it properly on the UNIX-like with the largest market share after OSX, fall back to the current behaviour elsewhere" is better than "use the current behaviour everywhere".

      Something like this:
      Code:
          int threads = 4;
      #ifdef __GLIBC__
          if (get_nprocs() < 4)
              threads = 1;
      #endif
      Last edited by ssokolow; 19 September 2019, 11:16 AM.

      Comment


      • #13
        But by making its behaviour more dependent on the system config, you risk making it more fragile... You multiply the number of possible states once more, and you make any bugs harder to track down and reproduce.

        A while back, I'd have agreed that making it dynamic was a good idea. I'm not so sure nowadays. This is at worst a very minor performance hit, on systems that are not that performant to begin with, so a small absolute loss. Yes, one can hypothesize that many such design decisions pile up, but that only reinforces my feeling that robustness and predictability need to be privileged over raw performance numbers, to a certain extent... And there are diminishing returns everywhere, so I'd happily trade 1% perf for 400% stability.

        That said, this whole discussion really is nitpicking/bikeshedding in the literal sense: people spending a lot of time on the only topic in the discussion they can understand. If everyone gets that part, everyone wants to share their opinion on it, and that is counterproductive. I'll plead guilty.

        Comment


        • #14
          I suppose it doesn't really matter if more threads are used than cores are available, as it's just a tiny copy operation and rendering is put on hold anyway until the binaries are in VRAM.
          When you let x264 or 7-zip run on one or two more cores than are available, it doesn't really impact performance either.

          Comment


          • #15
            Originally posted by ssokolow View Post
            Something like this:
            Code:
                int threads = 4;
            #ifdef __GLIBC__
                if (get_nprocs() < 4)
                    threads = 1;
            #endif
            Build-time check for a runtime option?

            Comment


            • #16
              Originally posted by PuckPoltergeist View Post

              Build-time check for a runtime option?
              If built against glibc, call get_nprocs() at runtime.
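
              Something along these lines, roughly (purely illustrative, not Mesa's actual code; the helper name and the default of 4 are mine, and it assumes glibc whenever __GLIBC__ is defined):
              Code:
              #include <stdio.h>
              #ifdef __GLIBC__
              #include <sys/sysinfo.h>        /* get_nprocs() */
              #endif
              
              /* The #ifdef is resolved at build time; the get_nprocs() call happens
                 at runtime, so the same binary adapts to the machine it runs on. */
              static int cache_thread_count(void)
              {
                  int threads = 4;            /* current fixed default */
              #ifdef __GLIBC__
                  int cpus = get_nprocs();    /* CPUs currently online */
                  if (cpus > 0 && cpus < threads)
                      threads = cpus;
              #endif
                  return threads;
              }
              
              int main(void)
              {
                  printf("would spawn %d cache thread(s)\n", cache_thread_count());
                  return 0;
              }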

              Comment


              • #17
                Originally posted by ssokolow View Post

                To be honest, that sounds like an extreme case of premature optimization and being penny-wise and pound-foolish, given that it's apparently as simple as adding #include <sys/sysinfo.h> and calling get_nprocs(), while, even at lowest priority, spawning more threads still has the potential to cause unexpected effects when dealing with a CPU scheduler that operates on the system as a whole.
                I think you have your understanding of premature optimization reversed here.

                Regardless, if it matters so much to you, feel free to submit a patch to Mesa updating it to check how many CPUs there are. That's part of the beauty of open source.

                Just be prepared to show some evidence of a case where this actually helps performance, as that will probably be required to justify adding new code and making it more complicated.

                Comment


                • #18
                  Originally posted by smitty3268 View Post
                  I think you have your understanding of premature optimization reversed here.
                  Normally, you'd be correct, but I'm thinking of it as a premature optimization in the domain of simplicity of testing and maintenance, not performance.

                  When such a check is so short and simple, the only other interpretation that readily comes to mind doesn't lend itself to a very favourable impression of how the developer goes about their craft.

                  Originally posted by smitty3268 View Post
                  Regardless, if it matters so much to you, feel free to submit a patch to Mesa updating it to check how many CPUs there are. That's part of the beauty of open source.

                  Just be prepared to show some evidence of a case where this actually helps performance, as that will probably be required to justify adding new code and making it more complicated.
                  I don't have time to familiarize myself with a new codebase and, even if I did, I couldn't test it. It's been long enough since I last bought a GPU that all of mine are still nVidia ones from the days when AMD wasn't a viable option for my needs.

                  Comment


                  • #19
                    Originally posted by cb88 View Post
                    The problem is that stupid code like this gets rewritten in Mesa every few months... they should just write it once and be done with it instead of re-optimising it every time the typical number of CPUs changes. Obviously it shouldn't bother spawning more threads than whatever get_nprocs() returns.
                    On the surface, I would normally totally agree with you. But, contrary to what a lot of people believe, more cores doesn't always yield better performance. In some cases, you might actually hurt performance. Some tasks are better off having a cap on how many threads you use. Seeing as the disk cache is probably going to get bottlenecked by disk write speeds, I'm sure 4 threads is more than enough, even for high-end SSDs.
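
                    For what it's worth, a hard cap and a CPU-count check aren't mutually exclusive; roughly something like this (illustrative only, the constant name is made up, and the runtime check assumes glibc):
                    Code:
                    #include <stdlib.h>                 /* any libc header defines __GLIBC__ on glibc */
                    #ifdef __GLIBC__
                    #include <sys/sysinfo.h>            /* get_nprocs() */
                    #endif
                    
                    #define MAX_CACHE_THREADS 4         /* disk writes are the bottleneck anyway */
                    
                    int capped_thread_count(void)
                    {
                        int threads = MAX_CACHE_THREADS;
                    #ifdef __GLIBC__
                        if (get_nprocs() < threads)     /* don't oversubscribe small machines */
                            threads = get_nprocs();
                    #endif
                        return threads > 0 ? threads : 1;
                    }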

                    Comment


                    • #20
                      Originally posted by schmidtbag View Post
                      On the surface, I would normally totally agree with you. But, contrary to what a lot of people believe, more cores doesn't always yield better performance. In some cases, you might actually hurt performance. Some tasks are better off having a cap on how many threads you use. Seeing as the disk cache is probably going to get bottlenecked by disk write speeds, I'm sure 4 threads is more than enough, even for high-end SSDs.
                      I agree. Over the past few years, though, there has been a theme of incrementally bumping these numbers up instead of finding a more intelligent way of doing it.

                      Comment
