Windows 11 vs. Linux Performance For Intel Core i9 12900K In Mid-2022


  • Linuxxx
    replied
    Originally posted by arQon View Post

    Now you're going to make me look like the bad guy for having to point out the rather obvious problem with that, which I really hate to do. :/

    You have two workloads. One is latency sensitive, and one is throughput sensitive. Are you going to reboot each time you need to run one of them? Are you going to manipulate it via debugfs as an unprivileged user? What if you need their runs to be at least partially concurrent?

    I'm sorry, but there just isn't any "Option A is always the best choice" answer here, even *before* you get into heterogeneous core performance.
    It's great that you've found what works best for you, but if things were really that simple then there's been an awful lot of effort wasted on this topic by a lot of very clever people over a very long time. :P

    As far as "no other OS" goes, well, the answer to that is also something that you're probably not going to enjoy hearing, so let's leave that for another day.
    Simple: on an interactive desktop/workstation, preempt=full should always be the default, unless you actually enjoy your computer jerking you around instead of the other way round.

    Also, you've got me curious there:
    Which other OS allows changing the kernel-level preemption model without recompiling?



  • NobodyXu
    replied
    Originally posted by Anux View Post
    It says so in the man page:
    Code:
    MAP_NONBLOCK (since Linux 2.5.46)
    This flag is meaningful only in conjunction with
    MAP_POPULATE. Don't perform read-ahead ...
    On most workloads you are probably good with the system default read-ahead. On de/compression you read a file and probably profit from read-ahead since that file is not scattered all over your storage.
    Thanks.

    It seems that using `MAP_POPULATE` is actually a good idea for de/compression, since it tries to populate the entire file if possible, which is definitely more efficient than faulting pages in one at a time.



  • Anux
    replied
    Originally posted by NobodyXu View Post
    I wonder whether mmap affects read-ahead, and does any compression/decompression library set that?
    It says so in the man page:
    Code:
    MAP_NONBLOCK (since Linux 2.5.46)
    This flag is meaningful only in conjunction with
    MAP_POPULATE.  Don't perform read-ahead ...
    On most workloads you are probably good with the system default read-ahead. On de/compression you read a file and probably profit from read-ahead since that file is not scattered all over your storage.



  • arQon
    replied
    Originally posted by Linuxxx View Post
    The throughput vs. latency problem is solved on Linux in the most convenient way imaginable, namely by providing a boot-time configurable PREEMPT_DYNAMIC kernel parameter option
    Now you're going to make me look like the bad guy for having to point out the rather obvious problem with that, which I really hate to do. :/

    You have two workloads. One is latency sensitive, and one is throughput sensitive. Are you going to reboot each time you need to run one of them? Are you going to manipulate it via debugfs as an unprivileged user? What if you need their runs to be at least partially concurrent?

    I'm sorry, but there just isn't any "Option A is always the best choice" answer here, even *before* you get into heterogeneous core performance.
    It's great that you've found what works best for you, but if things were really that simple then there's been an awful lot of effort wasted on this topic by a lot of very clever people over a very long time. :P

    As far as "no other OS" goes, well, the answer to that is also something that you're probably not going to enjoy hearing, so let's leave that for another day.



  • NobodyXu
    replied
    Originally posted by Anux View Post
    You seem to confuse read-ahead with caching. Read-ahead is done at the block level and is always a fixed value around 64 kB to 512 kB (changeable with hdparm). You read a 1-byte file, but read-ahead triggers a read of the whole 64 kB stored after that first byte. There is no way to not use read-ahead, but f/madvise are ways to change its size for specific file access patterns. Read-ahead will not fill up your RAM; it just reads a few kB more than you requested, because you will probably read them anyway if you're not doing random 4k access.
    Also, without m/fadvise, disk caching will fill up your free memory and evict any old caches given a big enough read, and it has nothing to do with read-ahead.
    Thanks for pointing it out!

    I wonder whether mmap affects read-ahead, and does any compression/decompression library set that?



  • NobodyXu
    replied
    Originally posted by coder View Post
    I'm talking about at the block device level, like c

    Check what yours is. Mine is 512, on a machine with 32 GB of RAM running a 5.3 kernel.
    My Ubuntu 22.04 configures that value to be 128.
    That's so small.

    Originally posted by coder View Post
    That's the page cache. I'm not sure whether read_ahead_kb operates there or via buffers. There could be higher-level read-ahead occurring, like at the file-level, which would be reflected there.

    More to the point, simply looking at it won't let you distinguish read-ahead from caching what was previously read.
    You are right, I mixed these two concepts.

    Originally posted by coder View Post
    IMO, huge pages on x86 are too big for normal usage. I think 64 kB would've been a good option.

    That said, for memory tiering, 4 kB could turn out to be a very nice size.
    For compression/decompression, which handles a lot of data, I think huge pages are reasonable, especially when using mmap.
    And doesn't Linux also have transparent huge pages?
    Though they only work for anonymous memory mappings and tmpfs/shmem, and depend on the Linux kernel to exploit them.



  • Anux
    replied
    Originally posted by NobodyXu View Post
    I am pretty sure it is adaptive.
    Last time I did a zstd compression, I could see that my memory was filled with filesystem caches using `free -hw`.



    While I haven't tried it, I am pretty sure that it would trigger read-ahead just like reads, though I am not sure whether it will set up the page mapping in the background for you after the page is read in.
    You seem to confuse read-ahead with caching. Read-ahead is done at the block level and is always a fixed value around 64 kB to 512 kB (changeable with hdparm). You read a 1-byte file, but read-ahead triggers a read of the whole 64 kB stored after that first byte. There is no way to not use read-ahead, but f/madvise are ways to change its size for specific file access patterns. Read-ahead will not fill up your RAM; it just reads a few kB more than you requested, because you will probably read them anyway if you're not doing random 4k access.
    Also, without m/fadvise, disk caching will fill up your free memory and evict any old caches given a big enough read, and it has nothing to do with read-ahead.



  • Linuxxx
    replied
    coder & arQon

    Both answers are interesting enough on a theoretical level, while here's a practical observation:

    The throughput vs. latency problem is solved on Linux in the most convenient way imaginable, namely by providing a boot-time configurable PREEMPT_DYNAMIC kernel parameter option (called so beginning with Linux 5.18 --> check with "uname -a").
    AFAIK, no other OS provides this kind of fundamental change to behavior in such an easy-to-change manner.

    When set to "preempt=full", it can provide a smooth web-browsing experience with Firefox even on an Intel Core 2 Duo E8400 [mitigations=auto] with 4GB DDR2-800 running off a fully encrypted HDD downloading pr0n in the background.

    I don't know about you guys, but in my book that makes Linux look pretty efficient & impressive...



  • arQon
    replied
    Originally posted by Linuxxx View Post
    Shouldn't the scheduling part of Linux be among the most optimized parts of the kernel by now given that the Top500 is relying on it exclusively?
    That's a bit optimistic at best, I think. The scheduler is known to perform poorly in at least desktop use; hybrid cores are very new in the x86 space despite now being well established on ARM; and consumer HW is significantly different to multi-socket servers running at half the clock rate with 4x-8x the cores. Schedulers invariably favor some types of workloads over others, and generally anything that benefits the "Top500" comes at the *expense* of normal use cases, not coincidentally improving those too.
    For example, a patch that improves Benchmark X by 20% on a NUMA system while reducing overall performance for an i7 by a "negligible" amount, say 0.8%, will invariably end up being merged because it is, *on average*, "better than" the existing behavior. Over time, those 0.8%s add up.

    Optimal behavior for any given scheduler requires tuning (or biasing, if you prefer) *for* the specific hardware and workloads involved. The "least bad, typically" set of defaults that your kernel comes with will never be ideal for you or anyone else, it will simply be exactly that: "on average, the least-bad option for an imagined common set of workloads on an imagined 'typical' system". Generally that's good enough, but the last complete scheduler rewrite is still recent enough that you should remember it, as well as the objections to it over how it (among other things) still falls short of actually performing "well" on the desktop most of the time. The breadth of Linux adoption makes this a *harder* problem, not an easier one, even without considering the significant additional complications of non-uniform core performance. (Which to a lesser extent is present even in "normal" cores these days, with Ryzen in particular nearly always having a "golden" core).



  • coder
    replied
    Originally posted by Linuxxx View Post
    Shouldn't the scheduling part of Linux be among the most optimized parts of the kernel by now
    There's not a single, optimal way to do thread scheduling (or I/O scheduling) for everyone. To a large extent it boils down to trading off latency vs. throughput. But that's why you have multiple scheduling options that you're free to choose from.

    Originally posted by Linuxxx View Post
    given that the Top500 is relying on it exclusively?
    Cloud and HPC users tend to prefer optimizing for throughput, while embedded/robotics users, Linux gamers, and people using it for audio workstations want to minimize latency.
    Last edited by coder; 10 July 2022, 05:29 PM.

