Windows 11 vs. Linux Performance For Intel Core i9 12900K In Mid-2022


  • NobodyXu
    replied
    coder Thanks for the detailed explanation.


  • coder
    replied
    Originally posted by NobodyXu View Post
    I think this is still better than frequent faulting, which is very expensive and blocks the computation nonetheless.
    That's what I said, right? But we've established that linear access would benefit from read-ahead, the same as if you simply made read() calls. The main difference with read() calls is that you can do much larger reads than the default read-ahead, which likely makes them come out ahead.
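    To make that concrete, here's a minimal sketch (mine, not from the post) of the plain read() approach, assuming a 1 MiB buffer so each syscall transfers far more than the default read-ahead window:

    /* Sequential read() with a large user buffer; error handling trimmed. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK (1 << 20)  /* 1 MiB per read(); an assumption, tune to taste */

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(CHUNK);
        ssize_t n;
        unsigned long long total = 0;

        while ((n = read(fd, buf, CHUNK)) > 0)
            total += n;          /* consume/process the chunk here */

        printf("read %llu bytes\n", total);
        free(buf);
        close(fd);
        return 0;
    }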

    Originally posted by NobodyXu View Post
    It might also be better than simply reading the file into userspace if it is large enough, since that would require a lot of copying between kernel space and user space.
    You're overestimating the cost of copying from kernel -> userspace. A simple memcpy() tends to run at more than 1/3 of the system's memory bandwidth, assuming the data won't fit in the cache hierarchy. If it does, then you can do even better. All told, the copy from kernel -> userspace is probably a small minority of the time spent in a read() call.

    Let's say you read 1 MiB of data from an NVMe drive. The syscall overhead is somewhere around 0.6 microseconds[1], the fastest PCIe 4.0 NVMe drives[2] would transfer the data in 143 microseconds, the access latency is between 60 and 80 microseconds, and copying the data on a PC with dual-channel DDR4-3200 takes 20-39 microseconds (since it fits in cache); the worst case would be more like 59 microseconds.

    However, the real kicker is that even if you eliminate the kernel -> userspace copy, you're still going to pay most of that 20-39 microseconds, because devices typically copy straight to memory, and whether it's the kernel -> userspace copy or just userspace accessing the data directly, you still have to fetch it into the cache hierarchy. Anyway, the copy works out to roughly 10% to 19% of the total (29% at the worst), and much of that you can't avoid even by cutting the copy out. And that's one of the fastest PCIe 4.0 NVMe SSDs. If we're talking about commodity SSDs or even hard drives, then the percentage of time spent copying drops by a couple of orders of magnitude.

    The point is that for a system with a single SSD, kernel <-> userspace copies just aren't a big bottleneck. They're not immeasurable, but they're definitely not dominant. If we're talking about GPUs, then the story is a little different, but I think graphics APIs are already designed to minimize the number of such copies. The main time you care about zero-copy is on many-core servers with lots of SSDs and high-bandwidth networking; memory bandwidth is often a significant bottleneck in those machines.

    Sources:
    1. A PTS benchmark I'm too lazy to look up.
    2. https://www.anandtech.com/show/16505...0-ssd-review/3


  • NobodyXu
    replied
    Originally posted by coder View Post
    Okay, then it's definitely not a win for this use case, because you don't get to overlap that I/O with any computation, which we know will happen if mmap'd memory is subject to read-ahead. You'd only do it if you planned to do lots of random access to a file -- enough that the up-front cost of pre-loading it would tend to be much less than all the page faults you'd expect.
    I think this is still better than frequent faulting, which is very expensive and blocks the computation nonetheless.
    It might also be better than simply reading the file into userspace if it is large enough, since that would require a lot of copying between kernel space and user space.


  • coder
    replied
    Originally posted by NobodyXu View Post
    The man page of MAP_POPULATE says that it will block the mmap syscall
    Okay, then it's definitely not a win for this use case, because you don't get to overlap that I/O with any computation, which we know will happen if mmap'd memory is subject to read-ahead. You'd only do it if you planned to do lots of random access to a file -- enough that the up-front cost of pre-loading it would tend to be much less than all the page faults you'd expect.
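    As an aside (my sketch, not something proposed in the thread): if the goal is to overlap the I/O with computation, one alternative is to map the file without MAP_POPULATE and hint the kernel with madvise() instead, so read-ahead can proceed while the first pages are already being processed. MADV_SEQUENTIAL and MADV_WILLNEED are only hints; their exact behavior here is an assumption based on the madvise(2) man page, not on any measurements from this thread.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hints only: MADV_SEQUENTIAL widens the read-ahead window, and
         * MADV_WILLNEED kicks off read-ahead now; neither waits for the
         * whole file the way MAP_POPULATE does (assumption per madvise(2)). */
        madvise(p, st.st_size, MADV_SEQUENTIAL);
        madvise(p, st.st_size, MADV_WILLNEED);

        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += (unsigned char)p[i];   /* stand-in for real computation */

        printf("checksum: %lu\n", sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }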


  • arQon
    replied
    Originally posted by Linuxxx View Post
    I know, but arQon alluded to the existence of yet another mysterious OS that is supposedly superior to Linux in this regard.
    No, I didn't, because that's not what your original statement was.

    What I was getting at is the priority boost that Windows gives to the focused window. It even has a GUI for it and is changeable at runtime, which, I think it's fair to say, beats your "most convenient way imaginable" claim of *editing the command line in grub* by several miles, to put it mildly.

    From a user perspective it results in the same "fundamental change to behavior", in that it gives you a responsive browser (etc.) while still torrenting pr0n in the background - and it did so back in the single-core era. That responsiveness is something desktop Linux has struggled with since before some of the commenters here were born.

    Like I say, it's great that you've found something that works for you. But it's something that's not even a year old, and requires unreasonable arcana to achieve at all. What I'm trying to get at is that you should balance that enthusiasm with an understanding that there's more than one way to skin a cat, and that there are people in the world who are not only Not You but also want, need, and deserve to have computers that "just work". Not because they're stupid, but because their lives orbit around something other than IT. It's on us to make that happen, not them, and we should be doing a better job of it than we are.


  • NobodyXu
    replied
    Originally posted by coder View Post
    It'd be interesting to know what this does with files approximately the same size as the machine's RAM or larger. In a pathological case of a program that can't use the data as fast as it's read, you could have the old blocks of the file getting evicted before they could be used, leading to nearly 2x the I/O.
    The man page for MAP_POPULATE says that it will block the mmap syscall and try its best to load the whole mapping, until there isn't enough memory.

    So I guess this will at the very least do no harm, and it might reduce page faults and IO.
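    For reference, a minimal sketch of the mapping being discussed: MAP_POPULATE asks the kernel to fault the file in up front, so the mmap() call itself does the I/O and later accesses shouldn't page-fault (the file name and processing step are placeholders of mine).

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Blocks until the kernel has tried to populate the whole mapping. */
        char *p = mmap(NULL, st.st_size, PROT_READ,
                       MAP_PRIVATE | MAP_POPULATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* ... de/compress or otherwise process p[0 .. st.st_size) ... */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }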


  • Linuxxx
    replied
    Originally posted by NobodyXu View Post

    I think Linux supports changing the preemption model using command-line arguments.
    I know, but arQon alluded to the existence of yet another mysterious OS that is supposedly superior to Linux in this regard.


  • tildearrow
    replied
    Originally posted by birdie View Post

    1. The vast majority of people nowadays are on SSDs. 2. Microsoft has recently started to require SSDs for laptops sold with Windows 11. 3. For the second time now: people start their computers normally just once a day; barely anyone cares about boot speed.
    1. Solving software problems with hardware has always been the Windows mentality.
    Windows 10 from 2017 onward and APFS are two examples of this.

    2. That's a result of the aforementioned point.

    3. Not true in areas where power outages are common and laptops aren't.

    Originally posted by birdie View Post
    BUT YEAH LINUX IS SO MUCH BETTER THAN WINDOWS IN TERMS OF BOOT SPEED EXCEPT NO ONE HERE HAS TESTED IT.
    SO WHAT AM I? I HAVE TESTED EXACTLY THAT BUT YOU MERELY REFUSE TO BELIEVE.

    Originally posted by birdie View Post
    God damn it. I really really really hate when people try hard to prove that something is bad but they conveniently forget to provide the data that the opposite is actually good.
    Keep indirectly attacking us.


  • coder
    replied
    Originally posted by NobodyXu View Post
    It seems that using `MAP_POPULATE` is actually a good idea for de/compression since it tries to populate the entire file if possible, which is definitely more efficient than faulting.
    It'd be interesting to know what this does with files approximately the same size as the machine's RAM or larger. In a pathological case of a program that can't use the data as fast as it's read, you could have the old blocks of the file getting evicted before they could be used, leading to nearly 2x the I/O.


  • NobodyXu
    replied
    Originally posted by Linuxxx View Post

    Simple: on an interactive desktop/workstation, preempt=full should always be the default, unless you actually enjoy your computer jerking you around instead of the other way around.

    Also, you've got me curious there:
    Which other OS allows changing the kernel-level preemption model without recompiling?
    I think Linux supports changing the preemption model using command-line arguments.
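    For reference, a sketch of how that looks on a typical GRUB setup (file location, existing flags, and the regeneration command all vary by distro, and the preempt= parameter assumes a kernel built with CONFIG_PREEMPT_DYNAMIC):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash preempt=full"

    # then regenerate the config, e.g. on Debian/Ubuntu:
    #   sudo update-grub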
