Windows 11 vs. Linux Performance For Intel Core i9 12900K In Mid-2022

  • Linuxxx
    Senior Member
    • Jul 2011
    • 1062

    #91
    Originally posted by arQon View Post

    Now you're going to make me look like the bad guy for having to point out the rather obvious problem with that, which I really hate to do. :/

    You have two workloads. One is latency sensitive, and one is throughput sensitive. Are you going to reboot each time you need to run one of them? Are you going to manipulate it via debugfs as an unprivileged user? What if you need their runs to be at least partially concurrent?

    I'm sorry, but there just isn't any "Option A is always the best choice" answer here, even *before* you get into heterogeneous core performance.
    It's great that you've found what works best for you, but if things were really that simple then there's been an awful lot of effort wasted on this topic by a lot of very clever people over a very long time. :P

    As far as "no other OS" goes, well, the answer to that is also something that you're probably not going to enjoy hearing, so let's leave that for another day.
    Simple, on an interactive desktop/workstation preempt=full should always be the default, unless you actually enjoy your computer jerking you around instead of the other way around.

    Also, you've got me curious there:
    Which other OS allows changing the kernel-level preemption model without recompiling?


    • NobodyXu
      Senior Member
      • Jun 2021
      • 815

      #92
      Originally posted by Linuxxx View Post

      Simple, on an interactive desktop/workstation preempt=full should always be the default, unless you actually enjoy your computer jerking you around instead of the other way around.

      Also, you've got me curious there:
      Which other OS allows changing the kernel-level preemption model without recompiling?
      I think Linux supports changing the preemption model via a kernel command-line argument.
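
For reference, a sketch of what that looks like on recent distro kernels (assumes a kernel built with CONFIG_PREEMPT_DYNAMIC and root access; the debugfs path is the upstream one and may not exist on older kernels):

```shell
# Boot-time selection via the kernel command line (e.g. in GRUB):
#   preempt=none | preempt=voluntary | preempt=full
#
# Runtime inspection/switch via debugfs (requires root); the currently
# active model is shown in parentheses:
cat /sys/kernel/debug/sched/preempt
echo full | sudo tee /sys/kernel/debug/sched/preempt
```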


      • coder
        Senior Member
        • Nov 2014
        • 8841

        #93
        Originally posted by NobodyXu View Post
        It seems that using `MAP_POPULATE` is actually a good idea for de/compression since it tries to populate the entire file if possible, which is definitely more efficient than faulting.
        It'd be interesting to know what this does with files approximately the same size as the machine's RAM or larger. In a pathological case of a program that can't use the data as fast as it's read, you could have the old blocks of the file getting evicted before they could be used, leading to nearly 2x the I/O.


        • tildearrow
          Senior Member
          • Nov 2016
          • 7096

          #94
          Originally posted by birdie View Post

          1. Absolute most people nowadays are on SSDs. 2. Microsoft has recently started to require SSDs for laptops sold with Windows 11. 3. For the second time now: people start their computers normally just once a day, barely anyone cares about boot speed.
           1. Solving software problems with hardware has always been the Windows mentality.
          Windows 10 from 2017 onward and APFS are two examples of this.

          2. That's a result of the aforementioned point.

          3. Not true in areas where power outages are common and laptops aren't.

          Originally posted by birdie View Post
          BUT YEAH LINUX IS SO MUCH BETTER THAN WINDOWS IN TERMS OF BOOT SPEED EXCEPT NO ONE HERE HAS TESTED IT.
          SO WHAT AM I? I HAVE TESTED EXACTLY THAT BUT YOU MERELY REFUSE TO BELIEVE.

          Originally posted by birdie View Post
          God damn it. I really really really hate when people try hard to prove that something is bad but they conveniently forget to provide the data that the opposite is actually good.
          Keep indirectly attacking us.


          • Linuxxx
            Senior Member
            • Jul 2011
            • 1062

            #95
            Originally posted by NobodyXu View Post

             I think Linux supports changing the preemption model via a kernel command-line argument.
            I know, but arQon alluded to the existence of yet another mysterious OS that is supposedly superior to Linux in this regard.


            • NobodyXu
              Senior Member
              • Jun 2021
              • 815

              #96
              Originally posted by coder View Post
              It'd be interesting to know what this does with files approximately the same size as the machine's RAM or larger. In a pathological case of a program that can't use the data as fast as it's read, you could have the old blocks of the file getting evicted before they could be used, leading to nearly 2x the I/O.
              The man page for MAP_POPULATE says that it will block the mmap syscall and make a best effort to populate the whole mapping, stopping if there isn't enough memory.

              So I guess this will at the very least do no harm, and it might reduce page faults and IO.


              • arQon
                Senior Member
                • Sep 2019
                • 940

                #97
                Originally posted by Linuxxx View Post
                I know, but arQon alluded to the existence of yet another mysterious OS that is supposedly superior to Linux in this regard.
                No I didn't, because that's not what your original statement was.

                What I was getting at is the priority boost that Windows gives to the focused window (ugh, that's an ugly sentence). It even has a GUI for it and it's changeable at runtime, which I think is fair to say beats your "most convenient way imaginable" claim of *editing the command line in grub* by several miles, to put it mildly.

                From a user perspective it results in the same "fundamental change to behavior" in that it gives you a responsive browser / etc while still torrenting pr0n in the background - and it did it back in the single-core era. That responsiveness is something that desktop Linux has struggled with since before some of the commenters here were born.

                Like I say, it's great that you've found something that works for you. But it's something that's not even a year old, and requires unreasonable arcana to achieve at all. What I'm trying to get at is that you should balance that enthusiasm with an understanding that there's more than one way to skin a cat, and that there are people in the world who are not only Not You but also want, need, and deserve to have computers that "just work". Not because they're stupid, but because their lives orbit around something other than IT. It's on us to make that happen, not them, and we should be doing a better job of it than we are.


                • coder
                  Senior Member
                  • Nov 2014
                  • 8841

                  #98
                  Originally posted by NobodyXu View Post
                  The man page of MAP_POPULATE says that it will block the mmap syscall
                  Okay, then it's definitely not a win for this use case, because you don't get to overlap that I/O with any computation, which we know will happen if mmap'd memory is subject to read-ahead. You'd only do it if you planned to do lots of random access to a file -- enough that the up-front cost of pre-loading it would tend to be much less than all the page faults you'd expect.


                  • NobodyXu
                    Senior Member
                    • Jun 2021
                    • 815

                    #99
                    Originally posted by coder View Post
                    Okay, then it's definitely not a win for this use case, because you don't get to overlap that I/O with any computation, which we know will happen if mmap'd memory is subject to read-ahead. You'd only do it if you planned to do lots of random access to a file -- enough that the up-front cost of pre-loading it would tend to be much less than all the page faults you'd expect.
                    I think this is still better than frequent faulting, which is very expensive and blocks the computation nonetheless.
                    It might also be better than simply reading the file into userspace if the file is large enough, since that would require a lot of copying between kernel space and user space.


                    • coder
                      Senior Member
                      • Nov 2014
                      • 8841

                       #100
                      Originally posted by NobodyXu View Post
                       I think this is still better than frequent faulting, which is very expensive and blocks the computation nonetheless.
                      That's what I said, right? But, we've established that linear access would benefit from read-ahead, the same as if you simply made read() calls. The main difference with read() calls is that you can do much larger reads than the default read-ahead, thereby likely causing it to come out ahead.

                      Originally posted by NobodyXu View Post
                       It might also be better than simply reading the file into userspace if the file is large enough, since that would require a lot of copying between kernel space and user space.
                       You're over-estimating the cost of copying from kernel -> userspace. A simple memcpy() tends to run at more than 1/3 of the system's memory bandwidth, assuming the data won't fit in the cache hierarchy. If it does fit, you can do even better. All told, the impact of copying from kernel -> userspace is probably a small minority of the time spent in a read() call.

                      Let's say you read 1 MiB of data from a NVMe drive. The syscall overhead is somewhere around 0.6 microseconds[1], the fastest PCIe 4.0 NVMe drives[2] would copy the data in 143 microseconds, the access latency is between 60 - 80 microseconds, and copying the data on a PC with dual-channel DDR4-3200 is 20-39 microseconds (since it fits in cache). Worst case would be more like 59 microseconds. However, the real kicker is that even if you eliminate the kernel -> userspace copy, you're still going to hit most of that 20-39 microseconds, because devices typically copy straight to memory and then whether it's the kernel -> userspace copy or just userspace accessing it directly, you're still going to have to fetch it into the cache hierarchy. Anyway, we're talking about 10% to 19% (29%, at the worst), though much of that you can't avoid even by cutting out the copy. And that's one of the fastest PCIe 4.0 NVMe SSDs. If we're talking about commodity SSDs or even hard drives, then the % of time copying would drop by a couple orders of magnitude.

                       The point is that for a system with a single SSD, kernel <-> userspace copies just aren't a big bottleneck. They're not immeasurable, but definitely not dominant. If we're talking about GPUs, then the story is a little different, but I think graphics APIs are already designed to minimize the amount of such copies. The main time you care about zero-copy is on many-core servers with lots of SSDs and high-bandwidth networking; memory bandwidth often is a significant bottleneck in those machines.

                      Sources:
                      1. A PTS benchmark I'm too lazy to look up.
                      2. https://www.anandtech.com/show/16505...0-ssd-review/3

