GNU Coreutils 9.5 Can Yield 10~20% Throughput Boost For cp, mv & cat Commands


  • #21
    Wonder how it will compare to uutils-coreutils

    • #22
      Originally posted by sophisticles View Post
      I have been saying for decades that the supposed superiority of open source software is a myth, a scam, promoted by people with ulterior motives and believed by people that lack the technical knowledge to analyze the claims properly.
      Then you have people like me, that know the truth and use it sometimes primarily to avoid having to pay per seat licensing fees associated with properly coded software.
      Your life motto: Ignorance is strength.
      Phantom circuit Sequence Reducer Dyslexia

      • #23
        Originally posted by Akiko View Post

        As a kernel dev I can give you a hint:
        1. cache sizes: Trying to stay below the size of the first- and second-level cache yields better results from the hw prefetchers and usually better performance. (Too big a buffer may result in cache thrashing.)
        2. different platforms: You try to go for values which work well on most platforms, especially in embedded devices.
        3. page sizes (goes hand in hand with 1 and 2): there is a direct connection between page sizes, I/O and memory allocation. 4k pages were quite common for a while, but there are also 16k, 64k, 2m and 1g page sizes. Though, I must admit it depends highly on the implementation details, which leads to the 4th issue, the most important one, which also depends on 1, 2 and 3.
        4. I/O sizes and schedulers: the older, smaller values worked very well with the sector sizes of hard disks (512b and 4k), but now flash memory is a thing, buffered by varying amounts of local DRAM. Flash is also organized in sectors: 64k was quite common and shifted to 128k sectors with bigger NAND/NOR flashes (USB sticks), and today's NVMe drives seem to go with 256k sectors. On Linux you can actually test this with some simple methods like "dd bs=128k/256k ..." and you may see a sweet spot. The Linux schedulers always try to collect several pages before working on them; on weak hardware that can result in I/O stalls. If you want to see it yourself: in linux/mm/page_alloc.c (grep for zone_managed_pages) are some of these critical paths which can lead to ugly stalls on systems with a low core count or just a weak CPU, or, especially worse, a single core and very fast storage like NVMe.
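        The "dd bs=... and look for a sweet spot" test from point 4 is easy to script. A minimal sketch — it copies /dev/zero to /dev/null, so it only measures the syscall and buffering overhead; point `if=` at a real file on the device you care about for meaningful numbers:

```shell
# Copy the same 256 MiB at several block sizes and let dd report the
# throughput; the bs where the rate stops improving is your sweet spot.
total=$((256 * 1024 * 1024))
for bs in 4096 65536 131072 262144 524288 1048576; do
    printf 'bs=%-8d ' "$bs"
    # dd prints its statistics on stderr; keep only the "bytes copied" line
    dd if=/dev/zero of=/dev/null bs="$bs" count=$((total / bs)) 2>&1 | tail -n 1
done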

        If you wonder why such simple tools for copying files can be such trouble: well, they are just thin wrappers around syscalls, and hitting syscalls at high frequency is a problem topic of its own.
        The real question here is not why a certain size is preferable to another, it is why this is not a per-storage-device or per-system setting instead. Strictly speaking, there should be a /sys/block/<device> or /proc/<something> entry where this could be defined.
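        For what it's worth, the kernel already exports some per-device I/O geometry under /sys/block/<device>/queue — just nothing that cp and friends consult for their buffer size. A quick look at what a Linux system exposes today (values vary per device; optimal_io_size is 0 when the device reports no preference):

```shell
# List the per-device I/O hints the kernel already publishes; a
# hypothetical per-device copy-buffer setting could live alongside these.
for q in /sys/block/*/queue; do
    [ -d "$q" ] || continue            # skip if no block devices are visible
    dev=${q#/sys/block/}; dev=${dev%/queue}
    printf '%s: logical=%sB physical=%sB optimal_io_size=%sB read_ahead=%skB\n' \
        "$dev" \
        "$(cat "$q/logical_block_size")" \
        "$(cat "$q/physical_block_size")" \
        "$(cat "$q/optimal_io_size")" \
        "$(cat "$q/read_ahead_kb")"
done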

        • #24
          Originally posted by Phoronos View Post

          ok what is the best blksize ???
          262144 ???
          As shown in the table, it depends on your system. Systems differ in where peak performance lies and where it starts to decline. Since this is not a runtime-tunable setting, they're choosing a fairly conservative default, whereas if you have something like system #10 or #11 - or better - you might want to set it to 512k - or more - for maximum performance.
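          Since the buffer size is baked into cp at build time, the only user-level workaround today is a tool where the I/O size is a flag. A sketch using GNU dd in place of cp — the 512k figure echoes the suggestion above, not a universal recommendation, and the paths are just scratch files for illustration:

```shell
# Copy with an explicit 512 KiB I/O size; conv=fdatasync flushes to disk
# before dd reports its timing, so the throughput number is honest.
src=/tmp/demo-src.bin
dst=/tmp/demo-dst.bin
dd if=/dev/zero of="$src" bs=1M count=16 status=none   # make a scratch input
dd if="$src" of="$dst" bs=512k conv=fdatasync
cmp -s "$src" "$dst" && echo "copy verified"
rm -f "$src" "$dst"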

          • #25
            Originally posted by thulle View Post
            As shown in the table, it depends on your system. Systems differ in where peak performance lies and where it starts to decline. Since this is not a runtime-tunable setting, they're choosing a fairly conservative default, whereas if you have something like system #10 or #11 - or better - you might want to set it to 512k - or more - for maximum performance.
            I meant for most systems in 2024

            • #26
              Originally posted by sophisticles View Post
              Then you have people like me, that know the truth and use it sometimes primarily to avoid having to pay per seat licensing fees associated with properly coded software.
              Properly coded? And you are the one that "knows the truth"? Evidently you have zero experience with enterprise closed-source software, or even worse, closed-source device drivers... in the x86 world they are crap; in the ARM world they are utter shit.
