
GNU Coreutils 9.5 Can Yield 10~20% Throughput Boost For cp, mv & cat Commands


  • #11
    Code:
       With the results shown for these systems:
       system #1: 1.7GHz pentium-m with 400MHz DDR2 RAM, arch=i686
       system #2: 2.1GHz i3-2310M with 1333MHz DDR3 RAM, arch=x86_64
       system #3: 3.2GHz i7-970 with 1333MHz DDR3, arch=x86_64
       system #4: 2.20GHz Xeon E5-2660 with 1333MHz DDR3, arch=x86_64
       system #5: 2.30GHz i7-3615QM with 1600MHz DDR3, arch=x86_64
       system #6: 1.30GHz i5-4250U with 1-channel 1600MHz DDR3, arch=x86_64
       system #7: 3.55GHz IBM,8231-E2B with 1066MHz DDR3, POWER7 revision 2.1
       system #8: 2.60GHz i7-5600U with 1600MHz DDR3, arch=x86_64
       system #9: 3.80GHz IBM,02CY649 with 2666MHz DDR4, POWER9 revision 2.3
       system 10: 2.95GHz IBM,9043-MRX, POWER10 revision 2.0
       system 11: 3.23GHz Apple M1 with 2666MHz DDR4, arch=arm64
    
                    per-system transfer rate (GB/s)
       blksize   #1    #2    #3    #4    #5    #6    #7    #8    #9     10    11
       ------------------------------------------------------------------------
          1024  .73   1.7   2.6   .64   1.0   2.5   1.3    .9   1.2    2.5   2.0
          2048  1.3   3.0   4.4   1.2   2.0   4.4   2.5   1.7   2.3    4.9   3.8
          4096  2.4   5.1   6.5   2.3   3.7   7.4   4.8   3.1   4.6    9.6   6.9
          8192  3.5   7.3   8.5   4.0   6.0  10.4   9.2   5.6   9.1   18.4  12.3
         16384  3.9   9.4  10.1   6.3   8.3  13.3  16.8   8.6  17.3   33.6  19.8
         32768  5.2   9.9  11.1   8.1  10.7  13.2  28.0  11.4  32.2   59.2  27.0
         65536  5.3  11.2  12.0  10.6  12.8  16.1  41.4  14.9  56.9   95.4  34.1
        131072  5.5  11.8  12.3  12.1  14.0  16.7  54.8  17.1  86.5  125.0  38.2
     -> 262144  5.7  11.6  12.5  12.3  14.7  16.4  40.0  18.0 113.0  148.0  41.3 <-
        524288  5.7  11.4  12.5  12.1  14.7  15.5  34.5  18.0 104.0  153.0  43.1
       1048576  5.8  11.4  12.6  12.2  14.9  15.7  36.5  18.2  87.9  114.0  44.8



  • #12
    OK, so this is like a "default" which also happens to be the minimum. And there is no automatic detection; the user is expected to specify. In practice the change makes no difference except to help the noobest of noobs who use dd without specifying bs.



  • #13
    Originally posted by sophisticles View Post

    Then how do you explain a security vulnerability that has existed for 34 years?

    But I am the biased, ignorant troll.

    Yes.
    One doesn't even need to be a developer to know that for as long as a piece of code exists, there is a chance of finding a bug in it.

    It happens at least as often in proprietary software.
    Most of the time we either don't know about it, or the software is just no longer supported anyway.

    Just an example in Windows, but this is very common:
    Atherton Research's Principal Analyst and Futurist Jeb Su weighs in on the discovery of several critical security flaws that have existed in all versions of Microsoft's operating system for the past 20 years, since Windows XP, and which were made public today by Google's Project Zero elite security team.



  • #14
    Originally posted by varikonniemi View Post

    If increasing the minimum gives better performance, why has it not used higher values previously, and the minimum only when needed?

    Also, does that chmod issue mean that any user has been able to change and execute any file since the beginning of time?

    The code uses st_blksize, aka the "preferred I/O block size" as reported by the filesystem the file is on, but with a lower limit (which was 128k). The main issue is that most filesystems still report a very low block size (usually just 4k).
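
    For the curious, here is a minimal sketch of that logic: query st_blksize via fstat() and clamp it to a floor, similar in spirit to what coreutils does in its ioblksize.h (the helper name and constant below are illustrative, not the actual coreutils source; 256 KiB is the new floor discussed here, i.e. the arrowed 262144 row in the table above).

    Code:
    /* sketch: pick an I/O buffer size from the filesystem's hint,
       but never go below a floor */
    #include <stdio.h>
    #include <sys/stat.h>

    #define IO_BUFSIZE_MIN (256 * 1024)   /* illustrative floor (256 KiB) */

    static size_t pick_blksize(int fd)
    {
        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_blksize <= 0)
            return IO_BUFSIZE_MIN;               /* no usable hint: use the floor */
        size_t hint = (size_t) st.st_blksize;    /* the FS "preferred" size, often just 4k */
        return hint < IO_BUFSIZE_MIN ? IO_BUFSIZE_MIN : hint;
    }

    int main(void)
    {
        printf("stdin blksize: %zu\n", pick_blksize(0));
        return 0;
    }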



  • #15
    Does cp use sendfile() calls yet? Last time I checked, I could gain quite a speed advantage over conventional read()/write()-based copying when using sendfile() from/to fast NVMe storage.
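
    For reference, a minimal sketch of the sendfile()-based copy loop the poster is describing; error handling is trimmed, and a real cp also has to deal with sparse files, metadata, etc. (As far as I know, recent GNU cp prefers copy_file_range()/reflinks where it can, but sendfile() shows the same in-kernel idea.)

    Code:
    /* sketch: copy one file to another with sendfile(2), which moves data
       in-kernel instead of bouncing it through a userspace buffer the way
       read()/write() do */
    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int copy_sendfile(const char *src, const char *dst)
    {
        int in = open(src, O_RDONLY);
        if (in < 0) return -1;
        struct stat st;
        if (fstat(in, &st) != 0) { close(in); return -1; }
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
        if (out < 0) { close(in); return -1; }

        off_t left = st.st_size;
        while (left > 0) {
            ssize_t n = sendfile(out, in, NULL, left);  /* kernel advances the offset */
            if (n <= 0) break;                          /* error or unexpected EOF */
            left -= n;
        }
        close(in);
        close(out);
        return left == 0 ? 0 : -1;
    }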



  • #16
    Originally posted by sophisticles View Post

    Then how do you explain a security vulnerability that has existed for 34 years?

    But I am the biased, ignorant troll.

    LOL!
    Ah, my dear skeptic, let me elucidate: I bleed Windows blue, dream in PowerPoint slides, and my heart beats in binary.



  • #17
    Originally posted by Akiko View Post

    As a kernel dev I can give you a hint:
    1. Cache sizes: trying to stay below the size of the first- and second-level caches plays nicer with the hardware prefetchers and usually gives better performance. (Too big a buffer may result in cache thrashing.)
    2. Different platforms: you try to go for values which work well on most platforms, especially on embedded devices.
    3. Page sizes (goes hand in hand with 1 and 2): there is a direct connection between page sizes, I/O and memory allocation. 4k pages were quite common for a while, but there are also 16k, 64k, 2m and 1g page sizes. Though I must admit it depends highly on the implementation details, which leads to the 4th issue, the most important one, which also depends on 1, 2 and 3.
    4. I/O sizes and schedulers: the older, smaller values worked very well with the sector sizes of hard disks (512b and 4k), but now flash memory is a thing, buffered by varying amounts of local DRAM. Flash is also organized in sectors: 64k was quite common and shifted to 128k sectors with bigger NAND/NOR flashes (USB sticks), and today's NVMe drives seem to have gone with 256k sectors. On Linux you can actually test this with some simple methods like "dd bs=128k/256k ..." and you may see a sweet spot. The Linux schedulers always try to collect several pages before working on them, which on weak hardware can result in I/O stalls. If you want to see it yourself, linux/mm/page_alloc.c (grep for zone_managed_pages) contains some of these critical paths, which can lead to some ugly stalls on systems with a low core count or just a weak CPU, or, especially worse, a single core and very fast storage like NVMe.

    If you wonder why such simple file-copying tools can be such trouble: well, they are just nice wrappers around syscalls, and hitting syscalls at a high frequency is just another problem area.

    Nice explanation. So the size should be a multiple of 64k to cover 4/16/64, and not larger than 512k to fit in a "modern" CPU's L2 cache, as sketched below.
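
    A tiny sketch of that rule of thumb (the constants are just the numbers from the post above, not anything coreutils actually ships):

    Code:
    /* sketch: round a buffer size up to a multiple of 64k (covers 4k/16k/64k
       pages and common flash sectors) and cap it at 512k (stay L2-friendly) */
    #include <stdio.h>

    #define ALIGN_B  (64UL * 1024)
    #define CAP_B    (512UL * 1024)

    static unsigned long tune_bufsize(unsigned long hint)
    {
        unsigned long sz = (hint + ALIGN_B - 1) / ALIGN_B * ALIGN_B;  /* round up */
        if (sz < ALIGN_B) sz = ALIGN_B;
        if (sz > CAP_B)   sz = CAP_B;                                 /* cap */
        return sz;
    }

    int main(void)
    {
        printf("%lu\n", tune_bufsize(4096));     /* -> 65536  */
        printf("%lu\n", tune_bufsize(262144));   /* -> 262144 */
        printf("%lu\n", tune_bufsize(1048576));  /* -> 524288 */
        return 0;
    }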



  • #18
    Originally posted by yoshi314 View Post

    I am not exactly sure why 32-bit systems did not suffer from this issue, or at least not to the extent of 64-bit systems.

    That's easy. On 32-bit systems vm.dirty_bytes is limited to 200mb and vm.dirty_background_bytes is limited to 100mb. In my experience it's not the best, but it's way better than the insane amounts of RAM used by default on 64-bit systems.
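
    If you want to check these knobs on your own box, they live under /proc/sys/vm; a quick sketch (the 200m/100m figures above are the post's numbers, not a recommendation):

    Code:
    /* sketch: read the writeback knobs discussed above from procfs;
       a value of 0 means the ratio-based knobs (vm.dirty_ratio and
       vm.dirty_background_ratio) are in effect instead */
    #include <stdio.h>

    static void show(const char *path)
    {
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return; }
        long v;
        if (fscanf(f, "%ld", &v) == 1)
            printf("%s = %ld\n", path, v);
        fclose(f);
    }

    int main(void)
    {
        show("/proc/sys/vm/dirty_bytes");
        show("/proc/sys/vm/dirty_background_bytes");
        return 0;
    }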



  • #19
    Originally posted by thulle View Post

    Code:
    -> 262144 5.7 11.6 12.5 12.3 14.7 16.4 40.0 18.0 113.0 148.0 41.3 <-

    ok what is the best blksize ???
    262144 ???



  • #20
    Originally posted by varikonniemi View Post

    Also, does that chmod issue mean that any user has been able to change and execute any file since the beginning of time?

    No, not at all. To make use of this bug you need a very narrow scenario:
    * There is a bunch of files that you already have write access to.
    * A user with higher privileges (e.g. root) uses chmod -R to change permissions on these files.
    * This change of permissions would need to result in you gaining access (e.g. o+r).

    Then you have to be fast. While chmod is running, you need to replace one of the target files with a symlink. Chmod would then change the permissions on the linked file. But it would still only change the permissions to what was requested by that other user, and the other filters still apply. If, e.g., they were running chmod -R u+w and you redirect this call to a file owned by root, you still won't have write access - root does. So you'd need to redirect it to at least a file where you qualify for group permissions and the chmod would have to change those, or the chmod would have to grant world permissions.

    This probably explains why the bug stayed hidden for so long: it's not exactly a huge deal. While it can lead to elevated privileges, you need a lot of help from the root user in addition to good timing.
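
    For the curious, the usual way to close this class of race is to stop operating on the path twice: open the file once with O_NOFOLLOW, then chmod the open descriptor, so a symlink swapped in mid-walk either fails the open or simply doesn't matter. A minimal sketch of the general pattern (not the actual coreutils fix):

    Code:
    /* sketch: race-resistant chmod of one path. open(O_NOFOLLOW) refuses to
       follow a symlink in the final component (it fails with ELOOP), and
       fchmod() then acts on the inode we actually opened, so a later path
       swap cannot redirect it. Caveat: O_RDONLY needs read permission. */
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static int chmod_norace(const char *path, mode_t mode)
    {
        int fd = open(path, O_RDONLY | O_NOFOLLOW);
        if (fd < 0)
            return -1;      /* ELOOP here means someone planted a symlink */
        int r = fchmod(fd, mode);
        close(fd);
        return r;
    }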

