Microsoft Has More SMB3/CIFS Enhancements For Linux 5.16, Including For Performance

  • #31
    Originally posted by DrYak View Post
    Yup. I am wondering how much the userland-accessible zero-copy previous work (such as the DMA-BUF used in GPUs) could eventually be leveraged for zero-copy userland filesystem daemons.
    It's already been done, many many years ago. That's pretty much the whole *point* of sendfile, so that you can keep the daemon in userspace but still avoid constantly round-tripping data into and out of the kernel. When the network IF has TCP offload and is already DMAing packets into a buffer, you can get that buffer onto disk without ever pulling it into userspace by having the drive DMA it out. (And vice versa, obviously).
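A minimal sketch of the userspace side of this, assuming Python's standard `socket.sendfile()` wrapper (which uses the `sendfile(2)` syscall where the platform offers it and otherwise falls back to a plain read/send loop). The names `serve_file` and `demo` are hypothetical and only illustrate the idea; a local socket pair stands in for a real network connection.

```python
import socket
import tempfile


def serve_file(conn, path):
    """Send the whole file at `path` over `conn` and return the bytes sent.

    The file contents are handed to the kernel by descriptor, so on platforms
    with sendfile(2) they are never copied into a userspace buffer here.
    """
    with open(path, "rb") as f:
        return conn.sendfile(f)


def demo(payload=b"hello from the page cache\n" * 100):
    """Hypothetical demo: push a temp file through a local socket pair."""
    sender, receiver = socket.socketpair()
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(payload)
        tmp.flush()
        sent = serve_file(sender, tmp.name)
    sender.close()  # signals end-of-stream to the receiver

    received = b""
    while True:
        chunk = receiver.recv(65536)
        if not chunk:
            break
        received += chunk
    receiver.close()
    return sent, received


if __name__ == "__main__":
    sent, received = demo()
    print(sent, len(received))
```

The payload is kept small so that it fits in the socket buffer and the single-threaded demo does not block; a real server would have a peer draining the connection.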

    Comment


    • #32
      Originally posted by arQon View Post

      It's already been done, many many years ago. That's pretty much the whole *point* of sendfile, so that you can keep the daemon in userspace but still avoid constantly round-tripping data into and out of the kernel. When the network IF has TCP offload and is already DMAing packets into a buffer, you can get that buffer onto disk without ever pulling it into userspace by having the drive DMA it out. (And vice versa, obviously).
      However, sendfile still requires the kernel and the CPU to issue the orders.

      In order to fully offload it onto the network cards, some other low-level technologies need to be used.

      Not to mention that filesystems like Btrfs have compression built in, and web servers usually do more than just send files.

      Comment


      • #33
        Originally posted by arQon View Post
        It's already been done, many many years ago. That's pretty much the whole *point* of sendfile, so that you can keep the daemon in userspace but still avoid constantly round-tripping data into and out of the kernel.
        Wait, what... (reads quickly about sendfile, and finds out about receivefile, too) OMG, I've completely missed this past development!

        (And apparently lighttpd and nginx are capable of the same achievement too)
        Last edited by DrYak; 15 November 2021, 05:29 AM.

        Comment


        • #34
          Originally posted by DrYak View Post

          Wait, what... (reads quickly about sendfile, and finds out about receivefile, too) OMG, I've completely missed this past development!

          (And apparently lighttpd and nginx are capable of the same achievement too)
          I don't think Linux has a recvfile syscall.

          Comment


          • #35
            Originally posted by NobodyXu View Post

            I don’t think linux has a recvfile syscall.
            Can't find a manpage. Samba (and UDT, a userspace UDP-based protocol for file transfer that seems pretty abandoned but at the same time very cool) has one, but at a higher level. Which makes sense: there's no way intrinsic to the network protocol to know that what it's receiving from the remote is a file. Besides, even if there was, because sendfile takes two file descriptors it should be usable as a "recvfile" as well, just by reversing the arguments (in terms of API, I mean). It isn't, though: the input file descriptor needs to support mmap, which sockets obviously can't.
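As a rough illustration of what a "recvfile" at a higher level amounts to, here is a sketch of a userspace helper that drains a socket into a file through one reused buffer. The name `recv_to_file` and its parameters are hypothetical, and this is not Samba's or UDT's implementation; unlike sendfile, the data does pass through userspace once.

```python
import io
import socket


def recv_to_file(sock, fileobj, count, bufsize=64 * 1024):
    """Hypothetical helper: copy up to `count` bytes from `sock` to `fileobj`.

    Returns the number of bytes actually copied, which is smaller than
    `count` if the peer closes the connection early.
    """
    buf = bytearray(bufsize)
    view = memoryview(buf)  # lets us slice the buffer without copying it
    remaining = count
    while remaining > 0:
        n = sock.recv_into(view[:min(bufsize, remaining)])
        if n == 0:  # peer closed the connection
            break
        fileobj.write(view[:n])
        remaining -= n
    return count - remaining


def demo_recv():
    """Hypothetical demo: receive 4000 of 5000 bytes sent over a socket pair."""
    sender, receiver = socket.socketpair()
    sender.sendall(b"x" * 5000)
    sender.close()
    out = io.BytesIO()
    copied = recv_to_file(receiver, out, 4000)
    receiver.close()
    return copied, out.getvalue()


if __name__ == "__main__":
    copied, data = demo_recv()
    print(copied, len(data))
```

The caller has to supply `count` itself, which matches the point above: the byte stream alone does not say where a "file" ends, so that knowledge has to come from the application protocol.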

            Comment


            • #36
              Originally posted by sinepgib View Post

              Which kind of embedded are we talking about? Raspberry Pi level or 68k level? Does it make sense to have an SMB server for the latter?
              If the embedded device has decent usb/sata ports, why not run a fileserver on a device that's already on 24/7?

              The main reason we (OpenWRT) adopted ksmbd so early was that, starting with samba-4.x, it was impossible to backport all the samba 2/3.x compile tricks/hacks to get a small static binary. The main culprit was the samba team's switch from a basic makefile to WAF, while also no longer providing the means to build a minimal fileserver without all the extra features. That's why samba4 ends up being a 15-40 MB uncompressed package, while ksmbd is around 900 KB in total, with a statically linked glib.

              So samba4 would not fit on a lot of devices with 8/16 MB nvram, depending on what else was installed. So we were looking for a smaller alternative and found ksmbd. To be honest, if the samba team had provided the means to build a simple, small static fileserver, we would never have looked for something else or tried to adopt ksmbd so early on.

              In the end it worked out OK, and we now offer both samba4 and ksmbd in OpenWRT.
              Last edited by andy22; 15 November 2021, 11:01 AM.

              Comment


              • #37
                Originally posted by andy22 View Post
                If the embedded device has decent usb/sata ports, why not run a fileserver on a device that's already on 24/7?
                It makes sense. I simply did not think of routers to be honest.

                Originally posted by andy22 View Post
                The main reason we (OpenWRT) adopted ksmbd so early, was that starting with samba-4.x it was impossible to backport all the samba 2/3.x compile tricks/hacks to get a small static binary. The main culprit was the switch from a basic makefile to WAF by the samba-team, while also not providing the means to build a minimal fileserver anymore, without all the extra features.
                So it was more about the userspace implementation breaking the ability to make small binaries rather than actually needing to reside in the kernel for performance reasons, right? If that were fixed, would the io_uring implementation be viable for this?

                Originally posted by andy22 View Post
                That's why samba4 ends up being a 15-40 MB uncompressed package, while ksmbd is around 900 KB in total, with a statically linked glib.
                When you mention glib here, that is for the rest of the system, right? I think the kernel doesn't depend on any external libc.


                Anyway, thanks for your work on OpenWRT! I can't use it because I only use my ISP's router, by contract I can't mod it, but my brother does and loves it.

                Comment


                • #38
                  Originally posted by sinepgib View Post
                  So it was more about the userspace implementation breaking the ability to make small binaries rather than actually needing to reside in the kernel for performance reasons, right? If that were fixed, would the io_uring implementation be viable for this?
                  We did add io_uring for the 20.x/master builds, and we also build/run samba4 with the uring vfs module. Yet you have to keep in mind that io_uring does nearly nothing for a simple single-file read/write request, which is what most OpenWRT users use SMB for. The main use case is as a simple media server or simple LAN/cloud storage, with a single user or a few users. So there is not much that io_uring can do for such simple use cases.

                  The main issues are usually some weird smb.conf settings or bad USB/SATA hardware ports, and lastly users trying to use NTFS instead of a native Linux filesystem, which hurts samba performance the most.

                  This means that in our tests there was nearly no performance gain from enabling io_uring, which is to be expected, since the 10x or 400% performance numbers come from extreme numbers of threads and deeply queued read/write requests. That is a purely professional file server use case, with many users working simultaneously on the same SMB server or some big database workload.

                  When you mention glib here, that is for the rest of the system, right? I think the kernel doesn't depend on any external libc.
                  The ksmbd userspace part needs glibc, which is quite big compared to our default musl lib. So we took the time to specifically build ksmbd against a static glibc, so that we don't have to ship the glibc lib, which would otherwise add ~1 MB just for ksmbd.

                  PS: Just as a baseline, most ARM-based routers are capable of maximum sustained LAN reads/writes with either samba4 or ksmbd without maxing out any of the CPU cores. For example, a Linksys WRT-1200AC can do 115 MB/s reads/writes without any issues at around 50% core usage. So which SMB server to pick comes down to nvram space and whether you need some of the other samba vfs modules, such as shadowcopy or timemachine.
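For context on the 115 MB/s figure, a back-of-the-envelope estimate of the TCP payload ceiling on gigabit Ethernet is sketched below. It assumes a 1 Gbit/s LAN port, a standard 1500-byte MTU, IPv4 and TCP with the timestamp option enabled, and it ignores SMB protocol overhead; none of these are stated in the post, and the function name is hypothetical.

```python
LINK_BITS_PER_S = 1_000_000_000  # assumed gigabit LAN port
MTU = 1500                       # assumed standard Ethernet MTU, in bytes
# Per-frame Ethernet cost on the wire: header, FCS, preamble, inter-frame gap.
ETH_OVERHEAD = 14 + 4 + 8 + 12
# Headers inside the MTU: IPv4, TCP, TCP timestamp option (with padding).
IP_TCP_HEADERS = 20 + 20 + 12


def max_tcp_payload_mb_per_s(link_bits_per_s=LINK_BITS_PER_S, mtu=MTU):
    """Upper bound on TCP payload throughput in MB/s (1 MB = 10**6 bytes)."""
    bytes_on_wire = mtu + ETH_OVERHEAD   # what one full frame occupies
    payload = mtu - IP_TCP_HEADERS       # useful bytes carried per frame
    raw_bytes_per_s = link_bits_per_s / 8
    return raw_bytes_per_s * payload / bytes_on_wire / 1e6


if __name__ == "__main__":
    print(round(max_tcp_payload_mb_per_s(), 1))
```

Under those assumptions the ceiling comes out just under 118 MB/s, so 115 MB/s would be close to what the wire itself allows, which is consistent with the link rather than the CPU being the limit.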
                  Last edited by andy22; 15 November 2021, 01:36 PM.

                  Comment


                  • #39
                    Originally posted by andy22 View Post
                    Yet you have to keep in mind that io_uring does nearly nothing for a simple single file read/write request, which is what most OpenWRT users use SMB for.
                    You mean in that use case it wouldn't be 10x faster? Something it does is provide kernel-mapped buffers, for example, which would be used also in a single file request, right?

                    Comment


                    • #40
                      Originally posted by indepe View Post

                      You mean in that use case it wouldn't be 10x faster? Something it does is provide kernel-mapped buffers, for example, which would be used also in a single file request, right?
                      Sure, but the overhead for low queue depth or single file read/writes is not the issue even on low end devices.

                      Comment
