Microsoft Has More SMB3/CIFS Enhancements For Linux 5.16, Including For Performance


  • arQon
    replied
    Originally posted by DrYak View Post
    Yup. I am wondering how much the previous userland-accessible zero-copy work (such as the DMA-BUF used in GPUs) could eventually be leveraged for zero-copy userland filesystem daemons.
    It's already been done, many, many years ago. That's pretty much the whole *point* of sendfile: you can keep the daemon in userspace but still avoid constantly round-tripping data into and out of the kernel. When the network interface has TCP offload and is already DMAing packets into a buffer, you can get that buffer onto disk without ever pulling it into userspace by having the drive DMA it out. (And vice versa, obviously.)
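    For the classic transmit direction, a minimal sketch of that pattern (assumes sock_fd is an already-connected TCP socket and path names a regular file; serve_file is a hypothetical name, and error handling is abbreviated):

        /* Serve a file over a socket without the bytes ever entering a
         * userspace buffer: the kernel moves pages from the page cache
         * straight to the socket. */
        #include <sys/sendfile.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <unistd.h>

        static int serve_file(int sock_fd, const char *path)
        {
            int file_fd = open(path, O_RDONLY);
            if (file_fd < 0)
                return -1;

            struct stat st;
            if (fstat(file_fd, &st) < 0) {
                close(file_fd);
                return -1;
            }

            off_t offset = 0;
            while (offset < st.st_size) {
                ssize_t n = sendfile(sock_fd, file_fd, &offset,
                                     st.st_size - offset);
                if (n <= 0)
                    break;
            }

            close(file_fd);
            return offset == st.st_size ? 0 : -1;
        }

    Note that sendfile(2) on Linux only goes file-to-socket; the receive direction needs splice(2) instead.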

    Leave a comment:


  • indepe
    replied
    Originally posted by DrYak View Post
    Well indeed, it would have been hard to argue against in-kernel SMB given that in-kernel is how NFS has been done. Which, by the way, also enables leveraging RDMA. So if you have 10 Gbit Ethernet (RDMA-capable) and an in-kernel NFS, you can do zero-copy remote filesystem access over the network.
    NFS was done before IO_URING was available.

    Leave a comment:


  • DrYak
    replied
    Originally posted by kylew77 View Post
    Sounds like a huge security vulnerability waiting to happen to me. Not involving the OS sounds really dangerous: just having a computer on the network take content from memory. What if the remote computer took your AES decryption keys, for example?
    Originally posted by arQon View Post
    But once the inevitable DMA extensions come in, we'll have our very own version of a WannaCry-level remote code execution engine. Yay! :/
    Well, there were DMA-based attacks on some older versions of, e.g., Windows and Mac OS X, relying on plugging devices into FireWire (which is DMA-capable), but modern OSes (e.g., recent Linux and Windows 10) support an IOMMU: hardware that applies the same concept as virtual memory to the bus between I/O devices and memory, rendering such attacks much less likely.

    In the modern era, with an IOMMU-aware Linux kernel on IOMMU-enabled hardware, the risk of such a security hole is dramatically lessened.

    Originally posted by arQon View Post
    I suspect that if we weren't in the era of the new "mellower" Linus, the idea would have been shot down early on and rightly abandoned, but instead it was allowed to progress to being ready to merge, and by that point he was pretty much stuck with taking it. {...} it's not substantially worse than having NFS there, which is the line of argument used to get it added.
    Well indeed, it would have been hard to argue against in-kernel SMB given that in-kernel is how NFS has been done. Which, by the way, also enables leveraging RDMA. So if you have 10 Gbit Ethernet (RDMA-capable) and an in-kernel NFS, you can do zero-copy remote filesystem access over the network.

    Originally posted by NobodyXu View Post
    It is just another way to optimize the transfer of data by using zero-copy networking.

    Though maybe io-uring will also support zero-copy in the future.
    Yup. I am wondering how much the previous userland-accessible zero-copy work (such as the DMA-BUF used in GPUs) could eventually be leveraged for zero-copy userland filesystem daemons.
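    For illustration, the receive direction is already possible today with splice(2). A rough sketch (receive_to_file is a hypothetical name; assumes sock_fd is a connected socket and file_fd a writable file; error handling abbreviated). splice(2) requires a pipe as one end, so the data hops socket -> pipe -> file entirely inside the kernel:

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <unistd.h>

        static int receive_to_file(int sock_fd, int file_fd, size_t total)
        {
            int p[2];
            if (pipe(p) < 0)
                return -1;

            while (total > 0) {
                /* Move up to 64 KiB from the socket into the pipe. */
                ssize_t in = splice(sock_fd, NULL, p[1], NULL, 65536,
                                    SPLICE_F_MOVE);
                if (in <= 0)
                    break;
                /* Drain the pipe into the file; pages are passed by
                 * reference rather than copied through userspace. */
                while (in > 0) {
                    ssize_t out = splice(p[0], NULL, file_fd, NULL, in,
                                         SPLICE_F_MOVE);
                    if (out <= 0)
                        goto done;
                    in -= out;
                    total -= out;
                }
            }
        done:
            close(p[0]);
            close(p[1]);
            return total == 0 ? 0 : -1;
        }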

    Originally posted by oleid View Post

    OpenWRT was mentioned. They already adopted ksmbd. Typical router hardware has a few hundred MHz of CPU, hopefully more than 32 MiB of RAM, and 8 MiB of flash.
    With such anemic hardware (found only on very old routers, which will have trouble handling most modern high-speed connections anyway), one has to wonder whether SMB is really the best option, instead of some simple FTP or HTTP server (though with FTP support in browsers going the way of the dodo...).

    I mean, even the oldest Raspberry Pi could handle Samba at a decent speed without problems. Modern ISP-provided routers tend to be much better than that hardware-wise (simply because handling a high-speed gigabit FTTH link is challenging enough), and don't get me started on the enthusiast high-end routers one can buy (most of which advertise OpenWRT compatibility out of the box, no rooting required). Though...

    Originally posted by Chrispynut View Post
    Renews my tears that my ISP-supplied router (for FTTH that was installed a few months ago) uses SMB1.
    GJ Vodafone, GFJ!
    ...well, point taken. Sorry that you have to live in a country where such abusive monopolies can thrive.

    Leave a comment:


  • oleid
    replied
    Originally posted by sinepgib View Post
    Which kind of embedded are we talking about? Raspberry Pi level or 68k level?
    OpenWRT was mentioned. They already adopted ksmbd. Typical router hardware has a few hundred MHz of CPU, hopefully more than 32 MiB of RAM, and 8 MiB of flash.

    Leave a comment:


  • sinepgib
    replied
    Originally posted by oleid View Post

    The memory footprint and processor usage are important to consider. You cannot beat an in-kernel server when it comes to processor efficiency. And that's important for the embedded world.
    Which kind of embedded are we talking about? Raspberry Pi level or 68k level? Does it make sense to have an SMB server for the latter? How much memory overhead would there be, assuming shared memory? AFAICT it would only be the rings, which could be, say, 1 MiB: totally reasonable for the first case. I can't see why there would be a significant CPU overhead, considering there wouldn't be many context switches with io_uring.
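    For a rough sense of that footprint, a sketch using liburing (assumes liburing is installed and the program is linked with -luring; the sizes in the comment are the fixed per-entry sizes, not measured numbers):

        /* With 256 entries, the SQE array is 256 * 64 B = 16 KiB and the
         * CQ ring (which defaults to twice the SQ size) 512 * 16 B = 8 KiB,
         * i.e. far below the 1 MiB guessed above. */
        #include <liburing.h>
        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
            struct io_uring ring;

            int ret = io_uring_queue_init(256, &ring, 0);
            if (ret < 0) {
                fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-ret));
                return 1;
            }

            /* Free submission slots == ring size while idle. */
            printf("free SQ slots: %u\n", io_uring_sq_space_left(&ring));

            io_uring_queue_exit(&ring);
            return 0;
        }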

    Leave a comment:


  • Vistaus
    replied
    in b4 EEE

    Leave a comment:


  • oleid
    replied
    Originally posted by indepe View Post

    Says who? I think this is more or less contradicted by the finding that with IO_URING throughput is increased 10x. (Unless that is somehow a misleading benchmark.)

    There is usually a close relationship between throughput and processor efficiency.
    IO_URING is about handling many requests efficiently, mostly in the context of multiple cores, and about reducing the cost of kernel/userland context switches.
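    (As an illustration of that batching, a hedged liburing sketch: batched_reads is a hypothetical helper that queues N reads and crosses the kernel boundary once instead of N times; error handling trimmed.)

        #include <liburing.h>

        #define N 8

        static int batched_reads(int fd, char bufs[N][4096])
        {
            struct io_uring ring;
            if (io_uring_queue_init(N, &ring, 0) < 0)
                return -1;

            /* Queue N reads without entering the kernel. */
            for (int i = 0; i < N; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, fd, bufs[i], 4096,
                                   (__u64)i * 4096);
            }

            io_uring_submit(&ring);   /* one kernel transition for N reads */

            for (int i = 0; i < N; i++) {
                struct io_uring_cqe *cqe;
                io_uring_wait_cqe(&ring, &cqe);
                io_uring_cqe_seen(&ring, cqe);
            }

            io_uring_queue_exit(&ring);
            return 0;
        }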

    The hardware I'm talking about doesn't have multiple cores, probably doesn't even have enough memory to handle that many clients, and certainly doesn't have the memory bandwidth of the "10x faster" benchmark.

    I'd, however, be interested in a CPU and memory usage benchmark of a minimal userspace implementation of smbd+io_uring compared with ksmbd.

    Leave a comment:


  • Lycanthropist
    replied
    Originally posted by arQon View Post
    SMB3 in its own right isn't THAT terrible an idea - it's not substantially worse than having NFS there, which is the line of argument used to get it added. But once the inevitable DMA extensions come in, we'll have our very own version of a WannaCry-level remote code execution engine. Yay! :/
    Not necessarily. NFS has support for RDMA as well and doesn't have WannaCry-level vulnerabilities. I'm sure the same can eventually be achieved for SMB3 as well.

    Leave a comment:


  • indepe
    replied
    Originally posted by mdedetrich View Post
    [...] This is unavoidable, so if there is a bottleneck [...]
    Not unavoidable with IO_URING.

    Leave a comment:


  • mdedetrich
    replied
    Originally posted by lacek View Post

    Why is putting an SMB3 file server inside the kernel a good idea? Does it allow something that a user-space application would not? If there are some bottlenecks that make user applications slow, perhaps fixing those would allow not only a quick SMB3 server but other servers as well?

    There is a 30-70% speedup on the horizon. Isn't that because the current implementation doesn't have some features yet, and completing it will also slow it down?

    Not really a joke: why not KHTTPD? KSSHD? KGNOME? Or KBLENDER? Maybe an in-kernel Steam client?


    Because whenever you go from kernel to user space (or vice versa), that is a context switch as you cross from one ring to another. This is unavoidable, so if this context switching is the bottleneck (which isn't uncommon for filesystems, network stacks, etc.), then moving into the kernel is your only option.

    Note: I haven't looked into this change specifically, but in general this is true. It's why (for similar reasons) FUSE-based filesystems will always be slower than the equivalent in-kernel implementation.
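    (To make the cost concrete, a toy copy loop, illustrative only: every iteration performs two syscalls, each a ring 3 -> ring 0 -> ring 3 round trip that an in-kernel server never pays.)

        #include <unistd.h>

        static int copy_fd(int in_fd, int out_fd)
        {
            char buf[4096];
            ssize_t n;

            /* read() and write() each trap into the kernel and back:
             * two context switches per 4 KiB moved. */
            while ((n = read(in_fd, buf, sizeof buf)) > 0) {
                if (write(out_fd, buf, n) != n)
                    return -1;
            }
            return n == 0 ? 0 : -1;
        }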

    Leave a comment:
