Linus Torvalds Doesn't Recommend Using ZFS On Linux


  • Originally posted by k1e0x View Post
    Loopback <- not used (implemented as ZVOL)
    VFS <- not used (implemented by ZPL)
    Loopback is used by snapd and other things, and snapd is not smart enough to use a ZVOL. So maybe it is important to improve loopback for legacy applications, rather than just writing it off as "not used".
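    As a rough illustration of what that loopback path involves, here is a minimal sketch of attaching a backing file to a free loop device through the kernel's loop-control interface; the file path is a placeholder and error handling is trimmed.

    /* Minimal sketch: bind a backing file to a free /dev/loopN device,
     * the same mechanism snapd leans on for its images. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/loop.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <backing-file>\n", argv[0]);
            return 1;
        }

        int ctl = open("/dev/loop-control", O_RDWR);
        int devnr = ioctl(ctl, LOOP_CTL_GET_FREE);     /* ask for a free loopN */

        char path[64];
        snprintf(path, sizeof(path), "/dev/loop%d", devnr);

        int loopfd = open(path, O_RDWR);
        int backing = open(argv[1], O_RDWR);
        if (ioctl(loopfd, LOOP_SET_FD, backing) < 0) { /* attach the file */
            perror("LOOP_SET_FD");
            return 1;
        }
        printf("attached %s to %s\n", argv[1], path);
        return 0;
    }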

    The VFS is used for Linux filesystem namespaces (systemd uses them like they are going out of fashion), and the Linux VFS layer is not replaced by the ZPL because the ZPL does not provide those namespace features. So the Linux kernel VFS layer, with its page cache, is always there under Linux; ZFS is a round peg in a square hole unless it is redesigned from the ground up.
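    To make the namespace point concrete, this is a minimal sketch of the mount-namespace feature the kernel VFS provides and the ZPL does not; it needs CAP_SYS_ADMIN and the paths are illustrative only.

    /* Sketch: give this process a private mount namespace, then add a bind
     * mount that the rest of the system never sees. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        if (unshare(CLONE_NEWNS) < 0) {                /* new mount namespace */
            perror("unshare");
            return 1;
        }
        /* stop mount events propagating back to the parent namespace */
        if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0) {
            perror("mount MS_PRIVATE");
            return 1;
        }
        /* a bind mount visible only inside this namespace */
        if (mount("/srv/data", "/mnt", NULL, MS_BIND, NULL) < 0) {
            perror("bind mount");
            return 1;
        }
        printf("private mount namespace ready\n");
        return 0;
    }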

    Originally posted by k1e0x View Post
    Linux Block layer <- Not used, redundant.
    Your SATA/SAS controller drivers and so on under Linux force you back into the Linux block layer to interface with them. So it is not redundant as claimed: without the Linux block layer, ZFS on Linux cannot write to local discs at all. And if you go over the network instead, you still want to keep zero copy where possible, which means using the same allocation system as the Linux kernel.

    So all your so-called corrections were wrong, k1e0x.

    Originally posted by k1e0x View Post
    The entire stack is different, it doesn't lay on top / reuse like everything else in Linux.
    That is exactly the problem: ZFS breaks the possibility of zero-copy operations from the block device (SATA/SAS controller) to the page cache.
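    For a sense of what "zero copy from the block device" means in practice, here is a small sketch of an O_DIRECT read, where the block layer can DMA from the device straight into an aligned user buffer with no intermediate copy; the device path is a placeholder.

    /* Sketch: read 1 MiB from a block device with O_DIRECT so the data is
     * DMA'd straight into this aligned buffer, with no extra copy in between. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/sdX", O_RDONLY | O_DIRECT);  /* placeholder device */
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 1 << 20)) {       /* 4 KiB aligned, 1 MiB */
            perror("posix_memalign");
            return 1;
        }

        ssize_t n = read(fd, buf, 1 << 20);              /* device DMAs into buf */
        printf("read %zd bytes with no intermediate copy\n", n);

        free(buf);
        close(fd);
        return 0;
    }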

    Originally posted by k1e0x View Post
    I also know you can change the block sizes, thats age old like you said and that isn't the point. The point is ZFS is variable and does it on the fly automatically. (So 512k write is 512 block, 4k write is 4k. etc, it means it efficiency manages slack.).
    My point is not just the block sizes; it is how those blocks are placed in RAM, so that you end up with zero copy.

    Originally posted by k1e0x View Post
    Also the slab is really really good.. if it wasn't it wouldn't have been imitated or copied by every other OS (including linux).. be very careful redesigning this..
    The awkward point here is that the slab allocator in the Linux kernel is superseded technology. Yes, the Linux kernel took in slab, but the kernel developers reworked how it functions in a big way. When the kernel was made to support hugepages, slab got redesigned into a newer allocator that uses a different approach to work better with multiple page sizes, and the large-page work keeps redesigning sections of the memory system again.

    So basically the slab technology ZFS is using is one to two generations behind what the Linux kernel has, and ZFS's slab usage under Linux is pointless duplication using a less effective method. One problem: the Linux replacement for SLAB is under the GPLv2, which happens to be incompatible with the CDDL.
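    For comparison, this is the Linux-native slab interface an in-tree filesystem would use instead of carrying its own slab layer; it is a fragment of a hypothetical kernel module, not anything taken from ZoL itself.

    /* Hypothetical kernel-module fragment using the kernel's own slab API
     * (backed by whatever allocator the running kernel provides). */
    #include <linux/module.h>
    #include <linux/slab.h>

    struct demo_node {
        u64 blockno;
        u32 flags;
    };

    static struct kmem_cache *demo_cache;

    static int __init demo_init(void)
    {
        demo_cache = kmem_cache_create("demo_node", sizeof(struct demo_node),
                                       0, SLAB_HWCACHE_ALIGN, NULL);
        if (!demo_cache)
            return -ENOMEM;

        struct demo_node *n = kmem_cache_alloc(demo_cache, GFP_KERNEL);
        if (n)
            kmem_cache_free(demo_cache, n);            /* back to the cache */
        return 0;
    }

    static void __exit demo_exit(void)
    {
        kmem_cache_destroy(demo_cache);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");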



    • Originally posted by oiaohm View Post

      So basically the slab technology ZFS is using is one to two generations behind what the Linux kernel has, and ZFS's slab usage under Linux is pointless duplication using a less effective method. One problem: the Linux replacement for SLAB is under the GPLv2, which happens to be incompatible with the CDDL.
      Well, nothing's perfect. lol

      Yeah, there are differences (the slab on Solaris and FreeBSD's version of it has also seen improvements and changes). FreeBSD is much less radical about the changes they make. The block layer uses a driver??? No way! Every OS does; the difference is HOW it uses it. They would NEVER rip out ifconfig. You'd never see it happen.. they would fix its limitations (and have, it does wifi on FreeBSD, like it *should*). It's developed by people who *like* Unix and are not in a power struggle with the other distro or the whims of whoever controls X project. It's a team (the core team) and they make sound engineering decisions for what all of their future is going to be.. A lot of them seem to be more seasoned programmers too. Kirk McKusick still maintains and improves UFS, which he created as a graduate student at Berkeley. (Guess what, the grandfather of filesystems likes ZFS too.)

      On ZFS it would be really nice to see Linux do the same thing FreeBSD did and not need some of these pieces imported from the Illumos branch. I don't see any reason why snap can't use a ZVOL, and it being all Ubuntu technology, you can just import it. You can dd anything to a ZVOL and use it like a file. ZVOLs are really nice btw.. have you played with them? You can do a lot of interesting things with them. That layer really needs to get adopted on other systems.

      I think in the end Linux fundamentally wants to be something VERY different from what sysadmins want. I haven't really seen anything that great come out of Linux in about 10 years.. They are catering to a home-user market that just doesn't exist, and they no longer seem to give a shit about being Unix-like anymore, so.. yeah.. not much of a loss.. I personally think Linux gaming is the dumbest thing in the world; it's fine if it works, but Linux is a server OS. : shrug : Isn't it? And if it's not.. I don't care about it. What a user runs on his workstation doesn't interest me.. could be Android for all I care.. it's just a browser anyhow.

      I think I've kind of won you over on ZFS btw. Yeah.. it's a good thing in the world. Hopefully the dream of making filesystems as easy as RAM will become a reality. "Storage your computer manages for you" - that is the idea.. They aren't there yet, but they know that and are working towards it.

      How does ZFS do in the enterprise? People don't really tell us who is *really* using it; businesses tend to be very private about their critical core infrastructure.. but anonymously, who uses it..? Well, if you look at Oracle's stats:

      ZFS was able to top SAP storage benchmarks for 2017.
      2 / 3 Top SaaS companies use it.
      7 / 10 Top financial companies use it.
      9 / 10 Top telecommunication companies use it.
      5 / 5 Top semiconductor companies use it.
      3 / 5 Top gas and oil companies use it.
      2 / 3 Top media companies use it.
      It has a good data-center footprint too: 9 PB per rack.

      But who would care about that market.. small peanuts. Disney, Intel, AT&T, JP Morgan, or ARCO, meh.. not important. No need to put this in Linux. Linus is probably right. We should make gaming better on Linux. lol
      Last edited by k1e0x; 02-03-2020, 04:34 AM.



      • Originally posted by k1e0x View Post

        1. I have never seen Oracle ZFS storage appliance used in the enterprise personally, I know they exist I've just never seen them. Only Sun Microsystem's before Solaris version 10. (Pool version 28~) Generally it's used on FreeBSD storage clusters that are secondary storage to NetApp, EMC or DDN. (around 32-64 spindles) You can get paid commercial ZFS (and ZoL) support from both FreeBSD 3rd parties and Canonical. Ubuntu 19.10 has ZFS on root in the installer and so will 20.04-LTS
        A. I was talking about Oracle Solaris and Oracle Unbreakable Linux.
        B. I'm not sure whether Canonical's enterprise offering includes full support for ZFS. Last I heard, it was marked as "experimental" (read: unsupported).

        2. All I got to say about ext4 is hope you like fsck.
        I've got a couple of PB worth of storage, be it glusterfs clusters, oVirt clusters or our own proprietary application, that seems to suggest otherwise.
        But feel free to think otherwise.

        3. You don't know what you're talking about. COW alone has nothing to do with bit-rot or uncorrectable errors. You're thinking of block checksums, and yes, they are good. COW provides other features such as snapshots, cloning and boot environments. Boot environments are pretty cool.. maybe Linux should get on that... oh wait.. ZFS is the only Linux file system that does it and we can't have *that*.
        A. You assume I don't understand the difference between checksums and COW. No idea why.
        B. .... What makes you think I need it (not the former, the latter)?

        I believe you're completely missing my point.
        Using ZOL in an enterprise environment without enterprise grade support is *foolish*. If it breaks, you get to keep all the pieces.

        - Gilboa
        Last edited by gilboa; 02-03-2020, 11:39 AM.
        DEV: Intel S2600C0, 2xE5-2658V2, 32GB, 6x2TB, GTX1080, F32, Dell UP3216Q 4K.
        SRV: Intel S2400GP2, 2xE5-2448L, 96GB, 6x2TB, GTX550, F32, Dell U2711.
        WIN: Gigabyte B85M-HD3, E3-1245V3, 32GB, 5x1TB, GTX980, Win10Pro.
        BAK: Asus H110M-K, i5-6500, 16GB, 3x1TB + 128GB-SSD, F32.
        LAP: ASUS Strix GL502V, i7-6700HQ, 32GB, 1TB+256GB, 1070M, F31.



        • Originally posted by gilboa View Post
          Using ZOL in an enterprise environment without enterprise grade support is *foolish*. If it breaks, you get to keep all the pieces.
          But that's true of anything in an enterprise environment.



          • Originally posted by skeevy420 View Post

            But that's true of anything in an enterprise environment.
            Not really. I've seen huge enterprises use CentOS + XFS / ext4 without enterprise grade support.
            It's extremely stable and the free support can be sufficient for a skilled customer.
            ... However, try getting *any* type of support for ZOL running on CentOS (or RHEL), you'll have a blast.

            - Gilboa



            • Originally posted by k1e0x View Post
              Yeah, there are differences (the slab on Solaris and FreeBSD's version of it has also seen improvement and changes). FreeBSD is much less radical about the changes they make. The block layer uses a driver??? no way! Every OS does, the difference is HOW it uses that.
              The Linux block layer and its driver interface have changed as well, so ZoL no longer submits requests to the Linux block layer in the way that performs best. Yes, how the block layer is used is important: it is possible to use the block layer wrongly and end up with performance problems.


              Originally posted by k1e0x View Post
              I don't see any reason why snap can't use a ZVOL and being all Ubuntu technology you can just import it.
              It turns out that a ZVOL is slower than using the Linux loopback driver on top of ZFS, so there is no point importing it. Loopback goes through the Linux kernel page cache, which brings performance advantages; it is optimised for hugepage usage and will be optimised for large-page usage in future. It does not matter that you can do something if it does not actually perform.

              Originally posted by k1e0x View Post
              I think I've kind of won you over btw on ZFS. Yeah.. it's a good thing in the world. Hopefully the dream of making filesystems as easy as ram will be a reality. "Storage your computer manages for you" That is the idea.. They aren't there yet, but they know that and are working towards it.
              The memory-handling requirements are why I see ZFS/ZoL as doomed long term. There are already signs of this..

              Originally posted by k1e0x View Post
              How does ZFS do in the enterprise? People don't really tell us who is *really* using it, business tend to be very private about their critical core infrastructure.. but anonymously who uses it..? Well if you look at Oracle's stats.

              ZFS was able to top SAP storage benchmarks for 2017.
              All this line does is throw out out-of-date information in an attempt to win a point.
              https://www.intel.com/content/dam/ww...tion-guide.pdf

              The current recommended setup in 2020 for best SAP performance is XFS with DAX for HANA. That path uses the Linux kernel block device drivers to place something like a 2 MB or 4 MB huge page into Linux memory in a single operation from the storage media, with no middle layer like the ARC cache. Application-aware checksumming of the data can in fact reduce the amount of checksum processing you need to do while giving you the same level of data protection.
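              To sketch what that DAX path looks like from an application, here is a minimal example of mapping a file on a DAX-capable XFS filesystem with MAP_SYNC, so loads and stores go to the persistent media with no page-cache copy; the path is a placeholder, and MAP_SYNC needs a recent kernel and glibc plus a pmem-backed mount.

              /* Sketch: map a file on a DAX mount with MAP_SYNC; stores hit the
               * media directly, no page-cache copy and no extra cache layer. */
              #define _GNU_SOURCE
              #include <fcntl.h>
              #include <stdio.h>
              #include <sys/mman.h>
              #include <unistd.h>

              int main(void)
              {
                  int fd = open("/mnt/pmem/hana.dat", O_RDWR);   /* placeholder path */
                  if (fd < 0) { perror("open"); return 1; }

                  size_t len = 2UL << 20;                        /* one 2 MiB huge page */
                  void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
                  if (p == MAP_FAILED) { perror("mmap MAP_SYNC"); return 1; }

                  ((char *)p)[0] = 1;            /* store goes straight to the media */

                  munmap(p, len);
                  close(fd);
                  return 0;
              }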

              So for SAP, ZFS is currently classed as under-performing. It is not as if the SAP developers did not look at ZFS and take some ideas; they did, and they worked out they could do it faster and better. A lot of this is caused by ZoL deciding to use its own memory manager instead of adapting to the Linux VFS layer, where a request arrives with the required page size and is passed through the filesystem to the block device, which reads into that page, correctly aligned and ready to use with no later modification required.

              Originally posted by k1e0x View Post
              2 / 3 Top SaaS companies use it.
              7 / 10 Top financial companies use it.
              9 / 10 Top telecommunication companies use it.
              5 / 5 Top semiconductor companies use it.
              3 / 5 Top gas and oil companies use it.
              2 / 3 Top media companies use it.
              I would guess these are all parties using SAP, which means that if they are following the current SAP recommendations they are no longer using ZFS but are in fact using XFS with HANA.


              Originally posted by k1e0x View Post
              But who would care about that market.. small peanuts. Disney, Intel, AT&T. JP Morgan, or ARCO meh.. not important. No need to put this in Linux. Linus is probably right. We should make gaming better on Linux. lol

              The XFS developers' focus on performance, which requires system-wide integration, rather than on data security has turned out to be important. There are many ways to achieve data integrity without having checksums in the filesystem; there are very few ways to achieve performance.

              ZFS's market position is not stable. The idea the ZFS developers hold, that they can ignore the way the host operating system works, is leading to ZFS losing market share to XFS due to the poor performance of the ZFS option.



              • Originally posted by k1e0x View Post
                Well, nothing's perfect. lol
                That is so true. But time moves on.

                https://youtube.com/watch?v=xWjOh0Ph8uM

                Here is Linus Tech Tips setting up a new server. They started with the plan to use ZFS and ended up back on XFS and dm.

                Current-day systems are running into a new problem. ZFS was designed on the assumption that your RAM bandwidth is greater than your storage bandwidth, so you can afford to be wasteful with RAM bandwidth. Now we have a new nightmare: enough of the new NVMe drives add up to more bandwidth from your drives than your complete CPU RAM bus provides.

                Zero copy from the device into memory and back out is no longer an optional feature; it is becoming a mandatory one. There is no time for fancy ARC cache stunts that result in three copies in RAM, and no time for the filesystem to have its own allocation system and then copy out to the host OS memory system.

                Yes, we really do need to rethink where data checksumming should happen as well. End to end, the checksum of data from a file server perhaps needs to be done by the client, with the client having a means to inform the server that some piece of data appears to have a problem. The same goes for compression, because the storage server can run out of resources purely from the work of running the storage.
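                Just to make the client-side idea concrete, here is a toy sketch of the verification step; zlib's crc32() stands in for whatever checksum a real protocol would carry, and the payload and server-supplied value are invented for the example.

                /* Toy sketch: the client recomputes a checksum over data it received
                 * and compares it with the value the server sent alongside. */
                #include <stdint.h>
                #include <stdio.h>
                #include <zlib.h>

                /* returns 0 if the block checks out, -1 if the client should complain */
                static int verify_block(const unsigned char *data, size_t len,
                                        uint32_t expected)
                {
                    uint32_t crc = crc32(0L, Z_NULL, 0);      /* initial CRC value */
                    crc = (uint32_t)crc32(crc, data, (uInt)len);
                    return crc == expected ? 0 : -1;
                }

                int main(void)
                {
                    const unsigned char payload[] = "block received from the file server";
                    uint32_t from_server = (uint32_t)crc32(crc32(0L, Z_NULL, 0),
                                                           payload, sizeof(payload));

                    if (verify_block(payload, sizeof(payload), from_server) != 0)
                        fprintf(stderr, "data looks damaged, tell the server\n");
                    else
                        printf("checksum ok, and the server did no checksum work\n");
                    return 0;
                }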

                So yes, the video above is ZFS losing a user because the ZFS design cannot perform on modern, nasty hardware. And modern hardware is nasty in more ways than one: you can be forced to poll instead of using interrupts, because interrupts get lost under the massive pressure.

                Yes, it is kind of insane that a dual-socket, 128-core/256-thread setup can basically be strangled by current high-speed storage media through not having enough memory bandwidth, as some other parties have found out. When ZFS was designed, the idea that you could be strangled by memory bandwidth was not even a possibility. Not duplicating data in memory (zero-copy operation) is not an optional optimisation; it is a required feature in these new setups, and they will become more common.


                k1e0x, you might not like this, but I don't see ZFS lasting out 10 years if it does not majorly alter its path, because it is just going to become more and more incompatible with current hardware. And yes, current hardware is going to force us to reconsider how we do things as well. Being strangled by the RAM bus of the storage server is a genuinely new problem.



                • I've seen that, yeah, that is an extreme problem, though it's more with the OS itself.. Netflix had the same issue with SSL at 100 Gb. It will take serious OS engineering to solve this, not Linux's throw-it-at-the-wall-and-see-if-it-sticks engineering.

                  But yes, ZFS only works if the CPU is X times faster than the disk. If it isn't, you need something else. Generally that is true, and it's been true for a long time. I don't see magnetic storage going away, so I think things are fine. I don't think storage at that level is really practical economically.. but some people clearly have a use case here.. and we'll have to find a solution for them.

                  I'm curious what Wendell did with polling (kqueue tends to way outperform epoll).

                  Jeff Bonwick (the ZFS creator) actually was trying to solve that problem.. He was working on 3D flash RAID implementations.. DSSD I think was the company, and they were sold to Dell. Dell dropped it, so... wonder what Bonwick is up to now. The world needs him to solve storage (again). lol
                  Last edited by k1e0x; 02-05-2020, 03:44 AM.



                  • Originally posted by k1e0x View Post
                    I seen that, ya that is an extreme problem, it's more so with the OS itself tho.. Netflix had the same issue with SSL at 100gb. It will take serious OS engineering to solve this. Not Linux throw it at the wall and see if it sticks engineering.
                    Others using Linux did find a solution to the 100 Gb SSL problem.

                    Originally posted by k1e0x View Post
                    But yes, ZFS only works if the CPU is x times faster than the disk. If it isn't, you need something else.
                    That is the problem: this is not going to remain a workable assumption.

                    Originally posted by k1e0x View Post
                    Generally that is true and its' been true for a long time. I don't see magnetic storage going away so I think things are fine.
                    Magnetic storage can cause the same nightmare. Note that all those NVMe drives in that video are PCIe connected, so every two of them could instead be one 12 Gb/s SAS port multiplexed out to drives. You still get stomped cleanly into the ground.

                    Originally posted by k1e0x View Post
                    I don't think storage at that level really is practical economically .. but some people clearly have a use case here.. and we'll have to find a solution for them.
                    Storage is not exactly the problem. You get stomped into the ground because we have more PCIe lanes, capable of transferring more data, than the CPU can safely handle.

                    Originally posted by k1e0x View Post
                    I'm curious what Wendell did with polling (kqueue tends to way out preform epoll)
                    Yes, kqueue is faster as long as the interrupts are not getting stomped into the ground by the massive flow of data. What Wendell did is use polling and kqueue in combination: if kqueue has not delivered an interrupt within X time, do a poll to see whether the interrupt has been lost. This massive data-flow problem means that if you don't pick up an interrupt in time, another one from a different device can come in and overwrite the information. What Wendell did avoids having to reset drives/controllers, which would stall everything to death.
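                    A rough sketch of that interrupt-plus-polling fallback, with epoll and a pipe standing in for the real controller queue and a stub for reading the device status directly; the 10 ms window is an arbitrary choice for the example.

                    /* Sketch: wait for an event with a bounded timeout; if nothing
                     * arrives, assume the notification may have been lost under load
                     * and poll the device state directly. */
                    #include <stdio.h>
                    #include <sys/epoll.h>
                    #include <unistd.h>

                    /* placeholder for reading a completion/status register directly */
                    static int poll_device_status(void) { return 0; }

                    int main(void)
                    {
                        int pipefd[2];
                        pipe(pipefd);                    /* stand-in for a device queue fd */

                        int epfd = epoll_create1(0);
                        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
                        epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);

                        for (int i = 0; i < 3; i++) {
                            struct epoll_event out;
                            int n = epoll_wait(epfd, &out, 1, 10);  /* 10 ms "interrupt" wait */
                            if (n > 0) {
                                printf("event delivered normally\n");
                            } else if (n == 0) {
                                /* timeout: the event may have been lost, check directly */
                                if (poll_device_status())
                                    printf("work found by polling, event was lost\n");
                                else
                                    printf("timeout, no pending work\n");
                            }
                        }
                        close(epfd); close(pipefd[0]); close(pipefd[1]);
                        return 0;
                    }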

                    Basically, your CPU is being stomped into the ground, and you have to deal with that fact, because things like interrupts carried over PCIe are no longer dependable.

                    Originally posted by k1e0x View Post
                    Jeff Bonwick (ZFS creator) actually was trying to solve that problem.. He was working on 3d flash raid implementations.. DSSD I think was the company and they were sold to Dell. Dell dropped it so... wonder what Bonwick is up to now. The world needs him to solve storage (again). lol
                    You really missed it: this problem is system-wide. The same thing can happen if you connect up a lot of accelerators. Modern larger server systems simply have way too much PCIe bandwidth, and this is only going to get worse when systems move from PCIe 4 to PCIe 5/6.

                    For this problem a storage specialist is basically useless; fixing it takes a memory-management specialist and a specialist in PCIe evils. It is also a requirement that those implementing filesystems do not attempt to do their own thing with memory management, because duplication in memory makes your lack of memory bandwidth even worse.

                    It is bad enough dealing with the massive wave of data the PCIe lanes in these modern systems allow, without third-party filesystems like ZFS doing things that presume the CPU is X times faster than the disc.

                    Remember, with AMD Epyc CPUs, instead of what LTT attempted with a 24-core and then a step up to a 32-core part, you could have a 12- or 16-core chip with exactly the same number of PCIe 4.0 lanes and half the memory bandwidth again. That is your poor low-end storage server built on AMD chips, perfectly designed to be mega-stomped by its drives.

                    So having the CPU X times faster than the disk is no longer always the case. It is now possible, even with old-school HDDs, to have the disks X times faster than the CPU.


                    The assumptions ZFS was designed around have basically been turned upside down by AMD, and now we have to deal with it.



                    • I don't think this has anything to do with filesystems per se. It's a kernel/OS limitation.. you just can't move data through the kernel fast enough. All general-purpose OSes have this problem.

                      and kqueue is FreeBSD's poll mechanism.. so.. there you go again : shrug :
