
ZFS On Linux Runs Into A Snag With Linux 5.0


  • #41
    Originally posted by Rallos Zek View Post
    It's not the kernel devs who need to do anything for ZFS to work with Linux; that ball is in the ZFS devs' court. I for one am happy ZFS is not in Linux, for those damn ZFS cultists scare me.
    ZFS cultists -- you mean IT professionals who work with real users / real workloads?


    • #42
      Do any of the people referring to ZOL users/admins as cultists deal with real world IT problems? Have they ever used ZFS? The filesystems I deal with on a daily basis: ext4, XFS, NTFS, ZFS, and md. For some situations, ZFS is the only currently reasonable solution. When you purchase storage servers, you have to decide a priori whether to configure them with RAID controllers or HBAs. With HBAs, yes, you can use md, but this is inadequate for enterprise or even workgroup scale issues where data integrity is absolutely critical. mdadm will happily report that a RAID 5/6 is "healthy" when even a short smartctl test indicates disk errors. Been there, done that, and was barely able to recover the data from the RAID before replacing the (RAID-certified) disks that had developed unreadable sectors. ZFS stays on top of this with checksums. ZFS not working means that my Windows-oriented colleagues advocating for Windows Storage Server solutions win the argument on technical grounds -- without ZFS, Linux can't compete.
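
A minimal sketch of the idea behind "ZFS stays on top of this with checksums": keep a checksum for every block and verify it on each read, so a silently damaged sector is caught instead of being reported as healthy. This is a simplified Fletcher-4-style routine written in Python for illustration only, not ZFS's actual code; `read_block` and the block layout are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1  # accumulators wrap at 64 bits


def fletcher4(data: bytes) -> tuple:
    """Fletcher-4-style checksum over little-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def read_block(stored: bytes, expected: tuple) -> bytes:
    """Return the block only if it still matches its recorded checksum."""
    if fletcher4(stored) != expected:
        raise IOError("checksum mismatch: block is corrupt")
    return stored


if __name__ == "__main__":
    block = bytes(range(64))
    checksum = fletcher4(block)          # recorded at write time

    damaged = bytearray(block)
    damaged[10] ^= 0x01                  # one flipped bit on disk
    try:
        read_block(bytes(damaged), checksum)
    except IOError as err:
        print("detected:", err)
```

With redundancy available, a filesystem built this way can go on to fetch a good copy of the block; the sketch stops at detection.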
      Last edited by pgoetz; 11 January 2019, 12:56 PM.


      • #43
        Originally posted by aht0 View Post

        I accept the explanation as plausible. I ask though, what's the probability that ZoL devs simply did not notice the deprecation 10 years ago? Does all deprecated stuff get re-quoted version after version, or what? That code is quite a handful in itself, without adding the Linux kernel to it.
        Also, the feature whose removal breaks ZoL was, from the time it was introduced, an optional feature to be built into the kernel, and it was missing from many architectures.
        This is from 2013, on the ARM platform, where they got rid of kernel-space floating-point emulation, so now ARM chips without floating point cannot use it in kernel space at all. x86 floating-point emulation was dropped before this.

        In 2003, even when new, __kernel_fpu_begin and __kernel_fpu_end existed in kernel space only for some of the platforms Linux supported. You would hope a file system driver is CPU neutral, or at least has the fallbacks to operate in a CPU-neutral way.

        To be truthful, there is a problem with the Linux kernel: you need to reference the mailing list and the source code comments. In this case you need to look up the mailing list.

        From 2003 to 2017 no one was being paid to work on Linux kernel documentation, so documentation was targeted.

        If a Linux kernel function does not have a comment describing how to use it, be worried and do a bit more research on it. That a function is planned to be changed or removed without notice is also undocumented.

        Also compare __kernel_fpu_begin and __kernel_fpu_end vs kernel_fpu_begin and kernel_fpu_end in that file.

        You will notice something bad. Not only are kernel_fpu_begin and kernel_fpu_end tagged GPL-only, meaning that if you use them you should expect to mainline your code; the GPL functions also have protection against the CPU switching away mid-FPU function. So __kernel_fpu_begin and __kernel_fpu_end, compared with kernel_fpu_begin and kernel_fpu_end, are technically unsafe.

        The __ at the start of the call is also a warning that it is not a stable export.

        aht0, basically it's an uphill battle.
        1) __kernel_fpu_begin and __kernel_fpu_end, unlike kernel_fpu_begin and kernel_fpu_end, really were never safe.
        2) __ is a marker for an unstable export.
        3) Not all architectures have floating point, so your driver should have a fallback.
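
Point 3 can be sketched as a dispatch table: register every implementation together with a runtime capability probe, always include a plain scalar one, and pick the best that the current machine supports. This is Python purely for illustration (a real driver would do this in C), and every name here (`Impl`, `select_impl`, `build_table`, the `fpu_available` flag) is hypothetical rather than any kernel or ZoL API.

```python
from typing import Callable, NamedTuple, Sequence

MASK64 = (1 << 64) - 1


class Impl(NamedTuple):
    name: str
    priority: int                        # higher means preferred
    is_supported: Callable[[], bool]     # runtime capability probe
    run: Callable[[Sequence[int]], int]


def scalar_sum(words):
    """Portable fallback: needs nothing beyond integer arithmetic."""
    total = 0
    for w in words:
        total = (total + w) & MASK64
    return total


def accelerated_sum(words):
    """Stand-in for an FPU/SIMD routine; must return the same result."""
    return sum(words) & MASK64


def select_impl(impls):
    """Pick the highest-priority implementation whose probe succeeds."""
    usable = [impl for impl in impls if impl.is_supported()]
    if not usable:
        raise RuntimeError("no usable implementation registered")
    return max(usable, key=lambda impl: impl.priority)


def build_table(fpu_available):
    # fpu_available is an assumed flag standing in for a real feature probe.
    return [
        Impl("scalar", 0, lambda: True, scalar_sum),
        Impl("accelerated", 10, lambda: fpu_available, accelerated_sum),
    ]


if __name__ == "__main__":
    words = [1, 2, 3, MASK64]
    for available in (True, False):
        impl = select_impl(build_table(available))
        print(available, impl.name, impl.run(words))
```

Because the scalar entry's probe always succeeds, losing the accelerated path costs speed but never correctness.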

        The only reason you should have been using __kernel_fpu_begin and __kernel_fpu_end is if you were attempting to avoid the GPL license of the Linux kernel. But the __ at the start says they can be removed at any point.

        The ZoL kernel module, we are told, is GPLv2 like the Linux kernel, so why in heck was it not using kernel_fpu_begin and kernel_fpu_end, which are still there in the 5.0 Linux kernel?

        Next, why is ZoL using floating point without a fallback for platforms without floating point? Yes, x86 CPUs without floating point are still made.

        If you want dependable floating point that will work no matter the CPU, you have to go to userspace, as it is userspace that has floating-point emulation on Linux. This is the usermode helper.

        Yes, because kernel_fpu_begin and kernel_fpu_end are exported as GPL symbols, they are free to be deleted in future kernel versions as well.

        So __kernel_fpu_begin and __kernel_fpu_end were exported without the GPL flag, but the __ at the start tells you they are unstable and may disappear without notice, and kernel_fpu_begin and kernel_fpu_end are GPL symbols that may also disappear without notice.

        There is only one path to having floating point performed for your kernel-space driver that is API stable-ish:
        the usermode helper one, and this path will be CPU/arch neutral.
        In a lot of cases where you see people making drivers for Linux having problems, they are not looking at the exports carefully enough. A __ at the start of a symbol in the Linux kernel API, or EXPORT_SYMBOL_GPL, is truly "here be dragons". If you dare use them, these can disappear, change return values or change arguments when the Linux version changes. At some point both can burn you.
        When you wake up to it, the __ at the start of the symbol name was telling you everything you needed to know, if you understood Linux kernel symbol naming. ZoL developers either never learnt the naming pattern or chose to ignore it.

        When a Windows developer writes a driver using some undocumented function, Windows gets an update and the driver breaks, the Windows driver developer is at fault, not Microsoft. What ZoL has done is absolutely the Linux equivalent. Linus is perfectly right to say we are not fixing this; their code should have been designed to cope with the functions disappearing. It's not like the __ flag is new.


        • #44
          Originally posted by oiaohm View Post

          This is when the __kernel_fpu_begin and __kernel_fpu_end features were introduced into the Linux kernel. Do notice that Linus was very clear that you should only use the FPU in special conditions and that not every arch Linux supports will support floating point in driver code. So your driver code should contain a fallback for when the functions don't exist.

          So the fact that ZFS breaks when __kernel_fpu_begin and __kernel_fpu_end are removed is their fault, because those functions were always meant to be a special case and the ZFS on Linux developers should have coded around them.

          To make this even worse, __kernel_fpu_begin and __kernel_fpu_end have been deprecated for over a decade. The replacement for __kernel_fpu_begin and __kernel_fpu_end is the usermode helper. Yes, run your floating-point stuff in userspace, so that if the FPU stuffs up and damages memory you don't have a kernel panic. Yes, this is a feature of the 2.6.27 Linux kernel, released 9 October 2008, and that is also when __kernel_fpu_begin and __kernel_fpu_end both became deprecated.

          __kernel_fpu_begin and __kernel_fpu_end did not exist in the Linux kernel before 2003, and when added they were marked as highly questionable. They were deprecated 5 years later. 10 years after that, the Linux kernel removes them. Somehow aht0 is attempting to make out that Linux mainline is at fault.

          The reality here, like it or not, is that the ZFS On Linux developers are at fault. ZFS On Linux started Feb 27, 2008. Maybe if they had started 1 year later they might have avoided this disaster, or maybe they still would have used the deprecated API and failed to build in fallback code.

          aht0, I really do think a decade is more than enough notice that an API is going to be removed, particularly an API that from the start was marked as not to be expected to be functional or to exist, as is the case here.

          Yes, there are times Linux mainline developers do deserve to be yelled at for removing stuff without clear notice; this is not one of them. When you have had a decade of notice to correct your code and you have not, there really is no point complaining when it no longer works. The ZFS on Linux developers just need to fix the code, as they should have a decade ago.
          And this highlights the difference between Windows and Linux. On Windows, APIs get deprecated, but they still functionally work. On Linux, they just remove the API, then blame everyone else when their previously working software stops working.

          And this highlights why I stopped developing for Linux over a decade ago.


          • #45
            Originally posted by pgoetz View Post
            "My tolerance for ZFS is pretty non-existant. Sun explicitly did not want their code to work on Linux, so why would we do extra work to get their code to work properly?"

            What someone who doesn't have to deal with real users and real world workloads might say. For at least a couple of my projects, ZFS is by far the best solution. Disappointed to hear stuff like this uttered by high level kernel developers.
            What someone who does not have to deal with real licenses and real world project management might say.

            You don't bend the rules of your project just because you think something might be useful for someone; you end up with a lawless mess or, worse, with a nepotism-based system where sucking the right sock is the only way to get what you want.


            • #46
              Originally posted by aht0 View Post
              I think of it as a case of upstream breaking the internal APIs yet again, not caring in the least how it would affect downstream. Happens all the time. Shit breaks because an upstream dev thinks it good to do some minor random change and, like a "butterfly effect", a bunch of stuff suddenly gets broken downstream. Does Mr or Ms "upstream dev" care? Not in the least.

              Mr "2nd in command after Linus" seems to be guided here by his own preconceptions rather than anything else - I checked the follow-up mails and reached that conclusion. The biggest problem for him seems to be that ZFS originated from Solaris (NIH). Not that I actually particularly care, more power to FreeBSD.
              Pretty much, this is the problem with Linux (the kernel).


              • #47
                Originally posted by oiaohm View Post
                Yes, run your floating-point stuff in userspace, so that if the FPU stuffs up and damages memory you don't have a kernel panic.


                • #48
                  Originally posted by gamerk2 View Post
                  And this highlights the difference between Windows and Linux. On Windows, APIs get deprecated, but they still functionally work. On Linux, they just remove the API, then blame everyone else when their previously working software stops working.

                  And this highlights why I stopped developing for Linux over a decade ago.
                  Amen. (though it's really more pointless maintenance on the code than developing)


                  • #49
                    Originally posted by starshipeleven View Post
                    What someone who does not have to deal with real licenses and real world project management might say.
                    Note that I never suggested the kernel developers should bend the rules or even that it wasn't a good idea to remove the fpu_begin/end hack. I was responding to the idea that ZFS is irrelevant and/or the bailiwick of cultists. Anyone who says this is speaking from the perspective of a home user, not an IT professional dealing with real world problems.


                    • #50
                      Originally posted by ferry View Post
                      People here seem to be forgetting that developing free software is not free. Major new stuff only gets accepted when there is corporate backing (i.e. money or people are allocated to it). Even if ZFS code would be suddenly re-licensed, more is needed to get it upstream.

                      There are many projects, including file systems, with compatible licenses that are having a hard time being accepted upstream.
                      OpenZFS is a fork and it's *not* a corporate filesystem project. They have ZERO involvement with Oracle. Its license is similar to the MPL (are you reading this in Firefox? It's MPL too) and one could argue it's more permissive and free than Linux itself. (Hence its inclusion in FreeBSD, Mac OSX, even Microsoft Windows now. The only place it's "not allowed" is Linux.)

                      My problem isn't the API change. I'm sure they will work that out. The issue is the emotional comments by Linux kernel devs and the users.. whatever Sun's motivation. (And someone show me some proof they actually believed CDDL was not compatible, because it was Debian that made that determination.) ..Whatever their motivation.. they are long since dead. Time to move on.

                      IMO this is dumb. NIH and zealots a lot. You have two pieces of awesome totally open source technology Linux and ZFS, make them work together and put *actual* commercial storage (cough NetApp) out of business. Technology and solutions to problems should not be based on how you feel about Sun.
                      Last edited by k1e0x; 11 January 2019, 01:38 PM.