
Linux 6.6 To Better Protect Against The Illicit Behavior Of NVIDIA's Proprietary Driver


  • Originally posted by braiam View Post

    There's no guarantee of stability for EXPORT_SYMBOL_GPL functions. EXPORT_SYMBOL functions, however, have a stable API and they are documented. If you don't use them, that's your issue, not the kernel's.
    So not an Nvidia issue then; in fact it is completely the opposite of an Nvidia issue, since that only applies to GPL software that is capable of using them in the first place.

    Comment


    • Originally posted by mSparks View Post
      -> data corruption caused by using EXPORT_SYMBOL_GPL functions that had changed

      That doesn't sound like an Nvidia issue, that sounds like an "EXPORT_SYMBOL_GPL function" issue. Why wouldn't this manifestation affect anyone else using them?
      Not quite. You have to remember that the Linux kernel uses semantic patches.

      This means that a function marked EXPORT_SYMBOL_GPL can be removed entirely, or have its arguments completely changed, without causing any problems for mainline Linux kernel drivers.

      So mainline Linux kernel drivers can use EXPORT_SYMBOL_GPL without problems, because any change to an EXPORT_SYMBOL_GPL function is applied to the mainline drivers' code by semantic patch.

      Issues come about if a driver that is not a mainline Linux kernel driver uses EXPORT_SYMBOL_GPL. Remember that, to Linux kernel developers, a function tagged with EXPORT_SYMBOL_GPL is not used outside the Linux kernel source tree. If you wish to use an EXPORT_SYMBOL_GPL function outside the source tree, you are meant to bring this up on the Linux kernel mailing list and get it changed to EXPORT_SYMBOL if that is possible. Yes, when you ask for this change, your company's legal department may get informed that the function is EXPORT_SYMBOL_GPL due to XYZ patents that you have not paid for.

      The Linux kernel has two different function exports for kernel mode drivers:
      EXPORT_SYMBOL, which the Linux kernel developers will keep stable in an LTS branch.
      EXPORT_SYMBOL_GPL, which the Linux kernel developers will not keep stable in an LTS branch; this is not a problem if parties like Nvidia don't use EXPORT_SYMBOL_GPL functions.

      EXPORT_SYMBOL_GPL marks internal functions of the Linux kernel. Parties like Nvidia, with closed source drivers, need to keep their hands out of this location.
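To make the two export classes concrete, here is a minimal user-space sketch of the kind of check a module loader can apply when resolving symbols. It is not kernel code: the table, the symbol names, the list of accepted license strings and the function names are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one exported symbol. */
struct sym_entry {
    const char *name;
    bool gpl_only; /* true: EXPORT_SYMBOL_GPL, false: EXPORT_SYMBOL */
};

/* Made-up symbol names, one per export class. */
static const struct sym_entry sym_table[] = {
    { "generic_helper", false },
    { "internal_helper", true },
};

/* Assumption: only these license strings count as GPL-compatible here. */
bool license_is_gpl_compatible(const char *license)
{
    assert(license != NULL);
    return strcmp(license, "GPL") == 0 ||
           strcmp(license, "GPL v2") == 0 ||
           strcmp(license, "Dual MIT/GPL") == 0;
}

/* Returns the entry if the module may link against it, otherwise NULL. */
const struct sym_entry *resolve_symbol(const char *name,
                                       const char *module_license)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(sym_table) / sizeof(sym_table[0]); i++) {
        if (strcmp(sym_table[i].name, name) != 0)
            continue;
        /* GPL-only exports are refused to non-GPL-compatible modules. */
        if (sym_table[i].gpl_only &&
            !license_is_gpl_compatible(module_license))
            return NULL;
        return &sym_table[i];
    }
    return NULL; /* unknown symbol */
}
```

In this sketch a module declaring "Proprietary" can still resolve generic_helper but gets NULL for internal_helper, while a module declaring "GPL" resolves both.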

      Nvidia has started working on an open source GPLv2 driver for their GPUs, but since it is not mainline it is also not safe for it to be using EXPORT_SYMBOL_GPL functions.

      Comment


      • Originally posted by oiaohm View Post

        This ends up in a microkernel design. It turns out that with microkernel designs, where drivers live in userspace, you don't end up with major disasters. The internal EXPORT_SYMBOL_GPL functions are not exposed to Linux userspace.

        This is also a mistake on your part. The Linux kernel supports UIO, usermode helpers and so on. To be correct, the Linux kernel's feature set includes writing drivers as closed source user-space programs.



        FOSS victory: that is correct, a user space binary driver is not one. Healthy: that is where things get interesting. Because it is userspace code, it is possible to use emulation and other techniques to keep that code functioning. Yes, containers and other features can be used as well to run old programs.

        Secure implementation: between a binary driver loaded into a monolithic kernel's memory space and a binary driver loaded into a microkernel as a userspace program, there is no debate about which one is more secure and stable. All studies say the userspace program solution is more stable.

        There is a catch here: the userspace/microkernel driver has overhead, so it does not perform as well.

        Points
        1) A closed source Linux user mode driver is in alignment with the Linux kernel's GPLv2 with its exception for user-space. As a result you don't end up using patents without a valid license, or doing other things that get parties upset.
        2) A Linux user mode driver cannot access EXPORT_SYMBOL_GPL functions because those are not exposed to userspace.
        3) A Linux user mode driver can have containers and Linux security module limitations applied to it that a kernel mode driver cannot. So, security-wise, a closed source Linux userspace driver is more secure than a Linux kernel mode driver, provided the correct protections are placed around it.

        Remember that each userspace program has its own memory space, whereas everything in Linux kernel space really shares one memory space.
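For readers who have not seen a UIO driver, the following is a rough sketch of the user-space side, under stated assumptions: a UIO device node already exists (the path is supplied by the caller), its first memory region is map0, and the caller knows the region size. The struct and function names are hypothetical; only open(), mmap(), read(), munmap() and close() are real calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle for one UIO device. */
struct uio_dev {
    int fd;
    volatile uint32_t *regs; /* device registers mapped into this process */
    size_t map_size;
};

/* Opens a UIO node and maps its first memory region (map0).
 * Returns 0 on success, -1 on failure. */
int uio_open(struct uio_dev *dev, const char *path, size_t map_size)
{
    void *p;

    assert(dev != NULL && path != NULL);
    dev->regs = NULL;
    dev->map_size = 0;
    dev->fd = open(path, O_RDWR);
    if (dev->fd < 0)
        return -1;

    /* UIO selects mapping N through an offset of N * page size; N is 0 here. */
    p = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, dev->fd, 0);
    if (p == MAP_FAILED) {
        close(dev->fd);
        dev->fd = -1;
        return -1;
    }
    dev->regs = p;
    dev->map_size = map_size;
    return 0;
}

/* Blocks until the device raises an interrupt. A read() on a UIO node
 * returns a 32-bit running interrupt count.
 * Returns 0 on success, -1 on failure. */
int uio_wait_irq(const struct uio_dev *dev, uint32_t *count)
{
    ssize_t n;

    assert(dev != NULL && count != NULL);
    n = read(dev->fd, count, sizeof(*count));
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}

/* Unmaps the registers and closes the node. */
void uio_close(struct uio_dev *dev)
{
    assert(dev != NULL);
    if (dev->fd >= 0) {
        munmap((void *)dev->regs, dev->map_size);
        close(dev->fd);
        dev->fd = -1;
    }
}
```

Everything here runs in an ordinary process with its own address space, which is the property the points above rely on: the driver logic can be sandboxed, restarted or wrapped without touching kernel memory.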

        So gnattu, a closed source usermode driver is healthy and secure. Even better, with a closed source usermode driver you are not going to upset parties who have licensed patents to the Linux kernel, and whom you may be charging money to use your patents, because you are obeying the GPLv2 with the exception the Linux kernel has.

        With a closed source Linux kernel mode driver you are walking a tightrope, where it is really easy to do the wrong thing and use something you don't have the legal right to. That is some of the reason why these protections are being added.
        You are describing the theoretical implementation and I will show you a real world implementation. Marvell has a mvDmaDrv driver and what it does is to map the DMA access into user-space. This is how user-mode drivers are implemented by some vendors. Such practice is so hacky and insecure that it even requires intel_iommu=off to work on Intel CPUs. And such a kernel driver is still licensed under GPL.

        You keep talking about how important it is to comply with the GPL in the kernel. I agree with you, but all of my posts are explaining how hardware vendors managed to hide their code and still be GPL compatible. My point that "GPL is not helpful in hardware enablement" is not changed by any of your points. The GPL does not prevent vendors from hiding their code, the GPL does not help the kernel receive timely hardware support, and the hacks some vendors use to work around the GPL look horrible and make the driver worse than it should be.

        Comment


        • Originally posted by mdedetrich View Post
          That's not what is meant by a stable in-kernel ABI. A stable ABI is stable across multiple kernel versions.
          Be aware that what you wrote is not correct, because what you asked for exists in the words you used. Of course, I don't think what exists is what you intended by what you wrote.
          EXPORT_SYMBOL is stable within a Stable/LTS kernel branch. So if you go from 6.4.1 to 6.4.13 it will be stable. Yes, this does cross multiple kernel versions.
          EXPORT_SYMBOL_GPL, on the other hand, comes with no promise that a function will behave the same at all when going from 6.4.1 to 6.4.13 within a Stable/LTS kernel branch.

          You want a stable kernel ABI across multiple kernel branches, and that is a different thing. You missed that a stable driver ABI across multiple kernel versions does exist with Linux; it just does not cross between kernel branches.

          Yes, to do what you are talking about would need another EXPORT_SYMBOL variant, called something like EXPORT_SYMBOL_STABLE, that would be a subset of EXPORT_SYMBOL tagged as not to be changed between major versions.

          The protections that are being implemented around EXPORT_SYMBOL_GPL to prevent it from being used would also have to be implemented for the new EXPORT_SYMBOL_STABLE, to prevent modules that use EXPORT_SYMBOL_STABLE from using functions not in the list.

          Then you will have those writing drivers complaining about degraded performance, as wrappers have to be implemented to keep legacy EXPORT_SYMBOL functions around so as not to break the stable ABI.

          Fun part here, mdedetrich: EXPORT_SYMBOL_STABLE used to exist in the Linux kernel back in the 1990s and early 2000s. Parties like Nvidia back then complained about the lack of performance and would use EXPORT_SYMBOL instead, to the point that the Linux kernel developers gave up on the idea. People forget that the kernel modules and the core kernel used to be two different tar files and you used to be able to mix and match.

          There are leftovers of this in the kernel, by the way.
          https://cateee.net/lkddb/web-lkddb/MODVERSIONS.html (this is from 2006). Yes, back in the day, when parties played by the rules, it was possible to use binary kernel drivers across multiple major Linux kernel releases.
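The idea behind MODVERSIONS can be sketched as follows: each exported symbol carries a checksum derived from its prototype, a module records the checksums it was built against, and a mismatch at load time signals an incompatible interface. The code below is a simplified user-space model of that comparison only; the struct, the enum and the function name are assumptions, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one exported symbol and the checksum of its prototype. */
struct versioned_sym {
    const char *name;
    uint32_t crc;
};

enum ver_result { VER_OK, VER_MISMATCH, VER_UNKNOWN };

/* Compares the checksum a module recorded at build time with the one the
 * running kernel carries for the same symbol. */
enum ver_result check_symbol_version(const struct versioned_sym *kernel_syms,
                                     size_t count,
                                     const char *name,
                                     uint32_t module_crc)
{
    size_t i;

    assert(kernel_syms != NULL && name != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(kernel_syms[i].name, name) == 0)
            return kernel_syms[i].crc == module_crc ? VER_OK : VER_MISMATCH;
    }
    return VER_UNKNOWN; /* the kernel no longer exports this symbol */
}
```

A binary module keeps loading across releases only for as long as every symbol it uses still returns VER_OK, which is why it matters which subset of exports an out-of-tree driver depends on.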

          Maybe we can return to that in future by extending these protections against illicit behavior by driver makers, so that non-mainline drivers are restricted to a subset of exported functions. Remember that it is illicit usage causing stability problems that explains why Linux today does not have the feature you want. Yes, Nvidia is one of the parties who, in the 2000s, undermined the functionality you want through misuse.

          Comment


          • Originally posted by mSparks View Post
            so not a nvidia issue then, in fact completely the opposite of a nvidia issue since that only applies to GPL software that is capable of using them in the first place.
            That is not true.

            "Christoph's new fix is done by clarifing __symbol_get() was only ever intended to prevent module reference loops by Linux kernel modules and so making it only find symbols exported via EXPORT_SYMBOL_GPL(). The circumvention tactic used by Nvidia was to use symbol_get() to purposely swift through proprietary module symbols and completley bypass our traditional EXPORT_SYMBOL*() annotations and community agreed upon restrictions."
            Read the very start of this

            Nvidia has been using stuff that is tagged GPL only.
            This has resulted in:
            1) Stability problems that should not exist, because the Nvidia driver should not be using anything tagged EXPORT_SYMBOL_GPL.
            2) Different companies getting really upset that patents they agreed could be used under GPLv2 are being used by an Nvidia closed source module that is not GPLv2.
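The change described in the quote can be pictured with a small user-space model: a lookup that used to hand back any exported symbol now only hands back GPL-only exports. This is a sketch, not the kernel's __symbol_get(); the export table, the symbol names and the gpl_exports_only switch are assumptions introduced purely to show the before and after behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one export. */
struct export_entry {
    const char *name;
    bool gpl_only;    /* true: EXPORT_SYMBOL_GPL, false: EXPORT_SYMBOL */
    const void *addr;
};

/* Stand-ins for real symbol addresses. */
static const int shim_entry_point = 0;
static const int helper_entry_point = 0;

static const struct export_entry exports[] = {
    { "proprietary_shim", false, &shim_entry_point },
    { "gpl_helper", true, &helper_entry_point },
};

/* gpl_exports_only = false models the old lookup, true the restricted one. */
const void *symbol_get_sketch(const char *name, bool gpl_exports_only)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(exports) / sizeof(exports[0]); i++) {
        if (strcmp(exports[i].name, name) != 0)
            continue;
        /* The restricted lookup ignores plain EXPORT_SYMBOL exports. */
        if (gpl_exports_only && !exports[i].gpl_only)
            return NULL;
        return exports[i].addr;
    }
    return NULL;
}
```

With the restriction on, a GPL-declared module can no longer use the lookup to reach a symbol that a proprietary module exported without the GPL marker, which is the bridge the quoted text describes.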

            Comment


            • Originally posted by oiaohm View Post

              That is not true.


              Read the very start of this

              Nvidia has been using stuff that is tagged GPL only.
              How do you think an export they don't use in production resulted in anything happening in production?

              by importing exports from their proprietary modules into an allegedly GPL licensed module
              The Nvidia closed source kernel driver everyone uses "taints the kernel".
              The GPL stuff is "opt in" and new and marked as alpha quality
              The first open-source release of GPU kernel modules for the Linux community helps improve NVIDIA GPU driver quality and security.


              In this open-source release, support for GeForce and Workstation GPUs is alpha-quality
              Last edited by mSparks; 30 August 2023, 05:05 PM.

              Comment


              • Originally posted by gnattu View Post
                You are describing the theoretical implementation and I will show you a real world implementation. Marvell has a mvDmaDrv driver and what it does is to map the DMA access into user-space. This is how user-mode drivers are implemented by some vendors. Such practice is so hacky and insecure that it even requires intel_iommu=off to work on Intel CPUs. And such a kernel driver is still licensed under GPL.
                The Marvell driver is kind of a broken piece of work. A UIO driver doing the same things does not require intel_iommu=off.



                Yes, if you wrap the Marvell mvDmaDrv driver so that the required VFIO housekeeping is done, you can leave intel_iommu on.

                Yes, a Linux kernel mode driver that does not do the VFIO housekeeping when using DMA can also leave you needing to turn intel_iommu off so that it works.

                Originally posted by gnattu View Post
                hacks to workaround GPL from some vendors looks horrible and makes the driver worse than it should be.
                The reality here is that if a vendor is going to do their driver wrong, they will do it wrong whether they write a usermode driver or a kernel mode driver. If the vendor does the driver wrong, you will need to apply workarounds so it works correctly. A kernel mode driver is not suited to applying workarounds, whereas with usermode ones there are many ways to add extra code to fix issues if the will is there. This is all because userspace drivers live in their own memory space, which makes fixing them far more feasible without rebuilding them.

                Yes, that hacky Marvell driver does not end up using anything that is not allowed by license. Using UIO with the IOMMU enabled is possible if the correct things are done.

                Please note there are bigger hacky things out there, like usermode network drivers for Linux, that do all the things the Marvell mvDmaDrv does, yes, DMA, interrupts and everything else, with the Intel IOMMU on, because they make all the correct usage declarations to VFIO.

                The reality here, speaking as a person who has worked with hacky usermode network drivers over the years, is that I have seen many times worse than that mvDmaDrv. Yes, with really bad drivers, being in userspace means there are many things you can do to make them function more correctly.

                Comment


                • Originally posted by mSparks View Post
                  How do you think an export they don't use in production resulted in anything happening in production?
                  They did use it in production.

                  "The circumvention tactic used by Nvidia was to use symbol_get() to purposely swift through proprietary module symbols and completley bypass our traditional EXPORT_SYMBOL*() annotations and community agreed upon restrictions."

                  It is noted in the first post. Nvidia got caught with their hand in the cookie jar bypassing the EXPORT_SYMBOL_GPL check. They really got caught when hard-drive data corruption turned up because of it.

                  Something to be aware of is that, so far, every time Nvidia has been caught they have ended up just using a new bypass method. Maybe this time they have learnt their lesson, or maybe not. Yes, if Nvidia has not learnt their lesson and we end up in a DMCA court case, the risk is that you lose all the closed source drivers.

                  Please note the recent project to open source their Nvidia driver started mostly as an attempt to make IBM/Red Hat happy after being caught with their hand in the cookie jar again.

                  Also note Nvidia has been caught so far over 20 times using EXPORT_SYMBOL_GPL items when they should not be. So at this point I am kind of not trusting that they are doing the right thing, as doing the right thing with EXPORT_SYMBOL_GPL items does not align with NVIDIA's track record. Yes, multiple times in a single year at times as well.

                  Comment


                  • Originally posted by oiaohm View Post

                    They did use it in production.

                    "The circumvention tactic used by Nvidia was to use symbol_get() to purposely swift through proprietary module symbols and completley bypass our traditional EXPORT_SYMBOL*() annotations and community agreed upon restrictions."

                    It is noted in the first post. Nvidia got caught with their hand in the cookie jar bypassing the EXPORT_SYMBOL_GPL check. They really got caught when hard-drive data corruption turned up because of it.

                    Something to be aware of is that, so far, every time Nvidia has been caught they have ended up just using a new bypass method. Maybe this time they have learnt their lesson, or maybe not. Yes, if Nvidia has not learnt their lesson and we end up in a DMCA court case, the risk is that you lose all the closed source drivers.

                    Please note the recent project to open source their Nvidia driver started mostly as an attempt to make IBM/Red Hat happy after being caught with their hand in the cookie jar again.

                    Also note Nvidia has been caught so far over 20 times using EXPORT_SYMBOL_GPL items when they should not be. So at this point I am kind of not trusting that they are doing the right thing, as doing the right thing with EXPORT_SYMBOL_GPL items does not align with NVIDIA's track record. Yes, multiple times in a single year at times as well.
                    Lovely theory save one problem.

                    An actual link to the source code of this module "bypassing the EXPORT_SYMBOL_GPL check" would make it less of a bullshit theory.
                    Heck, even a binary would be sufficient.

                    Their kernel-modules stuff is here:
                    NVIDIA Linux open GPU kernel module source. Contribute to NVIDIA/open-gpu-kernel-modules development by creating an account on GitHub.


                    only place it seems to be used is
                    export_symbol_gpl_conftest() {
                        #
                        # Check Module.symvers to see whether the given symbol is present and its
                        # export type is GPL-only (including deprecated GPL-only symbols).
                        #
                        SYMBOL="$1"
                        TAB='	'
                        if grep -e "${TAB}${SYMBOL}${TAB}.*${TAB}EXPORT_\(UNUSED_\)*SYMBOL_GPL\$" \
                                   "$OUTPUT/Module.symvers" >/dev/null 2>&1; then
                            echo "#define NV_IS_EXPORT_SYMBOL_GPL_$SYMBOL 1" |
                                append_conftest "symbols"
                        else
                            # May be a false negative if Module.symvers is absent or incomplete,
                            # or if the Module.symvers format changes.
                            echo "#define NV_IS_EXPORT_SYMBOL_GPL_$SYMBOL 0" |
                                append_conftest "symbols"
                        fi
                    }

                    and
                    nv_set_dma_address_size(
                        nv_state_t *nv,
                        NvU32 phys_addr_bits
                    )
                    {
                        nv_linux_state_t *nvl = NV_GET_NVL_FROM_NV_STATE(nv);
                        NvU64 start_addr = nv_get_dma_start_address(nv);
                        NvU64 new_mask = (((NvU64)1) << phys_addr_bits) - 1;
                        nvl->dma_dev.addressable_range.limit = start_addr + new_mask;
                        /*
                         * The only scenario in which we definitely should not update the DMA mask
                         * is on POWER, when using TCE bypass mode (see nv_get_dma_start_address()
                         * for details), since the meaning of the DMA mask is overloaded in that
                         * case.
                         */
                        if (!nvl->tce_bypass_enabled)
                        {
                            dma_set_mask(&nvl->pci_dev->dev, new_mask);
                            /* Certain kernels have a bug which causes pci_set_consistent_dma_mask
                             * to call GPL sme_active symbol, this bug has already been fixed in a
                             * minor release update but detect the failure scenario here to prevent
                             * an installation regression */
                    #if !NV_IS_EXPORT_SYMBOL_GPL_sme_active
                            dma_set_coherent_mask(&nvl->pci_dev->dev, new_mask);
                    #endif
                        }
                    }
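For anyone who wants to see what the conftest grep quoted earlier in this post actually tests, here is a rough C equivalent of its pattern applied to a single Module.symvers line. It assumes the line has no trailing newline; the function name is made up, and any sample lines fed to it are illustrative rather than taken from a real kernel build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the pattern TAB SYMBOL TAB .* TAB EXPORT_(UNUSED_)*SYMBOL_GPL
 * anchored at the end of the line. */
bool symvers_line_is_gpl_only(const char *line, const char *symbol)
{
    const char *p, *tab_after_sym, *last_tab, *tail;
    size_t sym_len;

    assert(line != NULL && symbol != NULL);
    sym_len = strlen(symbol);

    /* Find the first "<TAB>symbol<TAB>" in the line. */
    for (p = strchr(line, '\t'); p != NULL; p = strchr(p + 1, '\t')) {
        if (strncmp(p + 1, symbol, sym_len) == 0 && p[1 + sym_len] == '\t')
            break;
    }
    if (p == NULL)
        return false;
    tab_after_sym = p + 1 + sym_len;

    /* The export type is the last field and must come after another TAB. */
    last_tab = strrchr(line, '\t');
    if (last_tab <= tab_after_sym)
        return false;

    tail = last_tab + 1;
    if (strncmp(tail, "EXPORT_", 7) != 0)
        return false;
    tail += 7;
    while (strncmp(tail, "UNUSED_", 7) == 0) /* zero or more UNUSED_ */
        tail += 7;
    return strcmp(tail, "SYMBOL_GPL") == 0;
}
```

The result only feeds a build-time 0/1 define such as NV_IS_EXPORT_SYMBOL_GPL_sme_active, which the C function above then uses to decide whether to call dma_set_coherent_mask().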

                    So this applies to configuration testing (conftest) and an alternate way of fixing a bug in certain kernel versions. That's a f'ing looooooong way from most of the claims in here, and light years off your hyperbole.
                    Last edited by mSparks; 30 August 2023, 06:43 PM.

                    Comment


                    • Originally posted by jorgepl View Post

                      Removing the GPL_ONLY thing is absolutely not relicensing anything, it's removing barriers, and these are not in any way an infringement (except for the part mSparks said which might -or might not- be arguable in court). But it's absolutely by no means relicensing anything.
                      The only reason you'd ever need to remove _GPL_ONLY is to get the module loader to not error out when a module that doesn't report itself as GPL attempts to load and use that symbol.

                      It changes the behavior of the linker/loader, which is a license issue.

                      Further, while the tag itself is not a license, it's a very clear indicator that those who are most familiar with the module believe use of the symbol to be covered by the GPL. Disregard at your peril, as intentional/willful infringement is expensive.

                      Comment
