
David Airlie Moves Toward Upstreaming Soft FP64 Support In Mesa



    Phoronix: David Airlie Moves Toward Upstreaming Soft FP64 Support In Mesa

    Work has been going on for years on "soft" FP64 support, which emulates the double-precision floating-point data types on GPUs that don't inherently support them in hardware. The soft support would allow some older GPUs to advertise OpenGL 4.0+ support now that ARB_gpu_shader_fp64 could be enabled. That day looks like it's finally coming for mainline Mesa...

    http://www.phoronix.com/scan.php?pag...gling-Upstream

  • #2
    Great news. Thanks to David Airlie and helpers for their good work, and many thanks also to Michael Larabel for reporting all this good news for us Linux users with older AMD graphics cards.

    Comment


    • #3
      Great news indeed!

      But if the Radeon HD 5000/6000 series GPUs fully support OpenGL 4.3 with the exception of FP64 in hardware, doesn't that mean that in theory, a Vulkan driver could be written for this generation of hardware as well? The Vulkan specifications do not explicitly require hardware FP64 capability, do they? This might be useful for supporting this hardware in the long term. Wasn't support for Compute Shaders and OpenGL ES 3.1 enough to support Vulkan as well?

      But regardless of that, impressive work, David Airlie!

      Comment


      • #4
        OMG. Please get this asap to the masses in a Mesa release!
        That should be the last item needed for a lot of chips to be able to officially advertise OpenGL 4.x support in the driver.
        Stop TCPA, stupid software patents and corrupt politicians!

        Comment


        • #5
          I still don't understand whether the CPU is doing the work or the GPU. If the CPU is doing it, I imagine the performance would be so bad that you'd be better off doing strictly CPU calculations (assuming you're doing OpenCL; I'm not aware of any OpenGL programs that use FP64), because you'd be wasting a lot of time communicating over the PCIe bus. If the GPU itself is emulating FP64, I'm not exactly sure how that's achievable, but I could definitely see the benefit in that.

          Does Airlie intend to improve upon OpenCL for R600? Though I certainly appreciate his efforts, I don't quite understand what his plans are. I have this old FirePro card (based on the HD 6670) that I use for BOINC, and pretty much the only workunit within my interest that it can handle without failure is SETI. It'd be great if I didn't have to blacklist it from other projects, but I'm not sure if it can be used with open-source drivers.

          Comment


          • #6
            Originally posted by schmidtbag View Post
            If the GPU itself is emulating FP64, I'm not exactly sure how that's achievable
            Just as arbitrary-precision floating point is achievable, or floating point on CPUs without an FPU is achievable.

            Comment


            • #7
              Originally posted by schmidtbag View Post
              I still don't understand whether the CPU is doing the work or the GPU. If the CPU is doing it, I imagine the performance would be so bad that you'd be better off doing strictly CPU calculations (assuming you're doing OpenCL; I'm not aware of any OpenGL programs that use FP64), because you'd be wasting a lot of time communicating over the PCIe bus. If the GPU itself is emulating FP64, I'm not exactly sure how that's achievable, but I could definitely see the benefit in that.

              Does Airlie intend to improve upon OpenCL for R600? Though I certainly appreciate his efforts, I don't quite understand what his plans are. I have this old FirePro card (based on the HD 6670) that I use for BOINC, and pretty much the only workunit within my interest that it can handle without failure is SETI. It'd be great if I didn't have to blacklist it from other projects, but I'm not sure if it can be used with open-source drivers.
              The work is done on the GPU. It's achieved by splitting one 64-bit instruction into multiple 32-bit instructions and combining their results. That's why it's slower: you need to execute 3, 4, 5, or more instructions instead of a single one to get the same result. How many depends on the operation, but it's at least 3 (two 32-bit instructions plus combining their results). So the best case is 3 instructions instead of one; many cases require more.

              It's pretty much like it's described in the article. It's only done because it is needed for claiming OpenGL >= 4.0 support. Nobody uses it in games, so performance doesn't matter, but a lot of games want OpenGL >= 4.0. With this enabled you jump from 3.3 straight to 4.4 support for these cards.

              Comment


              • #8
                Originally posted by pal666 View Post
                Just as arbitrary-precision floating point is achievable, or floating point on CPUs without an FPU is achievable.
                GPUs are structured very differently, so I'm not sure it's that simple. For example, using FP16 on a GPU that doesn't have the hardware for it will not run better than FP32. There are "half-precision" GPUs out there that are significantly faster at FP16 than FP32. Meanwhile, CPUs are a lot more flexible, and will actually run faster when you lower the precision.
                I'm not entirely sure how GPUs are programmed at lower levels, but when you develop a multi-threaded program on a CPU, each thread operates independently and doesn't inherently share data with other threads (whether they're related or not). You can make them share it, but that costs performance (I assume because the threads have to spend extra time synchronizing with each other). What I'm getting at is that if a GPU's individual cores are designed for FP32, each "thread" (if that's the right term) can't just allocate a wider register to hold FP64.
                Last edited by schmidtbag; 03-12-2018, 10:50 AM.

                Comment


                • #9
                  Originally posted by droste View Post
                  The work is done on the GPU. It's achieved by splitting one 64-bit instruction into multiple 32-bit instructions and combining their results. That's why it's slower: you need to execute 3, 4, 5, or more instructions instead of a single one to get the same result. How many depends on the operation, but it's at least 3 (two 32-bit instructions plus combining their results). So the best case is 3 instructions instead of one; many cases require more.
                  I had a feeling they just combined multiple instructions, but I was confused how that would work since each core runs in parallel. If it takes multiple executions, though, that makes more sense. Anyway, thanks for the clarification.
                  It's pretty much like it's described in the article.
                  I don't see where in the article it answers my question, which is why I asked it. I'm aware of why FP64 is needed, but the article didn't get into detail about how it was done.

                  Comment


                  • #10
                    Sounds like the time is coming when people can bump the minimum expected GL version from 3.3 up to 4.3.

                    Comment
