Radeon Gallium3D OpenCL Is Coming Close

  • Radeon Gallium3D OpenCL Is Coming Close

    Phoronix: Radeon Gallium3D OpenCL Is Coming Close

    Following the OpenCL Gallium3D state tracker having been merged into Mesa earlier this week, the open-source Radeon OpenCL support is coming close...

    http://www.phoronix.com/vr.php?view=MTEwMjU

  • #2
    Are there any apps that work with it, or is it too early for that? (e.g. darktable, I think, uses OpenCL)

    Awesome work anyway.



    • #3
      It would be really good to know how we are going to move forward from here as far as what IR language will be used in Gallium. Right now we have TGSI being used everywhere except Radeon compute, but there were all these plans to replace the entire IR stack with LLVM. This would be a start, I guess, but is that what we really want? We need to stick with something, though; that was kind of the entire point of Gallium, to try to unify as much of the graphics stack between the different drivers as we can.



      • #4
        Originally posted by 89c51 View Post
        Are there any apps that work with it, or is it too early for that? (e.g. darktable, I think, uses OpenCL)

        Awesome work anyway.
        I'm curious too. I moved my A8-3850 to my HTPC because it was too finicky on my Ubuntu machine. I'd love to leverage the GPU on it without resorting to AMD's drivers; all I'd really like to see is some better video encoding.

        Then again, Quicksync on Linux would be nice for my i3 too.



        • #5
          Originally posted by benjamin545 View Post
          It would be really good to know how we are going to move forward from here as far as what IR language will be used in Gallium. Right now we have TGSI being used everywhere except Radeon compute, but there were all these plans to replace the entire IR stack with LLVM. This would be a start, I guess, but is that what we really want? We need to stick with something, though; that was kind of the entire point of Gallium, to try to unify as much of the graphics stack between the different drivers as we can.
          I'm not sure "in gallium" is the right boundary. There's also the GLSL IR vs TGSI question.

          At the time we started working on GPU compute, clover used LLVM exclusively, so that was the obvious choice, particularly since we were looking at LLVM for the SI shader compiler...

          Francisco then refactored clover as a Gallium3D state tracker so TGSI was the obvious choice, particularly since the nouveau folks were *not* looking at LLVM at the time....

          I imagine the Intel devs are looking at compute with IB and for them GLSL IR will presumably be the obvious choice, since they just finished designing and implementing it...

          At last year's AFDS we talked about a proposed common IR for GPU/CPU compute we were going to support as part of our Heterogeneous System Architecture (HSA, previously FSA) initiative, so *that* might seem like an obvious choice if we get it right....

          When we discussed this at the last XDS it seemed there hadn't been enough work done with compute to have a clear consensus on what IR should be used between GLSL IR, TGSI, LLVM, FSAIL/HSAIL or something else. We concluded that the best thing for now would be for all of us to get some experience running real world compute workloads on GPUs using whatever IR was most convenient then get back together in 12-18 months and try to converge on a single IR for graphics and compute.

          There was probably an unspoken hope that we would get convergence on graphics IR first.
          Last edited by bridgman; 05-13-2012, 11:57 AM.



          • #6
            Originally posted by MonkeyPaw View Post
            I'm curious too. I moved my A8-3850 to my HTPC because it was too finicky on my Ubuntu machine. I'd love to leverage the GPU on it without resorting to AMD's drivers; all I'd really like to see is some better video encoding.

            Then again, Quicksync on Linux would be nice for my i3 too.
            Every single OpenCL/dedicated-block encoder so far has had terrible quality, including QuickSync and whatnot. Just curious: what are you doing that needs the speed more than the quality?



            • #7
              Originally posted by curaga View Post
              Every single OpenCL/dedicated-block encoder so far has had terrible quality, including QuickSync and whatnot. Just curious: what are you doing that needs the speed more than the quality?
              I do various encodes, from HD home videos for upload to taking recorded TV (in HD) and encoding it to H.264 with x264. Last I heard, QuickSync had the best image quality compared to the other GPU offloads, and it's ridiculously fast.



              • #8
                @bridgman

                Ah, so basically, we (we as in you; don't you just love it when people say "we" but aren't really themselves part of the "we"?) have no clue how demanding compute will be on the IR, or in what way the IR will need to bend to operate effectively.

                Can you tell me this, then: I know we often hear about the IR languages, probably more than about any other component of the graphics stack below the actual end-user APIs, but how inconvenient is it really to switch from one to the other? In a game engine it would be a real job to change from OpenGL to DirectX, or even from OpenGL to OpenGL ES if done certain ways, but how much of a bother would it be for you to change the AMD compute back end from LLVM over to TGSI if that were the more unified approach?

                Also, what are the chances that someone will start slinging GCC's IR in there as an option, what with their plans to make a competing IR more like what LLVM has?



                • #9
                  Originally posted by benjamin545 View Post
                  Ah, so basically, we (we as in you; don't you just love it when people say "we" but aren't really themselves part of the "we"?) have no clue how demanding compute will be on the IR, or in what way the IR will need to bend to operate effectively.
                  The problem is that AFAIK essentially all of the "serious" GPU compute experience has been in proprietary stacks so far, generally using proprietary IRs. The source programming languages and runtime environments are evolving as well, which makes it even harder to leverage existing experience.

                  Originally posted by benjamin545 View Post
                  Can you tell me this, then: I know we often hear about the IR languages, probably more than about any other component of the graphics stack below the actual end-user APIs, but how inconvenient is it really to switch from one to the other? In a game engine it would be a real job to change from OpenGL to DirectX, or even from OpenGL to OpenGL ES if done certain ways, but how much of a bother would it be for you to change the AMD compute back end from LLVM over to TGSI if that were the more unified approach?
                  So far I haven't seen much in the way of *changing* IRs... it's more common to just translate from the new IR to whatever was being used before. If you look at the r3xx driver as an example, it was written around Mesa IR and when TGSI was introduced the developers added a TGSI-to-Mesa IR translator at the front of the Gallium3D driver and kept the existing shader compiler code.

                  This wasn't a matter of inertia, though: some of the IRs are structured as trees or linked lists which a compiler can work on directly (eg in optimization steps), while others like TGSI are "flat" and intended for exchange between components rather than as an internal representation worked on directly by the compiler.
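
                  To make the flat-vs-structured distinction concrete, here is a small C sketch. These are hypothetical types for illustration only, not the real TGSI or GLSL IR/LLVM IR definitions: the flat form is just an array of self-contained instructions that is easy to serialize and pass between components, while the structured form keeps an expression tree that an optimizer can pattern-match and rewrite in place.

                  ```c
                  /* Illustration only -- NOT the actual Mesa/Gallium IR data structures. */

                  /* "Flat" IR: a linear array of instructions.  Trivial to serialize and
                   * hand across a component boundary, but an optimizer has to rebuild
                   * use/def information before it can do much with it. */
                  enum flat_opcode { FLAT_MOV, FLAT_ADD, FLAT_MUL };

                  struct flat_inst {
                      enum flat_opcode op;
                      int dst;        /* destination register index */
                      int src[2];     /* source register indices */
                  };

                  struct flat_shader {
                      struct flat_inst *insts;
                      unsigned num_insts;
                  };

                  /* "Structured" IR: an expression tree (or SSA graph).  A compiler can
                   * rewrite nodes directly, e.g. folding a * 1.0 into a, but shipping it
                   * between components needs an explicit encode/decode step. */
                  enum node_kind { NODE_CONST, NODE_INPUT, NODE_ADD, NODE_MUL };

                  struct ir_node {
                      enum node_kind kind;
                      float value;               /* used when kind == NODE_CONST */
                      struct ir_node *left, *right;
                  };

                  /* The kind of peephole optimization the tree form makes easy: */
                  static struct ir_node *fold_mul_by_one(struct ir_node *n)
                  {
                      if (n && n->kind == NODE_MUL && n->right &&
                          n->right->kind == NODE_CONST && n->right->value == 1.0f)
                          return n->left;                   /* a * 1.0  ->  a */
                      return n;
                  }
                  ```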

                  That breaks the problem down into two parts:

                  1. Should the IR be something suitable for direct use by compiler internals, or should it be something designed primarily for transmittal between driver components?

                  The advantage of something "flat" like TGSI or AMDIL is that it is relatively independent of compiler internals. The disadvantage is that all but the simplest compilers will require a more structured IR internally, so translation to and from TGSI will be required at each component boundary. Complicating the analysis is that, while the extra translations seem like they would slow things down, they only slow down the compilation step, not the runtime execution. Compilation does not usually happen every time the shader is run; at minimum it happens once at program startup, with recompilation sometimes needed when state info that affects shader code changes or when the driver's cache of compiled shaders fills up.
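
                  As a rough sketch of why that cost mostly lands at startup rather than per frame, a driver typically keeps a cache of compiled shader variants keyed by the state that affects code generation, so the IR translation and compilation only happen on a cache miss. The names below are made up for illustration, not taken from the Mesa source:

                  ```c
                  #include <stdlib.h>
                  #include <string.h>

                  /* Hypothetical shader-variant cache: compile once per (shader, state key),
                   * then reuse the machine code on every later draw call. */
                  struct state_key { int alpha_test; int flat_shade; };  /* simplified */

                  struct variant {
                      struct state_key key;
                      const void *machine_code;
                      struct variant *next;
                  };

                  struct shader {
                      const char *ir;             /* e.g. TGSI tokens in a real driver */
                      struct variant *variants;   /* previously compiled variants */
                  };

                  /* Supplied by the hardware back end (translate IR, run the compiler). */
                  extern const void *backend_compile(const char *ir, const struct state_key *k);

                  static const void *get_compiled(struct shader *s, const struct state_key *k)
                  {
                      for (struct variant *v = s->variants; v; v = v->next)
                          if (memcmp(&v->key, k, sizeof(*k)) == 0)
                              return v->machine_code;       /* hit: no recompilation */

                      /* miss: pay the translation + compilation cost once */
                      struct variant *v = malloc(sizeof(*v));
                      v->key = *k;
                      v->machine_code = backend_compile(s->ir, k);
                      v->next = s->variants;
                      s->variants = v;
                      return v->machine_code;
                  }
                  ```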

                  If the choice is something "flat" then TGSI is probably the most likely choice for the open source stacks. If a flat IR is *not* chosen, then we get to question 2...

                  2. Assuming a structured IR is used, which one should be used?

                  This is where GLSL IR and LLVM IR enter the picture, and where the choice of shader compiler internals becomes a factor.

                  For graphics, the Intel devs were talking about feeding GLSL IR directly into the HW shader compiler.

                  Before you say "that's weird", remember that the high level compiler in Mesa (the "OpenGL state tracker") generates GLSL IR directly, which is then converted into TGSI or Mesa IR for use by the HW-layer drivers, so using GLSL IR bypasses some translation steps. For graphics, "classic" HW drivers use Mesa IR today while "Gallium3D" HW drivers use TGSI. The bottom line is that when you run a GL program on any open source driver the shader starts as GLSL IR and then optionally gets translated to something else.

                  Clover, on the other hand, starts with Clang which generates LLVM IR directly, so the kernel starts as LLVM IR then gets optionally translated to something else.
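
                  From an application's point of view, that CL path looks something like the minimal host sketch below: the OpenCL C source handed to clCreateProgramWithSource() is what clover feeds to Clang, and the Clang -> LLVM IR (-> whatever the driver wants) compilation happens inside clBuildProgram(). This is standard OpenCL API usage, shown here only to mark where the IR work happens, and untested against the new Gallium3D state tracker.

                  ```c
                  #include <CL/cl.h>
                  #include <stdio.h>

                  static const char *src =
                      "__kernel void scale(__global float *buf, float k) {\n"
                      "    buf[get_global_id(0)] *= k;\n"
                      "}\n";

                  int main(void)
                  {
                      cl_platform_id platform;
                      cl_device_id device;
                      cl_int err;

                      clGetPlatformIDs(1, &platform, NULL);
                      clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

                      cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
                      cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);

                      /* The compile step: OpenCL C -> (Clang) -> LLVM IR -> driver IR/ISA. */
                      err = clBuildProgram(prog, 1, &device, "", NULL, NULL);
                      printf("clBuildProgram returned %d\n", (int)err);

                      clReleaseProgram(prog);
                      clReleaseContext(ctx);
                      return 0;
                  }
                  ```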

                  Once you get down to the HW driver, the shader compiler is likely to need a structured IR such as GLSL IR or LLVM IR. You can see where this is going...

                  Originally posted by benjamin545 View Post
                  Also, what are the chances that someone will start slinging GCC's IR in there as an option, what with their plans to make a competing IR more like what LLVM has?
                  I doubt that gcc will get plumbed into the existing GL/CL driver stack but it seems pretty likely that gcc *will* end up generating GPU shader code and that runtime stacks will exist to get that code running on hardware. This may already have happened although I haven't seen anyone do it yet.
                  Last edited by bridgman; 05-13-2012, 08:13 PM.



                  • #10
                    Well, then it seems like the obvious answer is that if we can't have a structured IR that is also easily transportable between components (the best of both worlds), then we have to use the right IR at the right time for the right job.

                    I guess you have to take a step back and ask what the big picture is, what it is we want. Regarding Gallium3D (and I know that excludes Intel and ancient stuff, but what can you really do about that), what we want is a strong central structure that interconnects various pieces that each do a specific job (here's an OpenCL state tracker, here's an NVIDIA generation-X driver, here's a Windows XP winsys connector). That is what Gallium3D was billed as, though it was intended initially and primarily for the Linux ecosystem, even if it wasn't locked into that specific role.

                    So in the Linux ecosystem we have some paid hardcore developers and we have a lot of hobbyists. Hobbyists will never, individually and on their own, design a modern graphics driver that's competitive with today's standards, and that's OK. As we have seen in the Linux graphics stack over the past few years, paid hardcore developers have come a long way in creating a very competitive graphics stack, but we really want hobbyists to be a part of that too, and while some are, I think a lot of people who are willing and possibly able to contribute still feel overwhelmed by the complexity of it all.

                    Getting more to the point, I guess: if TGSI is a simpler IR to transport between the various components, then as a newcomer wanting to develop a component I would find it easier to deal with TGSI. If it's then necessary to convert it to something more specific to what I'm doing (which is what I've been hearing all along, that it's too hard to create one all-encompassing IR that is perfect for all state trackers and all hardware drivers), then that is what would have to be done. At least then you could make your internal IR something specific to your hardware; for instance, I'm sure the nvfx/nv30 driver, with its non-unified shader cores, is much different from nv50 or nvc0 or whatever.

                    It would be best if other parts of Gallium had that same kind of mentality. For instance, memory management is one area where Gallium was initially sold as being able to abstract memory management completely into the winsys portion of the driver, but what I've read is that a lot of the memory management has been implemented in the hardware drivers, usually due to some feature missing from Gallium or it just being easier for whoever is doing it to do it in the driver (I'm guessing a lot of that comes from the initial testing and learning stages).



                    • #11
                      Originally posted by benjamin545 View Post
                      So in the Linux ecosystem we have some paid hardcore developers and we have a lot of hobbyists. Hobbyists will never, individually and on their own, design a modern graphics driver that's competitive with today's standards, and that's OK. As we have seen in the Linux graphics stack over the past few years, paid hardcore developers have come a long way in creating a very competitive graphics stack, but we really want hobbyists to be a part of that too, and while some are, I think a lot of people who are willing and possibly able to contribute still feel overwhelmed by the complexity of it all.
                      This is one of the interesting tradeoffs. Do you want the driver to be simple and accessible so more people can potentially contribute, or do you want it to be sufficiently sophisticated that it could potentially match the performance of proprietary drivers, at the cost of reducing the pool of potential contributors?

                      The current open source driver standards seem to be aimed at the knee of the curve, where they're sufficiently complex to allow writing "fairly performant" drivers without becoming "twice as complex for a small increase in performance". Seems like a good compromise, but it's important to understand that it *is* a compromise.

                      As an example, the open source drivers have a relatively large amount of common code and a relatively small amount of HW-specific code but if you want to get that last 20% of potential performance you generally need to move the line up and have substantially more of the driver stack being hardware-specific. That makes the driver code larger and more complex, which in turn makes it a lot harder for potential developers to contribute.

                      Originally posted by benjamin545 View Post
                      Getting more to the point, I guess: if TGSI is a simpler IR to transport between the various components, then as a newcomer wanting to develop a component I would find it easier to deal with TGSI. If it's then necessary to convert it to something more specific to what I'm doing (which is what I've been hearing all along, that it's too hard to create one all-encompassing IR that is perfect for all state trackers and all hardware drivers), then that is what would have to be done. At least then you could make your internal IR something specific to your hardware; for instance, I'm sure the nvfx/nv30 driver, with its non-unified shader cores, is much different from nv50 or nvc0 or whatever.
                      IR is affected both by hardware characteristics and choice of compiler frameworks being used. If everyone settles on a single compiler framework then that IR will probably win -- otherwise TGSI will probably get extended so that it can serve as a common language between the different compiler stacks. The interesting question is whether it will be noticeably faster to convert directly from one structured IR to another, or whether going through a common "flat" IR will be close enough in performance that the benefits outweigh the costs.

                      Originally posted by benjamin545 View Post
                      It would be best if other parts of Gallium had that same kind of mentality. For instance, memory management is one area where Gallium was initially sold as being able to abstract memory management completely into the winsys portion of the driver, but what I've read is that a lot of the memory management has been implemented in the hardware drivers, usually due to some feature missing from Gallium or it just being easier for whoever is doing it to do it in the driver (I'm guessing a lot of that comes from the initial testing and learning stages).
                      The winsys layer was supposed to *abstract* things like memory management, not *implement* them. The implementation was always expected to be in the lower level drivers (eg the kernel driver, aka drm); the Gallium3D abstractions just provide a standard way to call those functions.
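
                      As a rough C illustration of that split (hypothetical names, not the actual Gallium3D winsys interface): the pipe driver calls buffer-management functions through a table the winsys provides, and a drm-backed winsys fills that table with implementations that issue ioctls to the kernel driver.

                      ```c
                      #include <stddef.h>
                      #include <string.h>

                      struct winsys_buffer;    /* opaque handle owned by the winsys layer */

                      /* The abstraction: what the pipe driver is allowed to ask for. */
                      struct winsys {
                          struct winsys_buffer *(*buffer_create)(struct winsys *ws,
                                                                 size_t size, unsigned alignment);
                          void *(*buffer_map)(struct winsys *ws, struct winsys_buffer *buf);
                          void  (*buffer_unmap)(struct winsys *ws, struct winsys_buffer *buf);
                          void  (*buffer_destroy)(struct winsys *ws, struct winsys_buffer *buf);
                      };

                      /* The implementation lives below the abstraction: a drm-backed winsys
                       * (provided elsewhere) fills the table with functions that talk to the
                       * kernel memory manager via ioctls. */
                      struct winsys *drm_winsys_create(int drm_fd);

                      /* The pipe driver only ever calls through the table: */
                      static struct winsys_buffer *upload_vertices(struct winsys *ws,
                                                                   const void *data, size_t size)
                      {
                          struct winsys_buffer *bo = ws->buffer_create(ws, size, 4096);
                          void *ptr = ws->buffer_map(ws, bo);
                          memcpy(ptr, data, size);
                          ws->buffer_unmap(ws, bo);
                          return bo;
                      }
                      ```
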
                      Last edited by bridgman; 05-13-2012, 11:15 PM.



                      • #12
                        Originally posted by bridgman View Post
                        This is one of the interesting tradeoffs. Do you want the driver to be simple and accessible so more people can potentially contribute, or do you want it to be sufficiently sophisticated that it could potentially match the performance of proprietary drivers, at the cost of reducing the pool of potential contributors?
                        Excuse me for a short intervention, but what about the complexity of Linux? Was it fatal? Has the number of hackers decreased or increased from version 0.01 until now? We don't get 80% performance, we get 20%. We are missing raw programming resources; everyone knows this. Also, everyone knows we don't gain those resources by talking it over and over, or by endlessly changing points of view and finding new arrays of arguments. If we are to change something, we have to pick a direction and follow it. That will never happen; it is a fact. Again, excuse me for the interruption.



                        • #13
                          Originally posted by crazycheese View Post
                          Excuse me for a short intervention, but what about the complexity of Linux? Was it fatal? Has the number of hackers decreased or increased from version 0.01 until now?
                          IIRC the number of hackers for 0.01 was "one", so presumably it's gone up from there.

                          The graphics driver development community seems to be growing at roughly the same pace as the general Linux developer community, with roughly the same mix of commercial and volunteer developers.

                          Take a read through http://go.linuxfoundation.org/who-writes-linux-2012 - we're starting to see a number of graphics developers showing up in the "top contributors" list.

                          Originally posted by crazycheese View Post
                          We don't get 80% performance, we get 20%.
                          I think you'll find the average is more like 45-50% these days and continuing to increase. You can still cherry-pick numbers to get 20% (eg recent Llano benchmarks with lower clocks for open source than for Catalyst) but it's getting harder every month.

                          Originally posted by crazycheese View Post
                          We are missing raw programming resources; everyone knows this.
                          Agreed - the number of developers per KSLOC is maybe half as high for the "desktop bits" (X, graphics drivers, etc.) as it is for the "server bits" (kernel, filesystems, etc.). That is probably related to the higher $$ earned from the server Linux business, but that's just a guess.

                          Originally posted by crazycheese View Post
                          Also, everyone knows we don't gain those resources by talking it over and over, or by endlessly changing points of view and finding new arrays of arguments.
                          I don't understand what point you are making here. Are you saying people shouldn't ask questions, or shouldn't answer them, or something else ?

                          Originally posted by crazycheese View Post
                          If we are to change something, we have to pick a direction and follow it. That will never happen; it is a fact.
                          That's not what happened with the Linux kernel either, and my impression was that you thought the Linux kernel was a good example to follow.

                          Linux started out with fairly basic implementations of all the major functions, then over the years different subsystems were gradually replaced with more complex but more featureful and higher performing implementations. That's the same pattern we are seeing with graphics -- UMS gets replaced with KMS, shader translators get replaced with shader compilers, classic Mesa HW layer gets replaced with Gallium3D layer, layers get blended together for performance (eg Intel plans to use GLSL IR in the graphics HW layer) etc...

                          That seems like the right approach to me, but it is not consistent with "open source drivers running faster than proprietary drivers in the first couple of years" which is what everyone except the developers seemed to expect. Now I guess the popular sentiment is "things aren't moving as fast as I hoped so open source drivers are always going to suck", which is just as wrong.
                          Last edited by bridgman; 05-14-2012, 10:13 PM.



                          • #14
                            Originally posted by bridgman View Post
                            I think you'll find the average is more like 45-50% these days and continuing to increase. You can still cherry-pick numbers to get 20% (eg recent Llano benchmarks with lower clocks for open source than for Catalyst) but it's getting harder every month.
                            By your rhetoric, we only get 45-50% on "LOL-OEM-LOL" products, and we get it because the "LOL-OEM-LOL" companies take care of open-source drivers.

                            And we get 10-20% performance with Llano because AMD sells these GPUs with their CPUs directly to the customer, and no "LOL-OEM-LOL" is there to protect us from AMD making fun of open-source customers, LOL!

                            AMD should really stop selling APUs to customers directly; then maybe the performance would increase from 10-20% to 45-50%.

                            I'll take you seriously again when the products AMD sells directly to consumers are on the same performance level as the products from the LOL-OEM-LOL companies.

                            I can only recommend buying Intel if someone doesn't want to spend big money on a LOL-OEM-LOL graphics card product.



                            • #15
                              Originally posted by Qaridarium View Post
                              By your rhetoric, we only get 45-50% on "LOL-OEM-LOL" products, and we get it because the "LOL-OEM-LOL" companies take care of open-source drivers.

                              And we get 10-20% performance with Llano because AMD sells these GPUs with their CPUs directly to the customer, and no "LOL-OEM-LOL" is there to protect us from AMD making fun of open-source customers, LOL!

                              AMD should really stop selling APUs to customers directly; then maybe the performance would increase from 10-20% to 45-50%.

                              I'll take you seriously again when the products AMD sells directly to consumers are on the same performance level as the products from the LOL-OEM-LOL companies.

                              I can only recommend buying Intel if someone doesn't want to spend big money on a LOL-OEM-LOL graphics card product.
                              Beats me! I didn't understand anything from the above!

