AMD Radeon HD 7970 On Linux

  • #21
    Southern Islands compute technologies.

    Would it be possible to work something out with Phoronix to write an article focused on the new compute capabilities of Southern Islands under Linux? One thing I'm interested in is how compute loads impact graphics usage: are we at the point where long-running compute jobs do not significantly impact graphics? I guess that is a question about support for threads. Beyond that, any info that you are free to spill that gives us a better mental image of improvements to Southern Islands compute performance would be welcome.

    This question you probably can't answer but when will we see a Fusion processor using Southern Islands technology?

    Originally posted by bridgman View Post
    I haven't looked much at UVD in the latest GPUs but I thought they were still UVD 3. Can't really talk much about UVD at this point because I don't know what we are going to be able to release, although we are going to take another look in the new year.

    I don't think we have looked at PowerTune specifically; my guess is that the first chance to do that will be a few months from now. There are some more fundamental power management improvements I would like to get out first, and they will probably be a pre-requisite for PowerTune anyways.

    • #22
      Originally posted by zoomblab View Post
      There has to be an easy and safe way for end-users to install/update drivers without having to depend on distributions and wait for months in order for code to be merged into the latest kernel.
      There is already a way to do this on Linux. It's the same way that you do it on Windows.

      If you switched to Linux because you want a pure open source platform, then yes, you will have to wait for a PPA of a future kernel RC (if you use Ubuntu), or compile a kernel RC or a release kernel plus drm-next (if you use another distro). This is because the DRM component of the open-source drivers resides in the kernel.

      I predict that patches providing open-source support for the Radeon HD7xxx series will appear in 4 to 6 weeks time, given the following:
      • Support for the Radeon HD5xxx series took about 4.5 months
      • Support for the Radeon HD6xxx series took about 2.5 months
      • AMD has publicly stated that they are aiming for launch-day open-source support for the upcoming Radeon HD8xxx series


      Unfortunately this might be too late for the 3.3 kernel merge window, unless Dave and/or Alex can talk Linus into an extended merge window. The last extended merge window was a bit worrying, though.

      • #23
        Originally posted by liam View Post
        Lastly, regarding DRM, I was attempting to give the impression that since hybrid mode will do most of the work through the CUs, we could ignore the small bit that UVD handles and thus not use UVD at all, thus ignoring DRM.
        I'm guessing it's probably not that simple, because the closed-source driver has to protect the content all the way to the display. If the closed-source driver does UVD -> *shader functions* -> display, the open source driver does CPU -> *shader functions* -> display, and they share the same shader functions, it would be a lot easier to look for that same call in the closed-source driver and dump protected content. DRM is poison to openness. Anyway, as I understood it, hybrid mode was mostly for GPU-assisted encoding, not decoding, though I suppose it could implement some of the same functions.

        • #24
          Originally posted by madbiologist View Post
          Unfortunately this might be too late for the 3.3 kernel merge window, unless Dave and/or Alex can talk Linus into an extended merge window. The last extended merge window was a bit worrying, though.
          Linus has always stated that post-RC merges are allowed for bug fixes and for support for bringing up new hardware, which this would qualify as.

          • #25
            Originally posted by liam View Post
            Obviously you would know better than the guys at Anandtech, but I got a different impression as to the amount of difference in the compute bitstream from VLIW->non-VLIW SIMD. They seemed to say that the compiler was the ALL IMPORTANT COMPONENT in order to get decent utilization (meaning it was necessary to keep vast amounts of the program branches in memory and be REALLY good at best guesses for dependencies). Though they didn't say this part, I assume that since the compiler was so important for <=NI it becomes less so with >=SI, and should even need to be rewritten for the new architecture.
            Everything you are saying is correct, but you are assuming that the open source graphics driver already has that ALL IMPORTANT COMPILER for r6xx-NI and (quite reasonably) wondering why we don't use it for compute as well. The answer is simple: it doesn't have one.

            The Catalyst driver has a fancy compiler but the r600/r600g open source drivers do not (at least they didn't the last time I looked). The current TGSI-VLIW compiler takes advantage of the fact that most of the TGSI instructions are 3- or 4-wide vector operations and translates them directly into 3- or 4-slot VLIW instructions.
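            For anyone who wants to picture the "direct translation" approach, here is a toy Python sketch. The instruction encoding, the x/y/z/w slot names and the translate_direct helper are all hypothetical, not the actual r600g code (VLIW5 parts also have a fifth scalar "t" slot, which is ignored here). The point is only that each TGSI-style vector instruction becomes exactly one VLIW bundle, with one ALU slot per written component, and nothing tries to fill the slots left empty.

```python
from dataclasses import dataclass

SLOTS = "xyzw"  # one ALU slot per vector component (VLIW5's extra "t" slot is ignored here)


@dataclass(frozen=True)
class VectorOp:
    """A TGSI-like vector instruction (hypothetical encoding)."""
    opcode: str
    dst: str
    writemask: str  # components written, e.g. "xyz"
    srcs: tuple


def translate_direct(ops):
    """Emit exactly one VLIW bundle per vector op: one scalar op per written component."""
    bundles = []
    for op in ops:
        bundle = [
            (op.opcode, f"{op.dst}.{c}", tuple(f"{s}.{c}" for s in op.srcs))
            for c in SLOTS
            if c in op.writemask
        ]
        bundles.append(bundle)
    return bundles


program = [
    VectorOp("MUL", "TEMP0", "xyzw", ("IN0", "CONST0")),
    VectorOp("ADD", "TEMP1", "xyz", ("TEMP0", "IN1")),
    VectorOp("RCP", "TEMP2", "x", ("TEMP1",)),
]
for i, bundle in enumerate(translate_direct(program)):
    print(f"bundle {i}: {len(bundle)} of {len(SLOTS)} slots used")
```

            The last bundle uses one slot out of four; nothing in this scheme looks for independent work to fill the other three.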

            The Catalyst shader compiler for pre-GCN analyzes dependencies and packs multiple operations into a single VLIW instruction... last time I looked the open source shader compiler did not. Either way, there is a lot of VLIW-specific code in the pre-GCN shader compilers which is not needed for GCN, to the point where it's easier to start over and leverage more conventional compilers which would have sucked on VLIW (absent something like LunarGLASS) but which should fit GCN peachy-fine.
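            The dependency-analysis packing described above can be illustrated with a greedy list scheduler over scalar operations. This is a hypothetical sketch, not the Catalyst compiler: it assumes the operations are in SSA form (each destination written once) and ignores slot restrictions, register-port limits and the other constraints a real VLIW scheduler has to respect. An operation joins the current bundle only if everything it reads was produced in an earlier bundle.

```python
def pack_bundles(ops, width=4):
    """Greedily pack independent scalar ops into VLIW bundles of `width` slots.

    ops is a list of (dst, srcs) in program order, assumed to be in SSA form
    (each dst written exactly once). An op may join a bundle only if every op
    producing one of its sources sits in an earlier bundle. Real schedulers
    must also honour slot restrictions, register-port limits, etc.
    """
    producer = {dst: i for i, (dst, _) in enumerate(ops)}
    remaining = list(range(len(ops)))
    placed = set()
    bundles = []
    while remaining:
        bundle = []
        for i in remaining:
            if len(bundle) == width:
                break
            deps = {producer[s] for s in ops[i][1] if s in producer}
            if deps <= placed:
                bundle.append(i)
        # In SSA program order the oldest remaining op always qualifies,
        # so a bundle is never empty and the loop always makes progress.
        placed.update(bundle)
        remaining = [i for i in remaining if i not in placed]
        bundles.append([ops[i][0] for i in bundle])
    return bundles


scalar_ops = [
    ("a", ("in0",)),    # depends only on shader inputs
    ("b", ("in1",)),
    ("c", ("a", "b")),  # must wait for a and b
    ("d", ("in2",)),    # independent, can share the first bundle
    ("e", ("c", "d")),
]
print(pack_bundles(scalar_ops))  # [['a', 'b', 'd'], ['c'], ['e']]
```

            Five scalar operations end up in three bundles instead of five, because a, b and d do not depend on each other. On GCN there are no bundles to fill, so this layer of compiler logic is simply not needed.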

            Originally posted by liam View Post
            What you seem to be saying is that <=NI will make good use of the IR->VLIW, which makes sense (also, graphics stays the same).
            Honestly, we weren't expecting great efficiency at first no matter which path we took (LLVM IR to VLIW, or LLVM IR to TGSI to VLIW). We went with the LLVM IR to VLIW route for a few reasons:

            - it was the shortest path to getting GPU acceleration into clover
            - since the TGSI to VLIW path didn't have much relevant optimization, we didn't think we would lose performance by going direct from LLVM IR to VLIW
            - it gave us a way to test out the LLVM to GPU instruction code on available hardware before we had GCN boards
            - it produced code which was more in line with what other developers were looking for in order to build other compute stacks

            Neither approach would take much advantage of VLIW hardware for compute at first. If the graphics shader compiler gets more optimized in the future (or already is and we missed it), we would probably try the LLVM IR to TGSI to VLIW path, but I think we would have started with this approach anyway because of the other benefits above.

            Originally posted by liam View Post
            For >=SI EVERYTHING goes through the new code (which I didn't know, but I also didn't know there were 2 code drops). Again, what surprises me is that the same compiler code used to generate VLIW is also being used to generate the new SIMD code.
            I don't remember if it was 2 code drops or 1 drop with 2 parts. I think it was 1 drop with some LLVM patches and some Mesa/Gallium3D driver patches.

            Originally posted by liam View Post
            Again, if what Anandtech says is accurate, it seems like the old compiler was so hideously complex that you'd want to jettison it as soon as possible instead of making it able to output to yet another kind of architecture (so presumably it now addresses VLIW4/5/SIMD).
            That's essentially what we are doing (even though in our case the old compiler was not hideously complex).

            For GCN both graphics and compute will go through the new LLVM paths.
            Last edited by bridgman; 23 December 2011, 11:58 AM.

            • #26
              Originally posted by wizard69 View Post
              Would it be possible to work something out with Phoronix to write an article focused on the new compute capabilities of Southern Islands under Linux?
              Don't know but I'll ask around.

              Originally posted by wizard69 View Post
              One thing I'm interested in is how compute loads impact graphics usage: are we at the point where long-running compute jobs do not significantly impact graphics? I guess that is a question about support for threads. Beyond that, any info that you are free to spill that gives us a better mental image of improvements to Southern Islands compute performance would be welcome.
              We pushed code for "multiple ring support" a couple of months ago.

              A number of things will use that code in the future, but one of them is allowing compute operations to go through a separate command queue from graphics operations so that the hardware can flip between tasks at a fairly fine-grained level. The multiple ring support started with Cayman but GCN is the first generation where I expect we will really use it.
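              As a rough illustration of why separate rings matter, here is a toy Python model with hypothetical names. It assumes one "packet" of work executes per step and that the hardware arbiter simply round-robins between non-empty rings; real GPU arbitration (priorities, preemption granularity, how long a single packet runs) is considerably more involved. It compares when a two-packet graphics frame finishes if it is queued behind an eight-packet compute job on one ring versus on its own ring.

```python
from collections import deque


def run_single_ring(jobs):
    """One shared FIFO: packets execute strictly in submission order."""
    return [name for name, packets in jobs for _ in range(packets)]


def run_multi_ring(rings_of_jobs):
    """One FIFO per ring; a hypothetical arbiter takes one packet from each non-empty ring in turn."""
    rings = [deque(name for name, packets in jobs for _ in range(packets))
             for jobs in rings_of_jobs]
    timeline = []
    while any(rings):
        for ring in rings:
            if ring:
                timeline.append(ring.popleft())
    return timeline


def finish_step(timeline, name):
    """1-based step at which the last packet of `name` executes."""
    return max(step for step, n in enumerate(timeline, start=1) if n == name)


compute_job = ("compute", 8)  # a long-running compute kernel, as 8 packets
frame = ("gfx", 2)            # a desktop frame, as 2 packets

single = run_single_ring([compute_job, frame])
multi = run_multi_ring([[compute_job], [frame]])
print(finish_step(single, "gfx"))  # 10
print(finish_step(multi, "gfx"))   # 4
```

              With a single ring the frame waits behind the entire compute job; with two rings it finishes after four steps, which is roughly the effect wizard69 was asking about.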

              Originally posted by wizard69 View Post
              This question you probably can't answer but when will we see a Fusion processor using Southern Islands technology?
              Correct, I can't answer

              Originally posted by FireBurn View Post
              Linus has always stated that post-RC merges are allowed for bug fixes and for support for bringing up new hardware, which this would qualify as.
              We are trying to get all the invasive changes (multiple rings, memory management, etc.) pushed out in time for the merge window. Hopefully the remaining changes for GCN will be specific to new HW, but I don't think we have discussed getting them in post-merge yet.

              BTW, from this point on I'm probably going to switch from talking about GCN to talking about SI (the first generation of GCN parts), partly because it's one less letter (I'm big into efficiency) and partly because that's the terminology we use internally, and I'm getting tired of typing SI, backspacing over it and typing GCN instead.
              Last edited by bridgman; 23 December 2011, 11:59 AM.

              • #27
                Bridgman, Anandtech states GCN cards have an IOMMU and can access the full system ram.

                - how will this affect the driver?
                - GPU malware. What, if anything, can the driver/kernel/compiler do against these?

                • #28
                  Originally posted by curaga View Post
                  Bridgman, Anandtech states GCN cards have an IOMMU and can access the full system ram.

                  - how will this affect the driver?
                  - GPU malware. What, if anything, can the driver/kernel/compiler do against these?
                  The IOMMU will actually be in the CPU/NB, not the GPU.

                  Note that current GPUs can access the full system RAM already, and one of the important jobs of the kernel driver is making sure that the GPU only accesses the bits it is supposed to access. If you see "CS checker" mentioned in the driver discussions or patches that's the relevant piece.
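                  To make the CS checker idea a bit more concrete, here is a minimal Python sketch. The command format (an opcode, a buffer handle, an offset and a size) and all the names are hypothetical and much simpler than the real radeon command stream; the sketch only shows the principle that the kernel inspects every command before the GPU sees it and rejects any reference to a buffer the client does not own, or any access that runs past the end of one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """One hypothetical GPU command that touches a buffer."""
    opcode: str
    handle: int  # buffer object handle supplied by the client
    offset: int  # byte offset into that buffer
    size: int    # number of bytes the command will read or write


class CSRejected(Exception):
    """Raised when a command stream fails validation."""


def check_command_stream(commands, buffers):
    """Validate every command before the GPU sees it.

    buffers maps each handle the client owns to that buffer's size in bytes.
    """
    for cmd in commands:
        if cmd.handle not in buffers:
            raise CSRejected(f"{cmd.opcode}: unknown buffer handle {cmd.handle}")
        if cmd.offset < 0 or cmd.size < 0 or cmd.offset + cmd.size > buffers[cmd.handle]:
            raise CSRejected(f"{cmd.opcode}: access outside buffer {cmd.handle}")
    return True


owned = {1: 4096, 2: 65536}
good = [Command("DRAW", 1, 0, 4096), Command("COPY", 2, 1024, 2048)]
overrun = [Command("COPY", 2, 65000, 1024)]  # 65000 + 1024 > 65536
foreign = [Command("DRAW", 7, 0, 16)]        # handle 7 belongs to someone else

print(check_command_stream(good, owned))  # True
for stream in (overrun, foreign):
    try:
        check_command_stream(stream, owned)
    except CSRejected as err:
        print("rejected:", err)
```

                  This only works as long as every address the GPU will use is visible in the submitted commands.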

                  The cool thing is that future GPUs will be able to use future IOMMUs to manage system RAM accesses rather than having to maintain a parallel implementation using different hardware. As a consequence, the GPU will need to work with virtual addresses rather than the pre-translated physical addresses it uses today. The memory management changes we are hoping to push for the upcoming merge window are a first step in that direction.
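                  Here is a minimal Python sketch of what "working with virtual addresses" means, assuming 4 KiB pages and a single-level page table (real GPU and IOMMU page tables are multi-level, and the names here are hypothetical). The GPU issues a virtual address, the page number is looked up in a table the kernel controls, and the offset within the page is carried over unchanged; an address with no mapping faults instead of reaching memory.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageFault(Exception):
    """Raised when a virtual address has no mapping."""


def translate(page_table, vaddr):
    """Translate a GPU virtual address through a single-level page table.

    page_table maps virtual page number -> physical page number.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault(f"no mapping for virtual address {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


# Three virtually contiguous pages backed by scattered physical pages.
page_table = {0x100: 0x7A3, 0x101: 0x012, 0x102: 0x5F0}

print(hex(translate(page_table, 0x100010)))  # 0x7a3010
print(hex(translate(page_table, 0x101004)))  # 0x12004
```

                  The buffer is contiguous in the virtual address space the GPU sees even though its pages are scattered in physical RAM.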

                  One of the design challenges is making sure that future GPUs can still work well on hardware which does not have an ATS/PRI-capable IOMMU, and the initial code we are pushing out will be aimed at the more general case, i.e. running on existing CPU/NB hardware without relying on an IOMMU or ATS/PRI support.
                  Last edited by bridgman; 23 December 2011, 12:59 PM.

                  • #29
                    Well, the AMD PR so far implied that the card could bypass the CPU and access system RAM on its own. It was also mentioned that new frameworks could be used instead of just OpenCL and DX11 (C++, pointer support, etc.).

                    I believe the CS checker can only validate what gets sent to the GPU. Could you not write a GPGPU program that calculates a pointer address at runtime (i.e. on the GPU, and thus after the CS checker)?

                    • #30
                      Originally posted by curaga View Post
                      Well, the AMD PR so far implied that the card could bypass the CPU and access system RAM on its own.
                      True, but that is not new. The new part is...

                      Originally posted by curaga View Post
                      It was also mentioned that new frameworks could be used instead of just OpenCL and DX11 (C++, pointer support, etc.).
                      Yep. AFAIK the big change with SI is that shader programs are able to contain and generate addresses, so...

                      Originally posted by curaga View Post
                      I believe the CS checker can only validate what gets sent to the GPU. Could you not write a GPGPU program that calculates a pointer address at runtime (i.e. on the GPU, and thus after the CS checker)?
                      Correct, and that's why we had to change the memory management design as a pre-requisite to implementing SI support.

                      On previous GPUs you could more or less control system memory accesses by controlling the commands used to set up the GPU, so the CS checker could do that. Starting with SI, we run all system memory accesses through the on-chip page tables; the page tables protect system memory, and the CS checker protects the page tables.
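                      A toy Python model of that split in responsibilities follows. Everything in it is hypothetical, including the register names, the page size and the single-level table: the submit-time check only has to keep clients away from the registers that control the page tables, while every address a shader uses, including one computed on the GPU at runtime, goes through the page table and faults if it points outside the client's mappings.

```python
PAGE_SIZE = 4096  # assumed page size


class PageFault(Exception):
    """Raised when a GPU access hits an unmapped virtual page."""


class CSRejected(Exception):
    """Raised when the submit-time checker refuses a command stream."""


# Hypothetical registers that control the GPU's page tables; clients may not write them.
PRIVILEGED_REGS = {"PAGE_TABLE_BASE", "PAGE_TABLE_CONTROL"}


def cs_check(commands):
    """Submit-time check: protects the page tables, not each individual address."""
    for reg, _value in commands:
        if reg in PRIVILEGED_REGS:
            raise CSRejected(f"client may not write {reg}")
    return True


def shader_load(page_table, memory, vaddr):
    """Runtime access: every address is translated, however the shader computed it."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise PageFault(f"unmapped address {vaddr:#x}")
    return memory.get(page_table[vpn] * PAGE_SIZE + offset, 0)


page_table = {0x10: 0x42}           # the client's only mapped page (set up by the kernel)
memory = {0x42 * PAGE_SIZE + 8: 7}  # physical memory: one value stored in that page

cs_check([("SHADER_START", 0x10000)])  # hypothetical register; passes, nothing privileged
base = 0x10 * PAGE_SIZE
print(shader_load(page_table, memory, base + 8))  # pointer inside the buffer -> 7

rogue = base + 1000 * PAGE_SIZE  # a pointer the shader computes on the GPU at runtime
try:
    shader_load(page_table, memory, rogue)
except PageFault as err:
    print("blocked:", err)
```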

                      The open source drivers have used the page tables for a few years; they're just getting used more now because they're the most practical way to (a) deal with relocations (translating handles to physical addresses) in shader programs and (b) limit shader program access to specific areas of memory. The alternative was to re-patch the shader programs every time a buffer moved and to inspect each shader program to make sure it couldn't go outside its allocated buffers.
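                      The two alternatives in that last paragraph can be compared with a small, hypothetical Python sketch. Approach (a) bakes physical addresses into the shader code, so every move of a buffer means rewriting (and re-checking) every instruction that references it; approach (b) lets the shader keep a fixed virtual address, so a move only updates the page table entries for the buffer's pages. The instruction tuples and helper names are made up for illustration.

```python
def patch_relocations(shader, buffer_phys):
    """Approach (a): bake physical addresses into the shader code.

    shader is a list of (op, handle, offset); handle is None for instructions
    that do not touch memory. Returns (patched_code, number_of_patches).
    """
    patched, patches = [], 0
    for op, handle, offset in shader:
        if handle is None:
            patched.append((op, None))
        else:
            patched.append((op, buffer_phys[handle] + offset))
            patches += 1
    return patched, patches


def move_buffer_vm(page_table, vpns, new_phys_pages):
    """Approach (b): the shader keeps its virtual addresses; only the PTEs change."""
    for vpn, ppn in zip(vpns, new_phys_pages):
        page_table[vpn] = ppn
    return len(vpns)


shader = [("LOAD", 1, 0), ("LOAD", 1, 64), ("ALU", None, 0), ("STORE", 1, 128)]

# (a) Buffer 1 starts at 0x200000, then the memory manager moves it to 0x350000:
code, n = patch_relocations(shader, {1: 0x200000})
moved_code, n_moved = patch_relocations(shader, {1: 0x350000})
print(n, n_moved)  # 3 3: every reference rewritten on every move

# (b) Same move with a page table: one-page buffer, one PTE update, shader untouched.
page_table = {0x40: 0x200}
updates = move_buffer_vm(page_table, [0x40], [0x350])
print(updates, page_table)
```

                      In this toy the shader has only three references, but in approach (a) the patching work grows with every shader that uses the buffer, while in approach (b) it depends only on how many pages the buffer occupies.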
                      Last edited by bridgman; 23 December 2011, 06:23 PM.