Trinity APU memory layout?


  • Trinity APU memory layout?

    Hi there,

    Hoping JB might be able to answer.

    Under Trinity, can I update textures directly in memory (store on ptr like on a PS3), or do I have to go through OpenGL? If the latter, is there a method for updating multiple textures in a single pass? Is the memory banking arrangement among the various Trinity based offerings static, or does it vary between models?

    F
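On the "multiple textures in a single pass" part: one common approach with standard APIs (not something this thread confirms for Trinity) is to pack all pending texel data into one staging buffer, transfer it once, and then apply each texture's update from its own offset. The sketch below models only the packing step in plain C++. `StagingBatch`, `PendingUpdate`, the texture ids, and the 256-byte alignment are all hypothetical and are not part of any real driver API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pending texture update: which texture it targets and where its bytes
// live inside the shared staging buffer. texture_id is a hypothetical handle.
struct PendingUpdate {
    int texture_id;
    std::size_t offset;  // byte offset into the staging buffer
    std::size_t size;    // number of bytes
};

// Packs several texture updates into one contiguous buffer so they could be
// handed over in a single transfer and then applied per texture.
class StagingBatch {
public:
    // 'align' is a hypothetical per-update alignment requirement in bytes.
    explicit StagingBatch(std::size_t align = 256) : align_(align) { assert(align_ > 0); }

    void add(int texture_id, const std::uint8_t* data, std::size_t size) {
        // Round the current end of the buffer up to the next aligned offset.
        const std::size_t offset = (bytes_.size() + align_ - 1) / align_ * align_;
        bytes_.resize(offset);                           // zero padding
        bytes_.insert(bytes_.end(), data, data + size);  // append texel data
        updates_.push_back(PendingUpdate{texture_id, offset, size});
    }

    const std::vector<std::uint8_t>& bytes() const { return bytes_; }
    const std::vector<PendingUpdate>& updates() const { return updates_; }

private:
    std::size_t align_;
    std::vector<std::uint8_t> bytes_;
    std::vector<PendingUpdate> updates_;
};
```

The alignment padding is there because upload paths usually want each sub-update to start on some boundary; the actual value would depend on the API and hardware.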

  • #2
    Perhaps there is a possibility with OpenCL: http://developer.amd.com/SDKS/AMDAPP...s/default.aspx

    We have had, and still have, severe problems using the XvBA API to really speed things up, because of similar issues that could be solved if what you are looking for were possible. So good luck, and have fun with AMD's Linux support :-)

    Code:
    uint64_t starttime = CurrentHostCounter();  // XBMC's high-resolution counter

    // decoding: point XvBA at the target surface and start decoding a picture
    XVBA_Decode_Picture_Start_Input startInput;
    startInput.size = sizeof(startInput);
    startInput.session = xvba->m_xvbaConfig.xvbaSession;
    startInput.target_surface = render->surface;
    {
      CSingleLock lock(xvba->m_apiSec);  // serialize access to the XvBA API
      if (Success != g_XVBA_vtable.StartDecodePicture(&startInput)) // this takes at least 22 ms on an E-350
      {
        xvba->SetError(__FUNCTION__, "failed to start decoding", __LINE__);
        return;
      }
    }

    // elapsed time in ms (counter ticks assumed to be nanoseconds); cast so the value matches %lld
    CLog::Log(LOGNOTICE, "--------- time3 :%lld", (long long)((CurrentHostCounter() - starttime) / 1000000LL));
    Last edited by fritsch; 05-16-2012, 02:08 AM. Reason: add code



    • #3
      Originally posted by fritsch
      Perhaps there is a possibility with OpenCL: http://developer.amd.com/SDKS/AMDAPP...s/default.aspx
      It does not appear to be possible through OpenCL. A few days ago, I was shown a technology demo where certain operations were three to four orders of magnitude faster via direct memory/hardware access than through OpenGL/D3D. A good example was that (re)surfacing was literally 10,000 times faster.

      While this may trigger your Pavlovian response, note that it does not translate directly into an increase in FPS. However, it does mean that scene complexity can be increased by a tremendous amount.

      I was blown away, which is not something that happens all that often. Four days later, and I'm still unable to reconcile what I saw.

      F



      • #4
        Hehe. Pavlov - I like his dogs.

        If you look at the code snippet, you'll see that there are a lot of memory-to-(graphics-)memory transformations. None of these would be needed with the feature you asked about.



        • #5
          Indeed. While I have only a rough working understanding of how this all works, as a technologist, I am confused about why this is so much trouble. The bottom line of the demo was, "It's all memory", so why can't I just update a texture in graphics memory? Why do I have to update it in main memory, upload it to graphics memory, and worry about state, API overhead, etc.?

          My initial thought was that doing this would break AA and other features, but it looks like I was wrong.

          Fast resurfacing was only one of the half-dozen jaw-dropping things I was shown. I need a few more days of research, but I'm almost convinced that our current 3D APIs are severely deficient, and that something will happen in the next two years to change this.

          F



          • #6
            Originally posted by russofris
            Indeed. While I have only a rough working understanding of how this all works, as a technologist, I am confused about why this is so much trouble. The bottom line of the demo was, "It's all memory", so why can't I just update a texture in graphics memory? Why do I have to update it in main memory, upload it to graphics memory, and worry about state, API overhead, etc.?
            It's fair to say "it's all memory" when you're dealing with IGP or having a discrete GPU texture from (slower) system memory, but when you're dealing with a discrete GPU doing texturing from the separate graphics memory that lets *it* run fast you're dealing with memory which is *not* linearly accessible by the CPU without some hoop-jumping.

            The BAR aperture for CPU access to memory on a PCI device still seems to be limited to 256MB (not sure why, seems too big for 32-bit and too small for 64-bit) and that aperture is used by a *lot* of activities at the same time so in addition to the API hooks mentioned below you would need additional hooks to map the texture area you want to update into the BAR somewhere.
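A toy model of the aperture problem described above: the CPU sees only a fixed-size window (256 MiB here, as in the post) onto a larger pool of VRAM, so writing to an arbitrary texture first means moving that window. `Aperture`, `map_region`, and the 1 MiB move granularity are hypothetical; real BAR management lives in the kernel driver and is considerably more involved.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a CPU-visible window (PCI BAR aperture) onto a larger pool of
// GPU-attached memory. All names here are hypothetical.
struct Aperture {
    std::uint64_t window_size;  // bytes the CPU can see at once, e.g. 256 MiB
    std::uint64_t granularity;  // alignment used when moving the window
    std::uint64_t window_base;  // VRAM offset currently mapped at the window start
    unsigned remaps;            // how many times the window has been moved
};

// True if the VRAM range [offset, offset + len) lies fully inside the window.
bool is_visible(const Aperture& ap, std::uint64_t offset, std::uint64_t len) {
    return offset >= ap.window_base &&
           offset + len <= ap.window_base + ap.window_size;
}

// Makes [offset, offset + len) CPU-visible, moving the window if necessary,
// and returns the position of 'offset' relative to the window start.
std::uint64_t map_region(Aperture& ap, std::uint64_t offset, std::uint64_t len) {
    // After aligning down, the region starts less than one granule into the
    // window, so this precondition guarantees it fits.
    assert(len + ap.granularity <= ap.window_size);
    if (!is_visible(ap, offset, len)) {
        ap.window_base = offset - offset % ap.granularity;
        ++ap.remaps;
    }
    return offset - ap.window_base;
}
```

Every window move is extra driver work that has to be coordinated with everything else using the aperture, which is why a plain "store on ptr" into discrete VRAM needs API support.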

            Originally posted by russofris
            My initial thought was that doing this would break AA and other features, but it looks like I was wrong.
            That's going to be driver- and hardware-specific but I don't *think* textures are pre-processed before uploading unless requested via API, so agree that probably shouldn't be an issue. There is a bunch of cache- and pipe-flushing that needs to be considered each time though if you want predictable behaviour (remember GPUs have texture caches and really long graphics pipelines, even if the shader core itself has a short pipeline).
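To make the cache-flushing point concrete, here is a toy model (a hypothetical class, not driver code) in which GPU texel fetches go through a cache that does not snoop CPU writes: a CPU store to the backing memory stays invisible to the GPU until the cache is invalidated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model: GPU texel fetches go through a cache that does NOT snoop CPU
// writes, so CPU stores to the backing memory stay invisible to the GPU
// until the cache is explicitly invalidated. Hypothetical, not driver code.
class TextureCacheModel {
public:
    explicit TextureCacheModel(std::size_t texels) : memory_(texels, 0) {}

    // CPU stores directly into the backing memory ("store on ptr").
    void cpu_write(std::size_t i, std::uint32_t value) { memory_.at(i) = value; }

    // GPU fetch: a cache hit returns whatever was cached, even if stale.
    std::uint32_t gpu_fetch(std::size_t i) {
        auto it = cache_.find(i);
        if (it != cache_.end()) return it->second;
        const std::uint32_t value = memory_.at(i);
        cache_[i] = value;  // fill the cache on a miss
        return value;
    }

    // Stands in for the flush/invalidate an API call would issue before the
    // GPU reads the texture again.
    void invalidate() { cache_.clear(); }

private:
    std::vector<std::uint32_t> memory_;
    std::unordered_map<std::size_t, std::uint32_t> cache_;
};
```

Real hardware also needs in-flight work in the pipeline drained so draws don't read half-updated data; this model ignores that part.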

            Originally posted by russofris
            Fast resurfacing was only one of the half-dozen jaw-dropping things I was shown. I need a few more days of research, but I'm almost convinced that our current 3D APIs are severely deficient, and that something will happen in the next two years to change this.
            My assumption was that API definitions would be extended to allow the kind of partial texture and surface updates that Carmack and others have requested. The technology is there, but AFAIK the APIs don't support it today. I believe we have a GL extension which does that but haven't had a chance to look it up - it may just be "partial update from system memory" rather than replacing the entire texture.
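For reference, standard OpenGL already covers the "partial update from system memory" case with `glTexSubImage2D`, which replaces a sub-rectangle of an existing texture. The addressing that implies, assuming a linear row-major RGBA8 layout (real GPUs usually store textures tiled, which is part of why the CPU can't simply poke texels), looks roughly like this; `Image` and `update_subrect` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Linear row-major RGBA8 image (4 bytes per texel, no row padding).
struct Image {
    std::size_t width, height;
    std::vector<std::uint8_t> texels;  // width * height * 4 bytes
    Image(std::size_t w, std::size_t h) : width(w), height(h), texels(w * h * 4, 0) {}
};

// Copies a w x h block from 'src' (tightly packed, w * 4 bytes per row) into
// 'dst' at (x, y), touching only the affected rows instead of the whole image.
void update_subrect(Image& dst, std::size_t x, std::size_t y,
                    std::size_t w, std::size_t h, const std::uint8_t* src) {
    assert(x + w <= dst.width && y + h <= dst.height);
    const std::size_t bpp = 4;  // bytes per RGBA8 texel
    for (std::size_t row = 0; row < h; ++row) {
        std::uint8_t* d = dst.texels.data() + ((y + row) * dst.width + x) * bpp;
        std::memcpy(d, src + row * w * bpp, w * bpp);
    }
}
```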
            Last edited by bridgman; 05-16-2012, 11:51 AM.



            • #7
              Bridgman, the topic was Trinity



              • #8
                Understood, but the issue is not really HW-specific other than whether the GPU uses shared system memory or GPU-attached graphics memory.

                In case you are asking "does the chip do some HW magic to let you go around a running API without breaking stuff?" (e.g. having GPU graphics-related caches snoop CPU writes), I believe the answer is "no".
                Last edited by bridgman; 05-16-2012, 02:27 PM.



                • #9
                  Originally posted by bridgman
                  My assumption was that API definitions would be extended to allow the kind of partial texture and surface updates that Carmack and others have requested. The technology is there, but AFAIK the APIs don't support it today. I believe we have a GL extension which does that but haven't had a chance to look it up - it may just be "partial update from system memory" rather than replacing the entire texture.
                  PRT (partially resident textures), or whatever name sticks, which is basically just texture atlas systems supported in hardware, I think (so yes, partial update). That one is AMD-only, and only on the latest cards. But it's a very interesting extension, and I sorely hope that it becomes more standardised. Graphics APIs have needed virtualised textures for some time now.
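A software analogy of what PRT does in hardware: a page table maps each tile of a large virtual texture to a slot in a smaller physical atlas, and a lookup for a non-resident tile reports a miss so the renderer can fall back to a coarser level or request the tile. `VirtualTexture` and the -1 miss sentinel are hypothetical; as I understand it, with PRT the GPU's own page tables do this translation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy virtual-texture page table: a large virtual texture is split into
// square tiles, and only some tiles are resident in a physical atlas.
// Lookups for non-resident tiles return -1. All names are hypothetical.
class VirtualTexture {
public:
    VirtualTexture(int tiles_x, int tiles_y, int tile_size)
        : tiles_x_(tiles_x), tile_size_(tile_size),
          table_(static_cast<std::size_t>(tiles_x) * tiles_y, -1) {}

    // Records that the tile containing texel (u, v) now lives in atlas 'slot'.
    void make_resident(int u, int v, int slot) { table_.at(index(u, v)) = slot; }

    // Marks the tile containing texel (u, v) as no longer resident.
    void evict(int u, int v) { table_.at(index(u, v)) = -1; }

    // Atlas slot holding texel (u, v), or -1 if that tile is not resident.
    int lookup(int u, int v) const { return table_.at(index(u, v)); }

private:
    // Page-table index of the tile containing texel (u, v).
    std::size_t index(int u, int v) const {
        assert(u >= 0 && v >= 0);
        return static_cast<std::size_t>(v / tile_size_) * tiles_x_ + u / tile_size_;
    }

    int tiles_x_, tile_size_;
    std::vector<int> table_;
};
```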

                  @russofris: is http://www.opengl.org/registry/specs...ned_memory.txt what you're after?
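The semantic difference AMD_pinned_memory is about, as described in this thread, is copy versus zero-copy: normally the driver copies data out of application memory, whereas with pinned memory the GL buffer uses the application's own system-memory allocation. This toy model makes no GL calls and ignores synchronization (a real program must make sure the GPU is done before reusing the memory); both class names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy semantics: the "driver" snapshots application memory at upload time,
// so later CPU writes need another upload (another copy) to become visible.
class CopiedBuffer {
public:
    CopiedBuffer(const std::uint8_t* src, std::size_t n) : data_(src, src + n) {}
    std::uint8_t read(std::size_t i) const { return data_.at(i); }
private:
    std::vector<std::uint8_t> data_;
};

// Pinned (zero-copy) semantics: the "driver" reads the application's own
// allocation directly, so CPU writes show up without a further copy.
// The application must keep that memory alive for the buffer's lifetime.
class PinnedBuffer {
public:
    PinnedBuffer(const std::uint8_t* app_memory, std::size_t n) : mem_(app_memory), size_(n) {}
    std::uint8_t read(std::size_t i) const { assert(i < size_); return mem_[i]; }
private:
    const std::uint8_t* mem_;
    std::size_t size_;
};
```

The trade-off is the one raised in the thread: zero-copy saves the upload, but the GPU then reads over the (slower) path to system memory.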



                  • #10
                    Originally posted by russofris
                    Under Trinity, can I update textures directly in memory (store on ptr like on a PS3), or do I have to go through OpenGL? If the latter, is there a method for updating multiple textures in a single pass? Is the memory banking arrangement among the various Trinity based offerings static, or does it vary between models?
                    With every passing hour I think of another possible way to interpret your question

                    When you say "have to go through OpenGL", are you using OpenGL for most of the drawing (i.e. you want to go *around* OpenGL to update textures but then keep drawing with OpenGL and the updated textures), or are you asking if you can program the 3D hardware directly and do cool texture update things without using OpenGL at all?

                    If the former, the previous answer stands (you can do it, but you need some API extensions in OpenGL to deal with things like establishing addressability and cache flushing to pick up the new contents from memory).

                    If the latter, the answer is "yes", but you'd probably want to implement some kind of 3D API yourself anyway.



                    • #11
                      Originally posted by bridgman
                      With every passing hour I think of another possible way to interpret your question

                      When you say "have to go through OpenGL", are you using OpenGL for most of the drawing (i.e. you want to go *around* OpenGL to update textures but then keep drawing with OpenGL and the updated textures), or are you asking if you can program the 3D hardware directly and do cool texture update things without using OpenGL at all?

                      If the former, the previous answer stands (you can do it, but you need some API extensions in OpenGL to deal with things like establishing addressability and cache flushing to pick up the new contents from memory).

                      If the latter, the answer is "yes", but you'd probably want to implement some kind of 3D API yourself anyway.
                      I guess the real question is how much abstraction can be removed from every level of the stack to increase performance without burdening developers. The old adage: clock cycles are cheaper than developer man-hours.

                      I know D3D in its many incarnations has started to become very problematic performance-wise. I often ponder the wisdom of Gallium3D doing so many code transformations, etc. I guess what needs to come out is a way to get as close to the metal as possible, which in my mind would mean building a kernel/compiler right into the hardware "as a hardware feature" and making it intelligent enough that drivers would become nearly obsolete.

                      But that's just a pipe dream anyway.



                      • #12
                        Originally posted by bridgman
                        With every passing hour I think of another possible way to interpret your question
                        Indeed.

                        I just typed four different responses and after each iteration, was less and less certain of what I saw and what I'm actually asking. I'm fairly confident that what was presented was the updating of surfaces via the modification of the discrete memory of the GPU.

                        The "AMD_pinned_memory" extension looks close to what I saw, with the exception that the modifications are made to a page in system memory, not discrete memory. If your extension coincidentally results in a four-orders-of-magnitude increase in resurfacing performance, then it is entirely possible that this is what I saw and the presenter misspoke or was unclear. If this is the case, you need to call marketing immediately, as this is ground-breakingly awesome.

                        I'll see if I can fire off an e-mail to the presenter. I also need to ping corp-legal to see if there's an NDA between the two companies which I somehow inherit by virtue of my employment arrangement.

                        F
                        Last edited by russofris; 05-16-2012, 10:36 PM.



                        • #13
                          Other thoughts.

                          Re: http://www.opengl.org/registry/specs...ned_memory.txt

                          This really needs a touch-up by an English-speaking tech writer. Trivial fixes for stuff like "As an example, consider the following example".

                          Even though the spec deals with (slower) main memory, the implications are really neat, and I'm surprised AMD hasn't released one of their half-naked-3D-woman tech demos to feature it. This is at least as big for textures/surfaces as tessellation is for polygons.

                          F



                          • #14
                            http://developer.amd.com/afds/assets...2114_final.pdf

                            Slides 22-24 also look related, and neat.

                            Also looked at:

                            http://developer.amd.com/afds/assets...1004_final.pdf
                            http://developer.amd.com/afds/assets...2117_final.pdf
                            Last edited by russofris; 05-17-2012, 12:18 AM.



                            • #15
                              Originally posted by russofris
                              Other thoughts.

                              Re: http://www.opengl.org/registry/specs...ned_memory.txt

                              This really needs a touch-up by an English-speaking tech writer. Trivial fixes for stuff like "As an example, consider the following example".

                              Even though the spec deals with (slower) main memory, the implications are really neat, and I'm surprised AMD hasn't released one of their half-naked-3D-woman tech demos to feature it. This is at least as big for textures/surfaces as tessellation is for polygons.

                              F
                              It all depends on where the bottleneck is. If updating texture information over the PCIe bus is the bottleneck, then pinned memory will likely help a lot. I would actually like to test some of my own "megatexture"-type stuff with that (32k x 32k images updated on the fly with persistent effects such as weather and vegetation patterning across terrain), but it's a little far down on the list right now.
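Some back-of-the-envelope numbers for that megatexture idea, assuming RGBA8 (4 bytes per texel) and a hypothetical 128x128 tile size: a full 32k x 32k image is 4 GiB, so in practice only the tiles touched by each change would be re-uploaded. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a square RGBA8 texture with 'side' texels per edge.
constexpr std::uint64_t rgba8_bytes(std::uint64_t side) { return side * side * 4; }

// Number of tile x tile blocks touched by the dirty rectangle
// [x0, x1) x [y0, y1); only these tiles would need to be re-uploaded.
std::uint64_t dirty_tiles(std::uint64_t x0, std::uint64_t y0,
                          std::uint64_t x1, std::uint64_t y1, std::uint64_t tile) {
    assert(x0 < x1 && y0 < y1 && tile > 0);
    const std::uint64_t tx = (x1 - 1) / tile - x0 / tile + 1;  // tile columns
    const std::uint64_t ty = (y1 - 1) / tile - y0 / tile + 1;  // tile rows
    return tx * ty;
}
```

For example, a 200x100 texel change at (100, 100) touches 6 tiles, i.e. 384 KiB to upload instead of the whole 4 GiB image.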

