Announcement

**fritsch** · 16 May 2012, 02:06 AM

Perhaps this is possible with OpenCL: http://developer.amd.com/SDKS/AMDAPP...s/default.aspx

We have had, and still have, severe problems using the XvBA API to really speed things up, because of similar issues that could be solved if what you are looking for were possible. So good luck, and have fun with AMD's Linux support :-)

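Code:

```cpp
// Excerpt from the XBMC XvBA decoder (not self-contained): measures how long
// StartDecodePicture() takes. CurrentHostCounter() is XBMC's high-resolution
// counter; the divide by 1000000 assumes it ticks in nanoseconds (ms output).
uint64_t starttime = CurrentHostCounter();

// decoding
XVBA_Decode_Picture_Start_Input startInput;
startInput.size           = sizeof(startInput);
startInput.session        = xvba->m_xvbaConfig.xvbaSession;
startInput.target_surface = render->surface;
{
  CSingleLock lock(xvba->m_apiSec);  // serialize access to the XvBA API
  if (Success != g_XVBA_vtable.StartDecodePicture(&startInput)) // this needs at least 22ms on E350
  {
    xvba->SetError(__FUNCTION__, "failed to start decoding", __LINE__);
    return;
  }
}

// %lld expects a long long, so cast the unsigned 64-bit difference explicitly
CLog::Log(LOGNOTICE, "--------- time3 :%lld",
          (long long)((CurrentHostCounter() - starttime) / 1000000LL));
```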
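To reproduce this kind of measurement outside XBMC, here is a minimal portable sketch of the same timing pattern using `std::chrono`. `HostCounterNs` and `ElapsedMs` are hypothetical helpers standing in for `CurrentHostCounter()` and the divide-by-1000000 step, under the assumption that the XBMC counter ticks in nanoseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for XBMC's CurrentHostCounter(): a monotonic
// timestamp in nanoseconds taken from std::chrono::steady_clock.
inline std::uint64_t HostCounterNs()
{
  using namespace std::chrono;
  return static_cast<std::uint64_t>(
      duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Whole milliseconds between two HostCounterNs() readings (truncated),
// i.e. the "(end - start) / 1000000" step from the snippet.
inline std::uint64_t ElapsedMs(std::uint64_t startNs, std::uint64_t endNs)
{
  assert(endNs >= startNs);  // steady_clock never runs backwards
  return (endNs - startNs) / 1000000ULL;
}
```

Take `start = HostCounterNs()` before the call under test, then log `ElapsedMs(start, HostCounterNs())` with `%llu` after casting it to `unsigned long long`, which avoids the signed/unsigned format mismatch.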

**russofris** · 16 May 2012, 10:32 AM

Originally posted by fritsch

Perhaps this is possible with OpenCL: http://developer.amd.com/SDKS/AMDAPP...s/default.aspx

It does not appear to be possible through OpenCL. A few days ago, I was shown a technology demo where certain operations were three to four orders of magnitude faster via direct memory/hardware access than through OpenGL/D3D. A good example was (re)surfacing, which was literally 10,000 times faster.

While this may trigger your Pavlovian response, note that it does not translate directly into an increase in FPS. However, it does mean that scene complexity can be increased tremendously.

I was blown away, which is not something that happens all that often. Four days later, and I'm still unable to reconcile what I saw.

F

**fritsch** · 16 May 2012, 10:58 AM

Hehe. Pavlov - I like his dogs.

If you look at the code snippet, you can see that there are a lot of memory-to-(graphics-)memory copies. None of them would be needed with the feature you asked for.
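To make the cost concrete, here is a toy model contrasting an upload through a system-memory staging buffer with a direct write into the destination. Plain host vectors stand in for system and graphics memory, and `UploadViaStaging` and `WriteDirect` are hypothetical names; the model only counts the extra copy and says nothing about real XvBA or driver internals.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Path A: the producer (e.g. a decoder) fills a system-memory staging
// buffer, which must then be copied into "graphics memory".
// Returns the number of extra bytes copied.
template <typename Producer>
std::size_t UploadViaStaging(std::vector<std::uint8_t>& vram, Producer produce)
{
  std::vector<std::uint8_t> staging(vram.size());
  produce(staging.data(), staging.size());                  // CPU writes system memory
  std::copy(staging.begin(), staging.end(), vram.begin());  // extra copy to VRAM
  return staging.size();
}

// Path B (the feature being asked for): the producer writes straight into
// the destination through a pointer, so no extra copy is made.
template <typename Producer>
std::size_t WriteDirect(std::vector<std::uint8_t>& vram, Producer produce)
{
  produce(vram.data(), vram.size());
  return 0;
}
```

Both paths leave identical contents in the destination; the difference is one full-frame copy per picture, which adds up quickly for video.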

**russofris** · 16 May 2012, 11:26 AM

Indeed. While I have only a rough working understanding of how this all works, as a technologist I am confused about why this is so much trouble. The bottom line of the demo was, "It's all memory", so why can't I just update a texture in graphics memory? Why do I have to update it in main memory, upload it to graphics memory, and worry about state, API overhead, etc.?

My initial thought was that doing this would break AA and other features, but it looks like I was wrong.

Fast resurfacing was only one of the half-dozen jaw-dropping things I was shown. I need a few more days of research, but I'm almost convinced that our current 3D APIs are severely deficient, and that something will happen in the next two years to change this.

F

**bridgman** · 16 May 2012, 11:43 AM

Originally posted by russofris

Indeed. While I have only a rough working understanding of how this all works, as a technologist I am confused about why this is so much trouble. The bottom line of the demo was, "It's all memory", so why can't I just update a texture in graphics memory? Why do I have to update it in main memory, upload it to graphics memory, and worry about state, API overhead, etc.?

It's fair to say "it's all memory" when you're dealing with an IGP, or with a discrete GPU texturing from (slower) system memory. But when a discrete GPU is texturing from its own separate graphics memory (the memory that lets *it* run fast), you're dealing with memory that is *not* linearly accessible by the CPU without some hoop-jumping.

The BAR aperture for CPU access to memory on a PCI device still seems to be limited to 256MB (not sure why; it seems too big for 32-bit and too small for 64-bit), and that aperture is used by a *lot* of activities at the same time. So in addition to the API hooks mentioned below, you would need additional hooks to map the texture area you want to update into the BAR somewhere.
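A toy model of the aperture problem, assuming the CPU can only see VRAM through a single 256 MiB window whose base can be moved in fixed-size steps. The `Aperture` type, the alignment parameter and both helpers are hypothetical; real drivers may instead migrate the buffer into the CPU-visible part of VRAM.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;

// Hypothetical model: the CPU reaches VRAM offsets [base, base + size) only.
struct Aperture {
  std::uint64_t base;
  std::uint64_t size;
};

// True if the byte range [offset, offset + length) is fully inside the window.
bool IsCpuVisible(const Aperture& ap, std::uint64_t offset, std::uint64_t length)
{
  return offset >= ap.base && length <= ap.size &&
         offset - ap.base <= ap.size - length;
}

// Picks a window base (aligned down to `align`) that exposes the whole range,
// or returns nothing if the range can never fit in the window.
std::optional<std::uint64_t> WindowBaseFor(std::uint64_t offset, std::uint64_t length,
                                           std::uint64_t apertureSize, std::uint64_t align)
{
  assert(align != 0);
  const std::uint64_t base = offset - offset % align;
  if (offset - base + length > apertureSize) return std::nullopt;
  return base;
}
```

The point of the sketch is that every CPU write to a texture outside the current window needs an extra "move the window" (or "move the buffer") step, and that step competes with everything else using the aperture.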

Originally posted by russofris

My initial thought was that doing this would break AA and other features, but it looks like I was wrong.

That's going to be driver- and hardware-specific, but I don't *think* textures are pre-processed before upload unless requested via the API, so I agree that probably shouldn't be an issue. There is a bunch of cache and pipeline flushing to consider each time, though, if you want predictable behaviour (remember that GPUs have texture caches and really long graphics pipelines, even if the shader core itself has a short pipeline).
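The cache-flushing point can be illustrated with a toy model. `ToyGpu` is hypothetical and is not how any real GPU is built: its texture cache does not snoop CPU writes, so it keeps returning the old texel after the CPU updates memory, until the cache is explicitly invalidated.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model: GPU texel fetches go through a cache that never sees CPU writes.
class ToyGpu {
public:
  explicit ToyGpu(std::vector<std::uint32_t>& memory) : mem_(memory) {}

  // Texture fetch: served from the cache if the texel was read before
  // (possibly stale), otherwise read from memory and cached.
  std::uint32_t Fetch(std::size_t index)
  {
    auto it = cache_.find(index);
    if (it != cache_.end()) return it->second;
    const std::uint32_t value = mem_[index];
    cache_[index] = value;
    return value;
  }

  // What an API-level flush/barrier has to do before the next draw.
  void InvalidateTextureCache() { cache_.clear(); }

private:
  std::vector<std::uint32_t>& mem_;
  std::unordered_map<std::size_t, std::uint32_t> cache_;
};
```

Writing a new value into `memory` and fetching again returns the old texel; only after `InvalidateTextureCache()` does the new one show up. An API that hides direct access is, among other things, making sure that step is never forgotten.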

Originally posted by russofris

Fast resurfacing was only one of the half-dozen jaw-dropping things I was shown. I need a few more days of research, but I'm almost convinced that our current 3D APIs are severely deficient, and that something will happen in the next two years to change this.

My assumption was that API definitions would be extended to allow the kind of partial texture and surface updates that Carmack and others have requested. The technology is there, but AFAIK the APIs don't support it today. I believe we have a GL extension that does this, but I haven't had a chance to look it up; it may just be "partial update from system memory" rather than replacing the entire texture.
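For reference, standard OpenGL already has a "partial update from system memory" in `glTexSubImage2D`. Its semantics on a simple linear, row-major texture can be sketched as below. `LinearTexture` and `UpdateSubRect` are hypothetical, and real GPU textures are usually tiled rather than linear, which is part of why drivers keep the layout hidden.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: linear, row-major, one 32-bit texel per element.
struct LinearTexture {
  std::size_t width = 0;
  std::size_t height = 0;
  std::vector<std::uint32_t> texels;  // width * height entries
};

// Overwrites only the w x h rectangle whose top-left corner is (x, y);
// texels outside it are untouched (the idea behind glTexSubImage2D).
// `src` holds w * h texels, row-major.
void UpdateSubRect(LinearTexture& tex, std::size_t x, std::size_t y,
                   std::size_t w, std::size_t h, const std::uint32_t* src)
{
  assert(x + w <= tex.width && y + h <= tex.height);
  for (std::size_t row = 0; row < h; ++row) {
    const std::uint32_t* from = src + row * w;
    std::uint32_t* to = tex.texels.data() + (y + row) * tex.width + x;
    std::copy(from, from + w, to);
  }
}
```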

**curaga** · 16 May 2012, 01:45 PM

Bridgman, the topic was Trinity.

**bridgman** · 16 May 2012, 02:12 PM

Understood, but the issue is not really HW-specific other than whether the GPU uses shared system memory or GPU-attached graphics memory.

In case you are asking "does the chip do some HW magic to let you go around a running API without breaking stuff?" (e.g. having the GPU's graphics-related caches snoop CPU writes), I believe the answer is "no".

**mirv** · 16 May 2012, 04:44 PM

Originally posted by bridgman

My assumption was that API definitions would be extended to allow the kind of partial texture and surface updates that Carmack and others have requested. The technology is there, but AFAIK the APIs don't support it today. I believe we have a GL extension that does this, but I haven't had a chance to look it up; it may just be "partial update from system memory" rather than replacing the entire texture.

PRT, or whatever name sticks, is basically texture atlas systems supported in hardware, I think (so yes, partial update). That one is AMD-only, and only on the latest cards. But it's a very interesting extension, and I sorely hope it becomes more standardised. Graphics APIs have been needing virtualised textures for some time now.
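As a rough illustration of the virtualised-texture idea (not AMD's extension API; the class and its methods are hypothetical), a large virtual texture can be split into fixed-size tiles, with a page table recording which tiles are currently backed by physical memory. Committing or evicting one tile is then a partial update of residency rather than a re-upload of the whole texture.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch: a tilesX x tilesY grid of tiles, each either backed
// by a physical page or not resident.
class SparseTexture {
public:
  SparseTexture(std::size_t tilesX, std::size_t tilesY)
      : tilesX_(tilesX), table_(tilesX * tilesY) {}

  // Back one tile with a physical page.
  void Commit(std::size_t tx, std::size_t ty, std::size_t physicalPage)
  {
    table_.at(ty * tilesX_ + tx) = physicalPage;
  }

  // Release one tile's backing memory.
  void Evict(std::size_t tx, std::size_t ty) { table_.at(ty * tilesX_ + tx).reset(); }

  // Translate a tile to its physical page; empty means "not resident",
  // which a shader would have to handle (e.g. by sampling a coarser mip).
  std::optional<std::size_t> Lookup(std::size_t tx, std::size_t ty) const
  {
    return table_.at(ty * tilesX_ + tx);
  }

private:
  std::size_t tilesX_;
  std::vector<std::optional<std::size_t>> table_;
};
```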

@russofris: is http://www.opengl.org/registry/specs...ned_memory.txt what you're after?

**bridgman** · 16 May 2012, 07:11 PM

Originally posted by russofris

Under Trinity, can I update textures directly in memory (store through a pointer, as on a PS3), or do I have to go through OpenGL? If the latter, is there a method for updating multiple textures in a single pass? Is the memory banking arrangement among the various Trinity-based offerings static, or does it vary between models?

With every passing hour I think of another possible way to interpret your question.

When you say "have to go through OpenGL", are you using OpenGL for most of the drawing (i.e. you want to go *around* OpenGL to update textures, but then keep drawing with OpenGL and the updated textures), or are you asking whether you can program the 3D hardware directly and do cool texture-update things without using OpenGL at all?

If the former, the previous answer stands (you can do it, but you need some API extensions in OpenGL to deal with things like establishing addressability and flushing caches to pick up the new contents from memory).

If the latter, the answer is "yes", but you'd probably want to implement some kind of 3D API yourself anyway.

Trinity APU memory layout?