Wine-Staging Has Been Revived, Working Towards New Release


  • Originally posted by iive View Post
    GLSL will be converted to NIR that will go to TGSI and sent to LLVM for GCN compilation.
    So it will use the same benefits from the shader compilation improvements.
    That is being reordered so TGSI sits on top of NIR, in the same position as SPIR-V. This means that with OpenGL 4.6 the shader processing paths of Gallium Nine and wined3d can come close to the same.

    Originally posted by iive View Post
    All functions are real. But yeah, their implementation might involve multiple steps performed by the driver, instead of a single GPU command.
    That's also a point in my argument, why using High Level API is trouble.
    I am not saying a high-level API is not trouble at times. The issue is those multiple steps that don't always make sense, and they can have very nasty effects on your ability to thread.

    Originally posted by iive View Post
    Looking at the description, this doesn't seem to be the case.
    Both functions are identical.
    They are not identical in their operational requirements.

    https://www.khronos.org/registry/Ope...ate_access.txt
    Even the API documentation on khronos.org will lead you to a false point of view. You need to find the OpenGL extension each function comes from and read that extension's specification for the operational requirements.

    Take the difference between glMapBufferRange and glMapNamedBufferRange. It turns out glMapNamedBufferRange comes from ARB_direct_state_access in OpenGL 4.5.

    Because glMapBufferRange is from OpenGL 3.0, it uses the horrible old OpenGL 'bind-to-edit' model. So before modification with glMapBufferRange, the object has to be bound to an OpenGL context, and it has to be checked that it is bound to that context (and not bound to multiple contexts; hello thread lock).

    When you see Named in the new 4.5 function names, that marks a direct state access function: none of the old bind-to-edit requirement, just check the buffer, and if an operational lock can be taken on it, go for it.

    Other than the fact that glMapBufferRange can lock multiple things, such as the whole context, while getting approval, and can trigger a GPU sync due to bind-to-edit, there is no difference from the glMapNamedBufferRange function. Of course glMapBufferRange is going to eat more CPU time and have higher latency than glMapNamedBufferRange if both functions are implemented to their extension specifications.

    ARB_direct_state_access in OpenGL 4.5 adds a lot of functions, most with the word Named in them, that can be swapped one for one with the functions without Named in the name, removing a heck of a load of locking. Yes, ARB_direct_state_access adds over 50 new functions in one hit.

    Yes, current wined3d uses glMapBufferRange in all cases and never glMapNamedBufferRange, even when it is available. So there is a lot of low-hanging fruit for optimisation like this one, but a lot of it requires pushing the OpenGL version requirement up.
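
    To make the difference concrete, here is a minimal sketch in C of the two mapping paths. It assumes an OpenGL 4.5 context with loaded function pointers; buf and size are illustrative names, and error handling is omitted.

    /* Create the buffer with DSA; no bind needed just to create it. */
    GLuint buf;
    glCreateBuffers(1, &buf);
    glNamedBufferData(buf, size, NULL, GL_DYNAMIC_DRAW);

    /* OpenGL 3.0 bind-to-edit path: the buffer must be bound to this
     * context before it can be mapped, and the driver validates that bind. */
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    void *old_path = glMapBufferRange(GL_ARRAY_BUFFER, 0, size, GL_MAP_WRITE_BIT);
    /* ... write vertex data through old_path ... */
    glUnmapBuffer(GL_ARRAY_BUFFER);

    /* OpenGL 4.5 direct state access path: operate on the buffer name
     * directly, with no bind and no context-binding checks. */
    void *dsa_path = glMapNamedBufferRange(buf, 0, size, GL_MAP_WRITE_BIT);
    /* ... write vertex data through dsa_path ... */
    glUnmapNamedBuffer(buf);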

    Originally posted by iive View Post
    GPU sync is unrelated to CPU multi thread support.
    Not when you are dealing with evil stuff that has to check whether an object belongs to a single context on the GPU. Checking for locks between structures on the GPU is the worst place possible: not only are you stalling the CPU while waiting on confirmation, you are also stalling the GPU. This is why those writing for performance on older OpenGL care so much about ordering.

    Originally posted by iive View Post
    Once again, your arguments are making the point I've been trying to make: that Gallium Nine is a better approach than OpenGL.
    It also explains why it was able to achieve consistently better results in much shorter time and with less effort.

    But Gallium Nine has not addressed the problem of how to use an OpenGL FBO from DirectX, or how to use DirectX buffers in OpenGL. Gallium Nine has avoided a lot of the multi-version DX problems and the DX-with-OpenGL problems. Those are hard problems in their own right.

    When you understand the effect of OpenGL's "bind-to-edit" on the ability to thread, of course being able to sit on Gallium and avoid it gave a huge speed advantage, and it still gives a huge speed advantage because wined3d is still not using the faster, newer paths. Part of the reason is that in the Khronos API documentation the newer functions read identically to the old ones, so people are totally unaware there is a performance difference of quite a few orders of magnitude; you only work that out when you read the extension where the function was declared.

    Bind-to-edit also causes much higher CPU usage when you cannot order your operations the way OpenGL likes, due to binding conflicts.

    I am really not sure Gallium Nine would be a better approach if wined3d were using the newer, better-performing OpenGL functions that are argument-identical to the old ones. I do know the performance difference between Gallium Nine and wined3d should not be as large as it currently is if we can get to using OpenGL 4.3-4.6 functions where possible. Yes, this makes OS X a nightmare: being stuck on 4.1 means you have the "bind-to-edit" crap.

    The "bind to edit" is at least in the top 10 of the worst design mistakes of early opengl. In that top 10 was not being thread safe as well.

    This is one of the things I find highly stupid. Lower down, stuff like Gallium added a lot of threading; then the OpenGL layer on top, using "bind-to-edit" and other evils like it, effectively made that lower threading work inaccessible to OpenGL programs. These evils only started being properly addressed in the 2012 and later versions of OpenGL.

    Vulkan starts off with multithreading and minimal locking in the base design. But it still leaves us with the problem that we need to support OpenGL/DX hybrid applications.
    Last edited by oiaohm; 06 March 2018, 10:53 AM.



    • Originally posted by oiaohm View Post
      That is being reordered so TGSI sits on top of NIR, in the same position as SPIR-V. This means that with OpenGL 4.6 the shader processing paths of Gallium Nine and wined3d can come close to the same.
      If there is one thing Mesa3D has plenty of, it is shader intermediate representation (IR) formats. There are actually a few different flavors of NIR, and LLVM also uses its own IR. There are also a lot of conversions between the formats.


      Originally posted by oiaohm View Post
      [...] Yes, current wined3d uses glMapBufferRange in all cases and never glMapNamedBufferRange, even when it is available. So there is a lot of low-hanging fruit for optimisation like this one, but a lot of it requires pushing the OpenGL version requirement up. [...] But Gallium Nine has not addressed the problem of how to use an OpenGL FBO from DirectX, or how to use DirectX buffers in OpenGL. [...]
      If you think these out-of-bind functions really are low-hanging fruit, then please, implement them as soon as possible.
      I checked; Mesa3D already provides that extension. And even where it is missing, Wine could just fall back to the old functions.

      There is however one small problem with that. You see, there is no "Named" draw function. Drawing is done using the "global" state, so you need to bind the buffers you are going to use. This means that while the buffer-editing function itself might be faster and avoid a sync, the sync could still happen at the bind before the draw. And since the order is "update, draw; update, draw", there might be no benefit at all.

      Well, one way to improve that is to reorder all non-overlapping buffer edits and do them together before starting to draw: "update, update; draw, draw".
      However, instead of doing multiple named buffer edits, you can combine the non-overlapping changes of all the updates into one big update and use the old function just once: "update+update; draw, draw".
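
      As a rough sketch of that reordering in C (buf_a/buf_b, sizes and counts are made-up names; VAO and vertex attribute setup are omitted):

      /* Naive order: each update may sync against the draw just before it. */
      glNamedBufferSubData(buf_a, 0, size_a, data_a);
      glDrawArrays(GL_TRIANGLES, 0, count_a);
      glNamedBufferSubData(buf_b, 0, size_b, data_b);
      glDrawArrays(GL_TRIANGLES, 0, count_b);

      /* Reordered: all non-overlapping updates first, then all draws. */
      glNamedBufferSubData(buf_a, 0, size_a, data_a);
      glNamedBufferSubData(buf_b, 0, size_b, data_b);
      glDrawArrays(GL_TRIANGLES, 0, count_a);
      glDrawArrays(GL_TRIANGLES, 0, count_b);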

      ---
      Please, stop making up stuff about what Gallium Nine cannot do.

      I took a look at the sample code using Gallium Nine natively under Linux. It has an SDL port, an X11 port and an EGL port. So I don't think there would be any problem using it with OpenGL.

      As to the shared surfaces between DX10/11: I'm still waiting for the list of games that do this.
      As I've said before, this case is so rare that we can't find a working test case.



      • Originally posted by iive View Post
        If there is one thing Mesa3D has plenty of, it is shader intermediate representation (IR) formats. There are actually a few different flavors of NIR, and LLVM also uses its own IR. There are also a lot of conversions between the formats.
        There is a fixed layout in the number of steps between you and the GPU.

        On video drivers supporting NIR, GLSL IR, SPIR-V and TGSI all convert to NIR, and then NIR is converted to the native graphics card IR.

        Please note GLSL IR is a step after GLSL, so wined3d using GLSL has extra steps to get to the graphics card.

        Originally posted by iive View Post
        There is however one small problem with that. You see, there is no "Named" draw function. Drawing is done using the "global" state, so you need to bind the buffers you are going to use. This means that while the buffer-editing function itself might be faster and avoid a sync, the sync could still happen at the bind before the draw. And since the order is "update, draw; update, draw", there might be no benefit at all.
        The bad news is DX9 does not follow this pattern of update, draw, update, draw.
        https://msdn.microsoft.com/en-us/lib..._Index_Buffers

        DX9 in fact uses individual buffer state. Yes, drawing is kind of global, but once a buffer has been transferred to the GPU it can be unlocked for modification before the drawing operations are complete. This is why wine PBA sees a performance boost. Yes, the current wine PBA is done wrong: it lacks the clean-ups, so it ends up consuming all memory, but it confirms without question that the performance boost is there. With DX9 you have buffer transfers for the next drawing cycle happening while drawing is happening; in OpenGL this is only possible using the Named functions.
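
        A minimal sketch of that DX9 pattern in C (COBJMACROS style; vb, device, vertices, size and prim_count are assumed to exist, error handling omitted):

        /* D3DLOCK_DISCARD hands back fresh storage, so the application can
         * refill the buffer while a draw that uses the old contents is
         * still in flight on the GPU. */
        void *data;
        IDirect3DVertexBuffer9_Lock(vb, 0, size, &data, D3DLOCK_DISCARD);
        memcpy(data, vertices, size);
        IDirect3DVertexBuffer9_Unlock(vb);
        IDirect3DDevice9_DrawPrimitive(device, D3DPT_TRIANGLELIST, 0, prim_count);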

        Next, drawing being global state is not in fact true if you stick to the 2014-and-later OpenGL covered in the presentation below.
        https://www.slideshare.net/CassEveri...river-overhead
        Because you have multi-draw indirect (GL_ARB_multi_draw_indirect), things change a lot: the queuing of drawing instructions that DirectX 9 does becomes possible in OpenGL with 4.3.
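
        A sketch of what that looks like (GL 4.3+; cmd_buf and draw_count are made-up names; the struct layout is the one ARB_multi_draw_indirect defines):

        typedef struct {
            GLuint count;          /* vertices per draw */
            GLuint instanceCount;
            GLuint first;
            GLuint baseInstance;
        } DrawArraysIndirectCommand;

        /* The draw parameters live in a GPU-side buffer, so a whole batch of
         * draws is queued with one call and consumed when the GPU is ready. */
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, cmd_buf);
        glMultiDrawArraysIndirect(GL_TRIANGLES, (const void *)0, draw_count, 0);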

        wined3d uses fairly much everything the 2014 presentation tells you to avoid. So wined3d getting to 50 percent of native seems insane given how far its performance is crippled. This means wined3d's CPU overhead is at least 5x what it should be, at worst 30x. Gallium Nine is not beating wined3d by that much.

        Also, read to the end and you will notice something: DX11 is less than 1/5 the performance of the best OpenGL method, but it beats the horrible methods wined3d is currently using.


        Originally posted by iive View Post
        Please, stop making up stuff about what Gallium Nine cannot do.

        I took a look at the sample code using Gallium Nine natively under Linux. It has an SDL port, an X11 port and an EGL port. So I don't think there would be any problem using it with OpenGL.
        So: no formal specification of the interface, just a sample hack-up you are free to break in future. There is a set of OpenGL extensions documented by Nvidia to do this, and at one point in time Microsoft required drivers to have them.

        https://github.com/halogenica/WGL_NV_DX (initial check-in of the WGL_NV_DX demo)

        The interesting part is that this extension is disappearing from newer drivers, yet there are a lot of legacy internal business programs out there that depend on it.

        Originally posted by iive View Post
        As to the shared surfaces between DX10/11: I'm still waiting for the list of games that do this.
        Maybe the problem is that it should be in your face. You ask me for a list of games; I have mentioned MS Office with extensions a few times. Most of the items that have this problem are not games. There are a few games where the intro video playback is done by DX9/10 but the game engine is DX11, so there is buffer sharing going on.
        https://docs.microsoft.com/en-us/pre...607501(v=vs.85)

        By the way, you don't need a list of games. It's part of the Microsoft driver certification process to test that sharing between versions of DX in fact works. Sharing between OpenGL and DX9/10 was in the Microsoft Windows 7 conformance suite at one point in time; this is why every graphics driver of that era has it, so that OEMs could buy the cards and stick the Windows-certified hardware sticker on.

        Something I have never really understood: if you have built a driver with Gallium Nine, why can I not see WGL_NV_DX in glxinfo, so that the hybrid OpenGL/DX programs that exist on Windows could be source-ported with minimal alteration?



        • Originally posted by oiaohm View Post
          There is a fixed layout in the number of steps between you and the GPU.

          On video drivers supporting NIR, GLSL IR, SPIR-V and TGSI all convert to NIR, and then NIR is converted to the native graphics card IR.

          Please note GLSL IR is a step after GLSL, so wined3d using GLSL has extra steps to get to the graphics card.
          There is no such thing as a native graphics card intermediate representation. That is kind of an oxymoron.

          The internal workings of the driver should be of no concern, since they are supposed to do conversions without changing the meaning of the code.


          Originally posted by oiaohm View Post
          The bad news is DX9 does not follow this pattern of update, draw, update, draw.
          https://msdn.microsoft.com/en-us/lib..._Index_Buffers

          DX9 in fact uses individual buffer state. Yes, drawing is kind of global, but once a buffer has been transferred to the GPU it can be unlocked for modification before the drawing operations are complete. This is why wine PBA sees a performance boost. Yes, the current wine PBA is done wrong: it lacks the clean-ups, so it ends up consuming all memory, but it confirms without question that the performance boost is there. With DX9 you have buffer transfers for the next drawing cycle happening while drawing is happening; in OpenGL this is only possible using the Named functions.
          Your claim is factually wrong and the link you provide does not support it.

          The link just shows that you can have static buffers (e.g. where you store the 3D mesh of the drawn object) and dynamic buffers (where you store the morphing points, used by the vertex shaders to warp the mesh vertices).

          It would be quite a boring game if nothing in the 3D scene ever changed.

          The PBA article explains nicely what the recommended D3D9 model is. It's also what every existing game does.

          Originally posted by oiaohm View Post
          Next, drawing being global state is not in fact true if you stick to the 2014-and-later OpenGL covered in the presentation below.
          https://www.slideshare.net/CassEveri...river-overhead
          Because you have multi-draw indirect (GL_ARB_multi_draw_indirect), things change a lot: the queuing of drawing instructions that DirectX 9 does becomes possible in OpenGL with 4.3.
          Multi-draw indirect just combines multiple draw calls into one; the parameters for these draws are stored in a buffer array. You still need to reorder the updates and draws; the only difference is that instead of multiple draw calls you'll do one, aka "update+update; draw+draw".

          Once again, your claim is factually wrong and the link you provide does not support it.

          It does however give an interesting test program and a nice example of how to use ARB_buffer_storage.
          So we are back to square one.

          Originally posted by oiaohm View Post
          wined3d uses fairly much everything the 2014 presentation tells you to avoid. So wined3d getting to 50 percent of native seems insane given how far its performance is crippled. This means wined3d's CPU overhead is at least 5x what it should be, at worst 30x. Gallium Nine is not beating wined3d by that much.

          Also, read to the end and you will notice something: DX11 is less than 1/5 the performance of the best OpenGL method, but it beats the horrible methods wined3d is currently using.
          Well, too bad you just discovered this.

          If only wine experts had discovered this presentation 4 years ago, they might have known how to properly implement wined3d.

          Originally posted by oiaohm View Post
          So: no formal specification of the interface, just a sample hack-up you are free to break in future. There is a set of OpenGL extensions documented by Nvidia to do this, and at one point in time Microsoft required drivers to have them.

          https://github.com/halogenica/WGL_NV_DX (initial check-in of the WGL_NV_DX demo)

          The interesting part is that this extension is disappearing from newer drivers, yet there are a lot of legacy internal business programs out there that depend on it.
          Mesa3D, Gallium and Nine are open source. There is no problem implementing extensions to do this kind of thing, if somebody ever wants it.


          Originally posted by oiaohm View Post
          Maybe the problem is that it should be in your face. You ask me for a list of games; I have mentioned MS Office with extensions a few times. Most of the items that have this problem are not games. There are a few games where the intro video playback is done by DX9/10 but the game engine is DX11, so there is buffer sharing going on.
          https://docs.microsoft.com/en-us/pre...607501(v=vs.85)

          By the way, you don't need a list of games. It's part of the Microsoft driver certification process to test that sharing between versions of DX in fact works. Sharing between OpenGL and DX9/10 was in the Microsoft Windows 7 conformance suite at one point in time; this is why every graphics driver of that era has it, so that OEMs could buy the cards and stick the Windows-certified hardware sticker on.

          Something I have never really understood: if you have built a driver with Gallium Nine, why can I not see WGL_NV_DX in glxinfo, so that the hybrid OpenGL/DX programs that exist on Windows could be source-ported with minimal alteration?
          You are the one making a big fuss about these obscure corner cases.
          We are simply having trouble finding a real test case for them.

          Are you sure this Windows Logo Kit works at all under Wine? I can't find it in the AppDB; do you have any hints for installing and using it? Should I install Mono or use winetricks for .NET?
          So far it looks like it needs a 64-bit prefix to install at all, while there are 32- and 64-bit test archives.



          • Originally posted by iive View Post
            There is no such thing as a native graphics card intermediate representation. That is kind of an oxymoron.

            The internal workings of the driver should be of no concern, since they are supposed to do conversions without changing the meaning of the code.
            On Intel there is a native graphics card intermediate representation, because there is a final transformation when the code is allocated to the shader engine, where more addresses become fixed values instead of relocatable ones. It's not an oxymoron; that is not knowing the stack. So it's NIR, to native graphics IR, to non-relocatable native graphics byte-code. OK, I will give you that the native graphics IR can be avoided on some cards; on others, not so much. Some of the non-relocatable values include which section of the GPU shader system the code is running in, so the final bytecode will, say, only run on Execution Unit 1, because that is what its non-relocatable form targets; to run on a different Execution Unit it has to be relocated again. Execution Unit is what Intel calls it, but other vendors give it different names.

            So there is a native graphics card intermediate representation on quite a few brands of cards.

            The internal workings of the driver are a worry on some GPUs when you care about CPU usage, if they offload the final transformation back to the CPU due to excessive complexity. To understand why your CPU usage sometimes spikes to hell with particular shaders on particular brands of GPU on Android and the like, you have to understand what happens at each of the transformation points and how you can run yourself straight into hell. It is really bad when a GPU with accelerated native graphics IR processing bounces the native graphics IR back to the CPU because it is too complex for the accelerated processing, forcing the CPU to do the final transform to non-relocatable native bytecode before resending it across to the GPU; this chews up your CPU time and your very important bus transfer bandwidth.

            Basically, not changing the meaning of the code does not mean it will not totally cripple your performance. Every time you pass through a transformation that can end up on the CPU, that is a point where stuff can stall.

            Originally posted by iive View Post
            The PBA article explains nicely what the recommended D3D9 model is. It's also what every existing game does.
            Yes, and without using OpenGL newer than 4.2 you cannot do the D3D9 model correctly. Even PBA uses extensions newer than 4.2.


            Originally posted by iive View Post
            Multi-draw indirect just combines multiple draw calls into one; the parameters for these draws are stored in a buffer array. You still need to reorder the updates and draws; the only difference is that instead of multiple draw calls you'll do one, aka "update+update; draw+draw".
            Stored in buffers means you can have interrelationships between buffers, so your transfer orders from memory to the GPU are no longer fixed. The commands can be transferred to the GPU before the buffers they work on are delivered. So: draw commands transfer, then update, then execute. This inverted order of transfers can improve the effective use of bus transfer bandwidth.

            So multi-draw indirect is not just bundling draw commands; it alters when they can be sent to the GPU, and that is an important change. Interestingly enough, sending draw commands first and updating buffers second works out faster, because the multi-draw indirect executes as soon as its requirements are on the card.

            So update-draw is backwards from the optimal method for a lot of GPUs, the optimal being draw-update. Of course execution on the card still ends up happening as update then draw, but the inverted order reduces trips between CPU and GPU, and reduces latency by removing those trips: the GPU executes the draw commands when everything they require is ready, without the CPU having to say 'do it now'.

            With indirect draw, the 'indirect' means you are not in 100 percent control of when the draw executes; this is in fact a good thing.

            Yes, cards supporting the inverted order is something Microsoft DirectX 12 only now supports. So following the Microsoft documentation leads you down the slow path of update-draw. Of course old OpenGL also does update-draw; it's one of those top-10 mistakes that ends up not making effective use of the GPU, because extra messages between GPU and CPU are needed to know when the GPU is ready to process draw commands, instead of letting the GPU just do it when everything is ready.

            So it would pay for the Gallium Nine developers to go through that 2014 paper very carefully, taking careful note of the methods that are many times faster than the DX methods and understanding how those are done. Then you might see Gallium Nine being twice as fast as native.


            Originally posted by iive View Post
            Well, too bad you just discovered this.

            If only wine experts had discovered this presentation 4 years ago, they might have known how to properly implement wined3d.
            In fact, Wine developers had discovered some of it as early as 2012. That is why Gallium Nine performance claims fall on deaf ears: they knew the performance difference was not as big as what was possible. This is also where the claim of optimising for Nvidia is 100 percent wrong. The real issue: is there anything in the 2014 presentation you can implement on native Mac OS OpenGL? The answer is absolutely nothing, because it is barely 4.2 and everything that gets you performance is OpenGL 4.3 or later.

            The reason it has not been done sooner is which customers are paying the bills. The reality is that the claim that you need lower-level graphics API access for performance has been false. You need a functional graphics API with all the features, and being stuck using OpenGL 4.2 and older means full performance is not possible. You also have to produce what your paying customers want, to keep the servers running and the programmers employed.

            Also, when you know this stuff and they claim Vulkan over Mac OS Metal is 20% faster than Mac OS OpenGL, you know that this is still crippled performance, because full current OpenGL should be way faster than that, let alone fully functional Vulkan.

            Originally posted by iive View Post
            Are you sure this Windows Logo Kit works at all under Wine? I can't find it in the AppDB; do you have any hints for installing and using it? Should I install Mono or use winetricks for .NET?
            So far it looks like it needs a 64-bit prefix to install at all, while there are 32- and 64-bit test archives.
            You need to read the license: you are not allowed to use Windows Logo Kit testing on any OS other than Windows, which is why it's not in the AppDB. But it does tell you when applications can expect particular features to exist. Remember, I am support, not a developer, so I can go play with items that developers legally cannot touch. There are reasons you need support people as well as developer people when doing things that get you close to DMCA issues, and I am also in a country where saying an item can only be used on a particular OS is not legal if the item is provided independently of the OS. Just because I mention something does not mean you should play with it.



            • You write something wrong. I refute it.

              You do not correct yourself; you double down and write text twice as long, containing twice as many nonsense claims that are twice as absurd, sometimes providing links that don't support what you say.
              I refute them, point by point.

              Rinse and Repeat.

              Please, stop making stuff up. You are humiliating not only yourself, but also the company you are working for.


              Originally posted by oiaohm View Post
              On Intel there is a native graphics card intermediate representation, because there is a final transformation when the code is allocated to the shader engine, where more addresses become fixed values instead of relocatable ones. It's not an oxymoron; that is not knowing the stack. So it's NIR, to native graphics IR, to non-relocatable native graphics byte-code. OK, I will give you that the native graphics IR can be avoided on some cards; on others, not so much. Some of the non-relocatable values include which section of the GPU shader system the code is running in, so the final bytecode will, say, only run on Execution Unit 1, because that is what its non-relocatable form targets; to run on a different Execution Unit it has to be relocated again. Execution Unit is what Intel calls it, but other vendors give it different names.

              So there is a native graphics card intermediate representation on quite a few brands of cards.
              All .EXE and ELF binaries support relocations, where the binary loader fills in addresses right before execution. It doesn't make the code any less native.

              The term "intermediate representation" is specifically picked to distinguish it from the native code, assembler or binary. It is used to describe code that is machine independent.

              Making a mistake once is OK. But then trying to smear, blur and confuse just to cover up your blunder...

              Originally posted by oiaohm View Post
              The internal workings of the driver are a worry on some GPUs when you care about CPU usage, if they offload the final transformation back to the CPU due to excessive complexity. To understand why your CPU usage sometimes spikes to hell with particular shaders on particular brands of GPU on Android and the like, you have to understand what happens at each of the transformation points and how you can run yourself straight into hell. It is really bad when a GPU with accelerated native graphics IR processing bounces the native graphics IR back to the CPU because it is too complex for the accelerated processing, forcing the CPU to do the final transform to non-relocatable native bytecode before resending it across to the GPU; this chews up your CPU time and your very important bus transfer bandwidth.

              Basically, not changing the meaning of the code does not mean it will not totally cripple your performance. Every time you pass through a transformation that can end up on the CPU, that is a point where stuff can stall.
              Thanks to your mishmash of wrong terms, your writing above doesn't make any sense.
              It also doesn't seem to be related to Mesa3D, it is completely off-topic, and I don't care.

              Originally posted by oiaohm View Post
              Yes, and without using OpenGL newer than 4.2 you cannot do the D3D9 model correctly. Even PBA uses extensions newer than 4.2. [...] So: draw commands transfer, then update, then execute. This inverted order of transfers can improve the effective use of bus transfer bandwidth. [...] With indirect draw, the 'indirect' means you are not in 100 percent control of when the draw executes; this is in fact a good thing. [...] Then you might see Gallium Nine being twice as fast as native.
              As other people have already said, Gallium Nine already manages to achieve GPU utilization close to 100%. Good luck improving that.

              There is no such thing as inverted order, at least not in the linked presentation. (It's also not a term Google can connect to DX12.) It also doesn't make sense from an engineering point of view, because it is a lot more complicated.

              The indirect draws also don't do what you say they do. Actually everything is just the opposite of what you say.

              The indirect draw does not take its arguments from the calling function; it takes all parameters from a buffer object. (This time it's just a single element instead of an array.)
              That means you have to fill that buffer before you do the draw (or make sure the values have been filled in there by some previous operation).

              The draw itself could happen right away (if synced) or at some point much later (before presentation). This is why, if you use ARB_buffer_storage, you have to keep the buffer alive for a much longer time. The example in the linked article basically separates the buffer into 3 parts, to be used with triple buffering (3 presentation framebuffers).
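
              For illustration, a minimal sketch in C of that triple-buffered persistent mapping (REGION_SIZE and frame are made-up names; ARB_buffer_storage or GL 4.4 is assumed):

              const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                                       GL_MAP_COHERENT_BIT;
              GLuint buf;
              glGenBuffers(1, &buf);
              glBindBuffer(GL_ARRAY_BUFFER, buf);
              glBufferStorage(GL_ARRAY_BUFFER, 3 * REGION_SIZE, NULL, flags);  /* immutable store */
              char *base = glMapBufferRange(GL_ARRAY_BUFFER, 0, 3 * REGION_SIZE, flags);

              /* Each frame writes into its own third of the buffer and fences it,
               * so the CPU never scribbles over a region the GPU is still reading. */
              char *dst = base + (frame % 3) * REGION_SIZE;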

              Now, usually the draw command is queued in the command stream and not executed right away, so issuing a draw and then filling in the buffer data may seem to work for you. However, it might not work with another driver. It might also break under a debug mode that forces synchronization.

              Originally posted by oiaohm View Post
              In fact, Wine developers had discovered some of it as early as 2012. That is why Gallium Nine performance claims fall on deaf ears: they knew the performance difference was not as big as what was possible. [...] Also, when you know this stuff and they claim Vulkan over Mac OS Metal is 20% faster than Mac OS OpenGL, you know that this is still crippled performance, because full current OpenGL should be way faster than that, let alone fully functional Vulkan.
              Nothing you have written above makes any sense.

              Gallium Nine has managed to provide 95% of native speed for most of the games. Not in theory, but in practice. And that's been the case for years.

              The wined3d that uses OpenGL is slow even on the NVidia binary driver, and much slower on Mesa3D. The speed improvements through new extensions do not exist yet... we have yet to see how well they will work.


              Originally posted by oiaohm View Post
              You need to read the license: you are not allowed to use Windows Logo Kit testing on any OS other than Windows, which is why it's not in the AppDB. [...] Just because I mention something does not mean you should play with it.
              I haven't been shown any license, before downloading or before selecting install components. I guess I will get it after that?

              Anyway, once again your example for the "huge" Nine shortfall is legally barred.

              I'm not sure I should even continue testing. I'm in the EU, so I'm quite confident such a clause is invalid around here. Still, I'm not sure I want to taint myself, even though I don't have my own code in Gallium Nine...

              Thank you for nothing.



              • Originally posted by iive View Post
                You write something wrong. I refute it.
                All .EXE and ELF binaries support relocations, where the binary loader fills in addresses right before execution. It doesn't make the code any less native.

                The term "intermediate representation" is specifically picked to distinguish it from the native code, assembler or binary. It is used to describe code that is machine independent.

                Making a mistake once is OK. But then trying to smear, blur and confuse just to cover up your blunder...
                Go watch the Australian Linux conference video where the Intel graphics driver developer gives a talk on how it is all hooked up. He uses 'native intermediate representation' exactly where I do.

                They don't list all the alterations that happen after the native intermediate representation, but that is exactly what they call it. Go get the ARM Mali specs; you'll find the same thing.

                Inside Mesa this extra layer is often classed as not existing. They class the native intermediate representation, which gets converted to native inside the GPU, as native, which is incorrect.

                Originally posted by iive View Post
                Thanks to your mishmash of wrong terms, your writing above doesn't make any sense.
                It also doesn't seem to be related to Mesa3D, it is completely off-topic, and I don't care.
                See, the problem is you are thinking from the Mesa3D point of view; you say I am wrong when it's something they get wrong all the time. I was giving you exactly the order of the transformations as they happen in hardware, not ideas I guessed at.

                Originally posted by iive View Post
                As other people have already said, Gallium Nine already manages to achieve GPU utilization close to 100%. Good luck improving that.
                You can have 100 percent CPU utilisation while stuck in a spinlock doing nothing useful. You can do the same thing to a GPU.

                Originally posted by iive View Post
                There is no such thing as inverted order, at least not in the linked presentation. (It's also not a term Google can connect to DX12.) It also doesn't make sense from an engineering point of view, because it is a lot more complicated.
                Inverting the order in which items are transferred across PCIe to the graphics card makes sense, like it or not.


                Originally posted by iive View Post
                The indirect draw does not take its arguments from the calling function; it takes all parameters from a buffer object. (This time it's just a single element instead of an array.) That means you have to fill that buffer before you do the draw (or make sure the values have been filled in there by some previous operation).

                The draw itself could happen right away (if synced) or at some point much later (before presentation). This is why, if you use ARB_buffer_storage, you have to keep the buffer alive for a much longer time. The example in the linked article basically separates the buffer into 3 parts, to be used with triple buffering (3 presentation framebuffers).

                Now, usually the draw command is queued in the command stream and not executed right away, so issuing a draw and then filling in the buffer data may seem to work for you. However, it might not work with another driver. It might also break under a debug mode that forces synchronization.
                What you prep CPU-side and the order in which it is transferred to the card are different things once you get into DX12, Vulkan and the newest OpenGL. As for the 'at some point much later': why did the GPU trigger the draw when it did? It got the draw instructions, noticed it did not yet have the buffers, so it put the draw on hold; when the GPU got the buffers the draw was to work on, it ran the draw.

                Also, the idea that you have to fill the buffer before you issue draw commands is kind of wrong. The void* in ARB_buffer_storage might be pointing at a memory map of a file on disc, so the buffer may not be filled yet, and may not even be in RAM, when you create all your draw commands. Yes, the buffer object has to carry a description of the format of the area the void* points to, so that the correct draw command can be crafted. The mistake is thinking the buffer needs to be fully filled first; it only has to be fully filled before it can be fully transferred to the GPU. Please note it can be partly transferred while it is partly filled.

                This is all about taking advantage of the PCIe transfer speed: transfer the items that are ready as soon as they are ready, so the items you are waiting on from disc or elsewhere get as much PCIe transfer bandwidth as possible, reducing the GPU equivalent of being stuck in a spinlock. This is why it is possible to go past what was reported as 100 percent utilisation of a GPU: the report does not give the fine details of why the GPU is stuck at full load.


                Originally posted by iive View Post
                Gallium Nine has managed to provide 95% of native speed for most of the games. Not in theory, but in practice. And that's been the case for years.

                The wined3d that uses OpenGL is slow even on the NVidia binary driver, and much slower on Mesa3D. The speed improvements through new extensions do not exist yet... we have yet to see how well they will work.
                The point is you don't know how well those extensions will work, so you don't know whether the Gallium path is the fastest. Yet you had the gall to say that Wine should drop OpenGL and just use Gallium. The reality is there has not been a proper duke-out between OpenGL and Gallium.

                Originally posted by iive View Post
                I haven't been shown any license, before downloading or before selecting install components. I guess I will get it after that?

                Anyway, once again your example for the "huge" Nine shortfall is legally barred.

                I'm not sure I should even continue testing. I'm in the EU, so I'm quite confident such a clause is invalid around here. Still, I'm not sure I want to taint myself, even though I don't have my own code in Gallium Nine...
                It's a pure-ass design: it displays the license after it runs, before it will save or print the results. I did not have a choice about tainting myself with some of this stuff, because I had to run it for work. Think about it: Microsoft products normally display a license screen in the installer; these evil things don't. Once it displays the terms of the license, in many countries you are up the creek. The fact that you had not seen a license is the big nightmare.

                Basically, when a Microsoft installer does not display a license, stop and find out what license you are in fact going to come into contact with if you want to avoid taint; if you cannot find out what it is, the license will be something extremely nasty. Yes, Microsoft will let you commit the copyright infringement of using a product without a license to do so before informing you of it.

                I am support around Wine; I am not restricted in what I can see. So I can go into areas that are legally barred to Wine developers.

                By the way, the description page I pointed to gave a rough overview of what the software tests; that you are legally able to look at without tainting yourself. I am able to tell you roughly what is in those tests without tainting you. It's like pointing to the Nvidia OpenGL extension for DirectX access: giving you the Microsoft test case for the Nvidia OpenGL/DirectX sharing would risk tainting you. You might think I am being vague at times; sometimes that is simply because I cannot think of how to show you the information without tainting you. But this time you annoyed me a little too much and I lost part of my due care.

