Radeon Gallium3D OpenCL Is Coming Close


  • Stebs
    replied
    @bridgman (and other devs etc.)

    In the slight hope that you are still reading this thread: don't forget that for every vocal, hmm, "crazy" here in the forum, there are several silent readers who appreciate insightful information like yours (well, there is at least one for sure...). So keep posting if you are in the mood; it IS appreciated.



  • droste
    replied
    Stratagem 26, but you got me there ;-) You won this time :-P



  • droste
    replied
    Originally posted by Qaridarium
    because you admit that I am rhetorically gifted and that makes me competent to judge him.
    Insulting people and getting personal does _not_ mean you're rhetorically gifted. (You used stratagem 20)



  • droste
    replied
    Originally posted by Qaridarium
    read this book, then you understand how bridgman works: "The Art of Being Right: 38 Ways to Win an Argument" http://en.wikipedia.org/wiki/The_Art_of_Being_Right
    I think you're the last person here who should interpret others' rhetoric... And it's even funnier that you mention "The Art of Being Right", because you use at least 3 of the stratagems in each of your posts (your favorites being 14, 20 and 36, and your most favorite of all, 38) ;-)



  • Drago
    replied
    Originally posted by Qaridarium
    In your rhetoric we only get 45-50% in "LOL-OEM-LOL" products, and we get it because the "LOL-OEM-LOL" takes care of open-source drivers.

    And we get 10-20% performance with the Llano because AMD sells these GPUs with their CPUs directly to the customer, and no "LOL-OEM-LOL" is there to protect us from AMD making fun of open-source customers, LOL!

    AMD should really stop selling APUs to customers directly; then maybe the performance increases from 10-20% to 45-50%.

    I'll take you seriously again when the products AMD sells directly to consumers are on the same performance level as the products from the LOL-OEM-LOL companies.

    I can only recommend buying Intel if someone doesn't want to spend expensive money on a LOL-OEM-LOL graphics card product.
    Beats me! I didn't understand anything from the above!



  • bridgman
    replied
    Originally posted by crazycheese View Post
    Excuse me for the short intervention, but what about the complexity of Linux? Was it fatal? Has the number of hackers decreased from version 0.01 until now, or increased?
    IIRC the number of hackers for 0.01 was "one", so presumably it's gone up from there.

    Graphics driver development community seems to be growing at roughly the same pace as the general Linux developer community, with roughly the same mix of commercial and volunteer developers.

    Take a read through http://go.linuxfoundation.org/who-writes-linux-2012 - we're starting to see a number of graphics developers showing up in the "top contributors" list.

    Originally posted by crazycheese View Post
    We don't get 80% performance, we get 20%.
    I think you'll find the average is more like 45-50% these days and continuing to increase. You can still cherry-pick numbers to get 20% (eg recent Llano benchmarks with lower clocks for open source than for Catalyst) but it's getting harder every month.

    Originally posted by crazycheese View Post
    We are missing raw programming resources, everyone knows this.
    Agreed - the number of developers per KSLOC is maybe 1/2 as high for the "desktop bits" (X, graphics drivers etc..) as it is for the "server bits" (kernel, filesystems etc..). That is probably related to the higher $$ earned from server Linux business but that's just a guess.

    Originally posted by crazycheese View Post
    Also, everyone knows we don't gain those resources by talking it over and over, or by endlessly changing points of view and finding arrays of arguments.
    I don't understand what point you are making here. Are you saying people shouldn't ask questions, or shouldn't answer them, or something else ?

    Originally posted by crazycheese View Post
    If we are to change something, we have to pick a direction and follow it. This will never happen, that is a fact.
    That's not what happened with the Linux kernel either and my impression was that you thought the Linux kernel was a good example to follow.

    Linux started out with fairly basic implementations of all the major functions, then over the years different subsystems were gradually replaced with more complex but more featureful and higher performing implementations. That's the same pattern we are seeing with graphics -- UMS gets replaced with KMS, shader translators get replaced with shader compilers, classic Mesa HW layer gets replaced with Gallium3D layer, layers get blended together for performance (eg Intel plans to use GLSL IR in the graphics HW layer) etc...

    That seems like the right approach to me, but it is not consistent with "open source drivers running faster than proprietary drivers in the first couple of years" which is what everyone except the developers seemed to expect. Now I guess the popular sentiment is "things aren't moving as fast as I hoped so open source drivers are always going to suck", which is just as wrong.
    Last edited by bridgman; 14 May 2012, 10:13 PM.



  • crazycheese
    replied
    Originally posted by bridgman View Post
    This is one of the interesting tradeoffs. Do you want the driver to be simple and accessible so more people can potentially contribute, or do you want it to be sufficiently sophisticated that it could potentially match the performance of proprietary drivers at the cost of reducing the pool of potential contributors ?
    Excuse me for the short intervention, but what about the complexity of Linux? Was it fatal? Has the number of hackers decreased from version 0.01 until now, or increased? We don't get 80% performance, we get 20%. We are missing raw programming resources, everyone knows this. Also, everyone knows we don't gain those resources by talking it over and over, or by endlessly changing points of view and finding arrays of arguments. If we are to change something, we have to pick a direction and follow it. This will never happen, that is a fact. Again, excuse me for the interruption.



  • bridgman
    replied
    Originally posted by benjamin545 View Post
    So in the Linux ecosystem, we have some paid hardcore developers and we have a lot of hobbyists. Hobbyists will never, ever, individually on their own design a modern graphics driver that's competitive with today's standards, and that's OK. Now, as we have seen in the Linux graphics stack over the past few years, paid hardcore developers have come a long way in creating a very competitive graphics stack, but we really want hobbyists to be a part of that too, and while some have been, I think a lot of people, while willing and possibly able to contribute, still feel overwhelmed by the complexity of it all.
    This is one of the interesting tradeoffs. Do you want the driver to be simple and accessible so more people can potentially contribute, or do you want it to be sufficiently sophisticated that it could potentially match the performance of proprietary drivers at the cost of reducing the pool of potential contributors ?

    The current open source driver standards seem to be aimed at the knee of the curve, where they're sufficiently complex to allow writing "fairly performant" drivers without becoming "twice as complex for a small increase in performance". Seems like a good compromise, but it's important to understand that it *is* a compromise.

    As an example, the open source drivers have a relatively large amount of common code and a relatively small amount of HW-specific code but if you want to get that last 20% of potential performance you generally need to move the line up and have substantially more of the driver stack being hardware-specific. That makes the driver code larger and more complex, which in turn makes it a lot harder for potential developers to contribute.

    Originally posted by benjamin545 View Post
    Getting more to the point, I guess: if TGSI is a simpler IR to transport between various components, then if I were a newcomer wanting to develop a component, it would be easier to deal with TGSI. If it is then necessary to convert it to something more specific to what I am doing (which is what I've been hearing all along: that it's too hard to create one all-encompassing IR that is perfect for all state trackers and all hardware drivers), then that is what would have to be done. At least then you could try to make your internal IR something specific to your hardware; for instance, I'm sure the nvfx/nv30 driver, with its un-unified shader cores, is much different from the nv50 or nvc0 or whatever.
    IR choice is affected both by hardware characteristics and by the compiler framework being used. If everyone settles on a single compiler framework then that IR will probably win -- otherwise TGSI will probably get extended so that it can serve as a common language between the different compiler stacks. The interesting question is whether it will be noticeably faster to convert directly from one structured IR to another, or whether going through a common "flat" IR will be close enough in performance that the benefits outweigh the costs.
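
    To make the cost question concrete, here is a rough sketch of the kind of pass that runs at a component boundary when a driver has to rebuild a structured (tree) IR from a flat, TGSI-like instruction list. The types and names are made up for illustration; this is not actual Mesa/Gallium3D code:

    /* Illustrative sketch only -- hypothetical types, not real Mesa/Gallium3D code.
     * It shows the kind of pass a driver runs at a component boundary when it has
     * to rebuild a structured (tree) IR from a flat, TGSI-like instruction list. */
    #include <stdio.h>
    #include <stdlib.h>

    enum op { OP_CONST, OP_MOV, OP_ADD, OP_MUL };

    /* Flat form: a plain array of instructions, each writing one register. */
    struct flat_inst { enum op op; int dst; int src0; int src1; float imm; };

    /* Structured form: an expression tree the compiler can optimize directly. */
    struct expr {
        enum op op;
        float imm;                  /* valid for OP_CONST        */
        struct expr *lhs, *rhs;     /* valid for OP_ADD / OP_MUL */
    };

    static struct expr *node(enum op op, float imm, struct expr *l, struct expr *r)
    {
        struct expr *e = calloc(1, sizeof(*e));
        e->op = op; e->imm = imm; e->lhs = l; e->rhs = r;
        return e;
    }

    /* Rebuild trees by remembering the last expression written to each register. */
    static struct expr *raise_to_tree(const struct flat_inst *prog, int n, int result_reg)
    {
        struct expr *def[16] = { 0 };
        for (int i = 0; i < n; i++) {
            const struct flat_inst *in = &prog[i];
            switch (in->op) {
            case OP_CONST: def[in->dst] = node(OP_CONST, in->imm, NULL, NULL); break;
            case OP_MOV:   def[in->dst] = def[in->src0];                       break;
            case OP_ADD:
            case OP_MUL:   def[in->dst] = node(in->op, 0.0f,
                                               def[in->src0], def[in->src1]);  break;
            }
        }
        return def[result_reg];
    }

    static void dump(const struct expr *e)
    {
        if (e->op == OP_CONST) { printf("%g", e->imm); return; }
        printf("("); dump(e->lhs);
        printf(e->op == OP_ADD ? " + " : " * ");
        dump(e->rhs); printf(")");
    }

    int main(void)
    {
        /* r2 = (2 + 3) * 4, expressed as a flat instruction stream */
        struct flat_inst prog[] = {
            { OP_CONST, 0, 0, 0, 2.0f }, { OP_CONST, 1, 0, 0, 3.0f },
            { OP_ADD,   2, 0, 1, 0.0f }, { OP_CONST, 3, 0, 0, 4.0f },
            { OP_MUL,   2, 2, 3, 0.0f },
        };
        dump(raise_to_tree(prog, 5, 2));    /* prints ((2 + 3) * 4) */
        printf("\n");
        return 0;
    }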

    Originally posted by benjamin545 View Post
    It would be best if other parts of Gallium had that same kind of mentality. For instance, memory management is one area where Gallium was initially sold as being able to abstract memory management completely into the winsys portion of the driver, but what I've read before is that a lot of the memory management has been implemented in the hardware drivers, usually due to some feature missing from Gallium or it just being easier for whoever is doing it to do it in the driver (I'm guessing a lot of that probably comes from the initial testing and learning stages).
    The winsys layer was supposed to *abstract* things like memory management, not *implement* them. The implementation was always expected to be in the lower-level drivers (eg the kernel driver, aka drm) -- the Gallium3D abstractions just provide a standard way to call those functions.
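
    As a rough illustration of "abstract vs implement" (hypothetical names, not the actual Gallium3D winsys interface): the abstraction is essentially a table of function pointers that each hardware driver fills in with calls into its own kernel driver, while hardware-independent code only ever goes through that table:

    /* Illustrative sketch only -- hypothetical interface, not the real winsys API.
     * The abstraction defines *how* memory management gets called; the actual
     * implementation lives below, ultimately in the kernel driver. */
    #include <stdio.h>
    #include <stdlib.h>

    struct gpu_buffer;                          /* opaque to common code */

    struct winsys_funcs {
        struct gpu_buffer *(*buffer_create)(size_t size);
        void               (*buffer_destroy)(struct gpu_buffer *buf);
    };

    /* One possible implementation, which would sit on top of a kernel driver. */
    struct gpu_buffer { size_t size; /* plus kernel handle, mapping, etc. */ };

    static struct gpu_buffer *example_buffer_create(size_t size)
    {
        /* A real driver would call into its kernel driver here (eg a GEM ioctl). */
        struct gpu_buffer *buf = malloc(sizeof(*buf));
        buf->size = size;
        printf("allocated %zu bytes of GPU memory\n", size);
        return buf;
    }

    static void example_buffer_destroy(struct gpu_buffer *buf)
    {
        printf("freed buffer of %zu bytes\n", buf->size);
        free(buf);
    }

    static const struct winsys_funcs example_winsys = {
        .buffer_create  = example_buffer_create,
        .buffer_destroy = example_buffer_destroy,
    };

    /* Hardware-independent code only ever goes through the function table. */
    int main(void)
    {
        const struct winsys_funcs *ws = &example_winsys;
        struct gpu_buffer *vbo = ws->buffer_create(64 * 1024);
        ws->buffer_destroy(vbo);
        return 0;
    }
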
    Last edited by bridgman; 13 May 2012, 11:15 PM.



  • benjamin545
    replied
    Well, then it seems like the obvious answer is: if we can't have a structured IR that's also easily transportable between components (best of both worlds), then we have to use the right IR at the right time for the right solution.

    I guess you have to take a step back and try to realize what the big picture is, what it is we want. Regarding Gallium3D (and I know that's excluding Intel and ancient stuff, but what can you really do about that), we want a strong central structure that interconnects various pieces that do specific functionalities (here's an OpenCL state tracker, here's an NVIDIA generation X driver, here's a Windows XP winsys connector). This is what Gallium3D was billed as. But it was intended initially and primarily for the Linux ecosystem, even if it wasn't locked into that specific role.

    So in the Linux ecosystem, we have some paid hardcore developers and we have a lot of hobbyists. Hobbyists will never, ever, individually on their own design a modern graphics driver that's competitive with today's standards, and that's OK. Now, as we have seen in the Linux graphics stack over the past few years, paid hardcore developers have come a long way in creating a very competitive graphics stack, but we really want hobbyists to be a part of that too, and while some have been, I think a lot of people, while willing and possibly able to contribute, still feel overwhelmed by the complexity of it all.

    Getting more to the point, I guess: if TGSI is a simpler IR to transport between various components, then if I were a newcomer wanting to develop a component, it would be easier to deal with TGSI. If it is then necessary to convert it to something more specific to what I am doing (which is what I've been hearing all along: that it's too hard to create one all-encompassing IR that is perfect for all state trackers and all hardware drivers), then that is what would have to be done. At least then you could try to make your internal IR something specific to your hardware; for instance, I'm sure the nvfx/nv30 driver, with its un-unified shader cores, is much different from the nv50 or nvc0 or whatever.

    It would be best if other parts of Gallium had that same kind of mentality. For instance, memory management is one area where Gallium was initially sold as being able to abstract memory management completely into the winsys portion of the driver, but what I've read before is that a lot of the memory management has been implemented in the hardware drivers, usually due to some feature missing from Gallium or it just being easier for whoever is doing it to do it in the driver (I'm guessing a lot of that probably comes from the initial testing and learning stages).



  • bridgman
    replied
    Originally posted by benjamin545 View Post
    Ah, so, basically, we (we as in you; don't you just love it when people say "we" but really they themselves aren't part of the "we") have no clue how demanding compute will be on the IR and in what way the IR will need to bend to operate effectively.
    The problem is that AFAIK essentially all of the "serious" GPU compute experience has been in proprietary stacks so far, generally using proprietary IRs. The source programming languages and runtime environments are evolving as well, which makes it even harder to leverage existing experience.

    Originally posted by benjamin545 View Post
    Can you tell me this then: I know we often hear about the IR languages, probably more so than about any other component of the graphics stack below the actual end-user APIs, but how inconvenient is it really to switch from one to the other? In a game engine it would be a real job to change from OGL to DX, or even from OGL to OGL ES if done certain ways, but how much of a bother would it be for you to change the AMD compute back end from LLVM over to TGSI if that was the more unified approach?
    So far I haven't seen much in the way of *changing* IRs... it's more common to just translate from the new IR to whatever was being used before. If you look at the r3xx driver as an example, it was written around Mesa IR and when TGSI was introduced the developers added a TGSI-to-Mesa IR translator at the front of the Gallium3D driver and kept the existing shader compiler code.

    This wasn't a matter of inertia though -- some of the IRs are structured as trees or linked lists which a compiler can work on directly (eg optimization steps) while others like TGSI are "flat" and intended for exchange between components rather than as an internal representation worked on directly by the compiler.

    That breaks the problem down into two parts :

    1. Should the IR be something suitable for direct use by compiler internals, or should it be something designed primarily for transmittal between driver components ?

    The advantage of something "flat" like TGSI or AMDIL is that it is relatively independent of compiler internals. The disadvantage is that all but the simplest compilers will require a more structured IR internally, and so translation to and from TGSI will be required at each component boundary. Complicating the matter is that while the extra translations seem like they would slow things down, they only slow down the compilation step, not the runtime execution. Compilation does not usually happen every time the shader is run - at minimum it happens once at program startup, with recompilation sometimes needed when state info that affects shader code changes or when the driver's cache of compiled shaders fills up.
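
    A small sketch of why those costs land on compile time rather than on every draw call (hypothetical code, not taken from any real driver): the compiled shader is cached and keyed on the shader plus the state that affects code generation, so the translate-and-compile path only runs on first use or when that state changes:

    /* Illustrative sketch only -- hypothetical cache, not actual driver code.
     * The point: IR translation and compilation run once per (shader, state)
     * combination, not once per draw call. */
    #include <stdio.h>

    struct key    { int shader_id; int state_bits; }; /* state affecting codegen */
    struct binary { char code[64]; };
    struct entry  { struct key key; struct binary bin; int valid; };

    #define CACHE_SLOTS 8
    static struct entry cache[CACHE_SLOTS];

    static struct binary compile_shader(struct key k)  /* the expensive step */
    {
        struct binary b;
        printf("compiling shader %d for state %#x (slow path)\n",
               k.shader_id, k.state_bits);
        snprintf(b.code, sizeof(b.code), "isa:%d:%x", k.shader_id, k.state_bits);
        return b;
    }

    static const struct binary *lookup_or_compile(struct key k)
    {
        struct entry *e = &cache[(k.shader_id ^ k.state_bits) % CACHE_SLOTS];
        if (!e->valid || e->key.shader_id != k.shader_id
                      || e->key.state_bits != k.state_bits) {
            e->key = k;                 /* miss: translate + compile + cache */
            e->bin = compile_shader(k);
            e->valid = 1;
        }
        return &e->bin;                 /* hit: no compiler work at all */
    }

    int main(void)
    {
        struct key k = { .shader_id = 1, .state_bits = 0x3 };
        for (int frame = 0; frame < 3; frame++)
            lookup_or_compile(k);       /* compiles only on the first frame */
        k.state_bits = 0x7;             /* state affecting codegen changed... */
        lookup_or_compile(k);           /* ...so this triggers one recompile */
        return 0;
    }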

    If the choice is something "flat" then TGSI is probably the most likely choice for the open source stacks. If a flat IR is *not* chosen, then we get to question 2...

    2. Assuming a structured IR is used, which one should be used ?

    This is where GLSL IR and LLVM IR enter the picture, and where the choice of shader compiler internals becomes a factor.

    For graphics, the Intel devs were talking about feeding GLSL IR directly into the HW shader compiler.

    Before you say "that's weird", remember that the high-level compiler in Mesa (the "OpenGL state tracker") generates GLSL IR directly, which is then converted into TGSI or Mesa IR for use by the HW-layer drivers, so using GLSL IR bypasses some translation steps. For graphics, "classic" HW drivers use Mesa IR today while "Gallium3D" HW drivers use TGSI. The bottom line is that when you run a GL program on any open source driver, the shader starts as GLSL IR and then optionally gets translated to something else.

    Clover, on the other hand, starts with Clang, which generates LLVM IR directly, so the kernel starts as LLVM IR and then optionally gets translated to something else.
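
    For reference, this is the point in an ordinary OpenCL host program where that compiler stack gets invoked; under Clover, clBuildProgram() is where Clang turns the OpenCL C source into LLVM IR before the hardware backend takes over. Minimal sketch, with error handling mostly omitted:

    /* Minimal sketch of the host-side path into the driver's compiler; error
     * handling is mostly omitted.  Under Clover, clBuildProgram() is where Clang
     * turns the OpenCL C source into LLVM IR before the hardware backend takes
     * over. */
    #include <stdio.h>
    #include <CL/cl.h>

    static const char *kernel_src =
        "__kernel void scale(__global float *buf, float factor) {\n"
        "    size_t i = get_global_id(0);\n"
        "    buf[i] *= factor;\n"
        "}\n";

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id   device;
        cl_int         err;

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);

        /* The kernel is handed over as plain OpenCL C source... */
        cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);

        /* ...and this call is where the compiler stack runs
         * (source -> LLVM IR -> hardware ISA, in the Clover case). */
        err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        printf("clBuildProgram returned %d\n", err);

        clReleaseProgram(prog);
        clReleaseContext(ctx);
        return 0;
    }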

    Once you get down to the HW driver, the shader compiler is likely to need a structured IR such as GLSL IR or LLVM IR. You can see where this is going...

    Originally posted by benjamin545 View Post
    Also, what are the chances someone will start slinging GCC IR in there as an option, what with their plans to try to make a competing IR more like what LLVM has?
    I doubt that gcc will get plumbed into the existing GL/CL driver stack but it seems pretty likely that gcc *will* end up generating GPU shader code and that runtime stacks will exist to get that code running on hardware. This may already have happened although I haven't seen anyone do it yet.
    Last edited by bridgman; 13 May 2012, 08:13 PM.

