What isn't entirely clear to me is why there is such a big push to use OpenGL or XRender as a backend in web browsers, especially for simpler scenes that are just HTML/CSS with some images. Any driver worth its salt (even Intel's, now, with SNA) has hardware-backed 2D acceleration, so simply using Cairo with its default backend ends up hardware-accelerated anyway, and that path has been tested and stabilized for years, as evidenced by its rock-solid stability on most hardware the Linux kernel supports. The path from Cairo through the X server, the DDX and KMS is a venerable, fast path even for rapidly changing dynamic HTML5 scenes. Granted, it's not suitable for rendering huge numbers of polygons or full 3D scenes, but that's what WebGL is for, which is a separate topic entirely.
I've observed that the latency and overhead involved in preparing the pipeline for the more sophisticated backends, such as OpenGL and XRender, cause more CPU usage and worse performance than simply using the existing 2D paths.
That's why Microsoft came up with Direct2D for Windows: they realized that implementing the entire GUI stack on top of Direct3D would be too inefficient, because Direct3D is designed to handle a much bigger job.
One of the presentations featured on Phoronix a couple of years ago (from Intel, related to Larrabee, I believe) said it best. And I think that even with the evolution of hybrid CPU/GPU designs (Bulldozer, Tegra, etc.), there's still a need to distinguish between these two workloads.
One workload is comparable to the real-life task of transporting one person from their origin to their destination. Assuming this person needs to arrive in the minimum amount of time while consuming the least amount of resources (with time given higher priority than resource consumption), the best way to get them there is to put them in a small, fuel-efficient vehicle and zip them down the highway to their destination. This captures the use case of most traditional 2D applications, such as HTML/CSS rendering, 2D native GUIs in GTK or Qt, and so on: a lot of individual requests are being sent, and a very fast response is needed while consuming minimal resources.
The other workload is comparable to the real-life task of a company that needs to transport ten forklift pallets full of bricks from one place to another, once per week. Compared to the first example, this workload has different requirements: we know it would be prohibitively expensive to put a few bricks in the trunks of a dozen 2-door sedans and haul them over with twelve different drivers. So instead, we load them onto a very large flatbed truck that consumes a great deal of diesel fuel, but that ends up being significantly cheaper and faster than using a fleet of small cars. We can also manage any time inefficiencies introduced by this slower mode of transit, because we deliver the same workload (ten forklift pallets of bricks) every week; if we anticipate delays, we can create an artificial surplus by shipping early, and so on. This captures the use case of "heavy" 2D applications such as Flash, Clutter, 3D games, and so on. Because of the sheer size of the work to be done, we send it out in larger batches and employ a more expensive resource (in terms of power and latency) -- the full GPU -- to get the work done.
While I think it's possible to build a hardware chip that handles both workloads well, that's not my point here. My point is that we should be using an API that maps well to the workload's requirements. It's as simple as that.
Real-life performance data has continued to show little benefit from using the "heavier" APIs for the "lighter" workloads: I believe past Phoronix articles showed the software rasterizer to be as fast as or faster than the OpenGL backend for, e.g., Qt4. This is very interesting, because you're using two different APIs to render the same workload, yet the internal paths the backends take are very different: one of them powers up the entire GPU and starts mapping textures and perhaps even compiling and executing shaders, while the other uses only very basic graphics operations on the GPU and does most of the rasterization in system memory, although the final result is usually still DMA-transferred, zero-copy, into the framebuffer.
And aside from performance, it's pretty clear that even the most carefully optimized implementations of "3D stack for everything" -- such as Windows 7's WDDM 1.1-based Aero -- consume at least 10-15% more energy than a 2D path. The actual figure will depend on the power characteristics of your graphics hardware, and I think 10-15% was measured on Intel IGPs; if you were to look at, say, a power-hungry Nvidia card like the GTX 580, you might see an even more dramatic difference between a workload that constantly hits the GPU and one that putters along on the relatively lower-power CPU.
So if you care at all about latency or power consumption, it's probably a good idea to configure your web browser and your GTK and Qt stacks to keep using software rasterization. If you just want to run the latest technologies for the fun of it, enjoy using OpenGL for web browsing -- a greatly over-engineered and unnecessary step that provides little measurable benefit.
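For reference, here is roughly what that configuration looks like; the flag and preference names below are from memory and may vary between versions, so treat this as a sketch rather than gospel. (GTK 2 needs no switch at all: its Cairo drawing already goes through the default X backend.)

```shell
# Qt 4: force the raster (software) graphics system for one application...
some-qt-app -graphicssystem raster

# ...or for everything launched from this shell, via the environment:
export QT_GRAPHICSSYSTEM=raster

# Firefox: in about:config, leaving hardware layer acceleration off
# keeps content rendering on the cairo/pixman software path:
#   layers.acceleration.disabled = true
#   layers.acceleration.force-enabled = false
```

"some-qt-app" above is a placeholder for any Qt 4 program; the -graphicssystem switch and QT_GRAPHICSSYSTEM variable select between the "native" (X11/XRender), "raster" and "opengl" paint engines at startup.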