Intel SNA Performance Continues To Be Compelling

entropy replied

02 August 2012, 07:59 AM
Thanks for taking the time and sharing your thoughts!
This is very much appreciated.
ickle replied

01 August 2012, 04:38 PM
Originally posted by entropy

Is this statement related to intel hardware only or do you think there are general (significant) bottlenecks connected to Glamor?

There is a significant impedance mismatch between X and GL that is tricky to overcome and adds lots of extra complexity, and with the extra abstraction layer you cannot exploit hardware features not exposed through a GL extension. You also need to leak many details through that abstraction layer in order to allocate shared objects between multiple clients and your acceleration routines (which is quite, quite scary and hairy). And there is the tiny issue of having a critical system process relying on several hundred thousand lines of code that were not written with robustness in mind, with no failsafe method.

With regards to performance, the current bottlenecks I see in glamor are due to the CPU overhead of the Intel mesa stack, and the many assumptions in that stack that interact extremely poorly with the 2D workload of glamor. Where you do find yourself mostly GPU bound (such as the fish-demo), glamor still falls short by 10-30% due to inefficiencies in the GPU programming (too many state changes and poor optimisation of shaders) and the multiple abstraction layers. However, being GPU bound is the exception, and typically you end up being rate-limited by one of the paths that are orders of magnitude slower. And then there is the issue that glamor is an absolute resource hog, as the Intel mesa driver's buffer management has never been used like that before...

In a perfect world, glamor would equal the performance of a highly specialised driver like SNA; many of the routines used in SNA can be mapped directly onto the OpenGL API, and most have been copied over to glamor. Lots of work needs to be done to tune the entire mesa stack, a lot of which I suspect will only benefit glamor.

And remember, RENDER acceleration is just one small part of the driver.
entropy replied

01 August 2012, 03:53 PM
Originally posted by ickle

unlike UXA and glamor where they are the bottleneck

Is this statement related to intel hardware only or do you think there are general (significant) bottlenecks connected to Glamor?
ickle replied

01 August 2012, 03:41 PM
Originally posted by devius

Very nice. Basically SNA = Good, UXA = Bad. When SNA has regressions they are negligible, but when it performs better, it really performs a lot better.

Right, the regressions tend to be a consequence of choosing a method that performs better elsewhere, at a cost. Most of the regressions are within the noise of the measurement; IvyBridge is very sensitive to thermals (in some of those tests the initial run is 2x faster than the final run due to turbo). The only significant regression there is -compwinwin500. The reason for that regression is that last week it was 2x faster due to hitting the Render cache, but that path was missing a flush. With the flush added for correctness, it becomes faster to use the BLT for that particular test, a trivial change that has already been made.

But what I find truly fascinating is how competitive we actually are with a discrete GPU that has a good driver, over 4x the fill rate of the igfx and several times the shader flops. With regards to 2D performance the limitation tends not to be SNA (unlike UXA and glamor where they are the bottleneck), but the application - which is as it should be. :-)
devius replied

01 August 2012, 03:12 PM
Originally posted by ickle

Here you go, IvyBridge (i7-3720qm) in comparison with an Nvidia GTX-550: http://openbenchmarking.org/result/1...SU-1207273SU39

Very nice. Basically SNA = Good, UXA = Bad. When SNA has regressions they are negligible, but when it performs better, it really performs a lot better.
ickle replied

01 August 2012, 02:30 PM
Originally posted by tenzero

I would however like to see some kind of baseline against the usual suspects in discrete gpus. Nothing fancy, just the bottom of the range from AMD and Nvidia to give it all some kind of perspective.

Here you go, IvyBridge (i7-3720qm) in comparison with an Nvidia GTX-550: http://openbenchmarking.org/result/1...SU-1207273SU39
GreatEmerald replied

29 July 2012, 03:50 PM
Originally posted by devius

This is irrespective of which drivers are being used on the Radeon (binary or open source, doesn't make a difference in terms of desktop compositing).

Well, it does make a difference for me on KDE. KWin won't work with fglrx and will work with radeon (though this is on openSUSE 12.1, I think they made improvements in later KDE versions).
devius replied

29 July 2012, 07:04 AM
Originally posted by GreatEmerald

You missed the word "discrete" in the post you quoted. And I assume he meant with proprietary drivers.

I know, but I was just talking about comparable products because that's what I do know. I have read that even some powerful discrete AMD GPUs have problems providing a smooth desktop experience. Still, a Radeon HD4200 theoretically has about the same performance as an Intel HD2000, but in reality the Intel GPU provides a much better desktop experience. This is irrespective of which drivers are being used on the Radeon (binary or open source, doesn't make a difference in terms of desktop compositing).
ickle replied

29 July 2012, 04:34 AM
@Michael it looks like your CPU was being throttled during those tests; the dmesg does indeed show some throttling due to exceeding its safe temperature.

For example, on your machine both UXA and SNA should score over 3000 op/s for -putimage500 (even higher for shorter benchmark runs due to turbo). And there are several other cases where your machine underperformed by a factor of 2-3x, consistent with throttling or with another CPU hog running. What would be fantastic is if PTS could automatically perform system profiling while running the benchmark, which would be a boon both for people trying to understand a test result and for spotting anomalous runs.
GreatEmerald replied

29 July 2012, 02:52 AM
Originally posted by devius

I can guarantee you that Intel GPUs offer a much better experience than AMD's chipset integrated GPUs. I have no idea what the comparison is like with the newer Llano iGPUs, which is something I would also like to know, but the difference was massive with the chipset ones. That is in real-world desktop usage, not synthetic benchmarks.

You missed the word "discrete" in the post you quoted. And I assume he meant with proprietary drivers.