There is definitely a bottleneck somewhere!
This bottleneck is related to screen resolution, as the open-source driver's fps does not drop with resolution the way the closed-source driver's does.
Maybe some huge data-copy function(s) that are unoptimized or unaccelerated? Maybe the closed-source driver copies data in a single SIMD pass every frame, while the open-source driver copies on demand?
Do AMD developers have some profiling software, i.e. software to monitor CPU load, memory bus, PCIe bus, GPU load, GPU memory controller load, disk I/O, and software I/O (like kernel ring 0/1 changes per (milli)second, and the percentage of time the CPU spends in the kernel, in the driver, elsewhere, or idle)?
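As a rough illustration of the "time spent in kernel vs. elsewhere vs. idle" part of that question, here is a minimal Python sketch that samples the aggregate `cpu` line of Linux's /proc/stat twice and reports the share of time in each state. The function names are made up for this example, and it cannot separate driver time from the rest of the kernel; it only shows the coarse user/system/idle split.

```python
import time

# Order of the first seven counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_shares(before, after):
    """Return the percentage of elapsed CPU time spent in each state."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate over all CPUs)."""
    with open(path) as f:
        return f.readline()


def sample(interval=1.0):
    """Take two snapshots 'interval' seconds apart and return the shares."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return cpu_shares(before, after)


if __name__ == "__main__":
    for state, pct in sample().items():
        print(f"{state:8s} {pct:5.1f}%")
```

Running it while a benchmark is active gives a first hint of whether the CPU is busy in user space, busy in the kernel (system + irq + softirq), or mostly idle and waiting.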
Again, a useless comparison. My card is way faster than in the test. Vsync, color tiling, swap buffers and so on are disabled. It does not matter what the current standard is; what matters is what is actually possible with the cards and driver stack. Double or triple the r600 bars and you get real results.
There are some tools but they don't generally help as much as you would expect. GPU driver programming involves very long pipelines with many things going on at once, so things that run slow are frequently related to code that ran some time earlier. Performance work usually ends up more like:
Originally Posted by crazycheese
- stare at all the things you mentioned
- get an idea
- rewrite a bunch of code and see what happens
- repeat until you have to work on something else
That said, I believe the main work right now is finishing the enablement of "known" performance-related features, i.e. the ones which are enabled in r300g but not enabled in r600g. In most cases I think the code exists and works on many configurations, but not on enough of them to enable it by default yet.
Last edited by bridgman; 05-27-2011 at 10:07 AM.
Ah, I remember that 1:1 compiler post on Phoronix a long time ago that made the r600 actually work. But that was an ugly hack, right? And this hasn't been fixed?! Isn't this top priority work? =o
You mean like HyperZ? What about rendering using both GPUs of a dual-GPU card? Crossfire? These things would probably help a lot, but only if we eliminate the most severe of the existing CPU bottlenecks. The biggest issue is that the pipeline stalls for a very, very long time; it's not that the GPU has any problem handling the requests it does get.
Originally Posted by bridgman
Another problem I see is that certain workloads (some 3D apps) are constantly throwing errors inside DMAR / DRHD subsystems -- several times per frame. The kernel is protecting itself from data corruption and potentially system-crashing issues by detecting DMA remapping faults -- so that part of the kernel is doing its job. But clearly DRM is not doing its job, or the faults wouldn't occur in the first place.
While the fault prevention of DMAR/DRHD is great, the downside is that each fault is very expensive. It ends up generating an interrupt each time. If this is happening dozens of times per second, then no wonder we're getting crap FPS.
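To get a rough feel for how often this happens, one option is to count kernel log lines that mention DMAR or DRHD before and after running a workload. The Python sketch below is only an illustration: the function names are made up, the pattern is deliberately broad (it will also match unrelated DMAR lines such as boot-time initialization messages), and reading the log through the `dmesg` command may require elevated privileges on some systems.

```python
import re
import subprocess

# Broad pattern: any kernel log line mentioning DMAR or DRHD.
# Assumption for this sketch: such lines are a usable proxy for remapping faults.
FAULT_PATTERN = re.compile(r"DMAR|DRHD")


def count_matching_lines(log_text):
    """Count log lines that mention DMAR or DRHD."""
    return sum(1 for line in log_text.splitlines() if FAULT_PATTERN.search(line))


def read_kernel_log():
    """Return the current kernel ring buffer as text by running dmesg."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print("matching lines so far:", count_matching_lines(read_kernel_log()))
```

Taking the count, running a 3D app for a fixed time, and taking the count again gives an approximate rate, as long as the ring buffer has not wrapped in between.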
I wouldn't be surprised if several of the programs Michael tested in this article have brought out this behavior. It seems to only occur with mesa 7.11-dev, which he's using. Revert back to mesa stable, and although you lose a lot of features, mesa doesn't use libdrm in such a way as to trigger these constant faults, so the stack is much less preoccupied with handling a constant stream of invalid DMAR requests, and FPS wins. This appears to be closely related to the IOMMU.
In fact I remember FPS being much more competitive in past articles Michael has written pitting r600g against Catalyst. I would be surprised if this particular problem isn't to blame for several of the tests Michael used.
Edit: (this editing thing is cool!) -- Then again, maybe Michael doesn't have the same problem. You can clearly see whether you have the problem by looking for this in dmesg while rendering (the numbers are irrelevant as this is just an example):
From my (limited) understanding here, the IOMMU actually resides on the motherboard chipset, so it is not directly controlled by either the CPU or GPU manufacturer. Therefore maybe this is an isolated problem that is only buggering up on my specific motherboard chipset. That's entirely possible; I have an early first-generation Intel X58 chipset (ASUS P6T Deluxe v1 is the specific make/model). It was the first enthusiast / desktop Nehalem Architecture motherboard to market. With an Intel CPU and an AMD GPU, who knows if I've just got bad luck and the IOMMU hardware on the mobo doesn't perform to spec?
Originally Posted by Linux Kernel
Regardless of whether or not that's true, I must insist that this is an issue that can be handled in software. Otherwise they would probably recall the motherboard, and the Catalyst drivers wouldn't work properly for me on Windows or Linux. As it stands, I can play all the big AAA titles on Windows just fine, so maybe the Catalyst team already discovered and squashed this bug.
Or I'm way off the mark and the hardware is fine but there's just a software bug in DRM. Sorry for over-speculating.
Last edited by allquixotic; 05-27-2011 at 11:06 AM.
I just wish for the driver to deliver some usable performance. :/
Originally Posted by bridgman
It seems I am the only person in the universe who wants performant open-source drivers on a 2 GB 5870 or equivalent AMD card. :/
Even though I switched to a gtx260sp216, every time I see awesome AMD hardware, my heart aches. :/
When I see how slow the open-source driver is, I nearly have a heart attack ://
Wow, the HD4670 Gallium3D performance has regressed hard since the last batch of tests! The Catalyst performance on this ASIC has stayed roughly the same. Could this be due to what allquixotic was talking about in comment #16?
allquixotic, is there an open bug for the issue you mentioned?
If not, it shouldn't be too hard to bisect the problem if 7.10 is okay.
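For anyone unfamiliar with bisecting: the idea is a binary search over the commit history between a known-good and a known-bad version, which is what `git bisect` automates. The Python sketch below shows only the search logic; `first_bad` and `is_bad` are hypothetical names, and it assumes that once the problem appears, every later commit also shows it.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits: list ordered oldest to newest, where commits[0] is known good
             and commits[-1] is known bad.
    is_bad:  callable that builds/tests one commit and returns True if the
             problem shows up (hypothetical; supplied by the person bisecting).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # problem already present here: look earlier
        else:
            good = mid  # still fine here: look later
    return commits[bad]
```

With a few hundred commits between the two versions, this needs fewer than ten build-and-test rounds instead of one per commit.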
I too thought that an HD 4670 was way faster a few weeks ago than what I see here.
Originally Posted by bongmaster2
So here is what to benchmark next on Phoronix: how much of a speed improvement the mentioned features bring, both individually and combined.
(color tiling, page flipping, swapbufferwait off etc.).
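To make "individually and combined" concrete, a test plan could simply enumerate every subset of the features, including the baseline with none of them enabled. A minimal Python sketch (the feature labels are just the ones named above, and `benchmark_configs` is a made-up helper, not part of any benchmark suite):

```python
from itertools import combinations


def benchmark_configs(features):
    """Return every combination of features, from none enabled to all enabled."""
    configs = []
    for size in range(len(features) + 1):
        configs.extend(combinations(features, size))
    return configs


if __name__ == "__main__":
    features = ["color tiling", "page flipping", "swapbufferwait off"]
    for config in benchmark_configs(features):
        print(", ".join(config) if config else "(baseline, nothing enabled)")
```

For three features this gives eight runs per test, which is still a manageable matrix.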