Congrats on the BSc and thanks for the work.
As soon as I have more free time, I'll try to read some of the thesis. I just looked at the pages around page 11, and it seems I might actually understand some of it. It is nice to learn something about how things work.
Congrats on getting the draft finished (or was this the final draft?). I've read a few earlier versions from the GitHub site, and I'll be going through the current draft as time permits (already saved locally).
At some point, I'll probably try building the training app and see how some of the newer Linux-native Steam apps perform (Metro 2033, Civilization: Beyond Earth, Star Citizen, etc.). Those applications might benefit a bit more than Q3-derived engines.
> Congrats on getting the draft finished (or was this the final draft?).
Semi-final; some formatting and proofreading fixes are still going in.
> At some point, I'll probably try building the training app and see how some of the newer Linux-native Steam apps perform (Metro 2033, Civilization: Beyond Earth, Star Citizen, etc.). Those applications might benefit a bit more than Q3-derived engines.
The training app is orthogonal to the runtime support. For benchmarking, you only need the kernel and Mesa.
I used PTS (the Phoronix Test Suite) mainly for reproducibility and graphing. The most demanding titles I tested were ETQW, Nexuiz and Xonotic.
The code is available if you'd like to benchmark.
I would if I had both components necessary in order to test (GPU and game).
So, what are your thoughts on where the remaining delta between Catalyst and radeon lies?
It depends entirely on the case. As written in the thesis, the solution found is very likely not the global optimum, so even under VRAM pressure there are probably a few percent left to gain.
For cases like OpenArena 0.8.5 on 2GB of VRAM? That's CPU-bound, so any general optimizations can be done by coders without GPU knowledge.
The SB shader optimizer is not yet perfect for Cayman-family chips, which include many shipping APUs.
The glxgears/tri synthetic tests will get speedups from DRI3.
GTT does not currently use uncached memory; doing so may provide speedups on all APUs.
VRAM compaction could help make better use of VRAM.
Anyway, the hardware features are all being utilized, and there aren't many places that need or would benefit from algorithmic improvements. Mostly it's just specific CPU-side optimizations. If you have a case that performs badly, fire up oprofile and see what shows up.
Urban Terror, for instance, shows 10% lower FPS but 30% better peaks.
Depending on the frequency and duration of the peaks, this means the average FPS for the rest of the (non-peak) frames will actually have dropped by more than 10%. This sounds like quite disastrous behaviour, as both AMD and Nvidia have been working on frame pacing in their proprietary drivers to eliminate this kind of peaky behaviour because of its noticeable effect on gameplay.
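The "more than 10%" claim can be checked with a little arithmetic. The sketch below assumes a simple time-weighted model: average FPS = f × peak FPS + (1 − f) × non-peak FPS, where f is the fraction of play time spent at the peak, and f is the same for both drivers. The baseline figures (100 FPS average, 150 FPS peaks, f = 0.10) and the function name `non_peak_fps` are hypothetical, chosen only for illustration; they are not measurements from the thesis.

```python
def non_peak_fps(avg_fps: float, peak_fps: float, peak_frac: float) -> float:
    """Time-weighted average FPS outside the peaks.

    Assumes avg_fps = peak_frac * peak_fps + (1 - peak_frac) * rest_fps,
    where peak_frac is the fraction of play time spent at peak_fps.
    """
    if not 0.0 <= peak_frac < 1.0:
        raise ValueError("peak_frac must be in [0, 1)")
    return (avg_fps - peak_frac * peak_fps) / (1.0 - peak_frac)


# Hypothetical baseline numbers, for illustration only.
baseline_avg, baseline_peak, peak_frac = 100.0, 150.0, 0.10

# The reported change: 10% lower average FPS, 30% higher peaks.
new_avg = baseline_avg * 0.90
new_peak = baseline_peak * 1.30

old_rest = non_peak_fps(baseline_avg, baseline_peak, peak_frac)
new_rest = non_peak_fps(new_avg, new_peak, peak_frac)
drop = 1.0 - new_rest / old_rest

# prints "non-peak FPS: 94.4 -> 78.3 (17.1% drop)"
print(f"non-peak FPS: {old_rest:.1f} -> {new_rest:.1f} ({drop:.1%} drop)")
```

Under this model the ratio of new to old non-peak FPS is (0.9·A − 1.3·f·P) / (A − f·P), which is always below 0.9 whenever f and P are positive, so the non-peak drop exceeds 10% regardless of the exact numbers; the peak frequency and duration only determine by how much.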
> Depending on the frequency and duration of the peaks, this means the average FPS for the rest of the (non-peak) frames will actually have dropped by more than 10%. This sounds like quite disastrous behaviour
Peaks are more noticeable than a constant but lower framerate. So it's not as disastrous as you think.
How did you choose the size of your network?
I suggest you add that to your thesis too.
Also, in your discussion section, be careful not to treat what the NN learned as perfect.
Neural networks are approximations. A small number of writes could lead to a better score without the NN picking up on it,
or its result might be wrong only for a sub-region, etc.