**curaga** · 15 April 2014, 05:18 PM

Originally posted by liam

Was testing done with more demanding engines?

I used PTS mainly for reproducibility and graphing. The most demanding titles I tested were etqw, nexuiz and xonotic.

The code is available if you'd like to bench

**Adarion** · 15 April 2014, 05:20 PM

Congrats on the BSc and thanks for the work.
As soon as I am more free I'll try to read some of the thesis. Just looked at pages around 11 and it seems that I might actually understand some of it. It is nice to learn something about how things work.

**Veerappan** · 15 April 2014, 08:32 PM

Congrats on getting the draft finished (or was this the final draft?). I've read a few earlier versions from the github site, and I'll be going through the current draft as time permits (already saved locally).

At some point, I'll probably give a shot to building the training app and see how some of the newer linux-native steam apps function (Metro: 2033, Civ: Beyond Earth, Star Citizen, etc). Those applications might benefit a bit more than Q3-derived engines.

Enjoy the well-earned vacation.

**curaga** · 16 April 2014, 04:16 AM

Originally posted by Veerappan

Congrats on getting the draft finished (or was this the final draft?).

Semi-final; some formatting and proofreading fixes are still to come.

At some point, I'll probably give a shot to building the training app and see how some of the newer linux-native steam apps function (Metro: 2033, Civ: Beyond Earth, Star Citizen, etc). Those applications might benefit a bit more than Q3-derived engines.

The training app is orthogonal to the runtime support. For benching you only need the kernel and mesa.

**liam** · 16 April 2014, 05:23 AM

Originally posted by curaga

I used PTS mainly for reproducibility and graphing. The most demanding titles I tested were etqw, nexuiz and xonotic.

The code is available if you'd like to bench

I would if I had both components necessary in order to test (GPU and game).
So, what are your thoughts about where the remaining delta between catalyst and radeon lies?

**curaga** · 16 April 2014, 05:44 AM

It depends entirely on the case. As written in the thesis, the found solution is very likely not the global optimum, so even under VRAM pressure there's probably a few percent left to gain.

For cases like OpenArena 0.8.5 on 2GB VRAM? That's CPU-bound, so any general optimizations can be done by coders without GPU knowledge.
The SB shader optimizer is not yet perfect for Cayman cards, which include many shipping APUs.
The glxgears/tri synthetic tests will get speedups from DRI3.
GTT does not currently use uncached memory; doing so may provide speedups on all APUs.
VRAM compaction could help make better use of the available VRAM.

Anyway, hw features are all being utilized, and there aren't many places that need/benefit from algorithmic improvements. Mostly it's just specific cpu-side optimizations. If you have a case performing badly, whip up oprofile, see what shows up.

**Herem** · 17 April 2014, 08:39 AM

Urban Terror for instance showing 10% lower FPS while 30% better peaks.

Depending on the frequency and duration of the peaks this means the average FPS for the rest of the (non-peak) frames will have actually dropped by more than 10%. This sounds like quite disastrous behaviour as both AMD and Nvidia have been working on frame pacing in their proprietary drivers to eliminate this kind of peaky behaviour due to the noticeable effect it has on game play.
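To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes "peaks" means stretches of high FPS, that the average is time-weighted, and that the share of time spent at peak is the same before and after. The baseline figures (100 FPS at peak, 60 FPS otherwise, 10% of the time at peak) are made up for illustration and are not taken from the benchmark.

```python
def non_peak_fps_after(peak_fps, rest_fps, peak_time_share,
                       avg_change=-0.10, peak_change=0.30):
    """Return the non-peak FPS implied by a change in average and peak FPS.

    All inputs are hypothetical; peak_time_share is the fraction of
    wall-clock time spent at peak FPS (assumed unchanged by the patch).
    """
    # Time-weighted average FPS before the change.
    baseline_avg = peak_time_share * peak_fps + (1 - peak_time_share) * rest_fps
    # Apply the reported changes: average down 10%, peaks up 30%.
    new_avg = baseline_avg * (1 + avg_change)
    new_peak = peak_fps * (1 + peak_change)
    # Solve the same weighted-average equation for the non-peak FPS.
    return (new_avg - peak_time_share * new_peak) / (1 - peak_time_share)


new_rest = non_peak_fps_after(peak_fps=100, rest_fps=60, peak_time_share=0.1)
print(f"non-peak FPS: 60 -> {new_rest:.1f} ({new_rest / 60 - 1:+.1%})")
```

With these invented inputs the non-peak frames fall by roughly 17%, not 10%. The exact figure depends entirely on how long the peaks last.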

**curaga** · 17 April 2014, 01:58 PM

No need to guess, the frame time graph for Urban Terror is posted.

**V10lator** · 17 April 2014, 02:47 PM

Originally posted by Herem

Depending on the frequency and duration of the peaks this means the average FPS for the rest of the (non-peak) frames will have actually dropped by more than 10%. This sounds like quite disastrous behaviour

Peaks are more noticeable than a constant but lower framerate. So it's not as disastrous as you think.

**mannerov** · 17 April 2014, 11:25 PM

Curaga:

How did you choose the size of your network?
I suggest you add that to your thesis too.

Also, in your discussion part, be careful not to treat what the NN learned as perfect.
It is an approximation. A small number of writes could lead to a better score without the NN picking up on it,
or perhaps what it learned is false only for a sub-region, etc.

The Results Of Optimizing Radeon's VRAM Behavior
