Announcement

**Deathsimple** · 01 May 2014, 01:19 PM

You should have waited a little bit longer. I've just pushed out a patch that will improve performance for SI and CIK quite a bit: http://lists.freedesktop.org/archive...ay/058858.html

**Michael** · 01 May 2014, 01:25 PM

Originally posted by Deathsimple

You should have waited a little bit longer. I've just pushed out a patch that will improve performance for SI and CIK quite a bit: http://lists.freedesktop.org/archive...ay/058858.html

I'll do new benchmarks when it's merged. As usual, whenever I post benchmarks, there are always comments along the lines of "well, you should have waited for XXX", but this is the current 3.15 situation, and I'll happily run more benchmarks whenever any other interesting code lands.

**narciso** · 01 May 2014, 01:58 PM

What's the state of the R9 290 on the open-source driver?

**curaga** · 02 May 2014, 04:29 AM

I remember asking about large page support before. Is there really no support for it in r600-eg?

**tomtomme** · 02 May 2014, 06:43 AM

Michael, just a technical question, sorry if it's a noob one...
If the performance change in 3.15git comes from a video-memory optimization, shouldn't this be benchmarked with cards that have a limited amount of video memory (e.g. APUs, iGPUs) or with games that need more vRAM than is available on the dGPU? Or should everything just be faster with this?

**Deathsimple** · 02 May 2014, 08:18 AM

Originally posted by curaga

Is there really no support for it in r600-eg?

Correct, there is no support for it before NI, and even NI through CIK only support it for VRAM, not GART.

**bridgman** · 02 May 2014, 08:33 AM

Originally posted by tomtomme

Michael, just a technical question, sorry if it's a noob one...
If the performance change in 3.15git comes from a video-memory optimization, shouldn't this be benchmarked with cards that have a limited amount of video memory (e.g. APUs, iGPUs) or with games that need more vRAM than is available on the dGPU? Or should everything just be faster with this?

Fair question. This particular optimization does not affect how efficiently the actual memory is used, just how efficiently the translation cache (TLB) between the GPU and memory is used. Rather than storing instructions or data, the TLB is part of the memory management unit (MMU) and caches the physical addresses corresponding to virtual addresses issued by the processor (here, the GPU). It's just like any other cache in the sense that cache misses require you to access much slower memory... each time you get a TLB miss the MMU has to walk page tables in memory to find the translation and then free up a TLB entry to store the translation it reads from memory.

More efficient use of TLBs means fewer interruptions while the MMU walks through page tables, and higher *effective* memory bandwidth. Same idea as huge page support on the CPU's MMU -- using 2M pages rather than 4K pages can give a 30% performance improvement in some cases just from reduced overhead walking page tables and reloading TLB entries. TLB misses are really expensive -- without a TLB each memory access on a GPU would require ~2 additional accesses for page tables, while a CPU would require 3-4 additional accesses since the page tables have more levels. Yes, that's 3-4 page table accesses for every "useful" memory access.
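To make the page-size effect concrete, here is a rough toy model, not a description of how the Radeon hardware or the kernel patch actually works. It assumes a fully associative TLB with LRU replacement, 64 entries, a 2-level page table walk, and a simple linear sweep over 64 MiB; all of those numbers and the function names `tlb_misses` and `avg_accesses` are made up for illustration.

```python
from collections import OrderedDict


def tlb_misses(addresses, page_size, tlb_entries):
    """Count misses for a toy fully associative LRU TLB (an assumption,
    real TLBs differ in associativity and replacement policy)."""
    tlb = OrderedDict()  # page number -> placeholder, ordered by recency
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: the MMU would walk page tables
            if len(tlb) >= tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[page] = True
    return misses


def avg_accesses(misses, total, walk_levels):
    """Average memory accesses per useful access, charging walk_levels
    extra accesses for every TLB miss."""
    return 1 + walk_levels * misses / total


if __name__ == "__main__":
    # Hypothetical workload: sweep 64 MiB in 256-byte steps.
    addresses = range(0, 64 * 1024 * 1024, 256)
    total = len(addresses)
    for label, page_size in (("4K", 4096), ("2M", 2 * 1024 * 1024)):
        misses = tlb_misses(addresses, page_size, tlb_entries=64)
        cost = avg_accesses(misses, total, walk_levels=2)
        print(f"{label} pages: {misses} misses, {cost:.4f} accesses per useful access")
```

With the same number of TLB entries, larger pages cover far more memory per entry, so the same sweep touches fewer distinct pages and triggers fewer page table walks. Random access patterns would make the gap between the two page sizes much larger than this linear sweep does.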

The obvious follow-on question is "why not make the TLBs really big?" -- the answer is that TLBs are content-addressable memories and so fairly expensive in terms of die size, so the die area required for a bigger TLB can usually be better used by adding more cache, more cores, etc.

Radeon DRM Graphics Benchmarks On Linux 3.15
