Open-Source Radeon Performance Boosted By Linux 3.16


  • #41
    Isn't it time to enable HyperZ by default, deal with the bugs as they come in, and fix them? I know the developers have a lot to do, but leaving features half-finished and never enabling them for most users is really sad. It's a constant state of "not quite good enough", of features never reaching "done" (in the agile "definition of done" sense).

    On another topic:
    With Unigine Tropics, the GCN graphics cards using RadeonSI Gallium3D are finally becoming competitive with the Radeon HD 6870 that's back on the more mature R600 stack. The Radeon HD 7850 for this test was up by 21%, the HD 7950 up by 27%, and the R9 270X up by 34%.
    Awesome! I'm looking forward to the coming year, when we RadeonSI users will see the newer cards surpass the old ones in performance. Well, for me at least, once RadeonSI manages to start on the 290 card at all...



    • #42
      Just recompiled the module with and without the VM optimization patch and tested it. So no, I don't see performance go up with large PTE support (I tested that earlier; it is the same), and it actually slightly decreased fps (to be honest it is not really measurable... something like -0.5%), but this one made a difference for me:



      @agd5f

      Thanks for the info. Could you explain these options, e.g. vm_block_size being 9 bits?

      parm: vm_size:VM address space size in megabytes (default 4GB) (int)
      parm: vm_block_size:VM page table size in bits (default 9) (int)



      • #43
        Not everyone is willing to use a closed driver

        Originally posted by xeekei View Post
        We are not satisfied with the performance the open source drivers deliver. As soon as Radeon can compete with the NVidia blob, I will switch.
        I do not want to run a 50+ MB closed blob where it would be easy for someone to conceal a backdoor aimed at, say, keylogging after a special code is sent. I have done some analysis of that question and figure Nvidia would not want to risk a "phone home to the NSA" backdoor being caught by a Wireshark user, but when I was playing with that blob I did not dare enter any encryption passphrases after X and the blob were running. Had the blob been permitted to use KMS I could not have used it at all, as I would not then have been able to unlock the disks prior to running the untrusted code.

        Still, Catalyst is on the verge of being caught by the r600 and RadeonSI drivers, and already has been in some applications. I've seen framerates on the HD 6750 with r600 that beat Catalyst in Scorched3D, and without having to avoid opening an encrypted disk after X has been started, just in case. At that point, the Nvidia blob becomes an interesting benchmark to compare against, but only if the right Nvidia and AMD cards are compared. My HD 6750 and GTS 450 are pretty close, giving about the same framerates in Scorched3D either on blob drivers or (back in 2012) with the Radeon locked to mid frequency on open drivers to match the default clocks of the GTS 450.

        Nouveau is also doing very well on those cards that can now be reclocked, giving surprising results even on those cards that can only be reclocked to mid speed. Apparently the newer Nvidia cards boot to low speed, not mid speed like the older Fermis did. Once they get to full clocks and the Nouveau driver optimizations can take off, I see Catalyst getting passed by both brands of cards on open drivers, and then Nvidia's blob had better watch its back too and get ready to move over. With KMS and native Linux development, I would not be at all surprised to see ports of Windows drivers end up slower. At least Nvidia's code ports well; AMD's apparently does not.



        • #44
          Originally posted by Azpegath View Post
          Isn't it time to enable HyperZ by default, deal with the bugs as they come in, and fix them? I know the developers have a lot to do, but leaving features half-finished and never enabling them for most users is really sad. It's a constant state of "not quite good enough", of features never reaching "done" (in the agile "definition of done" sense).
          It was enabled by default, and the number of bugs was more than we could deal with, which is why it was disabled by default. Others can look into it too. All of the relevant registers are exposed in the driver. If you have a reproducible test case, start by looking at the HyperZ registers and see if adjusting any of them helps. If not, move on to looking at other depth/stencil related state and see if there are any patterns in the state combinations that trigger hangs with HyperZ. Once you've figured out the patterns, you can narrow it down further or dynamically enable/disable HyperZ based on the state.
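
          As a minimal sketch of that last, dynamic step (hypothetical names and a made-up "bad" state pattern, not actual Mesa code), the per-draw check could look something like this:

          #include <stdbool.h>
          #include <stdio.h>

          /* Illustrative only: decide per draw whether HyperZ is safe,
           * based on the depth/stencil state combinations found to hang. */

          enum depth_func { FUNC_LESS, FUNC_GREATER };

          struct ds_state {
              bool stencil_write;
              enum depth_func func;
          };

          /* Assumed example pattern: say testing showed hangs whenever
           * stencil writes combine with a reversed depth test. */
          static bool hyperz_safe(const struct ds_state *ds)
          {
              return !(ds->stencil_write && ds->func == FUNC_GREATER);
          }

          int main(void)
          {
              struct ds_state a = { .stencil_write = false, .func = FUNC_LESS };
              struct ds_state b = { .stencil_write = true,  .func = FUNC_GREATER };

              printf("draw a: HyperZ %s\n", hyperz_safe(&a) ? "on" : "off");
              printf("draw b: HyperZ %s\n", hyperz_safe(&b) ? "on" : "off");
              return 0;
          }

          A real driver would hook such a check into state emission rather than main(), of course; the point is that HyperZ need not be globally on or off.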



          • #45
            Originally posted by agd5f View Post
            If not, move on to looking at other depth/stencil related state and see if there are any patterns in the state combinations that trigger hangs with HyperZ.
            I suspect it is that one depth/stencil state: anything shadow-related is broken somewhere (even without HyperZ enabled), and it triggers those artifacts and, on top of that, those GPU hangs.

            So yeah, I think HyperZ is rock stable enough to be enabled by the user, but only if your code or game does not use any shadows. Maybe we need to ship a warning: if you are not afraid of shadows...



            • #46
              Originally posted by dungeon View Post
              @agd5f

              Thanks for the info. Could you explain these options, e.g. vm_block_size being 9 bits?
              vm_size is the size of the GPU's virtual address space. It's the amount of virtual address space into which GPU accessible buffers can be mapped. The larger the address space, the more vram is used to maintain page tables to support that address space.

              vm_block_size defines the number of bits in the page table vs. the page directory. A GPU page is 4KB so we have 12 bits of offset, minimum 9 bits in the page table block and the remaining bits are in the page directory. It lets you adjust the relative size of the page directory vs. page table blocks.
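
              To make that concrete, here is a small standalone C sketch of the split (the 8-byte page table entry size is my assumption for illustration, not a number from this thread):

              #include <stdio.h>

              /* Mirrors the description above: 4KB GPU pages give 12 offset
               * bits, vm_block_size bits index within a page table block,
               * and the remaining bits select a page directory entry. */
              int main(void)
              {
                  unsigned vm_size_mb = 4096;  /* default: 4GB address space */
                  unsigned vm_block_size = 9;  /* default: 9 bits per block */

                  unsigned addr_bits = 20;     /* log2(1MB) */
                  for (unsigned t = vm_size_mb; t > 1; t >>= 1)
                      addr_bits++;

                  unsigned offset_bits = 12;   /* 4KB pages */
                  unsigned pd_bits = addr_bits - vm_block_size - offset_bits;

                  printf("page directory entries: %u\n", 1u << pd_bits);
                  printf("entries per page table block: %u\n", 1u << vm_block_size);

                  /* Worst-case page table footprint, assuming 8-byte entries. */
                  unsigned long long pages = (unsigned long long)vm_size_mb * 256;
                  printf("max PTE memory: %llu KB\n", pages * 8 / 1024);
                  return 0;
              }

              With the defaults that works out to 2048 page directory entries, 512 entries per page table block, and roughly 8MB of worst-case page table memory; doubling vm_size (e.g. booting with radeon.vm_size=8192) roughly doubles that footprint, which is the VRAM trade-off described above.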



              • #47
                Thanks, maybe that explanation needs to go on the RadeonFeature page; I couldn't figure out at first sight what it is.



                • #48
                  Originally posted by kaprikawn View Post
                  But a while back my fan was making a very loud sound as it spun
                  Last time I had that problem, it was a poorly placed cable hanging close enough to brush against the CPU fan...



                  • #49
                    Originally posted by scorp View Post
                    It's probably not the best time to switch to RadeonSI (at least if you don't want to participate in bugfixing). I have the HD 7850 on Arch, and here RadeonSI is suffering heavily from the LLVM register allocation bug. Currently I can't run a lot of games, neither native nor through Wine.

                    Read this
                    Tom fixed that bug last week. Unfortunately my game (DIII) now freezes within a minute... so it's not much better, but at least I get a minute of gaming instead of zero...



                    • #50
                      Originally posted by agd5f View Post
                      vm_size is the size of the GPU's virtual address space. It's the amount of virtual address space into which GPU accessible buffers can be mapped. The larger the address space, the more vram is used to maintain page tables to support that address space.

                      vm_block_size defines the number of bits in the page table vs. the page directory. A GPU page is 4KB so we have 12 bits of offset, minimum 9 bits in the page table block and the remaining bits are in the page directory. It lets you adjust the relative size of the page directory vs. page table blocks.
                      So VM refers to virtual memory. How was memory handled prior to SI (excluding, IIRC, Trinity/Richland)? Did it just use an unprotected flat memory space, with the driver responsible for making sure each process (context) addressed the proper segment of memory?

