r600/r700 libdrm, mesa, and radeon performance patches
-
Originally posted by Obscene_CNN View Post
"unlikely"/"likely" on their own? I highly doubt
that they are of any use (except maybe the MIPS R10K): modern x86
are very smart wrt. branch prediction and they obfuscate the code
in a non-readable way IMO.
Comment
-
The likely and unlikely macros do give a very, very minor speed increase. The improvement is not due to branch prediction. It is due to code placement. Branches tagged as unlikely are moved out of the primary instruction flow. This reduces the instruction cache fill cycles per function and keeps the unlikely code branches from displacing useful instructions from the cache unless they are actually taken.
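For readers who have not seen these hints, here is a minimal sketch of how they are usually written for GCC, in the style of the Linux kernel macros. The function `sum_buffer` is a made-up example, not code from the patches, and whether the compiler actually moves the rare block out of line depends on the compiler version, target, and optimization level.

```c
#include <stdio.h>
#include <stddef.h>

/* Branch hint macros built on the GCC/Clang builtin __builtin_expect.
 * The !!(x) normalizes any truthy value to 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: sums n ints, treating a NULL buffer as a rare error. */
static long sum_buffer(const int *buf, size_t n)
{
    if (unlikely(buf == NULL)) {
        /* Rare path: the compiler is free to place this block away
         * from the hot loop so it does not occupy the same cache lines. */
        fprintf(stderr, "sum_buffer: NULL buffer\n");
        return -1;
    }

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

The hint does not change what the function returns. It only tells the compiler which side of the branch to favor when laying out the code.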
Comment
-
Originally posted by Obscene_CNN View Post
The likely and unlikely macros do give a very, very minor speed increase. The improvement is not due to branch prediction. It is due to code placement. Branches tagged as unlikely are moved out of the primary instruction flow. This reduces the instruction cache fill cycles per function and keeps the unlikely code branches from displacing useful instructions from the cache unless they are actually taken.
Comment
-
Originally posted by mlau View Post
Have you actually measured it or are you just talking hypothetically?
If you think polluting the instruction cache with null check handling code and code for handling corner cases will make things better, feel free to take them out.
I forgot to mention that while using Torcs as a benchmark, my worst-case frame rate entering the straightaway before the final turn on the Forza track was 2.7 FPS without my patch. With my patch it jumped to 3.3 FPS.
Comment
-
Originally posted by Obscene_CNN View Post
I haven't tested each likely/unlikely by itself, if that is what you mean. Most were tested as part of each section of code modification. A few were tested by themselves and did show improvement.
just measure their impact on their own. (I played a bit with "unlikely"
in a few places in MIPS kernel code. GCC 4.4.3 didn't lay out code
any differently).
I don't doubt that the other changes (batching writes) improve performance;
these changes do make sense.
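One way to measure the hints on their own, sketched here under assumptions: make the macros switchable at compile time, build the same source twice, and compare either the generated assembly or the timings. The `NO_BRANCH_HINTS` macro name and the `count_nonzero` function are hypothetical and only serve the illustration.

```c
#include <stddef.h>

/* Hypothetical switch: build with -DNO_BRANCH_HINTS to compile the
 * hints away, leaving every other change in the code untouched. */
#ifdef NO_BRANCH_HINTS
#define likely(x)   (x)
#define unlikely(x) (x)
#else
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Made-up hot function: counts nonzero bytes, with a NULL check and
 * a zero byte both marked as the rare cases. */
static size_t count_nonzero(const unsigned char *p, size_t n)
{
    if (unlikely(p == NULL))
        return 0;

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(p[i] == 0))
            continue;
        count++;
    }
    return count;
}
```

Compiling both variants with `gcc -O2 -S` and diffing the two assembly files shows whether the compiler laid the code out differently at all. If the output is identical, the hints cannot be responsible for any measured difference.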
Originally posted by Obscene_CNN View Post
If you think polluting the instruction cache with null check handling code and code for handling corner cases will make things better feel free to take them out.
predictor and a license to prefetch code to minimize pipeline stalls...
Comment
-
I think that if Obscene_CNN's patches (great work btw!) result in improved performance, then the upstream code is missing something. To the hardware, code niceness does not mean anything: if it runs faster, then that code is better for the hardware. For humans, it is the opposite. I suggest some kind of autopatching be enabled upstream, where Obscene_CNN's patches are applied, with explanations, as end-point optimizations. That way they are not merged into the main code, but they are still used. In the end, either gcc has to become better or Obscene_CNN's work has to be incorporated into the drivers. I hope upstream understands that driver performance is of at least equal value to code beauty.
Comment