DRI2 vs. DRI3 Radeon Linux OpenGL Performance

papu replied

06 November 2015, 06:43 AM
Radeon R9 270

Code:

cat /var/log/Xorg.0.log | grep -i DRI3
[ 19.107] (**) RADEON(0): Option "DRI3" "on"
[ 20.866] (**) RADEON(0): DRI3 enabled
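For anyone who wants to script the same check, here is a minimal Python sketch. It assumes the log lines look like the two quoted in this post; the helper names `dri3_lines` and `dri3_enabled` are made up for illustration, and other driver versions may word the message differently.

```python
import re

# Matches radeon driver lines that mention DRI3, e.g.
# [ 20.866] (**) RADEON(0): DRI3 enabled
DRI3_LINE = re.compile(r"RADEON\(\d+\):.*DRI3", re.IGNORECASE)


def dri3_lines(log_text):
    """Return the radeon log lines that mention DRI3."""
    return [line.strip() for line in log_text.splitlines()
            if DRI3_LINE.search(line)]


def dri3_enabled(log_text):
    """True if any radeon line reports 'DRI3 enabled'."""
    return any("dri3 enabled" in line.lower() for line in dri3_lines(log_text))


# The two lines quoted in the post above, used as sample input.
sample = (
    '[ 19.107] (**) RADEON(0): Option "DRI3" "on"\n'
    '[ 20.866] (**) RADEON(0): DRI3 enabled\n'
)

print(dri3_enabled(sample))
```

On a real system one would read `/var/log/Xorg.0.log` and pass its contents to `dri3_enabled`.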
Last edited by papu; 06 November 2015, 06:49 AM.
agd5f replied

05 November 2015, 02:14 PM
Originally posted by nanonyme View Post

Is it easier to get working on Wayland, where a single compositor manages everything?

No, it does not.
nanonyme replied

05 November 2015, 05:02 AM
Originally posted by agd5f View Post

Freesync (actually DP DRR (Dynamic Refresh Rate), which is the underlying hw feature) allows you to change the refresh rate at runtime. DP DRR was originally designed to allow the driver to reduce the refresh rate of the display on the fly to save power if the display image was not changing. Freesync extends this functionality by integrating feedback from the 3D or multimedia drivers to adjust the refresh rate on the fly based on a target frame rendering speed. If the 3D driver can render at 100 fps, it can ask the display driver to change the refresh rate to 100 Hz. If it slows down to 54 fps, it can ask the display driver to change the refresh rate to 54 Hz, etc.

Adding support for freesync to the Linux graphics stack is a lot of work. You basically have to plumb the entire graphics stack with the idea that apps can query and select a refresh rate dynamically at runtime. It breaks a lot of established models and assumptions about timing.

Is it easier to get working on Wayland, where a single compositor manages everything?
Guest replied

04 November 2015, 09:50 PM
Originally posted by Espionage724 View Post

Weird; my man entry says it's a boolean for radeon (Fedora 23):

Option "DRI3" "boolean"
Enable the DRI3 extension. The default is off.

The behavior changed in openSUSE TW with pontostroy's repo (still radeon):

Option "DRI" "integer"
Define the maximum level of DRI to enable. Valid values are 2 for DRI2
or 3 for DRI3. The default is 2 for DRI2.

Is there anything that might explain why the option has different behavior?
agd5f replied

03 November 2015, 01:31 PM
Freesync (actually DP DRR (Dynamic Refresh Rate), which is the underlying hw feature) allows you to change the refresh rate at runtime. DP DRR was originally designed to allow the driver to reduce the refresh rate of the display on the fly to save power if the display image was not changing. Freesync extends this functionality by integrating feedback from the 3D or multimedia drivers to adjust the refresh rate on the fly based on a target frame rendering speed. If the 3D driver can render at 100 fps, it can ask the display driver to change the refresh rate to 100 Hz. If it slows down to 54 fps, it can ask the display driver to change the refresh rate to 54 Hz, etc.

Adding support for freesync to the Linux graphics stack is a lot of work. You basically have to plumb the entire graphics stack with the idea that apps can query and select a refresh rate dynamically at runtime. It breaks a lot of established models and assumptions about timing.
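The feedback loop described above can be sketched in a few lines of Python. This is only an illustration, not how any driver implements it: the function name `target_refresh_hz` and the 40-144 Hz panel range are assumptions, chosen because a real panel can only follow the rendering speed within some supported range.

```python
def target_refresh_hz(render_fps, panel_min_hz, panel_max_hz):
    """Pick the refresh rate to request for a given rendering speed,
    clamped to the range the panel supports."""
    return max(panel_min_hz, min(panel_max_hz, render_fps))


# Hypothetical panel range, for illustration only.
PANEL_MIN_HZ, PANEL_MAX_HZ = 40, 144

for fps in (100, 54, 200, 25):
    hz = target_refresh_hz(fps, PANEL_MIN_HZ, PANEL_MAX_HZ)
    print(f"{fps} fps -> request {hz} Hz")
```

The hard part mentioned in the post is not this arithmetic but getting the request from the application through every layer of the stack to the display driver.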
SystemCrasher replied

03 November 2015, 06:34 AM
Originally posted by profoundWHALE View Post

As a Canadian, I got a little confused by the acronym CRTC

It is a "historic" name, and its literal meaning isn't much better: "Cathode Ray Tube Controller". Ironically, CRTs are rarely used today, but the name survived and is still used in the Linux graphics stack. It is a piece of hardware automation: it takes data from the framebuffer in (V)RAM and throws it onto the wire automatically and independently of the CPU/GPU, generating low-level timings such as H and V sync, pixel data timings, etc. on the fly. Modern CRTCs can have several "heads" attached via different wire-level protocols; DRM/KMS introduces the notion of "encoders" to address this. There are different names for this hardware: AMD internally calls it "DCE", others call it a "display controller", etc. The good thing about it is that the CPU or GPU does not have to do anything unless the picture should change.

Some early computers lacked such automation, and the system CPU had to drop all other tasks some 50-60 times per second to draw the current frame itself, here and now, meeting hard timings. Needless to say, it made system programming harder than it should be, imposed many limits on picture resolution, and stole plenty of CPU time. So some smartass eventually came up with the idea of automating frame drawing with (relatively simple) circuitry that generates the appropriate timings on its own and transfers data from some RAM area (the frame buffer) in a hardware-assisted way, without getting the system CPU involved.

This idea proved to be so damn good that it has survived for many years, and these days such hardware automation exists in almost any graphics-capable setup, from SoCs and microcontrollers up to huge, powerful GPUs. There are a few fancy exceptions, though, like small SPI/I2C LCDs where the "CRTC-like" thing is built in and so independent that you do not even have to know it exists.

So these days it usually looks like this: you set up the CRTC parameters once ("video modesetting") and forget about it for a while, just dumping new frame data into the CRTC's framebuffer area when you want to change the picture on the display. Everything else is up to the CRTC, which keeps resending the framebuffer data over the wire on its own. That is what a fixed "hardware" FPS is all about.

However, the most straightforward programming can lead to "tearing": the CRTC does not care what it throws onto the wire. If you change the framebuffer right while the CRTC is throwing it onto the wire, the half-old, half-new frame doesn't look good. Furthermore, if you fail to compute a particular frame in time, there are only two options left: either wait for the NEXT vertical blanking, hurting latency even further, or put what you've got into the frame buffer as is, so the CRTC transfers a half-old, half-new frame. That's why it is so critical for the GPU to deliver frames at a rate greater than or equal to the hardware refresh rate, no matter what.
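A toy Python model may make the tearing case concrete. It is only a sketch: the function `scan_out`, the six-row "frame" and the row at which the overwrite happens are all invented for illustration, and real hardware reads pixels, not strings.

```python
def scan_out(framebuffer, rows, overwrite_at_row=None, new_frame=None):
    """Simulate a CRTC reading the framebuffer row by row.

    If overwrite_at_row is set, the whole framebuffer is replaced with
    new_frame just before that row is read, i.e. an update that is not
    synchronized with the scanout.
    """
    sent = []
    for row in range(rows):
        if overwrite_at_row is not None and row == overwrite_at_row:
            framebuffer[:] = new_frame
        sent.append(framebuffer[row])
    return sent


ROWS = 6
old = ["old"] * ROWS
new = ["new"] * ROWS

# Framebuffer replaced in the middle of the scanout: torn frame.
torn = scan_out(list(old), ROWS, overwrite_at_row=3, new_frame=new)
# Framebuffer replaced before the scanout starts: clean frame.
clean = scan_out(list(new), ROWS)

print(torn)   # ['old', 'old', 'old', 'new', 'new', 'new']
print(clean)  # ['new', 'new', 'new', 'new', 'new', 'new']
```

The boundary between the "old" and "new" rows is the visible tear line.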

Freesync does not defer, or, if it does, it happens very very quickly.

I guess ideally it could happen on a frame-by-frame basis; there is nothing wrong with that, as long as the wire protocol and the display are okay with it. There is no real notion of "FPS" in software. You render a frame. It takes some time to compute (or to decode, if it's a video player). That is your inter-frame time, here and now. It can vary with frame complexity; in fact, each frame can take a different time to compute. FPS is something averaged and dumbed down so that mere mortals can understand it "better". Those who understand the difference between "instantaneous" and "average" values should get a very clear idea of what I mean: frame time is an "instantaneous" value, FPS is an average. And what one really wants is that not even a single frame is "late", so a new frame should arrive before the CRTC starts throwing data onto the wire.
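The difference between the averaged FPS and the per-frame time can be shown with a small Python sketch. The frame times below are made-up numbers and `summarize` is a hypothetical helper; the point is only that a comfortable average can hide a late frame.

```python
def summarize(frame_times_ms, refresh_hz):
    """Return (average FPS, list of frames that missed the refresh interval)."""
    refresh_interval_ms = 1000.0 / refresh_hz
    average_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    late = [t for t in frame_times_ms if t > refresh_interval_ms]
    return average_fps, late


# Made-up render times for six consecutive frames, in milliseconds.
frame_times_ms = [10.0, 12.0, 11.0, 30.0, 9.0, 12.0]

average_fps, late = summarize(frame_times_ms, refresh_hz=60)
print(f"average: {average_fps:.2f} fps, late frames: {late}")
```

Here the average is above 60 fps, yet the 30 ms frame still misses a 60 Hz refresh (about 16.7 ms per frame).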

From a pure rendering perspective, it also makes no sense to compute more frames than the CRTC can throw over the wire. If you throw 60 frames per second onto the wire, being able to compute 100 frames per second means you have to discard 40 of them without ever pushing them over the wire. However, many game engines do ALL their computations, like physics, motion and so on, on a per-frame basis. So even if those 40 frames are simply discarded in terms of rendering and display, they still mean lower latency for motion computations and the like. This process can hardly be called optimal in terms of heat, power consumption and GPU load, but it is the game engines that are to blame. It seems it is just easier for devs to do all computations on a per-frame basis than any other way.

VSync = defers the frames. In the background, it is rendering as much/little as it would normally, and then discarding anything that is not in sync with the display's refresh rate. It introduces latency, and decreases/eliminates tearing.

Vsync means something like this: wait until the CRTC is done transferring the current frame and goes idle (the "vertical blanking" interval starts). Then you can put the new frame into the framebuffer, and as long as you finish before the CRTC begins throwing out the next frame, there is no tearing; the CRTC just fires the new frame onto the wire. But if your frame is not computed by the time the next frame has to be sent, you're in trouble: either wait for the NEXT vblank, hurting latency even further, or dump it into the frame buffer anyway and face tearing, because the CRTC has already transferred part of the frame and that part contained OLD data.
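The cost of missing a vblank can be put into numbers with a short Python sketch. It is a simplified model under stated assumptions: vblanks are treated as instants at exact multiples of the refresh interval, the 20 ms interval (50 Hz) is picked only to keep the arithmetic clean, and `vsync_present_time_ms` is a made-up name.

```python
import math


def vsync_present_time_ms(ready_ms, refresh_interval_ms):
    """Time at which a frame finished at ready_ms can be swapped in,
    when swaps are only allowed at vblank (multiples of the interval)."""
    return math.ceil(ready_ms / refresh_interval_ms) * refresh_interval_ms


INTERVAL_MS = 20.0  # hypothetical 50 Hz display

for ready in (15.0, 21.0):
    shown = vsync_present_time_ms(ready, INTERVAL_MS)
    print(f"ready at {ready} ms -> shown at {shown} ms "
          f"(waited {shown - ready} ms)")
```

A frame that is just 1 ms late for the 20 ms vblank waits until 40 ms, which is the latency penalty described above.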

And Freesync is probably like this: if it becomes obvious that the GPU will not finish a particular frame in time, it is probably possible to ask the CRTC to wait a bit and defer sending the new frame onto the wire until it has been computed and put in place. This requires the underlying protocol and the display to be okay with this approach, though.
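The guess above can be written down as a tiny Python model. This is a sketch of the idea only, not of how Freesync hardware actually behaves: the function name, and the assumption that a display tolerates between 10 ms and 25 ms between scanouts, are invented for illustration.

```python
def vrr_present_time_ms(ready_ms, last_scanout_ms,
                        min_interval_ms, max_interval_ms):
    """Start time of the next scanout when vblank can be stretched.

    The frame goes out as soon as it is ready, but no sooner than
    min_interval_ms and no later than max_interval_ms after the
    previous scanout. If the upper limit is hit, the new frame is
    not ready and the old one would have to be sent again.
    """
    earliest = last_scanout_ms + min_interval_ms
    latest = last_scanout_ms + max_interval_ms
    return min(max(ready_ms, earliest), latest)


# Hypothetical display limits: 10 ms (100 Hz) to 25 ms (40 Hz).
MIN_MS, MAX_MS = 10.0, 25.0

for ready in (5.0, 21.0, 30.0):
    start = vrr_present_time_ms(ready, 0.0, MIN_MS, MAX_MS)
    print(f"frame ready at {ready} ms -> scanout starts at {start} ms")
```

In this model a frame ready at 21 ms is sent at 21 ms instead of waiting for a fixed refresh boundary, as long as it falls inside the range the display accepts.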

Historically, display protocols were a direct description of how to drive the beam to show you a picture. You start a frame, start a horizontal row, wait until the invisible blanking area before the visible pixels (the "back porch", which follows the sync pulse) expires, and the beam finally reaches the visible display area. Then the actual pixel values are fed into the DAC, which controls the beam currents while the beam travels, and a row of RGB pixels appears on the CRT. Then the beam leaves the visible area and the "front porch" starts, followed by the sync pulse. The per-row part of this is what H-sync is about. Then the next row starts, and so on until the whole frame appears. The same idea applies at the top and bottom boundaries of the frame as well; that's what V-sync REALLY meant with regard to CRTs.
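As a worked example of how these timings add up, here is a short Python sketch. The numbers are not from this post; they are the commonly used 1920x1080 at 60 Hz timing (148.5 MHz pixel clock), written with the usual modeline naming in which the front porch follows the visible pixels and the back porch follows the sync pulse. The helper name `total` is made up.

```python
def total(active, front_porch, sync, back_porch):
    """Total pixels per row (or rows per frame), including the
    invisible blanking parts."""
    return active + front_porch + sync + back_porch


# Commonly used 1920x1080@60 timing (an assumption added for illustration).
h_total = total(active=1920, front_porch=88, sync=44, back_porch=148)
v_total = total(active=1080, front_porch=4, sync=5, back_porch=36)
pixel_clock_hz = 148_500_000

# Every pixel slot, visible or not, takes one pixel clock tick.
refresh_hz = pixel_clock_hz / (h_total * v_total)

print(h_total, v_total, refresh_hz)  # 2200 1125 60.0
```

Roughly 16% of the pixel clock in this mode is spent on blanking that a CRT needed for beam retrace, which shows how directly the wire format still mirrors CRT timings.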

When digital displays appeared, it turned out that, although LCDs are different in nature, simply chopping the DAC away and transferring what used to be fed into the DAC in direct digital form is a viable option. The most direct manifestation of this approach is the "RGB bus", which looks similar to VGA but in native, digital representation: the DAC has been removed, and RGB pixels are sent directly as RGB bits over wires. The bad thing about it is that it takes 24 wires for 24 bits per pixel, plus H/V sync signals and more. A long cable with some 30 wires is not a very practical way to attach a display to a computer. But smaller devices like tablets and phones are OK with it, and a short flat cable can carry the whole 18-24 bits in the most straightforward way from the SoC to the LCD, as seen in some devices.

To reduce the number of wires, things like LVDS, DVI and HDMI appeared. Roughly speaking, they transfer the same thing the RGB bus does, but the data is serialized at one end of the wire and deserialized at the other, sometimes using several lanes. This reduces the number of wires in the cable and allows higher resolutions, since a high-speed serial bus is less prone to timing skew than 24 bits carried on separate wires. Yet in LVDS and HDMI/DVI the signal still depends heavily on the very same H/V timings, and what goes over the wire is basically a serialized bitstream version of the RGB bus (it can also be YUV in HDMI). Such a protocol design does not really allow timings to be changed on the fly.

DisplayPort was probably the first serious departure from this approach: DP uses packets to transfer data, so it can be far more flexible. That is what makes smarter ways of signalling possible. It is not a "direct" RGB+timings bitstream anymore, and it is the first time someone SERIOUSLY rethought the display interface since the CRT age. Hopefully this explains why both Freesync and G-Sync require DP. You can't afford an arbitrary delay on an RGB bus, LVDS, VGA, DVI or HDMI, because the signal on the wire is a direct manifestation of the pixel drawing timings, and there is no way to do smarter signalling between the display and the rest of the system anyway. And of course it was AMD who pioneered doing something advanced and new rather than just copy-pasting old designs.

http://www.pcper.com/reviews/Graphic...ologies-Differ

Well, it seems AMD did the most logical and proper engineering thing, i.e. delaying frames. Nvidia did some awkward proprietary hack. A frame doubler? Oh, wtf, lol? What about a tripler or quadrupler? Or a 1.5x'er?
renox replied

03 November 2015, 05:50 AM
Originally posted by profoundWHALE View Post

Some incorporate some sort of frame doubler or interpolation which is required with GSync which would totally eliminate tearing and acts as a VSync with no latency. This kicks in with the GSYNC guys when it gets below 45fps or so.

The Wikipedia article says that G-Sync has a 'frame doubler' to avoid screen flickering when the game's FPS output is too low; it says nothing about interpolation.
profoundWHALE replied

02 November 2015, 07:21 PM
Originally posted by SystemCrasher View Post

The first and foremost goal of vsync is to replace frames in the CRTC's framebuffer at "proper" ("vblank") times. You see, when you carelessly update the picture in the CRTC's framebuffer at an arbitrary time, "just because you have already computed a new frame", it often happens that the CRTC has already started throwing the frame onto the wire. If you change the image right while the CRTC is throwing it onto the wire, a half-old, half-new frame goes out. Since the frames can differ, there can be a visible border somewhere in the middle of the frame received by the display. If this happens a lot, users swear about "tearing". So it can be important to change frames only when the CRTC is not throwing data onto the wire, if you value picture quality over latency. Generally, all changes and reconfigurations should happen during vblank unless one is okay with a distorted/weird picture, flickering, tearing and other fancy stuff.

As a Canadian, I got a little confused by the acronym "CRTC"

Originally posted by SystemCrasher View Post

As far as I understand, the idea behind Freesync is that the CRTC is able to "defer" throwing the frame onto the wire to some degree if it turns out you need more time to finish rendering. Vblank is "extended": the CRTC does not throw out a half-old, half-new frame, but rather waits for the complete new frame, which is then thrown onto the wire once it is ready and put in place. This results in a tear-free picture even if the GPU is somewhat late in computing some frames. As such, there is no constant FPS at this point.

Freesync does not defer, or, if it does, it happens very very quickly.

Code:

Defer: put off (an action or event) to a later time; postpone.

VSync = defers the frames. In the background, it is rendering as much/little as it would normally, and then discarding anything that is not in sync with the display's refresh rate. It introduces latency, and decreases/eliminates tearing.

Freesync + No VSync = The display refreshes just about as often as it receives frames. Some incorporate some sort of frame doubler or interpolation which is required with GSync which would totally eliminate tearing and acts as a VSync with no latency. This kicks in with the GSYNC guys when it gets below 45fps or so. No idea how it all works though.

Freesync + VSync = Adds latency just like it did before, but AFAIK it's much better due to the variable refresh rate in Freesync. Again, would have no effect on GSync since it already does a VSYNC-like activity.

Dissecting G-Sync and FreeSync - How the Technologies Differ - PC Perspective

http://www.pcper.com/reviews/Graphics-Cards/Dissecting-G-Sync-and-FreeSync-How-Technologies-Differ

Last edited by profoundWHALE; 02 November 2015, 07:24 PM.
dungeon replied

02 November 2015, 06:21 PM
Originally posted by SystemCrasher View Post

...and what happened to the R7 370? One of the dumbest and most annoying misfeatures of benchmarks at Phoronix is that half the results are missing for half the tests, without a single word about the reason behind it. Did the R7 370 crash under load? Did it fail to deliver reasonable performance? Or was this test just considered pointless and skipped?

I guess that should be an ALARM!!! for any GCN 1.0 user to check Xonotic Ultra. If something crashes, it will be easy to bisect starting from the low settings; maybe it is offsetmapping, which was crashing LLVM earlier here and there.
SystemCrasher replied

02 November 2015, 06:15 PM
Originally posted by profoundWHALE View Post

Yeah. The goal of vsync is consistent frame times over latency

The first and foremost goal of vsync is to replace frames in the CRTC's framebuffer at "proper" ("vblank") times. You see, when you carelessly update the picture in the CRTC's framebuffer at an arbitrary time, "just because you have already computed a new frame", it often happens that the CRTC has already started throwing the frame onto the wire. If you change the image right while the CRTC is throwing it onto the wire, a half-old, half-new frame goes out. Since the frames can differ, there can be a visible border somewhere in the middle of the frame received by the display. If this happens a lot, users swear about "tearing". So it can be important to change frames only when the CRTC is not throwing data onto the wire, if you value picture quality over latency. Generally, all changes and reconfigurations should happen during vblank unless one is okay with a distorted/weird picture, flickering, tearing and other fancy stuff.

As far as I understand, the idea behind Freesync is that the CRTC is able to "defer" throwing the frame onto the wire to some degree if it turns out you need more time to finish rendering. Vblank is "extended": the CRTC does not throw out a half-old, half-new frame, but rather waits for the complete new frame, which is then thrown onto the wire once it is ready and put in place. This results in a tear-free picture even if the GPU is somewhat late in computing some frames. As such, there is no constant FPS at this point.

...and btw, if I remember correctly, the open-source graphics stack assumes that FPS is constant. That has been true for decades, but not anymore.