Disable unneeded drivers in the Mesa config to speed up compilation.
How to tell if a driver is gallium or just mesa? (Slow rendering with radeon)
You can use make localmodconfig.
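As far as I know, make localmodconfig takes the list of currently loaded modules (the lsmod output) and disables the module options that none of them need, so it is worth running it while all the hardware you care about is in use. The core check is simply "is this module in the lsmod list". Below is a minimal C sketch of that test against lsmod-formatted text. The helper name module_loaded is hypothetical, and this is not the kernel's own implementation (scripts/kconfig/streamline_config.pl).

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` is the first column of some line of
 * lsmod-style text. The "Module  Size  Used by" header never matches
 * a real module name, so it needs no special handling.
 * Hypothetical helper, for illustration only. */
bool module_loaded(const char *lsmod_text, const char *name)
{
    size_t n = strlen(name);
    const char *line = lsmod_text;

    while (line && *line) {
        /* the first column ends at the first space, tab or newline */
        size_t col = strcspn(line, " \t\n");
        if (col == n && strncmp(line, name, n) == 0)
            return true;
        const char *nl = strchr(line, '\n');
        line = nl ? nl + 1 : NULL;
    }
    return false;
}
```

For example, calling module_loaded() on the lsmod output with "ati_agp" tells you whether the AGP bridge driver was loaded when the config was generated.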
Btw, I am now running my own kernel - hurray! I think this is the first time I have done so ;-)
The sad news is that barely anything has changed in performance despite the whole lot of changes. In glxgears the frame rate is now a constant 356-357 FPS, which is 1-3 FPS better than ever before. In Extreme Tux Racer, which I have used along with glxgears to test every configuration so far, the change is negligible: in the menu I may get 1-5 FPS more, but in the game it is either the same or only very slightly better, and I actually think it is sometimes slower.

I always go through the same map along the same path, as closely as I can, and look at the FPS at predefined points. One of these points is when I cross the finish line and the finishing animation plays. It is not a video, just a scene where control is taken away and the penguin turns towards the player, so I do not need to steer and can pay full attention to the FPS counter, and the scene is nearly identical every time. At this point the FPS now varies more than before: earlier it was basically always around 16 FPS, and now it jumps rapidly between 14 and 18. It may be, however, that the measurement is now simply more fine-grained, and before it was smoothed out and averaged to 16 FPS.
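The guess that the old counter simply averaged its readings is easy to illustrate: a moving average over a few samples turns readings that alternate between 14 and 18 FPS into a steady 16. Here is a small C sketch, assuming a plain moving average with a made-up window size. I do not know how Extreme Tux Racer's FPS counter actually computes its value.

```c
#include <stddef.h>

/* Simple moving average: out[i] is the mean of the last `window`
 * samples ending at index i (fewer samples at the very start).
 * Illustrative only; the real FPS counter may work differently. */
void moving_average(const double *in, double *out, size_t n, size_t window)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += in[i];
        if (i >= window)
            sum -= in[i - window];   /* drop the sample leaving the window */
        size_t count = (i + 1 < window) ? i + 1 : window;
        out[i] = sum / (double)count;
    }
}
```

Feeding it the samples 14, 18, 14, 18, ... with a window of 4 gives a constant 16.0 once the window is full, which matches the "always around 16" reading from before.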
In any case, there may be a very slight difference, so it seems a good idea to keep these settings. But that difference is barely noticeable, while the original performance drop compared to the earlier system is anything but barely noticeable.
I think there is a bigger issue here, and maybe we can rule out the kernel as the main factor. Still, the changes made a very slight difference in speed here too.
So:
- Practically nothing is running in the background
- The kernel is now optimized like crazy (of course I could still try more builds with different configs to squeeze out another 1-2%)
- Mesa is built from latest git, and as far as I can tell from the source code there are only Gallium drivers for r300
- /etc/xorg.conf is as shown above, while the /etc/X11/xorg.conf.d/ directory is empty and does not override anything (are there other files?)
- I have not built X myself so far, but I doubt it would help. :-)
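To answer the title question in practice: glxinfo prints an "OpenGL renderer string" line that names the driver actually doing the rendering. If that line names one of Mesa's software rasterizers (llvmpipe, softpipe, or the classic "Software Rasterizer"), the hardware driver is not being used at all. Below is a minimal C sketch of that check on glxinfo's text output. The helper name is hypothetical. The exact string the r300 Gallium driver reports varies between Mesa versions, so the sketch only looks for the software names.

```c
#include <stdbool.h>
#include <string.h>

/* Finds the "OpenGL renderer string:" line in glxinfo output and reports
 * whether it names one of Mesa's software rasterizers.
 * Returns false if the line is missing. Hypothetical helper. */
bool renderer_is_software(const char *glxinfo_text)
{
    static const char *key = "OpenGL renderer string:";
    static const char *soft[] = { "llvmpipe", "softpipe", "Software Rasterizer" };

    const char *p = strstr(glxinfo_text, key);
    if (!p)
        return false;

    const char *end = strchr(p, '\n');
    size_t len = end ? (size_t)(end - p) : strlen(p);

    for (size_t i = 0; i < sizeof soft / sizeof soft[0]; i++) {
        const char *hit = strstr(p, soft[i]);
        /* the match only counts if it is on the renderer line itself */
        if (hit && (size_t)(hit - p) < len)
            return true;
    }
    return false;
}
```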
I am not sure whether AGP acceleration is on or not. In dmesg I only see one line about some driver being activated, which is why I tried "agp=try_unsupported" as an extra kernel parameter in the GRUB config. It made no difference either.
PS: For the latest runs I did not put the video card into the "high" power profile, so I might gain 1 more FPS there. I might also gain one more if removing RETPOLINE=y did not remove all Spectre/Meltdown mitigations. So there may still be a 1-2% gain in this kernel with more configuration, but the original problem is a 50-70% performance loss compared to my earlier system on the same machine with the open source radeon driver.
PS: Earlier I used lightdm, and now I just start an X server and dwm manually. Are there any userspace hacks a display manager might apply that I am not doing? Also, is there comprehensive information anywhere about what to look for in the graphics pipeline? Maybe I am missing something very simple that makes the performance bad... :-(
Btw, I have uploaded the current kernel config here, so that people can see it in the future if interested:
Btw, I am not sure whether it matters, but my user is in the "video" group, among others:
Code:
[prenex@prenex-laptop ~]$ groups
sys ftp log http games video storage wheel adm prenex
Code:
[prenex@prenex-laptop ~]$ lat /dev/dri/
total 0
crw-rw----+  1 root video  226,   0 May 21 11:10 card0
drwxr-xr-x  19 root root      3.2K May 21 11:10 ..
drwxr-xr-x   2 root root        80 May 21 11:10 by-path
drwxr-xr-x   3 root root       100 May 21 11:10 .
crw-rw-rw-   1 root render 226, 128 May 21 11:10 renderD128
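In this listing, card0 is crw-rw----+ and owned by group video (the trailing + means an ACL is set as well), and renderD128 is readable and writable by everyone, so the permissions look fine. A quick way to confirm this from your own session is access(2) with R_OK | W_OK. Here is a minimal sketch; the helper name is hypothetical. Note that a newly added group membership only applies to sessions started after it was added.

```c
#include <stdbool.h>
#include <unistd.h>

/* True if the calling user may both read and write `path`.
 * access() checks the real UID/GID and supplementary groups,
 * so membership in "video" (or an ACL) is taken into account.
 * Hypothetical helper. */
bool can_read_write(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}
```

Run as the normal user, can_read_write("/dev/dri/card0") should return true if group membership or the ACL grants access.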
Okay... I went on and asked perf where the roughly 100% CPU time in extreme-tux-racer is being spent.
The perf recording and a simple text report are here:
It is better to download the files, because they display badly in my browser: the lines are too wide for it.
Also, this part of the perf-report manpage is relevant for understanding the text output:
The overhead can be shown in two columns as Children and Self when perf collects callchains. The self overhead is simply calculated by adding all period values of the entry - usually a function (symbol). This is the value that perf shows traditionally and sum of all the self overhead values should be 100%. The children overhead is calculated by adding all period values of the child functions so that it can show the total overhead of the higher level functions even if they don’t directly execute much. Children here means functions that are called from another (parent) function.
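The Children/Self distinction can be made concrete with a tiny call tree. Self is what a function collects itself, and children adds everything collected in its callees. Below is a minimal C sketch with hypothetical sample counts (not taken from the profile). It presumably also explains why get_page_from_freelist shows 38.90% children but 36.28% self: the gap is time spent in frames below it, such as the timer interrupt path.

```c
#include <stddef.h>

/* A node in a call tree: the samples taken while the CPU was executing
 * this function itself, plus its callees. Hypothetical types. */
struct frame {
    const char *name;
    unsigned long self_samples;
    const struct frame **callees;   /* NULL-terminated array, or NULL */
};

/* "Children" overhead in the perf-report sense: the node's own samples
 * plus everything collected in the functions it calls. */
unsigned long children_samples(const struct frame *f)
{
    unsigned long total = f->self_samples;

    if (f->callees)
        for (const struct frame **c = f->callees; *c; c++)
            total += children_samples(*c);
    return total;
}
```

With a leaf holding 36 samples under a parent holding 2, the parent's children value is 38, while its self value stays 2.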
Following the call stack, there seems to be some memory moving going on, as the name of this function suggests: "ttm_bo_handle_move_mem".
Most of the time from that point on (38%) is spent in a function called "get_page_from_freelist". It might be some kind of memory management problem in the driver. Or, of course, I might get the same profile on a system that runs extreme-tux-racer at 200 FPS, where it would do the same things, just not slowly. perf can only measure where the CPU is; it cannot easily tell whether that point is GPU-bound, I/O-bound or something else.
Or maybe it is still the overhead that does the most harm? Scrolling down, I get to this part:
Code:
38.90%  36.28%  etr  [kernel.vmlinux]  [k] get_page_from_freelist
            |
            |--35.96%--__kernel_vsyscall
            |          entry_SYSENTER_32
            |          do_fast_syscall_32
            |          sys_ioctl
            |          do_vfs_ioctl
            |          radeon_drm_ioctl
            |          drm_ioctl
            |          drm_ioctl_kernel
            |          radeon_gem_create_ioctl
            |          radeon_gem_object_create
            |          radeon_bo_create
            |          ttm_bo_init
            |          ttm_bo_init_reserved
            |          ttm_bo_validate
            |          ttm_bo_handle_move_mem
            |          ttm_tt_bind
            |          radeon_ttm_tt_populate
            |          ttm_populate_and_map_pages
            |          ttm_pool_populate
            |          __alloc_pages_nodemask
            |          get_page_from_freelist
            |
             --2.62%--get_page_from_freelist
                       |
                       |--1.30%--apic_timer_interrupt
                       |          smp_apic_timer_interrupt
                       |          |
                       |           --1.26%--hrtimer_interrupt
                       |                     |
                       |                      --0.95%--__hrtimer_run_queues.constprop.5
                       |                                |
                       |                                 --0.64%--tick_sched_timer
                       |
                        --1.07%--rmqueue_pcplist.isra.19.constprop.41
Maybe there should not be so many buffer moves at all, and this points to some underlying (configuration?) problem.
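One reading of this call chain is that the time goes into creating new buffer objects (radeon_gem_create_ioctl down to page allocation), not into copying. If something creates a fresh buffer every frame, every frame pays for page allocation, and reusing buffers avoids that. Below is a toy C sketch of a one-slot buffer cache, with malloc standing in for the kernel allocation. All names are hypothetical. The profile alone does not prove that r300 really allocates per frame here.

```c
#include <stdlib.h>

/* Toy model: each "frame" needs one buffer of BUF_SIZE bytes.
 * malloc stands in for the expensive page allocation; the cache hands
 * the previous frame's buffer back instead. Hypothetical names. */
#define BUF_SIZE 4096

struct buf_cache {
    void *free_buf;        /* one cached buffer, or NULL */
    unsigned long fresh;   /* how many times we had to really allocate */
};

void *cache_get(struct buf_cache *c)
{
    if (c->free_buf) {
        void *b = c->free_buf;
        c->free_buf = NULL;
        return b;
    }
    c->fresh++;
    return malloc(BUF_SIZE);
}

void cache_put(struct buf_cache *c, void *b)
{
    if (!c->free_buf)
        c->free_buf = b;   /* keep it for the next frame */
    else
        free(b);
}
```

With the cache, 100 frames need one fresh allocation instead of 100.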
I went further in this direction, because I now have the source code of both the kernel and Mesa at hand...
The relevant file can be found in the Linux kernel repository:
Code:drivers/gpu/drm/radeon/radeon_ttm.c
Code:
static int radeon_ttm_tt_populate(struct ttm_tt *ttm,
				  struct ttm_operation_ctx *ctx)
{
	struct radeon_ttm_tt *gtt = radeon_ttm_tt_to_gtt(ttm);
	struct radeon_device *rdev;
	bool slave = !!(ttm->page_flags & TTM_PAGE_FLAG_SG);

	/* userptr objects: only set up an empty sg table here */
	if (gtt && gtt->userptr) {
		ttm->sg = kzalloc(sizeof(struct sg_table), GFP_KERNEL);
		if (!ttm->sg)
			return -ENOMEM;

		ttm->page_flags |= TTM_PAGE_FLAG_SG;
		ttm->state = tt_unbound;
		return 0;
	}

	/* imported (PRIME) buffers already come with an sg table */
	if (slave && ttm->sg) {
		drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
						 gtt->ttm.dma_address, ttm->num_pages);
		ttm->state = tt_unbound;
		return 0;
	}

	rdev = radeon_get_rdev(ttm->bdev);
#if IS_ENABLED(CONFIG_AGP)
	/* AGP path: only taken if the device is flagged as AGP */
	if (rdev->flags & RADEON_IS_AGP) {
		return ttm_agp_tt_populate(ttm, ctx);
	}
#endif

#ifdef CONFIG_SWIOTLB
	if (rdev->need_swiotlb && swiotlb_nr_tbl()) {
		return ttm_dma_populate(&gtt->ttm, rdev->dev, ctx);
	}
#endif

	/* default path: allocate the pages and DMA-map them */
	return ttm_populate_and_map_pages(rdev->dev, &gtt->ttm, ctx);
}
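The branch order of this function decides which populate routine runs. In the profile, radeon_ttm_tt_populate calls ttm_populate_and_map_pages. That is the final fall-through, so neither the AGP branch nor the SWIOTLB branch was taken for those buffers. Below is a user-space model of just that decision, with hypothetical names. The booleans stand in for the kernel conditions noted in the comments.

```c
#include <stdbool.h>

/* Which routine radeon_ttm_tt_populate() ends up in.
 * User-space model of the branch order only; not kernel types. */
enum populate_path {
    PATH_USERPTR,          /* early return for userptr objects */
    PATH_PRIME_SG,         /* early return for imported sg tables */
    PATH_AGP,              /* ttm_agp_tt_populate() */
    PATH_SWIOTLB_DMA,      /* ttm_dma_populate() */
    PATH_POPULATE_AND_MAP  /* ttm_populate_and_map_pages() */
};

struct populate_inputs {
    bool userptr;          /* gtt && gtt->userptr */
    bool slave_with_sg;    /* (page_flags & TTM_PAGE_FLAG_SG) && ttm->sg */
    bool config_agp;       /* IS_ENABLED(CONFIG_AGP) */
    bool rdev_is_agp;      /* rdev->flags & RADEON_IS_AGP */
    bool config_swiotlb;   /* CONFIG_SWIOTLB defined */
    bool swiotlb_active;   /* rdev->need_swiotlb && swiotlb_nr_tbl() */
};

enum populate_path choose_populate_path(struct populate_inputs in)
{
    if (in.userptr)
        return PATH_USERPTR;
    if (in.slave_with_sg)
        return PATH_PRIME_SG;
    if (in.config_agp && in.rdev_is_agp)
        return PATH_AGP;
    if (in.config_swiotlb && in.swiotlb_active)
        return PATH_SWIOTLB_DMA;
    return PATH_POPULATE_AND_MAP;
}
```

With AGP support compiled in but the RADEON_IS_AGP flag unset, the model lands on PATH_POPULATE_AND_MAP, which is the path seen in the profile.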
I can see my card in the lspci output, but that doesn't tell me whether it is PCIe or AGP, does it? I always set AGP options for it earlier, but on this system they don't seem to do anything, even in xorg.conf. If the card uses AGP but now cannot use it at all, that would easily explain the radical performance drop compared to my earlier system on the same machine...
Looking at the kernel configuration, AGP support is built as external module(s):
Code:
...
#
# Graphics support
#
CONFIG_AGP=m
CONFIG_AGP_ALI=m
CONFIG_AGP_ATI=m
CONFIG_AGP_AMD=m
CONFIG_AGP_AMD64=m
CONFIG_AGP_INTEL=m
CONFIG_AGP_NVIDIA=m
CONFIG_AGP_SIS=m
CONFIG_AGP_SWORKS=m
CONFIG_AGP_VIA=m
CONFIG_AGP_EFFICEON=m
CONFIG_INTEL_GTT=m
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=10
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
...
Btw, CONFIG_SWIOTLB does not appear in my kernel config file at all, not even commented out...
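One detail matters for the module-versus-built-in question. IS_ENABLED(CONFIG_AGP) is true when the option is y or m, so with CONFIG_AGP=m the AGP branch in radeon_ttm_tt_populate is still compiled in, and whether it runs depends on the RADEON_IS_AGP flag at runtime. A plain #ifdef CONFIG_SWIOTLB, on the other hand, drops its block when the option is missing from .config, as it is here. Below is a minimal C sketch of the IS_ENABLED rule applied to .config text. The helper is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Mimics what IS_ENABLED(option) evaluates to for a given .config:
 * true for "option=y" (built in) or "option=m" (module); false for
 * "# option is not set" or when the option is absent. Hypothetical helper. */
bool kconfig_is_enabled(const char *config_text, const char *option)
{
    size_t n = strlen(option);
    const char *line = config_text;

    while (line && *line) {
        /* require "=" right after the name, so CONFIG_AGP != CONFIG_AGP_ATI */
        if (strncmp(line, option, n) == 0 && line[n] == '=')
            return line[n + 1] == 'y' || line[n + 1] == 'm';
        const char *nl = strchr(line, '\n');
        line = nl ? nl + 1 : NULL;
    }
    return false;
}
```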
It could also be that this check reports my card as non-AGP while in reality it is AGP:
Code:if (rdev->flags & RADEON_IS_AGP) {
But I have a feeling that this is the problem: the card is capable of AGP, while the system either thinks it is not or is configured not to use it...
A really ugly and dirty hack would be to remove the checks around the call and force the kernel driver to call the AGP version regardless. It might also help to change the [m] in the configuration to [*], so that the AGP-related code is built directly into my kernel instead of being an external module (the kernel is made for my machine only now, so who cares).
But I think there might be some configuration issue, so before I do these radical things, which might also break something, I should look around a bit more...
Hmm... According to this page:
Perhaps this is really helpful to other PPC-users as the desktop on my PowerMac G5 7,3 was *reeaaally* sluggish and now it's not. Got this idea after reading various PPC-specific bug reports on the xorg bugtracker, e.g. Bug 95017, Bug 94877. I added the following boot-parameter to /etc/yaboot.conf: append="root=... radeon.agpmode=8" Don't forget to write out your yaboot.conf to disk via sudo ybin -v afterwards. And use the following xorg.conf in /etc/X11: Section "Device"
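On this machine the equivalent would go on the GRUB kernel command line rather than in yaboot.conf. The parameter only helps if it actually reaches the kernel, so it is worth checking /proc/cmdline after boot. Below is a minimal C sketch that finds radeon.agpmode=N in such a command line. It assumes space-separated tokens and does not handle quoting. The helper name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Looks for a "radeon.agpmode=N" token in a kernel command line
 * (e.g. the contents of /proc/cmdline) and stores N in *mode.
 * Returns 1 if found, 0 otherwise. Hypothetical helper. */
int find_radeon_agpmode(const char *cmdline, int *mode)
{
    static const char key[] = "radeon.agpmode=";
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* only accept a match at the start of a token */
        if (p == cmdline || p[-1] == ' ') {
            *mode = (int)strtol(p + sizeof key - 1, NULL, 10);
            return 1;
        }
        p += sizeof key - 1;
    }
    return 0;
}
```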
This is what the dmesg output should look like when AGP acceleration is working:
Code:
...
Sep 19 11:29:54 Debian-G5 kernel: pmac_zilog: 0.6 (Benjamin Herrenschmidt)
Sep 19 11:29:54 Debian-G5 kernel: Linux agpgart interface v0.103
Sep 19 11:29:54 Debian-G5 kernel: agpgart-uninorth 0000:f0:0b.0: Apple U3 chipset
Sep 19 11:29:54 Debian-G5 kernel: agpgart-uninorth 0000:f0:0b.0: configuring for size idx: 64
Sep 19 11:29:54 Debian-G5 kernel: agpgart-uninorth 0000:f0:0b.0: AGP aperture is 256M @ 0x0
...
Mine, on the other hand, only shows this:
Code:
...
[   10.422564] battery: ACPI: Battery Slot [BAT0] (battery present)
[   10.707533] Linux agpgart interface v0.103
[   10.901497] asus_laptop: Asus Laptop Support version 0.42
...
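The difference between the two logs is that the reference one continues past the generic "Linux agpgart interface" banner with chipset driver lines ending in the AGP aperture report, while mine stops at the banner. Below is a minimal C sketch that classifies kernel log text that way. It assumes the bridge driver prints an "AGP aperture is ..." line, as in the reference log. The names are hypothetical.

```c
#include <string.h>

/* Rough classification of kernel log text with respect to AGP.
 * Assumes a working bridge driver logs "AGP aperture is ...".
 * Hypothetical names. */
enum agp_log_state {
    AGP_LOG_NONE,          /* no agpgart messages at all */
    AGP_LOG_BANNER_ONLY,   /* agpgart core loaded, no bridge set up */
    AGP_LOG_APERTURE       /* a bridge driver reported its aperture */
};

enum agp_log_state classify_agp_log(const char *log_text)
{
    if (strstr(log_text, "AGP aperture is"))
        return AGP_LOG_APERTURE;
    if (strstr(log_text, "Linux agpgart interface"))
        return AGP_LOG_BANNER_ONLY;
    return AGP_LOG_NONE;
}
```

Applied to the two logs above, the reference one gives AGP_LOG_APERTURE and mine gives AGP_LOG_BANNER_ONLY.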
However, I find it really likely that AGP acceleration is now gone for my card, and I have no idea why. That seems to be the cause of my problem and would explain a lot of what I see...
I think I can see that my card has both an AGP and a PCI bridge (which seems natural to me):
Code:
[prenex@prenex-laptop ~]$ lspci -knn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD/ATI] RC410 Host Bridge [1002:5a31] (rev 01)
	Subsystem: ASUSTeK Computer Inc. RC410 Host Bridge [1043:13d7]
	Kernel modules: ati_agp
00:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Bridge [int gfx] [1002:5a3f]
00:13.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] IXP SB4x0 USB Host Controller [1002:4374] (rev 80)
	Subsystem: ASUSTeK Computer Inc. IXP SB4x0 USB Host Controller [1043:13d7]
	Kernel driver in use: ohci-pci
	Kernel modules: ohci_pci
...
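In this output the host bridge lists "Kernel modules: ati_agp" but has no "Kernel driver in use:" line, while the USB controller has both. As far as I understand lspci -k, the first line only lists modules that claim to support the device, and the second appears once a driver is actually bound. That suggests ati_agp may not be bound to the bridge at all. The listing is cut off, so the graphics device itself is not visible. Below is a minimal C sketch of that per-device check on lspci -k text. The helper is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* In `lspci -k` output each device starts at column 0 with its slot
 * ("00:00.0 ...") and its details follow on indented lines. Returns true
 * if the block for `slot` contains a "Kernel driver in use:" line.
 * Hypothetical helper. */
bool device_has_bound_driver(const char *lspci_text, const char *slot)
{
    size_t n = strlen(slot);
    bool in_block = false;
    const char *line = lspci_text;

    while (line && *line) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);

        if (line[0] != ' ' && line[0] != '\t') {
            /* a new device header line: are we entering our device? */
            in_block = strncmp(line, slot, n) == 0 && line[n] == ' ';
        } else if (in_block) {
            const char *hit = strstr(line, "Kernel driver in use:");
            if (hit && (size_t)(hit - line) < len)   /* on this line only */
                return true;
        }
        line = nl ? nl + 1 : NULL;
    }
    return false;
}
```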
Use top to find out whether dwm or other processes use too many hardware resources. Arch Linux has a lot of moving parts that can cause problems, and with only a single core you will notice big differences.
This is my top output now:
Code:
top - 15:10:35 up  1:19,  4 users,  load average: 1,36, 1,45, 1,21
Tasks:  78 total,   1 running,  75 sleeping,   0 stopped,   2 zombie
%Cpu(s): 64,9 us,  6,5 sy,  0,0 ni, 28,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :   1380,8 total,    136,6 free,    478,1 used,    766,0 buff/cache
MiB Swap:    988,3 total,    985,6 free,      2,8 used.    719,4 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  475 prenex     1   0 1004996 375044  95104 S  42,1  26,5  20:37.38 palemoon
  349 prenex     1   0  161416  40212  22896 S   3,6   2,8   1:37.86 Xorg
  361 prenex     1   0   22328  11544   8256 S   0,7   0,8   0:00.85 xterm
  372 prenex     1   0    9096   2700   2388 S   0,3   0,2   0:00.32 dwm
    1 root       1   0   16952   2696   2096 S   0,0   0,2   0:02.43 systemd
    2 root       1   0       0      0      0 S   0,0   0,0   0:00.00 kthreadd
    4 root       1 -20       0      0      0 I   0,0   0,0   0:00.00 kworker/0:0H-kblockd
    6 root       1 -20       0      0      0 I   0,0   0,0   0:00.00 mm_percpu_wq
    7 root       1   0       0      0      0 S   0,0   0,0   0:01.94 ksoftirqd/0
    8 root     -51   0       0      0      0 S   0,0   0,0   0:00.00 idle_inject/
Memory usage:
Code:
[prenex@prenex-laptop zen-kernel-5.0.17-lqx1]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1380         462         169           0         748         735
Swap:           988           2         985
For testing I always close the browser first. I will measure it for you with the browser closed, as that is how I run my tests. Also, while the tests run, nothing but the tested app is running: not even pulseaudio is installed, only ALSA (and glxgears needs no sound anyway).
Also, the perf output shows that the CPU spends most of its time in the driver, and I now strongly suspect it does so because AGP acceleration is not working for some reason. I might be wrong, of course.
Top output after the browser has closed:
Code:
top - 15:19:08 up  1:28,  4 users,  load average: 0,1, 0,75, 1,00
Tasks:  79 total,   1 running,  76 sleeping,   0 stopped,   2 zombie
%Cpu(s):  1,4 us,  0,5 sy,  0,0 ni, 98,1 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :   1380,8 total,    553,2 free,     78,9 used,    748,7 buff/cache
MiB Swap:    988,3 total,    985,6 free,      2,8 used.   1119,4 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  361 prenex     1   0   22460  11628   8256 S   0,3   0,8   0:01.23 xterm
  612 prenex     6   0   10884   3304   2792 R   0,3   0,2   0:00.04 top
    1 root       1   0   16952   2696   2096 S   0,0   0,2   0:02.51 systemd
    2 root       1   0       0      0      0 S   0,0   0,0   0:00.00 kthreadd
    4 root       1 -20       0      0      0 I   0,0   0,0   0:00.00 kworker/0:0H-kblockd
    6 root       1 -20       0      0      0 I   0,0   0,0   0:00.00 mm_percpu_wq
    7 root       1   0       0      0      0 S   0,0   0,0   0:01.96 ksoftirqd/0
Code:
[prenex@prenex-laptop zen-kernel-5.0.17-lqx1]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1380          81         542           0         757        1116
Swap:           988           2         985
I am really starting to feel it is an AGP acceleration problem. Looking around in the driver source code, the message I find missing should have been printed: it is still in the code, and it seems to be at a relevant place.
I have nothing against Debian either; I even installed it on S/390 once, years ago, with Xfce (I see your nickname :-). I just want to understand the root of the problem here, and it does not seem to be a distro issue. Or if it is, then it is some valuable piece of configuration that I had better know about anyway, as it is key to good (or even acceptable) 3D performance.