How to tell if a driver is gallium or just mesa? (Slow rendering with radeon)

  • #51
    I am running Debian 7 with Xfce from a live USB, which should be exactly the same as what you run now. The kernel is 3.2.0-4-686-pae.

    I already see differences in the dmesg log. Some things are the same and some are different. To be clear, I only compare the things I wrote about earlier on other systems and kernels.

    Look at this part of dmesg for example:
    Code:
    [    0.062532] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
    [    0.062628] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
    [    0.062828] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
    [    0.062832] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff] (ignored)
    [    0.062837] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
    [    0.062841] pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff] (ignored)
    [    0.062845] pci_root PNP0A03:00: host bridge window [mem 0x58000000-0xffffffff] (ignored)
    [    0.062862] pci 0000:00:00.0: [1002:5a31] type 0 class 0x000600
    [    0.062886] pci 0000:00:00.0: reg 1c: [mem 0xe0000000-0xffffffff 64bit]
    [    0.062918] pci 0000:00:01.0: [1002:5a3f] type 1 class 0x000604
    [    0.063017] pci 0000:00:13.0: [1002:4374] type 0 class 0x000c03
    It seems it can now read reg 1c of the host bridge, and I do not see any quirks logged for it!
    That is a good sign at least. I also got the values for that register, so maybe
    some manual hacking could configure the same memory region on the other systems?
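To put a number on the window that reg 1c reports, here is a minimal sketch that computes the size of the inclusive address range from the log line above. The helper name `window_size` is made up for illustration.

```python
def window_size(start, end):
    """Size in bytes of an inclusive address range [start, end]."""
    return end - start + 1

# Values copied from: reg 1c: [mem 0xe0000000-0xffffffff 64bit]
start, end = 0xE0000000, 0xFFFFFFFF
size_mib = window_size(start, end) // (1024 * 1024)
print(f"{size_mib} MiB")
```

The result is 512 MiB, the same size as the GTT that radeon reports further down in this dmesg. Whether the two are actually related is not established by this alone.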

    The lspci -v is the same however:
    Code:
    00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI Device 5a31 (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 13d7
        Flags: bus master, 66MHz, medium devsel, latency 64
        Memory at <ignored> (64-bit, non-prefetchable)
    (Maybe it is normal to write <ignored> there actually?)

    There is a radeon kernel module seen in dmesg, but dmesg also shows that the R300 microcode is missing, so radeon disables GPU acceleration. There is 696 MB of overlay available, so I guess I can save data on this live USB and install things. I have no idea how to install the microcode or where to get it for this older Debian, but I can only measure 3D performance once I have it. Are there still packages for this version, or should I look for a later one where it is easy to try things out and install stuff?
    Code:
    [   19.861009] [drm] radeon kernel modesetting enabled.
    [   19.861647] [drm] initializing kernel modesetting (RS400 0x1002:0x5A62 0x1043:0x1392).
    [   19.861678] [drm] register mmio base: 0xFE1F0000
    [   19.861681] [drm] register mmio size: 65536
    [   19.861867] [drm] Generation 2 PCI interface, using max accessible memory
    [   19.861875] radeon 0000:01:05.0: VRAM: 128M 0x0000000058000000 - 0x000000005FFFFFFF (128M used)
    [   19.861880] radeon 0000:01:05.0: GTT: 512M 0x0000000060000000 - 0x000000007FFFFFFF
    [   19.861895] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
    [   19.861898] [drm] Driver supports precise vblank timestamp query.
    [   19.861912] [drm] radeon: irq initialized.
    [   19.862380] [drm] Detected VRAM RAM=128M, BAR=256M
    [   19.862384] [drm] RAM width 128bits DDR
    [   19.868504] [TTM] Zone  kernel: Available graphics memory: 446938 kiB
    [   19.868509] [TTM] Zone highmem: Available graphics memory: 712062 kiB
    [   19.868512] [TTM] Initializing pool allocator
    [   19.868524] [TTM] Initializing DMA pool allocator
    [   19.868566] [drm] radeon: 128M of VRAM memory ready
    [   19.868569] [drm] radeon: 512M of GTT memory ready.
    [   19.868602] [drm] GART: num cpu pages 131072, num gpu pages 131072
    [   19.992619] psmouse serio4: synaptics: Touchpad model: 1, fw: 6.2, id: 0x92a0b1, caps: 0xa0471b/0x200000/0x0
    [   20.009379] [drm] radeon: ib pool ready.
    [   20.009472] [drm] radeon: 3 quad pipes, 1 z pipes initialized.
    [   20.015730] [drm] PCIE GART of 512M enabled (table at 0x0000000034880000).
    [   20.019226] radeon 0000:01:05.0: WB enabled
    [   20.019232] [drm] fence driver on ring 0 use gpu addr 0x60000000 and cpu addr 0xf7102000
    [   20.020302] [drm] Loading R300 Microcode
    [   20.058147] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio4/input/input10
    [   20.093414] platform radeon_cp.0: firmware: agent aborted loading radeon/R300_cp.bin (not found?)
    [   20.093558] [drm:r100_cp_init] *ERROR* Failed to load firmware!
    [   20.093612] radeon 0000:01:05.0: failed initializing CP (-2).
    [   20.093662] radeon 0000:01:05.0: Disabling GPU acceleration
    So the radeon module is there. The GART lines are logged the very same way with the same sizes, and the log even names the missing firmware (microcode) file.
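Since the log names the missing file, it can be picked out automatically. This is a minimal sketch, assuming the message wording of this 3.2 kernel ("firmware: agent aborted loading ..."); other kernel versions phrase the failure differently, so the regex is an assumption, and `missing_firmware` is a made-up helper name.

```python
import re

# Wording taken from the 3.2-era log quoted above; treat it as an
# assumption for any other kernel version.
FIRMWARE_RE = re.compile(r"firmware: agent aborted loading (\S+)")

def missing_firmware(dmesg_text):
    """Return the firmware paths that dmesg reports as failed to load."""
    return FIRMWARE_RE.findall(dmesg_text)

sample = """\
[   20.020302] [drm] Loading R300 Microcode
[   20.093414] platform radeon_cp.0: firmware: agent aborted loading radeon/R300_cp.bin (not found?)
[   20.093558] [drm:r100_cp_init] *ERROR* Failed to load firmware!
"""

print(missing_firmware(sample))
```

The same function can be fed the full text of a saved dmesg file instead of the short sample.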

    What is new for me are the [TTM] log lines. I have no idea what they are supposed to be, but I doubt they help, because I saw them on the Ubuntu 16.04 live CD too and the performance was not better there.

    If I search for "agp" this is the only place for it in dmesg:

    Code:
    [    1.172292] Linux agpgart interface v0.103
    To me this indicates that the AGP capability is still not listed. I have no idea if that was originally the source of my problem. It was just a big suspicion of mine, and maybe I debugged the kernel agpgart drivers for a long time while heading in a completely wrong direction about why the performance was bad... I would only know for sure if I still had the original system...

    The original system however surely had a later-than-3.2 kernel!

    Full logs are available here:

    http://ballmerpeak.web.elte.hu/config_486_debian.txt
    http://ballmerpeak.web.elte.hu/config_686_debian.txt
    http://ballmerpeak.web.elte.hu/dmesg_debian.txt
    http://ballmerpeak.web.elte.hu/lspci_debian.txt
    http://ballmerpeak.web.elte.hu/uname_debian.txt

    PS.: Does anyone know a way to measure VRAM throughput? Are there any benchmarks made specifically for that?
    PS.: glxinfo and glxgears are not installed right now, but I would only be able to test software rendering so far anyway... Also, the release is old enough to not have apt sources, so I guess I would need to compile a lot of things myself.

    TL;DR:

    The most relevant information is that sizing the 0x1c register works here and logs a memory area: reg 1c: [mem 0xe0000000-0xffffffff 64bit], while agpgart seems to act the same way even on this old kernel. I cannot measure 3D performance because of missing binaries.

    I think it was worth a try. The system also booted fast and nicely otherwise, and I understand that the R300 bin is missing because it is specific to one device and maybe should not be on a live CD, otherwise images grow unnecessarily big.


    • #52
      I think I should try a Debian 8 (jessie) live image, because it has packages until 2020 and I hope I can install them on the live USB, so maybe I can get a working 3D stack. Or I could even try Ubuntu 14.04 LTS although it is now EOL, because maybe the servers have not been shut down yet, as EOL was not even a month ago... The latter only because they seem to ship a working radeon setup in the live image, even though it is overly specific to be present in a live image.

      Of course I prefer the Debian 8 way because it is probably cleaner, but at least there are alternatives. And of course I am talking about installing these older systems for debugging only.


      • #53
        Originally posted by debianxfce

        Due to free software nature of Debian, firmware files must be installed from the non free repository.
        https://packages.debian.org/jessie/f...e-amd-graphics

        This is not a problem when you install Debian to the drive. If the networking works, you can download additional software and drivers.
        I would even understand if it were not included because it is for specific hardware, but it is good that you cleared up what the cause is.

        I had no idea before that even the open source driver stack needs proprietary binary blobs :-(. That is kind of sad :-(.


        • #54
          Debian derivatives are a waste of human resources.
          I think some of them serve a purpose. People like my mom or girlfriend find Ubuntu safe and started using open software because of it. I also like Armbian because it has nice prebuilt images for my Orange Pi machines, which saves time. Maybe once I "grow up" I will just use debian, arch, void, linux from scratch, gentoo, whatever on them and spend the time to set everything up, but we all keep growing all the time. ;-)


          Btw I have found that the USB key in mom's laptop case still had the 16.04 boot image on it, in a version from 2016 without all the recent updates. She kept it as if it were necessary, and in a way I am happy about that ))

          I nearly started to believe that the speed had always been this bad haha, but now I see I was right before: even without my xorg configuration applied, extreme tux racer had a constant 60 FPS in the menu, and during gameplay a minimum of 20 and sometimes 60 FPS! A really big difference from the 10-19 FPS on the latest ubuntu I tried before.

          I have saved all the logs, perf data, kernel configuration and kernel version info to compare with the other measurements.

          Just by looking at the result with the naked eye I could tell the difference is huge: on arch (I am there once again) and other new kernels or setups I have 90-100% CPU usage when playing extreme tux racer, while on that old system I have 15-25% and never more. Also, looking at the perf output, there is not a single area where the CPU spends most of its time. Sadly the system did not have a kernel address mapping file, so the perf output does not show function names properly, but it is enough to look at the overall radical drop in CPU usage and that the usage is now quite evenly spread and mostly userspace work from the etr binary.

          TL;DR: I now have a system to compare against (Linux ubuntu 4.4.0-21-generic), but I will need to give the pendrive back at some point ;-)


          • #55
            I have saved the logs here:

            http://ballmerpeak.web.elte.hu/glxinfo_16_04_old.txt
            http://ballmerpeak.web.elte.hu/lspci_16_04_old.txt
            http://ballmerpeak.web.elte.hu/dmesg_16_04_old.txt
            http://ballmerpeak.web.elte.hu/config_16_04_old.txt
            http://ballmerpeak.web.elte.hu/perf_..._16_04_old.txt
            http://ballmerpeak.web.elte.hu/perf_16_04_old.data
            http://ballmerpeak.web.elte.hu/uname_16_04_old.txt

            Some highlights that I can easily spot:

            Code:
            # glxinfo
            Extended renderer info (GLX_MESA_query_renderer):
                Vendor: X.Org R300 Project (0x1002)
                Device: ATI RC410 (0x5a62)
                Version: 11.2.0
                Accelerated: yes
                Video memory: 128MB
                Unified memory: no
                Preferred profile: compat (0x2)
                Max core profile version: 0.0
                Max compat profile version: 2.1
                Max GLES1 profile version: 1.1
                Max GLES[23] profile version: 2.0
            OpenGL vendor string: X.Org R300 Project
            OpenGL renderer string: Gallium 0.4 on ATI RC410
            OpenGL version string: 2.1 Mesa 11.2.0
            OpenGL shading language version string: 1.20
            Code:
            [    0.068199] pci 0000:00:00.0: [Firmware Bug]: reg 0x1c: invalid BAR (can't size)
            ^^ This dmesg has this line too, and agp only prints what it always prints. Maybe it is not even an AGP problem?!


            Code:
            #
            # Timers subsystem
            #
            CONFIG_TICK_ONESHOT=y
            CONFIG_NO_HZ_COMMON=y
            # CONFIG_HZ_PERIODIC is not set
            CONFIG_NO_HZ_IDLE=y
            CONFIG_NO_HZ=y
            CONFIG_HIGH_RES_TIMERS=y
            ...
            # CONFIG_HZ_100 is not set
            CONFIG_HZ_250=y
            # CONFIG_HZ_300 is not set
            # CONFIG_HZ_1000 is not set
            CONFIG_HZ=250
            I have no idea what it means when neither full preemption nor voluntary preemption is mentioned (no preemption at all?), but the frequency is quite low. Btw the frequency was the same on the other 16.04 and the Debian 7 system too. I was not paying attention to the "voluntaryness" there.
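The preemption model is selected by the CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT options, which live outside the "Timers subsystem" section quoted above, so their absence from this excerpt says nothing either way. A minimal sketch for pulling the tick and preemption options out of a saved kernel config (the helper name `config_options` and the prefix list are made up for illustration):

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")

def config_options(config_text,
                   prefixes=("CONFIG_HZ", "CONFIG_NO_HZ", "CONFIG_PREEMPT")):
    """Map matching option names to their value ('n' when 'is not set')."""
    options = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        set_match = SET_RE.match(line)
        unset_match = UNSET_RE.match(line)
        if set_match:
            name, value = set_match.groups()
        elif unset_match:
            name, value = unset_match.group(1), "n"
        else:
            continue
        if name.startswith(prefixes):
            options[name] = value
    return options

# Lines copied from the excerpt above.
sample = """\
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
"""

options = config_options(sample)
print(options["CONFIG_HZ"])
print([name for name in options if name.startswith("CONFIG_PREEMPT")])
```

Run on the excerpt, the second list is empty, which only shows that the excerpt does not cover the preemption options. Running it on the full config file would answer the question.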

            And as I told you all, the most visible thing is that CPU usage is never above 25% when running extreme tux racer now, so on the later systems the CPU does a crazy amount of extra work despite hardware rendering.

            And finally I see "Gallium 0.4 on ATI RC410" - so at least the name of the topic still means something xD
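For the question in the thread title, the renderer string can be checked from saved glxinfo output. This is a rough sketch only: it assumes a Mesa version that still puts "Gallium" into the renderer string (as the 11.2.0 output above does), and the list of software renderer names is an assumption, not a complete one. `renderer_info` is a made-up helper name.

```python
# Names that usually indicate CPU rendering; an assumed, incomplete list.
SOFTWARE_HINTS = ("llvmpipe", "softpipe", "software rasterizer")

def renderer_info(glxinfo_text):
    """Extract renderer string and acceleration flag from glxinfo output."""
    info = {"renderer": "", "accelerated": None}
    for raw in glxinfo_text.splitlines():
        line = raw.strip()
        if line.startswith("OpenGL renderer string:"):
            info["renderer"] = line.split(":", 1)[1].strip()
        elif line.startswith("Accelerated:"):
            info["accelerated"] = line.split(":", 1)[1].strip() == "yes"
    renderer = info["renderer"]
    info["gallium"] = "Gallium" in renderer
    info["software"] = any(hint in renderer.lower() for hint in SOFTWARE_HINTS)
    return info

# Lines copied from the glxinfo output above.
sample = """\
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org R300 Project (0x1002)
    Accelerated: yes
OpenGL vendor string: X.Org R300 Project
OpenGL renderer string: Gallium 0.4 on ATI RC410
"""

print(renderer_info(sample))
```

A missing "Gallium" in the string does not prove a non-gallium driver on newer Mesa, so the `gallium` flag is only meaningful for output in the old format.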


            • #56
              I will list some of the dmesg differences I have spotted in vimdiff (before-after: - and +):

              Code:
              - Initializing cgroup subsys .... (many messages scattered in dmesg)
              ...
              - x86/fpu: Legacy x87 FPU detected.
              - x86/fpu: Using 'lazy' FPU context switches.
              +x86/fpu: x87 FPU will use FXSAVE
              ...
               -original variable MTRRs
               -reg 0, base: 0GB, range: 1GB, type WB
               -reg 1, base: 1GB, range: 512MB, type WB
               -reg 2, base: 1408MB, range: 128MB, type UC
              ...
              - New variable MTRRs
              - reg 0, base: 0GB, range: 1GB, type WB
              - reg 1, base: 1GB, range: 256MB, type WB
              - reg 2, base: 1280MB, range: 128MB, type WB
              - found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [c00ff780]
              (both dmesg have MTRR entries, but the later ones miss these)
              ...
              + tsc: Fast TSC calibration using PIT
               BRK [0x18fbe000, 0x18fbefff] PGTABLE
               Zone ranges:
                 DMA      [mem 0x0000000000001000-0x0000000000ffffff]
                 Normal   [mem 0x0000000001000000-0x00000000377fdfff]
                 HighMem  [mem 0x00000000377fe000-0x0000000057fcffff]
               Movable zone start for each node
               Early memory node ranges
                 node   0: [mem 0x0000000000001000-0x000000000009efff]
                 node   0: [mem 0x0000000000100000-0x0000000057fcffff]
              + Reserved but unavailable: 32914 pages
              (tsc calibration fails with the old kernel, but later succeeds)
              ...
              -RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
              +RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
              ...
              +NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
              ...
              +[    0.005996] HugeTLB registered 4.00 MiB page size, pre-allocated 0 pages
              ...
              - pci_bus 0000:03: busn_res: can not insert [bus 03-ff] under [bus 02-03] (conflicts with (null) [bus 02-03])
              - pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 06
              - pci_bus 0000:03: busn_res: can not insert [bus 03-06] under [bus 02-03] (conflicts with (null) [bus 02-03])
              - pci_bus 0000:03: [bus 03-06] partially hidden behind transparent bridge 0000:02 [bus 02-03]
              +  pci_bus 0000:03: busn_res: [bus 03] end can not be updated to 06
              (A PCI-PCI bus bridge in the new version tells it can only reach to bus3 instead of bus6 as earlier)
              (Looking at the lspci -t output this is far from the vga and the host bridge and is a Cardbus bridge which is empty anyways)
              ...
              +EDAC MC: Ver: 3.0.0
              (I have no idea what is this but the old ones didn't have it)
              ...
               [drm] radeon kernel modesetting enabled.
              - [drm] initializing kernel modesetting (RS400 0x1002:0x5A62 0x1043:0x1392).
              + [drm] initializing kernel modesetting (RS400 0x1002:0x5A62 0x1043:0x1392 0x00).
              - [drm] register mmio base: 0xFE1F0000
              - [drm] register mmio size: 65536
               [drm] Generation 2 PCI interface, using max accessible memory
               radeon 0000:01:05.0: VRAM: 128M 0x0000000058000000 - 0x000000005FFFFFFF (128M used)
               radeon 0000:01:05.0: GTT: 512M 0x0000000060000000 - 0x000000007FFFFFFF
               [drm] Detected VRAM RAM=128M, BAR=256M
               [drm] RAM width 128bits DDR
              (no new kernels talk about any kind of mmio for drm in the whole dmesg output!)
              ...
               fbcon: radeondrmfb (fb0) is primary device
              (both say this - look at the lspci diff to see why I pay attention)
              ...
              -pci_bus 0000:02: Raising subordinate bus# of parent bus (#02) from #03 to #06
              ...
              - Uhhuh. NMI received for unknown reason 2d on CPU 0.
              - Do you have a strange power saving mode enabled?
              - Dazed and confused, but trying to continue
              ...
              I have spent a whole lot of time figuring out what "subordinate bus" numbers are and how bridges work, just to find out that this is likely not causing any slowdown. There is one bridge that could reach up to PCI bus #6 on the old systems and now reaches only #3. This sounded like a hole in the PCI bus hierarchy at that bridge on the new system, but it is for CardBus (seen in lspci -v) and nothing is below it until I plug in a CardBus card, so I guess this does not (should not?) affect anything. It is still a visible change, as the newer PCI kernel code seems to handle this CardBus bridge differently, and I lost a whole lot of time tracking down what it is.

              This mmio thing is visible though:
              Code:
              - [drm] register mmio base: 0xFE1F0000
              - [drm] register mmio size: 65536
              This only exists in the old dmesg log where the 3D performance is still good and it is a log that belongs to the 3D card/driver.
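Differences like this are easier to spot when the timestamps are stripped before diffing, since otherwise nearly every line differs. A minimal sketch using Python's difflib, as an alternative to vimdiff; the function names are made up, and the timestamps in the "new" sample are placeholders, not taken from a real log.

```python
import difflib
import re

# Matches the leading "[   19.861009] " style timestamp of a dmesg line.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?")

def strip_timestamps(dmesg_text):
    """Return the log lines with the leading timestamp removed."""
    return [TIMESTAMP_RE.sub("", line) for line in dmesg_text.splitlines()]

def dmesg_diff(old_text, new_text):
    """Lines only in the old log ('- ') or only in the new log ('+ ')."""
    old = strip_timestamps(old_text)
    new = strip_timestamps(new_text)
    return [line for line in difflib.ndiff(old, new)
            if line.startswith(("- ", "+ "))]

old_log = """\
[   19.861009] [drm] radeon kernel modesetting enabled.
[   19.861678] [drm] register mmio base: 0xFE1F0000
[   19.861681] [drm] register mmio size: 65536
[   19.861867] [drm] Generation 2 PCI interface, using max accessible memory
"""

# Placeholder timestamps; message text as described in the post above.
new_log = """\
[   21.000001] [drm] radeon kernel modesetting enabled.
[   21.000002] [drm] Generation 2 PCI interface, using max accessible memory
"""

for line in dmesg_diff(old_log, new_log):
    print(line)
```

Lines that moved around between kernels still show up as a removal plus an addition, so the output needs the same manual reading as the vimdiff view.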


              • #57
                This might affect me too:

                Code:
                - original variable MTRRs
                ...
                - reg 2, base: 1408MB, range: 128MB, type UC
                ...
                - new variable MTRRs
                ...
                - reg 2, base: 1280MB, range: 128MB, type WB
                ...
                This really looks like the video RAM in the original variable MTRRs list, because I have 1512 MB of system RAM and 128 MB of it is used by the graphics card as VRAM. 1512-128~1408, and I guess it is only not exact because it is calculated in MB with 1024x multipliers. I have no idea what type UC means, but type WB I can understand as "write back" maybe? Then it must be some kind of caching setting, and maybe it can be set to a better strategy or something like that.

                I have no idea why the base address changes in the process, however. That is weird to me. These lines are present on the system that has good 3D performance and not present on the ones with bad performance...

                EDIT: "Heureka!" If WB means write-back then UC is uncached!
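The guess that the UC entry is the video RAM can be checked against the range radeon logs in dmesg (VRAM: 128M 0x0000000058000000 - 0x000000005FFFFFFF). A minimal sketch, assuming the MB values printed for the MTRRs are 1024-based; `mtrr_range` is a made-up helper name.

```python
MIB = 1024 * 1024

def mtrr_range(base_mib, size_mib):
    """Inclusive (start, end) byte addresses of an MTRR given in MiB."""
    start = base_mib * MIB
    return start, start + size_mib * MIB - 1

# Original MTRR list: reg 2, base: 1408MB, range: 128MB, type UC
uc_start, uc_end = mtrr_range(1408, 128)

# From dmesg: VRAM: 128M 0x0000000058000000 - 0x000000005FFFFFFF
vram_start, vram_end = 0x58000000, 0x5FFFFFFF

print(hex(uc_start), hex(uc_end))
print((uc_start, uc_end) == (vram_start, vram_end))

# New MTRR list: reg 2, base: 1280MB, range: 128MB, type WB
wb_start, wb_end = mtrr_range(1280, 128)
print(hex(wb_start), hex(wb_end))
```

The original UC entry covers exactly the VRAM range. The new WB entry at 1280 MB ends right below it, so in the new list the VRAM range is no longer covered by any of the three entries shown. How that range ends up being cached is not visible from these lines alone.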
                Last edited by prenex; 05-23-2019, 03:53 PM.


                • #58
                  While vimdiffing lspci I have found:
                  • PCI CardBus handling changes (#3->#6)
                  • The PCI bridge that connects the GPU to the PCI bus has shpchp (the PCI hotplug driver) on it on the systems with good performance, but not on the bad ones (should not count).
                  • I see two kernel modules listed for the gpu: "Kernel modules: radeonfb, radeon" (alongside: Kernel driver in use: radeon)
                  Not related to diffing the two systems, but this is my pci bus layout:

                  Code:
                  $ sudo lspci -t -H 1
                  -[0000:00]-+-00.0
                             +-01.0-[01]----05.0
                             +-13.0
                             +-13.1
                             +-13.2
                             +-14.0
                             +-14.1
                             +-14.2
                             +-14.3
                             \-14.4-[02-03]--+-00.0
                                             +-01.0-[03]--
                                             +-01.1
                                             \-01.2
                  I have marked the GPU (05.0) and the host bridge (00.0). The GPU is on a separate bus that connects via 01.0, and I guess through that bridge it can access the host bridge... always good to learn new things...


                  • #59
                    Originally posted by debianxfce
                    However,software development needs a better machine than a 15 year old single core laptop.
                    For userland software it is still pretty good. I see that compiling kernels all day long as a main goal would make me change my opinion though haha

                    Btw I have found that arch32 has a package archive starting with 2017, so I can easily go back to earlier kernels and packages on arch too (it is good that this is easy with debian too, I see the value of it now). Now I went back to 4.4.39-1, which is close enough to the 4.4.0-21 that I see on the fast-graphics Ubuntu. There is no change at all, so chances are that the issue is not in the kernel source tree but somewhere else.

                    I will try going back to older mesa and xorg, starting with an old mesa and the things related to it.

                    PS.: Actually, for a long time there was an issue on my machine where wifi only worked if I manually started and stopped it many times after boot, so I had to script it. I remember this issue went away after a while, and I thought maybe the hardware got into better shape or it was a contact or heating problem, but now I see it was actually a kernel fix haha. Going back to this kernel I have the issue again, and it is related to USB handling. This is a side-track, but it is good to know the hardware was always all right and there was just a window of time when this bug hit me. Also, since I remember this issue had already gone away on my original system, there is a really high chance that the 3D slowdown is not caused by the kernel sources, because it means a later-than-4.4.39-1 kernel was still working fast for me ;-)


                    • #60
                      Pffff... downgrading mesa is harder than downgrading the kernel...

                      My current mesa is required by some "libglvnd" package, and that is the one that provides "libgl" for applications, so it is a bit messy because the early versions I want to test do not go through this layer, which is unknown to me...

                      Also, the arch32 archives went down right when I was looking for alternatives that provide libgl :-)
