AMD's Raven Ridge Botchy Linux Support Appears Worse With Some Motherboards/BIOS
It's been three days since I last had any new Raven Ridge Linux tests to share, not because I ran out of benchmarking ideas for these interesting Zen+Vega chips or took a break, but because I've simply been struggling to get the systems working well and reliably. For my initial testing I used an MSI X370 XPOWER GAMING TITANIUM motherboard since I had it available, but I doubt most people would pair such a $260~270 USD motherboard with a $99~169 APU. The motherboards I picked up for the longer-term Raven Ridge benchmarking systems were an MSI B350M GAMING PRO and an ASUS PRIME B350M-E. They use the B350 chipset, are micro-ATX, and cost less than $100 USD -- certainly more appropriate for pairing with the Ryzen 3 2200G that retails for $99 or the Ryzen 5 2400G that goes for $169, rather than the X370 motherboards that can cost around $300 USD.
With both of these motherboards I first had to flash the BIOS to the latest version using a Ryzen 7 1800X CPU in order to enable Raven Ridge APU support. After doing that BIOS update with that CPU and a discrete graphics card, I tried the two Raven Ridge APUs in each of these boards. When installing Ubuntu 17.10 and Fedora 27 with their pre-4.15 kernels, the displays worked, albeit without hardware acceleration, since the Raven Ridge AMDGPU kernel driver support requires Linux 4.15+ due to its dependence on the DC display code stack.
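As a rough illustration of the Linux 4.15+ requirement mentioned above, the following Python sketch parses a kernel release string and compares it against that minimum. The function names are hypothetical, and the check only looks at the version number: it assumes a mainline-style release string and knows nothing about distribution backports.

```python
import re

# Minimum kernel release cited above for Raven Ridge AMDGPU (DC) support.
MIN_KERNEL = (4, 15)


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '4.15.4-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_minimum(release, minimum=MIN_KERNEL):
    """True if the release's major.minor is at least the given minimum."""
    return parse_kernel_release(release) >= minimum
```

On a running system the release string could be obtained with `platform.release()` from the standard library and passed to `kernel_meets_minimum`.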
On both Ubuntu 17.10 and Fedora 27, and on both systems, I then ran all available system updates, fetched the latest AMDGPU firmware files from linux-firmware.git, tried the latest Mesa patches, and moved to 4.15+ kernels. Sadly, with both systems it's often difficult to get a working display on Linux 4.15+ short of booting with "nomodeset" to disable the AMDGPU driver. Occasionally I get lucky and the mode-setting goes alright, but once the tests begin, hangs are still very common. The success rate does also seem to be better with a cold boot than with a restart after a problem.
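When juggling many boots with and without "nomodeset", it can help to confirm which mode the running system was actually started in. A minimal sketch, assuming a Linux system that exposes the boot parameters at /proc/cmdline (the helper names are hypothetical):

```python
def has_nomodeset(cmdline):
    """True if 'nomodeset' appears as its own kernel command line parameter."""
    return "nomodeset" in cmdline.split()


def read_kernel_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

Calling `has_nomodeset(read_kernel_cmdline())` then reports whether kernel mode-setting was disabled for the current boot.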
I've tried out the Linux 4.15.4 stable kernel, Linux 4.16 Git, and Alex Deucher's 4.17 work-in-progress AMDGPU development branch. With all of those different kernels, the system often gets stuck at boot with varying behavior: either no display at all, though I can SSH in only to find nothing helpful in the dmesg output, or a hard hang with no ability to SSH into the system. When putting the APUs back in the MSI X370 XPOWER GAMING TITANIUM board, they are back to working, albeit with the spotty support. But even when I do get lucky and reach the desktop, hangs are still an issue when running common OpenGL/Vulkan Linux games.
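For the case where the display is dead but SSH still works, a small filter can narrow the kernel log down to graphics-related lines before reading through it. This is only a sketch: it assumes that matching "amdgpu" or the "[drm]" prefix is a useful first cut, the helper names are hypothetical, and reading dmesg may require root on some systems.

```python
import re
import subprocess

# Lines mentioning the amdgpu driver or carrying the kernel's "[drm]" prefix.
GPU_PATTERN = re.compile(r"amdgpu|\[drm\]", re.IGNORECASE)


def gpu_related(lines):
    """Keep only the log lines that look graphics-related."""
    return [line for line in lines if GPU_PATTERN.search(line)]


def read_dmesg():
    """Return the kernel ring buffer as a list of lines (may need root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Running `gpu_related(read_dmesg())` over SSH would then print only the driver-related messages, if there are any at all.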
I have tried different RAM modules, enabling/disabling the memory profiles from the BIOS, adjusting the amount of RAM exposed to the graphics, different DVI / HDMI / DP outputs and monitors, other kernel command line options, varying versions of Mesa, and changing other basic tunables to try to get the Vega graphics working with these two APUs and two B350 motherboards on their latest firmware. Unfortunately, rolling back the BIOS isn't an option since only the latest BIOS releases support the Raven Ridge APUs. When putting in a discrete GPU, the systems boot up fine with the AMDGPU driver. Trying AMDGPU-PRO 17.50 on Ubuntu 16.04 LTS didn't work out either, nor did an Antergos 18.2 live image.
Whenever new AMDGPU DC code drops have appeared in recent months, they have often mentioned "Raven" fixes, but I really didn't expect the support to be this bad at launch. Open-source AMD developers have been working on the Raven Ridge support since last year. Thus I'm thinking some motherboard vendors changed some behavior with their BIOS updates that adversely affects the Linux driver support, and/or the pre-production bring-up of Raven with the Linux driver stack was a lot different from what ended up shipping with the production hardware/BIOS... Perhaps this is why AMD hadn't offered any review samples this time around for our Linux testing or even briefed us on the launch.
It's disappointing and very frustrating after buying the hardware in order to deliver this Linux testing. For now it's back to toying around with this Raven Ridge hardware to see if I get lucky and can put up with enough hangs or failed boots to get some more benchmarks completed today.