Navi 10/14 GPUs On Linux Should Be More Reliable With Blanket ATS Disabling
A fix recently merged into the Linux 5.17 Git tree, and now being back-ported to the stable kernel series, disables PCI ATS on all Navi 10 and Navi 14 GPUs because of problematic vBIOS configurations.
Until now, the Linux kernel selectively blocked PCI ATS (Address Translation Service) support for select Navi PCI IDs and revisions because of problematic ATS behavior. The recent Linux 5.17 Git change, which is also being carried to prior kernels, outright disables the feature for all Navi 10 and 14 GPUs. PCI ATS can help performance in virtualized environments, but because the functionality goes unused on Windows it was not always validated, which has led to buggy behavior on Navi 10 and 14 GPUs with harvested (partially defective) silicon.
For those with Navi 10/14 GPUs on problematic systems/vBIOSes, having PCI ATS enabled could lead to system crashes requiring a hard reset. Rather than trying to chase down all combinations, it's easier to just outright disable the feature for Navi 10 and 14 GPUs.
The patch in v5.17 and queued for back-porting explains:
There are enough vbios escapes without the proper workaround that some users still hit this. [Microsoft] never productized ATS on [Windows] so OEM platforms that were [Windows] only didn't always validate ATS.
The advantages of ATS are not worth it compared to the potential instabilities on harvested boards. Just disable ATS on all navi10 and 14 boards.
Long story short, if facing stability problems with Navi 10 or Navi 14 GPUs under Linux, it may be worthwhile upgrading to a patched kernel release (there is also the "pci=noats" kernel option).
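For anyone relying on the "pci=noats" workaround instead of a patched kernel, it can be useful to confirm the option actually reached the running kernel. The following is a minimal sketch, not an official tool: the function name `ats_disabled_on_cmdline` is made up for illustration, and it assumes a Linux system that exposes the boot command line at /proc/cmdline. It only checks whether the option was passed; it does not verify the ATS state of any particular device.

```python
from pathlib import Path


def ats_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if the kernel command line includes the pci=noats option.

    The pci= parameter takes a comma-separated list of options, so
    "pci=nomsi,noats" counts as well as a plain "pci=noats".
    """
    for token in cmdline.split():
        if token.startswith("pci="):
            options = token[len("pci="):].split(",")
            if "noats" in options:
                return True
    return False


if __name__ == "__main__":
    # Assumption: /proc/cmdline is present (true on typical Linux systems).
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        if ats_disabled_on_cmdline(proc_cmdline.read_text()):
            print("pci=noats is set on the kernel command line")
        else:
            print("pci=noats is not set on the kernel command line")
    else:
        print("/proc/cmdline not found; cannot check")
```

On a patched kernel this option should no longer be needed for Navi 10/14 GPUs, since ATS is disabled for those boards regardless.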