NVMe SSDs with buggy firmware "disappearing" with kernel 5.18+


  • NVMe SSDs with buggy firmware "disappearing" with kernel 5.18+

    I recently bought two identical NVMe SSDs (Fanxiang S880) to build a RAID 1. It turns out the SSDs have buggy firmware. According to the NVMe spec, each namespace should report globally unique identifiers (EUI64, NGUID or UUID), but both of my drives report the same ones for nsid 1. These identifiers, AFAIK, are used to populate the /dev/disk/by-id/ folder and have to be globally unique for the mapping to work. As the SSDs I got share the same "globally unique" ID, only one of them can be used on my system at any given time. The other one gets rejected by the kernel because the supposedly unique ID is already in use. This check was introduced in kernel 5.18 to prevent drives with identical IDs from overwriting each other's entries under /dev/disk/. This is what kernel 6.2 spits out with the Fanxiang SSDs in my machine:

    Code:
    [    1.283918] nvme nvme0: pci function 0000:01:00.0
    [    1.284155] nvme nvme1: pci function 0000:14:00.0
    [    1.289018] nvme nvme1: missing or invalid SUBNQN field.
    [    1.289365] nvme nvme0: missing or invalid SUBNQN field.
    [    1.293463] nvme nvme1: allocated 32 MiB host memory buffer.
    [    1.295186] nvme nvme0: allocated 32 MiB host memory buffer.
    [    1.297751] nvme nvme1: 4/0/0 default/read/poll queues
    [    1.298662] nvme nvme0: 4/0/0 default/read/poll queues
    [    1.300332] nvme nvme0: globally duplicate IDs for nsid 1
    [    1.300341] nvme nvme0: VID:DID 1e4b:1602 model:Fanxiang S880 1TB firmware:SN11273
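
    To check a longer dmesg dump for the same symptom, the two relevant lines can be picked out with a small script. This is only a sketch: the function name and the dictionary layout are my own choices, and the regular expressions assume the message format shown in the log above.

```python
import re

# Sample lines copied from the kernel log above.
DMESG = """\
[    1.283918] nvme nvme0: pci function 0000:01:00.0
[    1.284155] nvme nvme1: pci function 0000:14:00.0
[    1.300332] nvme nvme0: globally duplicate IDs for nsid 1
[    1.300341] nvme nvme0: VID:DID 1e4b:1602 model:Fanxiang S880 1TB firmware:SN11273
"""

# Patterns assume the exact wording seen in the log above.
DUP_RE = re.compile(r"nvme (nvme\d+): globally duplicate IDs for nsid (\d+)")
DEV_RE = re.compile(
    r"nvme (nvme\d+): VID:DID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4}) "
    r"model:(.+?) firmware:(\S+)"
)


def find_rejected_namespaces(log_text):
    """Return {controller: info} for controllers with a duplicate-ID message."""
    rejected = {}
    for line in log_text.splitlines():
        dup = DUP_RE.search(line)
        if dup:
            entry = rejected.setdefault(dup.group(1), {"nsids": []})
            entry["nsids"].append(int(dup.group(2)))
            continue
        dev = DEV_RE.search(line)
        # Only keep device info for controllers already flagged as duplicates.
        if dev and dev.group(1) in rejected:
            rejected[dev.group(1)].update(
                vid=dev.group(2),
                did=dev.group(3),
                model=dev.group(4).strip(),
                firmware=dev.group(5),
            )
    return rejected


if __name__ == "__main__":
    for ctrl, info in find_rejected_namespaces(DMESG).items():
        print(ctrl, info)
```

    The VID:DID pair it extracts is the piece of information a bug report about such a drive would need.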
    After googling for this problem, it seems that quite a few cheap NVMe drives have this issue. The workaround seems to be to have a quirk added to the kernel nvme driver.
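
    To illustrate why the second drive disappears and why a quirk helps, here is a toy model of a host-wide uniqueness check. It is not the kernel's code: the class, the method, the `ignore_ids` flag and the identifier value are all made up for illustration, and it assumes (as I understand it) that the quirk makes the driver ignore the identifiers the drive reports.

```python
class NamespaceRegistry:
    """Toy model of a host-wide ID uniqueness check (hypothetical, not kernel code)."""

    def __init__(self):
        # Maps each identifier seen so far to the controller that reported it.
        self._seen = {}

    def register(self, controller, identifier, ignore_ids=False):
        """Return True if the namespace is accepted, False if it is rejected."""
        if ignore_ids or not identifier:
            # Quirk-like behaviour: the reported ID is not trusted, so there
            # is nothing to compare and the namespace is accepted.
            return True
        owner = self._seen.get(identifier)
        if owner is not None and owner != controller:
            # Another controller already claimed this "globally unique" ID.
            return False
        self._seen[identifier] = controller
        return True


if __name__ == "__main__":
    same_id = "0123456789abcdef"  # placeholder value, not taken from my drives
    registry = NamespaceRegistry()
    print(registry.register("nvme1", same_id))                   # first drive
    print(registry.register("nvme0", same_id))                   # second drive, same ID
    print(registry.register("nvme0", same_id, ignore_ids=True))  # with the quirk
```

    In this model the first drive is accepted, the second is rejected, and the second is accepted once its IDs are ignored. The price of ignoring the IDs is that they can no longer be relied on for stable names.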



    phoronix, have you encountered this behavior in NVMe drives before? Do you know whether the manufacturers are aware of this problem? With NVMe-only NAS boxes becoming more mainstream, I suspect that these firmware bugs will start to hit more and more users.

    A quick glance through the bug report brings up affected drives by PNY, Netac and SK Hynix. I've seen the same problem mentioned with drives by Adata, and my Fanxiang drives show it too.

    The current kernel behavior was introduced with this commit: https://git.kernel.org/pub/scm/linux...105b2228467aec
    The quirks were introduced here: https://git.kernel.org/pub/scm/linux...05b2708b63b8cf

    My hope is that Michael might be able to shine a light on this as I'm afraid that only publicity will effect the necessary change. So far it looks as though quite a few manufacturers are content with the situation as it is, where the customer needs to report the drives to the kernel maintainer who then provides a workaround for a product that does not follow the NVMe specification.

    Concerning my own two SSDs I will try to get a refund as I feel the drives are defective. I'd rather not have them added to the quirks list - that would mean that the practice of not adhering to the spec is a-ok and the burden of defective firmware is on the customer and the kernel maintainers.