Questions about the address translation mechanism of KFD and IOMMU

    Hi all,

    I'm currently working on a project to virtualize an HSA-compliant system. In particular, I am trying to let multiple guest OSes share the resources of an HSA system. The implementation is based on KVM and the Kaveri platform. My work is blocked by one problem, and I was hoping some of you could give me advice.

    For this project, I need to enable the IOMMU's two-stage (guest virtual -> guest physical -> machine physical) address translation. I turned on two-stage translation (with the stage 1 and stage 2 page tables set up properly) and ran an HSA program on the guest OS. However, IO_PAGE_FAULTs showed up in the log, and the faulting address is the same as the physical address of the cik_mqd (Memory Queue Descriptor) created in the init_mqd function (in kfd_mqd_manager.c) during the KFD_IOC_CREATE_QUEUE ioctl. After some additional tracing and reading, I found that two hardware components carry out address translation for the GPU: GPUVM and the IOMMU.

    Here are my questions:
    1. Is cik_mqd translated by GPUVM?

    I am guessing so because gart_mqd_addr is assigned to the pm4_packet in the pm_create_map_queue function during the KFD_IOC_CREATE_QUEUE ioctl.

    2. Will GPUVM go through two-stage address translation when the IOMMU is enabled to do two-stage address translation?

    I assume GPUVM translates cik_mqd since gart_mqd_addr is set in pm_create_map_queue, and it ends up translating to the physical address of cik_mqd. With two-stage translation enabled, the physical address of cik_mqd coming out of GPUVM is then taken as the input to the second stage. This causes a translation fault because it is already a physical address. I guess this is why I got the IO_PAGE_FAULT.

    3. Can I set GPUVM to do one-stage address translation while IOMMU is doing two-stage address translation?

    4. Can cik_mqd be translated by IOMMU rather than GPUVM?
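
    To make the guess in question 2 concrete, here is a toy model in C of what I think is happening. It is only a sketch under my own assumptions: the table layout, every name in it (toy_table, toy_translate, two_stage, TOY_FAULT) and all the addresses are made up for illustration, and it does not reflect the real IOMMU or GPUVM page table formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1ULL << TOY_PAGE_SHIFT) - 1)
#define TOY_FAULT      UINT64_MAX   /* hypothetical "page fault" marker */

/* One mapping from an input page frame number to an output page frame number. */
struct map_entry { uint64_t in_pfn, out_pfn; };

/* Toy single-level "page table": just a linear list of page mappings. */
struct toy_table { const struct map_entry *entries; size_t count; };

/* Translate one address through one table, or return TOY_FAULT if unmapped. */
static uint64_t toy_translate(const struct toy_table *t, uint64_t addr)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].in_pfn == (addr >> TOY_PAGE_SHIFT))
            return (t->entries[i].out_pfn << TOY_PAGE_SHIFT) |
                   (addr & TOY_PAGE_MASK);
    }
    return TOY_FAULT;
}

/* Two-stage walk: guest virtual -> guest physical -> machine physical. */
static uint64_t two_stage(const struct toy_table *s1,
                          const struct toy_table *s2, uint64_t gva)
{
    uint64_t gpa = toy_translate(s1, gva);
    return gpa == TOY_FAULT ? TOY_FAULT : toy_translate(s2, gpa);
}

/* Made-up mappings: stage 1 maps GVA page 0x7f000 to GPA page 0x00100,
 * stage 2 maps GPA page 0x00100 to machine page 0x9a000. */
static const struct map_entry stage1_entries[] = { { 0x7f000, 0x00100 } };
static const struct map_entry stage2_entries[] = { { 0x00100, 0x9a000 } };
static const struct toy_table stage1 = { stage1_entries, 1 };
static const struct toy_table stage2 = { stage2_entries, 1 };
```

    In this model, two_stage(&stage1, &stage2, 0x7f000abc) walks both tables and succeeds, while toy_translate(&stage2, 0x9a000abc) returns TOY_FAULT, because an address that is already a machine physical address has no entry in the stage 2 table. That is the situation I suspect the cik_mqd address ends up in.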