The problem I saw when I messed with ROCm in the past (meaning not just used it; I was having a go at compiling in support for the Picasso GPU on my notebook) was that the instructions I found involved compiling 43 different items, and you had to list which GPUs you intended to support all the way from the bottom of the stack to the top. Even building a ROCm-supporting version of TensorFlow involved setting an environment variable with the list of GPUs it would support. There was no clean separation like with CUDA, where the client runs some CUDA library that emits PTX bytecode and the Nvidia driver recompiles that PTX for the particular card it's running on (to the extent that you can usually run a driver supporting a newer CUDA version with binaries built for an older CUDA version and have everything work). With ROCm it seemed there were code paths linked in all the way up the stack to generate card-specific bytecode for every card the software was intended to support.
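To make the contrast concrete, here's roughly what the CUDA side looks like at runtime (a minimal sketch using the driver API; the embedded PTX string and the "noop" kernel name are made up for illustration, and error handling is mostly omitted). The point is that the binary ships virtual-ISA PTX, and cuModuleLoadData is where the driver JIT-compiles it for whichever card is actually installed:

    /* Build with something like: gcc jit.c -I/usr/local/cuda/include -lcuda */
    #include <stdio.h>
    #include <cuda.h>

    /* Illustrative PTX for a do-nothing kernel; in a real app this is
       what nvcc emits when you target a virtual arch like compute_50. */
    static const char *ptx =
        ".version 6.0\n"
        ".target sm_50\n"
        ".address_size 64\n"
        ".visible .entry noop()\n"
        "{\n"
        "    ret;\n"
        "}\n";

    int main(void) {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        /* The driver JIT-compiles the PTX here for the installed GPU --
           no card-specific machine code was baked into this binary. */
        if (cuModuleLoadData(&mod, ptx) != CUDA_SUCCESS) {
            fprintf(stderr, "PTX JIT failed\n");
            return 1;
        }
        cuModuleGetFunction(&fn, mod, "noop");
        cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, NULL);
        cuCtxSynchronize();
        printf("ran on whatever GPU happens to be present\n");
        return 0;
    }

The ROCm builds I fought with had no equivalent late-binding step: the target GPU list had to be known at build time, at every layer.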
I did get support for Picasso built, and it "worked," but not really -- the compute itself ran correctly, but the card would not simultaneously compute and act as a display GPU, so the GUI (including the mouse cursor) would lock up completely while a job was running. And if a job took more than about 5 or 10 seconds, the video driver would decide the GPU had hung and reset it.