Back in January, CUDA GPU options were added to the PyTorch and TensorFlow benchmarks. I'm getting strange results, so I'd like to know whether these CUDA options are working as expected or whether I'm missing something about the test requirements.

With PyTorch, it doesn't seem to matter which GPU I use on a given platform: the results are nearly identical (see table below). GPU utilization is low (below 50%), but there is clearly GPU activity during the tests.
For TensorFlow, there is no GPU utilization whatsoever, and on average the test runs at about half the performance of the CPU TensorFlow test, i.e. extremely slowly.
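
For anyone trying to reproduce the utilization figures, here is a minimal sketch of how they can be sampled from a second terminal while a benchmark runs. It assumes `nvidia-smi` is on the PATH; the helper names `parse_gpu_stats` and `sample_gpu_stats` are mine for illustration and are not part of the benchmark suite.

```python
import shutil
import subprocess


def _to_int(field):
    """Convert a CSV field to int; return None for values such as '[N/A]'."""
    try:
        return int(field.strip())
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Parse 'utilization, memory' CSV lines into (util %, MiB) tuples, one per GPU."""
    stats = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        util, mem = line.split(",")[:2]
        stats.append((_to_int(util), _to_int(mem)))
    return stats


def sample_gpu_stats():
    """Take one sample per GPU via nvidia-smi; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    # One sample; call repeatedly (e.g. once per second) during a benchmark run.
    print(sample_gpu_stats())
```

Calling `sample_gpu_stats()` in a loop during a run should show whether utilization and memory use ever rise above their idle values.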

Are these known issues? Is additional configuration required to get these benchmarks to utilize the GPU hardware properly?
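
To help narrow this down, here is a minimal sketch of a check for whether each framework can see a CUDA device at all. It has to be run with the same Python environment the benchmarks use, which is an assumption on my part about how the tests are installed; the function name `gpu_visibility` is hypothetical.

```python
import importlib


def gpu_visibility():
    """Report whether PyTorch and TensorFlow can see a CUDA device here."""
    report = {}

    try:
        torch = importlib.import_module("torch")
        report["pytorch"] = {
            "version": torch.__version__,
            # None means a CPU-only PyTorch build.
            "built_with_cuda": torch.version.cuda,
            "cuda_available": torch.cuda.is_available(),
        }
    except ImportError:
        report["pytorch"] = None  # not installed in this environment

    try:
        tf = importlib.import_module("tensorflow")
        report["tensorflow"] = {
            "version": tf.__version__,
            "built_with_cuda": tf.test.is_built_with_cuda(),
            # An empty list means TensorFlow will fall back to the CPU.
            "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
        }
    except ImportError:
        report["tensorflow"] = None  # not installed in this environment

    return report


if __name__ == "__main__":
    for framework, info in gpu_visibility().items():
        print(framework, info)
```

An empty `gpus` list for TensorFlow would be consistent with the zero GPU utilization described above, though it would not by itself explain why the result is slower than the CPU test.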

| Test | 4090 | 4070 SUPER |
|---|---|---|
| pytorch: NVIDIA CUDA GPU - 1 - ResNet-50 | 217.04 | 216.84 |
| pytorch: NVIDIA CUDA GPU - 1 - ResNet-152 | 76.24 | 76.25 |
| pytorch: NVIDIA CUDA GPU - 16 - ResNet-50 | 214.77 | 215.24 |
| pytorch: NVIDIA CUDA GPU - 32 - ResNet-50 | 214.86 | 215.94 |
| pytorch: NVIDIA CUDA GPU - 64 - ResNet-50 | 215.78 | 214.88 |
| pytorch: NVIDIA CUDA GPU - 16 - ResNet-152 | 76.29 | 76.92 |
| pytorch: NVIDIA CUDA GPU - 256 - ResNet-50 | 215.99 | 214.65 |
| pytorch: NVIDIA CUDA GPU - 32 - ResNet-152 | 75.98 | 76.63 |
| pytorch: NVIDIA CUDA GPU - 512 - ResNet-50 | 216.33 | 216.65 |
| pytorch: NVIDIA CUDA GPU - 64 - ResNet-152 | 76.11 | 76.62 |
| pytorch: NVIDIA CUDA GPU - 256 - ResNet-152 | 75.75 | 76.69 |
| pytorch: NVIDIA CUDA GPU - 512 - ResNet-152 | 76.45 | 76.83 |
| pytorch: NVIDIA CUDA GPU - 1 - Efficientnet_v2_l | 39.68 | 39.96 |
| pytorch: NVIDIA CUDA GPU - 16 - Efficientnet_v2_l | 39.04 | 38.81 |
| pytorch: NVIDIA CUDA GPU - 32 - Efficientnet_v2_l | 38.26 | 38.93 |
| pytorch: NVIDIA CUDA GPU - 64 - Efficientnet_v2_l | 38.78 | 38.88 |
| pytorch: NVIDIA CUDA GPU - 256 - Efficientnet_v2_l | 38.42 | 39.05 |
| pytorch: NVIDIA CUDA GPU - 512 - Efficientnet_v2_l | 39.09 | 38.73 |

OS: Ubuntu 22.04, Kernel: 6.5.0-21-generic (x86_64), Desktop: GNOME Shell 42.9, Display Server: X Server 1.21.1.4, Display Driver: NVIDIA 535.154.05, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.148, Vulkan: 1.3.242, Compiler: GCC 11.4.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 3840x2160