With just one AVX-512 vector unit (or two 256-bit units fused to act as one 512-bit unit), it's going to be as power-optimized as it gets. Given the process node (14nm), that is...