There's https://github.com/vortexgpgpu/vortex for the GPU part, although right now it's mainly focused on OpenCL; a Vulkan-compatible implementation is still far away.
As for the vector extension, you are wrong. There is a vector extension in the RISC-V instruction set. Unlike x86 and ARMv8, the vector size is dynamic (there is no 128-bit version, then a 256-bit one, then a 512-bit one, and so on). The CPU implementation dispatches as many elements per vectorized instruction as it is able to (so yes, there can be 23 simultaneous operations per iteration in RISC-V if the CPU supports that number). This presents multiple advantages:
1. You don't need to rewrite your SIMD code when a new version of the CPU/architecture is out
2. The same code will always perform as fast as possible on RISC-V
The cons are less obvious:
1. You don't have as many vectorized instructions as on an x86 CPU, so your pipeline might get stuck on, for example, a complex mask operation that has no vectorized equivalent.
2. You can't port your NEON, AVX, or SSE2 assembly by just finding a 1:1 match for each instruction. You have to rethink the algorithm à la OpenMP.
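To make the "dynamic vector size" point concrete, here is a rough sketch in plain C that emulates the loop shape RISC-V vector code usually takes. It is not real vector code: `hw_max_elems`, `set_vl` and `vla_add` are names I made up for illustration, and `set_vl` is only a simplified stand-in for what the `vsetvli` instruction does (tell the loop how many elements the hardware will handle this time around).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how many elements the "hardware" can process per vector
   instruction. On a real CPU this is fixed by the implementation, not
   chosen by the program. */
static size_t hw_max_elems = 23;

/* Simplified stand-in for vsetvli: given the number of elements still to
   process, return how many will be handled in this iteration. */
static size_t set_vl(size_t remaining) {
    return remaining < hw_max_elems ? remaining : hw_max_elems;
}

/* Computes a[i] += b[i] for n elements without ever hard-coding a vector
   width. Returns the number of loop iterations that were needed. */
static size_t vla_add(float *a, const float *b, size_t n) {
    size_t iterations = 0;
    while (n > 0) {
        size_t vl = set_vl(n);
        assert(vl > 0); /* otherwise the loop would never finish */
        /* This inner loop stands in for one vector load/add/store group. */
        for (size_t i = 0; i < vl; i++) {
            a[i] += b[i];
        }
        a += vl;
        b += vl;
        n -= vl;
        iterations++;
    }
    return iterations;
}
```

With 100 elements and `hw_max_elems` set to 23, the loop runs 5 times (four full passes of 23 and a tail of 8); set it to 4 and the very same code runs 25 times. Nothing in `vla_add` changes, which is the whole point of advantage 1 above, and also why a fixed-width NEON or AVX routine does not map onto it instruction by instruction.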