Originally posted by coder
It's trivially easy to build an FPGA design that far outstrips a CPU, even a small one. All you need to do is design a pipelined core for MD5, SHA-256, or whatever, and tile it across the FPGA, making dozens of parallel compute engines.
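To make the "pipelined core, tiled N times" idea concrete, here is a minimal cycle-level sketch in Python. It assumes a hypothetical `toy_round` function standing in for one MD5/SHA-256 round (it is not a real hash), one round per pipeline stage, and one new input per engine per clock. It ignores I/O bandwidth and clock-rate differences.

```python
from collections import deque

def toy_round(x: int, r: int) -> int:
    # Hypothetical 32-bit mixing step standing in for one hash round (NOT a real hash).
    x = (x + 0x9E3779B9 + r) & 0xFFFFFFFF
    return ((x << 5) | (x >> 27)) & 0xFFFFFFFF  # 32-bit rotate left by 5

def reference_hash(x: int, depth: int) -> int:
    # Non-pipelined version: apply all rounds to one input before starting the next.
    for r in range(depth):
        x = toy_round(x, r)
    return x

class PipelinedCore:
    """One engine: `depth` stages, one round per stage, a register between stages."""
    def __init__(self, depth: int):
        self.depth = depth
        self.stages = [None] * depth  # pipeline registers

    def clock(self, new_input):
        out = self.stages[-1]  # value that has finished all rounds
        # Shift every in-flight value forward one stage, applying that stage's round.
        for i in range(self.depth - 1, 0, -1):
            prev = self.stages[i - 1]
            self.stages[i] = None if prev is None else toy_round(prev, i)
        self.stages[0] = None if new_input is None else toy_round(new_input, 0)
        return out

def run_tiled(inputs, n_engines: int, depth: int):
    """Tile `n_engines` identical cores; each accepts one new input per cycle."""
    engines = [PipelinedCore(depth) for _ in range(n_engines)]
    pending = deque(inputs)
    results, cycles = [], 0
    while pending or any(s is not None for e in engines for s in e.stages):
        for e in engines:
            out = e.clock(pending.popleft() if pending else None)
            if out is not None:
                results.append(out)
        cycles += 1
    return results, cycles

if __name__ == "__main__":
    data = list(range(1000))
    out, cycles = run_tiled(data, n_engines=24, depth=64)
    assert out == [reference_hash(x, 64) for x in data]
    print(cycles)  # 106
```

In this model, M inputs on N engines with pipeline depth D take ceil(M/N) + D cycles (here 42 + 64 = 106), versus M·D = 64,000 cycles for one non-pipelined core. That roughly N·D multiplier is why a modestly clocked FPGA can beat a much faster-clocked CPU on this kind of work.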
OpenCL is a larger and more complicated target, but not terribly different in principle. Take all the parallel math (e.g. matrix multiplications), break it apart, and tile multipliers and adders across the chip so that as many operations as possible run in parallel and in a pipelined manner.
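The matrix-multiplication case can be sketched the same way. The Python below is a toy cycle-count model, not an FPGA toolflow: it assumes a hypothetical `tile` × `tile` grid of multiply-accumulate (MAC) units, each completing one multiply-add per cycle, and it ignores memory bandwidth and adder-pipeline latency.

```python
import math

def matmul_reference(A, B):
    # Plain triple-loop matrix multiply, used to check the tiled model.
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matmul_mac_array(A, B, tile: int):
    """Model a tile x tile grid of MAC units computing C = A @ B.

    Each MAC unit owns one element of the current output tile. In every
    simulated cycle, all units do one `acc += a * b` in parallel; a tile needs
    k cycles, after which the array moves on to the next output tile.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    cycles = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p in range(k):  # one cycle for the whole MAC grid
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        C[i][j] += A[i][p] * B[p][j]
                cycles += 1
    return C, cycles

if __name__ == "__main__":
    A = [[i + j for j in range(6)] for i in range(8)]       # 8 x 6
    B = [[(i * j) % 5 for j in range(10)] for i in range(6)]  # 6 x 10
    C, cycles = matmul_mac_array(A, B, tile=4)
    assert C == matmul_reference(A, B)
    print(cycles, 8 * 10 * 6)  # 36 480
```

The array finishes in ceil(n/tile)·ceil(m/tile)·k cycles instead of the n·m·k a single MAC unit would need. A real design would typically also pipeline each multiplier and adder so a new operand pair can enter every clock.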
It won't be nearly as fast as a GPU, because the GPU has both more compute units than even the largest FPGA could fit (an FPGA spends extra circuitry on reconfigurability) and faster digital logic (an FPGA's lookup tables, or LUTs, are slower than hard-wired gates). But it will still easily beat the fastest CPU.
For a real-world example, see Microsoft. Rather than build a single fixed piece of hardware like Google's TPU, they decided to invest in FPGAs. The theory was that AI algorithms would evolve and Microsoft could update its designs accordingly, whereas Google's fixed-function TPU might become incompatible with newer algorithms.