however the team that worked on larrabee weren't allowed to tell anyone how bad the performance was: it was not until jeff bush replicated that work in nyuzi and made the full design source code public that it was possible to determine *EXACTLY* where the performance was lacking.
i've spent a lot of time talking with jeff (he's really an amazing guy), and he pointed out things such as: if nyuzi / larrabee had had a single instruction for converting 4-wide floating-point ARGB vectors into four 32-bit pixels, that would knock something like... i can't remember exactly... let's say 20% off the time spent per pixel on rendering. then the next highest priority to target would be... X (whatever).
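to give a concrete idea of what such an instruction would replace (this is purely my own illustrative sketch in C, not jeff's code), here is roughly the scalar work a software renderer otherwise has to do for every single pixel; a hypothetical fused instruction would do four pixels' worth of this in one go:

    #include <stdint.h>

    /* pack one 4-wide float ARGB colour (each channel 0.0..1.0) into a
     * single 32-bit pixel. without a dedicated instruction this costs a
     * clamp, a multiply, a float->int convert and a shift/OR per
     * channel: roughly a dozen operations in the rasteriser's innermost
     * loop. a fused instruction would handle four pixels at once. */
    static inline uint32_t pack_argb(const float c[4])
    {
        uint32_t pixel = 0;
        for (int i = 0; i < 4; i++) {
            float v = c[i];
            if (v < 0.0f) v = 0.0f;   /* clamp to [0, 1] */
            if (v > 1.0f) v = 1.0f;
            pixel = (pixel << 8) | (uint32_t)(v * 255.0f + 0.5f);
        }
        return pixel;                 /* 0xAARRGGBB */
    }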
basically his paper lays the groundwork for profiling a software-rendered design (which is a LOT easier than profiling a hybrid hardware-software design), giving you the statistics needed to decide where to focus time and effort.
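the kind of statistics involved look something like the following toy sketch (again my own illustration, not code from the paper): the emulator bumps a counter per instruction category as it executes, and ranking the totals tells you which category a new fused instruction would pay off in most:

    #include <stdio.h>

    /* accumulate cycle counts per instruction category during
     * emulation, then report each category's share of the total. */
    enum insn_cat { CAT_ALU, CAT_FPU, CAT_LOADSTORE, CAT_BRANCH, CAT_COUNT };

    static unsigned long long cycles[CAT_COUNT];
    static const char *names[CAT_COUNT] = { "alu", "fpu", "load/store", "branch" };

    void account(enum insn_cat cat, unsigned cost) { cycles[cat] += cost; }

    void report(void)
    {
        unsigned long long total = 0;
        for (int i = 0; i < CAT_COUNT; i++) total += cycles[i];
        for (int i = 0; i < CAT_COUNT; i++)
            printf("%-10s %6.2f%%\n", names[i],
                   total ? 100.0 * cycles[i] / total : 0.0);
    }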
and, as the design is based on RISC-V, for which software emulators already exist (qemu and spike), the process of iterative development, adding *in* experimental custom instructions to see what does and does not work, can be much more rapid than would otherwise be expected.
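for anyone wondering how that works in practice: RISC-V reserves the custom-0 major opcode (0x0b) precisely for experiments like this, and the GNU assembler's .insn directive lets you emit such an encoding from C without touching the toolchain. the funct3/funct7 values and the operand semantics below are made up for illustration; the matching decode logic would be patched into spike or qemu so the instruction actually does the pack in one step:

    #include <stdint.h>

    /* hypothetical wrapper for an experimental "pack argb" custom
     * instruction in the custom-0 opcode space. encoding fields
     * (funct3=0, funct7=0) are invented for this sketch; the emulator
     * would be modified to recognise the same encoding. */
    static inline uint32_t packargb(uint64_t lo, uint64_t hi)
    {
        uint32_t rd;
        asm volatile(".insn r 0x0b, 0x0, 0x00, %0, %1, %2"
                     : "=r"(rd) : "r"(lo), "r"(hi));
        return rd;
    }

once the emulator understands the new encoding, you re-run the profile, see how much the category shifted, and decide whether the instruction earns its place in the hardware.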
bottom line: we're aware of larrabee, and nyuzi, and have a strategy in place *thanks* to that work.