Originally posted by atomsymbol
Where you get burned by a limited number of ISA registers is if the compiler runs out of registers to hold all of the intermediate state needed to compute a result, and then has to resort to spilling stuff to memory.
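To make the register-pressure point concrete, here's a toy model, not how any real compiler's allocator works. It assumes straight-line code in a fixed order, one register per value, and that a value's register is freed at its last use. The `(dest, sources)` instruction format and the function names are made up for illustration.

```python
def max_live_values(instrs):
    """instrs: list of (dest, [sources]) in program order.
    Returns the peak number of values that are live at the same time."""
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for i, (_dest, srcs) in enumerate(instrs):
        for s in srcs:
            last_use[s] = i

    live = set()
    peak = 0
    for i, (dest, srcs) in enumerate(instrs):
        # A source read for the last time here frees its register,
        # so the destination can reuse it.
        for s in srcs:
            if last_use[s] == i:
                live.discard(s)
        # Only values that are read later need to stay in a register.
        if dest in last_use:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


def values_not_fitting(instrs, num_regs):
    """Rough lower bound on how many values have to live in memory (spill)."""
    return max(0, max_live_values(instrs) - num_regs)


# Hypothetical example: load 8 values, then sum them with a reduction tree.
instrs = [(f"v{i}", []) for i in range(8)] + [
    ("s0", ["v0", "v1"]), ("s1", ["v2", "v3"]),
    ("s2", ["v4", "v5"]), ("s3", ["v6", "v7"]),
    ("t0", ["s0", "s1"]), ("t1", ["s2", "s3"]),
    ("r", ["t0", "t1"]),
]

print(max_live_values(instrs))         # peak register pressure
print(values_not_fitting(instrs, 16))  # plenty of ISA registers
print(values_not_fitting(instrs, 6))   # too few: something gets spilled
```

A real compiler can also reorder the instructions to lower the peak (e.g. add each pair right after loading it), which is why this is only a rough illustration.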
Originally posted by atomsymbol
And yes, the physical registers referred to would be the 64-bit ones. I'm surprised about the number of physical FPU registers, but if those are indeed 256-bit, then it's not much more than the 128 you'd need just to hold the architectural state of 2 threads (AVX-512 has 32 x 512-bit ISA registers = 64 x 256-bit per thread, so 128 x 256-bit for 2 threads).
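The arithmetic behind that 128 figure, as a quick sketch. It assumes each 512-bit ISA register is simply split across two 256-bit physical registers, which is my assumption and not a confirmed detail of any particular core. The function name is made up.

```python
def physical_regs_for_arch_state(isa_regs, isa_bits, phys_bits, threads):
    """Physical registers of width phys_bits needed just to hold the
    architectural vector state of `threads` hardware threads."""
    # Ceiling division: how many physical registers one ISA register occupies.
    slices_per_reg = -(-isa_bits // phys_bits)
    return isa_regs * slices_per_reg * threads


# AVX-512: 32 ISA registers of 512 bits, held in 256-bit physical registers.
per_thread = physical_regs_for_arch_state(32, 512, 256, threads=1)
two_threads = physical_regs_for_arch_state(32, 512, 256, threads=2)
print(per_thread, two_threads)  # 64 128
```

Anything in the physical register file beyond that number is what's actually available for renaming.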
Originally posted by atomsymbol
For instance, you could give each thread 1 page of scratchpad memory. You could implement it by locking a corresponding block of L1 cache to a page of physical RAM, with exclusive semantics. That should avoid most of the normal cache & TLB overhead associated with memory accesses to it.