AMD Zen 2 CPUs Come With A Few New Instructions - At Least WBNOINVD, CLWB, RDPID
During the AMD Zen 2 + RDNA launch event, AMD highlighted some of the new instructions found in Zen 2 processors, but there is at least one more.
During the Zen 2 briefings, AMD covered CLWB (Cache Line Write Back) and WBNOINVD (Write Back and Do Not Invalidate Cache). CLWB writes the specified cache line back to memory. WBNOINVD is similar to WBINVD: it writes the modified cache lines from the CPU caches back to system RAM, but unlike WBINVD it does not invalidate (flush) the internal caches afterwards.
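For anyone wanting to check whether a given CPU reports these instructions, here is a minimal sketch in C, assuming GCC or Clang (for the <cpuid.h> helpers). The CPUID bit positions used are CLWB at leaf 7 EBX bit 24, RDPID at leaf 7 ECX bit 22, and WBNOINVD at leaf 0x80000008 EBX bit 9. The struct and function names are hypothetical and only for illustration. Note that WBNOINVD, like WBINVD, is a privileged instruction, so user-space code can detect it but not execute it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang CPUID helpers */
#endif

/* Hypothetical container for the three feature bits discussed here. */
struct zen2_isa_bits {
    bool clwb;
    bool wbnoinvd;
    bool rdpid;
};

/* Query CPUID for CLWB, WBNOINVD and RDPID; all false on non-x86 builds. */
static struct zen2_isa_bits query_isa_bits(void)
{
    struct zen2_isa_bits f = { false, false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Structured extended features: leaf 7, subleaf 0. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        f.clwb  = (ebx >> 24) & 1u;
        f.rdpid = (ecx >> 22) & 1u;
    }
    /* Extended leaf 0x80000008: EBX bit 9 reports WBNOINVD. */
    if (__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        f.wbnoinvd = (ebx >> 9) & 1u;
#endif
    return f;
}

/* Render the bits as "clwb=N wbnoinvd=N rdpid=N"; returns snprintf's result. */
static int format_isa_bits(char *buf, size_t len, struct zen2_isa_bits f)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "clwb=%d wbnoinvd=%d rdpid=%d",
                    (int)f.clwb, (int)f.wbnoinvd, (int)f.rdpid);
}
```

Calling query_isa_bits() from main() and printing the formatted string is enough to see what a particular machine reports. Virtual machines may mask some of these bits.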
Those were just the cache instructions discussed at the LA event. But recalling the AMD znver2 compiler patches for GCC that were published at the end of last year, there was also the RDPID instruction. RDPID reads the processor ID and does so faster than the likes of RDTSCP, which also returns the time-stamp counter.
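To make the RDPID/RDTSCP relationship a bit more concrete: both instructions return the contents of the IA32_TSC_AUX register, and RDTSCP additionally returns the time-stamp counter. What the operating system stores in that register is its own business. The sketch below assumes the Linux convention of the CPU number in the low 12 bits and the NUMA node in the bits above; treat that layout as an assumption about the OS, not a property of the instruction. It reads the value with RDTSCP, which is available without special compiler flags; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>   /* __rdtscp() */
#endif

/* Hypothetical result type: CPU number and NUMA node. */
struct cpu_node {
    unsigned int cpu;
    unsigned int node;
};

/* Decode a TSC_AUX value, ASSUMING the Linux layout:
 * low 12 bits = CPU number, remaining upper bits = NUMA node. */
static struct cpu_node decode_tsc_aux(uint32_t aux)
{
    struct cpu_node r;
    r.cpu  = aux & 0xfffu;
    r.node = aux >> 12;
    return r;
}

/* Read TSC_AUX (and the TSC) via RDTSCP if the CPU reports it.
 * Returns false when RDTSCP is unavailable or on non-x86 builds. */
static bool read_tsc_aux_rdtscp(uint32_t *aux, uint64_t *tsc)
{
    assert(aux != NULL && tsc != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID 0x80000001 EDX bit 27 reports RDTSCP. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx) ||
        !((edx >> 27) & 1u))
        return false;

    unsigned int a = 0;
    *tsc = __rdtscp(&a);   /* TSC in the return value, TSC_AUX in a */
    *aux = a;
    return true;
#else
    return false;
#endif
}
```

With RDPID, the same auxiliary value is obtained without also reading the time-stamp counter; compilers that support it expose an _rdpid_u32() intrinsic when building with -mrdpid.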
Curious, I asked around and found out that RDPID is in fact present on Zen 2 processors. That's good news: Intel originally plumbed the RDPID support for Cannonlake/Icelake, but with Cannonlake never shipping at scale and Icelake not yet out, AMD is able to take advantage of that infrastructure work.
One of the areas where the Linux kernel is already plumbed to make use of RDPID when present is the __getcpu code. Since the getcpu system call reports the CPU core and NUMA node a thread is running on, this could have some interesting implications. At the very least, Intel developers, who so far only support this instruction with Icelake, noted in a code comment that RDPID is much faster than the existing instructions for the task. Once I have my hands on the new Ryzen 3000 processors (and the review embargo has expired), I'll certainly run some benchmarks to see if it makes any meaningful difference.
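As a rough idea of what such a benchmark could look like, here is a minimal sketch that times repeated getcpu lookups from user space. It assumes Linux with glibc, where sched_getcpu() is the usual wrapper and is normally serviced through the kernel's fast path rather than a full system call; the function names and the iteration count are made up for illustration, and a serious benchmark would also pin the thread and repeat the measurement.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for sched_getcpu() on glibc */
#endif
#include <assert.h>
#include <time.h>
#ifdef __linux__
#include <sched.h>
#endif

/* Return the CPU the calling thread is running on, or -1 if unavailable. */
static int current_cpu(void)
{
#ifdef __linux__
    return sched_getcpu();
#else
    return -1;   /* not implemented for other platforms in this sketch */
#endif
}

/* Average cost of one current_cpu() call, in nanoseconds, over iters calls. */
static double ns_per_getcpu(long iters)
{
    struct timespec t0, t1;
    volatile int sink = 0;   /* keeps the loop from being optimized away */

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink = current_cpu();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Running ns_per_getcpu() with a few million iterations on a kernel with and without the RDPID path in use would be one way to see whether the instruction makes a measurable difference.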
While confirming that RDPID is indeed in Zen 2, I also heard that there are likely "some" other new instructions with Zen 2 processors, but one of the Zen architects couldn't immediately recall them. That's exciting to hear. They weren't covered at the AMD event: time was obviously limited, AMD didn't think the "press" would be interested in ISA details, and they may be waiting for the Hot Chips conference to talk about them more. I'll hopefully be getting a list of these new instructions soon.
The only downside of hearing there are more instructions is that, sadly, the initial "znver2" support in the GCC and LLVM Clang compilers only exposes the cache instructions and RDPID as part of the Zen 2 target. Having to add more instructions to znver2 will just delay the point at which developers / code builders find these new instructions enabled when building with -march=znver2. This is one of the areas where Intel has been good: they generally provide new ISA support and compiler targets for forthcoming CPUs well ahead of time, sometimes years in advance, which matters given the annual release cadence of GCC and the six-month releases of LLVM/Clang. So any znver2 compiler improvements will likely not be seen until GCC 10 next year or LLVM Clang 9 in September.
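One way to see what a given compiler actually enables for -march=znver2 is to look at the predefined macros. The following sketch assumes the GCC/Clang convention of defining __CLWB__, __WBNOINVD__, __RDPID__ and __znver2__ when the corresponding feature or target is enabled at compile time; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit flags for what the compiler enabled at build time. */
enum {
    BUILT_WITH_CLWB     = 1u << 0,
    BUILT_WITH_WBNOINVD = 1u << 1,
    BUILT_WITH_RDPID    = 1u << 2,
    BUILT_FOR_ZNVER2    = 1u << 3
};

/* Report which features the compiler turned on for this translation unit. */
static unsigned int compiled_in_features(void)
{
    unsigned int mask = 0;
#ifdef __CLWB__
    mask |= BUILT_WITH_CLWB;
#endif
#ifdef __WBNOINVD__
    mask |= BUILT_WITH_WBNOINVD;
#endif
#ifdef __RDPID__
    mask |= BUILT_WITH_RDPID;
#endif
#ifdef __znver2__
    mask |= BUILT_FOR_ZNVER2;
#endif
    return mask;
}

/* Render a mask as "znver2=N clwb=N wbnoinvd=N rdpid=N". */
static int format_features(char *buf, size_t len, unsigned int mask)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "znver2=%d clwb=%d wbnoinvd=%d rdpid=%d",
                    !!(mask & BUILT_FOR_ZNVER2),
                    !!(mask & BUILT_WITH_CLWB),
                    !!(mask & BUILT_WITH_WBNOINVD),
                    !!(mask & BUILT_WITH_RDPID));
}
```

Building the same file once with the default flags and once with -march=znver2, then printing the formatted mask, shows what the target switches on. This only reflects what the compiler was told to assume, not what the CPU running the binary supports.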
I also raised the issue of the znver1 scheduler / cost tables sometimes not being optimal, and of the znver2 compiler support mostly reusing the znver1 code without yet being tuned. That's unfortunate given the slow cadence for getting this support into new compilers. Hopefully by the time of Zen 3's launch we'll see good znver3 support in released compilers next year, with proper tuning and all new instruction set extensions.
Anyhow, after 7 July I'll certainly be running some Zen 2 compiler benchmarks on Phoronix with GCC and Clang, along with the AMD Optimizing C/C++ Compiler whenever AMD updates that LLVM/Clang-forked compiler for znver2.