IBM Sends Out Initial Patches For "Dense Math" Support With Future Power CPUs

Written by Michael Larabel in Programming on 15 November 2022 at 06:40 AM EST.

IBM is working to extend Power10's Matrix-Multiply Assist (MMA) architecture with a new "dense math" feature that is expected to premiere with future IBM Power processors.

IBM has a history of getting GCC compiler support for upcoming Power processors in place early. Before Power10 was announced, IBM developed its support inside GCC under a new target called "future". Now they are preparing the same again for what will presumably roll out as Power11.

The IBM patch calls it a "potential new feature" and again introduces the -mcpu=future target. The usual caveats about "potential" new features apply, but if IBM's engineers are already working on the compiler-side support, the feature is more than likely already settled. In any event, they have now begun this "future" bring-up for what may be Power11, and with it they introduce a new feature called PowerPC Dense Math. With the IBM Power S1024, IBM has also talked of a "dense math engine" (DME) microarchitecture for accelerating machine learning, AI inference, and cognitive computing. But if this PowerPC Dense Math support is the same as the S1024's Dense Math Engine, it's not clear why IBM is labeling it as "future" when the DME has been mentioned in IBM papers since this summer.

IBM PowerPC Dense Math is described in the patch series as:

"This patch is very preliminary support for a potential new feature to the PowerPC that extends the current power10 MMA architecture. This feature may or may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator registers. These accumulators are each tied to sets of 4 FPR registers. When you issue a prime instruction, it makes sure the accumulator is a copy of the 4 FPR registers the accumulator is tied to. When you issue a deprime instruction, it makes sure that the accumulator data content is logically copied to the matching FPR register.

In the potential dense math system, the accumulators are moved to separate registers called dense math registers (DM registers or DMR). The DMRs are then extended to 1,024 bits and new instructions will be added to deal with all 1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything with accumulators, and you follow the rules in the ISA 3.1 documentation for using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math system, and for allocation of the 1,024-bit DMRs. At this time, no additional built-in functions will be done to support any dense math features other than doing data movement between the DMRs and the VSX registers. Before we can look at adding any new dense math support other than data movement, we need the GCC compiler to be able to allocate and use these DMRs."
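To make the register relationships in that description a bit more concrete, here is a minimal Python sketch of the two layouts. It is only a toy model and not anything taken from the patches: every class, method, and constant name is hypothetical, the 128-bit width of each tied register and the "accumulator n is tied to registers 4n to 4n+3" mapping are assumptions, and the number of DMRs is left as a parameter because the quoted text does not give one.

```python
# Toy model of the register relationships described in the patch cover letter.
# All class, method, and constant names are hypothetical, not GCC or ISA identifiers.

ACC_COUNT = 8        # from the patch text: 8 accumulators on Power10
REGS_PER_ACC = 4     # from the patch text: each accumulator is tied to 4 registers
REG_BITS = 128       # assumption: 128 bits per tied register, so 4 * 128 = 512
ACC_BITS = REGS_PER_ACC * REG_BITS
DMR_BITS = 1024      # from the patch text: DMRs are extended to 1,024 bits


class Power10MMAModel:
    """Accumulators that shadow a group of four tied registers."""

    def __init__(self):
        self.regs = [0] * (ACC_COUNT * REGS_PER_ACC)
        self.accs = [[0] * REGS_PER_ACC for _ in range(ACC_COUNT)]

    def tied(self, acc):
        # Assumption: accumulator n is tied to registers 4n .. 4n+3.
        first = acc * REGS_PER_ACC
        return range(first, first + REGS_PER_ACC)

    def prime(self, acc):
        # Make the accumulator a copy of its tied registers.
        self.accs[acc] = [self.regs[r] for r in self.tied(acc)]

    def deprime(self, acc):
        # Copy the accumulator contents back to its tied registers.
        for slot, r in enumerate(self.tied(acc)):
            self.regs[r] = self.accs[acc][slot]


class DenseMathModel:
    """Accumulator state kept in separate, wider DMR storage."""

    def __init__(self, dmr_count):
        # dmr_count is a parameter because the quoted text does not give a count.
        self.chunks_per_dmr = DMR_BITS // REG_BITS
        self.dmrs = [[0] * self.chunks_per_dmr for _ in range(dmr_count)]

    def move_to_dmr(self, dmr, first_chunk, values):
        # Explicit data movement of 128-bit chunks into part of a DMR.
        values = list(values)
        if first_chunk < 0 or first_chunk + len(values) > self.chunks_per_dmr:
            raise ValueError("chunks do not fit in the DMR")
        self.dmrs[dmr][first_chunk:first_chunk + len(values)] = values

    def move_from_dmr(self, dmr, first_chunk, count):
        # Explicit data movement of 128-bit chunks out of a DMR.
        return self.dmrs[dmr][first_chunk:first_chunk + count]
```

In the first model, priming and depriming are what keep an accumulator and its tied registers in sync. In the second, the accumulator state sits in its own storage, so data only crosses over through explicit moves, which is roughly the data movement the initial patches are said to cover.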

If this is for Power11, expect more patches beyond this PowerPC Dense Math work to keep coming. See this patch series for the early work so far on PowerPC Dense Math for GCC and the addition of the -mcpu=future target.
