Linux 6.2 Adds AMD Zen 4 Pipeline Utilization Data To Help Find Performance Bottlenecks
Ahead of the Linux 6.2 merge window ending this weekend, a second batch of perf subsystem changes has been submitted for this next Linux kernel version. Notable among the various additions to the powerful Linux kernel perf code is handling for various new performance monitoring events with new AMD Zen 4 processors.
With today's collection of perf tools fixes/improvements mailed in to Linus Torvalds for merging, the AMD Zen 4 additions are worth mentioning. They include a variety of new core performance monitor counters, L3 cache performance monitor counters, and fabric performance monitor counter events. A wide variety of event metrics around dispatch, execution and retirement, branch prediction, L1/L2 cache activity, and TLB activity are exposed in a compatible manner for Zen 4 processors.
In addition, there are new performance measurements that Linux's perf utility can now tap into on Zen 4 processors, headlined by pipeline utilization data. The many pipeline utilization metrics that are new with Zen 4 CPUs include details around bad speculation and mispredicts, front-end bound bandwidth, and back-end bound by the memory subsystem or CPU.
The Zen 4 pipeline utilization metrics allow analyzing activity at different stages of the CPU pipeline to determine performance bottlenecks in the executed code. This should greatly help developers in figuring out any shortcomings in their code and in making more effective performance optimizations as a result of these hardware insights.
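To illustrate the kind of analysis this enables, here is a minimal Python sketch that turns per-category slot counts into a breakdown and names the largest stall category. The `SlotCounts` type, both function names, and the numbers in the usage example are hypothetical and made up for illustration. The category names mirror those mentioned above, but the actual formulas perf uses are defined in the Zen 4 JSON files and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SlotCounts:
    """Hypothetical per-category pipeline slot counts for one measured run."""
    total: int            # all pipeline slots available during the run
    frontend_bound: int   # slots left unused because the front end supplied no work
    bad_speculation: int  # slots spent on work discarded after a mispredict
    backend_bound: int    # slots stalled on the memory subsystem or CPU back end
    retiring: int         # slots whose work completed usefully


def pipeline_breakdown(counts: SlotCounts) -> dict[str, float]:
    """Return each category as a fraction of all available slots."""
    if counts.total <= 0:
        raise ValueError("total slot count must be positive")
    return {
        "frontend_bound": counts.frontend_bound / counts.total,
        "bad_speculation": counts.bad_speculation / counts.total,
        "backend_bound": counts.backend_bound / counts.total,
        "retiring": counts.retiring / counts.total,
    }


def biggest_bottleneck(breakdown: dict[str, float]) -> str:
    """Name the largest category other than useful (retiring) work."""
    stalls = {name: share for name, share in breakdown.items() if name != "retiring"}
    return max(stalls, key=stalls.get)


if __name__ == "__main__":
    # Made-up numbers, not measurements from real hardware.
    sample = SlotCounts(total=1000, frontend_bound=150, bad_speculation=50,
                        backend_bound=500, retiring=300)
    shares = pipeline_breakdown(sample)
    for name, share in shares.items():
        print(f"{name:16s} {share:6.1%}")
    print("largest stall category:", biggest_bottleneck(shares))
```

With a breakdown like this, a large back-end bound share would point toward memory or execution stalls, while a large bad speculation share would point toward branch mispredicts as the first thing to investigate.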
These new performance counters with Zen 4 have already been detailed in AMD's public Processor Programming Reference (PPR) manual, and now that information is sliced up in JSON form for consumption by Linux's perf tooling. It's unfortunate that these perf additions weren't upstreamed into the Linux kernel sooner to assist developers with early access to Ryzen 7000 series and EPYC 9004 series processors in their profiling and optimizations, but at least they are on the way now with Linux 6.2. Some Zen 4 additions did arrive earlier, like Zen 4 instruction-based sampling (IBS) with Linux 6.0, while these JSON additions are coming post-launch.
Today's perf tools pull also has various fixes, perf lock contention reporting, refreshed metrics/events for various generations of Intel CPUs, and other updates.