
**coder** · 04 February 2024, 07:15 AM

Originally posted by debrouxl:

I'd say that this means (confirms) that the bottleneck isn't the FP computation, right?

Did you see the images in my prior post? Haswell reduced the latency of mul + add from 8 cycles to 5. It also doubled the number of FP multiply ports, as stated in the bit I quoted.

So, it sounds to me like Haswell beefed up the FPU quite a lot, even if you're not utilizing explicit FMA instructions.

**debrouxl** · 04 February 2024, 07:55 AM

Ah yeah... I saw the image previously, but I forgot about the mul+add latency being lowered - that is, outside FMA. Sorry.


Red Hat Evaluating x86-64-v3 Requirement For RHEL 10
