GCC Lands AVX-512 Fully-Masked Vectorization
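
(For context on the headline: "fully-masked" vectorization means every iteration of the vector loop runs under a predicate mask, so short trip counts and loop tails go through masked AVX-512 loads and stores instead of a scalar epilogue. Below is a hand-rolled intrinsics sketch of that idea; it is an illustration of the concept only, not the code GCC actually emits.)

/* Toy example: scale an array in place. Every chunk, including the tail,
 * takes the same masked 512-bit path, so no scalar remainder loop is needed.
 * Illustration only; build with something like: gcc -O2 -mavx512f sketch.c */
#include <immintrin.h>
#include <stddef.h>

void scale(float *x, float a, size_t n)
{
    for (size_t i = 0; i < n; i += 16) {
        size_t rem = n - i;
        /* Full mask for a complete 16-lane chunk, partial mask for the tail. */
        __mmask16 m = (rem >= 16) ? (__mmask16)0xFFFF
                                  : (__mmask16)((1u << rem) - 1);
        __m512 v = _mm512_maskz_loadu_ps(m, x + i);
        v = _mm512_mul_ps(v, _mm512_set1_ps(a));
        _mm512_mask_storeu_ps(x + i, m, v);
    }
}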

  • #31
    Originally posted by coder View Post
    That can achieve the same theoretical throughput, but increasing latency can also be a performance killer.
    The reason people hated the initial implementation was that adding AVX instructions to your app affected the performance of all the other non-AVX code running on the CPU. It was unpredictable and could screw you over in various ways. If they had just made it predictable, people would have been able to see the performance of AVX code and determine directly whether it was worth using or not. Compilers could have added heuristics about whether it made sense to add AVX instructions or not. Instead, you got something that seemed to speed up the executable you were running, but would also cripple anything else that was running at the same time.

    Originally posted by coder View Post
    There's no free lunch, here.
    Yeah, that's what I said from the start.

    Going back to that: Pretty much everything about a chip depends on the process it is on. If Intel couldn't implement a decent version of AVX-512 on the node they had, that's their decision to make and it's 100% fair to judge them on it. No one forced them to add a feature that wasn't ready yet to a new CPU that didn't need to have it. "They were on a bad node" is a crappy excuse.
    Last edited by smitty3268; 25 June 2023, 06:52 PM.



    • #32
      Originally posted by smitty3268 View Post
      The reason people hated the initial implementation was that adding AVX instructions to your app affected the performance of all the other non-AVX code running on the CPU. It was unpredictable and could screw you over in various ways.
      You mean by clock-throttling?

      Originally posted by smitty3268 View Post
      Compilers could have added heuristics about whether it made sense to add AVX instructions or not.
      No, they can't. They almost never have enough visibility to determine whether it makes sense; they already have enough trouble figuring out whether it's worth unrolling a loop. You'd really need some PGO-based feedback to know how much time is spent in code that could use AVX-512, and the answer would change for different workloads.
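
      Just to illustrate the visibility problem with a toy example of my own (nothing from the GCC patch or the article): the loop below is trivially vectorizable, but whether a 512-bit path buys anything depends entirely on the trip counts and hotness your actual workload produces, which is exactly the feedback PGO supplies.

      /* Hypothetical PGO workflow using standard GCC flags:
       *   gcc -O3 -march=x86-64-v4 -fprofile-generate saxpy.c
       *   ./a.out < representative-input      (the run writes .gcda profile data)
       *   gcc -O3 -march=x86-64-v4 -fprofile-use saxpy.c
       * The profile tells the compiler how hot this loop is for that workload;
       * a static heuristic has no such information. */
      #include <stddef.h>

      void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
      {
          /* If n is usually 7, a 512-bit vector path is noise; if n is usually
           * in the millions, this loop dominates the runtime. */
          for (size_t i = 0; i < n; i++)
              y[i] += a * x[i];
      }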

      Originally posted by smitty3268 View Post
      Pretty much everything about a chip depends on the process it is on. If Intel couldn't implement a decent version of AVX-512 on the node they had, that's their decision to make and it's 100% fair to judge them on it. No one forced them to add a feature that wasn't ready yet to a new CPU that didn't need to have it. "They were on a bad node" is a crappy excuse.
      I don't think we disagree on that. I'm not trying to make excuses for them, just chipping in my own $0.02 on where they made their misstep.

