Announcement

**justin_webb** · 07 February 2023, 01:14 PM

Does anyone else think that vectors are getting a little too long? I can't quite put my finger on it, but I just feel a little, in a way, constipated, knowing that such large vectors are being processed by my computer. I think we should make vectors smaller again to make them easier to digest for the microarchitecture, maybe sprinkle in a few NOPs to make the going easier.

Sent from my iPhone

**hotaru** · 07 February 2023, 04:53 PM

Originally posted by justin_webb

Does anyone else think that vectors are getting a little too long? I can't quite put my finger on it, but I just feel a little, in a way, constipated, knowing that such large vectors are being processed by my computer. I think we should make vectors smaller again to make them easier to digest for the microarchitecture, maybe sprinkle in a few NOPs to make the going easier.

sounds like something someone who works for Intel would say.

**pipe13** · 07 February 2023, 09:03 PM

Originally posted by hotaru

sounds like something someone who works for Intel would say.

sounds like something from someone with tongue firmly in cheek.

**hotaru** · 07 February 2023, 11:40 PM

Originally posted by pipe13

sounds like something from someone with tongue firmly in cheek.

maybe, but Intel does have a track record of always finding some way to make the new vector length useless every time they extend it. SSE wasn't a viable replacement for MMX until SSE2. AVX was mostly useless until AVX2, and then it came with downclocking and thermal issues. Intel's implementation of AVX-512 also came with downclocking and thermal issues. Intel didn't stop releasing new CPUs without AVX until last year. it's like they always want these new extensions to fail.

**pegasus** · 08 February 2023, 05:26 AM

Originally posted by justin_webb

Does anyone else think that vectors are getting a little too long?

If anything, they're too short to do serious stuff. Take a look at NEC Aurora, that's a proper vector engine, not this puny AVX stuff.

**stormcrow** · 08 February 2023, 06:16 AM

Originally posted by hotaru

maybe, but Intel does have a track record of always finding some way to make the new vector length useless every time they extend it. SSE wasn't a viable replacement for MMX until SSE2. AVX was mostly useless until AVX2, and then it came with downclocking and thermal issues. Intel's implementation of AVX-512 also came with downclocking and thermal issues. Intel didn't stop releasing new CPUs without AVX until last year. it's like they always want these new extensions to fail.

Intel is into pushing gimmicks to justify otherwise anemic incremental releases. They did it with Pentiums, they're doing it now with Core 2. It's what they've been doing for decades, and most of the tech press is willing to go along with it to have something to write about. This is why Intel keeps getting blindsided by others that come along and release fully baked versions of the same technology, or new technology like multicore CPUs or Apple's ARM-based M-series power/efficiency cores with fully functional and performant x86-64 compatibility, which often clean Intel's clock on performance or some other necessary metric like power efficiency. As I see it, Intel keeps releasing before the technology is in a useful state in order to generate press excitement: not for outstanding products, because they aren't outstanding, but to generate financial market buzz. Financial markets reward perceptions, not market reality. In turn, all that hype and propaganda convinces enough people to buy to cover up the fact that the yearly desktop/server market isn't driven by hype, but by corporate update cycles which are planned years in advance. This reality is very apparent in the most recent numbers showing a massive drop in PC sales that not even new CPU releases are denting, even for Macs. Good enough is good enough! The next round of corporate upgrades is going to be triggered by Windows 10 dropping out of general support in early 2025.

To me, most of the interesting technology stories come from companies other than Intel. Not because Intel doesn't try to innovate, but because what they do innovate is half-baked and poorly thought out in comparison to other processor companies (i740, 8xx, etc. GPUs; SGX; AVX; SSE; DOIT; and other half-baked tech releases), and in some cases just plain stupid (clockspeed, clockspeed, clockspeed!!!11, and x86 everywhere, namely Larrabee).

Intel suffers from tunnel vision like most gigantic market leaders and monopolists. They easily get stuck in NIH ruts until they get a baseball bat between the eyes every few years to wake them up.

**AdrianBc** · 10 February 2023, 05:35 AM

Originally posted by justin_webb

Does anyone else think that vectors are getting a little too long? I can't quite put my finger on it, but I just feel a little, in a way, constipated, knowing that such large vectors are being processed by my computer. I think we should make vectors smaller again to make them easier to digest for the microarchitecture, maybe sprinkle in a few NOPs to make the going easier.

Sent from my iPhone

GPUs have been using 1024-bit or 2048-bit vector registers for decades.

In 1976, the Cray-1 (which was many orders of magnitude simpler than a smartphone CPU of today) was using 4096-bit vector registers.

The width of the vector registers can be much greater than the width of the execution units, as happens in Zen 4 and in many GPUs.

The width of the vector registers has negligible influence on the power consumption. In fact, bigger vector registers can reduce the power consumption, because fewer instructions are executed to complete a given task and the execution units are used more fully (they are provided with independent operands, which can be processed concurrently in pipelines) instead of consuming power while staying idle.
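To put rough numbers on the instruction-count argument, here is a minimal back-of-the-envelope sketch in C. It assumes a deliberately simplified model in which one vector instruction handles one full register and an execution unit narrower than the register needs several passes per instruction. The function names `vector_instructions` and `passes_per_instruction` are made up for illustration and do not come from any library.

```c
#include <stddef.h>

/* Vector instructions needed to process n_elems elements, assuming each
 * instruction operates on one full register (rounded up for a partial
 * last register). Simplified model, not a description of any real core. */
size_t vector_instructions(size_t n_elems, size_t elem_bits, size_t reg_bits)
{
    size_t lanes = reg_bits / elem_bits;      /* elements per register */
    return (n_elems + lanes - 1) / lanes;     /* ceiling division */
}

/* Passes through the execution unit per instruction when the register is
 * wider than the unit (e.g. 512-bit registers on 256-bit units -> 2). */
size_t passes_per_instruction(size_t reg_bits, size_t exec_bits)
{
    return (reg_bits + exec_bits - 1) / exec_bits;
}
```

Under this model, 1024 32-bit elements take 128 instructions with 256-bit registers and 64 instructions with 512-bit registers. On 256-bit execution units the total number of passes is 128 in both cases, so the execution units do the same work while the front end handles half as many instructions.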

The width of the vector registers vs. the width of the execution units (e.g. 512-bit vs. 256-bit in a 7950X) is analogous to the total number of CPU threads vs. the number of CPU cores (e.g. 32 vs. 16 in a 7950X).

Both the width of the execution units and the number of CPU cores multiply the power consumption, but they also multiply the computing speed proportionally.

When the width of the vector registers is greater than the width of the execution units and/or the total number of CPU threads is greater than the number of CPU cores, this might not increase the computing speed, but usually it does, by providing work to do for the execution units that would have stayed idle otherwise.

Both a greater width of the vector registers and a greater number of CPU threads have additional costs that are determined mainly by the increased register size. These additional costs are normally small in comparison with the additional performance.

For general-purpose CPUs, it is likely that 512-bit is the best vector register size, because this is also the size of a data cache line. The matched sizes simplify the optimization of the algorithms in order to use most efficiently both the vector instructions and the hierarchy of data cache memories.
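As an illustration of the matched sizes, here is a minimal sketch that sums an array one 64-byte chunk (16 floats, 512 bits) at a time. It relies on the GCC/Clang `vector_size` extension rather than AVX-512 intrinsics, so it should also build for targets without 512-bit registers, where the compiler splits the operations into narrower ones. The names `v16f`, `LANES` and `sum_by_cache_line` are hypothetical, and the 64-byte cache line is an assumption that holds for current x86 CPUs.

```c
#include <stddef.h>
#include <string.h>

/* 16 floats x 32 bits = 512 bits = 64 bytes: one vector per assumed
 * 64-byte cache line. GCC/Clang vector extension, not standard C. */
typedef float v16f __attribute__((vector_size(64)));

enum { LANES = 16 };

float sum_by_cache_line(const float *data, size_t n)
{
    v16f acc = {0};
    size_t i = 0;

    /* Main loop: one 64-byte load and one vector add per iteration. */
    for (; i + LANES <= n; i += LANES) {
        v16f v;
        memcpy(&v, data + i, sizeof v);  /* safe for unaligned input */
        acc += v;
    }

    /* Horizontal sum of the 16 lanes, then the scalar tail. */
    float total = 0.0f;
    for (int lane = 0; lane < LANES; ++lane)
        total += acc[lane];
    for (; i < n; ++i)
        total += data[i];
    return total;
}
```

Each iteration only touches exactly one cache line if the array itself starts on a 64-byte boundary, for example when it is allocated with `aligned_alloc(64, ...)`. With unaligned data the code still works, but every load may straddle two lines.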

GCC 13 Now Enables 512-bit Vector For AMD Zen 4 Tuning