
Intel Talks Up 2024 Xeon Sierra Forest & Granite Rapids At Hot Chips


  • Intel Talks Up 2024 Xeon Sierra Forest & Granite Rapids At Hot Chips

    Phoronix: Intel Talks Up 2024 Xeon Sierra Forest & Granite Rapids At Hot Chips

    For Hot Chips 2023, Intel has made some new disclosures around Granite Rapids and Sierra Forest.


  • #2
    ...initial AVX10 bits...
    Please don't make AVX10 the same smörgåsbord of instructions that was AVX-512... it's really frustrating to have to go look up what functions a particular CPU may or may not have when it declares "AVX-512" support.



    • #3
      Originally posted by Paradigm Shifter View Post
      Please don't make AVX10 the same smörgåsbord of instructions that was AVX-512... it's really frustrating to have to go look up what functions a particular CPU may or may not have when it declares "AVX-512" support.
      At least name it something not confusing, like avx-512m // "mulligan".



      • #4
        Sounds promising, although we clearly need to criticize Intel for "glued together" CPUs.



        • #5
          Originally posted by Paradigm Shifter View Post
          Please don't make AVX10 the same smörgåsbord of instructions that was AVX-512... it's really frustrating to have to go look up what functions a particular CPU may or may not have when it declares "AVX-512" support.

          While the main purpose of introducing AVX10 was to enable the existence of CPUs that implement only a 256-bit subset of AVX-512 (i.e. the consumer CPUs and the server CPUs with E-cores), Intel has also used this opportunity to satisfy your demand by simplifying the testing for feature support in AVX-512.

          AVX10 will have a version number. AVX10.1 is the version implemented by Granite Rapids in 2024. It will be followed in 2025 by AVX10.2, then by AVX10.3 and so on.

          It is guaranteed that each later version will include all the features of the earlier versions.

          Therefore, on any CPU with AVX10, two tests are enough: check the version number, and check whether the CPU supports the full 512-bit instruction set or only the 256-bit subset.
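
          For illustration, here is a minimal sketch of those two tests in C, using GCC/Clang's <cpuid.h>. The bit positions (CPUID.(EAX=7,ECX=1):EDX[19] for AVX10 presence, leaf 0x24 for the version number and supported vector widths) are my reading of the initial AVX10 spec and should be verified against Intel's current documentation:

            /* Two-test AVX10 feature detection (sketch, GCC/Clang).
               Bit positions per the initial AVX10 spec as I read it;
               verify against Intel's documentation before relying on
               them. */
            #include <cpuid.h>
            #include <stdio.h>

            int main(void)
            {
                unsigned eax, ebx, ecx, edx;

                /* First: is AVX10 implemented at all?
                   CPUID.(EAX=7,ECX=1):EDX bit 19 (assumed). */
                if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) ||
                    !(edx & (1u << 19))) {
                    puts("No AVX10");
                    return 0;
                }

                /* Second: version number (EBX[7:0]) and maximum vector
                   width (EBX bit 18 = 512-bit), both from leaf 0x24. */
                __get_cpuid_count(0x24, 0, &eax, &ebx, &ecx, &edx);
                unsigned version = ebx & 0xffu;
                int has512 = (ebx >> 18) & 1;

                printf("AVX10.%u, %s\n", version,
                       has512 ? "full 512-bit" : "256-bit subset only");
                return 0;
            }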






          • #6
            Originally posted by AdrianBc View Post
            It is guaranteed that each later version will include all the features of the earlier versions.
            Yes, I've heard similar before in multiple fields.



            • #7
              Originally posted by Paradigm Shifter View Post
              Yes, I've heard similar before in multiple fields.

              Obviously, it may happen in the future that Intel will not keep their promise.

              However, at this time, after they have promised to do what you asked of them, there is nothing more that can be done.

              You may ask them now to promise that they will keep their promise, then ask them to promise that they will keep their promise of keeping their promise, and so on, but no such demands can change the likelihood of their future actions.

              While, based on the history of the last ten years, I will not trust anything Intel says about their future CMOS manufacturing processes until we see independent measurements of CPUs made with them, there is nothing in Intel's history over the last 50 years that would be cause to distrust them when they promise backward compatibility.






              • #8
                Originally posted by Termy View Post
                Sounds promising, although we clearly need to criticize Intel for "glued together" CPUs.
                Granite Rapids isn't glued. Yes, I'm aware it uses different tiles, but if you think "glue" means more than one chiplet, then you don't understand what "glue" actually meant when the term was used by Intel engineers. Hint: the Core 2 used glue, and AMD is still fundamentally using glue with its latest chips, even more than it did with the original Zen 1/Zen 2 cores, because those chiplets at least put the glue on the same die as the CPUs.

                Or put another way: Granite Rapids doesn't need an I/O translation hub with SERDES and other signal processing just to move data from one slice of the L3 cache on one tile to another tile. Everything AMD is shipping now does, and Granite Rapids does *not* use glue.



                • #9
                  Originally posted by chuckula View Post

                  Granite Rapids isn't glued. Yes, I'm aware it uses different tiles, but if you think "glue" means more than one chiplet, then you don't understand what "glue" actually meant when the term was used by Intel engineers. Hint: the Core 2 used glue, and AMD is still fundamentally using glue with its latest chips, even more than it did with the original Zen 1/Zen 2 cores, because those chiplets at least put the glue on the same die as the CPUs.

                  Or put another way: Granite Rapids doesn't need an I/O translation hub with SERDES and other signal processing just to move data from one slice of the L3 cache on one tile to another tile. Everything AMD is shipping now does, and Granite Rapids does *not* use glue.
                  Core 2 cannot be compared with any modern CPUs, glued or non-glued, because it did not use point-to-point communication links like all modern devices do; it still used a shared bus for communication.

                  About Granite Rapids, I have not seen any information published by Intel about how the tiles communicate, so unless you have access to confidential information, you do not know whether the communication links between tiles use SERDES or not.

                  The only alternative to using SERDES is to use communication links with many parallel connections and separate clock signals, exactly like the HyperTransport links used by the old AMD CPUs before they switched to using PCIe.

                  The HT-like links have the advantage of lower latency, because they skip the SERDES, but they can be used only up to a certain combination of clock frequency and link length. Increasing either the clock frequency or the link length results in excessive skew between the data lines that cannot be compensated; at that point it becomes necessary to insert SERDES, which increases the communication latency (bad) but also increases the communication throughput (good).

                  So the choice between parallel links like HT and serial links with SERDES, like PCIe or the inter-socket links of both Intel and AMD, is just a trade-off between latency and throughput, and in most modern systems it has been decided in favor of throughput. In any case, this is not a choice that deserves to be called the difference between glued and non-glued devices.
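
                  As a toy illustration of that trade-off, here is a back-of-the-envelope comparison of the time to move one 64-byte cache line over a wide parallel link versus a narrow serial link. All the numbers are invented for the example; they are not measurements of any real product:

                    /* Toy comparison: parallel (HT-like) vs. serial
                       (SERDES) link. All figures are made up for
                       illustration. Note 1 Gb/s = 1 bit/ns, so
                       bits / (Gb/s) yields nanoseconds. */
                    #include <stdio.h>

                    int main(void)
                    {
                        const double bits = 64 * 8;   /* one cache line */

                        /* Parallel: 32 lanes at 2 GT/s (64 Gb/s),
                           ~1 ns flight time, no serialization stage. */
                        double par_ns = 1.0 + bits / (32 * 2.0);

                        /* Serial: 4 lanes at 32 GT/s (128 Gb/s), plus
                           ~10 ns of SERDES latency at each end. */
                        double ser_ns = 1.0 + 2 * 10.0 + bits / (4 * 32.0);

                        printf("parallel link: %.1f ns\n", par_ns);
                        printf("serial link:   %.1f ns\n", ser_ns);
                        return 0;
                    }

                  With these made-up numbers the parallel link delivers the cache line sooner (9 ns vs. 25 ns) despite having a quarter of the bandwidth; for long streaming transfers the serial link's higher throughput wins instead, which is exactly the trade-off described above.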

                  Regarding the necessity of a central I/O hub: a switch included in the central hub will accelerate the communication between tiles, unless each tile includes enough separate links to the other tiles for a complete interconnection. With no more than 4 tiles, that means 3 inter-tile links per tile. For more tiles, the number of links per tile grows linearly and the total number of links grows quadratically, so their cost quickly becomes unacceptable and the only solution is a central hub with a switch.

                  Sapphire Rapids has only 4 tiles, so like Zen 1 it does not need a central hub. Whenever Intel increases the number of tiles, they will have to add a central hub like AMD's.
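
                  The link-count arithmetic behind that, as a quick sketch (a full point-to-point interconnect of n tiles needs n-1 links per tile and n*(n-1)/2 links in total):

                    /* Links required for a full point-to-point
                       interconnect of n tiles. */
                    #include <stdio.h>

                    int main(void)
                    {
                        for (int n = 2; n <= 8; n++)
                            printf("%d tiles: %d links/tile, %d total\n",
                                   n, n - 1, n * (n - 1) / 2);
                        return 0;
                    }

                  At 4 tiles that is a manageable 3 links per tile (6 total); at 8 tiles it is already 7 links per tile (28 total), which is why a central switch becomes the cheaper option.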



                  Last edited by AdrianBc; 29 August 2023, 04:34 PM.



                  • #10
                    Originally posted by chuckula View Post
                    Or put another way: Granite Rapids doesn't need an I/O translation hub with SERDES and other signal processing just to move data from one slice of the L3 cache on one tile to another tile. Everything AMD is shipping now does,
                    How do you know AMD's chiplet link uses serial connections? Even DRAM DIMMs don't, nor does HBM. So, why would they take such a hit for their inter-chiplet communication?

                    BTW, I think people place too much emphasis on core-to-core latency. Data sharing between cores via L3 is probably rather rare in practice. The bigger issue would be that all memory traffic has to incur the latency penalty of traversing a pair of SERDES.

                    Originally posted by chuckula View Post
                    Granite Rapids does *not* use glue.
                    It has I/O dies, even if the memory controllers are integrated into the compute dies.

                    Originally posted by AdrianBc View Post
                    With no more than 4 tiles, that means 3 inter-tile links per tile. For more tiles, the number of links per tile grows linearly and the total number of links grows quadratically, so their cost quickly becomes unacceptable and the only solution is a central hub with a switch.

                    Sapphire Rapids has only 4 tiles, so like Zen 1 it does not need a central hub. Whenever Intel increases the number of tiles, they will have to add a central hub like AMD's.
                    Not if you allow for more than one hop. Intel likes meshes, which is what Sapphire Rapids and Granite Rapids both use.

                    "Intel’s mesh has to connect 56 cores with 56 L3 slices. Because L3 accesses are evenly hashed across all slices, there’s a lot of traffic going across that mesh. SPR’s memory controllers, accelerators, and other IO are accessed via ring stops too, so the mesh is larger than the core count alone would suggest. Did I mention it crosses die boundaries too? Intel is no stranger to large meshes, but the complexity increase in SPR seems remarkable."

                    https://chipsandcheese.com/2023/03/1...pphire-rapids/



                    As for Granite Rapids, it carries on the approach of using mesh interconnects for cross-die communication.
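
                    To make the more-than-one-hop point concrete, here is a small sketch comparing link counts and worst-case hop counts for a 2D mesh against a full point-to-point interconnect. The mesh dimensions are hypothetical, not Intel's actual topology:

                      /* 2D mesh vs. full interconnect: links needed
                         and worst-case hops. Dimensions hypothetical. */
                      #include <stdio.h>

                      int main(void)
                      {
                          int rows = 8, cols = 7;       /* 56 nodes */
                          int nodes = rows * cols;

                          int mesh_links = 2 * rows * cols - rows - cols;
                          int full_links = nodes * (nodes - 1) / 2;
                          int worst_hops = (rows - 1) + (cols - 1);

                          printf("mesh: %d links, worst case %d hops\n",
                                 mesh_links, worst_hops);
                          printf("full: %d links, always 1 hop\n",
                                 full_links);
                          return 0;
                      }

                    The mesh trades extra hops (and thus latency) for far fewer links, which is what lets it scale across die boundaries.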
                    Last edited by coder; 30 August 2023, 06:33 AM.

