AMD Talks Up Zen 4 AVX-512, Genoa, Siena & More At Financial Analyst Day


  • brad0
    replied
    Originally posted by Linuxxx View Post
    So, will AVX-512 support mean all instructions are going to be implemented or only the ones benefitting AI/ML workloads?
    Intel doesn't even implement the full set of AVX-512 instructions in any single processor, so the question doesn't quite make sense.



  • WorBlux
    replied
    Originally posted by Spacefish View Post

    I am dreaming of an FPGA "co-processor" integrated into the CPU. On the software side you would have some sort of "compiler" which translates your algorithms/kernels into netlists for the FPGA.
    The FPGA nodes would be connected to CPU registers, and you would have x86 instructions to load FPGA netlists from memory and others to "trigger" them.
    That FPGA net could do things which you would typically do in multiple normal instructions.

    This could accelerate algorithms where a later stage depends on data from an earlier stage, so no real vectorization/SIMD is possible, or where there are a lot of branches. These branches could just run in parallel on the FPGA.

    For example if you have something like:
    y[i] = y[i-1]*x[i]

    or

    if (x > 3) x += 2; else x += 3;

    You could just have two binary full adders which compute x+2 and x+3 in parallel, and another LUT which looks at the relevant bits of "x" and generates a 1 if x > 3 or a 0 otherwise. The output of one of the binary full adders is connected to the result register depending on the 1/0 signal of the condition-evaluation net.
    That's quite power-inefficient, but depending on transistor speed, you could do all of that in one clock cycle.

    With their recent Xilinx acquisition, AMD has FPGA-based accelerator cards in their portfolio (the Alveo lineup). I guess the "Smart NIC" mentioned in the picture is one of them, maybe combined with a Zen 4 in one package / on the same card?
    The latter example can already be done with cmov or a vector predicate.

    And in the first example there is necessarily one multiply of latency between stores. Even doing it in an FPGA you have to wait out that latency, and it's probably worse there, as LUTs run slower than your ALU; the main gain is that the decoder doesn't have to keep shoving instructions into the pipeline.

    But it would be more beneficial to mix vector and scalar registers, keep a scalar as an intermediate, and let hardware counters issue the loop, Libre-SOC Simple-V style.

    And connecting the FPGA directly to the architectural registers raises a lot of potential issues and bottlenecks. It would be better to use FPGA blocks as asynchronous accelerators operating on memory rather than getting so close to the critical path of the CPU.
    Last edited by WorBlux; 09 June 2022, 10:00 PM.



  • rclark
    replied
    Sounds great! Thanks for the news! But I wonder how many of us will make the 'jump', having to buy at least a motherboard/memory/CPU out of the gate; I mean those of us already on 5000-series CPUs. I can certainly see data centers, for example, wanting the extra performance. I did jump on the first Ryzen 1600 when it was introduced and later just upgraded CPUs (which was a 'very' nice trick with the AM4 socket), but back then I actually 'needed' better performance. Now on a 5000-series CPU, VMs just 'fly', all the software I use is very responsive, and compiles are quick. So as for myself, I will initially just watch how this plays out, as I am swimming in an ocean of unused performance... and loving all that elbow room, so to speak.
    Last edited by rclark; 09 June 2022, 09:28 PM.



  • theriddick
    replied
    The CPUs sound great to me. I'm using a 5700G at the moment, which I kind of regret buying due to its stripped-down cache and the fact that I haven't really made good use of the iGPU (plus the iGPU can't do video encoding correctly!).

    I think this time around I'll be skipping the 7000-series AMD GPU cards, however (70% certain), and going with a 4080 or something.
    I know it sounds lame, but I want functional upscaling (FSR1 looks bad and FSR2 is in a single game) and ray-tracing support.

    My 6800 XT has served me well, but I've had just as many if not more problems than with the 1080 Ti I had previously!



  • shmerl
    replied
    What about 16-core CPUs with 3D V-cache?



  • Spacefish
    replied
    Originally posted by EvilHowl View Post
    I'm really interested in the AMD Instinct MI300, which is basically an incredibly big and powerful APU with lots of memory bandwidth. Just imagine having that as a consumer product.
    I am dreaming of an FPGA "co-processor" integrated into the CPU. On the software side you would have some sort of "compiler" which translates your algorithms/kernels into netlists for the FPGA.
    The FPGA nodes would be connected to CPU registers, and you would have x86 instructions to load FPGA netlists from memory and others to "trigger" them.
    That FPGA net could do things which you would typically do in multiple normal instructions.

    This could accelerate algorithms where a later stage depends on data from an earlier stage, so no real vectorization/SIMD is possible, or where there are a lot of branches. These branches could just run in parallel on the FPGA.

    For example if you have something like:
    y[i] = y[i-1]*x[i]

    or

    if (x > 3) x += 2; else x += 3;

    You could just have two binary full adders which compute x+2 and x+3 in parallel, and another LUT which looks at the relevant bits of "x" and generates a 1 if x > 3 or a 0 otherwise. The output of one of the binary full adders is connected to the result register depending on the 1/0 signal of the condition-evaluation net.
    That's quite power-inefficient, but depending on transistor speed, you could do all of that in one clock cycle.

    With their recent Xilinx acquisition, AMD has FPGA-based accelerator cards in their portfolio (the Alveo lineup). I guess the "Smart NIC" mentioned in the picture is one of them, maybe combined with a Zen 4 in one package / on the same card?



  • zamroni111
    replied
    Siena's cost cutting could come from DDR4;
    AMD would just need to make a different I/O chiplet for that.



  • zamroni111
    replied
    3nm EUV will be a very expensive node unless cheaper and more efficient EUV laser generation is invented.
    Low-margin products, e.g. consumer CPUs and GPUs, might not be profitable enough to use it.
    I won't be surprised if even Apple doesn't use it for the A and M SoCs.



  • mumar1
    replied
    Originally posted by EvilHowl View Post
    I'm really interested in the AMD Instinct MI300, which is basically an incredibly big and powerful APU with lots of memory bandwidth. Just imagine having that as a consumer product.
    If your pockets are deep enough you will have it as a "consumer product" next year ;-)



  • ms178
    replied
    Originally posted by Linuxxx View Post
    So, will AVX-512 support mean all instructions are going to be implemented or only the ones benefitting AI/ML workloads?

    And will it be similar to first-gen Zen, where AVX2 was realized as 2 × 128-bit wide operations, so maybe AVX-512 as 2 × 256 bits?
    According to the YouTuber Moore's Law is Dead, it is said to be comparable in performance to Ice Lake-X at the same thread count / clock speeds (see the slide in the video here: https://youtu.be/6yFn85I5PbY?t=395).

