SiFive Has A New RISC-V Core To Improve Performance By 50%, Outperform Cortex-A78


  • #91
    Originally posted by oiaohm View Post
    https://smartech.gatech.edu/handle/1853/27246 This is from 2009. The reality is that Nvidia's GPU bytecode is not pure SIMD. Nvidia GPUs use a hybrid of MIMD and SIMD.
    Yes, of course. We were talking about a CPU core or a GPU EU/CU/SM all along.

    Originally posted by oiaohm View Post
    Yes, you send SIMD instructions into an Nvidia core. But this is like sending x86 instructions into a modern-day x86 processor. Remember, x86 instructions are turned into microcode inside a normal x86 CPU, and that is what actually runs.
    That contradicts the paper you cited & common sense. The paper explicitly says that PTX is their architecture-independent format, which is then JIT-compiled to the specific GPU you're running on.
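
    For what it's worth, you can watch the driver do that JIT step yourself through the CUDA driver API. A minimal sketch, with error checking omitted; the no-op PTX kernel here is my own illustration:

    #include <cuda.h>
    #include <stdio.h>

    /* Architecture-independent PTX for a do-nothing kernel. */
    static const char *ptx =
        ".version 7.0\n"
        ".target sm_70\n"
        ".address_size 64\n"
        ".visible .entry noop() { ret; }\n";

    int main(void) {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        /* The driver JIT-compiles the PTX text to native code for
           whatever GPU this context lives on. */
        cuModuleLoadData(&mod, ptx);
        cuModuleGetFunction(&fn, mod, "noop");
        cuLaunchKernel(fn, 1, 1, 1, 32, 1, 1, 0, 0, NULL, NULL);
        cuCtxSynchronize();
        printf("PTX was JIT-compiled and launched\n");
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }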

    Originally posted by oiaohm View Post
    Yes, SIMD goes in, then it is turned into MIMD microcode, and that is what is executed.
    Not true. Here's a reverse-engineering of Volta, and they mention nothing of the sort:



    ...which reminds me of the greatest evidence that Nvidia GPUs are really SIMD machines at the SM level: tensor ops. Utilizing the tensor "cores" on Volta broke Nvidia's fiction about warps being a batch of threads. They need the entire width of the 32-lane SIMD registers to feed the tensor cores and receive the results.
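
    You can see that warp-wide nature right in the CUDA API: the wmma operations are warp-cooperative (*_sync), and each fragment's storage is spread across the registers of all 32 lanes. A minimal sketch of one warp computing a 16x16x16 tile (illustrative only; launch with at least one warp, e.g. wmma_tile<<<1, 32>>>(dA, dB, dC)):

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    /* One warp computes C = A*B + C on a 16x16 tile. Every one of the
       warp's 32 threads must execute these calls together; the fragments
       live spread across the whole warp's register lanes. */
    __global__ void wmma_tile(const half *a, const half *b, float *c) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        wmma::fill_fragment(c_frag, 0.0f);
        wmma::load_matrix_sync(a_frag, a, 16);           /* warp-cooperative load */
        wmma::load_matrix_sync(b_frag, b, 16);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  /* tensor core op */
        wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
    }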

    Warp-based SIMD consists of multiple scalar threads executing in a SIMD manner (i.e., same instruction executed by all threads)
    • Does not have to be lock step
    • Each thread can be treated individually (i.e., placed in a different warp) -> programming model not SIMD
      • SW does not need to know vector length
      • Enables memory and branch latency tolerance
    • ISA is scalar -> vector instructions formed dynamically
    • Essentially, it is SPMD programming model implemented on SIMD hardware
    First thing to note is that it states "programming model (is) not SIMD". That's different from saying the hardware is not SIMD.
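
    The gap between the two is easy to demonstrate. The CUDA source below reads as a scalar, per-thread (SPMD) program, but on the SIMD hardware the warp serializes the two branch paths under a lane mask, which the __activemask() intrinsic lets you observe directly (my own toy example):

    #include <cstdio>

    /* SPMD source: every thread looks like an independent scalar program.
       On the SIMD hardware, the warp runs each side of the branch with the
       other side's lanes masked off. */
    __global__ void divergence_demo() {
        int lane = threadIdx.x % 32;
        if (lane < 16) {
            /* only lanes 0-15 are active here: prints 0000ffff */
            if (lane == 0) printf("low  half mask: %08x\n", __activemask());
        } else {
            /* only lanes 16-31 are active here: prints ffff0000 */
            if (lane == 16) printf("high half mask: %08x\n", __activemask());
        }
    }

    int main() {
        divergence_demo<<<1, 32>>>();
        cudaDeviceSynchronize();
        return 0;
    }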

    As for the part about vector instructions being formed dynamically, we don't know whether the author is talking about PTX or what. We don't even know which generation of hardware this refers to, or if it's simply wrong about that part. It's one slide in overview material for an intro class on computer architecture. It would be foolish to read into that.

    This is a secondary source, at best, and it doesn't even have any citations. Basically, a waste of time. Show me something from Nvidia, or at least more peer-reviewed research papers.

    Originally posted by oiaohm View Post
    If you look at AMD, you also find the SIMD in their GPUs is not pure SIMD. Instead, they are like the Nvidia ones, where you have a MIMD form of microcode with a SIMD instruction set on top.
    You've presented zero evidence of either.

    AMD is quite clear about how their GPUs work. It's straight SIMD, at the CU level.



    Originally posted by oiaohm View Post
    What is really not possible is to make a true non-MIMD GPU that is also high-performance.
    If your GPU doesn't use wide SIMD (e.g. 512-bit) at the lowest level, it will have horrible power and area efficiency.

    Originally posted by oiaohm View Post
    This is the problem: GPUs are really not SIMD at the core.
    I'm still waiting to see any evidence.

    I know you're just searching for "GPU MIMD", though. You're not actually looking at what AMD and Nvidia have published about their GPUs, or at what independent researchers have discovered about Nvidia's. When you don't take an unbiased look at the information, but instead look for anything which supports your preconceived notion, that's called "confirmation bias". It's one of the underlying phenomena that lead people to believe insane conspiracy theories.

    Originally posted by oiaohm View Post
    If you look closely, the arguments for SIMD over MIMD turn out to be basically the same as the arguments for CISC over RISC.
    No. Literally no one is arguing that.



    • #92
      Originally posted by coder View Post
      Not true. Here's a reverse-engineering of Volta, and they mention nothing of the sort:



      ...which reminds me of the greatest evidence that Nvidia GPUs are really SIMD machines at the SM level: tensor ops. Utilizing the tensor "cores" on Volta broke Nvidia's fiction about warps being a batch of threads. They need the entire width of the 32-lane SIMD registers to feed the tensor cores and receive the results.
      Except that does not tell you what you think it does. That complete reverse-engineering did nothing to distinguish between fixed-width SPMD (a form of MIMD) and SIMD.

      The instruction needing to fill 32 lanes does not mean the hardware is SIMD; the problem is that fixed-width SPMD has the same requirement. The difference here is multiple instructions in flight. With fixed-width SPMD you may have 4 instructions in the unit: 16 of the 32 lanes might be on the first instruction while the other 16 are on the last. Why do this? Memory caches: you can speed through what you already have in cache, then free those cache lines to fetch what you need to finish the rest.

      The problem here is that you went in with the idea of proving Nvidia was lying about warps, without considering how they could be telling the truth and whether that could explain the results you got. An SPMD system running SIMD instructions has the side effect that horizontal SIMD instructions are forbidden if you want to keep silicon cost low, because the SPMD system is allowed to go out of sync across the lanes based on what data is in cache.

      The reality is that what people call SIMD units in Nvidia GPUs are fixed-width SPMD units. Yes, a fixed-width SPMD unit has the same effect of not giving results until everything is complete. The difference is in the interactions with cache, which result in shorter stalls.

      Originally posted by coder View Post
      If your GPU doesn't use wide SIMD (e.g. 512-bit) at the lowest level, it will have horrible power and area efficiency.
      No, this is wrong. It can be wide SIMD or wide SPMD (which is what Nvidia is using). This may be a 512-bit or wider vector engine using a different form of MIMD.

      Also notice what AMD says in its documentation:

      The RDNA front-end can issue four instructions every cycle to every SIMD, which include a combination of vector, scalar, and memory pipeline. The scalar pipelines are typically used for control flow and some address calculation, while the vector pipelines provide the computational throughput for the shaders and are fed by the memory pipelines.
      This is not a functional description of a true SIMD unit. It is a functional description of a fixed-width SPMD unit with a program length of at least 4.

      Really, coder, it would have paid you to read that AMD documentation more closely and notice that this does not match a SIMD processing unit in hardware. In reality, AMD and Nvidia are not using SIMD units in hardware, yet they call them SIMD units because they are processing SIMD instructions. Yes, Intel attempted to use normal SIMD in those early cards you mentioned, and they did not end up performing well. Think about it: on an x86 system, can you issue 4 SIMD instructions into the SIMD unit in the same cycle? You cannot, because it is not an SPMD unit. Yes, read AMD's documentation: the SPMD depth is 20 wavefronts, or a program length of 20.

      Coder, the documentation you are holding up appears to say SIMD, but the description of its operation does not match pure SIMD hardware at all. Yes, the S is right and the MD is right; the I is not.

      Warp-based SIMD on Nvidia is a cross-CU thing and is pure SPMD inter-core communication. The items called SIMD units in Nvidia cores turn out to be fixed-width SPMD units, and AMD also uses fixed-width SPMD units. The reality is that GPU hardware is not using SIMD units but fixed-width SPMD units, which are incorrectly called SIMD units because the instructions fed in are SIMD instructions. It's like saying a modern x86 processor is CISC when, at its core, its micro-ops are more RISC than CISC. Instruction set and hardware don't have to match.

      MIMD is required at all levels of a GPU for it to work well. SPMD is also what GPUs use between compute units, be it AMD, Nvidia, or Intel.

      Coder, you would be correct to say that all common modern GPUs use a SIMD-heavy instruction set. A SIMD-heavy instruction set makes it really simple to skip over all the parts in the documentation on these GPUs that say these behaviours don't match the hardware being SIMD, particularly when the hardware parts are named SIMD units as well. There are very in-your-face differences between pure SIMD and SPMD when you look for them.

      Seeing these differences explains why the Intel stuff kind of face-planted. It was not just that Intel attempted to use x86, what you want to call a general processor; Intel also used true SIMD, and the Intel vector unit was not MIMD either. So with no MIMD, your GPU is going to be horribly bad in performance, which is what Intel's boards using what you called general processors in fact proved. Remember, SPMD is a form of MIMD. Yes, SPMD is a form people forget about, even though it is used a lot.
      Last edited by oiaohm; 26 October 2021, 06:04 PM.



      • #93
        Originally posted by oiaohm View Post
        Except that does not tell you what you think it does. That complete reverse-engineering did nothing to distinguish between fixed-width SPMD (a form of MIMD) and SIMD.
        What it does is eliminate the gaps that would be needed to hide your supposed MIMD implementation. When they talk about detailed cycle times, memory bank conflicts and such, it's clear that what they're seeing is real SIMD behavior.
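
        And the kind of microbenchmark involved isn't exotic. Here's a rough sketch of a bank-conflict timing probe in that spirit (my own illustration, not the paper's code; only the timing matters, not the loaded values):

        #include <cstdio>

        /* With stride 1, the warp's 32 lanes hit 32 distinct shared-memory
           banks in one go; with stride 32, all lanes hit the same bank and
           the SIMD access serializes, which the cycle counter shows. */
        __global__ void bank_probe(int stride, long long *cycles) {
            __shared__ volatile int smem[32 * 32];
            int lane = threadIdx.x;
            smem[lane] = lane;          /* touch shared memory first */
            __syncthreads();

            long long t0 = clock64();
            int v = smem[(lane * stride) % (32 * 32)];
            __syncthreads();
            long long t1 = clock64();

            if (lane == 0) *cycles = t1 - t0 + (v & 0);  /* keep load live */
        }

        int main() {
            long long *cycles;
            cudaMallocManaged(&cycles, sizeof(long long));
            int strides[2] = {1, 32};
            for (int i = 0; i < 2; i++) {
                bank_probe<<<1, 32>>>(strides[i], cycles);
                cudaDeviceSynchronize();
                printf("stride %2d: %lld cycles\n", strides[i], *cycles);
            }
            cudaFree(cycles);
            return 0;
        }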

        I'll tell you what: publish a peer-reviewed paper where you reverse-engineer Nvidia GPUs and reveal what you're claiming. I promise I'll listen to what you have to say, then.



        • #94
          Originally posted by phoronix View Post
          Phoronix: SiFive Has A New RISC-V Core To Improve Performance By 50%, Outperform Cortex-A78

          SiFive just shared word that at today's Linley Conference they teased their Performance P550 successor that will "set a new standard for the highest efficiency RISC-V processor available."..

          https://www.phoronix.com/scan.php?pa...P550-Successor
          Five to ten years from now, Apple will dump ARM, adopt RISC-V, and call it "innovation". You read it here first.

