Targeted Intel oneAPI DPC++ Compiler Optimization Rules Out 2k+ SPEC CPU Submissions


  • Targeted Intel oneAPI DPC++ Compiler Optimization Rules Out 2k+ SPEC CPU Submissions

    Phoronix: Targeted Intel oneAPI DPC++ Compiler Optimization Rules Out 2k+ SPEC CPU Submissions

    SPEC has invalidated more than two thousand SPEC CPU 2017 benchmark submissions after it was discovered that the Intel oneAPI DPC++ compiler was effectively "cheating" per their standards with a targeted optimization...


  • #2
    I wish I could say I'm surprised, but I'm not.

    A significant portion of Intel's efforts goes into adding specific compiler optimizations, and ISA extensions (e.g. AVX-512) to support them, just to get better results in highly specific benchmarks so they can claim "bigger number supremacy." Suffice it to say, many people won't find such optimizations helpful in general use.

    See Torvalds' thoughts on the matter: https://www.realworldtech.com/forum/...rpostid=193190



    • #3
      Originally posted by Developer12 View Post
      I wish I could say I'm surprised, but I'm not.

      A significant portion of Intel's efforts goes into adding specific compiler optimizations, and ISA extensions (e.g. AVX-512) to support them, just to get better results in highly specific benchmarks so they can claim "bigger number supremacy." Suffice it to say, many people won't find such optimizations helpful in general use.

      See Torvalds' thoughts on the matter: https://www.realworldtech.com/forum/...rpostid=193190
      I think it's a shame that a guy with a Master's degree in Computer Science, who leads development of a project with such a wide reach and makes millions a year, thinks that floating point is such a special use case that no one cares about.

      I guess mathematicians, scientists, analysts, economists and business people don't count.



      • #4
        sophisticles, our troll, left the cave recently.



        • #5
          Originally posted by sophisticles View Post

          I think it's a shame that a guy with a Master's degree in Computer Science, who leads development of a project with such a wide reach and makes millions a year, thinks that floating point is such a special use case that no one cares about.

          I guess mathematicians, scientists, analysts, economists and business people don't count.
          Do you know what 100% of the population uses? Integer performance. Your standard adds, subtracts, stores, shifts, loads, branches, jumps, calls, etc. Integer perf is what determines how fast the kernel runs, and it's what determines how fast every application runs.

          Most of your examples are someone using a spreadsheet. This isn't the '90s anymore, when 100 MHz was king. Even if you dropped FP hardware entirely, you could emulate it in software on a modern CPU using integer operations far faster than a human can perceive. The bottleneck for these jobs is how fast human meat fingers can enter numbers. They're not going to notice the difference if you trim a few hundred thousand transistors and have two FP units in each core instead of four.
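          For the curious, here's a minimal sketch of the kind of integer-only FP emulation described above -- a toy example rather than how a real soft-float library does it, assuming IEEE-754 single precision, normal (non-zero, finite, non-subnormal) inputs, and truncation instead of proper rounding:

          Code:
          #include <stdint.h>
          #include <stdio.h>
          #include <string.h>

          /* Multiply two floats using only integer operations. */
          static float softfloat_mul(float a, float b) {
              uint32_t ia, ib;
              memcpy(&ia, &a, sizeof ia);                         /* reinterpret the bit patterns */
              memcpy(&ib, &b, sizeof ib);

              uint32_t sign = (ia ^ ib) & 0x80000000u;            /* sign of the product */
              int32_t  exp  = (int32_t)((ia >> 23) & 0xFF)
                            + (int32_t)((ib >> 23) & 0xFF) - 254; /* sum of the unbiased exponents */
              uint64_t ma   = (ia & 0x7FFFFFu) | 0x800000u;       /* mantissas with the hidden 1 */
              uint64_t mb   = (ib & 0x7FFFFFu) | 0x800000u;

              uint64_t m = (ma * mb) >> 23;                       /* 48-bit product back down to ~24 bits */
              if (m & 0x1000000u) { m >>= 1; exp++; }             /* renormalize if the product is >= 2.0 */

              uint32_t out = sign | ((uint32_t)(exp + 127) << 23) | ((uint32_t)m & 0x7FFFFFu);
              float r;
              memcpy(&r, &out, sizeof r);
              return r;
          }

          int main(void) {
              printf("softfloat: %g  hardware: %g\n", softfloat_mul(3.5f, 2.25f), 3.5f * 2.25f);
              return 0;
          }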

          And for those people *actually* relying on floating point performance? I don't know if you noticed but for the last 10 years they've been doing it on a GPU. 3D rendering? GPU. AI? GPUs. Weather modelling? GPUs. Exhaustive mathematical proofs? GPU. Nuclear weapons simulation? GPUs. Guess what: job submission to the GPU is 100% determined by integer performance.

          It's worth noting for historical context that in the era Linus is referring to (the late '90s), floating point was a big deal for one single reason: rendering 3D video games. No games still do that on the CPU; none are even capable of it anymore.
          Last edited by Developer12; 10 February 2024, 12:54 AM.



          • #6
            Originally posted by Developer12 View Post

            Do you know what 100% of the population uses? Integer performance. Your standard adds, subtracts, stores, shifts, loads, branches, jumps, calls, etc. Integer perf is what determines how fast the kernel runs, and it's what determines how fast every application runs.

            Most of your examples are someone using a spreadsheet. This isn't the '90s anymore, when 100 MHz was king. Even if you dropped FP hardware entirely, you could emulate it in software on a modern CPU using integer operations far faster than a human can perceive. The bottleneck for these jobs is how fast human meat fingers can enter numbers. They're not going to notice the difference if you trim a few hundred thousand transistors and have two FP units in each core instead of four.

            And for those people *actually* relying on floating point performance? I don't know if you noticed but for the last 10 years they've been doing it on a GPU. 3D rendering? GPU. AI? GPUs. Weather modelling? GPUs. Exhaustive mathematical proofs? GPU. Nuclear weapons simulation? GPUs. Guess what: job submission to the GPU is 100% determined by integer performance.

            It's worth noting for historical context that in the era Linus is referring to (the late '90s), floating point was a big deal for one single reason: rendering 3D video games. No games still do that on the CPU; none are even capable of it anymore.
            You haven't ever taken a computer architecture class, have you?

            I take it, Mr. "Developer12" (were "Developer1-11" already taken?), that you are unaware that modern x86 CPUs do not have floating point units.

            All of AMD's and Intel's current processors use the SIMD units for x87, aka floating point, math.
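            (As an aside, it's easy to check where scalar FP actually lands on a current toolchain. A tiny sketch, assuming GCC targeting x86-64 -- the flag names below are GCC's:)

            Code:
            /* Compile with:  gcc -O2 -S fpadd.c   and inspect fpadd.s.
               The default x86-64 output uses the SSE scalar instruction addss
               on XMM registers; adding -mfpmath=387 forces the legacy x87
               stack (fld/fadd) instead. */
            float fpadd(float a, float b) {
                return a + b;   /* one scalar FP addition */
            }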

            You may also want to recalculate your "100% of the population uses integer" conclusion, but in an ironic twist that only people with functioning brains could have seen coming, you'd need floating point math to do so.

            Thanks for playing.



            • #7
              Originally posted by Developer12 View Post

              Do you know what 100% of the population uses? Integer performance. Your standard adds, subtracts, stores, shifts, loads, branches, jumps, calls, etc. Integer perf is what determines how fast the kernel runs, and it's what determines how fast every application runs.

              Most of your examples are someone using a spreadsheet. This isn't the '90s anymore, when 100 MHz was king. Even if you dropped FP hardware entirely, you could emulate it in software on a modern CPU using integer operations far faster than a human can perceive. The bottleneck for these jobs is how fast human meat fingers can enter numbers. They're not going to notice the difference if you trim a few hundred thousand transistors and have two FP units in each core instead of four.

              And for those people *actually* relying on floating point performance? I don't know if you noticed but for the last 10 years they've been doing it on a GPU. 3D rendering? GPU. AI? GPUs. Weather modelling? GPUs. Exhaustive mathematical proofs? GPU. Nuclear weapons simulation? GPUs. Guess what: job submission to the GPU is 100% determined by integer performance.

              It's worth noting for historical context that in the era Linus is referring to (the late '90s), floating point was a big deal for one single reason: rendering 3D video games. No games still do that on the CPU; none are even capable of it anymore.
              He might be trolling, but I don't think you know what you're talking about. For engineering you need massive FP performance, and not all code is optimized or suited for a GPU.



              • #8
                He's got some general/abstract points, but I found some of the ad hoc assertions cringe-worthy, as you apparently did as well.

                It may be fair to say that we've reached a point where some things in computing architecture are better specialized than not. Within a domain of specialization you strive to create the most generally useful implementation until the trade-offs get too painful and something is too bottlenecked to serve multiple distinct use cases well, at which point the implementation bifurcates again and sub-specializes for each use case.

                Early x86 had specializations for CPU vs FPU as separate chips, and the FPU did provide a big speed boost for its intended use cases.

                Then we had "DSP" processors good at MAC and small matrix/vector/FIR/FFT type stuff, and eventually those bifurcated into FP DSPs and integer DSPs.

                Then we got fixed-function GPUs, which worked OK for rendering 1990s 3D but were too limited and inflexible for the increasing costs and varied use cases, so they morphed into programmable GPUs, which are basically fast, wide SIMD machines.

                Now people (ab)use GPUs for HPC because they're programmable, highly parallel, and have 10x the RAM BW of motherboards. Except that at the consumer level GPUs suck: they're overpriced "toys" that aren't integrated into the overall computer architecture, so in a way they're the sort of thing Linus laments -- good for special-purpose FP/integer matrix/vector/tensor/SIMD stuff but totally "special case", attached over slow, crappy PCIe slots and bringing no integration to the common computer architecture.

                Now we're getting NPUs, which are in a way bifurcating from GPUs: still doing fast, parallel stuff, but serving NN architectures that don't always map ideally onto GPUs, because GPUs aren't optimized for those architectures, data types, etc.

                And meanwhile, from the ~1990s until now, CPUs have basically continued to "suck" at architecture. They've scaled pretty well on single-thread performance, they've gained a very modest degree of parallelism so you see 8-32 core boxes often enough, and attached RAM capacity/BW has modestly increased (though it's still totally sucky on consumer platforms).

                But compared to GPUs/NPUs, CPUs suck at vector operations, FP and integer alike. CPUs suck at NN/ML. CPUs suck at RAM BW: a 12-DIMM server ($10k*N) has something like 700 GBy/s of RAM BW, while for under $1k you get a GPU with over 1 TBy/s of VRAM BW that it can routinely come close to saturating on real-world, memory-BW-heavy streaming dataflow calculations.
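                (A rough sketch of the kind of memory-BW-heavy streaming kernel meant here -- a STREAM-triad-style loop. The array sizes and the three-arrays-touched traffic count are my own choices; it assumes a POSIX clock_gettime, and the number it prints is a ballpark, not a proper benchmark:)

                Code:
                #include <stdio.h>
                #include <stdlib.h>
                #include <time.h>

                #define N (1L << 25)                    /* 32M doubles per array, 256 MiB each */

                int main(void) {
                    double *a = malloc(N * sizeof *a);
                    double *b = malloc(N * sizeof *b);
                    double *c = malloc(N * sizeof *c);
                    if (!a || !b || !c) return 1;
                    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

                    struct timespec t0, t1;
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    for (long i = 0; i < N; i++)
                        a[i] = b[i] + 3.0 * c[i];       /* triad: two reads + one write per element */
                    clock_gettime(CLOCK_MONOTONIC, &t1);

                    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
                    printf("triad: %.1f GBy/s (a[0] = %g)\n",
                           3.0 * N * sizeof(double) / 1e9 / sec, a[0]);
                    free(a); free(b); free(c);
                    return 0;
                }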

                Whereas you've finally gotten SOME architectural deviations from the "hasn't improved much since the 1990s" consumer CPU/motherboard level, like the Apple M-series high-end parts with "unified memory" that can achieve something like 400 GBy/s of RAM-to-"processing" throughput, where that "processing" is an SoC-level aggregation of NPU/CPU/GPU/DSP-like functional blocks all sitting on a fast, wide memory bus, at a (bit) less eye-watering cost/size than getting 400 GBy/s of RAM BW on x86.

                Meanwhile x86 CPUs have variously been sucking at performance/power, so ARM et al. try to establish a TCO, MIPS-per-watt, and density-efficiency advantage for multi-core workloads.

                GPUs/NPUs, being streaming/dataflow "DSP"-like things, can basically just run full-out and will draw as much power and BW as they're designed to handle, at close to peak, indefinitely, in a tight processing loop.

                So it's sort of double-speak to say "I want scalar general-purpose CPUs to be better, damn the special vector instructions!" yet apparently accept GPUs as "good", without acknowledging that CPUs have GHz and power limits, that consumer x86 is severely memory-BW starved (compared to GPUs), and that the only reasons GPUs are good are that they're SIMD, deal with multiple use-case-optimized data types from FP32 down to i8, and have HUGE RAM BW and efficient streaming dataflow designs for thousands of thread groups processing vectors/matrices of strided/contiguous RAM blocks.
                He's right that Intel/AMD have done stupid stuff "for the benchmarks", as NVIDIA et al. probably do in their own ways, and it's good to call them out for that.

                OTOH there's nothing wrong with HPC/vector/NPU workloads and satisfying their need for FLOPS/IPS/BW and special instructions/architectures. Is AVX-512 good for useful stuff? Well, I guess. But overall it seems "too little, too late": if Intel/AMD had wanted to scale the fundamental PC/CPU/RAM/motherboard ARCHITECTURE, they could have followed GPU-like paths while keeping CISC/RISC general-purpose compute along for the ride, and we'd have massive RAM BW, ECC, MMU, IOMMU, virtualization, efficient matrix/vector/streaming computing, massive SIMD, etc. But we don't, and in a lot of workloads the CPUs sit nearly idle (being relatively useless) while the GPUs run at 100%. It's a sad time for Intel/AMD when newcomers like ARM/RISC-V/GPUs/NPUs threaten to eat their lunch, but here we've been for a decade+.

                Until NPUs specialize entirely into their own thing for training/inference, the best we've got are GPUs (which sucks), and CPUs are kind of sad, left-behind things. I would not be surprised to see NVIDIA pull an Apple and stick some general ARM/RISC-V cores, or equivalent execution capability, into their GPU dies in a generation or two and just call that a GPGPU computer: forget x86, forget traditional motherboard form factors and the GPU being a "peripheral" to some lame CPU.

                Wake me up when I can get 1 TBy/s of ECC'd RAM BW to 256 GBy of RAM on a platform AMD/Intel can sell me for around the price of a 4090, with SIMD FP/integer capability that roughly matches a 4090; then I'll say the processing platform has evolved in a generally useful way for OS/application/GPU/DSP/NPU functions. Otherwise the "specialized" HW is still what matters most, and for mere "OS" tasks an ordinary ~16-core CPU is pretty good until it falls down at things it can't even fractionally compete on (NPU, GPU, ...). Either way I have more hope in NVIDIA/Apple than Intel/AMD.



                Originally posted by sophisticles View Post

                I think it's a shame that a guy with a Master's degree in Computer Science, who leads development of a project with such a wide reach and makes millions a year, thinks that floating point is such a special use case that no one cares about.

                I guess mathematicians, scientists, analysts, economists and business people don't count.



                • #9
                  Just another reason to *never ever buy Intel again*. It's not the first time Intel has been caught cheating.



                  • #10
                    Originally posted by sophisticles View Post

                    All of AMD's and Intel's current processors use the SIMD units for x87, aka floating point, math.
                    Leave talking about hardware to actual EEs, since you obviously don't even know the basics. All modern CPUs have SIMD-capable integer and FP units, not magic "SIMD units" that do math and somehow replace the FPU. You implement SIMD by having multiple instances of data-processing blocks like adders and multipliers in your execution units. You see those blocks labeled "floating point" in the attached diagram? Those make up the FPU that you claim doesn't exist anymore, and it doesn't matter whether it's a separate chip or on the same die, it's still an FPU. There are multiple MUL/ADD/ALU blocks to enable SIMD. SIMD is a feature of FP/integer units; calling an FPU a "SIMD unit" is like calling a GPU an "H.264 unit" because it can decode and encode H.264.
                    [Attached image: AMD-Zen-2-vs-Zen-3.png]
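                    (To illustrate the point: the same FP adders can be driven one lane at a time (scalar) or eight lanes at a time (SIMD) -- it's FP hardware either way. A small sketch, assuming an AVX-capable x86-64 CPU and something like gcc -O2 -mavx:)

                    Code:
                    #include <immintrin.h>
                    #include <stdio.h>

                    int main(void) {
                        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
                        float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
                        float c[8];

                        __m256 va = _mm256_loadu_ps(a);       /* load 8 floats into one 256-bit register */
                        __m256 vb = _mm256_loadu_ps(b);
                        __m256 vc = _mm256_add_ps(va, vb);    /* one instruction, eight FP additions */
                        _mm256_storeu_ps(c, vc);

                        for (int i = 0; i < 8; i++)
                            printf("%g ", c[i]);              /* prints 11 22 33 44 55 66 77 88 */
                        printf("\n");
                        return 0;
                    }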

