Announcement

**gsrcrxsi** · 06 October 2022, 09:28 AM

Originally posted by coder:

Well, it looks like DarkFoss was right:

https://www.tomshardware.com/news/in...-fp64-hardware

That sucks. I guess if you need lots of fp64, your best bet would be a used Radeon VII -- or better yet, a Radeon Pro VII.

Once upon a time, this was a real stand-out feature of Intel iGPUs. From Gen7 to Gen9, their iGPUs had a 4:1 fp32:fp64 ratio. That made the GT4 Iris Pro graphics, found in some Skylake laptop CPUs, the fastest fp64 consumer product on the market in the period between the discontinuation of the GTX Titan Black (Kepler) and the launch of the Radeon VII.
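To make the ratios concrete: a GPU's peak FP64 rate is roughly its peak FP32 rate divided by the fp32:fp64 ratio. The sketch below assumes a hypothetical iGPU with 72 execution units, 16 FP32 FLOPs per clock per unit and a 1.0 GHz clock. These figures are illustrative assumptions, not specs for any particular product, and the helper names are made up for this example.

```python
# Rough peak-throughput arithmetic for a GPU whose FP64 rate is a fixed
# fraction of its FP32 rate. All hardware figures are hypothetical.


def peak_gflops(units: int, flops_per_clock_per_unit: int, clock_ghz: float) -> float:
    """Peak FP32 GFLOPS = units x FLOPs per clock per unit x clock in GHz."""
    return units * flops_per_clock_per_unit * clock_ghz


def fp64_peak(fp32_gflops: float, ratio: int) -> float:
    """Peak FP64 GFLOPS for an fp32:fp64 throughput ratio of `ratio`:1."""
    return fp32_gflops / ratio


# Hypothetical part: 72 units, 16 FP32 FLOPs/clock each (8 FMA lanes x 2), 1.0 GHz.
fp32 = peak_gflops(72, 16, 1.0)
for ratio in (4, 32, 64):
    print(f"{ratio:>2}:1 -> {fp64_peak(fp32, ratio):7.1f} GFLOPS FP64")
```

The same FP32 peak gives a 16x spread in FP64 throughput between a 4:1 and a 64:1 design.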

Wow. Now I feel betrayed. It'd be understandable if they had just implemented it at something like a 32:1 or 64:1 ratio, but emulated performance is going to be garbage.

And I don't buy for a minute that it takes too many gates. We've had hardware fp80 in Intel FPUs since the 8087. The throughput and latency weren't good, but they used so few transistors you could almost write out the schematics by hand. There's certainly some compromise they could've struck that would've used negligible die area while still providing a lot better performance than emulation.
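For a sense of why emulation is slow: one well-known software approach (not necessarily the one Intel's driver uses) stores each value as an unevaluated sum of two fp32 numbers, a "double-float", and builds every operation out of error-free fp32 transformations such as Knuth's TwoSum. The Python sketch below simulates fp32 by rounding through `struct`; all helper names are hypothetical. Even one addition costs about 11 fp32 operations, and the result has only about 48 significand bits with fp32's exponent range, so it is not true IEEE binary64.

```python
import struct

# Hypothetical "double-float" sketch: a value is the unevaluated sum hi + lo
# of two binary32 numbers. Illustrative only, not IEEE binary64.


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value.
    binary64 has enough extra precision that this double rounding is
    harmless for +, - and *."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float):
    """Knuth's TwoSum: s = fl(a + b) and err with s + err == a + b exactly.
    Costs 6 fp32 operations."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def fast_two_sum(a: float, b: float):
    """Dekker's FastTwoSum, valid when |a| >= |b|. Costs 3 fp32 operations."""
    s = f32(a + b)
    return s, f32(b - f32(s - a))


def df_from(x: float):
    """Split a binary64 value into a (hi, lo) pair of binary32 values."""
    hi = f32(x)
    return hi, f32(x - hi)


def df_add(x, y):
    """Add two double-floats ("sloppy" variant): 6 + 2 + 3 = 11 fp32 ops."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    return fast_two_sum(s, e)


x = df_from(1.0 + 2.0**-30)               # 1 + 2^-30 does not fit in binary32
hi, lo = df_add(x, x)
print(hi, lo, hi + lo == 2.0 + 2.0**-29)  # the low bits survive in lo
print(f32(f32(1.0 + 2.0**-30) * 2.0))     # plain fp32 has already dropped them
```

Multiplication needs Dekker's splitting or an FMA on top of this, and a bit-exact IEEE binary64 emulation built from integer operations is generally costlier still.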

I would like to see this independently tested and verified. Forum moderators don't necessarily have the most accurate or up-to-date information. The Tom's Hardware article and the Intel forum post it references are the only sources for this so far, so I'm skeptical of its accuracy until it's independently tested.

**coder** · 06 October 2022, 09:43 AM

Originally posted by gsrcrxsi:

I would like to see this independently tested and verified. Forum moderators don't necessarily have the most accurate or up-to-date information. The Tom's Hardware article and the Intel forum post it references are the only sources for this so far, so I'm skeptical of its accuracy until it's independently tested.

I hope you're right, but I fear you're not.

Their product brief for the datacenter GPUs, based on the same chips, conspicuously omits any specification of fp64 performance:

https://www.intel.com/content/www/us...uct-brief.html

Also, their Hot Chips slide deck, from 2020, ominously shows the fp64 block as optional.

https://www.hc32.hotchips.org/assets...vid_Blythe.pdf

Intel had a third, datacenter-focused GPU product line (I think this is what they were calling Xe-HP), which they cancelled in the summer of 2021. Instead, they opted to repurpose their consumer GPU chips (Xe-HPG) for that market segment. Perhaps it was the former that actually had fp64 in the Xe EUs.

**pong** · 12 October 2022, 03:42 AM

It has been too long since I've looked at the low-level documentation for NVIDIA/AMD, but IIRC NVIDIA documents its PTX "assembly", which is fairly close to the hardware reality but may still be a pseudo-code, semi-intermediate representation (??). Other CUDA performance-tuning documents list the cycle count / latency etc. of various arithmetic, logical, shift, and memory-access instructions, so I assume the actual FP32 and FP64 instructions are among them. Aren't they pretty close to IEEE-754 compliant?
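For reference, "FP32" and "FP64" normally mean the IEEE 754 binary32 and binary64 formats, which share the same sign / exponent / fraction layout and differ only in field widths. A small Python sketch that pulls the fields apart (the `decode` helper and `FORMATS` table are hypothetical); it says nothing about whether a given GPU's rounding or denormal handling is fully compliant.

```python
import struct

# Field widths of the two IEEE 754 binary formats:
# (struct float code, struct unsigned-int code, exponent bits, fraction bits).
FORMATS = {
    "binary32": ("<f", "<I", 8, 23),
    "binary64": ("<d", "<Q", 11, 52),
}


def decode(x: float, fmt: str):
    """Return the (sign, biased_exponent, fraction) fields of x in `fmt`."""
    float_code, int_code, exp_bits, frac_bits = FORMATS[fmt]
    # Reinterpret the encoded bytes as an unsigned integer of the same width.
    bits = struct.unpack(int_code, struct.pack(float_code, x))[0]
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


for fmt in FORMATS:
    print(fmt, decode(-1.5, fmt))
```

Both formats encode -1.5 as sign 1, an unbiased exponent of 0 (bias 127 vs 1023), and a fraction whose top bit is set; FP64 simply carries 29 more fraction bits and 3 more exponent bits.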
So I assume Intel should have similar instruction-level documentation, as well as performance-tuning / cycle-cost documents, if the Xe-HPG architecture is really open for development.

With NVIDIA CUDA, as I recall, you can run standard C/C++ code on the GPU without any significant deviations from the baseline C/C++ standards.
So I'd certainly hope that with DPC++ or whatever, you can write code using "float" and "double" and have it "just work" moderately efficiently on the GPU alone, even if FP64 is a factor of N slower than FP32.

I really can't understand how anyone considers it sane to make FP64 or better "optional" in this decade. Yeah, OK, maybe most of its critical uses are in engineering / scientific computing, but I think graphics / modeling etc. are also very relevant. Just as uint64/int64 have become the baseline data types for modern 64-bit processors, FP64 ("double") should be the default lingua franca FP type whenever one doesn't HAVE to save every last bit of memory and execution time while processing billions of operations / variables, and doesn't want to worry so much about getting trash results from overflow / underflow / rounding etc. Just as it's pretty easy to overflow an I32, FP32 isn't really great either for holding integer values or for precision over moderate dynamic ranges.
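The two pitfalls in the last sentence take only a few lines to show. The Python sketch below simulates fp32 by rounding through `struct` and int32 by two's-complement wraparound; both helpers (`f32`, `wrap_i32`) are hypothetical stand-ins for what the hardware does.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def wrap_i32(n: int) -> int:
    """Wrap an integer to 32 bits the way a two's-complement register does."""
    return (n + 2**31) % 2**32 - 2**31


# int32 wraps around just past 2^31 - 1.
print(wrap_i32(2**31 - 1 + 1))

# binary32 has a 24-bit significand: at 2^24 the spacing between
# representable values is 2.0, so 2^24 + 1 is a tie that rounds to even (2^24).
big = 2.0**24
print(f32(big + 1.0) == big)   # fp32 loses the +1
print(big + 1.0 == big)        # binary64 still represents it exactly

# An fp32 counter therefore stalls at 2^24 when incremented by 1.0.
acc = big
for _ in range(1000):
    acc = f32(acc + 1.0)
print(acc)
```

With FP64 the same counter stays exact up to 2^53, which is the practical argument for "double" as the default.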

Intel Arc Graphics A770 Launching 12 October For $329 USD
