Intel Compute Runtime 23.35.27191.9 Released As A Big Update To Their Open-Source GPU Compute Stack


  • coder
    replied
    Originally posted by TemplarGR View Post
    OK, you are the one clearly butthurt and you are confusing more things in the new reply... It is getting embarrassing for you, really. If you actually knew what you talked about, that is...
    Bro, just stop while you can. Each round of this pointless debate just digs you in deeper.

    Originally posted by TemplarGR View Post
    For example, why are you confusing code branching latency with CUDA kernel launch latency?
    Your words:

    "Gpgpu has tons of latency, even on a SoC."

    I made the most reasonable interpretation that you were talking about kernel launch latency, since that's one of the main things that would be improved with an iGPU vs. dGPU, as the words "even on a SoC" implied.

    Originally posted by TemplarGR View Post
    I obviously meant how less efficient gpgpu is at "branching code".
    Then why would you refer to it as latency? I don't know about on Nvidia or Intel GPUs, but RDNA3 has no branch delay slots. If there's any latency, it's hidden by SMT.

    Originally posted by TemplarGR View Post
    Who talked about THAT latency? FFS man.... You are constantly making strawman arguments and attacking them to pretend in your head that you somehow "won the argument".
    Don't blame me for having to guess what you meant. Try using words to specify which latency you're talking about.

    In my experience, people who have trouble expressing themselves clearly also tend not to think very clearly. You might be on the verge of outing yourself as a lousy programmer.

    Originally posted by TemplarGR View Post
    GPGPU is like a VLIW architecture, it is great at parallelizing calculations but is trash for more serial calculations and branching code.
    Not true. They're not VLIW, but SIMD-oriented. The only thing they have in common is that they're in-order.

    In fact, it's a common misconception that GPUs aren't good at branching. They really are! You just need the entire wavefront/warp to follow the same codepath. Where they suffer is in control flow that depends on vector data.
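For intuition, here is a toy Python model of that wavefront/warp behavior (the function name and per-path costs are invented for illustration): the warp shares one program counter, so a branch only gets expensive when lanes disagree and both sides must be issued under an execution mask.

```python
# Toy model of SIMT control flow: a "warp" of lanes shares one program
# counter, so an if/else costs extra passes only when lanes disagree.

def warp_branch_cost(conditions, then_cost=4, else_cost=4):
    """Instruction slots spent executing an if/else for one warp.

    If every lane takes the same path, only that path is issued.
    If lanes diverge, the warp issues BOTH paths, masking off the
    inactive lanes each time.
    """
    any_then = any(conditions)          # at least one lane takes "then"
    any_else = not all(conditions)      # at least one lane takes "else"
    return then_cost * any_then + else_cost * any_else

# Uniform branch: all 8 lanes agree -> only one path is issued (4 slots).
uniform = warp_branch_cost([True] * 8)

# Divergent branch: lanes split on vector data -> both paths issued (8 slots).
divergent = warp_branch_cost([i % 2 == 0 for i in range(8)])
```

This is exactly the distinction in the paragraph above: the branch itself is cheap; it is divergence driven by per-lane (vector) data that doubles the issued work.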

    Originally posted by TemplarGR View Post
    I told you even in the first post, you are embarrassing yourself
    The problem is you're out of your depth, which is why you keep tripping over your words and then have to cover by trying to blame it on me.

    Originally posted by TemplarGR View Post
    Even if there was only 1 language to use for all programmers in the whole world, for both cpu and gpu, even then people would use cpus mostly.
    Nonsense. Interactive graphics overwhelmingly uses GPUs. Why? Because graphics APIs and hardware to implement them are nearly ubiquitous and the advantages are manifest. Same thing with AI, which also overwhelmingly uses accelerators. Video encode and decode acceleration is also quite widespread.

    This shows that the problem we have really isn't one of programming languages. You just need ubiquitous platform & hardware support for compute acceleration + good libraries, APIs, and frameworks.

    Obviously, the workload has to be a decent fit for the hardware. For instance, you don't run data encryption workloads on a video compression engine. Nor would it make sense to port a C compiler to run on a GPU. However, nobody is saying that all code should run on a GPU - just in the cases where it makes sense.

    Originally posted by TemplarGR View Post
    CPUs are just more efficient for most general purpose computing,
    It's even narrower than that. They're good at serial code. Once you have a substantial degree of concurrency, such as in graphics or AI, CPUs are no longer optimal.

    Originally posted by TemplarGR View Post
    VLIW architectures are not a new thing,
    You're really trying to have a different argument, here. I guess you're running out of ground.

    I've programmed and subsequently forgotten more about VLIW processors than you'll probably ever know. Same with GPUs, from the sound of it.

    Originally posted by TemplarGR View Post
    If VLIW was indeed better overall, it would have won in cpu space a long time ago, but it didn't.
    I guess you've never heard of DSPs. Yeah, they're still VLIW - and actual VLIW, not like modern GPUs.

    Originally posted by TemplarGR View Post
    igpus do not really need hardware gpgpu fp64 support. Which is why no one cared when 12th gen, YEARS AGO, dropped it, and no one noticed, no one cared, no one lost anything of value....
    Repeating this doesn't make it true. If there were such a strong case against it, AMD and Nvidia iGPUs wouldn't have it, either.



  • TemplarGR
    replied
    OK, you are the one clearly butthurt and you are confusing more things in the new reply... It is getting embarrassing for you, really. If you actually knew what you talked about, that is...

    For example, why are you confusing code branching latency with CUDA kernel launch latency? Who talked about THAT latency? FFS man.... You are constantly making strawman arguments and attacking them to pretend in your head that you somehow "won the argument".

    I obviously meant how less efficient gpgpu is at "branching code". I said it in the previous post to which you replied as well. GPGPU is like a VLIW architecture, it is great at parallelizing calculations but is trash for more serial calculations and branching code. This is like coding 101, which is why I told you even in the first post, you are embarrassing yourself and just displaying to the whole forum you don't know what you are talking about man....

    Even if there was only 1 language to use for all programmers in the whole world, for both cpu and gpu, even then people would use cpus mostly. It has nothing to do with the reasons you claimed. CPUs are just more efficient for most general purpose computing, and this has always been the case and will always be. Unless you have a scientific calculation that benefits from running multiple parallel calculations (like graphics do), gpus are inefficient.

    VLIW architectures are not a new thing, they are a very old thing in processor design. And modern gpus are just glorified VLIW processors with a RAMDAC (ok obviously more modern designs are more than that, with their hierarchies etc, but it is the same principle). If VLIW was indeed better overall, it would have won in cpu space a long time ago, but it didn't. There is a reason, and it is not that "people snub OpenCL" (LOL, just LOL)

    So, to return to my original comment, AGAIN, igpus do not really need hardware gpgpu fp64 support. Which is why no one cared when 12th gen, YEARS AGO, dropped it, and no one noticed, no one cared, no one lost anything of value.... No one in their right mind will do large professional parallel fp64 calculations on an igpu. Unless it is a student or amateur practicing, in which case he can use the emulated fp64 and not bat an eye.
    Last edited by TemplarGR; 30 November 2023, 12:27 AM.



  • coder
    replied
    Originally posted by TemplarGR View Post
    Basically, the whole of your reply is a snarky attempt to prove to everyone on this forum that you are completely ignorant about coding....
    Pot calling the kettle black.

    Originally posted by TemplarGR View Post
    You are confusing gpu compute fp64 with cpu fp64.....
    Not confusing, but relating.

    Originally posted by TemplarGR View Post
    Yes, cpus could calculate at fp64 since forever....
    In x87, sure. However, Intel made conscious decisions to incorporate it into SSE and AVX. As with GPUs, those implementations don't natively support denormals. So, that aspect of the comparison is apples-to-apples. It's also not the easiest thing to program, not unlike GPUs. Yes, I've done both.
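To see what is actually at stake with denormals (subnormals), here is a quick NumPy sketch. Note the caveat: NumPy itself preserves subnormals, whereas SIMD units running in flush-to-zero / denormals-are-zero mode round this entire band to zero; the `smallest_subnormal` attribute requires a reasonably recent NumPy (1.22+).

```python
import numpy as np

# fp32 normal numbers bottom out near 1.18e-38; subnormals extend
# gradual underflow down to about 1.4e-45. Hardware running in
# flush-to-zero / denormals-are-zero mode treats that whole band as 0.
tiny = float(np.finfo(np.float32).tiny)               # smallest normal
sub = float(np.finfo(np.float32).smallest_subnormal)  # smallest subnormal

# A subnormal magnitude: below the smallest normal, above the smallest
# subnormal. NumPy keeps it; an FTZ/DAZ SIMD unit would flush it to 0.
x = np.float32(1e-41)
```

That lost band of gradual underflow is the "doesn't get you as far" part: fp32 without denormal support gives up roughly seven extra decades of tiny magnitudes that x87's extended precision handled gracefully.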

    Originally posted by TemplarGR View Post
    But we are talking about gpgpu here, remember?
    Duh.

    Originally posted by TemplarGR View Post
    the vast majority of applications that use double precision floating point are not gpgpu apps.
    That's not the claim you made that I objected to. You simply said enterprise doesn't need fp64. Now you're trying to move the goalposts.

    Originally posted by TemplarGR View Post
    And even the apps that do use it, do not use it all the time, most of the calculations do not need it.
    Sure, there are cases where it's used egregiously, but not in many cases of the examples I gave.

    Originally posted by TemplarGR View Post
    Also, while Skylake did have fp64, you are confusing theoretical throughput with realistic performance. Unless software does fp64 calculations all day with no branching, you are not seeing those 220 GFLOPS, not in your dreams.
    There's no confusion, here. Anyone who understands how computers work will know that the theoretical numerical performance of a CPU typically isn't sustained, because programs have to do other things. That's where having a GPU can really help, since it's quite likely doing little else at the time.

    Originally posted by TemplarGR View Post
    Gpgpu has tons of latency, even on a SoC.

    "CUDA launch overhead for null-kernels is typically around 5 to 7 microseconds in sane driver environments.​"
    source: https://forums.developer.nvidia.com/...d-opencl/48792

    Who's the n00b, now?

    And yes, you're really a n00b if you're dumb enough to make synchronous calls to run stuff on a GPU. The APIs have queues for good reasons. Anyone with any GPU programming experience appreciates the need to overlap as much processing as possible between the CPU and GPU.
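The overlap pattern looks roughly the same in any API. Here is a language-agnostic sketch in Python, with a thread pool standing in for the GPU's command queue (all names here are invented for illustration, not any real GPU API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Sketch of the queue/overlap pattern: enqueue device work asynchronously,
# keep the CPU busy in the meantime, and synchronize only when the result
# is actually needed. A thread pool plays the role of the command queue.

def fake_gpu_kernel(data):
    time.sleep(0.2)                  # pretend this runs on the device
    return [x * x for x in data]

def overlapped(data):
    with ThreadPoolExecutor(max_workers=1) as queue:
        handle = queue.submit(fake_gpu_kernel, data)  # async "launch"
        cpu_side = sum(data)         # CPU keeps working meanwhile...
        time.sleep(0.2)              # ...more CPU-side work
        return cpu_side, handle.result()              # sync point

start = time.time()
total, squares = overlapped([1, 2, 3])
elapsed = time.time() - start        # ~0.2s, not 0.4s: the work overlapped
```

The point being: the launch overhead and kernel runtime disappear behind useful CPU work, which is why blocking synchronously on every launch marks you as a beginner.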

    Originally posted by TemplarGR View Post
    While a cpu core has much less theoretical fp64 throughput, it doesn't stall nearly as much.
    Another n00b comment. If your workload has enough concurrency, your GPU shouldn't be stalling.

    Originally posted by TemplarGR View Post
    Or else we wouldn't be using cpus at all, everything would have been gpu only by now....
    The reasons people still use CPUs so much are myriad. The number 1 issue is the lack of OpenCL (or comparable) ubiquity. If software developers can't count on there being a GPU-like accelerator capable of running their code, that greatly diminishes the value proposition.

    Other reasons include laziness, ignorance, and CPUs continually adding more cores. As long as people can get more performance by spinning up more CPU threads (which have their own communication latencies, mind you), it's less of a pressing need to use GPUs. Sadly, the efficiency benefits GPUs can provide too often go unutilized.

    Originally posted by TemplarGR View Post
    i repeat, igpus do not need hardware fp64.
    For some definition of "need", they don't. That's not the same thing as saying it's worthless, or should be omitted.

    Originally posted by TemplarGR View Post
    Applications that for some reason need to run gpgpu fp64, can do it with emulation for a small performance hit.
    Not small.

    Originally posted by TemplarGR View Post
    For igpus where silicon space matters, it is better to not have it
    Granted, the former Intel GPUs were rather generous with it. Still, they didn't have to go to zero. They could've kept one scalar fp64 unit per EU, giving them an effective ratio of 8:1. That would still be enough to make it less painful when you need to operate on 64-bit matrices in either graphics or compute-oriented applications.
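Working that ratio out explicitly (the per-clock figures here are assumptions chosen for the arithmetic, not Intel's published specs):

```python
# If each EU could retire 8 fp32 operations per clock (assumed figure)
# and kept a single scalar fp64 unit, the effective rate ratio would be:
fp32_per_eu_per_clock = 8   # assumption for illustration
fp64_per_eu_per_clock = 1   # one scalar fp64 unit per EU
ratio = fp32_per_eu_per_clock // fp64_per_eu_per_clock  # 8, i.e. 8:1
```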

    TL;DR: take your butthurt elsewhere.



  • qarium
    replied
    Michael

    Sting operation trapped Phoronix-forum exploit hackers.

    The hackers who used Firefox version 119 exploits to install a Trojan horse on my computer were trapped by a sting operation of military intelligence: an undercover agent paid this group of hackers to target me and use the exploit to install a Trojan horse on my computer.

    One of the forum members involved is "Sophisticles"; many more are involved.

    All the attackers will go down.

    SWAT police raid upcoming

    The attackers used fraudulent benchmarks manipulated in favor of Intel and posted them in AMD Threadripper 7000 phoronix.com forum threads. The link went to a webserver that had been hacked by the attackers, and if a targeted individual visited that website, the exploit was executed. This webserver was https://www. pugetsystems .com/



  • Paradigm Shifter
    replied
    Originally posted by coder View Post
    Only the Intel-branded ones are discontinued. Those were sold as a "Limited Edition" model, so that's hardly surprising.

    Currently, 16 GB models are available from ASRock, Sparkle, and Acer, for prices starting at $260.
    Ah, so no. Those aren't readily available here in 16GB. I did find a seller with a few 16GB cards left, but prices start at double that here.



  • TemplarGR
    replied
    Basically, the whole of your reply is a snarky attempt to prove to everyone on this forum that you are completely ignorant about coding....

    You are confusing gpu compute fp64 with cpu fp64.....

    Yes, cpus could calculate at fp64 since forever.... But we are talking about gpgpu here, remember? They are not the same thing. Their programming paradigms are not the same, and the vast majority of applications that use double precision floating point are not gpgpu apps. And even the apps that do use it, do not use it all the time, most of the calculations do not need it.

    Also, while Skylake did have fp64, you are confusing theoretical throughput with realistic performance. Unless software does fp64 calculations all day with no branching, you are not seeing those 220 GFLOPS, not in your dreams. Gpgpu has tons of latency, even on a SoC. While a cpu core has much less theoretical fp64 throughput, it doesn't stall nearly as much. Or else we wouldn't be using cpus at all, everything would have been gpu only by now....

    In the end, i repeat, igpus do not need hardware fp64. Applications that for some reason need to run gpgpu fp64, can do it with emulation for a small performance hit. Anyone needing serious fp64 grunt, will use a dedicated gpu anyway. For igpus where silicon space matters, it is better to not have it and use the transistor budget for other things.



  • zboszor
    replied
    The downside is that the new version still doesn't support LLVM 16 or newer.



  • coder
    replied
    Originally posted by Paradigm Shifter View Post
    Can you actually buy 16GB A770s any more? I have one sitting on my shelf at work, but our IT supplier says they are no longer able to be sourced.
    Only the Intel-branded ones are discontinued. Those were sold as a "Limited Edition" model, so that's hardly surprising.

    Currently, 16 GB models are available from ASRock, Sparkle, and Acer, for prices starting at $260.




  • Paradigm Shifter
    replied
    Can you actually buy 16GB A770s any more? I have one sitting on my shelf at work, but our IT supplier says they are no longer able to be sourced.



  • coder
    replied
    Originally posted by TemplarGR View Post
    And nothing of value was lost. Consumers never needed fp64,
    Which is why your CPU doesn't have it, either.

    ...oh, wait.

    No, you're just ignorant. Luckily, that's not a crime. Do you think OpenGL made it a requirement just for fun? Not if you care about numerical stability of matrix inversions. Don't forget that GPUs don't implement support for denormals, so fp32 doesn't get you as far as good 'ol x87 floating point.
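A quick NumPy illustration of that stability point (NumPy keeps float32 inputs in single precision through LAPACK, so the two paths are directly comparable); the Hilbert matrix here is just a standard ill-conditioned test case, not anything from the thread:

```python
import numpy as np

# Why fp64 matters for matrix inversion: invert a mildly ill-conditioned
# matrix (a 7x7 Hilbert matrix) in both precisions and check how well
# A @ inv(A) reproduces the identity.
n = 7
hilbert = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

def inversion_error(a):
    """Max absolute deviation of A @ inv(A) from the identity matrix."""
    return float(np.abs(a @ np.linalg.inv(a) - np.eye(len(a))).max())

err64 = inversion_error(hilbert.astype(np.float64))
err32 = inversion_error(hilbert.astype(np.float32))
# err32 comes out larger than err64 by many orders of magnitude
```

The residual error scales roughly with the condition number times the machine epsilon, so fp32's ~1e-7 epsilon gets amplified into garbage long before fp64's ~2e-16 does.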

    Sure, you can use emulation, but at a considerable performance penalty. Why do you think even Nvidia and AMD consumer GPUs retain hardware fp64 support at 1:32?

    Originally posted by TemplarGR View Post
    and even in the enterprise its usage is rare. It would make no sense to waste silicon on tiny igpus for fp64 to be used only in demos and benchmarks.
    Sure, if you exclude SQL databases, spreadsheets, CAD, statistical modeling, and probably even a fair amount of the financial software out there, then you might be right.

    Originally posted by TemplarGR View Post
    And it makes sense, Intel igpus are extremely underpowered to the point that they don't make sense for heavy applications that benefit from fp64. And the cpu cores are really powerful enough to emulate it when needed, for example for a student project etc. It is more efficient to not include it.
    The iGPU in a regular Skylake i7 could sustain about 220 GFLOPS of fp64. That's about what all four CPU cores could manage, together. So, it effectively doubled your compute capacity. The real kicker is that it could do that at a mere 10 W, while it would take the CPU cores 90 W.
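Running the arithmetic on the numbers in that paragraph (taking the 220 GFLOPS, 10 W, and 90 W figures at face value):

```python
# Perf-per-watt comparison using the figures quoted above.
igpu_gflops, igpu_watts = 220.0, 10.0
cpu_gflops, cpu_watts = 220.0, 90.0

igpu_eff = igpu_gflops / igpu_watts  # 22.0 GFLOPS/W
cpu_eff = cpu_gflops / cpu_watts     # about 2.44 GFLOPS/W
advantage = igpu_eff / cpu_eff       # ~9x better efficiency at equal throughput
```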

    Now, if you had an Iris Pro iGPU, they doubled the EUs and we could start to talk about some real muscle. They even had a Broadwell with 3x the amount of EUs and 128 MB of eDRAM. Definitely more than enough horsepower to be interesting.
    Last edited by coder; 29 November 2023, 03:00 AM.

