AMD-Xilinx XDMA Driver Being Merged For Linux 6.3

  • AMD-Xilinx XDMA Driver Being Merged For Linux 6.3

    Phoronix: AMD-Xilinx XDMA Driver Being Merged For Linux 6.3

    Adding to all of the other AMD changes coming with Linux 6.3 is now also having the AMD-Xilinx XDMA driver in tow. Getting this XDMA subsystem driver upstreamed is important for unblocking more Xilinx-based feature code to be merged into the Linux kernel...
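
    For context, the XDMA driver is a dmaengine provider, so in-kernel users would drive it through the kernel's generic dmaengine API rather than poke the hardware directly. Below is a minimal, illustrative client sketch; the "h2c" channel name and the function itself are made up for illustration, not taken from the driver.

    Code:
    /* Hypothetical dmaengine client; "h2c" and xdma_demo_xfer() are illustrative only. */
    #include <linux/dmaengine.h>
    #include <linux/dma-mapping.h>
    #include <linux/err.h>

    static int xdma_demo_xfer(struct device *dev, dma_addr_t buf, size_t len)
    {
        struct dma_async_tx_descriptor *desc;
        struct dma_chan *chan;
        dma_cookie_t cookie;

        /* Ask the dmaengine core for a host-to-card channel by name. */
        chan = dma_request_chan(dev, "h2c");
        if (IS_ERR(chan))
            return PTR_ERR(chan);

        /* Queue one transfer from host memory to the device. */
        desc = dmaengine_prep_slave_single(chan, buf, len,
                                           DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
        if (!desc) {
            dma_release_channel(chan);
            return -ENOMEM;
        }

        cookie = dmaengine_submit(desc);
        dma_async_issue_pending(chan);

        /* A real client would wait for the completion callback before releasing. */
        dma_release_channel(chan);
        return dma_submit_error(cookie) ? -EIO : 0;
    }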


  • #2
    Wondering when AMD will incorporate Xilinx FPGA cores into Radeon chips.

    • #3
      Originally posted by Awesomeness View Post
      Wondering when AMD will incorporate Xilinx FPGA cores into Radeon chips.
      Instinct MI300 already has some.

      • #4
        Originally posted by Peter Fodrek View Post
        Instinct MI300 already has some.
        Not a consumer GPU, though.

        • #5
          Originally posted by Awesomeness View Post

          Not a consumer GPU, though.
          FPGAs specifically? Probably never in consumer hardware. FPGAs are a technology that requires special knowledge to use properly, something very few consumers have: you need to know how to lay out, program, and debug logic machines. Once you identify a use case for a particular logic machine, it's usually more efficient to just design a static processor core and include it in the next product release.

          • #6
            Originally posted by stormcrow View Post

            FPGAs specifically? Probably never in consumer hardware. FPGAs are a technology that requires special knowledge to use properly, something very few consumers have: you need to know how to lay out, program, and debug logic machines. Once you identify a use case for a particular logic machine, it's usually more efficient to just design a static processor core and include it in the next product release.
            Nah, the 7040 Phoenix chips (AMD's midrange mobile offering for laptops in 2023) have an FPGA built in.
            You don't need knowledge of the FPGA as a consumer in order to use it. Software vendors will just deliver small IP cores with a defined interface to accelerate certain algorithms. I think this will end up like shaders / GPGPU. You could even have some sort of "compiler" that vectorizes normal code and automatically accelerates parts of it on the FPGA.
            Benefit: it's a general acceleration device that can run many different algorithms close to optimally, compared to fixed-function hardware or general-purpose compute cores (even ones with specific instructions).
            On the flip side: it's not as energy efficient as fixed-function hardware made specifically for one task, and performance is worse because you can't clock it as high.

            • #7
              Originally posted by Spacefish View Post

              Nah, the 7040 Phoenix chips (AMD's midrange mobile offering for laptops in 2023) have an FPGA built in.
              No, they don't; they have dedicated AI accelerator blocks like those on Apple's chips. I haven't seen anything in AMD's own slides indicating that Phoenix chips will contain non-fixed-function hardware, apart from one or two articles that were obviously written by people unaware of what "IP" means in the context of FPGAs, misinterpreting this image from 2022.
              [image: AMD%20AIE_575px.jpg]
              This just means that they have developed an AI accelerator module/block/IP called XDNA that can be used on FPGAs or implemented in ICs. AMD's own CES slides (can't upload them here, just look at the article on AnandTech) say that they have AI engines, not an FPGA. They claim these AI accelerator blocks are both performant and, in their own words, "optimized for battery life with intense workloads"; FPGAs are worse than ASICs on both counts.

              Not only that, your FPGA needs to contain more logic elements than your design requires, otherwise the place-and-route tool that generates the FPGA bitstream from your synthesized Verilog or VHDL code will have no headroom and won't complete at all. So if they already have a working XDNA IP (a fancy word for a module/block you can use in your HDL code), putting an FPGA portion on the chip to run it, instead of just implementing it as a regular fixed-function block of the chip, would need more area, consume more power, run at a slower clock rate, and probably make testing the chip much harder. (Being able to test a chip well is what makes it financially viable at all.)

              Originally posted by Spacefish View Post

              You don't need knowledge of the FPGA as a consumer in order to use it. Software vendors will just deliver small IP cores with a defined interface to accelerate certain algorithms. I think this will end up like shaders / GPGPU. You could even have some sort of "compiler" that vectorizes normal code and automatically accelerates parts of it on the FPGA.


              Yeah, no. The software you need to develop for FPGAs is, and has always been, a pain to use, let alone so convenient that any random person could somehow magically grab bitstream files and have them just work. Sure, having a bitstream repository and a local FPGA programming executable isn't hard to do, but it makes zero financial sense. FPGAs are essentially the equivalent of millions of 74-series logic chips compressed into a "small" die; you still need to write new software and drivers for the new IP blocks, just like with new hardware. Except in this scenario the company still has to develop the logic (the hardware, if you will), the drivers, and the software, but loses the profit of being able to sell you the hardware as well. There's no reason for them to go through the extra effort of putting an FPGA into the chip, inflate their costs with the now-larger die, and give consumers reasons to put off buying new products for longer. Don't get me wrong, I would love my 5900X to have an integrated UltraScale part, but it doesn't make much practical or financial sense.

              Lastly, we are hilariously far from magically being able to run arbitrary code on an FPGA. A lot of things aren't suitable for acceleration or for FPGAs, such as anything involving lots of branching and sequential logic, which includes most programs you run on a PC. Somehow extracting the acceleratable portions of programs and running them on an FPGA on the fly would be extremely difficult. HLS solutions exist that can turn special subsets of C into FPGA bitstreams, but their results are inferior to proper HDL. Not only that, you still have to keep in mind that your code will run on an FPGA and write it accordingly if you want acceptable results; it's far from being able to feed in your code and have it just work. And this is despite the potentially huge cost savings of having engineers write C and have it just work instead of spending valuable hours writing HDL. Sorry for the long rant/post.
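
              Edit: to make the "write your C with the FPGA in mind" part concrete, this is roughly what HLS-targeted C tends to look like. The pragmas follow the Vitis HLS style; the function itself is just an illustration, not anything shipping.

              Code:
              /* Illustrative HLS-style C: a multiply-accumulate loop written for an FPGA tool. */
              #define N 1024

              void vec_mac(const int a[N], const int b[N], const int c[N], int d[N])
              {
              #pragma HLS INTERFACE mode=m_axi port=a
              #pragma HLS INTERFACE mode=m_axi port=b
              #pragma HLS INTERFACE mode=m_axi port=c
              #pragma HLS INTERFACE mode=m_axi port=d
                  for (int i = 0; i < N; i++) {
              #pragma HLS PIPELINE II=1  /* ask the tool for one result per clock cycle */
                      d[i] = a[i] * b[i] + c[i];
                  }
              }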
              Last edited by osw89; 24 February 2023, 07:27 PM.

              • #8
                Originally posted by osw89 View Post
                Sorry for the long rant/post.
                Naw, it's fine. I learned a few more details about FPGAs myself. Thanks.

                • #9
                  Originally posted by osw89 View Post
                  No, they don't; they have dedicated AI accelerator blocks like those on Apple's chips. I haven't seen anything in AMD's own slides indicating that Phoenix chips will contain non-fixed-function hardware, apart from one or two articles that were obviously written by people unaware of what "IP" means in the context of FPGAs, misinterpreting this image from 2022.
                  OK, seems like a lot of news sites reported this really wrong and I hadn't informed myself properly. So this is probably some kind of very wide compute core with special AI instructions like FMA on certain datatypes, plus a lot of SRAM?

                  Originally posted by osw89 View Post
                  Lastly, we are hilariously far from magically being able to run arbitrary code on an FPGA. A lot of things aren't suitable for acceleration or for FPGAs, such as anything involving lots of branching and sequential logic, which includes most programs you run on a PC. Somehow extracting the acceleratable portions of programs and running them on an FPGA on the fly would be extremely difficult.
                  Yes, doing the full VHDL -> synthesize -> place & route flow on the fly is probably hard.

                  I imagined it more like this:
                  You could "segment" the FPGA into multiple areas and load pre-routed blocks into those areas on demand, with defined inputs/outputs to the platform's memory subsystem.

                  For example, a developer would write something like:
                  Code:
                  vectorD = vectorA * vectorB + vectorC
                  You would load a bitstream that segments the FPGA into two "areas", load a half-size multiply block into the first section and a half-size add block into the second.
                  As far as I know, Xilinx already has a feature like that in their existing FPGAs, "Dynamic Function eXchange", which does something along those lines.

                  If the vectors are large and you don't have a GPU, that might be faster than processing them on the CPU.
                  Much like GPUs, but you can build longer "chains" of operations on demand.

                  Another way could be to have an assortment of bitstreams for very wide softcores that each do only one or two operations on specific operand datatypes. If you need to do a lot of these calculations, just load that softcore and let it run the calculation on many elements in parallel, roughly like the sketch below.
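
                  Purely hypothetical sketch of that dispatch idea; none of these fpga_* functions exist anywhere, they only stand in for "program a matching softcore bitstream, then batch the work through it":

                  Code:
                  #include <stddef.h>
                  #include <stdint.h>

                  /* Hypothetical runtime calls, named only for illustration. */
                  int  fpga_load_bitstream(const char *name);   /* e.g. a "mul_add_i32" softcore */
                  void fpga_run_batch(const int32_t *a, const int32_t *b,
                                      const int32_t *c, int32_t *d, size_t n);

                  void vector_mac(const int32_t *a, const int32_t *b,
                                  const int32_t *c, int32_t *d, size_t n)
                  {
                      /* Load the pre-built softcore that implements exactly this operation. */
                      if (fpga_load_bitstream("mul_add_i32") == 0) {
                          fpga_run_batch(a, b, c, d, n);   /* many elements in parallel */
                          return;
                      }

                      /* CPU fallback when no FPGA (or no matching bitstream) is available. */
                      for (size_t i = 0; i < n; i++)
                          d[i] = a[i] * b[i] + c[i];
                  }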

                  Edit:
                  Probably it's this: https://www.xilinx.com/content/dam/x...-ai-engine.pdf
                  As far as I understand it, you have multiple wide compute "tiles" arranged in a grid. You can "configure" the operation each tile performs and the dataflow between neighbouring tiles in the grid. So you're essentially able to map multiple vector operations from your code onto these tiles by configuring the dataflow and operations accordingly.
                  The tiles seem to be 1024-bit "wide", and the data transfer between them is realized via AXI.
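
                  Rough mental model of that in plain C (this is not the actual AI Engine toolchain API, just the shape of the dataflow, using the vector example from above):

                  Code:
                  #include <stddef.h>
                  #include <stdint.h>

                  #define LANES 32   /* 32 x 32-bit = one 1024-bit "tile" word */

                  /* Tile 1: elementwise multiply of two 1024-bit words. */
                  static void tile_mul(const int32_t *a, const int32_t *b, int32_t *out)
                  {
                      for (size_t i = 0; i < LANES; i++)
                          out[i] = a[i] * b[i];
                  }

                  /* Tile 2: add the stream coming from tile 1 to a third vector. */
                  static void tile_add(const int32_t *in, const int32_t *c, int32_t *d)
                  {
                      for (size_t i = 0; i < LANES; i++)
                          d[i] = in[i] + c[i];
                  }

                  /* "Dataflow": vectorD = vectorA * vectorB + vectorC, one word at a time. */
                  void run_graph(const int32_t *a, const int32_t *b, const int32_t *c,
                                 int32_t *d, size_t n)
                  {
                      int32_t stream[LANES];   /* stands in for the AXI link between tiles */

                      for (size_t off = 0; off + LANES <= n; off += LANES) {
                          tile_mul(a + off, b + off, stream);
                          tile_add(stream, c + off, d + off);
                      }
                  }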
                  Last edited by Spacefish; 24 February 2023, 10:44 PM.

                  • #10
                    ignore.
