Intel Details New Data Streaming Accelerator For Future CPUs - Linux Support Started


  • #11
    Originally posted by wizard69 View Post

    I’m not so sure; it sounds more like an intelligent I/O processor than anything else. Hopefully more info will come soon.
    Separate or integrated into the CPU die?

    Comment


    • #12
      Originally posted by timofonic View Post
      Separate or integrated into the CPU die?
      It should be on-die.

      If it were larger, I'd just say "in-package". However, something like this is probably much too small to put on its own die. Maybe as part of an "I/O die", if Intel follows AMD's Zen2 approach...

      Comment


      • #13
        Originally posted by coder View Post
        Wow, I figured Hyper Threading killed DMA
        What do you mean here? How are those related?

        Comment


        • #14
          Originally posted by mrugiero View Post
          What do you mean here? How are those related?
          Oh, quite simply. If you only have one CPU with one hardware thread, then the idea of tying it up with data movement is very unpalatable. However, if your CPU has 8 cores with 16 hardware threads, and one of them is tied up doing data movement across PCIe to a slow device, then you almost don't notice or much care - especially since that thread might be paired with a compute-heavy thread that keeps most of the core's functional units busy, anyhow.

          So, the value proposition of a dedicated DMA engine is much lower. Not to speak of a 28-core CPU with 56 threads, or a 64-core CPU with 128 threads.

          Comment


          • #15
            Originally posted by coder View Post
            Oh, quite simply. If you only have one CPU with one hardware thread, then the idea of tying it up with data movement is very unpalatable. However, if your CPU has 8 cores with 16 hardware threads, and one of them is tied up doing data movement across PCIe to a slow device, then you almost don't notice or much care - especially since that thread might be paired with a compute-heavy thread that keeps most of the core's functional units busy, anyhow.

            So, the value proposition of a dedicated DMA engine is much lower. Not to speak of a 28-core CPU with 56 threads, or a 64-core CPU with 128 threads.
            Oh, that makes sense. I thought you meant something like DMA not working properly or being slowed down by hyper threading, so I was confused.
            Further, big.LITTLE in the ARM world can be seen that way, maybe even more so than HT.
            There are use cases, though. For example, deep packet processing at line rate on high-speed interfaces requires saturating all cores, and not everyone has many cores either, especially in the developing world.
            For example, I live in Argentina, and 2-4 threads are still common, even in retail computers, and that's also still very common in cellphones AFAIK.
            But yeah, I see your point.
            I have no idea if ARM does DMA, though.

            Comment


            • #16
              Originally posted by mrugiero View Post
              There are use cases, though.
              Yeah, I think one thing they might be targeting is routing traffic between CPUs in a mesh, or something like that. Anyway, there was that reference to clustering, which made me think of Nvidia's GPU interconnect technology, NVLink.

              Originally posted by mrugiero View Post
              For example, deep packet processing at line rate on high speed interfaces requires saturating all cores, and not everyone has many cores either,
              Good points. I think datacenter networking is starting to embrace 400 Gbps(!). Also, toward the lower end of core counts, there may be some embedded use cases, where power efficiency could benefit from using simpler, lower-clocked cores for data movement.

              Comment


              • #17
                Originally posted by mrugiero View Post
                I have no idea if ARM does DMA tho.
                DMA is there if the protocol requires it: PCIe and SATA/SAS devices can do DMA, while USB 3.0 and earlier versions cannot.

                DMA is also very much there in any SoC as all processors in the SoC (CPU, GPU, modems, hardware decoding for media, and more) are sharing the same RAM.

                One of the reasons projects like Purism's phone have the modem on a USB bus (electrical USB interface) rather than integrated into the SoC is exactly that: the modem has its own RAM and its own peripherals, and gets no access to the "app processor" (the main CPU running the OS) side of the system.

                Comment


                • #18
                  Sapphire Rapids does have a DSA, according to recent slide leaks.

                  There is also an Oct 2020 detailed spec available at this link

                  Comment


                  • #19
                    Originally posted by jayN View Post
                    Sapphire Rapids does have a DSA, according to recent slide leaks.

                    There is also an Oct 2020 detailed spec available at this link
                    https://software.intel.com/content/w...ification.html
                    Cool. Thanks for sharing!

                    I have to wonder how much of that can just be handled by a few CPU threads. With CPUs having SMT and so many cores, we don't need things like DMA engines anymore. Sure, it's a bit of a waste to burn a big CPU core on that stuff, but it's a win for programmability.

                    If I had to choose between an Intel CPU with those engines but fewer cores, or an AMD/ARM CPU with more cores for the same or less money, my choice wouldn't be the Intel CPU.

                    Comment


                    • #20
                      Originally posted by coder View Post
                      With CPUs having SMT and so many cores, we don't need things like DMA engines anymore.
                      A couple of interesting features in there... it handles Optane, has an operation for flushing caches, and can create and apply delta records.

                      Intel also added CXL on Sapphire Rapids. It has biased cache coherency, but there may need to be some DMA transfers between processor cache and accelerator memory when the bias is flipped. I wonder if they plan to use the DSA to do those transfers.

                      The operations for creating and applying delta records are interesting, too. Perhaps they can be used to minimize writes to NVM.

                      Comment
