Intel Continues Optimizing Linux Memory Placement For Optane DC Persistent Memory

  • Intel Continues Optimizing Linux Memory Placement For Optane DC Persistent Memory

    Phoronix: Intel Continues Optimizing Linux Memory Placement For Optane DC Persistent Memory

    With a new patch series for the Linux kernel, memory access performance by one measurement can improve by 116% on a dual socket Intel server with Optane DC Persistent Memory...

    http://www.phoronix.com/scan.php?pag...nt-Optane-DCPM

  • #2
    Optane is ok I guess. I'd like to see spinning platter drives with an NVMe interface. That would be really sweet.


    • #3
      I am not a hardware guy. So I know this question might expose real naivete.

      DRAM can currently be clocked at nearly the same frequency (transfer rate, I suppose) as the CPU itself. By which I mean I am typing this on a PC with a 3800 MHz CPU and DDR4-3600 DRAM. The days of a double-pumped 100 MHz interface between CPU and chipset, and thus DRAM, are long, long gone. The path between registers and DRAM is shorter and faster now than it used to be. So why do we need even more levels of cache?

      Optane qua persistent memory is great. As a replacement for flash SSD storage it is an absolute killer product and it cannot replace flash fast enough IMO*. Inserting it as a cache between DRAM and CPU seems like adding unneeded complexity.

      *) for the lawyers: the day when Optane can compete on price and resilience with flash SSD cannot come soon enough. I know it has price and resilience issues compared to flash now.
      Last edited by hoohoo; 02-18-2020, 11:56 AM.


      • #4
        Originally posted by hoohoo View Post
        DRAM can currently be clocked at nearly the same frequency (transfer rate, I suppose) as the CPU itself. By which I mean I am typing this on a PC with a 3800 MHz CPU and DDR4-3600 DRAM. The days of a double-pumped 100 MHz interface between CPU and chipset, and thus DRAM, are long, long gone.
        Actually, that's not the case. The "3600" in DDR4-3600 is not 3600 MHz; it is 3600 million transfers per second. Original DDR, as its name implies, did two transfers per clock ("double pumped", as you say). DDR2 increased that to four data transfers per internal memory clock ("quad pumped"), so the memory array in DDR2-800 actually ran at 200 MHz. Likewise, the real internal clock rate of the latest DDR4 is a fraction of the advertised "marketing" rate: the DRAM array in DDR4-3200 runs at 400 MHz, and even the external bus clock is only 1600 MHz. The CPU, on the other hand, actually does run at the specified frequency.
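
To make the naming arithmetic concrete, here is a minimal Python sketch. It assumes the standard prefetch depths (2n for DDR, 4n for DDR2, 8n for DDR3 and DDR4) and two transfers per bus clock; the function name `ddr_clocks` is made up for illustration. The "real clock" figures quoted in the post correspond to the internal array clock, not the bus clock.

```python
# Prefetch depth: data transfers per internal memory-array clock.
PREFETCH = {"DDR": 2, "DDR2": 4, "DDR3": 8, "DDR4": 8}


def ddr_clocks(generation: str, transfer_rate_mts: int) -> dict:
    """Return the I/O bus clock and internal array clock, both in MHz.

    transfer_rate_mts is the marketing number, e.g. 3600 for DDR4-3600.
    """
    io_clock = transfer_rate_mts / 2  # two transfers per bus clock (rising + falling edge)
    array_clock = transfer_rate_mts / PREFETCH[generation]
    return {"io_clock_mhz": io_clock, "array_clock_mhz": array_clock}


for gen, rate in [("DDR2", 800), ("DDR4", 3200), ("DDR4", 3600)]:
    clocks = ddr_clocks(gen, rate)
    print(f"{gen}-{rate}: bus {clocks['io_clock_mhz']:.0f} MHz, "
          f"array {clocks['array_clock_mhz']:.0f} MHz")
```

Either way you count it, none of these clocks comes close to the 3800 MHz the CPU runs at.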

        Originally posted by hoohoo View Post
        The path between registers and DRAM is shorter and faster now than it used to be. So why do we need even more levels of cache?
        Yes DRAM is faster than before, but caches are faster still, and not by a small amount. I think memtest86 will show you the actual transfer speed on your system from L1, L2, L3, and main memory. You'll be shocked to see how much faster transfers are from CPU cache than from main memory.
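
A rough way to see why the cache levels matter is the usual "average memory access time" model. The sketch below is only illustrative: the hit rates and latencies are hypothetical round numbers, not measurements from any real CPU, and `average_access_time_ns` is a name invented for this example.

```python
def average_access_time_ns(levels, memory_latency_ns):
    """Average access time for a cache hierarchy backed by main memory.

    levels: list of (hit_rate, latency_ns) tuples, ordered from L1 outward.
    Every access that reaches a level pays that level's latency; the
    fraction that misses moves on to the next level, and finally to DRAM.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency_ns


# Hypothetical numbers, chosen only to illustrate the shape of the result.
caches = [(0.95, 1.0), (0.80, 4.0), (0.50, 12.0)]  # L1, L2, L3
dram_ns = 80.0

with_caches = average_access_time_ns(caches, dram_ns)
without_caches = average_access_time_ns([], dram_ns)
print(f"with caches: {with_caches:.2f} ns, without: {without_caches:.2f} ns")
```

With these made-up inputs the cached case is dozens of times faster on average, which is the same effect the memtest86 numbers show on real hardware.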

        Edit: This is actually a trick that the retro-computing crowd uses to run old games that are sensitive to CPU speed. Many games from the 1980s have their execution speed tied to the CPU clock, so running them on a Pentium III, for example, will result in the game running "too fast" and being unplayable. The solution the retro folks found is to completely disable all caches in the BIOS, so that the CPU can only interact with main memory and uses no onboard cache. This effectively makes a Pentium III run at a 486-like performance level. So you can see what a tremendous benefit the CPU cache has on execution speed.
        Last edited by torsionbar28; 02-18-2020, 12:09 PM.


        • #5
          Originally posted by torsionbar28 View Post
          Actually, that's not the case. The "3600" in DDR4 is not actually 3600 MHz. It's 3600 "equivalent". Just like DDR2-800 actually ran at 200 MHz, modern DDR4 real clock rate is a fraction of the advertised "marketing" rate. CPU on the other hand, actually does run at the specified frequency.


          Yes DRAM is faster than before, but caches are faster still, and not by a small amount. I think memtest86 will show you the actual transfer speed on your system from L1, L2, L3, and main memory. You'll be shocked to see how much faster transfers are from CPU cache than from main memory.
          Thanks torsionbar28. So L1, L2, L3 on the CPU chip are still valuable. I've seen the results from memtest86 also. But why would one want to use something like Optane as a cache for DRAM? Unless I'm misunderstanding Michael's article that's what Intel wants us to do.


          • #6
            Originally posted by hoohoo View Post
            Thanks torsionbar28. So L1, L2, L3 on the CPU chip are still valuable. I've seen the results from memtest86 also. But why would one want to use something like Optane as a cache for DRAM? Unless I'm misunderstanding Michael's article that's what Intel wants us to do.
            The article is not about the Intel Optane SSD used for storage. It's a different product, called "Optane DC Pmem". The former is for data storage, the latter is a new primary memory (RAM) technology. It is unusual and unique in that it sits on the memory bus, but acts as another tier between RAM and storage.

            Edit: here's a photo so you can see what it looks like:
            https://www.storagereview.com/intel_...ry_module_pmm/
            Last edited by torsionbar28; 02-18-2020, 12:18 PM.


            • #7
              Originally posted by torsionbar28 View Post
              The article is not about the Intel Optane SSD used for storage. It's a different product, called "Optane DC Pmem". The former is for data storage, the latter is a new primary memory (RAM) technology. It is unusual and unique in that it sits on the memory bus, but acts as another tier between RAM and storage.

              Edit: here's a photo so you can see what it looks like:
              https://www.storagereview.com/intel_...ry_module_pmm/
              Thanks. I had it backwards, I see now.


              • #8
                You're welcome. And yes, to be clear, this is a niche product for servers, not a technology we'll see in end-user devices (desktop or laptop) any time soon, if ever.


                • #9
                  Originally posted by hoohoo View Post
                  I am not a hardware guy. So I know this question might expose real naivete.

                  DRAM can currently be clocked at nearly the same frequency (transfer rate, I suppose) as the CPU itself. By which I mean I am typing this on a PC with a 3800 MHz CPU and DDR4-3600 DRAM. The days of a double-pumped 100 MHz interface between CPU and chipset, and thus DRAM, are long, long gone. The path between registers and DRAM is shorter and faster now than it used to be. So why do we need even more levels of cache?
                  For the same reason that a Pentium D running at 3.6 GHz (it's a dual-core) does not have anywhere near the same performance as a random modern dual-core CPU with the same clock speed (probably still called Pentium, I guess).

                  Hz is hertz; it measures frequency, that is, how many times "something" happens per second. Frequency alone is meaningless, as you don't know how much work is done each time "something" happens.

                  The modern CPU in the example can process MUCH more information per cycle than the Pentium D. Therefore it is faster in practice.

                  RAM frequency says how many times RAM chips do a full cycle per second, but the actual speed is how much information they can move around each cycle, multiplied by the cycles per second.
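
As a back-of-the-envelope check of "information per cycle times cycles per second", here is a small Python sketch. It assumes a single standard 64-bit (8-byte) memory channel; the helper name `peak_bandwidth_gb_s` is invented for this example, and the result is a theoretical peak rather than what a real system sustains.

```python
def peak_bandwidth_gb_s(transfers_per_second: float, bytes_per_transfer: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_second * bytes_per_transfer / 1e9


# DDR4-3600: 3600 million transfers per second on one 64-bit channel.
ddr4_3600 = peak_bandwidth_gb_s(3600e6, 8)

# Two channels double the width, not the frequency.
ddr4_3600_dual = peak_bandwidth_gb_s(3600e6, 16)

print(f"DDR4-3600, one channel:  {ddr4_3600:.1f} GB/s")
print(f"DDR4-3600, two channels: {ddr4_3600_dual:.1f} GB/s")
```

The same frequency gives twice the bandwidth once the bus is twice as wide, which is the point: the MHz figure alone tells you very little.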

                  I don't feel like looking up the actual speeds, but in practice the bandwidth (GB/s) that RAM can provide is still nowhere near that of an onboard high-speed SRAM cache sitting a few microns or (in the case of eDRAM, i.e. cache external to the die) even a few millimeters from the CPU core.

                  Also there is latency to take into account. Calling up information from a chip that is electrically at a few cm from the core incurs a higher wait time for the request to reach the chip, be processed and be sent over to the CPU than the same happening with a much closer cache. For the speeds we are talking about, even a few cm matter. Notice how RAM modules are always as close as possible to the CPU.
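
To put a number on "even a few cm matter", here is a small Python sketch of the wire delay alone. It assumes signals travel at roughly 15 cm per nanosecond on a board trace (about half the speed of light, a common rule of thumb, not a measured value), and it ignores the time the DRAM chip itself needs to respond, which in practice is much larger. The 3.8 GHz clock is the CPU mentioned earlier in the thread; the function names are made up.

```python
# Assumption: ~15 cm/ns signal speed on a PCB trace (roughly half of c).
SIGNAL_SPEED_CM_PER_NS = 15.0


def cycle_time_ns(clock_ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz


def round_trip_cycles(distance_cm: float, clock_ghz: float) -> float:
    """CPU cycles spent purely on signal travel, there and back."""
    round_trip_ns = 2 * distance_cm / SIGNAL_SPEED_CM_PER_NS
    return round_trip_ns / cycle_time_ns(clock_ghz)


for label, cm in [("on-die cache, ~10 microns", 0.001), ("DIMM slot, ~5 cm", 5.0)]:
    print(f"{label}: {round_trip_cycles(cm, 3.8):.4f} cycles of wire delay")
```

Under these assumptions a 5 cm trip costs a couple of whole CPU cycles before the memory chip has done any work, while the on-die distance is a negligible fraction of a cycle.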

                  Plus the cache used in the CPU is using a different memory technology, SRAM https://en.wikipedia.org/wiki/Static...-access_memory that is much faster than DRAM (what is used in RAM modules) just because it is a different kind of electrical circuit.
                  Last edited by starshipeleven; 02-18-2020, 12:45 PM.


                  • #10
                    Originally posted by starshipeleven View Post
                    For the same reason that a Pentium D running at 3.6 GHz (it's a dual-core) does not have anywhere near the same performance as a random modern dual-core CPU with the same clock speed (probably still called Pentium, I guess).

                    Hz is hertz; it measures frequency, that is, how many times "something" happens per second. Frequency alone is meaningless, as you don't know how much work is done each time "something" happens.

                    The modern CPU in the example can process MUCH more information per cycle than the Pentium D. Therefore it is faster in practice.

                    RAM frequency says how many times RAM chips do a full cycle per second, but the actual speed is how much information they can move around each cycle, multiplied by the cycles per second.

                    I don't feel like looking up the actual speeds, but in practice the bandwidth (GB/s) that RAM can provide is still nowhere near that of an onboard high-speed SRAM cache sitting a few microns or (in the case of eDRAM, i.e. cache external to the die) even a few millimeters from the CPU core.

                    Also there is latency to take into account. Calling up information from a chip that is electrically at a few cm from the core incurs a higher wait time for the request to reach the chip, be processed and be sent over to the CPU than the same happening with a much closer cache. For the speeds we are talking about, even a few cm matter. Notice how RAM modules are always as close as possible to the CPU.

                    Plus the cache used in the CPU is using a different memory technology, SRAM https://en.wikipedia.org/wiki/Static...-access_memory that is much faster than DRAM (what is used in RAM modules) just because it is a different kind of electrical circuit.
                    Just a note: The bandwidth in a single direction is limited by signal-to-noise ratio and is not limited to just short distances. For example, 10GBASE ethernet can transfer 10 Gbit/s over copper wires to a distance of 100 meters and with optical fiber to a distance of up to 80 kilometers. Theoretically, for linear predictable (non-random) memory accesses a DDR4-3600 RAM module could be located 1 kilometer away from the CPU and still sustain the DDR4-3600 read speed of about 28 GB/s.
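
The distinction between bandwidth and latency in this note can be sketched with the bandwidth-delay product. The Python below assumes a propagation speed of 2e8 m/s (about two thirds of the speed of light, a typical figure for cable or fiber, used here purely as an assumption) and takes 28.8 GB/s as the theoretical single-channel peak for DDR4-3600. The function names are hypothetical.

```python
# Assumption: signal propagation at ~2/3 the speed of light.
PROPAGATION_M_PER_S = 2.0e8


def round_trip_latency_s(distance_m: float) -> float:
    """Time for a request to reach the memory and the data to come back."""
    return 2 * distance_m / PROPAGATION_M_PER_S


def bytes_in_flight(bandwidth_bytes_per_s: float, distance_m: float) -> float:
    """Data that must be outstanding to keep the link fully busy."""
    return bandwidth_bytes_per_s * round_trip_latency_s(distance_m)


rtt = round_trip_latency_s(1000.0)            # RAM module 1 km away
in_flight = bytes_in_flight(28.8e9, 1000.0)   # DDR4-3600 peak, one channel

print(f"round trip: {rtt * 1e6:.1f} microseconds")
print(f"data in flight to sustain full bandwidth: {in_flight / 1e3:.0f} kB")
```

So the streaming rate survives the distance only if the access pattern is predictable enough to keep hundreds of kilobytes of requests outstanding; any access that has to wait for its answer pays the full round trip, which is why latency still favors memory that sits close to the core.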
