Intel Nehalem vs. Ice Lake Benchmarks - Including Clock + Power + Thermal Metrics

  • #11
    Originally posted by stormcrow View Post
    If you're doing anything other than ML algorithms, apparently so.
    Is that a general statement or just for the Intel MKL-DNN (now DNNL) code?

    I wonder if the MKL_DEBUG_CPU_TYPE=5 environment variable (enable AVX on non-Intel CPUs) works for the DNN library?
    Test signature

    • #12
      Originally posted by bridgman View Post

      Is that a general statement or just for the Intel MKL-DNN (now DNNL) code?

      I wonder if the MKL_DEBUG_CPU_TYPE=5 environment variable (enable AVX on non-Intel CPUs) works for the DNN library?
      According to the following reddit post it should work everywhere where this library is used, but I cannot test it myself: https://www.reddit.com/r/matlab/comm...depath_on_amd/
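For anyone who wants to try the MKL_DEBUG_CPU_TYPE=5 override from Python, here is a minimal sketch. It assumes the variable must already be in the environment when MKL is loaded, which is why it is set for a child process rather than the running one. The helper names `with_mkl_debug_cpu_type` and `run_with_override` are hypothetical and not part of MKL, and whether a particular MKL or DNNL build honors the variable at all is untested here.

```python
import os
import subprocess


def with_mkl_debug_cpu_type(env=None, cpu_type="5"):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE set.

    The original mapping is left untouched. The value "5" is the one
    quoted in the thread; it is passed through as a string because
    environment variables are always strings.
    """
    new_env = dict(os.environ if env is None else env)
    new_env["MKL_DEBUG_CPU_TYPE"] = cpu_type
    return new_env


def run_with_override(argv):
    """Run a command (e.g. a benchmark script) with the override applied.

    argv is a list such as ["python3", "my_benchmark.py"], where
    my_benchmark.py is a placeholder for whatever MKL-backed workload
    you want to time.
    """
    return subprocess.run(argv, env=with_mkl_debug_cpu_type(), check=False)
```

Timing the same workload once through `run_with_override` and once without it would show whether the override has any effect on a given machine.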

      I probably mentioned it to you before, and it is unrelated to MKL-DNN, but not enabling the fast path on AMD CPUs in glibc is really in the same category: https://sourceware.org/ml/libc-alpha.../msg00155.html

      With these sorts of artificial crippling out of the way, the performance on AMD can only get even better.

      • #13
        Originally posted by stormcrow View Post
        If you're doing anything other than ML algorithms, apparently so.
        Deep learning really belongs on GPUs and purpose-built chips. I'd bet even the iGPU in that Ice Lake chip out-performs its AVX-512 path.

        • #14
          I have a 'real' Nehalem (Lynnfield): X3470, 4c/8t, 2.93 GHz / 3.6 GHz turbo.
          It is not that bad.

          • #15
            Originally posted by nuetzel View Post
            I have a 'real' Nehalem (Lynnfield): X3470, 4c/8t, 2.93 GHz / 3.6 GHz turbo.
            It is not that bad.
            Maybe because the performance increase over the last decade was not so substantial, due to the lack of competition. Luckily the game is changing thanks to the Zen architecture and chiplet design. Otherwise we would still have a 4-core baseline with an average 5% performance increase from each "new" platform. If Nehalem is performing well, it is a bad sign for current-gen CPUs...

            • #16
              What was most interesting to me is that the only thing that really changed was clock speed and new instructions. Only 2MB+ cache in 10 years and still 4 cores. Really shows how Moore's law is slowing down. I'm ready for 16 core laptop processors at least considering we have 64 core server parts.

              • #17
                As someone who is still running one of these old chips (stock i7 920) it's nice to see a rare article showing how they stack up against current generation CPUs. I only recall one other article in recent years. Thanks Michael.

                • #18
                  Originally posted by kylew77 View Post
                  What was most interesting to me is that the only thing that really changed was clock speed and new instructions.
                  WTF? No.

                  Let's start by looking at some more numbers.

                  Parameter                    Nehalem   Ice Lake   Ice Lake as % of Nehalem
                  L2 TLB (entries)             512       2048       400%
                  Load Buffer (entries)        48        72         150%
                  Store Buffer (entries)       32        128        400%
                  μOp Cache (entries)          -         2.25k
                  Instruction Decoders         4         5          125%
                  Execution Ports              6         10         167%
                  Reorder Buffer (entries)     128       352        275%
                  L1 DCache (kB)               32        48         150%
                  L1 DCache (associativity)    8         12         150%
                  L2 Cache (kB)                256       512        200%
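The last column is just the Ice Lake figure divided by the Nehalem figure, expressed as a percentage. Here is a quick sketch to reproduce it, using only the numbers from the table; the name `as_percent_of` is purely illustrative.

```python
# (Nehalem, Ice Lake) pairs copied from the table in the post.
# The uOp cache row is left out because Nehalem has no value to divide by.
PARAMS = {
    "L2 TLB (entries)": (512, 2048),
    "Load Buffer (entries)": (48, 72),
    "Store Buffer (entries)": (32, 128),
    "Instruction Decoders": (4, 5),
    "Execution Ports": (6, 10),
    "Reorder Buffer (entries)": (128, 352),
    "L1 DCache (kB)": (32, 48),
    "L1 DCache (associativity)": (8, 12),
    "L2 Cache (kB)": (256, 512),
}


def as_percent_of(old, new):
    """Return `new` as a rounded percentage of `old` (200 means doubled)."""
    return round(100 * new / old)


for name, (nehalem, ice_lake) in PARAMS.items():
    pct = as_percent_of(nehalem, ice_lake)
    print(f"{name:28s} {nehalem:5d} {ice_lake:5d} {pct:4d}%")
```

Read this way, 400% means four times the size, i.e. a 300% increase over the Nehalem value.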


                  I've yet to find the size of their shadow register files (or actually a number of detailed μArch parameters of Ice Lake), but I'm sure that scaled up, as well.

                  Of course, the numbers don't tell the whole story. A lot of sophistication has been added to various aspects of the μArch, including things like μOp fusion, branch prediction, etc.

                  As for "new instructions" - that scarcely hints at the nature and extent of the functional differences between these cores. Under that rubric sits (listed by cpuid flag, with major extensions in bold):
                  And that's just up to Skylake - the newest generation I have. Ice Lake will also have AVX-512, specifically: F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, and VAES.
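To check which of those Ice Lake extensions a given Linux machine reports, something like the following sketch should do. It assumes a Linux x86 system where /proc/cpuinfo has a "flags" line, and the mapping from the names used in the post to the kernel's flag spellings (e.g. `avx512_vbmi2`) is my own reading of how Linux names them, so treat it as an assumption to verify against your own cpuinfo output.

```python
from pathlib import Path

# Names as written in the post -> flag spelling assumed for /proc/cpuinfo.
ICE_LAKE_FLAGS = {
    "F": "avx512f",
    "CD": "avx512cd",
    "VL": "avx512vl",
    "DQ": "avx512dq",
    "BW": "avx512bw",
    "IFMA": "avx512ifma",
    "VBMI": "avx512vbmi",
    "VBMI2": "avx512_vbmi2",
    "VPOPCNTDQ": "avx512_vpopcntdq",
    "BITALG": "avx512_bitalg",
    "VNNI": "avx512_vnni",
    "VPCLMULQDQ": "vpclmulqdq",
    "GFNI": "gfni",
    "VAES": "vaes",
}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags):
    """Return the post's extension names whose flag is not in `flags`."""
    return sorted(name for name, flag in ICE_LAKE_FLAGS.items()
                  if flag not in flags)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_extensions(parse_flags(cpuinfo.read_text()))
        print("Not reported by this CPU:", ", ".join(missing) or "none")
```

On a Skylake client chip you would expect the whole list to come back as missing, since none of those parts expose AVX-512.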

                  Of particular note, AVX/AVX2 widens the SIMD units and registers from 128 bits to 256. AVX-512 obviously doubles this, again.

                  Originally posted by kylew77 View Post
                  Only 2MB+ cache in 10 years and still 4 cores.
                  The comparison is misleading. The Nehalem CPU he tested was a high-end part without integrated graphics, while the Ice Lake CPU is a mid/low-end SoC with an iGPU and more (see below).

                  Because 10 nm still can't deliver comparable clock speeds, Intel is keeping the performance segment at 14 nm, for the time being. That's why Ice Lake only goes up to quad-core, but you can already get 6-core mobile chips in the Comet Lake series.

                  You're also missing the fact that Ice Lake chips have up to 64 EU GPUs, whereas the previous limit for the mainstream was 24 EUs. In contrast, Nehalem never had an on-die GPU, but the Clarkdale chips had a dual-core CPU with a 12 EU GPU on a separate die. Don't forget that it was also a much more primitive GPU, with no media acceleration (as QuickSync video acceleration was only introduced in Sandy Bridge) and supporting only D3D 10.1 and OpenGL 2.1.

                  Ice Lake also has a dedicated neural processor, which Intel calls the GNA (Gaussian & Neural Accelerator). It has a lot else as well: much more sophisticated clock gating and power management, plus Thunderbolt integration.

                  To summarize, here's a list of all the additional blocks Ice Lake has that you won't find in the Nehalem used for these benchmarks:
                  • Gen11 iGPU w/ media encode/decode acceleration
                  • GNA neural accelerator
                  • Image Processor (4th gen)
                  • Thunderbolt 3
                  • Integrated PCH with:
                    • Wi-Fi 6 (Gig+)
                    • Audio DSP
                    • 6x USB 3.1
                    • PCIe 3.0 x16
                    • 3x SATA-3
                    • eMMC 5.1


                  Originally posted by kylew77 View Post
                  Really shows how Moore's law is slowing down. I'm ready for 16 core laptop processors at least considering we have 64 core server parts.
                  45 nm vs. 10 nm is nominally a 20x density improvement. Sure, it should be about 32x in 10 years, but you seriously overstate your case.
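The arithmetic behind those two figures, as a small sketch. It takes the node names at face value as linear feature sizes and assumes a doubling every two years for the Moore's law projection; both are simplifications, and the function names are just for illustration.

```python
def nominal_density_gain(old_nm, new_nm):
    """Area scales with the square of the linear feature size."""
    return (old_nm / new_nm) ** 2


def moores_law_gain(years, doubling_period_years=2.0):
    """Expected transistor-count multiple if density doubles every period."""
    return 2 ** (years / doubling_period_years)


print(nominal_density_gain(45, 10))  # (45 / 10)^2 = 20.25, the "20x"
print(moores_law_gain(10))           # 2^(10 / 2) = 32, the "32x"
```

So the shortfall against the projection is roughly 32 / 20.25, or about 1.6x, not an order of magnitude.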

                  As for where the 20x transistor budget went, consider the following:
                  • 10 nm wafers are certainly more expensive than the old 45 nm wafers used by Nehalem, so you can't assume constant area.
                  • Deeper, wider, more sophisticated cores mean you don't get linear scaling of core count.
                  • 512-bit AVX registers & arithmetic take a huge amount of area.
                  • Many specialized processing blocks (see above).
                  • This is a lower-end chip - Intel's roadmap shows 26-core Ice Lake server chips, arriving early next year.

                  That said, if the 10 nm manufacturing process were performing as Intel originally hoped, you'd probably be seeing 8 and 10 core Ice Lake chips for higher-end laptops.

                  References:
                  1. https://www.anandtech.com/show/14514...and-sunny-cove
                  2. https://www.anandtech.com/show/2594 (Nehalem - Everything You Need to Know about Intel's New Architecture)
                  3. https://www.anandtech.com/show/2663 (Nehalem: The Unwritten Chapters)
                  4. https://www.anandtech.com/show/2671 (Nehalem Part 3: The Cache Debate, LGA-1156 and the 32nm Future)
                  5. https://www.anandtech.com/show/2901 (The Clarkdale Review: Intel's Core i5 661, i3 540 & i3 530)
                  6. https://en.wikichip.org/wiki/intel/m...e_lake_(client)
                  7. https://en.wikichip.org/wiki/intel/m...ehalem_(client)
                  8. https://en.wikipedia.org/wiki/Ice_Lake_(microprocessor)
                  9. https://en.wikipedia.org/wiki/Nehale...roarchitecture)
                  10. https://en.wikipedia.org/wiki/List_o...1st_Generation)
                  11. https://www.realworldtech.com/nehalem/
                  12. https://www.agner.org/optimize/microarchitecture.pdf
                  Last edited by coder; 28 November 2019, 09:41 PM.

                  • #19
                    coder, you are citing Agner Fog, that's great. IMHO his work is not known enough.

                    • #20
                      Originally posted by CochainComplex View Post
                      coder, you are citing Agner Fog, that's great. IMHO his work is not known enough.
                      Yeah, although for the purpose of that post, the Real World Tech article was actually a better resource. In fact, I later noticed that he even cited it at the end of his Core 2/Nehalem section.

                      There were 8 more parameters that I found for Nehalem, but not Ice Lake. I'm sure part of that is due to Ice Lake's newness, but I think Intel is less forthcoming with details than it used to be. It also doesn't help that Ice Lake is mostly targeted at the mid-performance laptop market, so not very interesting for gamers & therefore attracting less attention.
