Embree 4.0 Is Running Well On Intel 4th Gen Xeon Scalable "Sapphire Rapids"

    Phoronix: Embree 4.0 Is Running Well On Intel 4th Gen Xeon Scalable "Sapphire Rapids"

    This week Intel released Embree 4.0, the newest version of its open-source, high-performance ray-tracing library. While the headline feature is new support for GPU acceleration via SYCL, targeting Arc Graphics and other SYCL-capable GPU hardware, performance has also improved for those who have long used Embree on CPUs. Here are some initial CPU-based benchmarks I ran this week on Embree 4.0 with Intel's new 4th Gen Xeon Scalable "Sapphire Rapids" processors.

  • #2
    @coder Here it is, more data proving that AVX-512 can deliver more performance with even less power consumption vs. AVX2 on Sapphire Rapids.

    • #3

      For comparison, this is how Genoa 9654 compares. AMD's AVX-512 is very powerful.

      2S Genoa 9654 vs 2S Sapphire Rapids 8490H - Embree 4.0 Benchmark Comparison

      image.png
      Last edited by nicalandia; 10 February 2023, 03:07 PM.

      • #4
        Interesting details on the Zen4 AVX512 implementation:



        It's more competent than I initially thought. Especially in the permutations department.

        • #5
          Originally posted by ms178
          @coder Here it is, more data proving that AVX-512 can deliver more performance with even less power consumption vs. AVX2 on Sapphire Rapids.
          You seem to be mischaracterizing my position.
          1. I never said AVX-512 isn't a performance win for vector-intensive workloads. So, the point about "more performance" was never in contention for these benchmarks. Or, quite frankly, any benchmarks Michael would tend to include in these articles, because he intentionally picks vector-intensive workloads for them.
          2. The power numbers aren't appreciably different. The difference is only 1.2%. However, the idea that it's doing more work using any less power is something that should provoke questions, unless one is being heavily partisan.
          3. The downsides of AVX-512 are certainly less on newer implementations than they were on the old 14 nm CPUs. That's not to say there's still no workload for which AVX-512 is a net-negative. We don't know that, because Michael hasn't run tests which could tease out such a case.

          I'd be happy to discuss why I think it's using a little less power, but your tone gives me some doubt about whether it'd be worthwhile.
          Last edited by coder; 15 February 2023, 06:31 AM.

          • #6
            Originally posted by nicalandia
            For comparison, this is how Genoa 9654 compares. AMD's AVX-512 is very powerful.

            2S Genoa 9654 vs 2S Sapphire Rapids 8490H - Embree 4.0 Benchmark Comparison

            image.png
            True. However, I'd just point out that Genoa got 0.985 points per core, whereas Sapphire Rapids got 1.041 points per core, making it 5.7% faster per-core. Now, maybe Genoa would've come out ahead with a similar core count, or maybe Sapphire Rapids' wider implementation would win the day... in the end, it's a minor point. Perhaps more useful would be to compare them in perf/$ and perf/W, where I expect Genoa would still maintain a definitive lead.
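
            The per-core arithmetic above can be checked with a short sketch. The two per-core scores are the ones quoted in this post; the core counts (96 per socket for the EPYC 9654, 60 per socket for the Xeon Platinum 8490H) and the back-calculated aggregate scores are assumptions added here for illustration, not figures read off the benchmark chart.

```python
# Per-core scores as quoted in the post (points per core).
GENOA_PER_CORE = 0.985
SPR_PER_CORE = 1.041

# Assumed core counts for the two dual-socket systems (not stated in the post):
# EPYC 9654 = 96 cores per socket, Xeon Platinum 8490H = 60 cores per socket.
GENOA_CORES = 2 * 96
SPR_CORES = 2 * 60


def relative_advantage(a, b):
    """Fractional advantage of a over b (0.05 means a is 5% higher)."""
    return a / b - 1.0


# Sapphire Rapids' per-core lead over Genoa.
per_core_gain = relative_advantage(SPR_PER_CORE, GENOA_PER_CORE)

# Aggregate scores implied by per-core score x core count.
# These are back-calculated for illustration, not measured values.
genoa_total = GENOA_PER_CORE * GENOA_CORES
spr_total = SPR_PER_CORE * SPR_CORES
aggregate_gain = relative_advantage(genoa_total, spr_total)

print(f"Sapphire Rapids per-core advantage: {per_core_gain:.1%}")
print(f"Genoa implied aggregate advantage:  {aggregate_gain:.1%}")
```

            Under those assumed core counts, the sketch shows how a system can trail per core and still lead in aggregate simply by having more cores, which is why perf/$ and perf/W are the more telling comparisons.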

            • #7
              Originally posted by kobblestown
              Interesting details on the Zen4 AVX512 implementation:



              It's more competent than I initially thought. Especially in the permutations department.
              Here, also (linked from the bottom of that post, but maybe some won't read to the bottom):

              • #8
                Originally posted by coder
                I'd be happy to discuss why I think it's using a little less power, but your tone gives me some doubt about whether it'd be worthwhile.
                No offense intended on my part. I summoned you simply because Michael provided more evidence of the quality of the Sapphire Rapids AVX-512 implementation, which backed up my optimism about it; maybe you had not noticed that article yourself. After all, busy days can prevent people from reading every Phoronix article.

                I still remember our past argument very well. While there might be AVX-512 workloads out there which behave differently, we won't know that until such evidence is presented. But at least in the case of Embree (along with a significant number of other AVX-512-heavy benchmarks over the last few months), I cannot recall seeing any regression in the data. We at least seem to largely agree on the improved usefulness of that "new" ISA in newer CPU implementations.

                But as you still cast some doubt on other AVX-512 workloads, I'd like to see at least some data, or benchmarks of interest to you, that might back that up. And by regression, I mean significantly regressing in one of these metrics, as the 14nm parts did. As Michael cannot test every AVX-512 application out there, I'd say the tests presented so far are good enough for me to generalize the statement that AVX-512 implementations are now bringing real benefits without regressing either performance or power usage. For the longevity of a CPU purchase today, this feature becomes even more relevant on Linux as x86-64-v4 repos sooner or later will use AVX-512 over a wider range of packages. Don't hesitate to summon me if data to the contrary becomes available at some point in the future. I am a fan of facts and data after all, not speculation or theories that aren't backed by sufficient evidence.

                • #9
                  Originally posted by ms178
                  No offense intended on my part. I summoned you simply because Michael provided more evidence of the quality of the Sapphire Rapids AVX-512 implementation, which backed up my optimism about it;
                  The data he presented indeed looks positive.

                  Originally posted by ms178
                  maybe you had not noticed that article yourself. After all, busy days can prevent people from reading every Phoronix article.
                  I've indeed fallen behind.
                  😩

                  Originally posted by ms178
                  We at least seem to largely agree on the improved usefulness of that "new" ISA in newer CPU implementations.
                  Each new generation should lessen the downsides. The exception might come if Intel or AMD decide to significantly widen their implementation. Now that Intel has branched out to AMX, I wonder whether they'll spend more die area on widening their AVX-512 pipeline or just leave well enough alone.

                  Originally posted by ms178
                  But as you still cast some doubt on other AVX-512 workloads, I'd like to see at least some data, or benchmarks of interest to you, that might back that up. And by regression, I mean significantly regressing in one of these metrics, as the 14nm parts did.
                  Michael could add a test case to PTS (or maybe find an existing one) like what's described in this article:

                  That would be very illuminating to see how performance of encrypted web-serving, using an AVX-512 accelerated cipher, evolved from Intel's 14 nm server CPUs, through Ice Lake, and now Sapphire Rapids. Ideally, Milan and Genoa, too.

                  The first step would be to identify a workload that regresses with AVX-512 on Skylake SP. After that article was written, I wonder if OpenSSL might've disabled their AVX-512 codepath, or maybe just on those first-gen AVX-512 CPUs.

                  Originally posted by ms178
                  For the longevity of a CPU purchase today, this feature becomes even more relevant on Linux as x86-64-v4 repos sooner or later will use AVX-512 over a wider range of packages.
                  ISA feature levels & builtin runtime detection are more likely avenues for its use. Either way, you're right that the wins we're seeing on newer CPUs & Zen 4's mainstream support for it will motivate more people to use it.
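
                  As a rough illustration of runtime detection, here is a minimal sketch that picks a code path from the CPU flags Linux reports in /proc/cpuinfo. The function names and the kernel labels ("avx512", "avx2", "scalar") are hypothetical, and real libraries typically do this in C via CPUID or compiler builtins rather than by parsing text. The flag set checked is the AVX-512 subset that the x86-64-v4 feature level adds on top of v3.

```python
# AVX-512 extensions that the x86-64-v4 feature level adds on top of v3,
# spelled the way Linux lists them in /proc/cpuinfo.
V4_AVX512_FLAGS = {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}


def parse_cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_host_flags(path="/proc/cpuinfo"):
    """Read the host's CPU flags; returns an empty set if the file is unavailable."""
    try:
        with open(path, encoding="utf-8") as fh:
            return parse_cpu_flags(fh.read())
    except OSError:
        return set()


def pick_kernel(flags):
    """Choose a (hypothetical) kernel label based on the available CPU flags."""
    if V4_AVX512_FLAGS <= flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


print("selected code path:", pick_kernel(read_host_flags()))
```

                  The point of dispatching this way is that one binary can ship all three paths and still run on CPUs without AVX-512, whereas an x86-64-v4 package simply won't run there.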

                  Originally posted by ms178
                  Don't hesitate to summon me if data to the contrary becomes available at some point in the future. I am a fan of facts and data after all, not speculation or theories that aren't backed by sufficient evidence.
                  😎
