Mesa 20.3 Lands Rewritten AMD Zen L3 Cache Optimization

    Phoronix: Mesa 20.3 Lands Rewritten AMD Zen L3 Cache Optimization

    You may recall that, going back to 2018, well-known open-source AMD Mesa driver developer Marek Olsak was working on Mesa optimizations around the AMD Zen architecture, in particular on better handling of Zen's L3 cache design in Mesa. A rewritten implementation of that has now landed, along with some other improvements...

  • #2
    Is Smart Access Memory (Resizable BAR support) planned for Linux?
    ## VGA ##
    AMD: X1950XTX, HD3870, HD5870
    Intel: GMA45, HD3000 (Core i5 2500K)

    • #3
      Originally posted by darkbasic
      Is Smart Access Memory (Resizable BAR support) planned for Linux?
      Phoronix: Linux Support Expectations For The AMD Radeon RX 6000 Series Lisa Su is about to begin the presentation unveiling the much anticipated Radeon RX 6000 "Big Navi" (RDNA 2) graphics cards. This article will be updated live as the event progresses but first up let's recap the current Linux open-source driver

      • #4
        Michael, are any benchmarks planned? Some Ryzen 3 and 5 parts, up to maybe a Threadripper, plus some Intel Xeons and a non-Xeon, to see if those benefit as well?

        Edit: It would be funny if the Zen code even improved Intel's performance. Good guy AMD xD
        Last edited by CochainComplex; 30 October 2020, 08:15 AM.

        • #5
          Marek is one of the reasons I intend to buy an AMD GPU again.

          • #6
            Originally posted by schmidtbag
            Marek is one of the reasons I intend to buy an AMD GPU again.
            The Mesa driver stack for AMD is indeed a great piece of work.

            • #7
              Hmm... it would be nice to see a benchmark across the various Zen generations. Here's why. With Zen 3, AMD is combining the once-separate blocks of L3 cache into one block for all the cores. With Zen 2 you typically had two 16 MB blocks of L3 cache; Zen 3 has one unified 32 MB block... and more with certain variants.

              So, what I would like to see is:

              A: What performance difference does this Mesa change produce on its own, regardless of the L3 cache design, across the Zen generations from Zen 1 to Zen 3?

              B: Does the unified L3 cache in Zen 3 produce some "multiplier" effect on this Mesa code change? In other words, if we were to see, say, a 5% uplift in performance just from the unified L3 design in Zen 3 over Zen 2, would we then see a 10% rise simply by applying this Mesa optimization to the unified L3 cache design?

              • #8
                Originally posted by Jumbotron
                With Zen 3, AMD is combining the once-separate blocks of L3 cache into one block for all the cores.
                No, they are doubling its size: from 4 cores per block to 8, and with 32 MB of cache instead of 16 MB. Yes, now you can have an 8-core CPU on one block instead of two, but maybe now it's time for a 16-core CPU instead of 8.
                Originally posted by Jumbotron
                B: Does the unified L3 cache in Zen 3 produce some "multiplier" effect on this Mesa code change?
                You mean a twice-as-big L3 cache. Yes, a twice-as-big cache is better.

                • #9
                  Originally posted by pal666
                  No, they are doubling its size: from 4 cores per block to 8, and with 32 MB of cache instead of 16 MB. Yes, now you can have an 8-core CPU on one block instead of two, but maybe now it's time for a 16-core CPU instead of 8.
                  You mean a twice-as-big L3 cache. Yes, a twice-as-big cache is better.
                  I don't think you got what I was saying. The L3 is not actually increasing or doubling. Zen 2 has 32 MB of L3 cache divided into two 16 MB blocks. In Zen 3 it will be a unified block of 32 MB.

                  Secondly, a twice-as-big cache, or whatever the size increase, isn't necessarily better if you don't have enough data to fill it in the first place. But what going from two blocks to a single block DOES get you is fewer hops between each core and the cache, since there is now only one block for ALL cores to access whenever they need it.

                  Now, look at my questions again.

                  • #10
                    Originally posted by Jumbotron

                    I don't think you got what I was saying. The L3 is not actually increasing or doubling. Zen 2 has 32 MB of L3 cache divided into two 16 MB blocks. In Zen 3 it will be a unified block of 32 MB.

                    Secondly, a twice-as-big cache, or whatever the size increase, isn't necessarily better if you don't have enough data to fill it in the first place. But what going from two blocks to a single block DOES get you is fewer hops between each core and the cache, since there is now only one block for ALL cores to access whenever they need it.

                    Now, look at my questions again.
                    I'm not really sure what you're getting at.

                    This code only pins threads if it detects a CPU with multiple L3 caches, so it should have no effect on anything but (most) Zen 2 and (a smaller number of) Zen 3 models.

                    As for the Zen 3 cache change, if anything I'd say it will probably reduce the benefits of this, simply because the scheduler isn't going to guess wrong as often on a Zen 3 chip with 2 L3 caches as it would on a Zen 2 chip with 4 L3 caches. Beyond that, if you take the randomness of the scheduler out of it, I suppose the bigger L3 cache would certainly help when more active threads are pinned to it. I feel like you know that already, though, so I may be missing the point of your question.
                    Last edited by smitty3268; 30 October 2020, 09:43 PM.
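The "only pin threads when there are multiple L3 caches" behaviour described in the reply above can be illustrated with a minimal Python sketch for Linux. It groups CPUs by the L3 cache they share, read from the kernel's sysfs cache entries (`/sys/devices/system/cpu/cpuN/cache/indexM/level` and `shared_cpu_list`), and pins the calling thread to one L3 group only when more than one group exists. This is not Mesa's actual implementation, which is C code inside Mesa itself; the function names `parse_cpu_list`, `l3_domains` and `pin_to_l3_of` are hypothetical and exist only for this illustration.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Return one frozenset of CPU ids per distinct L3 cache found in sysfs."""
    domains = set()
    try:
        cpu_dirs = os.listdir(sysfs_root)
    except OSError:
        return []  # no sysfs (not Linux, or sandboxed): report no L3 info
    for name in cpu_dirs:
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue  # skip cpufreq, cpuidle, etc.
        cache_dir = os.path.join(sysfs_root, name, "cache")
        try:
            indices = os.listdir(cache_dir)
        except OSError:
            continue
        for idx in indices:
            if not idx.startswith("index"):
                continue
            try:
                with open(os.path.join(cache_dir, idx, "level")) as f:
                    if f.read().strip() != "3":
                        continue  # L1/L2 entries are not relevant here
                with open(os.path.join(cache_dir, idx, "shared_cpu_list")) as f:
                    cpus = parse_cpu_list(f.read())
            except OSError:
                continue
            if cpus:
                # Every CPU sharing an L3 reports the same list; the set de-duplicates.
                domains.add(frozenset(cpus))
    return sorted(domains, key=min)


def pin_to_l3_of(cpu, domains):
    """Pin the calling thread to the L3 group containing `cpu`.

    Does nothing when there is at most one L3 group, mirroring the
    "only pin on multi-L3 CPUs" behaviour described in the thread.
    """
    if len(domains) <= 1 or not hasattr(os, "sched_setaffinity"):
        return False
    for dom in domains:
        if cpu in dom:
            try:
                os.sched_setaffinity(0, dom)  # 0 = the caller (Linux only)
            except OSError:
                return False
            return True
    return False
```

Calling `pin_to_l3_of(0, l3_domains())` would confine the caller to the L3 group containing CPU 0 on a part with several L3 caches and leave scheduling untouched on a part with a single shared L3, which is why such an optimization would be expected to matter less as the number of L3 caches per chip goes down.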
