Mesa Looks At Switching To Jemalloc For Faster Performance


  • #21
    Originally posted by sdack

    Sorry, minor typo. Setting the variable to the allocator library will force it to link-load it before libc.
    Some minor notes about preloading jemalloc or any other malloc replacement globally.

    1.) If you are developing C/C++ code, jemalloc/tbbmalloc/tcmalloc can hide/resolve/fix allocations that will SIGSEGV on glibc malloc, so be sure to pass a blank preload to your run environment to verify it actually works on plain malloc.

    2.) More on 1: jemalloc and the like can sometimes automagically fix memory leaks due to the way their algorithms reuse memory blocks, so again be sure to check on glibc as well, or at least be absolutely sure jemalloc will always be available and the program will never fall back to glibc.

    3.) Valgrind and GDB can get very angry when this is preloaded, so always make sure you clear the preload variable before reporting bugs on them (yeah, been there, and it was no fun). A quick illustration of point 1 follows below.
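
    To make point 1 concrete, here is a minimal sketch: the double free below is the kind of bug glibc malloc will usually abort on, while a preloaded pooling allocator such as jemalloc or tcmalloc may silently tolerate it. The exact behaviour depends on the allocator version and build options, and the library path in the comments is only an assumption to adjust for your distribution.

    Code:
    /* heap_check.c - illustrative only: a heap misuse that plain glibc
     * malloc usually catches but a preloaded allocator may hide. */
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *p = malloc(16);
        if (!p)
            return 1;
        memset(p, 'A', 16);
        free(p);
        free(p);   /* double free: glibc malloc usually aborts here */
        return 0;
    }

    /* Run it both ways, e.g.:
     *   cc -O2 heap_check.c -o heap_check
     *   LD_PRELOAD=/usr/lib/libjemalloc.so ./heap_check   # preloaded allocator
     *   LD_PRELOAD= ./heap_check                          # blank preload, plain glibc malloc
     */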



    • #22
      Originally posted by pal666
      why would you care about deps, your package manager will install them automatically
      Bloat, primarily. It may not be much (I don't know how much exactly), but stuff like this adds up over time.
      it takes imagination to think of a memory allocator with dependencies
      Right, because it's totally obvious for the average person to know that (sarcasm). When I do development, I either work in Python or on relatively simple things in C, neither of which would require a specific memory allocation library that I had never heard of until now.



      • #23
        Originally posted by jrch2k8
        Some minor notes about preloading jemalloc or any other malloc replacement globally.

        1.) If you are developing C/C++ code, jemalloc/tbbmalloc/tcmalloc can hide/resolve/fix allocations that will SIGSEGV on glibc malloc, so be sure to pass a blank preload to your run environment to verify it actually works on plain malloc.

        2.) More on 1: jemalloc and the like can sometimes automagically fix memory leaks due to the way their algorithms reuse memory blocks, so again be sure to check on glibc as well, or at least be absolutely sure jemalloc will always be available and the program will never fall back to glibc.

        3.) Valgrind and GDB can get very angry when this is preloaded, so always make sure you clear the preload variable before reporting bugs on them (yeah, been there, and it was no fun).
        LD_PRELOAD was for a quick test to quiet the sceptics. They'll have to do their own tests now.

        One can compile locklessmalloc with LTO and then use it for other code to get an essentially inlined version of malloc. It only gets faster. It's a question of how fast you want it, or whether you need something more versatile with more features. The jemalloc-4.2.1 sources, uncompressed, are 2,460 KB of code; locklessmalloc is 180 KB.
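
        To illustrate the static-linking idea, here is a rough sketch (the bump allocator below is made up for illustration and is not locklessmalloc): defining malloc and friends in your own object file overrides the libc versions at link time, and building everything with -flto lets the compiler inline the short fast path into callers. A real replacement has to implement the full malloc API with correct semantics; the file names and arena size are assumptions.

        Code:
        /* tiny_alloc.c - illustrative stand-in for a statically linked allocator. */
        #include <stddef.h>
        #include <string.h>

        static char   arena[1 << 20];   /* 1 MiB static arena, never grows */
        static size_t used;

        void *malloc(size_t n)
        {
            size_t need = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
            if (used + need > sizeof(arena))
                return NULL;
            void *p = arena + used;
            used += need;
            return p;
        }

        void *calloc(size_t nmemb, size_t size)
        {
            void *p = malloc(nmemb * size);         /* no overflow check - sketch only */
            return p ? memset(p, 0, nmemb * size) : NULL;
        }

        void free(void *p)
        {
            (void)p;                                /* a bump allocator never reclaims */
        }

        /* Build together with the program so LTO sees both sides, e.g.:
         *   cc -O2 -flto tiny_alloc.c app.c -o app
         * (realloc, aligned_alloc, etc. are omitted here but required in practice)
         */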



        • #24
          Originally posted by tpruzina
          Does jemalloc ever return allocated memory back to the system yet?
          IIRC that used to be the biggest no-go in a lot of software: caching and pooling old chunks for reuse is nice and all, but apart from the security implications this somewhat disqualifies it from use in persistent services/daemons.
          Jemalloc 4.1 has a new feature, "decay-based unused dirty page purging", which greatly improves the return of allocated memory back to the system (it's an optional build for now, but it should be the default for jemalloc 5).
          See https://github.com/jemalloc/jemalloc/releases (4.1)
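
          If you want to check which purging behaviour your jemalloc build uses, something along these lines should work. The option names (opt.purge, opt.decay_time) are the jemalloc 4.x ones and changed in later releases, and the sketch assumes an unprefixed build (some builds prefix the API with je_), so verify both against your version.

          Code:
          /* jemalloc_decay.c - sketch for inspecting jemalloc's purging settings.
           * Build roughly like: cc jemalloc_decay.c -o jemalloc_decay -ljemalloc */
          #include <stdio.h>
          #include <sys/types.h>
          #include <jemalloc/jemalloc.h>

          int main(void)
          {
              const char *version = NULL;
              size_t len = sizeof(version);
              if (mallctl("version", &version, &len, NULL, 0) == 0)
                  printf("jemalloc version: %s\n", version);

              /* jemalloc 4.1+: opt.purge is "ratio" (old behaviour) or "decay". */
              const char *purge = NULL;
              len = sizeof(purge);
              if (mallctl("opt.purge", &purge, &len, NULL, 0) == 0)
                  printf("purging mode: %s\n", purge);

              /* Seconds unused dirty pages may linger before being returned. */
              ssize_t decay = 0;
              len = sizeof(decay);
              if (mallctl("opt.decay_time", &decay, &len, NULL, 0) == 0)
                  printf("decay time: %zd s\n", decay);

              return 0;
          }

          /* Usually you would flip this at run time rather than rebuild, e.g.:
           *   MALLOC_CONF="purge:decay,decay_time:10" ./your_program
           * (again, 4.x option names - an assumption to check for your release)
           */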



          • #25
            Originally posted by sdack
            *lol*

            So it was released a while back. What do you think it does, and why do you expect it to be so much more complicated? It's only a few functions, and the API is ancient and very simple.

            Most allocators have complex implementations and therefore require continued development, not just for the algorithms but for the configuration and build support, too. Lockless malloc is very simple and it works; it just doesn't need further development. It comes as a single C file with a few header files. It's thread-safe without a need for locking, using inline assembler statements for the critical parts. Sure, it doesn't come with a 10,000-line configure script, but why is that important when you can have something much simpler?
            I like the sound of how they say lockless is better than jemalloc on that page; stating that TLB pressure and memory usage increase with jemalloc in real world situations... but for their only real world test, MySQL, they left out jemalloc! Very strange to me.

            Now, lockless does look like a high-quality allocator, but that page is a bit fishy. They are also comparing eglibc, but call it glibc in the charts.

            Statically linking in the allocator does seem interesting; I wonder how good the effect of partial inlining would be.
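
            For what it's worth, the "thread-safe without a need for locking" part of the quote can be illustrated with a lock-free free list. The sketch below uses C11 atomics instead of the hand-written assembler the Lockless page describes, and it ignores the ABA problem and memory reclamation that a production allocator has to handle.

            Code:
            /* freelist.c - lock-free free list of blocks, C11 atomics version. */
            #include <stdatomic.h>
            #include <stdlib.h>

            struct block {
                struct block *next;
                /* payload would follow in a real allocator */
            };

            static _Atomic(struct block *) head;

            /* Push a freed block back onto the list without taking a lock. */
            static void lf_push(struct block *b)
            {
                struct block *old = atomic_load_explicit(&head, memory_order_relaxed);
                do {
                    b->next = old;
                } while (!atomic_compare_exchange_weak_explicit(
                             &head, &old, b,
                             memory_order_release, memory_order_relaxed));
            }

            /* Pop a cached block for reuse; NULL when the list is empty. */
            static struct block *lf_pop(void)
            {
                struct block *old = atomic_load_explicit(&head, memory_order_acquire);
                while (old != NULL &&
                       !atomic_compare_exchange_weak_explicit(
                           &head, &old, old->next,
                           memory_order_acquire, memory_order_acquire))
                    ;
                return old;
            }

            int main(void)
            {
                struct block *b = malloc(sizeof *b);
                lf_push(b);                   /* the allocator's "free" path */
                struct block *r = lf_pop();   /* "malloc" reuses the cached block */
                free(r);
                return 0;
            }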
            Last edited by microcode; 29 September 2016, 02:29 PM.



            • #26
              "My program is allocation-bound" -> "switch allocators", totally not "allocate less you baka".



              • #27
                Originally posted by microcode
                I like the sound of how they say lockless is better than jemalloc on that page; stating that TLB pressure and memory usage increase with jemalloc in real world situations... but for their only real world test, MySQL, they left out jemalloc! Very strange to me.

                Now, lockless does look like a high-quality allocator, but that page is a bit fishy.
                It's not better, it's faster in some benchmarks. It's also not high quality, but very simple.

                The Phoronix article talks about a 10% gain, and the use of jemalloc seems to be limited: it says it's used for compiling shaders in order to get more speed out of it. This is why I was asking whether locklessmalloc had been tested.

                I'm still puzzled how a simple question has brought out so many sceptics today. The code is GPL 3.0. What you then do with it is your business, but it's weak when you don't test it and don't ask questions, but instead get paranoid and now go as far as calling an old web page "fishy". I've even given you some numbers - they are real and not made up - and told you how one can quickly test it, so you can see whether you get any speed gains for whatever application you like.

                What else do you want? Should I get some strippers, too, throw in some dollar bills and get everyone free drinks?



                • #28
                  Not 100% sure, but I suspect one factor in the choice is that the AMD devs working on the low-level code for Vulkan and similar APIs spent some time looking at alternatives and settled on jemalloc, so that would seem like a good starting point for us.



                  • #29
                    Originally posted by schmidtbag
                    Bloat, primarily. It may not be much (I don't know how much exactly), but stuff like this adds up over time.
                    so you have an irrational fear of something you did not even bother to look up, and therefore you will choose a 10% slower shader compiler. nice



                    • #30
                      Originally posted by sdack
                      Using LD_RELOAD, I compared the 4-year-old lockless malloc to the recent jemalloc by compiling my kernel just now:
                      did you test mesa or something else? btw, your results are inconsistent with the results from the lockless website, did jemalloc become slower over the years? btw, i think it has a license incompatibility with mesa

