Mesa Looks At Switching To Jemalloc For Faster Performance


  • #21
    Originally posted by sdack

    Sorry, minor typo. Setting the variable to the allocator library will force it to link-load it before libc.
    Some minor notes about preloading jemalloc or any other malloc replacement globally.

    1.) If you are developing C/C++ code, jemalloc/tbbmalloc/tcmalloc can hide/resolve/fix allocations that will SIGSEGV on glibc malloc, so be sure to pass a blank preload to your run environment to verify it actually works on plain malloc.

    2.) More on 1: jemalloc and the like can sometimes automagically fix memory leaks due to the way their algorithms reuse memory blocks, so again be sure to check on glibc as well, or at least be absolutely sure jemalloc will always be available and the program will never fall back to glibc.

    3.) Valgrind and GDB can get very angry when this is preloaded, so always make sure you clear the preload variable before reporting bugs on them (yeah, been there, and it was no fun). A quick illustration of point 1 follows below.
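
    To make point 1 concrete, here is a minimal sketch: the double free below is the kind of bug glibc malloc will usually abort on, while a preloaded pooling allocator such as jemalloc or tcmalloc may silently tolerate it. The exact behaviour depends on the allocator version and build options, and the library path in the comments is only an assumption to adjust for your distribution.

    Code:
    /* heap_check.c - illustrative only: a heap misuse that plain glibc
     * malloc usually catches but a preloaded allocator may hide. */
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *p = malloc(16);
        if (!p)
            return 1;
        memset(p, 'A', 16);
        free(p);
        free(p);   /* double free: glibc malloc usually aborts here */
        return 0;
    }

    /* Run it both ways, e.g.:
     *   cc -O2 heap_check.c -o heap_check
     *   LD_PRELOAD=/usr/lib/libjemalloc.so ./heap_check   # preloaded allocator
     *   LD_PRELOAD= ./heap_check                          # blank preload, plain glibc malloc
     */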



    • #22
      Originally posted by pal666
      why would you care about deps, your package manager will install them automatically
      Bloat, primarily. It may not be much (I don't know how much exactly), but stuff like this adds up over time.
      it takes imagination to think of a memory allocator with dependencies
      Right, because it's totally obvious for the average person to know that (sarcasm). When I do development, I either work in Python or on relatively simple things in C, neither of which would require a specific memory allocation library that I had never heard of until now.



      • #23
        Originally posted by jrch2k8
        Some minor notes about preloading jemalloc or any other malloc replacement globally.

        1.) If you are developing C/C++ code, jemalloc/tbbmalloc/tcmalloc can hide/resolve/fix allocations that will SIGSEGV on glibc malloc, so be sure to pass a blank preload to your run environment to verify it actually works on plain malloc.

        2.) More on 1: jemalloc and the like can sometimes automagically fix memory leaks due to the way their algorithms reuse memory blocks, so again be sure to check on glibc as well, or at least be absolutely sure jemalloc will always be available and the program will never fall back to glibc.

        3.) Valgrind and GDB can get very angry when this is preloaded, so always make sure you clear the preload variable before reporting bugs on them (yeah, been there, and it was no fun).
        LD_PRELOAD was for a quick test to quiet the sceptics. They'll have to do their own tests now.

        One can compile locklessmalloc with LTO and then use it for other code to get an essentially inlined version of malloc. It only gets faster. It's a question of how fast you want it, or whether you need something more versatile with more features. The jemalloc-4.2.1 sources, uncompressed, are 2,460 KB of code; locklessmalloc is 180 KB.
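
        To illustrate the static-linking idea, here is a rough sketch (the bump allocator below is made up for illustration and is not locklessmalloc): defining malloc and friends in your own object file overrides the libc versions at link time, and building everything with -flto lets the compiler inline the short fast path into callers. A real replacement has to implement the full malloc API with correct semantics; the file names and arena size are assumptions.

        Code:
        /* tiny_alloc.c - illustrative stand-in for a statically linked allocator. */
        #include <stddef.h>
        #include <string.h>

        static char   arena[1 << 20];   /* 1 MiB static arena, never grows */
        static size_t used;

        void *malloc(size_t n)
        {
            size_t need = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
            if (used + need > sizeof(arena))
                return NULL;
            void *p = arena + used;
            used += need;
            return p;
        }

        void *calloc(size_t nmemb, size_t size)
        {
            void *p = malloc(nmemb * size);         /* no overflow check - sketch only */
            return p ? memset(p, 0, nmemb * size) : NULL;
        }

        void free(void *p)
        {
            (void)p;                                /* a bump allocator never reclaims */
        }

        /* Build together with the program so LTO sees both sides, e.g.:
         *   cc -O2 -flto tiny_alloc.c app.c -o app
         * (realloc, aligned_alloc, etc. are omitted here but required in practice)
         */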



        • #24
          Originally posted by tpruzina
          Does jemalloc ever return allocated memory back to the system yet?
          IIRC that used to be the biggest no-go in a lot of software: caching and pooling old chunks for reuse is nice and all, but apart from the security implications this somewhat disqualifies it from use in persistent services/daemons.
          Jemalloc 4.1 has a new feature, "decay-based unused dirty page purging", which greatly improves the return of allocated memory back to the system (it's an optional build for now, but it should be the default for jemalloc 5).
          See https://github.com/jemalloc/jemalloc/releases (4.1)
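
          If you want to check which purging behaviour your jemalloc build uses, something along these lines should work. The option names (opt.purge, opt.decay_time) are the jemalloc 4.x ones and changed in later releases, and the sketch assumes an unprefixed build (some builds prefix the API with je_), so verify both against your version.

          Code:
          /* jemalloc_decay.c - sketch for inspecting jemalloc's purging settings.
           * Build roughly like: cc jemalloc_decay.c -o jemalloc_decay -ljemalloc */
          #include <stdio.h>
          #include <sys/types.h>
          #include <jemalloc/jemalloc.h>

          int main(void)
          {
              const char *version = NULL;
              size_t len = sizeof(version);
              if (mallctl("version", &version, &len, NULL, 0) == 0)
                  printf("jemalloc version: %s\n", version);

              /* jemalloc 4.1+: opt.purge is "ratio" (old behaviour) or "decay". */
              const char *purge = NULL;
              len = sizeof(purge);
              if (mallctl("opt.purge", &purge, &len, NULL, 0) == 0)
                  printf("purging mode: %s\n", purge);

              /* Seconds unused dirty pages may linger before being returned. */
              ssize_t decay = 0;
              len = sizeof(decay);
              if (mallctl("opt.decay_time", &decay, &len, NULL, 0) == 0)
                  printf("decay time: %zd s\n", decay);

              return 0;
          }

          /* Usually you would flip this at run time rather than rebuild, e.g.:
           *   MALLOC_CONF="purge:decay,decay_time:10" ./your_program
           * (again, 4.x option names - an assumption to check for your release)
           */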



          • #25
            Originally posted by sdack
            *lol*

            So it was released a while back. What do you think it does, and why do you expect it to be so much more complicated? It's only a few functions, and the API is ancient and very simple.

            Most allocators have complex implementations and therefore require continued development, not just for the algorithms but for the configuration and build support, too. Lockless malloc is very simple and it works; it just doesn't need further development. It comes as a single C file with a few header files. It's thread-safe without a need for locking, using inline assembler statements for the critical parts. Sure, it doesn't come with a 10,000-line configure script, but why is that important when you can have something much simpler?
            I like the sound of how they say lockless is better than jemalloc on that page; stating that TLB pressure and memory usage increase with jemalloc in real world situations... but for their only real world test, MySQL, they left out jemalloc! Very strange to me.

            Now, lockless does look like a high-quality allocator, but that page is a bit fishy. They are also comparing eglibc, but call it glibc in the charts.

            Statically linking in the allocator does seem interesting; I wonder how good the effect of partial inlining would be.
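
            For what it's worth, the "thread-safe without a need for locking" part of the quote can be illustrated with a lock-free free list. The sketch below uses C11 atomics instead of the hand-written assembler the Lockless page describes, and it ignores the ABA problem and memory reclamation that a production allocator has to handle.

            Code:
            /* freelist.c - lock-free free list of blocks, C11 atomics version. */
            #include <stdatomic.h>
            #include <stdlib.h>

            struct block {
                struct block *next;
                /* payload would follow in a real allocator */
            };

            static _Atomic(struct block *) head;

            /* Push a freed block back onto the list without taking a lock. */
            static void lf_push(struct block *b)
            {
                struct block *old = atomic_load_explicit(&head, memory_order_relaxed);
                do {
                    b->next = old;
                } while (!atomic_compare_exchange_weak_explicit(
                             &head, &old, b,
                             memory_order_release, memory_order_relaxed));
            }

            /* Pop a cached block for reuse; NULL when the list is empty. */
            static struct block *lf_pop(void)
            {
                struct block *old = atomic_load_explicit(&head, memory_order_acquire);
                while (old != NULL &&
                       !atomic_compare_exchange_weak_explicit(
                           &head, &old, old->next,
                           memory_order_acquire, memory_order_acquire))
                    ;
                return old;
            }

            int main(void)
            {
                struct block *b = malloc(sizeof *b);
                lf_push(b);                   /* the allocator's "free" path */
                struct block *r = lf_pop();   /* "malloc" reuses the cached block */
                free(r);
                return 0;
            }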
            Last edited by microcode; 29 September 2016, 02:29 PM.



            • #26
              "My program is allocation-bound" -> "switch allocators", totally not "allocate less you baka".



              • #27
                Originally posted by microcode
                I like the sound of how they say lockless is better than jemalloc on that page; stating that TLB pressure and memory usage increase with jemalloc in real world situations... but for their only real world test, MySQL, they left out jemalloc! Very strange to me.

                Now, lockless does look like a high-quality allocator, but that page is a bit fishy.
                It's not better, it's faster in some benchmarks. It's also not high quality, but very simple.

                The Phoronix article talks about a 10% gain, and the use of jemalloc seems to be limited: it says it's used for compiling shaders in order to get more speed out of it. This is why I was asking whether locklessmalloc had been tested.

                I'm still puzzled how a simple question has brought out so many sceptics today. The code is GPL 3.0. What you then do with it is your business, but it's weak when you don't test it and don't ask questions, but instead get paranoid and now go as far as calling an old web page "fishy". I've even given you some numbers - they are real and not made up - and told you how one can quickly test it, so you can see whether you get any speed gains for whatever application you like.

                What else do you want? Should I get some strippers, too, throw in some dollar bills and get everyone free drinks?



                • #28
                  Not 100% sure, but I suspect one factor in the choice is that the AMD devs working on the low-level code for Vulkan and similar APIs spent some time looking at alternatives and settled on jemalloc, so that would seem like a good starting point for us.



                  • #29
                    Originally posted by schmidtbag
                    Bloat, primarily. It may not be much (I don't know how much exactly), but stuff like this adds up over time.
                    so you have an irrational fear of something you did not even bother to look up, and therefore you will choose a 10% slower shader compiler. nice



                    • #30
                      Originally posted by sdack
                      Using LD_RELOAD, I compared the 4-year-old lockless malloc to the recent jemalloc by compiling my kernel just now:
                      did you test mesa or something else? btw, your results are inconsistent with the results from the lockless website, did jemalloc become slower over the years? btw, i think it has a license incompatibility with mesa

