Consider that THP is already enabled by default on RHEL, and that THP support for the page cache and swap is a more recent, experimental kernel option that seems to help a lot. I suspect that things like allocating kernel memory in huge pages might 'unclog' the TLB and lead to significant performance gains. If the TLB gets clobbered every time the system has to do kernel work, the full advantages of THP aren't being realized.
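For anyone who wants to check what their own system is doing, here is a minimal sketch that reads the THP mode from sysfs. It assumes the standard Linux path `/sys/kernel/mm/transparent_hugepage/enabled`, where the active mode is the one shown in brackets (for example `[always] madvise never`). The helper names `parse_thp_mode` and `current_thp_mode` are made up for illustration.

```python
from pathlib import Path

# Standard sysfs location for the THP setting on Linux (assumed present).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the bracketed (active) mode from a line like 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no bracketed token found


def current_thp_mode(path=THP_ENABLED):
    """Read the active THP mode, or None if the file is missing or unreadable."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

This only reports whether THP is on for anonymous memory. It says nothing about how much TLB pressure kernel allocations are actually causing.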