Announcement

**RomuloP** · 24 August 2019, 12:52 PM

Originally posted by oiaohm View Post

Sorry some desktop users do end up running unstopped for half a year on laptops by do some work hibernate, restore do some more work then hibernate in a never ending loop. Result is no clean reboot in middle for quite sometime.

So yes fragmentation does effect some desktop users.

As I said they do not run huge databases that make use of THP/HP, fragmentation is not a problem when you can make data virtually contiguous, RAM don't suffer from random access penalty, and thanks to page table, it do not result in CPU penalty. My argument was that, even if using huge pages, majority do not run on for ages. And yes, some of them are doing both, don't make advocating to everyone they need swap really good education about the subject.

Originally posted by oiaohm View Post

Problem is ways people use desktops are not always black and white. Those hibernating require swap and that hibernation also means they can be running for 6 months without a clean reboot with all the security problems that brings.

Lot of ways with secureboot we have to rethink hibernate anyhow.

Swap when you are not using programs pushing into absolute out of swap and ram swap allows you to run programs that really do require more ram than you have at a price. If you don't in fact get to starvation swap works ok.

Sure, but majority that is removing swap don't care about hibernate, and the ones that do use for this must be aware to patch their system to be able to update without reboot. I'm not advocating against swap as default, it is a sane default, I'm advocating against all this "you need SWAP, no swap is always or almost always wrong", it is just bad educating.

**oiaohm** · 24 August 2019, 03:09 PM

Originally posted by RomuloP View Post

As I said they do not run huge databases that make use of THP/HP, fragmentation is not a problem when you can make data virtually contiguous, RAM don't suffer from random access penalty, and thanks to page table, it do not result in CPU penalty. My argument was that, even if using huge pages, majority do not run on for ages. And yes, some of them are doing both, don't make advocating to everyone they need swap really good education about the subject.

Huge pages in one place that talked about fragmentation because they refused to allocate. The performance hit is that IO operations end up fragmented in memory as well. Stuff may be appearing not fragmented in virtual memory page table but due to not be able to defrag structures is in fact fragmented in the physical.

[ubuntu_mate] Slow startup after deleting swap

https://ubuntuforums.org/showthread.php?t=2398822

I'm using and SSD and wanted to get rid of Swap partition, erased the swap partitio and erased the line on /etc/fstab for this swap partition. Now when I boot up my laptop it takes a long time to start as compared to what I'm used to with this same SSD.

The behavour of system slowing down because you have disabled swap due to buffers not being able to be allocated in physical memory continuously has been true of over a decade so you cannot in place DMA. THP/HP falls and DMA operations slow down. Yes fragmented structures can be killing your performance before you have even logged in. This normally 1-10 percent slower on the system because its normally not too bad. We don't have a good /proc/fraginfo thing to tell people to look at to monitor how badly their memory is in fact fragmented and how often DMA failed to use in place in physical in mainline kernel there are a few patches added to embedded kernels to extract this information.

Originally posted by RomuloP View Post

Sure, but majority that is removing swap don't care about hibernate, and the ones that do use for this must be aware to patch their system to be able to update without reboot. I'm not advocating against swap as default, it is a sane default, I'm advocating against all this "you need SWAP, no swap is always or almost always wrong", it is just bad educating.

You hear people complaining that swapon on/off takes ages. This is because they hibernate they have heard foolish arguement about swap effecting performance so disable swap once there system is booted and renable it to shutdown. Yes DMA screwed up by fragmentation makes hibernate slower.

https://static.sched.com/hosted_files/ossna19/cc/ELC2019%20Flash%20Aware%20Hibernation%20based%20Boot.pdf

This here trying to make hibernation faster with blind deduplication they found out how to rapidly fragment memory so after restore from their new form hibernation performance was always performing worst so they went from 150-140 MB/s read performance to 150-110MB/s read performance and yes the 110MB/s they were still dropping they did not run the system until IO performance in fact levelled out. This is a current day Linux kernel it does not like memory fragmentation. This is a nice 26% vaporisation in IO performance but the system had not run long enough to get to worst of what they done because they were stopping it after boot time so the effect of memory fragmentation on your IO performance is way worse than 26% if it really gets out of control.

Hibernation is another area where groups measure and detect this structure fragmentation badly effecting IO performance. Remember this is not that you hardware has got worse this is that Linux cannot exploit the IO hardware provides if physical memory is fragmented badly. You can be fragmented badly before boot is complete and are depending on swap so that defrags out while you are running at this stage. Yes that fragmentation is always hitting your IO but you want to keep it under 10 percent performance hit.

Yes we are leaving a hell load of performance sitting on the table due to memory fragmentation and lack of means to defrag all of it without using swap.

Those worried about swap usage one of the worse things is if you are not running zswap you don't get in memory deduplication and you are not compressed on what you are sending to the storage even using zswap. Swap does need a major rework.

**RomuloP** · 24 August 2019, 08:41 PM

Originally posted by oiaohm View Post

Huge pages in one place that talked about fragmentation because they refused to allocate. The performance hit is that IO operations end up fragmented in memory as well. Stuff may be appearing not fragmented in virtual memory page table but due to not be able to defrag structures is in fact fragmented in the physical.

And? As I said HP/THP is very rare in desktops, by default it is strictly limited to madvise, and it is an extremely sane default, Using THP in desktop is an horrible hack anyway, they where planed to avoid changing code base and still have HP but actually only run acceptable in HPC servers that are dominated by the database and have nothing more inducing fragmentation as generaly they are bad neighbor processes. The only sane use for HP is pre-allocating at boot what will guarantee contiguous allocation and avoid huge latency overhead on in-time allocations of THP and performance quirks on apps not planed with a page "moving" on physical layer in search of contiguity. Anyway, apart from wrong, using THP on desktops with high presurre on RAM is more than out of common place and not a good argument for "use swap always because alway it is needed".

Originally posted by oiaohm View Post

[ubuntu_mate] Slow startup after deleting swap

https://ubuntuforums.org/showthread.php?t=2398822

I'm using and SSD and wanted to get rid of Swap partition, erased the swap partitio and erased the line on /etc/fstab for this swap partition. Now when I boot up my laptop it takes a long time to start as compared to what I'm used to with this same SSD.

The link is clearly a case of someone forgetting to re-configure the init-ram to avoid getting kernel searching for a non existent UUID. This do not make "swap always important"true.

Originally posted by oiaohm View Post

The behavour of system slowing down because you have disabled swap due to buffers not being able to be allocated in physical memory continuously has been true of over a decade so you cannot in place DMA. THP/HP falls and DMA operations slow down. Yes fragmented structures can be killing your performance before you have even logged in. This normally 1-10 percent slower on the system because its normally not too bad. We don't have a good /proc/fraginfo thing to tell people to look at to monitor how badly their memory is in fact fragmented and how often DMA failed to use in place in physical in mainline kernel there are a few patches added to embedded kernels to extract this information.

You hear people complaining that swapon on/off takes ages. This is because they hibernate they have heard foolish arguement about swap effecting performance so disable swap once there system is booted and renable it to shutdown. Yes DMA screwed up by fragmentation makes hibernate slower.

It was true, not a so critical problem nowadays that we can move/compact user-space pages without help of swap thanks to techniques like ZONE_MOVABLE, lumpy reclaim and memory compaction, that even help with HP/THP. Lets move on and stop advocating duct tapes from 10 years ago around DMA buffers.

Desktop user don't need to monitor fragmentation, DMA buffers suffer from very low fragmentation nowadays, it is not the horrible place you clame to be in performance today, also THP/HP is not a thing in desktop.

But the most important, just another bunch of rare cases that are much more a thing in servers, glorified as common user things again... Not to point the anecdotes.

Originally posted by oiaohm View Post

https://static.sched.com/hosted_file...sed%20Boot.pdf

This here trying to make hibernation faster with blind deduplication they found out how to rapidly fragment memory so after restore from their new form hibernation performance was always performing worst so they went from 150-140 MB/s read performance to 150-110MB/s read performance and yes the 110MB/s they were still dropping they did not run the system until IO performance in fact levelled out. This is a current day Linux kernel it does not like memory fragmentation. This is a nice 26% vaporisation in IO performance but the system had not run long enough to get to worst of what they done because they were stopping it after boot time so the effect of memory fragmentation on your IO performance is way worse than 26% if it really gets out of control.

Hibernation is another area where groups measure and detect this structure fragmentation badly effecting IO performance. Remember this is not that you hardware has got worse this is that Linux cannot exploit the IO hardware provides if physical memory is fragmented badly. You can be fragmented badly before boot is complete and are depending on swap so that defrags out while you are running at this stage. Yes that fragmentation is always hitting your IO but you want to keep it under 10 percent performance hit.

Yes we are leaving a hell load of performance sitting on the table due to memory fragmentation and lack of means to defrag all of it without using swap.

Those worried about swap usage one of the worse things is if you are not running zswap you don't get in memory deduplication and you are not compressed on what you are sending to the storage even using zswap. Swap does need a major rework.

And? What a storage based method with fragmentation has to do with RAM fragmentation? Linux RAM is not obligating them to store the image deduplicated. Pointing a link and talking about anecdotes will not magically link it, I can show. Where it states the fragmentation comes from RAM? We all know fragmentation affect performance, but this do not make all the rest about "linux RAM is supper fragmented and slow without swap what makes huge difference in hibernation" true.

**oiaohm** · 24 August 2019, 09:49 PM

Originally posted by RomuloP View Post

It was true, not a so critical problem nowadays that we can move/compact user-space pages without help of swap thanks to techniques like ZONE_MOVABLE, lumpy reclaim and memory compaction, that even help with HP/THP. Lets move on and stop advocating duct tapes from 10 years ago around DMA buffers.

Except those reclaims did not in fact compact all memory structures.

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

compact_unevictable_allowed there are reason why these options are still here. https://archive.org/details/lca2018-...uge_Page_overh This covered from 2018 that the two write up you pointed are not the full story and was just the start of the idea to fix it. Over 10 years developers started attempting to fix the memory fragmentation problem. Issue is they have not fixed it yet. Some of the work around to locked memory to allow fragmentation is throwing it out to swap. In fact documented that we are losing IO performance to fragmentation a lot. There is a problem when you hit low memory the worse you IO DMA buffers are performing the worse it going to hurt. I was having a problem finding the write up that was exactly it.

Unless you can find me some kernel change between start of 2018 and now the fragmentation issue exists and disabling swap disables some of the system that in fact deal with it.

I pointed to that android one because if the automatic memory fix system could full working the incorrect memory pattern should have only caused a temporary problem not a increasing getting worse performance loss.

**RomuloP** · 24 August 2019, 11:46 PM

Originally posted by oiaohm View Post

Except those reclaims did not in fact compact all memory structures.

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

compact_unevictable_allowed there are reason why these options are still here. https://archive.org/details/lca2018-...uge_Page_overh This covered from 2018 that the two write up you pointed are not the full story and was just the start of the idea to fix it. Over 10 years developers started attempting to fix the memory fragmentation problem. Issue is they have not fixed it yet. Some of the work around to locked memory to allow fragmentation is throwing it out to swap. In fact documented that we are losing IO performance to fragmentation a lot. There is a problem when you hit low memory the worse you IO DMA buffers are performing the worse it going to hurt. I was having a problem finding the write up that was exactly it.

Unless you can find me some kernel change between start of 2018 and now the fragmentation issue exists and disabling swap disables some of the system that in fact deal with it.

I pointed to that android one because if the automatic memory fix system could full working the incorrect memory pattern should have only caused a temporary problem not a increasing getting worse performance loss.

I never said they would result in the nirvana of memory contiguity, the point is:

1. swap is an horrible duct tape for memory fragmentation, new techniques can defrag even mlocked pages and swap is vulnerable to garbage collectors. Cache locality will probably never be fixed with garbage collected languages, and also vulnerable to a bunch of caching patterns on desktop applications that maintain pages alive.

2. Swap will not solve the fragmentation problem that still exist, you can even look at/proc/buddyinfo, with swap or not, it will still show you fragmentation the same way, all swapable pages are movable and compactable but those techniques can even touch mlocked pages, very common in databases. We will only have almost 100% contiguous pages the day every single page is movable, guess what, the only ones left are kernel space pages, almost all structures that are not swapable, anyway, they are unmovable this means they cannot even be rearranged during hibernation as kernel is sensitive to those pages position in physic memory.

And a last addendum, THP will always be a half baked solution on desktop, HP where never planed to be defragged, most applications that make use of THP have horrible performance quirks with memory compression enabled to defrag them, and imediate defragmentation will always result in a lot of latency overhead, it is just simply much more sane to pre-alocate HP (that are unmovable and unswappable) on boot, generally maximum fragments will be countable in hand, and neither swap neither any other technique will improve it nowadays.

**uc_sam** · 25 August 2019, 01:12 AM

Originally posted by Solid State Brain View Post

Increasing vm.min_free_kbytes helps but does not solve it completely.

Didn't help me one bit

Any other ideas?

**geearf** · 25 August 2019, 04:02 AM

Originally posted by uc_sam View Post

Didn't help me one bit

Any other ideas?

I don't think this sysctl would solve the oom issue, but it could make it worse by reducing the free RAM even more.

**SystemCrasher** · 25 August 2019, 11:36 AM

Speaking for myself, I more or less deal with latency the following way:
1) No "real" swap on HDD. That one is nearly worst latency offender ever.
2) Zram swap instead, using LZ4 for speed. So "cold" pages can be stored in compressed form, yet latency fetching of these from RAM and decompressing that is nowhere like getting page out of mechanical HDDs, nor it causes SSD wearout.

Interestingly, such setup does well even in really low-memory systems, giving some extra margin where it otherwise would be out of memory, without thrashing HDD for minutes on "real" memory pressure. Though Linux still somewhat imperfect in this regard - it seems to lack control to disable using program files as extension of swap. So "some" HDD thrashing can still occur as kernel repeatedly discards pages, knowing it can re-read them from program's file and then fetches them from slow storage. One can complletely disable swapping and get rid of that problem. But ot would also kill any options to offload/compress "cold" unusued pages anywhere at all. Still it would be nice if Linux devs would put separate control to disable discarding pages of processes and re-reading these from program files. This idea is bad for overall system latency.

Also, unrelated, but using "realtime" (-rt, "preemptible") kernel on desktop is really good idea. It can cost like few % of bulk performance, but overall user experience would be way more pleasant. Responce time matters on desktop.

**oiaohm** · 25 August 2019, 11:59 AM

Originally posted by RomuloP View Post

1. swap is an horrible duct tape for memory fragmentation, new techniques can defrag even mlocked pages and swap is vulnerable to garbage collectors. Cache locality will probably never be fixed with garbage collected languages, and also vulnerable to a bunch of caching patterns on desktop applications that maintain pages alive.

I agree the use of swap to perform defragmentation on memory in not exactly the casee but it is the duct tape the Linux kernel currently uses when ever it has a new memory defrag method is created to protect against crash if new method has error.

Originally posted by RomuloP View Post

2. Swap will not solve the fragmentation problem that still exist, you can even look at/proc/buddyinfo, with swap or not, it will still show you fragmentation the same way, all swapable pages are movable and compactable but those techniques can even touch mlocked pages, very common in databases. We will only have almost 100% contiguous pages the day every single page is movable, guess what, the only ones left are kernel space pages, almost all structures that are not swapable, anyway, they are unmovable this means they cannot even be rearranged during hibernation as kernel is sensitive to those pages position in physic memory.

Making kernel pages movable [LWN.net]

https://lwn.net/Articles/650917/

I did not say swap solved the problem. Currently with swap enabled more memory fragmentation methods will be enabled. Sorry the almost all structure that are not swapable since 2015 has come very wrong the isolatepage process with a new memory defrag method will throw the page to swap yes this would be a historic kernel structures that have been unmovable that incorrectly placed damage you DMA/IO and THP performance. Why are the new kernel page defragment methods throwing page to swap it so if they have not properly isolated the page and the page has been pulled back from swap during operating (because something used it while it should have been locked out from use) there is a flaw in method and that flaw does not result in kernel panic or any system miss behaviour other than some extra cpu/swap usage.

Yes swap usage in this case is pure duct tape fix when it comes to kernel memory fragmentation and other histrionically locked memory. Its like your car bumper is lose you have a roll of duct tape and you are the moron who is saying drive on without it because duct tape is not a full fix so you screw up the bumper. Really I would have preferred if the developers of Linux kernel structure methods to allow more and more to be movable had not chosen to use swap as duct tape. Problem is they have and we have to live with that reality.

Something important to remember https://lwn.net/Articles/569635/ 2013 was also when kernel address space randomisation was introduced.

So yes when hibernating when all the kernel defragment methods are complete it should be possible to change every single kernel space structure address. 2013 we started down this route. 2015 did more. The 2018 video I point to is the update. So what you class as unmovable and a performance loss you have to live with is not in fact the case. LG developers if you look at there code they did in the new prototype hibernation you will find they did move address around with the deduplication without preventing the kernel from running but kernel could move the structures back to sane locations so this had now instant killed performance. So yes you should be able to fully defragment memory on hibernate. Current in kernel method of hibernation cannot defragment memory. In fact I would class what the LG guys did on their hibernation by mistake as a good test case if we do this to Linux kernel memory does the system recover with its built in defragmentation or does it keep on death spiralling on performance. Up until now there has really not been a method to make horrible memory fragmented Linux on demand to have repeatability on memory fragmentation caused performance issues.

Yes Address randomisation of kernel space means that you have a random roll of dice every time you system starts up on fragmentation. Part of the reason why some suffer from swap killing performance and others don't is random bad luck due to kernel space address randomisation of physical memory allocations. Yes if you roll a kernel space address randomisation that sees your kernel structures that are hard to move placed where they will cause slower DMA/IO when you start using swap the performance of swap will also be hindered.

Interesting question that cross my mind I know that different AMD cpus had issues starting up randomisation correctly those suffering from low memory performance hell a lot do you happen to be using same classes of CPU with some kind of stuffed up randomisation? Swap formal is doing some calculations base on IO performance as well and if it taken the IO numbers before IO performance collapses this going to be out.

Yes your workflow pattern also causes the Linux kernel to create structures at particular time if they are not moveable yet or are not moveable because you have disabled swap you can be killing your IO performance.

**RomuloP** · 26 August 2019, 01:44 AM

Originally posted by oiaohm View Post

I agree the use of swap to perform defragmentation on memory in not exactly the casee but it is the duct tape the Linux kernel currently uses when ever it has a new memory defrag method is created to protect against crash if new method has error.

My point is not that it is simply “ugly”, it mostly result in placebo if not the opposite in desktop. Argument for a fraction of performance gain sacrificing orders of magnitude of performance in another side with re-faults is going nowhere unless the setup for sure will not suffer from bad re-fault and swap is simply bad in guaranteeing this, it is the reason for swapness being not 100% fair in desktops. And this is why over-commit is a thing with much more relevant performance benefits despite its problems, because program knows better what to reclaim. Sure, it is great until it break in your face but this should not be like this, is just that until today nobody was seriously dealing with ulimits, cgroups, smart resource pressure metric tools, and a bunch of much less expensive solutions than playing dices on how much an anonymous page will be left alone.

Even in servers, swap has a bad reputation, deserved, but lets see, that is becoming more reasonable for databases thanks to solid state memories becoming faster and faster, but still it is a ongoing research and the effect on diverse workloads is not clear:

Reconsidering swapping [LWN.net]

https://lwn.net/Articles/690079/

Originally posted by oiaohm View Post

Making kernel pages movable [LWN.net]

https://lwn.net/Articles/650917/

I did not say swap solved the problem. Currently with swap enabled more memory fragmentation methods will be enabled. Sorry the almost all structure that are not swapable since 2015 has come very wrong the isolatepage process with a new memory defrag method will throw the page to swap yes this would be a historic kernel structures that have been unmovable that incorrectly placed damage you DMA/IO and THP performance. Why are the new kernel page defragment methods throwing page to swap it so if they have not properly isolated the page and the page has been pulled back from swap during operating (because something used it while it should have been locked out from use) there is a flaw in method and that flaw does not result in kernel panic or any system miss behaviour other than some extra cpu/swap usage.

Yes swap usage in this case is pure duct tape fix when it comes to kernel memory fragmentation and other histrionically locked memory. Its like your car bumper is lose you have a roll of duct tape and you are the moron who is saying drive on without it because duct tape is not a full fix so you screw up the bumper. Really I would have preferred if the developers of Linux kernel structure methods to allow more and more to be movable had not chosen to use swap as duct tape. Problem is they have and we have to live with that reality.

Thanks, I was wrong about kernel pages not being movable, but we do not need swap in the process.

Apart from lumpy reclaim (apparently removed by Mel Gorman), all other mechanisms do not need swap, as stated it is just a matter of changing the appropriate page-table entries and adequately update pointers in kernel space, as it only move, compact or group pages accordingly to its label, having swap or not, /proc/sys/vm/compact_memory works the same.

Anyway, we could argument anonymous pages would not be reclaimable but it is very questionable as pointed by Mel’s tests. Maybe it can benefit THP, I still believe it has a use in servers and very straight scenarios, but focusing on a workload and trowing a lot of many common cases on garbage is simply wrong and desktop is a very heterogeneous place to sell “swap as always a good thing”.

Originally posted by oiaohm View Post

Something important to remember https://lwn.net/Articles/569635/ 2013 was also when kernel address space randomisation was introduced.

So yes when hibernating when all the kernel defragment methods are complete it should be possible to change every single kernel space structure address. 2013 we started down this route. 2015 did more. The 2018 video I point to is the update. So what you class as unmovable and a performance loss you have to live with is not in fact the case. LG developers if you look at there code they did in the new prototype hibernation you will find they did move address around with the deduplication without preventing the kernel from running but kernel could move the structures back to sane locations so this had now instant killed performance. So yes you should be able to fully defragment memory on hibernate. Current in kernel method of hibernation cannot defragment memory. In fact I would class what the LG guys did on their hibernation by mistake as a good test case if we do this to Linux kernel memory does the system recover with its built in defragmentation or does it keep on death spiralling on performance. Up until now there has really not been a method to make horrible memory fragmented Linux on demand to have repeatability on memory fragmentation caused performance issues.

Sure, if it becomes a reality and if defraging entire RAM disk image before hibernation do not turn into a “wait a minute for a long sorting/compression of pages” it will be more interesting than reboot and wait only ~3 secconds for graphic.target. What they are talking about is a much easier task than make virtual memory space flexible during hibernation and resume, there is a very small margin for tradings with overheads.

Randomization is simply needed and is just one of a lot of things that make swap a dice game.

Originally posted by oiaohm View Post

Yes Address randomisation of kernel space means that you have a random roll of dice every time you system starts up on fragmentation. Part of the reason why some suffer from swap killing performance and others don't is random bad luck due to kernel space address randomisation of physical memory allocations. Yes if you roll a kernel space address randomisation that sees your kernel structures that are hard to move placed where they will cause slower DMA/IO when you start using swap the performance of swap will also be hindered.

Interesting question that cross my mind I know that different AMD cpus had issues starting up randomisation correctly those suffering from low memory performance hell a lot do you happen to be using same classes of CPU with some kind of stuffed up randomisation? Swap formal is doing some calculations base on IO performance as well and if it taken the IO numbers before IO performance collapses this going to be out.

Yes your workflow pattern also causes the Linux kernel to create structures at particular time if they are not moveable yet or are not moveable because you have disabled swap you can be killing your IO performance.

Movable has nothing to do with reclamaible, and it is not clear how many subsystems of kernel space where adapted to migrate pages with this feature, I expect the majority of them. Since 2007 we have reclaimable kernel pages even if not movable, and it is somewhat common for databases to mlock user-space pages, being still movable.

All the problem revolve around the fact kernel will never be smarter enough about the app needs, it cannot see the future. Some databases having fenomenal performance with HP/THP resolve around HP being unmovable and un-reclaimable, or in case of THP, aware of the defrag nature (so it can ask for defragmentation in defer mode) or having the entire system for itself as in a HPC. Most precisely arround very hand tuned setups to guarante contiguous spaces and small fragmentation in a predictable very restricted scenario.

Linux being very tunable for your needs is a complete win in case to case, throwing all tenable available on desktop as a whole is just wrong, swap as default is correct, removing it because you have a huge amount of ram and it is only faulting anonymous pages to produce quirks, you do not care about hibernation (rebooting is as fast as it), and do not play with THP is completely fine.

Announcement

New Low-Memory-Monitor Project Can Help With Linux's RAM/Responsiveness Problem

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment