Clear Linux Set To Begin Offering EarlyOOM For Better Dealing With Memory Pressure
-
Last edited by duby229; 09 January 2020, 03:45 AM.
-
Originally posted by timrichardson
The Facebook solution depends on a yet-to-be-found default configuration that works well across all kinds of different situations. This is the hard part, of course. Killing stuff is easy; killing stuff intelligently is hard. Otherwise this debate would not exist (all the people who think this problem is due to negligence or arrogance of the kernel developers just look stupid in my eyes).
earlyoom stops the deadlocking. A new project, nohang, can be tweaked to use memory pressure stats (Facebook work which is in recent kernels), and it can use zram stats too; both can provide more sophisticated warning of pending memory problems. It's important to remember that if your system becomes unusable because you ran out of RAM, you already know what the fundamental problem is: you need more RAM (or better code). The responsibility of the OS/userspace is not to fix this, it is to fail gracefully. earlyoom does this, for a rough approximation of 'gracefully'. nohang is a bit more elegant (you get desktop notifications out of the box, both for pending problems and for what was killed).
The other part of the story is that there is something fundamentally wrong with how Linux pages to swap, and that is the underlying cause of the hang and freeze before the kernel OOM killer kicks in. earlyoom and oomd seem to act before that paging flaw, whatever it might be, locks up the system. While they may well be a more elegant solution for OOM than the kernel OOM killer, the way Linux pages to swap is still fundamentally broken somehow.
Last edited by duby229; 09 January 2020, 03:48 AM.
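For anyone wondering what the "memory pressure stats" mentioned in the quote look like in practice: recent kernels expose them as a small text file, /proc/pressure/memory, with a "some" line and a "full" line. The following is only a rough Python sketch of reading that format. The function names, the sample values and the 10% threshold are made up for illustration; this is not how nohang or oomd actually decide anything.

```python
# Sketch: parse the kernel's pressure stall information (PSI) text format.
# SAMPLE_PSI holds invented numbers in the same layout as /proc/pressure/memory.
SAMPLE_PSI = (
    "some avg10=12.50 avg60=4.10 avg300=1.02 total=123456\n"
    "full avg10=3.20 avg60=1.00 avg300=0.25 total=65432\n"
)


def parse_psi(text):
    """Return {'some': {...}, 'full': {...}} with every field as a float."""
    stats = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        stats[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return stats


def under_pressure(stats, full_avg10_limit=10.0):
    """Hypothetical rule: warn once 'full' stalls over the last 10 s reach the limit.

    The 10.0 default is an arbitrary illustration, not a recommended value.
    """
    return stats.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


if __name__ == "__main__":
    stats = parse_psi(SAMPLE_PSI)
    print(stats["some"]["avg10"], under_pressure(stats))
```

On a kernel with PSI enabled you would feed it the contents of /proc/pressure/memory instead of the sample string. The point of "full" is that it counts time when all non-idle tasks were stalled on memory at once, which is roughly the freeze being described here.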
-
Originally posted by duby229
The other part of the story is that there is something fundamentally wrong with how Linux pages to swap, and that is the underlying cause of the hang and freeze before the kernel OOM killer kicks in.
Then others have said that programs are getting dropped from caches and, as they're still running, they get read back into memory and dropped again (there's a project on GitHub just for handling this, iirc, that suspends processes briefly to prevent this type of thrashing so the kernel can avoid dealing with it and do its job), and something about context switches, which in this situation are pretty bad and degrade performance further?
So it just sounds like the issue is a lack of resources to work with, plus a lot of contention stressing/confusing the kernel, which causes the snail-pace slowdown. Supposedly this is when the OOM killer is triggered, but by then it's too late to respond swiftly, so doing it in advance resolves it? (simple fix)
I don't know kernel development, but I would have thought that Facebook does, so if the fix belonged in the kernel like you suggest, wouldn't they have contributed one there instead of going with a userspace solution? Do you know for sure that other OSes aren't using userspace at all to deal with the same situation?
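To make the "doing it in advance" idea concrete, here is a minimal Python sketch of a userspace check that looks at /proc/meminfo and decides whether to act before things grind to a halt. It is only loosely modelled on the general approach of tools like earlyoom, not taken from their code. The function names, the sample numbers and the 10% thresholds are assumptions for illustration.

```python
# Sketch: decide "early" that memory is nearly exhausted, using the
# /proc/meminfo text format. SAMPLE_MEMINFO contains invented values.
SAMPLE_MEMINFO = """\
MemTotal:        4000000 kB
MemFree:          100000 kB
MemAvailable:     200000 kB
SwapTotal:       2000000 kB
SwapFree:         100000 kB
"""


def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def should_act_early(info, min_mem_pct=10.0, min_swap_pct=10.0):
    """Hypothetical rule: act when available RAM and free swap are both low.

    Both percentage thresholds are illustrative assumptions.
    A system without swap is treated as having 0% free swap.
    """
    mem_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_pct = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_pct <= min_mem_pct and swap_pct <= min_swap_pct


if __name__ == "__main__":
    print(should_act_early(parse_meminfo(SAMPLE_MEMINFO)))
```

A real daemon would poll this in a loop and then pick a process to kill, which is the genuinely hard part already mentioned in this thread. The sketch only shows that the trigger itself can be a cheap check made while the system is still responsive.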
-
Originally posted by grigi
I say perceived because Linux manages memory much better than other desktop OSes.
Especially once you enable transparent memory compression, which can give you 10-20% more effective RAM. That really makes a difference on a lowly 4G system.
-
Originally posted by polarathene
Isn't it just the OOM killer kicking in too late? Swap is apparently needed for kernel buffers, and without swap the system performs worse in this situation. I don't know how much those buffers amount to, so it's not necessarily a large amount of swap (although I'd also have thought that if that's the case, why not just have some memory reserved for it?).
Then others have said that programs are getting dropped from caches and, as they're still running, they get read back into memory and dropped again (there's a project on GitHub just for handling this, iirc, that suspends processes briefly to prevent this type of thrashing so the kernel can avoid dealing with it and do its job), and something about context switches, which in this situation are pretty bad and degrade performance further?
So it just sounds like the issue is a lack of resources to work with, plus a lot of contention stressing/confusing the kernel, which causes the snail-pace slowdown. Supposedly this is when the OOM killer is triggered, but by then it's too late to respond swiftly, so doing it in advance resolves it? (simple fix)
I don't know kernel development, but I would have thought that Facebook does, so if the fix belonged in the kernel like you suggest, wouldn't they have contributed one there instead of going with a userspace solution? Do you know for sure that other OSes aren't using userspace at all to deal with the same situation?
I was just making the point that the underlying reason the system locks up and freezes before the kernel OOM killer kicks in is still unresolved. There's still something fundamentally broken about how Linux pages to swap. Even if the kernel OOM killer were disabled and earlyoom were used exclusively, there would still be something fundamentally broken somewhere.