Announcement

**waxhead** · 26 February 2017, 05:32 AM

While this sounds like a good idea in principle, I wonder whether it would have been better to do this deduplication as part of zswap or zram (no, they are not the same). As I see it, there would be no point in scanning for duplicates unless the kernel decides to swap data out. zswap needs to process the memory anyway, so it should know the checksum for a block. If it matches, it can compare, deduplicate and optionally compress as well.
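
To illustrate the checksum, compare, deduplicate, compress flow described above, here is a minimal user-space sketch in Python. It is a toy model only, not how zswap or zram is implemented: the `DedupSwapStore` class, its method names, the fixed 4096-byte page size and the choice of CRC32 plus zlib are all assumptions made for the example.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for this illustration


class DedupSwapStore:
    """Toy model of a compressed swap cache that deduplicates on store.

    Hypothetical class for illustration; not a kernel interface.
    """

    def __init__(self):
        self._slots = {}        # slot id -> [compressed bytes, reference count]
        self._by_checksum = {}  # crc32 -> list of slot ids with that checksum
        self._next_slot = 0

    def store(self, page: bytes) -> int:
        """Store one page and return the slot id that holds its contents."""
        if len(page) != PAGE_SIZE:
            raise ValueError("expected exactly one page of data")
        checksum = zlib.crc32(page)
        # A checksum match is only a hint, so confirm with a full compare.
        for slot in self._by_checksum.get(checksum, []):
            entry = self._slots[slot]
            if zlib.decompress(entry[0]) == page:
                entry[1] += 1  # duplicate: share the existing slot
                return slot
        # No identical page found: compress and store a new copy.
        slot = self._next_slot
        self._next_slot += 1
        self._slots[slot] = [zlib.compress(page), 1]
        self._by_checksum.setdefault(checksum, []).append(slot)
        return slot

    def load(self, slot: int) -> bytes:
        """Return the original page contents for a slot."""
        return zlib.decompress(self._slots[slot][0])

    def unique_pages(self) -> int:
        """Number of distinct pages actually stored."""
        return len(self._slots)
```

Storing two identical pages returns the same slot id both times, so only one compressed copy is kept. The full compare after the checksum hit is what guards against checksum collisions.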

**starshipeleven** · 26 February 2017, 10:13 AM

Originally posted by boxie

For shared systems I would agree - but if you control the host and the stuff running on it, why not? might be a good way to reduce memory footprint

"you control the host and the stuff running on it" lol. Yeah right. Because you can stop malware by just looking at your PC intensely to remind it who's boss.

**notanoob** · 26 February 2017, 03:15 PM

Originally posted by starshipeleven

"you control the host and the stuff running on it" lol. Yeah right. Because you can stop malware by just looking at your PC intensely to remind it who's boss.

Even if you knew what every little thing should be doing in Linux, shared memory is still a problem because of trust. Let's say you nitpicked through every application on your system and found them all memory safe except one app, the web browser. That web browser could wreak havoc with memory side-channel attacks on the vanilla kernel. So do you trust all the people who ever worked on that project? Even if you did, they are human, so they make mistakes. Do you trust every website you connect to not to use those mistakes to, say, set up persistent ad tracking for money? I for one do not.

**kokoko3k** · 26 February 2017, 05:41 PM

Originally posted by fguerraz

And it also opens an avenue for all sorts of side-channel attacks, memory deduplication is a perfect example of a false good idea.

KSM - KVM

http://www.linux-kvm.org/page/KSM

**boxie** · 27 February 2017, 03:25 AM

Originally posted by starshipeleven

"you control the host and the stuff running on it" lol. Yeah right. Because you can stop malware by just looking at your PC intensely to remind it who's boss.

Sure, I can imagine plenty of cases where lots of stuff might be duplicated in memory and that you do know there is no (ok, maybe a remote) chance of something nasty. Not every server is connected directly to the Internet!

**sarfarazahmad** · 27 February 2017, 03:41 AM

Is the CPU usage from frequently scanning memory for duplicates low enough to make this worthwhile? For virtual machines we already have KSM. For what kind of workloads can this be useful?

**starshipeleven** · 27 February 2017, 06:54 AM

Originally posted by boxie

Sure, I can imagine plenty of cases where lots of stuff might be duplicated in memory and that you do know there is no (ok, maybe a remote) chance of something nasty. Not every server is connected directly to the Internet!

Apart from some HPC, or maybe some industrial applications that work disconnected from the internet, I don't see a large userbase for that.

KVM (a hypervisor) has implemented memory deduplication for a long time (since 2009 according to Google), and there it's kinda OK as it's done at a different level. That is also where it makes the most sense. If you fire up a dozen cloned VMs you can save a ton of RAM if you just dedup it.

**starshipeleven** · 27 February 2017, 07:02 AM

Originally posted by sarfarazahmad

Is the CPU usage from frequently scanning memory for duplicates low enough to make this worthwhile? For virtual machines we already have KSM. For what kind of workloads can this be useful?

I had an Android phone with a custom kernel patched with KSM (and I could set options to control it). The load was visible in CPU activity monitors but not noticeable in actual use on its dual-core 1 GHz ARM processor. There were some RAM savings, but not that much (around a 10% reduction in the best cases).
The big issue was that it pulled the CPU out of low-power modes to do the scan (not sleep modes; when the phone was sleeping KSM was also suspended), so the battery lasted for less time. Not by a lot, by a few hours over a 2-day charge (normal usage, so most of those 2 days it was sleeping).
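
For anyone who wants to check the same trade-off on their own machine, here is a small Python sketch that reads the stock KSM counters and tunables. It assumes the mainline sysfs directory `/sys/kernel/mm/ksm`; whether a patched phone kernel or UKSM exposes the same files is an assumption, and the function names `read_ksm_stats` and `estimated_saved_bytes` are made up for the example.

```python
import mmap
from pathlib import Path

# Mainline KSM sysfs directory; a custom kernel may differ (assumption).
KSM_DIR = Path("/sys/kernel/mm/ksm")

# Tunables (run, pages_to_scan, sleep_millisecs) and counters
# (pages_shared, pages_sharing) documented for mainline KSM.
FIELDS = ("run", "pages_to_scan", "sleep_millisecs",
          "pages_shared", "pages_sharing")


def read_ksm_stats(base: Path = KSM_DIR) -> dict:
    """Read the KSM files that exist under `base` as integers."""
    stats = {}
    for name in FIELDS:
        path = base / name
        if path.exists():
            stats[name] = int(path.read_text().strip())
    return stats


def estimated_saved_bytes(stats: dict, page_size: int = mmap.PAGESIZE) -> int:
    """Rough RAM saving: pages_sharing counts the extra mappings that
    reuse an already shared page, so each one is roughly a page saved."""
    return stats.get("pages_sharing", 0) * page_size


if __name__ == "__main__":
    stats = read_ksm_stats()
    if not stats:
        print("KSM sysfs files not found; is KSM built into this kernel?")
    else:
        for name, value in stats.items():
            print(f"{name}: {value}")
        print(f"estimated saving: {estimated_saved_bytes(stats)} bytes")
```

`pages_to_scan` and `sleep_millisecs` are the knobs that trade CPU wakeups against how quickly duplicates are found, which is the battery cost described above.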

**boxie** · 27 February 2017, 08:12 AM

Originally posted by starshipeleven

Apart from some HPC, or maybe some industrial applications that work disconnected from the internet, I don't see a large userbase for that.

KVM (a hypervisor) has implemented memory deduplication for a long time (since 2009 according to Google), and there it's kinda OK as it's done at a different level. That is also where it makes the most sense. If you fire up a dozen cloned VMs you can save a ton of RAM if you just dedup it.

How about databases? They shouldn't be connected directly to the Internet and could really benefit from dedupe. Heck, not every database backs a website either.

Not saying there aren't potential problems with it; side-channel attacks are definitely problematic. It does seem like an interesting option, especially if it can be enabled per container. Then we could use Flatpak/Snap and dedupe memory just inside certain memory spaces.

**starshipeleven** · 27 February 2017, 02:24 PM

Originally posted by boxie

How about databases? They shouldn't be connected directly to the Internet and could really benefit from dedupe. Heck, not every database backs a website either.

I don't think databases hold much duplicated RAM that could benefit from this (but I could be wrong).

And a general rule of thumb in the trade here is DO NOT fuck with databases in any way. If your idea increases performance or data safety they are usually doing it already.

UKSM Is Still Around For Data Deduplication Of The Linux Kernel
